B. Edwards, “The ten greatest PC games ever,” http://www.pcworld.com/article/158850/best_pc_games.html, 2009.
 M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, “The arcade learning environment: An evaluation platform for general agents,” J. Artif. Intell. Res. (JAIR), vol. 47, pp. 253–279, 2013. [Online]. Available: http://dx.doi.org/10.1613/jair.3912
 G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI universe,” https://github.com/openai/universe, 2016.
 G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI gym,” CoRR, vol. abs/1606.01540, 2016. [Online]. Available: http://arxiv.org/abs/1606.01540
 M. Kempka, M. Wydmuch, G. Runc, J. Toczek, and W. Jaskowski, “ViZDoom: A Doom-based AI research platform for visual reinforcement learning,” CoRR, vol. abs/1605.02097, 2016. [Online]. Available: http://arxiv.org/abs/1605.02097
 V. Cerny and F. Dechterenko, “Rogue-like games as a playground for artificial intelligence – evolutionary approach,” in International Conference on Entertainment Computing. Springer, 2015, pp. 261–271.
 krajj7, “BotHack,” https://github.com/krajj7/BotHack, 2015.
 R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
 M. Wiering and J. Schmidhuber, “Solving POMDPs with Levin search and EIRA,” in Machine Learning, Proceedings of the Thirteenth International Conference (ICML ’96), Bari, Italy, July 3-6, 1996, L. Saitta, Ed. Morgan Kaufmann, 1996, pp. 534–542.
 D. Wierstra, A. Förster, J. Peters, and J. Schmidhuber, “Solving deep memory POMDPs with recurrent policy gradients,” in Artificial Neural Networks - ICANN 2007, 17th International Conference, Porto, Portugal, September 9-13, 2007, Proceedings, Part I, ser. Lecture Notes in Computer Science, J. M. de Sá, L. A. Alexandre, W. Duch, and D. P. Mandic, Eds., vol. 4668. Springer, 2007, pp. 697–706.
 A. Tamar, S. Levine, and P. Abbeel, “Value iteration networks,” CoRR, vol. abs/1602.02867, 2016. [Online]. Available: http://arxiv.org/abs/1602.02867
 S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
 F. A. Gers, J. Schmidhuber, and F. A. Cummins, “Learning to forget: Continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000. [Online]. Available: https://doi.org/10.1162/089976600300015015
 J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” in Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, ser. JMLR Workshop and Conference Proceedings, F. R. Bach and D. M. Blei, Eds., vol. 37. JMLR.org, 2015, pp. 2067–2075. [Online]. Available: http://jmlr.org/proceedings/papers/v37/chung15.html
 M. J. Hausknecht and P. Stone, “Deep recurrent Q-learning for partially observable MDPs,” CoRR, vol. abs/1507.06527, 2015. [Online]. Available: http://arxiv.org/abs/1507.06527
 V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015. [Online]. Available: https://doi.org/10.1038/nature14236
 G. Lample and D. S. Chaplot, “Playing FPS games with deep reinforcement learning,” in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, S. P. Singh and S. Markovitch, Eds. AAAI Press, 2017, pp. 2140–2146. [Online]. Available: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14456
 M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds., 2015, pp. 2017–2025. [Online]. Available: http://papers.nips.cc/paper/5854-spatial-transformer-networks
 E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, 2017. [Online]. Available: https://doi.org/10.1109/TPAMI.2016.2572683
 M. McPartland and M. Gallagher, “Creating a multi-purpose first person shooter bot with reinforcement learning,” in Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games, CIG 2008, Perth, Australia, 15-18 December, 2008, P. Hingston and L. Barone, Eds. IEEE, 2008, pp. 143–150. [Online]. Available: https://doi.org/10.1109/CIG.2008.5035633
 B. Tastan, Y. Chang, and G. Sukthankar, “Learning to intercept opponents in first person shooter games,” in 2012 IEEE Conference on Computational Intelligence and Games, CIG 2012, Granada, Spain, September 11-14, 2012. IEEE, 2012, pp. 100–107. [Online]. Available: https://doi.org/10.1109/CIG.2012.6374144
 F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.
 M. L. Mauldin, G. Jacobson, A. Appel, and L. Hamey, “Rog-o-matic: A belligerent expert system,” in Fifth Biennial Conference of the Canadian Society for Computational Studies of Intelligence, London, Ontario, May 16, 1984.
 B. Harrison, “Angband borg.” [Online]. Available: http://www.thangorodrim.net/borg.html
 M. J. Hausknecht and P. Stone, “The impact of determinism on learning Atari 2600 games,” in Learning for General Competency in Video Games, Papers from the 2015 AAAI Workshop, Austin, Texas, USA, January 26, 2015, ser. AAAI Workshops, M. Bowling, M. G. Bellemare, E. Talvitie, J. Veness, and M. C. Machado, Eds., vol. WS-15-10. AAAI Press, 2015. [Online]. Available: http://aaai.org/ocs/index.php/WS/AAAIW15/paper/view/9564
 Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688
 M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from tensorflow.org. [Online]. Available: http://tensorflow.org/
 V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” CoRR, vol. abs/1602.01783, 2016. [Online]. Available: http://arxiv.org/abs/1602.01783