Published in: The Computer Games Journal 3/2020

05-11-2019 | Research

Efficient Exploration in Side-Scrolling Video Games with Trajectory Replay

Authors: I-Huan Chiang, Chung-Ming Huang, Nien-Hu Cheng, Hsin-Yu Liu, Shi-Chun Tsai


Abstract

Deep reinforcement learning agents have outperformed human players in many games, such as the Atari 2600 games. For more complicated games, previous works have proposed curiosity-driven exploration for learning; however, such approaches generally require substantial computational resources to train the agent. We design a method to help the agent explore the environment by utilizing previously learned experience more effectively: a new memory replay mechanism consisting of two modules, a Trajectory Replay Module that records the agent's movement trajectory using far less space, and a Trajectory Optimization Module that formulates the recorded state information as a reward. We evaluate our approach on two popular side-scrolling video games: Super Mario Bros and Sonic the Hedgehog. The experimental results show that our method helps the agent explore the environment efficiently, pass through various tough scenarios, and successfully reach the goal in most of the tested game levels with only four workers and ordinary CPU resources for training. Demo videos are available for Super Mario Bros and Sonic the Hedgehog.
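The two-module design in the abstract can be illustrated with a short sketch. The following Python code is a hypothetical reading of the idea, not the authors' implementation: every name and parameter here (TrajectoryReplayModule, shaped_reward, bin_size, novelty_scale, and so on) is an assumption made for illustration. It records only coarse per-trajectory position statistics (hence "far less space") and converts them into a shaped exploration reward.

    # Hypothetical sketch of the trajectory-replay idea described in the
    # abstract. Names and reward shaping are illustrative assumptions,
    # not the paper's code.
    from collections import defaultdict

    class TrajectoryReplayModule:
        """Records compact trajectory information (e.g. the agent's
        discretized x-position) instead of full state transitions."""

        def __init__(self):
            self.visit_counts = defaultdict(int)  # position bin -> visit count
            self.best_progress = 0.0              # farthest x-position seen so far

        def record(self, x_pos, bin_size=16):
            # Discretizing the position keeps the stored record small.
            self.visit_counts[int(x_pos) // bin_size] += 1
            self.best_progress = max(self.best_progress, x_pos)

    class TrajectoryOptimizationModule:
        """Turns the recorded statistics into a reward bonus that favors
        rarely visited regions and new forward progress."""

        def __init__(self, replay, novelty_scale=1.0, progress_scale=0.1):
            self.replay = replay
            self.novelty_scale = novelty_scale
            self.progress_scale = progress_scale

        def shaped_reward(self, x_pos, bin_size=16):
            count = self.replay.visit_counts[int(x_pos) // bin_size]
            novelty = self.novelty_scale / (1.0 + count)  # rare bin -> larger bonus
            progress = self.progress_scale * max(0.0, x_pos - self.replay.best_progress)
            return novelty + progress

    # Usage inside a training loop; x_pos would come from the emulator state.
    replay = TrajectoryReplayModule()
    shaper = TrajectoryOptimizationModule(replay)
    bonus = shaper.shaped_reward(x_pos=120.0)  # added to the environment reward
    replay.record(x_pos=120.0)

In this reading, a cheap visit-count bonus stands in for a learned curiosity model, which would be consistent with the abstract's claim of training with only four workers on ordinary CPUs.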


Metadata
Title
Efficient Exploration in Side-Scrolling Video Games with Trajectory Replay
Authors
I-Huan Chiang
Chung-Ming Huang
Nien-Hu Cheng
Hsin-Yu Liu
Shi-Chun Tsai
Publication date
05-11-2019
Publisher
Springer New York
Published in
The Computer Games Journal / Issue 3/2020
Electronic ISSN: 2052-773X
DOI
https://doi.org/10.1007/s40869-019-00089-x
