
2022 | Original Paper | Book Chapter

25. IBPO: Solving 3D Strategy Game with the Intrinsic Reward

Authors: Huale Li, Rui Cao, Xiaohan Hou, Xuan Wang, Linlin Tang, Jiajia Zhang, Shuhan Qi

Published in: Advances in Smart Vehicular Technology, Transportation, Communication and Applications

Publisher: Springer Singapore


Abstract

In recent years, deep reinforcement learning has achieved great success in many fields, especially in games, as demonstrated by AlphaGo, AlphaZero, and AlphaStar. However, reward sparsity remains a problem in 3D strategy games, which have higher-dimensional state spaces and more complex game scenarios. To address this problem, we propose an intrinsic-reward-based policy optimization algorithm (IBPO). IBPO incorporates an intrinsic reward into traditional policy optimization through a differential fusion mechanism and a modified value network. Experimental results show that our method outperforms previous methods on VizDoom.
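The abstract does not spell out the differential fusion mechanism, but the general idea of intrinsic-reward methods is to add a weighted exploration bonus (e.g., an ICM-style prediction error [8]) to the sparse extrinsic reward before computing returns. The sketch below illustrates only that general idea; the function names, the weight `beta0`, and the decay schedule are hypothetical and are not taken from the paper.

```python
import numpy as np

def fuse_rewards(r_ext, r_int, step, beta0=0.01, decay=1e-4):
    """Blend extrinsic and intrinsic rewards.

    Hypothetical fusion rule (not the paper's): the intrinsic weight
    decays over training, so exploration dominates early on and the
    sparse task reward dominates later.
    """
    beta = beta0 / (1.0 + decay * step)
    return r_ext + beta * r_int

def discounted_returns(rewards, gamma=0.99):
    """Standard discounted return, computed backwards over one episode."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Toy episode: sparse extrinsic reward, dense curiosity-style intrinsic bonus.
r_ext = np.array([0.0, 0.0, 0.0, 0.0, 1.0])    # reward only at the goal state
r_int = np.array([0.3, 0.2, 0.25, 0.1, 0.05])  # e.g. ICM prediction error
fused = fuse_rewards(r_ext, r_int, step=1000)
print(discounted_returns(fused))
```

Without the intrinsic term, every pre-goal step in this toy episode carries zero immediate reward, which is exactly the sparsity problem the paper targets.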


References
1. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015)
2. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354 (2017)
3. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: a Doom-based AI research platform for visual reinforcement learning. In: Proceedings of the 2016 IEEE Conference on Computational Intelligence and Games, pp. 1–8. IEEE (2016)
4. Shao, K., Zhu, Y., Zhao, D.: StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Trans. Emerg. Topics Comput. Intell. 3(1), 73–84 (2018)
5. Vinyals, O., Babuschkin, I., Chung, J., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
6. Kulkarni, T.D., Narasimhan, K., Saeedi, A., Tenenbaum, J.: Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 3675–3683 (2016)
7. Song, S., Weng, J., Su, H., Yan, D., Zou, H., Zhu, J.: Playing FPS games with environment-aware hierarchical reinforcement learning. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3475–3482. AAAI Press (2019)
8. Pathak, D., Agrawal, P., Efros, A.A., Darrell, T.: Curiosity-driven exploration by self-supervised prediction. In: Proceedings of the International Conference on Machine Learning, pp. 2778–2787 (2017)
9. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the International Conference on Machine Learning, pp. 1928–1937 (2016)
10. Zhang, S., Su, X., Jiang, X., Chen, M., Wu, T.: A traffic prediction method of bicycle-sharing based on long and short term memory network. J. Network Intell. 4(2), 17–29 (2019)
11. Wang, Y., Wang, J., Deng, C., Zhu, H., Wang, S.: L1–L2 norms based target representation for visual tracking. J. Network Intell. 3(2), 102–112 (2018)
12. Lample, G., Chaplot, D.S.: Playing FPS games with deep reinforcement learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 2140–2146 (2017)
13. Dosovitskiy, A., Koltun, V.: Learning to act by predicting the future. In: Proceedings of the International Conference on Learning Representations, pp. 637–645 (2017)
Metadata
Title
IBPO: Solving 3D Strategy Game with the Intrinsic Reward
Authors
Huale Li
Rui Cao
Xiaohan Hou
Xuan Wang
Linlin Tang
Jiajia Zhang
Shuhan Qi
Copyright Year
2022
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-4039-1_25
