nach oben

Erschienen in:

2020 | OriginalPaper | Buchkapitel

17. Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model

verfasst von : Robert J. Smith, Malcolm I. Heywood

Erschienen in: Genetic Programming Theory and Practice XVII

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say, state information is ‘complete’, hence there is no need to learn what to learn from. However, recent advances—such as visual reinforcement learning—have enabled the tasks typically addressed using RL to expand to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory, thus credit assignment needs to: find efficient ways to encode high dimensional data, as well has, determining under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, where this has already demonstrated competitiveness with deep learning solutions to multiple RL tasks (under complete information). In this work, TPG is augmented with indexed memory using a probabilistic formulation of a write operation (defines long and short term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a program is used to provide the low dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing in a single lane 1-on-1 configuration with the game engine hero bot as the opponent. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour. Only with indexed memory are useful hero behaviours discovered.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel 2019 Evolutionary Algorithms Review

Nächstes Kapitel Modelling Genetic Programming as a Simple Sampling Algorithm

http://blog.dota2.com/?l=english.

https://openai.com/blog/how-to-train-your-openai-five/.

https://openai.com/blog/openai-baselines-ppo/.

The computational resources used by OpenAI are in the order of 180 years of gameplay per day.

https://dota2.gamepedia.com/Heroes.

LSTM is widely used as it addresses one of the potential pathologies of recurrent connectivity under gradient decent, that of vanishing gradients.

Conditional instructions could change this [6].

Indexed memory is initialized once at generation zero with NULL content.

Up to 20,000 features are available, however, we are only interested in the case of a 1-on-1 single lane configuration of the game (as opposed to 5 heroes per team over 3 lanes).

https://dota2.gamepedia.com/Shadow_Fiend.

https://dota2.gamepedia.com/Creep_control_techniques.

Heat map produced using [5].

Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016)

Aiyer, S.V.B., Niranjan, N., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 15, 204–215 (1990)CrossRef

Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999)

Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)

Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://www.heatmapper.ca/

Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)

Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)CrossRef

Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. CoRR abs/1410.5401 (2014)

10.

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)CrossRef

11.

Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)MathSciNetCrossRef

12.

Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)

13.

Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990)

14.

Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013)

15.

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)CrossRef

16.

Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)

17.

Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)MathSciNetCrossRef

18.

Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020)

19.

Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)

20.

Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)

21.

Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)CrossRef

22.

Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019)

23.

Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998)

24.

Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)

25.

Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)MathSciNetCrossRef

26.

Merrild, J., Rasmussen, M.A., Risi, S.: Hyperntm: Evolving scalable neural turing machines through hyperneat. In: EvoApplications, pp. 750–766 (2018)

27.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)CrossRef

28.

Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994)

29.

Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009)

30.

Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2016)

31.

Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017)

32.

Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)

33.

Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019)

34.

Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)

35.

Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)

36.

Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)

37.

Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994)

38.

Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)

39.

Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hasssbis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018)

40.

Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing doom from pixels. IEEE Transactions on Games to appear (2019)

Titel: Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model
verfasst von: Robert J. Smith
Malcolm I. Heywood
Verlag: Springer International Publishing
Buch: Genetic Programming Theory and Practice XVII
Print ISBN: 978-3-030-39957-3

Electronic ISBN: 978-3-030-39958-0

Copyright-Jahr: 2020
DOI: https://doi.org/10.1007/978-3-030-39958-0_17

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner