Skip to main content

2020 | OriginalPaper | Buchkapitel

17. Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model

verfasst von : Robert J. Smith, Malcolm I. Heywood

Erschienen in: Genetic Programming Theory and Practice XVII

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say, state information is ‘complete’, hence there is no need to learn what to learn from. However, recent advances—such as visual reinforcement learning—have enabled the tasks typically addressed using RL to expand to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory, thus credit assignment needs to: find efficient ways to encode high dimensional data, as well has, determining under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, where this has already demonstrated competitiveness with deep learning solutions to multiple RL tasks (under complete information). In this work, TPG is augmented with indexed memory using a probabilistic formulation of a write operation (defines long and short term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a program is used to provide the low dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing in a single lane 1-on-1 configuration with the game engine hero bot as the opponent. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour. Only with indexed memory are useful hero behaviours discovered.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
4
The computational resources used by OpenAI are in the order of 180 years of gameplay per day.
 
6
LSTM is widely used as it addresses one of the potential pathologies of recurrent connectivity under gradient decent, that of vanishing gradients.
 
7
Conditional instructions could change this [6].
 
8
Indexed memory is initialized once at generation zero with NULL content.
 
9
Up to 20,000 features are available, however, we are only interested in the case of a 1-on-1 single lane configuration of the game (as opposed to 5 heroes per team over 3 lanes).
 
12
Heat map produced using [5].
 
Literatur
1.
Zurück zum Zitat Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016) Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016)
2.
Zurück zum Zitat Aiyer, S.V.B., Niranjan, N., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 15, 204–215 (1990)CrossRef Aiyer, S.V.B., Niranjan, N., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 15, 204–215 (1990)CrossRef
3.
Zurück zum Zitat Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999) Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999)
4.
Zurück zum Zitat Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994) Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)
5.
Zurück zum Zitat Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://www.heatmapper.ca/ Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://​www.​heatmapper.​ca/​
6.
Zurück zum Zitat Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007) Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)
7.
Zurück zum Zitat Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996) Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)
8.
Zurück zum Zitat Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)CrossRef Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)CrossRef
9.
Zurück zum Zitat Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. CoRR abs/1410.5401 (2014) Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. CoRR abs/1410.5401 (2014)
10.
Zurück zum Zitat Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)CrossRef Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)CrossRef
11.
Zurück zum Zitat Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)MathSciNetCrossRef Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)MathSciNetCrossRef
12.
Zurück zum Zitat Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016) Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)
13.
Zurück zum Zitat Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990) Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990)
14.
Zurück zum Zitat Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013) Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013)
15.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)CrossRef
16.
Zurück zum Zitat Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996) Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)
17.
Zurück zum Zitat Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)MathSciNetCrossRef Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)MathSciNetCrossRef
18.
Zurück zum Zitat Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020) Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020)
19.
Zurück zum Zitat Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017) Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)
20.
Zurück zum Zitat Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017) Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)
21.
Zurück zum Zitat Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)CrossRef Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)CrossRef
22.
Zurück zum Zitat Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019) Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019)
23.
Zurück zum Zitat Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998) Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998)
24.
Zurück zum Zitat Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010) Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)
25.
Zurück zum Zitat Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)MathSciNetCrossRef Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)MathSciNetCrossRef
26.
Zurück zum Zitat Merrild, J., Rasmussen, M.A., Risi, S.: Hyperntm: Evolving scalable neural turing machines through hyperneat. In: EvoApplications, pp. 750–766 (2018) Merrild, J., Rasmussen, M.A., Risi, S.: Hyperntm: Evolving scalable neural turing machines through hyperneat. In: EvoApplications, pp. 750–766 (2018)
27.
Zurück zum Zitat Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)CrossRef Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)CrossRef
28.
Zurück zum Zitat Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994) Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994)
29.
Zurück zum Zitat Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009) Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009)
30.
Zurück zum Zitat Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2016) Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2016)
31.
Zurück zum Zitat Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017) Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017)
32.
Zurück zum Zitat Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018) Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)
33.
Zurück zum Zitat Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019) Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019)
34.
Zurück zum Zitat Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019) Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)
35.
Zurück zum Zitat Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996) Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)
36.
Zurück zum Zitat Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018) Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)
37.
Zurück zum Zitat Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994) Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994)
38.
Zurück zum Zitat Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994) Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)
39.
Zurück zum Zitat Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hasssbis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018) Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hasssbis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018)
40.
Zurück zum Zitat Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing doom from pixels. IEEE Transactions on Games to appear (2019) Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing doom from pixels. IEEE Transactions on Games to appear (2019)
Metadaten
Titel
Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model
verfasst von
Robert J. Smith
Malcolm I. Heywood
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-39958-0_17

Premium Partner