
2024 | Original Paper | Book Chapter

Evolving Reservoirs for Meta Reinforcement Learning

Authors: Corentin Léger, Gautier Hamon, Eleni Nisioti, Xavier Hinaut, Clément Moulin-Frier

Published in: Applications of Evolutionary Computation

Publisher: Springer Nature Switzerland


Abstract

Animals often demonstrate a remarkable ability to adapt to their environments during their lifetime. They do so partly due to the evolution of morphological and neural structures. These structures capture features of environments shared between generations to bias and speed up lifetime learning. In this work, we propose a computational model for studying a mechanism that can enable such a process. We adopt a computational framework based on meta reinforcement learning as a model of the interplay between evolution and development. At the evolutionary scale, we evolve reservoirs, a family of recurrent neural networks that differ from conventional networks in that one optimizes not the synaptic weights, but hyperparameters controlling macro-level properties of the resulting network architecture. At the developmental scale, we employ these evolved reservoirs to facilitate the learning of a behavioral policy through Reinforcement Learning (RL). Within an RL agent, a reservoir encodes the environment state before providing it to an action policy. We evaluate our approach on several 2D and 3D simulated environments. Our results show that the evolution of reservoirs can improve the learning of diverse challenging tasks. We study in particular three hypotheses: the use of an architecture combining reservoirs and reinforcement learning could enable (1) solving tasks with partial observability, (2) generating oscillatory dynamics that facilitate the learning of locomotion tasks, and (3) facilitating the generalization of learned behaviors to new tasks unknown during the evolution phase.
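The abstract's core idea is that evolution tunes only macro-level reservoir hyperparameters (such as size, spectral radius, leak rate, and input scaling) while the recurrent weights themselves stay random; the reservoir then encodes the observation history that an RL policy acts on. A minimal echo state network sketch of that idea (function names, hyperparameter values, and dimensions are illustrative assumptions, not the authors' code):

```python
import numpy as np

def make_reservoir(n_units, spectral_radius, input_dim, input_scaling, seed=0):
    """Build a random reservoir. Only macro-level hyperparameters
    (size, spectral radius, input scaling) are chosen -- the kind of
    parameters an evolutionary loop would tune -- never the weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, n_units))
    # Rescale the recurrent matrix so its largest eigenvalue magnitude
    # equals the desired spectral radius (controls memory/chaos).
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-input_scaling, input_scaling, (n_units, input_dim))
    return W, W_in

def step(state, obs, W, W_in, leak_rate):
    """Leaky-integrator update: the reservoir state accumulates the
    observation history; it would then be fed to the action policy."""
    pre = np.tanh(W @ state + W_in @ obs)
    return (1 - leak_rate) * state + leak_rate * pre

# Hyperparameters like these are what the evolutionary scale would optimize:
W, W_in = make_reservoir(n_units=100, spectral_radius=0.9,
                         input_dim=4, input_scaling=0.5)
state = np.zeros(100)
for obs in np.random.default_rng(1).standard_normal((10, 4)):
    state = step(state, obs, W, W_in, leak_rate=0.3)
```

Because the state mixes past and present inputs through the leaky update, a readout over it can recover information no longer visible in the current observation, which is the mechanism hypothesis (1) on partial observability relies on.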


Appendices
Accessible with authorization only
Metadata
Title
Evolving Reservoirs for Meta Reinforcement Learning
Authors
Corentin Léger
Gautier Hamon
Eleni Nisioti
Xavier Hinaut
Clément Moulin-Frier
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-56855-8_3
