Published in: Swarm Intelligence 1/2018

13.10.2017

Reinforcement learning in a continuum of agents

Authors: Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl


Abstract

We present a decision-making framework for modeling the collective behavior of large groups of cooperatively interacting agents based on a continuum description of the agents’ joint state. The continuum model is derived from an agent-based system of locally coupled stochastic differential equations, taking into account that each agent in the group is only partially informed about the global system state. The usefulness of the proposed framework is twofold: (i) for multi-agent scenarios, it provides a computational approach to handling large-scale distributed decision-making problems and learning decentralized control policies; (ii) for single-agent systems, it offers an alternative approximation scheme for evaluating expectations of state distributions. We demonstrate our framework on a variant of the Kuramoto model using a variety of distributed control tasks, such as positioning and aggregation. As part of our experiments, we compare the effectiveness of the controllers learned by the continuum model and agent-based systems of different sizes, and we analyze how the degree of observability in the system affects the learning process.
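To make the phrase "agent-based system of locally coupled stochastic differential equations" concrete, the following is a minimal sketch of a noisy Kuramoto-type system simulated with the Euler-Maruyama scheme. It is an illustration only and not the model variant or the learning method used in the paper: the function names `simulate_kuramoto` and `order_parameter`, the observation `radius`, the sine coupling normalized by the group size, and all parameter values are assumptions chosen for this example.

```python
import numpy as np


def simulate_kuramoto(n_agents=100, coupling=1.0, noise=0.1, radius=np.pi,
                      dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of noisy, locally coupled phase oscillators.

    Hypothetical illustration: each agent only reacts to agents whose phase
    lies within `radius` of its own, mimicking partial observability.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi, np.pi, size=n_agents)
    for _ in range(n_steps):
        # diff[i, j] = theta_j - theta_i, wrapped to (-pi, pi]
        diff = np.angle(np.exp(1j * (theta[None, :] - theta[:, None])))
        # Agent i only observes agents j within the observation radius.
        mask = np.abs(diff) <= radius
        drift = coupling * np.sum(np.sin(diff) * mask, axis=1) / n_agents
        # Euler-Maruyama step: deterministic drift plus Brownian increment.
        theta = theta + drift * dt + noise * np.sqrt(dt) * rng.standard_normal(n_agents)
        # Keep phases on the circle.
        theta = np.angle(np.exp(1j * theta))
    return theta


def order_parameter(theta):
    """Magnitude of the mean phase vector: 1 means full aggregation."""
    return float(np.abs(np.mean(np.exp(1j * theta))))


if __name__ == "__main__":
    final = simulate_kuramoto()
    print("order parameter:", order_parameter(final))
```

The order parameter gives a simple scalar measure of how strongly the phases have aggregated. Each simulation step costs on the order of `n_agents**2` operations because of the pairwise differences, which hints at why a continuum description can be attractive for very large groups.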

Footnotes
2. Note that we use the terms decision-making and control interchangeably in this work.
3. Note that both these exploration types are different from the exploration in policy space, which we discuss in detail in Sect. 4.
4. Note that the value in Eq. (11) is based on a global definition of reward. We can easily switch to a “local” (i.e., agent-based) value computation by choosing \(R^G\) as in Eq. (10), which is in accordance with the definition of private value in Šošić et al. (2017).
5. This function is not to be confused with the probability density function of a single agent’s state (see Sect. 3.4) which, in contrast to the object defined here, is a deterministic quantity.
6. Recall that the continuum model requires only one system roll-out (see Sect. 3.3).

References
Abelson, H., Allen, D., Coore, D., Hanson, C., Homsy, G., Knight, T. F., et al. (2000). Amorphous computing. Communications of the ACM, 43(5), 74–82.
Beal, J. (2005). Programming an amorphous computational medium. In J. P. Banâtre, P. Fradet, J. L. Giavitto, & O. Michel (Eds.), Unconventional programming paradigms (pp. 121–136). Berlin: Springer.
Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.
Brambilla, M., Ferrante, E., Birattari, M., & Dorigo, M. (2013). Swarm robotics: A review from the swarm engineering perspective. Swarm Intelligence, 7(1), 1–41.
Correll, N., & Martinoli, A. (2006). System identification of self-organizing robotic swarms. In M. Gini & R. Voyles (Eds.), Distributed autonomous robotic systems 7 (pp. 31–40). Tokyo: Springer Japan.
Couzin, I. D., Krause, J., James, R., Ruxton, G. D., & Franks, N. R. (2002). Collective memory and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1), 1–11.
Crutchfield, J. P., & Mitchell, M. (1995). The evolution of emergent computation. Proceedings of the National Academy of Sciences, 92(23), 10742–10746.
Dean, D. S. (1996). Langevin equation for the density of a system of interacting Langevin processes. Journal of Physics A: Mathematical and General, 29(24), L613.
Deisenroth, M. P., Neumann, G., & Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1–2), 1–142.
Doucet, A., Godsill, S., & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.
Dubkov, A., & Spagnolo, B. (2005). Generalized Wiener process and Kolmogorov’s equation for diffusion induced by non-Gaussian noise source. Fluctuation and Noise Letters, 5(02), L267–L274.
Ermentrout, G. B., & Edelstein-Keshet, L. (1993). Cellular automata approaches to biological modeling. Journal of Theoretical Biology, 160(1), 97–133.
Freitas, R. A. (2005). Current status of nanomedicine and medical nanorobotics. Journal of Computational and Theoretical Nanoscience, 2(1), 1–25.
Grondman, I., Busoniu, L., Lopes, G. A. D., & Babuska, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), 1291–1307.
Hamann, H. (2014). Evolution of collective behaviors by minimizing surprise. In Proceedings of the 14th international conference on the synthesis and simulation of living systems (pp. 344–351). MIT Press.
Hamann, H., & Wörn, H. (2008). A framework of space–time continuous models for algorithm design in swarm robotics. Swarm Intelligence, 2(2), 209–239.
Hayes, A. T. (2002). How many robots? Group size and efficiency in collective search tasks. In H. Asama, T. Arai, T. Fukuda, & T. Hasegawa (Eds.), Distributed autonomous robotic systems 5 (pp. 289–298). Tokyo: Springer Japan.
Houchmandzadeh, B., & Vallade, M. (2015). Exact results for a noise-induced bistable system. Physical Review E, 91(2), 022115.
Hüttenrauch, M., Šošić, A., & Neumann, G. (2017). Guided deep reinforcement learning for swarm systems. In AAMAS workshop on autonomous robots and multirobot systems. arXiv:1709.06011.
Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1), 99–134.
Karatzas, I., & Shreve, S. (1998). Brownian motion and stochastic calculus. Berlin: Springer Science & Business Media.
Krylov, N. V. (2008). Controlled diffusion processes. Berlin: Springer Science & Business Media.
Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators. In International symposium on mathematical problems in theoretical physics (pp. 420–422). Springer.
Land, M., & Belew, R. K. (1995). No perfect two-state cellular automata for density classification exists. Physical Review Letters, 74(25), 5148.
Lerman, K., Martinoli, A., & Galstyan, A. (2005). A review of probabilistic macroscopic models for swarm robotic systems. In Swarm robotics: SAB 2004 international workshop (pp. 143–152). Berlin: Springer.
Lesser, V., Ortiz, C. L., & Tambe, M. (2003). Distributed sensor networks: A multiagent perspective. Berlin: Springer Science & Business Media.
MacLennan, B. J. (1990). Continuous spatial automata. Technical report, University of Tennessee, Computer Science Department.
Macua, S. V., Chen, J., Zazo, S., & Sayed, A. H. (2015). Distributed policy evaluation under multiple behavior strategies. IEEE Transactions on Automatic Control, 60(5), 1260–1274.
Martinoli, A., Ijspeert, A. J., & Mondada, F. (1999). Understanding collective aggregation mechanisms: From probabilistic modelling to experiments with real robots. Robotics and Autonomous Systems, 29(1), 51–63.
Michini, B., & How, J. P. (2012). Bayesian nonparametric inverse reinforcement learning. In P. A. Flach, T. De Bie, & N. Cristianini (Eds.), Machine learning and knowledge discovery in databases (pp. 148–163). Berlin: Springer.
Munos, R. (2006). Policy gradient in continuous time. Journal of Machine Learning Research, 7, 771–791.
Ohkubo, J., Shnerb, N., & Kessler, D. A. (2008). Transition phenomena induced by internal noise and quasi-absorbing state. Journal of the Physical Society of Japan, 77(4), 044002.
Ramaswamy, S. (2010). The mechanics and statistics of active matter. Annual Review of Condensed Matter Physics, 1(1), 323–345.
Risken, H. (1996). Fokker–Planck equation. In H. Haken (Ed.), The Fokker–Planck equation (pp. 63–95). Berlin, Heidelberg: Springer.
Schweitzer, F. (2003). Brownian agents and active particles: Collective dynamics in the natural and social sciences. Berlin, Heidelberg: Springer.
Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., & Schmidhuber, J. (2010). Parameter-exploring policy gradients. Neural Networks, 23(4), 551–559.
Sipper, M. (1999). The emergence of cellular computing. Computer, 32(7), 18–26.
Šošić, A., KhudaBukhsh, W. R., Zoubir, A. M., & Koeppl, H. (2017). Inverse reinforcement learning in swarm systems. In Proceedings of the 16th international conference on autonomous agents and multiagent systems (pp. 1413–1421). International Foundation for Autonomous Agents and Multiagent Systems.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., & Shochet, O. (1995). Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6), 1226–1229.
Whitesides, G. M., & Grzybowski, B. (2002). Self-assembly at all scales. Science, 295(5564), 2418–2421.

Metadata
Title: Reinforcement learning in a continuum of agents
Authors: Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl
Publication date: 13.10.2017
Publisher: Springer US
Published in: Swarm Intelligence, Issue 1/2018
Print ISSN: 1935-3812
Electronic ISSN: 1935-3820
DOI: https://doi.org/10.1007/s11721-017-0142-9