Published: 13 October 2017

Reinforcement learning in a continuum of agents

Authors: Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl

Published in: Swarm Intelligence | Issue 1/2018

Abstract

We present a decision-making framework for modeling the collective behavior of large groups of cooperatively interacting agents based on a continuum description of the agents’ joint state. The continuum model is derived from an agent-based system of locally coupled stochastic differential equations, taking into account that each agent in the group is only partially informed about the global system state. The usefulness of the proposed framework is twofold: (i) for multi-agent scenarios, it provides a computational approach to handling large-scale distributed decision-making problems and learning decentralized control policies; (ii) for single-agent systems, it offers an alternative approximation scheme for evaluating expectations of state distributions. We demonstrate our framework on a variant of the Kuramoto model using a variety of distributed control tasks, such as positioning and aggregation. As part of our experiments, we compare the effectiveness of the controllers learned by the continuum model and by agent-based systems of different sizes, and we analyze how the degree of observability in the system affects the learning process.
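For readers unfamiliar with the Kuramoto model mentioned in the abstract, the following is a minimal sketch of an agent-based simulation of coupled stochastic differential equations. It is not the variant or the controller studied in the article: it assumes the classic uncontrolled model with identical oscillators, global all-to-all coupling, and additive Gaussian noise, integrated with the Euler-Maruyama scheme. The function names and all parameter values (`coupling`, `noise`, `dt`, and so on) are illustrative assumptions.

```python
import math
import random


def order_parameter(theta):
    """Return (r, psi): coherence r in [0, 1] and mean phase psi of the group."""
    re = sum(math.cos(t) for t in theta) / len(theta)
    im = sum(math.sin(t) for t in theta) / len(theta)
    return math.hypot(re, im), math.atan2(im, re)


def simulate_kuramoto(n_agents=50, coupling=2.0, noise=0.1, dt=0.01,
                      n_steps=2000, seed=0):
    """Euler-Maruyama simulation of a noisy Kuramoto model (illustrative only).

    Assumed dynamics for each agent i:
        d(theta_i) = (K / N) * sum_j sin(theta_j - theta_i) dt + sigma dW_i
    """
    rng = random.Random(seed)
    # Start from phases drawn uniformly at random on the circle.
    theta = [rng.uniform(-math.pi, math.pi) for _ in range(n_agents)]
    for _ in range(n_steps):
        # Mean-field identity: (1/N) * sum_j sin(theta_j - theta_i)
        # equals r * sin(psi - theta_i), which avoids the O(N^2) double loop.
        r, psi = order_parameter(theta)
        theta = [
            t + coupling * r * math.sin(psi - t) * dt
            + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for t in theta
        ]
    return theta


if __name__ == "__main__":
    final_phases = simulate_kuramoto()
    r, psi = order_parameter(final_phases)
    print(f"coherence r = {r:.3f}, mean phase psi = {psi:.3f}")
```

The coherence `r` returned by `order_parameter` is a common summary of how strongly the group has aggregated in phase; a continuum description would track the density of phases instead of the individual values in `theta`.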

Note that we use the terms decision-making and control interchangeably in this work.

Note that both these exploration types are different from the exploration in policy space, which we discuss in detail in Sect. 4.

Note that the value in Eq. (11) is based on a global definition of reward. We can easily switch to a “local” (i.e., agent-based) value computation by choosing \(R^G\) as in Eq. (10), which is in accordance with the definition of private value in Šošić et al. (2017).

This function is not to be confused with the probability density function of a single agent’s state (see Sect. 3.4) which, in contrast to the object defined here, is a deterministic quantity.

Recall that the continuum model requires only one system roll-out (see Sect. 3.3).

Abelson, H., Allen, D., Coore, D., Hanson, C., Homsy, G., Knight, T. F., et al. (2000). Amorphous computing. Communications of the ACM, 43(5), 74–82.

Aumann, R. J. (1964). Markets with a continuum of traders. Econometrica, 32(1), 39–50.

Beal, J. (2005). Programming an amorphous computational medium. In J. P. Banâtre, P. Fradet, J. L. Giavitto, & O. Michel (Eds.), Unconventional programming paradigms (pp. 121–136). Berlin: Springer.

Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840.

Billingsley, P. (1999). Convergence of probability measures. New York: Wiley.

Brambilla, M., Ferrante, E., Birattari, M., & Dorigo, M. (2013). Swarm robotics: A review from the swarm engineering perspective. Swarm Intelligence, 7(1), 1–41.

Correll, N., & Martinoli, A. (2006). System identification of self-organizing robotic swarms. In M. Gini & R. Voyles (Eds.), Distributed autonomous robotic systems 7 (pp. 31–40). Tokyo: Springer Japan.

Couzin, I. D., Krause, J., James, R., Ruxton, G. D., & Franks, N. R. (2002). Collective memory and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1), 1–11.

Crutchfield, J. P., & Mitchell, M. (1995). The evolution of emergent computation. Proceedings of the National Academy of Sciences, 92(23), 10742–10746.

Dean, D. S. (1996). Langevin equation for the density of a system of interacting Langevin processes. Journal of Physics A: Mathematical and General, 29(24), L613.

Deisenroth, M. P., Neumann, G., & Peters, J. (2013). A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1–2), 1–142.

Doucet, A., Godsill, S., & Andrieu, C. (2000). On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10(3), 197–208.

Dubkov, A., & Spagnolo, B. (2005). Generalized Wiener process and Kolmogorov’s equation for diffusion induced by non-Gaussian noise source. Fluctuation and Noise Letters, 5(2), L267–L274.

Ermentrout, G. B., & Edelstein-Keshet, L. (1993). Cellular automata approaches to biological modeling. Journal of Theoretical Biology, 160(1), 97–133.

Fornberg, B., & Flyer, N. (2015). Solving PDEs with radial basis functions. Acta Numerica, 24, 215–258.

Freitas, R. A. (2005). Current status of nanomedicine and medical nanorobotics. Journal of Computational and Theoretical Nanoscience, 2(1), 1–25.

Grondman, I., Busoniu, L., Lopes, G. A. D., & Babuska, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), 1291–1307.

Hamann, H. (2014). Evolution of collective behaviors by minimizing surprise. In Proceedings of the 14th international conference on the synthesis and simulation of living systems (pp. 344–351). MIT Press.

Hamann, H., & Wörn, H. (2008). A framework of space–time continuous models for algorithm design in swarm robotics. Swarm Intelligence, 2(2), 209–239.

Hayes, A. T. (2002). How many robots? Group size and efficiency in collective search tasks. In H. Asama, T. Arai, T. Fukuda, & T. Hasegawa (Eds.), Distributed autonomous robotic systems 5 (pp. 289–298). Tokyo: Springer Japan.

Houchmandzadeh, B., & Vallade, M. (2015). Exact results for a noise-induced bistable system. Physical Review E, 91(2), 022115.

Hüttenrauch, M., Šošić, A., & Neumann, G. (2017). Guided deep reinforcement learning for swarm systems. In AAMAS workshop on autonomous robots and multirobot systems. arXiv:1709.06011.

Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1), 99–134.

Karatzas, I., & Shreve, S. (1998). Brownian motion and stochastic calculus. Berlin: Springer Science & Business Media.

Krylov, N. V. (2008). Controlled diffusion processes. Berlin: Springer Science & Business Media.

Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators. In International symposium on mathematical problems in theoretical physics (pp. 420–422). Springer.

Land, M., & Belew, R. K. (1995). No perfect two-state cellular automata for density classification exists. Physical Review Letters, 74(25), 5148.

Lasry, J.-M., & Lions, P.-L. (2007). Mean field games. Japanese Journal of Mathematics, 2(1), 229–260.

Lerman, K., Martinoli, A., & Galstyan, A. (2005). A review of probabilistic macroscopic models for swarm robotic systems. In Swarm robotics: SAB 2004 international workshop (pp. 143–152). Berlin: Springer.

Lesser, V., Ortiz, C. L., & Tambe, M. (2003). Distributed sensor networks: A multiagent perspective. Berlin: Springer Science & Business Media.

MacLennan, B. J. (1990). Continuous spatial automata. Technical report, University of Tennessee, Computer Science Department.

Macua, S. V., Chen, J., Zazo, S., & Sayed, A. H. (2015). Distributed policy evaluation under multiple behavior strategies. IEEE Transactions on Automatic Control, 60(5), 1260–1274.

Martinoli, A., Ijspeert, A. J., & Mondada, F. (1999). Understanding collective aggregation mechanisms: From probabilistic modelling to experiments with real robots. Robotics and Autonomous Systems, 29(1), 51–63.

Michini, B., & How, J. P. (2012). Bayesian nonparametric inverse reinforcement learning. In P. A. Flach, T. De Bie, & N. Cristianini (Eds.), Machine learning and knowledge discovery in databases (pp. 148–163). Berlin: Springer.

Munos, R. (2006). Policy gradient in continuous time. Journal of Machine Learning Research, 7, 771–791.

Ohkubo, J., Shnerb, N., & Kessler, D. A. (2008). Transition phenomena induced by internal noise and quasi-absorbing state. Journal of the Physical Society of Japan, 77(4), 044002.

Ramaswamy, S. (2010). The mechanics and statistics of active matter. Annual Review of Condensed Matter Physics, 1(1), 323–345.

Risken, H. (1996). Fokker–Planck equation. In H. Haken (Ed.), The Fokker–Planck equation (pp. 63–95). Berlin, Heidelberg: Springer.

Schweitzer, F. (2003). Brownian agents and active particles: Collective dynamics in the natural and social sciences. Berlin, Heidelberg: Springer.

Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., & Schmidhuber, J. (2010). Parameter-exploring policy gradients. Neural Networks, 23(4), 551–559.

Sipper, M. (1999). The emergence of cellular computing. Computer, 32(7), 18–26.

Šošić, A., KhudaBukhsh, W. R., Zoubir, A. M., & Koeppl, H. (2017). Inverse reinforcement learning in swarm systems. In Proceedings of the 16th international conference on autonomous agents and multiagent systems (pp. 1413–1421). International Foundation for Autonomous Agents and Multiagent Systems.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., & Shochet, O. (1995). Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6), 1226–1229.

Whitesides, G. M., & Grzybowski, B. (2002). Self-assembly at all scales. Science, 295(5564), 2418–2421.

Title: Reinforcement learning in a continuum of agents
Authors: Adrian Šošić, Abdelhak M. Zoubir, Heinz Koeppl
Publication date: 13 October 2017
Publisher: Springer US
Published in: Swarm Intelligence, Issue 1/2018
Print ISSN: 1935-3812
Electronic ISSN: 1935-3820
DOI: https://doi.org/10.1007/s11721-017-0142-9