Published in: Soft Computing 2/2020

13.12.2019 | Foundations

Learning with policy prediction in continuous state-action multi-agent decision processes



Abstract

Inspired by the recent attention to multi-agent reinforcement learning (MARL), efforts to provide efficient methods in this field are increasing, but several issues make it challenging. An agent's decision making depends on the other agents' behavior, while sharing information is not always possible. Moreover, predicting other agents' policies while they themselves are still learning is difficult, and some agents in a multi-agent environment may not behave rationally. In such cases, reaching a Nash equilibrium, the target in a system with ideal behavior, is not possible, and the best policy is the best response to the other agents' policies. In addition, many real-world multi-agent problems are continuous in their state and action spaces, which adds further complexity to MARL scenarios. To overcome these challenges, we propose a new multi-agent learning method based on fuzzy least-squares policy iteration. The proposed method consists of two parts: an Inner Model, which approximates the other agents' policies, and a multi-agent method that learns a near-optimal policy based on those policies. Both algorithms are applicable to problems with continuous state and action spaces. They can be used independently or in combination, and they are designed to fit together, so that the outputs of the Inner Model are fully consistent with the inputs expected by the multi-agent method. In problems where explicit communication is not possible, the combination of the two methods is recommended. Theoretical analysis proves the near-optimality of the policies learned by these methods. We evaluate them on two problems with continuous state-action spaces: the well-known predator–prey problem and the unit commitment problem in the smart power grid. The results show acceptable performance of our methods.
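The proposed method builds on least-squares policy iteration (LSPI) with fuzzy function approximation. As a rough, hedged illustration of the plain (non-fuzzy) LSPI loop that such methods extend, here is a minimal sketch on a toy two-state, two-action MDP with one-hot features; the MDP, the feature map, and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def phi(s, a, n_states=2, n_actions=2):
    """One-hot feature vector over (state, action) pairs."""
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

def lstdq(samples, policy, gamma=0.9, reg=1e-6):
    """LSTD-Q: least-squares estimate of Q-function weights for a fixed policy."""
    d = 4
    A = reg * np.eye(d)          # small ridge term keeps A invertible
    b = np.zeros(d)
    for s, a, r, s2 in samples:
        f, f2 = phi(s, a), phi(s2, policy[s2])
        A += np.outer(f, f - gamma * f2)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, gamma=0.9, max_iter=20):
    """Alternate policy evaluation (LSTD-Q) and greedy improvement until stable."""
    policy = [0, 0]              # arbitrary initial policy
    for _ in range(max_iter):
        w = lstdq(samples, policy, gamma)
        new_policy = [int(np.argmax([w @ phi(s, a) for a in (0, 1)]))
                      for s in (0, 1)]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, w

# Toy deterministic MDP: action a moves to state a; reward 1 for reaching state 1.
samples = [(s, a, float(a == 1), a) for s in (0, 1) for a in (0, 1)]
policy, w = lspi(samples)
print(policy)  # -> [1, 1]: both states prefer the rewarded action
```

The fuzzy variant in the paper replaces the one-hot `phi` with fuzzy membership features, which is what makes continuous state-action spaces tractable; the surrounding evaluate-then-improve loop has the same shape.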


Metadata
Title
Learning with policy prediction in continuous state-action multi-agent decision processes
Publication date
13.12.2019
Published in
Soft Computing / Issue 2/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04600-4
