Published in: Soft Computing 2/2020

13.12.2019 | Foundations

Learning with policy prediction in continuous state-action multi-agent decision processes



Abstract

Inspired by the recent attention to multi-agent reinforcement learning (MARL), efforts to provide efficient methods in this field are increasing, but several issues make it challenging. An agent's decision making depends on the other agents' behavior, while sharing information is not always possible. Moreover, predicting other agents' policies while they themselves are still learning is difficult, and some agents in a multi-agent environment may not behave rationally. In such cases, reaching a Nash equilibrium, the target in a system with ideal behavior, is not possible, and the best policy is the best response to the other agents' policies. In addition, many real-world multi-agent problems are continuous in their state and action spaces, which adds further complexity to MARL scenarios. To overcome these challenges, we propose a new multi-agent learning method based on fuzzy least-squares policy iteration. The proposed method consists of two parts: an Inner Model, which approximates the other agents' policies, and a multi-agent method that learns a near-optimal policy based on those policies. Both algorithms are applicable to problems with continuous state and action spaces. They can be used independently or in combination, and they are designed to fit together, so that the outputs of the Inner Model are fully consistent with the inputs expected by the multi-agent method. In problems where explicit communication is not possible, the combination of the two methods is recommended. Theoretical analysis proves the near-optimality of the policies learned by these methods. We evaluate them on two problems with continuous state-action spaces: the well-known predator–prey problem and the unit commitment problem in the smart power grid. The results show acceptable performance of our methods.
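The proposed method builds on least-squares policy iteration (LSPI) with fuzzy function approximation. As a rough, hedged illustration of the plain (non-fuzzy) LSPI loop that such methods extend, here is a minimal sketch on a toy two-state, two-action MDP with one-hot features; the MDP, the feature map, and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def phi(s, a, n_states=2, n_actions=2):
    """One-hot feature vector over (state, action) pairs."""
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

def lstdq(samples, policy, gamma=0.9, reg=1e-6):
    """LSTD-Q: least-squares estimate of Q-function weights for a fixed policy."""
    d = 4
    A = reg * np.eye(d)          # small ridge term keeps A invertible
    b = np.zeros(d)
    for s, a, r, s2 in samples:
        f, f2 = phi(s, a), phi(s2, policy[s2])
        A += np.outer(f, f - gamma * f2)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, gamma=0.9, max_iter=20):
    """Alternate policy evaluation (LSTD-Q) and greedy improvement until stable."""
    policy = [0, 0]              # arbitrary initial policy
    for _ in range(max_iter):
        w = lstdq(samples, policy, gamma)
        new_policy = [int(np.argmax([w @ phi(s, a) for a in (0, 1)]))
                      for s in (0, 1)]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, w

# Toy deterministic MDP: action a moves to state a; reward 1 for reaching state 1.
samples = [(s, a, float(a == 1), a) for s in (0, 1) for a in (0, 1)]
policy, w = lspi(samples)
print(policy)  # -> [1, 1]: both states prefer the rewarded action
```

The fuzzy variant in the paper replaces the one-hot `phi` with fuzzy membership features, which is what makes continuous state-action spaces tractable; the surrounding evaluate-then-improve loop has the same shape.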


Metadata
Title
Learning with policy prediction in continuous state-action multi-agent decision processes
Publication date
13.12.2019
Published in
Soft Computing / Issue 2/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04600-4
