Published in: Autonomous Agents and Multi-Agent Systems 4/2019

15-05-2019

SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes

Authors: Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue, Zhiyong Feng

Abstract

In multiagent environments, the ability to learn is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment. From the system designer’s perspective, it is desirable for agents to learn to coordinate towards socially optimal outcomes while avoiding exploitation by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) that augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical systems theory and show that it has linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner’s dilemma and the coordination game) are analyzed in detail. Building on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, SA-PGA, based on a Q-learning update rule. Simulation results show that an SA-PGA agent achieves higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents, reaching Nash equilibrium solutions.
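
To make the idea concrete, the following is a minimal sketch, not the authors’ SA-IGA/SA-PGA formulation, of a socially-aware gradient-ascent update in a two-player, two-action matrix game: each agent ascends a blend of its own expected payoff and the opponent’s, controlled by a social-attitude weight `w` that is a hypothetical parameter introduced here only for illustration (all function names below are likewise illustrative).

```python
import numpy as np

def expected_payoff(R, p, q):
    """Expected payoff from matrix R when the row player plays its first
    action with probability p and the column player with probability q."""
    return np.array([p, 1 - p]) @ R @ np.array([q, 1 - q])

def socially_aware_step(R1, R2, p, q, w=0.5, eta=0.01, eps=1e-6):
    """One projected gradient-ascent step for both agents on a blend of
    individual payoff and the opponent's payoff."""
    def objective(p_, q_, R_self, R_other):
        own = expected_payoff(R_self, p_, q_)
        other = expected_payoff(R_other, p_, q_)
        return own + w * other  # w = 0 recovers purely selfish gradient ascent

    # Numerical gradients of each agent's objective w.r.t. its own probability.
    g_p = (objective(p + eps, q, R1, R2) - objective(p - eps, q, R1, R2)) / (2 * eps)
    g_q = (objective(p, q + eps, R2, R1) - objective(p, q - eps, R2, R1)) / (2 * eps)

    # Project back onto [0, 1] so the strategies stay valid probabilities.
    return (float(np.clip(p + eta * g_p, 0.0, 1.0)),
            float(np.clip(q + eta * g_q, 0.0, 1.0)))

if __name__ == "__main__":
    # Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
    R1 = np.array([[3.0, 0.0],
                   [5.0, 1.0]])   # row player's payoffs
    R2 = R1.T                     # column player's payoffs (symmetric game)
    p, q = 0.5, 0.5
    for _ in range(5000):
        p, q = socially_aware_step(R1, R2, p, q, w=0.8)
    print(f"cooperation probabilities: p = {p:.2f}, q = {q:.2f}")
```

With a sufficiently prosocial weight (e.g. w = 0.8) both sketch agents drift towards mutual cooperation in the prisoner’s dilemma, the qualitative behaviour the abstract attributes to socially-aware learners, while w = 0 reduces the update to ordinary individual gradient ascent.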

Metadata
Title
SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes
Authors
Chengwei Zhang
Xiaohong Li
Jianye Hao
Siqi Chen
Karl Tuyls
Wanli Xue
Zhiyong Feng
Publication date
15-05-2019
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 4/2019
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09411-3
