Published in: Autonomous Agents and Multi-Agent Systems 4/2019

15-05-2019

SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes

Authors: Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue, Zhiyong Feng

Abstract

In multiagent environments, the ability to learn is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment. From the system designer’s perspective, it is desirable for agents to learn to coordinate towards socially optimal outcomes while avoiding exploitation by selfish opponents. To this end, we propose a novel gradient-ascent-based algorithm (SA-IGA) that augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical systems theory and show that it has linear dynamics for a wide range of games, including symmetric games. The learning dynamics of two representative games (the prisoner’s dilemma and the coordination game) are analyzed in detail. Building on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, SA-PGA, based on a Q-learning update rule. Simulation results show that an SA-PGA agent achieves higher social welfare than the previous social-optimality-oriented Conditional Joint Action Learner (CJAL) and is also robust against individually rational opponents, reaching Nash equilibrium solutions.
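
To make the idea concrete, the following is a minimal sketch, not the authors’ SA-IGA/SA-PGA formulation, of a socially-aware gradient-ascent update in a two-player, two-action matrix game: each agent ascends a blend of its own expected payoff and the opponent’s, controlled by a social-attitude weight `w` that is a hypothetical parameter introduced here only for illustration (all function names below are likewise illustrative).

```python
import numpy as np

def expected_payoff(R, p, q):
    """Expected payoff from matrix R when the row player plays its first
    action with probability p and the column player with probability q."""
    return np.array([p, 1 - p]) @ R @ np.array([q, 1 - q])

def socially_aware_step(R1, R2, p, q, w=0.5, eta=0.01, eps=1e-6):
    """One projected gradient-ascent step for both agents on a blend of
    individual payoff and the opponent's payoff."""
    def objective(p_, q_, R_self, R_other):
        own = expected_payoff(R_self, p_, q_)
        other = expected_payoff(R_other, p_, q_)
        return own + w * other  # w = 0 recovers purely selfish gradient ascent

    # Numerical gradients of each agent's objective w.r.t. its own probability.
    g_p = (objective(p + eps, q, R1, R2) - objective(p - eps, q, R1, R2)) / (2 * eps)
    g_q = (objective(p, q + eps, R2, R1) - objective(p, q - eps, R2, R1)) / (2 * eps)

    # Project back onto [0, 1] so the strategies stay valid probabilities.
    return (float(np.clip(p + eta * g_p, 0.0, 1.0)),
            float(np.clip(q + eta * g_q, 0.0, 1.0)))

if __name__ == "__main__":
    # Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
    R1 = np.array([[3.0, 0.0],
                   [5.0, 1.0]])   # row player's payoffs
    R2 = R1.T                     # column player's payoffs (symmetric game)
    p, q = 0.5, 0.5
    for _ in range(5000):
        p, q = socially_aware_step(R1, R2, p, q, w=0.8)
    print(f"cooperation probabilities: p = {p:.2f}, q = {q:.2f}")
```

With a sufficiently prosocial weight (e.g. w = 0.8) both sketch agents drift towards mutual cooperation in the prisoner’s dilemma, the qualitative behaviour the abstract attributes to socially-aware learners, while w = 0 reduces the update to ordinary individual gradient ascent.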

Metadata
Title
SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes
Authors
Chengwei Zhang
Xiaohong Li
Jianye Hao
Siqi Chen
Karl Tuyls
Wanli Xue
Zhiyong Feng
Publication date
15-05-2019
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 4/2019
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09411-3
