Weitere Artikel dieser Ausgabe durch Wischen aufrufen
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
In multiagent environments, the capability of learning is important for an agent to behave appropriately in face of unknown opponents and dynamic environment. From the system designer’s perspective, it is desirable if the agents can learn to coordinate towards socially optimal outcomes, while also avoiding being exploited by selfish opponents. To this end, we propose a novel gradient ascent based algorithm (SA-IGA) which augments the basic gradient-ascent algorithm by incorporating social awareness into the policy update process. We theoretically analyze the learning dynamics of SA-IGA using dynamical system theory and SA-IGA is shown to have linear dynamics for a wide range of games including symmetric games. The learning dynamics of two representative games (the prisoner’s dilemma game and the coordination game) are analyzed in detail. Based on the idea of SA-IGA, we further propose a practical multiagent learning algorithm, called SA-PGA, based on Q-learning update rule. Simulation results show that SA-PGA agent can achieve higher social welfare than previous social-optimality oriented Conditional Joint Action Learner (CJAL) and also is robust against individually rational opponents by reaching Nash equilibrium solutions.
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten
Sie möchten Zugang zu diesem Inhalt erhalten? Dann informieren Sie sich jetzt über unsere Produkte:
Alvard, M. S. (2004) The ultimatum game, fairness, and cooperation among big game hunters. In Foundations of human sociality (pp. 413–435).
Andreoni, J., & Croson, R. (1998). Partners versus strangers: Random rematching in public goods experiments. Amsterdam: Elsevier.
Banerjee, B., & Peng, J. (2003). Adaptive policy gradient in multiagent learning. In International joint conference on autonomous agents and multiagent systems (pp. 686–692).
Banerjee, B., & Peng, J. (2004). The role of reactivity in multiagent learning. In International joint conference on autonomous agents and multiagent systems (pp. 538–545).
Banerjee, B., & Peng, J. (2005). Efficient learning of multi-step best response. In Proceedings of the fourth international joint conference on autonomous agents and multiagent systems (pp. 60–66).
Banerjee, D., & Sen, S. (2007). Reaching pareto optimality in Prisoner’s Dilemma using conditional joint action learning. In AAMAS’07 (pp. 91–108).
Bowling, M. (2004). Convergence and no-regret in multiagent learning. In International conference on neural information processing systems (pp. 209–216).
Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172. CrossRef
Chakraborty, D., & Stone, P. (2014). Multiagent learning in the presence of memory-bounded agents. Autonomous Agents and Multi-agent Systems, 28(2), 182–213. CrossRef
Coddington, E. A., & Levinson, N. (1955). Theory of ordinary differential equations. New York: McGraw-Hill. MATH
Conitzer, V., & Sandholm, T. (2007). Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 67(1–2), 23–43. CrossRef
Crandall, J. W. (2013). Just add pepper: Extending learning algorithms for repeated matrix games to repeated Markov games. In International conference on autonomous agents and multiagent systems (pp. 399–406).
Foerster, J. N., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., & Mordatch, I. (2017). Learning with opponent-learning awareness. CoRR arXiv:1709.04326.
Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A.G., et al. (2018). Inequity aversion improves cooperation in intertemporal social dilemmas. In Advances in neural information processing systems (pp. 3330–3340).
Lauer, M., & Rienmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In ICML’00 (pp. 535–542).
Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th international conference on machine learning, (pp. 322–328).
Littman, M. L. (2001). Friend-or-foe q-learning in general-sum games. In ICML (Vol. 1, pp. 322–328).
Matignon, L., Laurent, G. J., & Le Fort-Piat, N. (2012). Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. The Knowledge Engineering Review, 27(01), 1–31. CrossRef
Peysakhovich, A., & Lerer, A. (2017). Prosocial learning agents solve generalized stag hunts better than selfish ones. CoRR arXiv:1709.02865.
Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In IJCAI (Vol. 5, pp. 817–822).
Rodrigues Gomes, E., & Kowalczyk, R. (2009). Dynamic analysis of multiagent q-learning with \(\varepsilon \)-greedy exploration. In Proceedings of the 26th annual international conference on machine learning (pp. 369–376). ACM.
Shivshankar, S., & Jamalipour, A. (2015). An evolutionary game theory-based approach to cooperation in vanets under different network conditions. IEEE Transactions on Vehicular Technology, 64(5), 2015–2022. CrossRef
Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In Proceedings of the sixteenth conference on uncertainty in artificial intelligence (pp. 541–548). Morgan Kaufmann.
Tuyls, K., Hoen, P. J., & Vanschoenwinkel, B. (2006). An evolutionary dynamical analysis of multi-agent learning in iterated games. Autonomous Agents and Multi-agent Systems, 12(1), 115–153. CrossRef
Tuyls, K., Verbeeck, K., & Lenaerts, T. (2003). A selection-mutation model for q-learning in multi-agent systems. In Proceedings of the second international joint conference on autonomous agents and multiagent systems (pp. 693–700). ACM.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. Robotics & Autonomous Systems, 15(4), 233–235.
Watkins, C. J. C. H., & Dayan, P. D. (1992). Q-learning. Machine Learning, 8, 279–292. MATH
Wei, G., Zhu, P., Vasilakos, A. V., & Mao, Y. (2013). Cooperation dynamics on collaborative social networks of heterogeneous population. IEEE Journal on Selected Areas in Communications, 31(6), 1135–1146. CrossRef
Zhang, C., & Lesser, V. R. (2010). Multi-agent learning with policy prediction. In Proceedings of the twenty-fourth AAAI conference on artificial intelligence (pp. 927–934).
Zhang, Z., Zhao, D., Gao, J., Wang, D., & Dai, Y. (2017). Fmrq—A multiagent reinforcement learning algorithm for fully cooperative tasks. IEEE Transactions on Cybernetics, 47(6), 1367–1379. CrossRef
Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In ICML (pp. 928–936).
- SA-IGA: a multiagent reinforcement learning method towards socially optimal outcomes
- Springer US
Neuer Inhalt/© ITandMEDIA