Published in: Autonomous Agents and Multi-Agent Systems 4/2015

01.07.2015

Introducing decision entrustment mechanism into repeated bilateral agent interactions to achieve social optimality

Authors: Jianye Hao, Ho-fung Leung

Abstract

During multiagent interactions, robust strategies are needed to help agents coordinate their actions on efficient outcomes. A large body of previous work focuses on designing strategies that aim at Nash equilibrium under self-play, which can be extremely inefficient in many situations, such as the prisoner's dilemma game. To address this, we propose an alternative solution concept, socially optimal outcome sustained by Nash equilibrium (SOSNE), which refers to those outcomes that maximize the sum of all agents' payoffs among all outcomes corresponding to a Nash equilibrium payoff profile of the infinitely repeated game. Adopting the SOSNE solution concept guarantees that system-level performance is maximized provided that no agent has to sacrifice its individual profit. On the other hand, apart from performing well under self-play, a good strategy should also respond well against opponents adopting different strategies, as far as possible. To this end, we consider a particular class of rational opponents and aim at influencing them to coordinate on SOSNE outcomes. We propose a novel learning strategy, TaFSO, which combines the characteristics of both teacher and follower strategies to effectively influence the opponent's behavior towards SOSNE outcomes by exploiting its limitations. Extensive simulations show that TaFSO achieves better performance, in terms of average payoff obtained, than previous work both under self-play and against the same class of rational opponents.
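To make the SOSNE definition concrete, here is a minimal sketch of how the concept plays out in the standard prisoner's dilemma. It is our illustration rather than the authors' algorithm: the payoff matrix, the pure-action minmax computation, and all names in the code are assumptions made for this example. An outcome is treated as enforceable by a Nash equilibrium of the infinitely repeated game (under the limit-of-means criterion, via the folk theorem) whenever it gives each player at least her minmax value; SOSNE then selects, among the enforceable outcomes, those with maximal total payoff.

```python
# Illustrative sketch only (not the authors' implementation): computing the
# SOSNE outcomes of the prisoner's dilemma over pure strategy profiles.
import itertools

ACTIONS = ["C", "D"]  # cooperate, defect
# Payoff bimatrix: (row player's payoff, column player's payoff).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def minmax(player):
    """Pure-action minmax (security) value of `player` (0 = row, 1 = column)."""
    values = []
    for opp_action in ACTIONS:
        # Against each fixed opponent action, the player best-responds ...
        best = max(
            PAYOFFS[(own, opp_action) if player == 0 else (opp_action, own)][player]
            for own in ACTIONS
        )
        values.append(best)
    return min(values)  # ... and the opponent picks the most harmful action.

security = (minmax(0), minmax(1))  # (1, 1) in the prisoner's dilemma

# Outcomes that weakly dominate the minmax point can be sustained by a Nash
# equilibrium of the infinitely repeated game (folk theorem, limit of means).
enforceable = [o for o in itertools.product(ACTIONS, repeat=2)
               if all(PAYOFFS[o][i] >= security[i] for i in (0, 1))]

# SOSNE: enforceable outcomes that maximize the sum of both players' payoffs.
best_welfare = max(sum(PAYOFFS[o]) for o in enforceable)
sosne = [o for o in enforceable if sum(PAYOFFS[o]) == best_welfare]
print("SOSNE outcomes:", sosne)  # [('C', 'C')]
```

For this payoff matrix both (C, C) and (D, D) are enforceable, but only mutual cooperation maximizes social welfare, so (C, C) is the unique SOSNE outcome; the one-shot Nash equilibrium (D, D), on which self-play learners typically converge, is strictly worse for both players.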


Footnotes
1. One agent is randomly chosen to reveal its action in case both agents choose to reveal their actions simultaneously.
2. Note that in general an outcome is a profile of mixed strategies of all agents [19], and a profile of pure strategies is a special case. In this paper, an outcome denotes a pure strategy profile unless otherwise mentioned.
3. WoLF-PHC is short for Win or Learn Fast—policy hill climbing.
4. A preference relation \(\succsim_i\) for player \(i\) is defined under the limit-of-means criterion if it satisfies the following property: \(O_1 \succsim_i O_2\) if and only if \(\lim_{t \rightarrow \infty} \frac{1}{t}\sum_{k=1}^{t}(p_1^k - p_2^k) \ge 0\), where \(O_1 = (a_{i,t}^1, a_{j,t}^1)_{t=1}^{\infty}\) and \(O_2 = (a_{i,t}^2, a_{j,t}^2)_{t=1}^{\infty}\) are outcomes of the infinitely repeated game, and \(p_1^k\) and \(p_2^k\) are the corresponding payoffs player \(i\) receives in round \(k\) of outcomes \(O_1\) and \(O_2\) respectively.
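As a hedged illustration, with payoffs chosen purely for this example (not taken from the paper): if \(O_1\) yields player \(i\) the alternating payoff sequence \(5, 0, 5, 0, \ldots\) while \(O_2\) yields \(3\) in every round, then \(\lim_{t \rightarrow \infty} \frac{1}{t}\sum_{k=1}^{t}(p_1^k - p_2^k) = 2.5 - 3 < 0\), so player \(i\) prefers \(O_2\) to \(O_1\) under this criterion.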
 
5. Similar results can be observed when the opponent adopts other types of best-response strategies (Q-learning and FP) and are omitted here.
6. Note that theoretically, the opponents should be able to understand the punishment signal given enough exploration. In practice, due to the exploration schedule, the opponents do not explore enough to understand the punishment and thus settle on a sub-optimal strategy.
7. Note that only the payoffs obtained after 500 rounds are counted here, since at the beginning the agents may achieve very low payoffs due to initial exploration. The results are averaged over 50 runs. For the tricky game, the payoffs of both TaFSO and SPaM learners are averaged over the cases when they play as the row or column players.
References
1. Airiau, S., & Sen, S. (2006). Learning to commit in repeated games. In AAMAS'06 (pp. 1263–1265).
2. Airiau, S., & Sen, S. (2007). Evolutionary tournament-based comparison of learning and non-learning algorithms for iterated games. Journal of Artificial Societies and Social Simulation, 10, 11.
3. Banerjee, D., & Sen, S. (2007). Reaching Pareto optimality in prisoner's dilemma using conditional joint action learning. In AAMAS'07 (pp. 91–108).
4. Bowling, M. H., & Veloso, M. M. (2003). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215–250.
5. Brams, S. J. (1994). Theory of moves. Cambridge: Cambridge University Press.
6. Chakraborty, D., & Stone, P. (2013). Multiagent learning in the presence of memory-bounded agents. Autonomous Agents and Multi-Agent Systems, 28, 182–213.
7. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In AAAI'98 (pp. 746–752).
8. Crandall, J. W., & Goodrich, M. A. (2005). Learning to teach and follow in repeated games. In AAAI Workshop on Multiagent Learning.
9. Fudenberg, D., & Levine, D. K. (1998). The theory of learning in games. Cambridge, MA: MIT Press.
10. Hu, J., & Wellman, M. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 242–250).
11. Jafari, A., Greenwald, A., Gondek, D., & Ercal, G. (2001). On no-regret learning, fictitious play, and Nash equilibrium. In ICML'01 (pp. 226–233).
12. Jong, S., Tuyls, K., & Verbeeck, K. (2008). Artificial agents learning human fairness. In AAMAS'08 (pp. 863–870).
13. Lauer, M., & Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In ICML'00 (pp. 535–542).
14. Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning (pp. 322–328).
15. Littman, M. L., & Stone, P. (2001). Leading best-response strategies in repeated games. In IJCAI Workshop on Economic Agents, Models, and Mechanisms.
16. Littman, M. L., & Stone, P. (2005). A polynomial-time Nash equilibrium algorithm for repeated games. Decision Support Systems, 39, 55–66.
17. Moriyama, K. (2008). Learning-rate adjusting Q-learning for prisoner's dilemma games. In WI-IAT'08 (pp. 322–325).
18. Oh, J., & Smith, S. F. (2008). A few good agents: Multi-agent social learning. In AAMAS'08 (pp. 339–346).
19. Osborne, M. J., & Rubinstein, A. (1994). A course in game theory. Cambridge, MA: MIT Press.
20. Powers, R., & Shoham, Y. (2004). New criteria and a new algorithm for learning in multi-agent systems. In NIPS'04 (pp. 1089–1096).
21. Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. In IJCAI'05 (pp. 817–822).
22. Sen, S., Airiau, S., & Mukherjee, R. (2003). Towards a Pareto-optimal solution in general-sum games. In AAMAS'03 (pp. 153–160).
23. Shoham, Y., Powers, R., & Grenager, T. (2007). If multi-agent learning is the answer, what is the question? Artificial Intelligence, 171, 365–377.
24. Stimpson, J. L., Goodrich, M. A., & Walters, L. C. (2001). Satisficing and learning cooperation in the prisoner's dilemma. In IJCAI'01 (pp. 535–540).
25. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
Metadata
Title
Introducing decision entrustment mechanism into repeated bilateral agent interactions to achieve social optimality
Authors
Jianye Hao
Ho-fung Leung
Publication date
01.07.2015
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 4/2015
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-014-9265-1
