Published in: Autonomous Agents and Multi-Agent Systems 2/2014

01.03.2014

Multiagent learning in the presence of memory-bounded agents

Authors: Doran Chakraborty, Peter Stone


Abstract

In recent years, great strides have been made towards creating autonomous agents that can learn via interaction with their environment. When considering just an individual agent, it is often appropriate to model the world as being stationary, meaning that the same action from the same state will always yield the same (possibly stochastic) effects. However, in the presence of other independent agents, the environment is not stationary: an action’s effects may depend on the actions of the other agents. This non-stationarity poses the primary challenge of multiagent learning and is the main reason it is best considered separately from single-agent learning. The multiagent learning problem is often studied in the stylized settings provided by repeated matrix games. The goal of this article is to introduce a novel multiagent learning algorithm for such a setting, called Convergence with Model Learning and Safety (or CMLeS), that achieves a set of objectives no previous algorithm has achieved. Specifically, CMLeS is the first multiagent learning algorithm to achieve all three of the following: (1) it converges to following a Nash equilibrium joint-policy in self-play; (2) it achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper bounded by a known value; and (3) it ensures an individual return very close to its security value when interacting with any other set of agents. Our presentation of CMLeS is backed by a rigorous theoretical analysis, including an analysis of sample complexity wherever applicable.
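To make objective (2) more concrete, the following minimal Python sketch illustrates the general idea of modeling a memory-bounded opponent in a repeated matrix game: estimate the opponent's action distribution conditioned on the last K joint actions and respond greedily to that estimate. This is not the authors' CMLeS algorithm; the payoff matrix, the Tit-for-Tat opponent, the memory bound K, and the exploration rate are illustrative assumptions, and CMLeS additionally plans for long-run return on the induced MDP and covers the self-play and safety cases, which this sketch omits.

```python
# Illustrative sketch only (not the CMLeS algorithm from the paper): learn a
# model of a memory-bounded opponent in a repeated 2x2 matrix game and play a
# greedy response to that model. The payoff matrix, the Tit-for-Tat opponent,
# K, and the exploration rate are all hypothetical choices.

import random
from collections import defaultdict

PAYOFF = [[3, 0],   # learner's payoff: PAYOFF[my_action][opp_action]
          [5, 1]]   # (a Prisoner's-Dilemma-like row player)
ACTIONS = [0, 1]
K = 2               # assumed known upper bound on the opponent's memory size
EXPLORE = 0.1       # exploration probability while the model is still rough
T = 5000            # number of repeated-game rounds

def opponent(history):
    """Example memory-1 opponent: repeats the learner's previous action."""
    return history[-1][0] if history else 0

def context(history):
    """The last K joint actions; under the memory-bound assumption this
    determines the opponent's action distribution."""
    return tuple(history[-K:])

counts = defaultdict(lambda: [1, 1])   # Laplace-smoothed opponent-action counts per context
history = []                           # list of (my_action, opp_action) pairs

def greedy_action(ctx):
    """Myopic best response to the estimated opponent distribution in ctx.
    (The paper instead plans for long-run return on the induced MDP.)"""
    c = counts[ctx]
    probs = [c[o] / sum(c) for o in ACTIONS]
    expected = [sum(probs[o] * PAYOFF[a][o] for o in ACTIONS) for a in ACTIONS]
    return max(ACTIONS, key=lambda a: expected[a])

total = 0.0
for _ in range(T):
    ctx = context(history)
    my = random.choice(ACTIONS) if random.random() < EXPLORE else greedy_action(ctx)
    opp = opponent(history)
    counts[ctx][opp] += 1          # update the opponent model for this context
    total += PAYOFF[my][opp]
    history.append((my, opp))

print("average per-round payoff:", total / T)
```

Because the response above is myopic, it ends up defecting against Tit-for-Tat even though cooperating would yield a higher average return; the sketch is meant only to illustrate the opponent-modeling step (estimating behavior per length-K context), not the planning or the safety fallback that CMLeS provides.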


Footnotes
1
A very common assumption in the RL literature when dealing with MDPs in the average-reward setting [9, 24, 28].
 
2
\(||\cdot ||_\infty \) denotes the max norm: \(||x||_\infty = \max_i |x_i|\).
 
3
Though we agree that making that assumption would simplify the algorithmic structure of CMLeS.
 
References
1. Airiau, S., Saha, S., & Sen, S. (2007). Evolutionary tournament-based comparison of learning and non-learning algorithms for iterated games. Journal of Artificial Societies and Social Simulation, 10, 1–12.
2.
3. Banerjee, B., & Peng, J. (2004). Performance bounded reinforcement learning in strategic interactions. In D. L. McGuinness & G. Ferguson (Eds.), AAAI’04: Proceedings of the 19th National Conference on Artificial Intelligence (pp. 2–7). Menlo Park, CA: AAAI Press/The MIT Press.
4. Banerjee, B., Sen, S., & Peng, J. (2001). Fast concurrent reinforcement learners. In B. Nebel (Ed.), Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (pp. 825–830). San Francisco, CA: Morgan Kaufmann.
5. Bouzy, B., & Metivier, M. (2010). Multi-agent learning experiments on repeated matrix games. In J. Furnkranz & T. Joachims (Eds.), Proceedings of the Twenty-Seventh International Conference on Machine Learning. Haifa: ICML.
6. Bowling, M. (2005). Convergence and no-regret in multiagent learning. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), NIPS’05: Advances in Neural Information Processing Systems (pp. 209–216). Cambridge, MA: MIT Press.
7. Bowling, M., & Veloso, M. (2001a). Convergence of gradient dynamics with a variable learning rate. Proceedings of the 18th International Conference on Machine Learning (pp. 27–34). San Francisco, CA: Morgan Kaufmann.
8. Bowling, M., & Veloso, M. (2001b). Rational and convergent learning in stochastic games. In B. Nebel (Ed.), International Joint Conference on Artificial Intelligence (pp. 1021–1026). San Francisco, CA: Morgan Kaufmann.
9. Brafman, R. I., & Tennenholtz, M. (2003). R-max—a general polynomial time algorithm for near-optimal reinforcement learning. Menlo Park, CA: MIT Press.
10. Brown, G. (1951). Iterative solution to games by fictitious play. In T. C. Koopmans (Ed.), Activity analysis of production and allocation (pp. 374–376). New York, NY: Wiley.
11. Chakraborty, D., & Stone, P. (2008). Online multiagent learning against memory bounded adversaries. European Conference on Machine Learning (pp. 211–226). Antwerp, Belgium.
12. Chakraborty, D., & Stone, P. (2010). Convergence, targeted optimality and safety in multiagent learning. In J. Furnkranz & T. Joachims (Eds.), Proceedings of the Twenty-Seventh International Conference on Machine Learning. Haifa: ICML.
13. Chen, X., & Deng, X. (2006). Settling the complexity of two-player Nash equilibrium. Proceedings of the 47th Foundations of Computer Science (FOCS) (pp. 261–272). Berkeley, CA.
14. Chevaleyre, Y., Dunne, P. E., Endriss, U., Lang, J., Lemaître, M., Maudet, N., et al. (2006). Issues in multiagent resource allocation. Informatica, 30, 3–31.
15. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In J. Mostow & C. Rich (Eds.), Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 746–752). Menlo Park: AAAI Press.
16. Conitzer, V., & Sandholm, T. (2006). AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 67, 23–43.
17. Foster, D. P., & Vohra, R. V. (1993). A randomization rule for selecting forecasts. Institute for Operations Research and the Management Sciences (INFORMS), 41, 704–709.
18. Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1089.
19. Fudenberg, D., & Levine, D. K. (1999). The theory of learning in games (1st ed.). Cambridge, MA: MIT Press.
20. Hannan, J. (1957). Approximation to Bayes risk in repeated plays. Contributions to the theory of games. Princeton, NJ: Princeton University Press.
21. Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68, 1127–1150.
22. Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. Proceedings of the 15th International Conference on Machine Learning (pp. 242–250). San Francisco, CA: Morgan Kaufmann.
23. Kaisers, M., & Tuyls, K. (2010). Frequency adjusted multi-agent Q-learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1 (pp. 309–316). Richland, SC.
24. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings of the 15th International Conference on Machine Learning (pp. 260–268). San Francisco, CA: Morgan Kaufmann.
25. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In W. W. Cohen & H. Hirsh (Eds.), Proceedings of the Eleventh International Conference on Machine Learning (pp. 157–163). San Francisco, CA: Morgan Kaufmann.
26. Littman, M. L., & Stone, P. (2005). A polynomial-time Nash equilibrium algorithm for repeated games (pp. 55–66). Amsterdam: Elsevier.
27. Littman, M. L., & Szepesvari, C. (1996). A generalized reinforcement-learning model: Convergence and applications. In L. Saitta (Ed.), Proceedings of the 13th International Conference on Machine Learning (pp. 310–318). San Francisco, CA: Morgan Kaufmann.
28. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1–3), 159–195.
29.
30. Osborne, M. J., & Rubinstein, A. (1994). A course in game theory. Cambridge, MA: The MIT Press.
31. Pardoe, D., Chakraborty, D., & Stone, P. (2010). TacTex09: A champion bidding agent for ad auctions. In van der Hoek, Kaminka, Lesperance, Luck, & Sen (Eds.), Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010). Dunbeath: International Foundation for Autonomous Agents and Multiagent Systems.
32. Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (pp. 817–822). Edinburgh, Scotland.
33. Powers, R., Shoham, Y., & Vu, T. (2007). A general criterion and an algorithmic framework for learning in multi-agent systems. Machine Learning, 67, 45–76.
34. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York, NY: Wiley.
35. Sela, A., & Herreiner, D. K. (1997). Fictitious play in coordination games. Discussion paper series B. University of Bonn, Germany.
36. Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In C. Boutilier & M. Goldszmidt (Eds.), Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 541–548). San Francisco, CA: Morgan Kaufmann.
37. Southey, F., Hoehn, B., & Holte, R. (2008). Effective short-term opponent exploitation in simplified poker. Machine Learning, 74(2), 159–189.
38. Stone, P., Dresner, K., Fidelman, P., Kohl, N., Kuhlmann, G., Sridharan, M., et al. (2005). The UT Austin Villa 2005 RoboCup four-legged team: Technical report. The University of Texas at Austin.
39. Stone, P., & Littman, M. L. (2001). Implicit negotiation in repeated games. In J.-J. Meyer & M. Tambe (Eds.), Pre-Proceedings of the Eighth International Workshop on Agent Theories, Architectures, and Languages (ATAL-2001) (pp. 96–105). Heidelberg: Springer.
40. Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345–383.
41. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.
42. Sykulski, A. M., Chapman, A. C., de Cote, E. M., & Jennings, N. R. (2010). EA2: The winning strategy for the inaugural lemonade stand game tournament. In H. Coelho, R. Studer, & M. Wooldridge (Eds.), ECAI 2010: 19th European Conference on Artificial Intelligence (pp. 209–214). Amsterdam: IOS Press.
43. Tuyls, K., & Parsons, S. (2007). What evolutionary game theory tells us about multiagent learning. Artificial Intelligence, 171(7), 406–416.
44. Van Dyke Parunak, H. (1999). Industrial and practical applications of DAI (pp. 377–421). Cambridge, MA: The MIT Press.
45. Watkins, C. J. C. H., & Dayan, P. D. (1992). Q-learning. Machine Learning, 3, 279–292.
46. Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1, 67–82.
47. Wooldridge, M. J. (2001). Introduction to multiagent systems. New York, NY: Wiley.
Metadata
Title
Multiagent learning in the presence of memory-bounded agents
Authors
Doran Chakraborty
Peter Stone
Publication date
01.03.2014
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 2/2014
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-013-9222-4
