Published in: Autonomous Agents and Multi-Agent Systems 2/2014

01.03.2014

Multiagent learning in the presence of memory-bounded agents

Authors: Doran Chakraborty, Peter Stone


Abstract

In recent years, great strides have been made towards creating autonomous agents that can learn via interaction with their environment. When considering just an individual agent, it is often appropriate to model the world as being stationary, meaning that the same action from the same state will always yield the same (possibly stochastic) effects. However, in the presence of other independent agents, the environment is not stationary: an action’s effects may depend on the actions of the other agents. This non-stationarity poses the primary challenge of multiagent learning and is the main reason it is best considered separately from single-agent learning. The multiagent learning problem is often studied in the stylized settings provided by repeated matrix games. The goal of this article is to introduce a novel multiagent learning algorithm for such a setting, called Convergence with Model Learning and Safety (or CMLeS), that achieves a set of objectives no previous algorithm has achieved. Specifically, CMLeS is the first multiagent learning algorithm to achieve all three of the following: (1) it converges to following a Nash equilibrium joint-policy in self-play; (2) it achieves close to the best response when interacting with a set of memory-bounded agents whose memory size is upper bounded by a known value; and (3) it ensures an individual return very close to its security value when interacting with any other set of agents. Our presentation of CMLeS is backed by a rigorous theoretical analysis, including an analysis of sample complexity wherever applicable.
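To make objective (2) more concrete, the following minimal Python sketch illustrates the general idea of modeling a memory-bounded opponent in a repeated matrix game: estimate the opponent's action distribution conditioned on the last K joint actions and respond greedily to that estimate. This is not the authors' CMLeS algorithm; the payoff matrix, the Tit-for-Tat opponent, the memory bound K, and the exploration rate are illustrative assumptions, and CMLeS additionally plans for long-run return on the induced MDP and covers the self-play and safety cases, which this sketch omits.

```python
# Illustrative sketch only (not the CMLeS algorithm from the paper): learn a
# model of a memory-bounded opponent in a repeated 2x2 matrix game and play a
# greedy response to that model. The payoff matrix, the Tit-for-Tat opponent,
# K, and the exploration rate are all hypothetical choices.

import random
from collections import defaultdict

PAYOFF = [[3, 0],   # learner's payoff: PAYOFF[my_action][opp_action]
          [5, 1]]   # (a Prisoner's-Dilemma-like row player)
ACTIONS = [0, 1]
K = 2               # assumed known upper bound on the opponent's memory size
EXPLORE = 0.1       # exploration probability while the model is still rough
T = 5000            # number of repeated-game rounds

def opponent(history):
    """Example memory-1 opponent: repeats the learner's previous action."""
    return history[-1][0] if history else 0

def context(history):
    """The last K joint actions; under the memory-bound assumption this
    determines the opponent's action distribution."""
    return tuple(history[-K:])

counts = defaultdict(lambda: [1, 1])   # Laplace-smoothed opponent-action counts per context
history = []                           # list of (my_action, opp_action) pairs

def greedy_action(ctx):
    """Myopic best response to the estimated opponent distribution in ctx.
    (The paper instead plans for long-run return on the induced MDP.)"""
    c = counts[ctx]
    probs = [c[o] / sum(c) for o in ACTIONS]
    expected = [sum(probs[o] * PAYOFF[a][o] for o in ACTIONS) for a in ACTIONS]
    return max(ACTIONS, key=lambda a: expected[a])

total = 0.0
for _ in range(T):
    ctx = context(history)
    my = random.choice(ACTIONS) if random.random() < EXPLORE else greedy_action(ctx)
    opp = opponent(history)
    counts[ctx][opp] += 1          # update the opponent model for this context
    total += PAYOFF[my][opp]
    history.append((my, opp))

print("average per-round payoff:", total / T)
```

Because the response above is myopic, it ends up defecting against Tit-for-Tat even though cooperating would yield a higher average return; the sketch is meant only to illustrate the opponent-modeling step (estimating behavior per length-K context), not the planning or the safety fallback that CMLeS provides.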


Footnotes
1
A very common assumption in the RL literature when dealing with MDPs in the average-reward setting [9, 24, 28].
 
2
\(||\cdot ||_\infty \) denotes the max norm: \(||x||_\infty = \max_i |x_i|\).
 
3
Though we agree that making that assumption would simplify the algorithmic structure of CMLeS.
 
References
1. Airiau, S., Saha, S., & Sen, S. (2007). Evolutionary tournament-based comparison of learning and non-learning algorithms for iterated games. Journal of Artificial Societies and Social Simulation, 10, 1–12.
2.
3. Banerjee, B., & Peng, J. (2004). Performance bounded reinforcement learning in strategic interactions. In D. L. McGuinness & G. Ferguson (Eds.), AAAI’04: Proceedings of the 19th National Conference on Artificial Intelligence (pp. 2–7). Menlo Park, CA: AAAI Press/The MIT Press.
4. Banerjee, B., Sen, S., & Peng, J. (2001). Fast concurrent reinforcement learners. In B. Nebel (Ed.), Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (pp. 825–830). San Francisco, CA: Morgan Kaufmann.
5. Bouzy, B., & Metivier, M. (2010). Multi-agent learning experiments on repeated matrix games. In J. Furnkranz & T. Joachims (Eds.), Proceedings of the Twenty-Seventh International Conference on Machine Learning. Haifa: ICML.
6. Bowling, M. (2005). Convergence and no-regret in multiagent learning. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), NIPS’05: Advances in Neural Information Processing Systems (pp. 209–216). Cambridge, MA: MIT Press.
7. Bowling, M., & Veloso, M. (2001a). Convergence of gradient dynamics with a variable learning rate. Proceedings of the 18th International Conference on Machine Learning (pp. 27–34). San Francisco, CA: Morgan Kaufmann.
8. Bowling, M., & Veloso, M. (2001b). Rational and convergent learning in stochastic games. In B. Nebel (Ed.), International Joint Conference on Artificial Intelligence (pp. 1021–1026). San Francisco, CA: Morgan Kaufmann.
9. Brafman, R. I., & Tennenholtz, M. (2003). R-max—a general polynomial time algorithm for near-optimal reinforcement learning. Menlo Park, CA: MIT Press.
10. Brown, G. (1951). Iterative solution to games by fictitious play. In T. C. Koopmans (Ed.), Activity analysis of production and allocation (pp. 374–376). New York, NY: Wiley.
11. Chakraborty, D., & Stone, P. (2008). Online multiagent learning against memory bounded adversaries. European Conference on Machine Learning (pp. 211–226). Antwerp, Belgium.
12. Chakraborty, D., & Stone, P. (2010). Convergence, targeted optimality and safety in multiagent learning. In J. Furnkranz & T. Joachims (Eds.), Proceedings of the Twenty-Seventh International Conference on Machine Learning. Haifa: ICML.
13. Chen, X., & Deng, X. (2006). Settling the complexity of two-player Nash equilibrium. Proceedings of the 47th Foundations of Computer Science (FOCS) (pp. 261–272). Berkeley, CA.
14. Chevaleyre, Y., Dunne, P. E., Endriss, U., Lang, J., Lemaître, M., Maudet, N., et al. (2006). Issues in multiagent resource allocation. Informatica, 30, 3–31.
15. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In J. Mostow & C. Rich (Eds.), Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 746–752). Menlo Park: AAAI Press.
16. Conitzer, V., & Sandholm, T. (2006). AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 67, 23–43.
17. Foster, D. P., & Vohra, R. V. (1993). A randomization rule for selecting forecasts. Institute for Operations Research and the Management Sciences (INFORMS), 41, 704–709.
18. Fudenberg, D., & Levine, D. K. (1995). Consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1089.
19. Fudenberg, D., & Levine, D. K. (1999). The theory of learning in games (1st ed.). Cambridge, MA: MIT Press.
20. Hannan, J. (1957). Approximation to Bayes risk in repeated plays. Contributions to the theory of games. Princeton, NJ: Princeton University Press.
21. Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68, 1127–1150.
22. Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. Proceedings of the 15th International Conference on Machine Learning (pp. 242–250). San Francisco, CA: Morgan Kaufmann.
23. Kaisers, M., & Tuyls, K. (2010). Frequency adjusted multi-agent Q-learning. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1 (pp. 309–316). Richland, SC.
24. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings of the 15th International Conference on Machine Learning (pp. 260–268). San Francisco, CA: Morgan Kaufmann.
25. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In W. W. Cohen & H. Hirsh (Eds.), Proceedings of the Eleventh International Conference on Machine Learning (pp. 157–163). San Francisco, CA: Morgan Kaufmann.
26. Littman, M. L., & Stone, P. (2005). A polynomial-time Nash equilibrium algorithm for repeated games (pp. 55–66). Amsterdam: Elsevier.
27. Littman, M. L., & Szepesvari, C. (1996). A generalized reinforcement-learning model: Convergence and applications. In L. Saitta (Ed.), Proceedings of the 13th International Conference on Machine Learning (pp. 310–318). San Francisco, CA: Morgan Kaufmann.
28. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1–3), 159–195.
29.
30. Osborne, M. J., & Rubinstein, A. (1994). A course in game theory. Cambridge, MA: The MIT Press.
31. Pardoe, D., Chakraborty, D., & Stone, P. (2010). TacTex09: A champion bidding agent for ad auctions. In van der Hoek, Kaminka, Lesperance, Luck, & Sen (Eds.), Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010). Dunbeath: International Foundation for Autonomous Agents and Multiagent Systems.
32. Powers, R., & Shoham, Y. (2005). Learning against opponents with bounded memory. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (pp. 817–822). Edinburgh, Scotland.
33. Powers, R., Shoham, Y., & Vu, T. (2007). A general criterion and an algorithmic framework for learning in multi-agent systems. Machine Learning, 67, 45–76.
34. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York, NY: Wiley.
35. Sela, A., & Herreiner, D. K. (1997). Fictitious play in coordination games. Discussion paper series B. University of Bonn, Germany.
36. Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. In C. Boutilier & M. Goldszmidt (Eds.), Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (pp. 541–548). San Francisco, CA: Morgan Kaufmann.
37. Southey, F., Hoehn, B., & Holte, R. (2008). Effective short-term opponent exploitation in simplified poker. Machine Learning, 74(2), 159–189.
38. Stone, P., Dresner, K., Fidelman, P., Kohl, N., Kuhlmann, G., Sridharan, M., et al. (2005). The UT Austin Villa 2005 RoboCup four-legged team: Technical report. The University of Texas at Austin.
39. Stone, P., & Littman, M. L. (2001). Implicit negotiation in repeated games. In J.-J. Meyer & M. Tambe (Eds.), Pre-Proceedings of the Eighth International Workshop on Agent Theories, Architectures, and Languages (ATAL-2001) (pp. 96–105). Heidelberg: Springer.
40. Stone, P., & Veloso, M. (2000). Multiagent systems: A survey from a machine learning perspective. Autonomous Robots, 8(3), 345–383.
41. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning. Cambridge, MA: MIT Press.
42. Sykulski, A. M., Chapman, A. C., de Cote, E. M., & Jennings, N. R. (2010). EA2: The winning strategy for the inaugural lemonade stand game tournament. In H. Coelho, R. Studer, & M. Wooldridge (Eds.), ECAI 2010: 19th European Conference on Artificial Intelligence (pp. 209–214). Amsterdam: IOS Press.
43. Tuyls, K., & Parsons, S. (2007). What evolutionary game theory tells us about multiagent learning. Artificial Intelligence, 171(7), 406–416.
44. Van Dyke Parunak, H. (1999). Industrial and practical applications of DAI (pp. 377–421). Cambridge, MA: The MIT Press.
45. Watkins, C. J. C. H., & Dayan, P. D. (1992). Q-learning. Machine Learning, 3, 279–292.
46. Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1, 67–82.
47. Wooldridge, M. J. (2001). Introduction to multiagent systems. New York, NY: Wiley.
Metadata
Title
Multiagent learning in the presence of memory-bounded agents
Authors
Doran Chakraborty
Peter Stone
Publication date
01.03.2014
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 2/2014
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-013-9222-4
