Abstract
We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Albus, J.S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Dynamic Systems, Measurement and Control, 97, 220–227.
Asada, M., Uchibe, E., Noda, S., Tawaratsumida, S., & Hosoda, K. (1994). A vision-based reinforcement learning for coordination of soccer playing behaviors. Proceedings of AAAI-94 Workshop on AI and A-life and Entertainment (pp. 16–21).
Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning (Technical Report CMU-CS-94-163), Carnegie Mellon University, Pittsburgh.
Baluja, S., & Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 38–46). San Francisco, CA: Morgan Kaufmann Publishers.
Bertsekas, D.P., & Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
Cramer, N.L. (1985). A representation for the adaptive generation of simple sequential programs. In J. Grefenstette (Ed.), Proceedings of an International Conference on Genetic Algorithms and their Applications (pp. 183–187). Hillsdale, NJ: Lawrence Erlbaum Associates.
Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1017–1023). Cambridge, MA: MIT Press.
Dickmanns, D., Schmidhuber, J., & Winklhofer, A. (1987). Der genetische Algorithmus: Eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof. Radig, Technische Universität München.
Gallant, S.I. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
Koza, J.R. (1992). Genetic Programming—On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.
Levin, L.A. (1984). Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control, 61, 15–37.
Li, M., & Vitányi, P.M.B. (1993). An Introduction to Kolmogorov Complexity and its Applications. New York, NY: Springer-Verlag.
Lin, L.J. (1993). Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). San Francisco, CA: Morgan Kaufmann Publishers.
Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1997). Co-evolving soccer softbot team coordination with genetic programming. Proceedings of the First International Workshop on RoboCup, at the International Joint Conference on Artificial Intelligence (IJCAI-97).
Matsubara, H., Noda, I., & Hiraki, K. (1996). Learning of cooperative actions in multi-agent systems: A case study of pass play in soccer. In S. Sen (Ed.), Working Notes for the AAAI-96 Spring Symposium on Adaptation, Coevolution and Learning in Multi-agent Systems (pp. 63–67). Menlo Park, CA: AAAI Press.
Nadella, R., & Sen, S. (1996). Correlating internal parameters and external performance: Learning soccer agents. In G. Weiss (Ed.), Distributed Artificial Intelligence Meets Machine Learning. Learning in Multi-Agent Environments (pp. 137–150). Berlin: Springer-Verlag.
Nowlan, S.J., & Hinton, G.E. (1992). Simplifying neural networks by soft weight sharing. Neural Computation, 4, 173–193.
Peng, J., & Williams, R. (1996). Incremental multi-step Q-learning. Machine Learning, 22, 283–290.
Sahota, M. (1993). Real-time intelligent behaviour in dynamic environments: Soccer-playing robots. Master's thesis, University of British Columbia.
Sałustowicz, R.P., & Schmidhuber, J. (1997). Probabilistic incremental program evolution. Evolutionary Compu-tation, 5(2), 123–141.
Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997a). Evolving soccer strategies. Proceedings of the Fourth International Conference on Neural Information Processing (ICONIP'97) (pp. 502–506). Singapore: Springer-Verlag.
Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997b). On learning soccer strategies. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Proceedings of the Seventh International Conference on Artificial Neural Networks (ICANN'97), volume 1327 of Lecture Notes in Computer Science (pp. 769–774). Berlin Heidelberg: Springer-Verlag.
Schmidhuber, J. (1997a). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5), 857–873.
Schmidhuber, J. (1997b). A general method for incremental self-improvement and multi-agent learning in unrestricted environments. In X. Yao (Ed.), Evolutionary Computation: Theory and Applications. Singapore: Scientific Publ. Co., in press.
Schmidhuber, J., Zhao, J., & Schraudolph, N. (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (Eds.), Learning to Learn (pp. 293–309). Boston, MA: Kluwer.
Schmidhuber, J., Zhao, J., & Wiering, M. (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28, 105–130.
Solomonoff, R. (1986). An application of algorithmic probability to problems in artificial intelligence. In L.N. Kanal & J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence (pp. 473–491). Elsevier Science Publishers.
Stone, P., & Veloso, M. (1996a). Beating a defender in robotic soccer: Memory-based learning of a continuous function. In G. Tesauro, D.S. Touretzky, & T.K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 896–902). Cambridge, MA: MIT Press.
Stone, P., & Veloso, M. (1996b). A layered approach to learning client behaviors in the robocup soccer server. Applied Artificial Intelligence (AAI), 1998, to appear.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D.S. Touretzky, M.C. Mozer, & M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1038–1045). Cambridge, MA: MIT Press.
Sutton, R.S. (1997). Personal communication at the Seventh International Conference on Artificial Neural Networks (ICANN'97).
Tesauro, G. (1994). TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215–219.
Versino, C., & Gambardella, L.M. (1997). Learning real team solutions. In G. Weiss (Ed.), DAI Meets Machine Learning, volume 1221 of Lecture Notes in Artificial Intelligence (pp. 40–61). Berlin: Springer-Verlag.
Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.
Weiss, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G.Weiss & S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems, volume 1042 of Lecture Notes in Artificial Intelligence (pp. 1–21). Berlin Heidelberg: Springer-Verlag.
Widrow, B., & Hoff, M.E. (1960). Adaptive switching circuits. 1960 IRE WESCON Convention Record (Vol. 4, pp. 96–104). New York: IRE. Reprinted in Anderson and Rosenfeld (1988).
Wiering, M.A., & Schmidhuber, J. (1996). Solving POMDPs with Levin search and EIRA. In L. Saitta (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (pp. 534–542). San Francisco, CA: Morgan Kaufmann Publishers.
Wiering, M.A., & Schmidhuber, J. (1997). Fast online Q(λ) (Technical Report IDSIA-21-97), IDSIA,Lugano, Switzerland.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Sałustowicz, R.P., Wiering, M.A. & Schmidhuber, J. Learning Team Strategies: Soccer Case Studies. Machine Learning 33, 263–282 (1998). https://doi.org/10.1023/A:1007570708568
Issue Date:
DOI: https://doi.org/10.1023/A:1007570708568