Learning Team Strategies: Soccer Case Studies

Sałustowicz, Rafał P.; Wiering, Marco A.; Schmidhuber, Jürgen

doi:10.1023/A:1007570708568

Learning Team Strategies: Soccer Case Studies

Published: November 1998

Volume 33, pages 263–282, (1998)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

Learning Team Strategies: Soccer Case Studies

Download PDF

Rafał P. Sałustowicz¹,
Marco A. Wiering¹ &
Jürgen Schmidhuber¹

925 Accesses
26 Citations
Explore all metrics

Abstract

We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.

Avoid common mistakes on your manuscript.

References

Albus, J.S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Dynamic Systems, Measurement and Control, 97, 220–227.
Google Scholar
Asada, M., Uchibe, E., Noda, S., Tawaratsumida, S., & Hosoda, K. (1994). A vision-based reinforcement learning for coordination of soccer playing behaviors. Proceedings of AAAI-94 Workshop on AI and A-life and Entertainment (pp. 16–21).
Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning (Technical Report CMU-CS-94-163), Carnegie Mellon University, Pittsburgh.
Google Scholar
Baluja, S., & Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 38–46). San Francisco, CA: Morgan Kaufmann Publishers.
Google Scholar
Bertsekas, D.P., & Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
Google Scholar
Cramer, N.L. (1985). A representation for the adaptive generation of simple sequential programs. In J. Grefenstette (Ed.), Proceedings of an International Conference on Genetic Algorithms and their Applications (pp. 183–187). Hillsdale, NJ: Lawrence Erlbaum Associates.
Google Scholar
Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1017–1023). Cambridge, MA: MIT Press.
Google Scholar
Dickmanns, D., Schmidhuber, J., & Winklhofer, A. (1987). Der genetische Algorithmus: Eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof. Radig, Technische Universität München.
Gallant, S.I. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
Google Scholar
Koza, J.R. (1992). Genetic Programming—On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
Google Scholar
Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.
Google Scholar
Levin, L.A. (1984). Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control, 61, 15–37.
Google Scholar
Li, M., & Vitányi, P.M.B. (1993). An Introduction to Kolmogorov Complexity and its Applications. New York, NY: Springer-Verlag.
Google Scholar
Lin, L.J. (1993). Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
Google Scholar
Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). San Francisco, CA: Morgan Kaufmann Publishers.
Google Scholar
Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1997). Co-evolving soccer softbot team coordination with genetic programming. Proceedings of the First International Workshop on RoboCup, at the International Joint Conference on Artificial Intelligence (IJCAI-97).
Matsubara, H., Noda, I., & Hiraki, K. (1996). Learning of cooperative actions in multi-agent systems: A case study of pass play in soccer. In S. Sen (Ed.), Working Notes for the AAAI-96 Spring Symposium on Adaptation, Coevolution and Learning in Multi-agent Systems (pp. 63–67). Menlo Park, CA: AAAI Press.
Google Scholar
Nadella, R., & Sen, S. (1996). Correlating internal parameters and external performance: Learning soccer agents. In G. Weiss (Ed.), Distributed Artificial Intelligence Meets Machine Learning. Learning in Multi-Agent Environments (pp. 137–150). Berlin: Springer-Verlag.
Google Scholar
Nowlan, S.J., & Hinton, G.E. (1992). Simplifying neural networks by soft weight sharing. Neural Computation, 4, 173–193.
Google Scholar
Peng, J., & Williams, R. (1996). Incremental multi-step Q-learning. Machine Learning, 22, 283–290.
Google Scholar
Sahota, M. (1993). Real-time intelligent behaviour in dynamic environments: Soccer-playing robots. Master's thesis, University of British Columbia.
Sałustowicz, R.P., & Schmidhuber, J. (1997). Probabilistic incremental program evolution. Evolutionary Compu-tation, 5(2), 123–141.
Google Scholar
Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997a). Evolving soccer strategies. Proceedings of the Fourth International Conference on Neural Information Processing (ICONIP'97) (pp. 502–506). Singapore: Springer-Verlag.
Google Scholar
Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997b). On learning soccer strategies. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Proceedings of the Seventh International Conference on Artificial Neural Networks (ICANN'97), volume 1327 of Lecture Notes in Computer Science (pp. 769–774). Berlin Heidelberg: Springer-Verlag.
Google Scholar
Schmidhuber, J. (1997a). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5), 857–873.
Google Scholar
Schmidhuber, J. (1997b). A general method for incremental self-improvement and multi-agent learning in unrestricted environments. In X. Yao (Ed.), Evolutionary Computation: Theory and Applications. Singapore: Scientific Publ. Co., in press.
Schmidhuber, J., Zhao, J., & Schraudolph, N. (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (Eds.), Learning to Learn (pp. 293–309). Boston, MA: Kluwer.
Google Scholar
Schmidhuber, J., Zhao, J., & Wiering, M. (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28, 105–130.
Google Scholar
Solomonoff, R. (1986). An application of algorithmic probability to problems in artificial intelligence. In L.N. Kanal & J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence (pp. 473–491). Elsevier Science Publishers.
Stone, P., & Veloso, M. (1996a). Beating a defender in robotic soccer: Memory-based learning of a continuous function. In G. Tesauro, D.S. Touretzky, & T.K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 896–902). Cambridge, MA: MIT Press.
Google Scholar
Stone, P., & Veloso, M. (1996b). A layered approach to learning client behaviors in the robocup soccer server. Applied Artificial Intelligence (AAI), 1998, to appear.
Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Google Scholar
Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D.S. Touretzky, M.C. Mozer, & M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1038–1045). Cambridge, MA: MIT Press.
Google Scholar
Sutton, R.S. (1997). Personal communication at the Seventh International Conference on Artificial Neural Networks (ICANN'97).
Tesauro, G. (1994). TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215–219.
Google Scholar
Versino, C., & Gambardella, L.M. (1997). Learning real team solutions. In G. Weiss (Ed.), DAI Meets Machine Learning, volume 1221 of Lecture Notes in Artificial Intelligence (pp. 40–61). Berlin: Springer-Verlag.
Google Scholar
Watkins, C. (1989). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.
Google Scholar
Weiss, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G.Weiss & S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems, volume 1042 of Lecture Notes in Artificial Intelligence (pp. 1–21). Berlin Heidelberg: Springer-Verlag.
Google Scholar
Widrow, B., & Hoff, M.E. (1960). Adaptive switching circuits. 1960 IRE WESCON Convention Record (Vol. 4, pp. 96–104). New York: IRE. Reprinted in Anderson and Rosenfeld (1988).
Google Scholar
Wiering, M.A., & Schmidhuber, J. (1996). Solving POMDPs with Levin search and EIRA. In L. Saitta (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (pp. 534–542). San Francisco, CA: Morgan Kaufmann Publishers.
Google Scholar
Wiering, M.A., & Schmidhuber, J. (1997). Fast online Q(λ) (Technical Report IDSIA-21-97), IDSIA,Lugano, Switzerland.
Google Scholar

Download references

Author information

Authors and Affiliations

IDSIA, Corso Elvezia 36, 6900, Lugano, Switzerland
Rafał P. Sałustowicz, Marco A. Wiering & Jürgen Schmidhuber

Authors

Rafał P. Sałustowicz
View author publications
You can also search for this author in PubMed Google Scholar
Marco A. Wiering
View author publications
You can also search for this author in PubMed Google Scholar
Jürgen Schmidhuber
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Sałustowicz, R.P., Wiering, M.A. & Schmidhuber, J. Learning Team Strategies: Soccer Case Studies. Machine Learning 33, 263–282 (1998). https://doi.org/10.1023/A:1007570708568

Download citation

Issue Date: November 1998
DOI: https://doi.org/10.1023/A:1007570708568

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Learning Team Strategies: Soccer Case Studies

Abstract

Article PDF

Similar content being viewed by others

On Diversity, Teaming, and Hierarchical Policies: Observations from the Keepaway Soccer Task

Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football

Training multi-agent teams from zero knowledge with the competitive coevolutionary team-based particle swarm optimiser

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Navigation

Learning Team Strategies: Soccer Case Studies

Abstract

Article PDF

Similar content being viewed by others

On Diversity, Teaming, and Hierarchical Policies: Observations from the Keepaway Soccer Task

Scaling multi-agent reinforcement learning to full 11 versus 11 simulated robotic football

Training multi-agent teams from zero knowledge with the competitive coevolutionary team-based particle swarm optimiser

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation