Abstract
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play, rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
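To make the idea concrete, here is a minimal sketch of a TDLEAF(λ) weight update over a single game, assuming a linear evaluation function. The names (`tdleaf_update`, `leaf_features`) and the parameter values are illustrative assumptions, not KnightCap's actual implementation; the essential point is that the temporal differences are taken between evaluations of the principal-variation leaves returned by game-tree search, rather than between the root positions themselves as in plain TD(λ).

```python
import numpy as np

def tdleaf_update(w, leaf_features, alpha=1e-3, lam=0.7):
    """One TDLEAF(lambda) weight update over a single game (sketch).

    Assumes a linear evaluation J(x, w) = w . phi(x), where
    leaf_features[t] is the feature vector phi of the leaf of the
    principal variation found by minimax search from position t.
    All names here are illustrative, not from the paper's code.
    """
    # Evaluations of the principal-variation leaves for each position.
    evals = np.array([w @ phi for phi in leaf_features])
    # Temporal differences between successive PV-leaf evaluations:
    # d_t = J(x_{t+1}^l, w) - J(x_t^l, w).
    d = evals[1:] - evals[:-1]
    N = len(d)
    for t in range(N):
        # lambda-weighted sum of the future temporal differences.
        discounted = sum(lam ** (j - t) * d[j] for j in range(t, N))
        # For a linear evaluation, the gradient w.r.t. w is phi itself.
        w = w + alpha * leaf_features[t] * discounted
    return w
```

In this sketch the update is applied once at the end of each game, which matches the batch form of the TDLEAF(λ) rule; an on-line variant would apply each term as the corresponding temporal difference becomes available.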
Cite this article
Baxter, J., Tridgell, A. & Weaver, L. Learning to Play Chess Using Temporal Differences. Machine Learning 40, 243–263 (2000). https://doi.org/10.1023/A:1007634325138