Abstract
In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play, rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.
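To make the idea concrete, here is a minimal sketch of a TDLEAF(λ) weight update over a single game, assuming a linear evaluation function. The names (`tdleaf_update`, `leaf_features`) and the parameter values are illustrative assumptions, not KnightCap's actual implementation; the essential point is that the temporal differences are taken between evaluations of the principal-variation leaves returned by game-tree search, rather than between the root positions themselves as in plain TD(λ).

```python
import numpy as np

def tdleaf_update(w, leaf_features, alpha=1e-3, lam=0.7):
    """One TDLEAF(lambda) weight update over a single game (sketch).

    Assumes a linear evaluation J(x, w) = w . phi(x), where
    leaf_features[t] is the feature vector phi of the leaf of the
    principal variation found by minimax search from position t.
    All names here are illustrative, not from the paper's code.
    """
    # Evaluations of the principal-variation leaves for each position.
    evals = np.array([w @ phi for phi in leaf_features])
    # Temporal differences between successive PV-leaf evaluations:
    # d_t = J(x_{t+1}^l, w) - J(x_t^l, w).
    d = evals[1:] - evals[:-1]
    N = len(d)
    for t in range(N):
        # lambda-weighted sum of the future temporal differences.
        discounted = sum(lam ** (j - t) * d[j] for j in range(t, N))
        # For a linear evaluation, the gradient w.r.t. w is phi itself.
        w = w + alpha * leaf_features[t] * discounted
    return w
```

In this sketch the update is applied once at the end of each game, which matches the batch form of the TDLEAF(λ) rule; an on-line variant would apply each term as the corresponding temporal difference becomes available.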
Cite this article
Baxter, J., Tridgell, A. & Weaver, L. Learning to Play Chess Using Temporal Differences. Machine Learning 40, 243–263 (2000). https://doi.org/10.1023/A:1007634325138