2014 | OriginalPaper | Chapter

Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves

Authors: Akira Ura, Makoto Miwa, Yoshimasa Tsuruoka, Takashi Chikayama

Published in: Computers and Games

Publisher: Springer International Publishing

Abstract

Automated tuning of parameters in computer game playing is an important technique for building strong computer programs. Comparison training is a supervised learning method for tuning the parameters of an evaluation function. It has proven to be effective in the games of chess and shogi. The method requires a large number of training positions and moves extracted from game records of human experts; however, the number of such game records is limited. In this paper, we propose a practical approach to creating additional training data for comparison training by using the program itself. We investigate three methods for generating additional positions and moves, and evaluate them using a shogi program. Experimental results show that the self-generated training data can improve the playing strength of the program.
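As a rough illustration of the idea behind comparison training, the sketch below adjusts a linear evaluation function so that the position reached by the expert's move scores higher than the positions reached by the alternative moves. It is a generic perceptron-style update with a margin, not the authors' exact objective or their search-based procedure; the feature vectors, `margin`, and `learning_rate` are assumptions made purely for illustration.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation: dot product of weights and position features."""
    return sum(w * f for w, f in zip(weights, features))


def comparison_update(
    weights: Sequence[float],
    expert_features: Sequence[float],
    sibling_features: List[Sequence[float]],
    margin: float = 1.0,          # assumed margin, not from the paper
    learning_rate: float = 0.1,   # assumed step size, not from the paper
) -> List[float]:
    """Return new weights after one comparison step on a single position.

    For every sibling (non-expert) move whose resulting position is not
    scored at least `margin` below the expert's, shift the weights toward
    the expert's features and away from the sibling's.
    """
    new_weights = list(weights)
    expert_score = evaluate(weights, expert_features)
    for sibling in sibling_features:
        if evaluate(weights, sibling) + margin > expert_score:
            for i, (e, s) in enumerate(zip(expert_features, sibling)):
                new_weights[i] += learning_rate * (e - s)
    return new_weights
```

Calling `comparison_update` repeatedly over many (position, expert move) pairs is what makes the amount of training data matter: each pair contributes one such preference constraint.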

http://wdoor.c.u-tokyo.ac.jp/shogi/floodgate.html

We can get a sufficient variety of game positions by taking the first 36 moves from game records of experts. A shogi game is usually still in the opening stage even after the first 36 moves. Leaf and Random positions are generated from the first 35 moves, while Self-play uses only the first 30, because the base player may make the same moves as the experts. Some extra moves by the base player are therefore needed in Self-play to reach positions that differ from those in the expert game records.
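The prefix lengths in this note can be sketched as follows. The names `PREFIX_LENGTHS` and `opening_prefix` and the representation of a game record as a list of move strings are hypothetical; how each method continues from the prefix position is not shown here.

```python
from typing import Dict, List

# Number of expert moves replayed before each generation method takes over,
# as stated in the note above. The key names are illustrative.
PREFIX_LENGTHS: Dict[str, int] = {"leaf": 35, "random": 35, "self_play": 30}


def opening_prefix(record: List[str], method: str) -> List[str]:
    """Return the expert moves to replay before generating new data."""
    n = PREFIX_LENGTHS[method]
    if len(record) < n:
        # Assumed policy: reject records too short to supply the prefix.
        raise ValueError(f"record has only {len(record)} moves, need {n}")
    return record[:n]
```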

It takes several tens of seconds for Gekisashi to perform a search with a depth of 20 in a typical middle-game position.

For example, when the training data included the Leaf training data and the Random training data, the test data included the Leaf test data and the Random test data.

Players with a rating higher than 2550 as of June 10, 2013.

Title: Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves
Authors: Akira Ura, Makoto Miwa, Yoshimasa Tsuruoka, Takashi Chikayama
Publisher: Springer International Publishing
Book: Computers and Games
Print ISBN: 978-3-319-09164-8

Electronic ISBN: 978-3-319-09165-5

Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-319-09165-5_18
