Skip to main content
Top

2016 | OriginalPaper | Chapter

Monte Carlo Tree Search with Robust Exploration

Authors : Takahisa Imagawa, Tomoyuki Kaneko

Published in: Computers and Games

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper presents a new Monte-Carlo tree search method that focuses on identifying the best move. UCT which minimizes the cumulative regret, has achieved remarkable success in Go and other games. However, recent studies on simple regret reveal that there are better exploration strategies. To further improve the performance, a leaf to be explored is determined not only by the mean but also by the whole reward distribution. We adopted a hybrid approach to obtain reliable distributions. A negamax-style backup of reward distributions is used in the shallower half of a search tree, and UCT is adopted in the rest of the tree. Experiments on synthetic trees show that this presented method outperformed UCT and similar methods, except for trees having uniform width and depth.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)CrossRefMATH Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)CrossRefMATH
2.
go back to reference Baudiš, P.: Balancing MCTS by dynamically adjusting the komi value. ICGA J. Int. Comput. Games Assoc. 34(3), 131 (2011) Baudiš, P.: Balancing MCTS by dynamically adjusting the komi value. ICGA J. Int. Comput. Games Assoc. 34(3), 131 (2011)
5.
go back to reference Browne, C., Powley, E.J., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intellig. AI Games 4(1), 1–43 (2012)CrossRef Browne, C., Powley, E.J., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intellig. AI Games 4(1), 1–43 (2012)CrossRef
6.
go back to reference Bubeck, S., Munos, R., Stoltz, G.: Pure exploration in finitely-armed and continuous-armed bandits. Theor. Comput. Sci. 412(19), 1832–1852 (2011). Algorithmic Learning Theory (ALT 2009)MathSciNetCrossRefMATH Bubeck, S., Munos, R., Stoltz, G.: Pure exploration in finitely-armed and continuous-armed bandits. Theor. Comput. Sci. 412(19), 1832–1852 (2011). Algorithmic Learning Theory (ALT 2009)MathSciNetCrossRefMATH
7.
go back to reference Cazenave, T.: Sequential halving applied to trees. IEEE Trans. Comput. Intellig. AI Games 7(1), 102–105 (2015)CrossRef Cazenave, T.: Sequential halving applied to trees. IEEE Trans. Comput. Intellig. AI Games 7(1), 102–105 (2015)CrossRef
8.
go back to reference Enzenberger, M., Muller, M., Arneson, B., Segal, R.: Fuego-an open-source framework for board games and go engine based on Monte Carlo tree search. IEEE Trans. Comput. Intellig. AI Games 2(4), 259–270 (2010)CrossRef Enzenberger, M., Muller, M., Arneson, B., Segal, R.: Fuego-an open-source framework for board games and go engine based on Monte Carlo tree search. IEEE Trans. Comput. Intellig. AI Games 2(4), 259–270 (2010)CrossRef
9.
go back to reference Furtak, T., Buro, M.: Minimum proof graphs and fastest-cut-first search heuristics. In: IJCAI, pp. 492–498 (2009) Furtak, T., Buro, M.: Minimum proof graphs and fastest-cut-first search heuristics. In: IJCAI, pp. 492–498 (2009)
10.
go back to reference Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., Teytaud, O.: The grand challenge of computer go: Monte Carlo tree search and extensions. Commun. ACM 55(3), 106–113 (2012)CrossRef Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., Teytaud, O.: The grand challenge of computer go: Monte Carlo tree search and extensions. Commun. ACM 55(3), 106–113 (2012)CrossRef
11.
go back to reference Graf, T., Schaefers, L., Platzner, M.: On semeai detection in Monte-Carlo go. In: Herik, H.J., Iida, H., Plaat, A. (eds.) CG 2013. LNCS, vol. 8427, pp. 14–25. Springer, Heidelberg (2014). doi:10.1007/978-3-319-09165-5_2 Graf, T., Schaefers, L., Platzner, M.: On semeai detection in Monte-Carlo go. In: Herik, H.J., Iida, H., Plaat, A. (eds.) CG 2013. LNCS, vol. 8427, pp. 14–25. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-09165-5_​2
12.
go back to reference Imagawa, T., Kaneko, T.: Enhancements in Monte Carlo tree search algorithms for biased game trees. In: IEEE Computational Intelligence and Games (CIG), pp. 43–50 (2015) Imagawa, T., Kaneko, T.: Enhancements in Monte Carlo tree search algorithms for biased game trees. In: IEEE Computational Intelligence and Games (CIG), pp. 43–50 (2015)
13.
go back to reference Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). doi:10.1007/11871842_29 CrossRef Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). doi:10.​1007/​11871842_​29 CrossRef
14.
go back to reference Liu, Y.C., Tsuruoka, Y.: Regulation of exploration for simple regret minimization in Monte-Carlo tree search. In: 2015 IEEE Conference on Computational Intelligence and Games (CIG), pp. 35–42, August 2015 Liu, Y.C., Tsuruoka, Y.: Regulation of exploration for simple regret minimization in Monte-Carlo tree search. In: 2015 IEEE Conference on Computational Intelligence and Games (CIG), pp. 35–42, August 2015
15.
go back to reference Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M.: Minimizing simple and cumulative regret in Monte-Carlo tree search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds.) CGW 2014. CCIS, vol. 504, pp. 1–15. Springer, Heidelberg (2014). doi:10.1007/978-3-319-14923-3_1 CrossRef Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M.: Minimizing simple and cumulative regret in Monte-Carlo tree search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds.) CGW 2014. CCIS, vol. 504, pp. 1–15. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-14923-3_​1 CrossRef
16.
go back to reference Plaat, A., Schaeffer, J., Pijls, W., de Bruin, A.: Best-first fixed depth minimax algorithms. Artif. Intell. 87, 255–293 (1996)MathSciNetCrossRef Plaat, A., Schaeffer, J., Pijls, W., de Bruin, A.: Best-first fixed depth minimax algorithms. Artif. Intell. 87, 255–293 (1996)MathSciNetCrossRef
17.
go back to reference Ramanujan, R., Sabharwal, A., Selman, B.: On adversarial search spaces and sampling-based planning. In: ICAPS, pp. 242–245 (2010) Ramanujan, R., Sabharwal, A., Selman, B.: On adversarial search spaces and sampling-based planning. In: ICAPS, pp. 242–245 (2010)
18.
go back to reference Tesauro, G., Rajan, V., Segal, R.: Bayesian inference in Monte-Carlo tree search. In: the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010) (2010) Tesauro, G., Rajan, V., Segal, R.: Bayesian inference in Monte-Carlo tree search. In: the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010) (2010)
19.
go back to reference Tolpin, D., Shimony, S.E.: MCTS based on simple regret. In: AAAI (2012) Tolpin, D., Shimony, S.E.: MCTS based on simple regret. In: AAAI (2012)
20.
go back to reference Winands, M.H.M., Björnsson, Y., Saito, J.-T.: Monte-Carlo tree search solver. In: Herik, H.J., Xu, X., Ma, Z., Winands, M.H.M. (eds.) CG 2008. LNCS, vol. 5131, pp. 25–36. Springer, Heidelberg (2008). doi:10.1007/978-3-540-87608-3_3 CrossRef Winands, M.H.M., Björnsson, Y., Saito, J.-T.: Monte-Carlo tree search solver. In: Herik, H.J., Xu, X., Ma, Z., Winands, M.H.M. (eds.) CG 2008. LNCS, vol. 5131, pp. 25–36. Springer, Heidelberg (2008). doi:10.​1007/​978-3-540-87608-3_​3 CrossRef
21.
go back to reference Yokoyama, D., Kitsuregawa, M.: A randomized game-tree search algorithm for shogi based on Bayesian approach. In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS (LNAI), vol. 8862, pp. 937–944. Springer, Heidelberg (2014). doi:10.1007/978-3-319-13560-1_81 Yokoyama, D., Kitsuregawa, M.: A randomized game-tree search algorithm for shogi based on Bayesian approach. In: Pham, D.-N., Park, S.-B. (eds.) PRICAI 2014. LNCS (LNAI), vol. 8862, pp. 937–944. Springer, Heidelberg (2014). doi:10.​1007/​978-3-319-13560-1_​81
Metadata
Title
Monte Carlo Tree Search with Robust Exploration
Authors
Takahisa Imagawa
Tomoyuki Kaneko
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-50935-8_4

Premium Partner