
Knowledge-Based Systems

Volume 34, October 2012, Pages 3-11

Single-player Monte-Carlo tree search for SameGame

https://doi.org/10.1016/j.knosys.2011.08.008

Abstract

Classic methods such as A* and IDA* are a popular and successful choice for one-player games. However, without an accurate admissible evaluation function, they fail. In this article we investigate whether Monte-Carlo tree search (MCTS) is an interesting alternative for one-player games where A* and IDA* methods do not perform well. Therefore, we propose a new MCTS variant, called single-player Monte-Carlo tree search (SP-MCTS). The selection and backpropagation strategies in SP-MCTS differ from those in standard MCTS. Moreover, SP-MCTS makes use of randomized restarts. We tested IDA* and SP-MCTS on the puzzle SameGame and used the cross-entropy method to tune the SP-MCTS parameters. It turned out that our SP-MCTS program is able to score a substantial number of points on the standardized test set.

Introduction

The traditional approaches to deterministic one-player games with perfect information [27] are applying A* [23], IDA* [29] or one of their variants (e.g., Multiobjective A* [40], Limited-damage A* [2]). These methods have been quite successful for solving this type of game. The disadvantage of the methods is that they require an admissible heuristic evaluation function. The construction of such a function can be difficult. Since Monte-Carlo tree search (MCTS) [11], [17], [28] does not require an admissible heuristic and because of its success in two-player games [30] and multi-player games [41], MCTS seems a promising approach for deterministic one-player games with perfect information as well [39], [43].

So far, MCTS has not been widely applied in one-player games. One example is the sailing domain [28], which is a non-deterministic game with perfect information. MCTS has also been used for optimization and planning problems which can be represented as deterministic one-player games. Chaslot et al. [10] applied MCTS in production management problems. Mesmay et al. [32] proposed the MCTS variant TAG for optimizing libraries for different platforms.

This article proposes an MCTS method, called single-player Monte-Carlo tree search (SP-MCTS), for a one-player game. MCTS for two-player games forms the starting point for this search method. We adapted MCTS by two modifications resulting in SP-MCTS. The modifications are (1) in the selection strategy and (2) in the backpropagation strategy. SP-MCTS is tested in the puzzle SameGame, because there exists no reliable admissible heuristic evaluation function for this game. This article also investigates whether the cross-entropy method [36] is able to fine-tune the parameters of SP-MCTS.

The article is organized as follows. In Section 2 we present the rules, complexity and related work of SameGame. In Section 3 we discuss why classic approaches such as A* and IDA* are not suitable for SameGame. Then, we introduce the SP-MCTS approach in Section 4. Section 5 describes the cross-entropy method which is used for tuning the SP-MCTS parameters. Experiments and results are given in Section 6. Section 7 gives the conclusions and indicates future research.

Section snippets

SameGame

SameGame is a puzzle invented by Kuniaki Moribe under the name Chain Shot! in 1985. It was distributed for the Fujitsu FM-8/7 series in a monthly personal computer magazine called Gekkan ASCII [33]. The puzzle was afterwards re-created by Eiji Fukumoto under the name of SameGame in 1992.

In this section, we first explain the rules in Section 2.1. Subsequently, we analyze the complexity of SameGame in Section 2.2. Finally, we present related work in Section 2.3.

Search algorithms for puzzles

The classic approach to puzzles involves methods such as A* [23] and IDA* [29]. A* is a best-first search where all nodes have to be stored in a list. The list is sorted by an admissible evaluation function. At each iteration the first element is removed from the list and its children are added to the sorted list. This process is continued until the goal state arrives at the start of the list. IDA* is an iterative deepening variant of A* search. It uses a depth-first approach in such a way that
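The best-first procedure described above can be sketched in a few lines of Python. The graph, the goal, and the heuristic below are illustrative assumptions (a toy shortest-path task), not taken from the article; the structure of the loop is the standard A* scheme with a priority queue as the sorted open list:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Best-first search: the open list is ordered by f = g + h,
    where h is an admissible heuristic (it never overestimates)."""
    open_list = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_list:
        f, g, state, path = heapq.heappop(open_list)
        if state == goal:          # goal reached the front of the list
            return path
        for nxt, cost in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(open_list, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Toy example: shortest path on the integer line from 0 to 5, unit steps.
neighbors = lambda s: [(s - 1, 1), (s + 1, 1)]
h = lambda s: abs(5 - s)           # admissible: exact remaining distance
print(a_star(0, 5, neighbors, h))  # [0, 1, 2, 3, 4, 5]
```

The admissibility of `h` is exactly what A* requires for optimality; when no such function is available, as the article argues for SameGame, this guarantee is lost.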

Single-player Monte-Carlo tree search

MCTS is a best-first search method, which does not require a positional evaluation function. MCTS builds a search tree employing Monte-Carlo simulations at the leaf nodes. Each node in the tree represents an actual board position and typically stores the average score found in the corresponding subtree and the number of visits. MCTS constitutes a family of tree-search algorithms applicable to the domain of board games [11], [17], [28].

In general, MCTS consists of four steps, repeated until time
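The four steps named above (selection, expansion, play-out, backpropagation) can be sketched for a toy one-player game. This sketch uses the standard UCT selection formula, not the modified SP-MCTS formula (which adds a variance-based term); the game itself, maximizing the number of 1-moves over five turns, is an illustrative assumption:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of moves made so far
        self.parent = parent
        self.children = {}        # move -> Node
        self.visits = 0
        self.total = 0.0          # sum of play-out scores in this subtree

DEPTH = 5
def legal_moves(state):
    return [0, 1] if len(state) < DEPTH else []

def score(state):                 # final score: count of 1-moves
    return sum(state)

def select(node, C=0.4):
    # 1. Selection: descend with UCT while the node is fully expanded.
    while legal_moves(node.state) and len(node.children) == len(legal_moves(node.state)):
        node = max(node.children.values(),
                   key=lambda c: c.total / c.visits
                   + C * math.sqrt(math.log(node.visits) / c.visits))
    return node

def expand(node):
    # 2. Expansion: add one untried child, if any remain.
    untried = [m for m in legal_moves(node.state) if m not in node.children]
    if untried:
        m = random.choice(untried)
        node.children[m] = Node(node.state + (m,), node)
        node = node.children[m]
    return node

def playout(state):
    # 3. Play-out: finish the game with random moves.
    while legal_moves(state):
        state = state + (random.choice(legal_moves(state)),)
    return score(state)

def backpropagate(node, result):
    # 4. Backpropagation: update visits and totals up to the root.
    while node:
        node.visits += 1
        node.total += result
        node = node.parent

def mcts(iterations=2000):
    root = Node(())
    for _ in range(iterations):
        leaf = expand(select(root))
        backpropagate(leaf, playout(leaf.state))
    # Play the most-visited move at each node (one common final selection).
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: c.visits)
    return node.state

random.seed(0)
print(mcts())  # with enough iterations this converges to (1, 1, 1, 1, 1)
```

Note that each node stores exactly what the text describes: the number of visits and the accumulated score of its subtree, from which the average is recovered during selection.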

The cross-entropy method

Choosing the correct SP-MCTS parameter values is important for its success. For instance, an important parameter is the C constant which is responsible for the balance between exploration and exploitation. Optimizing these parameters manually may be a hard and time-consuming task. Although it is possible to make educated guesses for some parameters, for other parameters it is not possible. Especially hidden dependencies between the parameters complicate the tuning process. Here, a learning
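As a sketch of how such a learning method can tune a parameter, the following is a minimal cross-entropy method on a one-dimensional toy problem. The Gaussian parametrization, sample sizes, and fitness function below are illustrative assumptions; in the article the fitness would be something like the average SameGame score as a function of the SP-MCTS parameters:

```python
import random, statistics

def cross_entropy_method(fitness, mean, std, n_samples=50, elite_frac=0.2,
                         iterations=30):
    """Sample parameter values from a Gaussian, keep the best ('elite')
    fraction, and refit the distribution to the elite samples."""
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(iterations):
        samples = [random.gauss(mean, std) for _ in range(n_samples)]
        samples.sort(key=fitness, reverse=True)
        elite = samples[:n_elite]
        mean = statistics.fmean(elite)
        std = statistics.stdev(elite) + 1e-6  # floor: avoid collapse to zero
    return mean

random.seed(1)
# Toy fitness with a single optimum at x = 3.
best = cross_entropy_method(lambda x: -(x - 3) ** 2, mean=0.0, std=2.0)
print(round(best, 2))  # converges near 3.0
```

The same loop extends to several parameters by sampling each from its own Gaussian; the elite refit then captures exactly the kind of hidden dependencies between parameters that make manual tuning hard.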

Experiments and results

Section 6.1 tests the quality of the two simulation strategies TabuRandom and TabuColorRandom. Thereafter, the results of manual parameter tuning are presented in Section 6.2. Subsequently, Section 6.3 gives the performance of the randomized restarts on a set of 250 positions. In Section 6.4, it is investigated whether it is better to exhaust all available time at the initial position or to distribute the time uniformly for every move. Section 6.5 investigates which final move selection

Conclusions and future research

In this article we proposed a new MCTS variant called single-player Monte-Carlo tree search (SP-MCTS). We adapted MCTS by two modifications resulting in SP-MCTS. The modifications are (1) in the selection strategy and (2) in the backpropagation strategy. Below we provide five observations and one conclusion.

First, we observed that our TabuColorRandom strategy significantly increased the score of SP-MCTS in SameGame. Compared to the pure random play-outs, an increase of 50% in the average score

Acknowledgments

This work is funded by the Dutch Organisation for Scientific Research (NWO) in the framework of the project Go4Nature, Grant no. 612.000.938.

References

  • T. Cazenave

    Nested Monte-Carlo search

  • G.M.J.-B. Chaslot, Monte-Carlo tree search, Ph.D. thesis, Department of Knowledge Engineering, Maastricht University,...
  • G.M.J.-B. Chaslot, S. de Jong, J.-T. Saito, J.W.H.M. Uiterwijk, Monte-Carlo tree search in production management...
  • G.M.J.-B. Chaslot, J. Saito, B. Bouzy, J.W.H.M. Uiterwijk, H.J. van den Herik, Monte-Carlo strategies for computer go,...
  • G.M.J.-B. Chaslot et al.

    Cross-entropy for Monte-Carlo tree search

    ICGA Journal

    (2008)
  • G.M.J.-B. Chaslot et al.

    Progressive strategies for Monte-Carlo tree search

    New Mathematics and Natural Computation

    (2008)
  • G.M.J.-B. Chaslot et al.

Parallel Monte-Carlo tree search

  • K.-H. Chen et al.

    Monte-Carlo go with knowledge-guided simulations

    ICGA Journal

    (2008)
  • S.A. Cook

    The complexity of theorem-proving procedures

  • R. Coulom

Efficient selectivity and backup operators in Monte-Carlo tree search

  • J.C. Culberson et al.

    Pattern databases

    Computational Intelligence

    (1998)
  • S. Edelkamp, P. Kissmann, D. Sulewski, H. Messerschmidt, Finding the needle in the haystack with heuristically guided...