
Knowledge-Based Systems

Volume 34, October 2012, Pages 3-11

Single-player Monte-Carlo tree search for SameGame

https://doi.org/10.1016/j.knosys.2011.08.008

Abstract

Classic methods such as A* and IDA* are a popular and successful choice for one-player games. However, without an accurate admissible evaluation function, they fail. In this article we investigate whether Monte-Carlo tree search (MCTS) is an interesting alternative for one-player games where A* and IDA* methods do not perform well. Therefore, we propose a new MCTS variant, called single-player Monte-Carlo tree search (SP-MCTS). The selection and backpropagation strategies in SP-MCTS differ from those in standard MCTS. Moreover, SP-MCTS makes use of randomized restarts. We tested IDA* and SP-MCTS on the puzzle SameGame and used the cross-entropy method to tune the SP-MCTS parameters. It turned out that our SP-MCTS program is able to score a substantial number of points on the standardized test set.

Introduction

The traditional approaches to deterministic one-player games with perfect information [27] are applying A* [23], IDA* [29] or one of their variants (e.g., Multiobjective A* [40], Limited-damage A* [2]). These methods have been quite successful for solving this type of game. The disadvantage of the methods is that they require an admissible heuristic evaluation function. The construction of such a function can be difficult. Since Monte-Carlo tree search (MCTS) [11], [17], [28] does not require an admissible heuristic and because of its success in two-player games [30] and multi-player games [41], MCTS seems a promising approach for deterministic one-player games with perfect information as well [39], [43].

So far, MCTS has not been widely applied in one-player games. One example is the sailing domain [28], which is a non-deterministic game with perfect information. MCTS has also been used for optimization and planning problems which can be represented as deterministic one-player games. Chaslot et al. [10] applied MCTS in production management problems. Mesmay et al. [32] proposed the MCTS variant TAG for optimizing libraries for different platforms.

This article proposes an MCTS method, called single-player Monte-Carlo tree search (SP-MCTS), for a one-player game. MCTS for two-player games forms the starting point for this search method. We adapted MCTS by two modifications resulting in SP-MCTS. The modifications are (1) in the selection strategy and (2) in the backpropagation strategy. SP-MCTS is tested in the puzzle SameGame, because there exists no reliable admissible heuristic evaluation function for this game. This article also investigates whether the cross-entropy method [36] is able to fine-tune the parameters of SP-MCTS.

The article is organized as follows. In Section 2 we present the rules, complexity and related work of SameGame. In Section 3 we discuss why classic approaches such as A* and IDA* are not suitable for SameGame. Then, we introduce the SP-MCTS approach in Section 4. Section 5 describes the cross-entropy method which is used for tuning the SP-MCTS parameters. Experiments and results are given in Section 6. Section 7 gives the conclusions and indicates future research.

Section snippets

SameGame

SameGame is a puzzle invented by Kuniaki Moribe under the name Chain Shot! in 1985. It was distributed for the Fujitsu FM-8/7 series in a monthly personal computer magazine called Gekkan ASCII [33]. The puzzle was afterwards re-created by Eiji Fukumoto under the name of SameGame in 1992.

In this section, we first explain the rules in Section 2.1. Subsequently, we analyze the complexity of SameGame in Section 2.2. Finally, we present related work in Section 2.3.

Search algorithms for puzzles

The classic approach to puzzles involves methods such as A* [23] and IDA* [29]. A* is a best-first search where all nodes have to be stored in a list. The list is sorted by an admissible evaluation function. At each iteration the first element is removed from the list and its children are added to the sorted list. This process is continued until the goal state arrives at the start of the list. IDA* is an iterative deepening variant of A* search. It uses a depth-first approach in such a way that
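The best-first procedure described above can be sketched in a few lines of Python. The graph, the goal, and the heuristic below are illustrative assumptions (a toy shortest-path task), not taken from the article; the structure of the loop is the standard A* scheme with a priority queue as the sorted open list:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Best-first search: the open list is ordered by f = g + h,
    where h is an admissible heuristic (it never overestimates)."""
    open_list = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while open_list:
        f, g, state, path = heapq.heappop(open_list)
        if state == goal:          # goal reached the front of the list
            return path
        for nxt, cost in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(open_list, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Toy example: shortest path on the integer line from 0 to 5, unit steps.
neighbors = lambda s: [(s - 1, 1), (s + 1, 1)]
h = lambda s: abs(5 - s)           # admissible: exact remaining distance
print(a_star(0, 5, neighbors, h))  # [0, 1, 2, 3, 4, 5]
```

The admissibility of `h` is exactly what A* requires for optimality; when no such function is available, as the article argues for SameGame, this guarantee is lost.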

Single-player Monte-Carlo tree search

MCTS is a best-first search method, which does not require a positional evaluation function. MCTS builds a search tree employing Monte-Carlo simulations at the leaf nodes. Each node in the tree represents an actual board position and typically stores the average score found in the corresponding subtree and the number of visits. MCTS constitutes a family of tree-search algorithms applicable to the domain of board games [11], [17], [28].

In general, MCTS consists of four steps, repeated until time
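The four steps named above (selection, expansion, play-out, backpropagation) can be sketched for a toy one-player game. This sketch uses the standard UCT selection formula, not the modified SP-MCTS formula (which adds a variance-based term); the game itself, maximizing the number of 1-moves over five turns, is an illustrative assumption:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of moves made so far
        self.parent = parent
        self.children = {}        # move -> Node
        self.visits = 0
        self.total = 0.0          # sum of play-out scores in this subtree

DEPTH = 5
def legal_moves(state):
    return [0, 1] if len(state) < DEPTH else []

def score(state):                 # final score: count of 1-moves
    return sum(state)

def select(node, C=0.4):
    # 1. Selection: descend with UCT while the node is fully expanded.
    while legal_moves(node.state) and len(node.children) == len(legal_moves(node.state)):
        node = max(node.children.values(),
                   key=lambda c: c.total / c.visits
                   + C * math.sqrt(math.log(node.visits) / c.visits))
    return node

def expand(node):
    # 2. Expansion: add one untried child, if any remain.
    untried = [m for m in legal_moves(node.state) if m not in node.children]
    if untried:
        m = random.choice(untried)
        node.children[m] = Node(node.state + (m,), node)
        node = node.children[m]
    return node

def playout(state):
    # 3. Play-out: finish the game with random moves.
    while legal_moves(state):
        state = state + (random.choice(legal_moves(state)),)
    return score(state)

def backpropagate(node, result):
    # 4. Backpropagation: update visits and totals up to the root.
    while node:
        node.visits += 1
        node.total += result
        node = node.parent

def mcts(iterations=2000):
    root = Node(())
    for _ in range(iterations):
        leaf = expand(select(root))
        backpropagate(leaf, playout(leaf.state))
    # Play the most-visited move at each node (one common final selection).
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda c: c.visits)
    return node.state

random.seed(0)
print(mcts())  # with enough iterations this converges to (1, 1, 1, 1, 1)
```

Note that each node stores exactly what the text describes: the number of visits and the accumulated score of its subtree, from which the average is recovered during selection.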

The cross-entropy method

Choosing the correct SP-MCTS parameter values is important for its success. For instance, an important parameter is the C constant which is responsible for the balance between exploration and exploitation. Optimizing these parameters manually may be a hard and time-consuming task. Although it is possible to make educated guesses for some parameters, for other parameters it is not possible. Especially hidden dependencies between the parameters complicate the tuning process. Here, a learning
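As a sketch of how such a learning method can tune a parameter, the following is a minimal cross-entropy method on a one-dimensional toy problem. The Gaussian parametrization, sample sizes, and fitness function below are illustrative assumptions; in the article the fitness would be something like the average SameGame score as a function of the SP-MCTS parameters:

```python
import random, statistics

def cross_entropy_method(fitness, mean, std, n_samples=50, elite_frac=0.2,
                         iterations=30):
    """Sample parameter values from a Gaussian, keep the best ('elite')
    fraction, and refit the distribution to the elite samples."""
    n_elite = max(2, int(n_samples * elite_frac))
    for _ in range(iterations):
        samples = [random.gauss(mean, std) for _ in range(n_samples)]
        samples.sort(key=fitness, reverse=True)
        elite = samples[:n_elite]
        mean = statistics.fmean(elite)
        std = statistics.stdev(elite) + 1e-6  # floor: avoid collapse to zero
    return mean

random.seed(1)
# Toy fitness with a single optimum at x = 3.
best = cross_entropy_method(lambda x: -(x - 3) ** 2, mean=0.0, std=2.0)
print(round(best, 2))  # converges near 3.0
```

The same loop extends to several parameters by sampling each from its own Gaussian; the elite refit then captures exactly the kind of hidden dependencies between parameters that make manual tuning hard.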

Experiments and results

Section 6.1 tests the quality of the two simulation strategies TabuRandom and TabuColorRandom. Thereafter, the results of manual parameter tuning are presented in Section 6.2. Subsequently, Section 6.3 gives the performance of the randomized restarts on a set of 250 positions. In Section 6.4, it is investigated whether it is better to exhaust all available time at the initial position or to distribute the time uniformly for every move. Section 6.5 investigates which final move selection

Conclusions and future research

In this article we proposed a new MCTS variant called single-player Monte-Carlo tree search (SP-MCTS). We adapted MCTS by two modifications resulting in SP-MCTS. The modifications are (1) in the selection strategy and (2) in the backpropagation strategy. Below we provide five observations and one conclusion.

First, we observed that our TabuColorRandom strategy significantly increased the score of SP-MCTS in SameGame. Compared to the pure random play-outs, an increase of 50% in the average score

Acknowledgments

This work is funded by the Dutch Organisation for Scientific Research (NWO) in the framework of the project Go4Nature, Grant no. 612.000.938.

References

  • T. Cazenave

    Nested Monte-Carlo search

  • G.M.J.-B. Chaslot, Monte-Carlo tree search, Ph.D. thesis, Department of Knowledge Engineering, Maastricht University,...
  • G.M.J.-B. Chaslot, S. de Jong, J.-T. Saito, J.W.H.M. Uiterwijk, Monte-Carlo tree search in production management...
  • G.M.J.-B. Chaslot, J. Saito, B. Bouzy, J.W.H.M. Uiterwijk, H.J. van den Herik, Monte-Carlo strategies for computer go,...
  • G.M.J.-B. Chaslot et al.

    Cross-entropy for Monte-Carlo tree search

    ICGA Journal

    (2008)
  • G.M.J.-B. Chaslot et al.

    Progressive strategies for Monte-Carlo tree search

    New Mathematics and Natural Computation

    (2008)
  • G.M.J.-B. Chaslot et al.

Parallel Monte-Carlo tree search

  • K.-H. Chen et al.

    Monte-Carlo go with knowledge-guided simulations

    ICGA Journal

    (2008)
  • S.A. Cook

    The complexity of theorem-proving procedures

  • R. Coulom

Efficient selectivity and backup operators in Monte-Carlo tree search

  • J.C. Culberson et al.

    Pattern databases

    Computational Intelligence

    (1998)
  • S. Edelkamp, P. Kissmann, D. Sulewski, H. Messerschmidt, Finding the needle in the haystack with heuristically guided...