Strategy improvement for concurrent reachability and turn-based stochastic safety games,☆☆

https://doi.org/10.1016/j.jcss.2012.12.001Get rights and content
Under a Creative Commons license
open access

Abstract

We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. First, we present a simple proof of the fact that in concurrent reachability games, for all ε>0, memoryless ε-optimal strategies exist. A memoryless strategy is independent of the history of plays, and an ε-optimal strategy achieves the objective with probability within ε of the value of the game. In contrast to previous proofs of this fact, our proof is more elementary and more combinatorial. Second, we present a strategy-improvement (a.k.a. policy-iteration) algorithm for concurrent games with reachability objectives. Finally, we present a strategy-improvement algorithm for turn-based stochastic games (where each player selects moves in turns) with safety objectives. Our algorithms yield sequences of player-1 strategies which ensure probabilities of winning that converge monotonically (from below) to the value of the game.

Highlights

► Simple proof of optimality of memoryless strategies in concurrent reachability game. ► A strategy-improvement algorithm for concurrent reachability game. ► A strategy-improvement algorithm for turn-based stochastic safety game.

Keywords

Game theory
Stochastic games
Concurrent games
Reachability and safety objectives
Strategy-improvement algorithms

Cited by (0)

This work was partially supported in part by the NSF grants CCR-0132780, CNS-0720884, CCR-0225610, by the Swiss National Science Foundation, ERC Start Grant Graph Games (Project No. 279307), FWF NFN Grant S11407-N23 (RiSE), and a Microsoft faculty fellowship.

☆☆

This paper is an improved version of Chatterjee et al. (2006, 2009) [4], [3]. There is a serious and irreparable error in Theorem 4.3 of Chatterjee et al. (2009) [3] regarding the convergence property of the improvement algorithm for concurrent safety games. This is illustrated in an example of Chatterjee et al. (2012) [2]. In the present version the result is presented for turn-based stochastic safety games. We thank anonymous reviewers for many insightful comments that helped us immensely, and warmly acknowledge their help.