This chapter studies the efficient computation of energy values in infinite-duration games in which two players, Min and Max, move a token through a finite directed graph so as to optimise their payoffs. It introduces a unified approach, fast value iteration, which unifies and simplifies existing algorithms and provides a systematic method for producing efficient solvers. The text delves into the theoretical foundations of energy and mean-payoff games, highlighting positional determinacy, which allows the players to use strategies that depend only on the current position of the play. Several algorithmic paradigms are discussed, including value iteration and strategy improvement, and hybrid algorithms combining elements of both are introduced. The chapter presents a framework for fast value iteration based on potential reductions, which preserves cycle weights and mean-payoff values and thereby guarantees the correctness of the derived algorithms. It revisits and unifies existing algorithms such as OSI and QDPM and proposes new ones such as Positive Path Iteration (PPI) and its dynamic variant (DPPI). An empirical evaluation compares these algorithms with state-of-the-art parity game solvers, demonstrating their efficiency and robustness, in particular on hard instances. The chapter concludes with a discussion of open questions and directions for future research, emphasising the potential for further improvements and new algorithmic developments.
AI-Generated
This summary of the chapter's content was generated with the assistance of AI.
Abstract
We study algorithms for solving parity, mean-payoff and energy games. We propose a systematic framework, which we call Fast value iteration, for describing, comparing, and proving correctness of such algorithms. The approach is based on potential reductions, as introduced by Gurvich, Karzanov and Khachiyan (1988). This framework allows us to provide simple presentations and correctness proofs of known algorithms, unifying the Optimal strategy improvement algorithm by Schewe (2008) and the quasi dominions approach by Benerecetti et al. (2020), amongst others. The new approach also leads to novel symmetric versions of these algorithms, highly efficient in practice, but for which we are unable to prove termination. We report on empirical evaluation, comparing the different fast value iteration algorithms, and showing that they are competitive even to top parity game solvers.
1 Introduction
Mean-payoff and energy games. The games under study are infinite-duration games where two players, Min and Max, move a token over a finite directed graph with no sink, whose edges are labelled by payoffs in \(\mathbb Z\). When playing a mean-payoff game, the players optimise (minimise or maximise, respectively) the asymptotic average payoff. In an energy game, they instead optimise the supremum of the cumulative sums of payoffs, a value in \([0,\infty ]\). These games are positionally determined [5, 12]: the two players can play optimally even when restricted to strategies that only depend on the current position of the game. We refer to Figure 1 for a complete example.
Fig. 1.
Example of a game; circles belong to Min and squares belong to Max. From left to right, the mean-payoff values are \(-2,-2,-\frac{1}{2}, - \frac{1}{2},1\) and 1, and optimal positional strategies for mean-payoff are identified in bold. Energy values are \(0,2,9,0,\infty \) and \(\infty \), with optimal strategies given by the double-headed arrows.
In this paper, we are interested in the problem of computing the energy values of the vertices in a given game, which we call solving the energy game. It easily follows from positional determinacy that the energy value of a vertex is finite if and only if its mean-payoff value is non-positive [6]. Therefore solving an energy game also solves the so-called threshold problem for the associated mean-payoff game. As it turns out, all state-of-the-art algorithms [2, 4, 6, 11, 30, 32] for the mean-payoff threshold problem actually solve the energy game.
Mean-payoff values achieved by positional strategies can be computed in polynomial time, and therefore the threshold problem belongs to \(\textrm{NP}\cap \textrm{coNP}\). Despite numerous efforts, no polynomial algorithm is known. Mean-payoff games are known [31] to generalise parity games [13, 29] which also belong to \(\textrm{NP}\cap \textrm{coNP}\) but for which algorithms with quasipolynomial runtime were recently devised [7]. However, quasipolynomial algorithms for parity games do not generalise to mean-payoff games [15].
Algorithmic paradigms. There are two well-established paradigms for solving energy games: value iteration (sometimes called “progress measure”) and strategy improvement. The standard value iteration for energy games (which we will call simple value iteration, SVI for short) was introduced by Brim et al. [6]. While it enjoys good theoretical (pseudopolynomial) bounds, it is well known to be prohibitively slow in practice, as its worst-case behaviour is frequently displayed. On the other hand, strategy improvement algorithms [3] typically solve practical instances in a constant number of iterations. Although it offers a useful categorization of older algorithms, the value iteration versus strategy improvement dichotomy fails to accurately describe a new wave of efficient algorithms.
In recent years, multiple hybrid algorithms – borrowing ideas from both paradigms – have been put forward. In 2008, Schewe [32] introduced an algorithm called optimal strategy improvement (OSI) for solving parity or mean-payoff games. As explained by Luttenberger [26], Schewe’s presentation of OSI is in fact closer to value iteration, but it can also be formally cast as a strategy improvement in a carefully generalised framework allowing for nondeterministic strategies. In 2019, Dorfman et al. [11] presented a value iteration method augmented by a carefully crafted acceleration mechanism (which we call DKZ), thereby improving on the best theoretical guarantees (this algorithm can be seen as a reformulation of the GKK algorithm [18], see also [30] for further analyses). Based on the idea of quasi dominions (similar to Fearnley’s snares [14] in a strategy improvement context), Benerecetti et al. [2] proposed another such acceleration mechanism, obtaining the algorithm QDPM. Some of these algorithms are extremely efficient: a version of OSI is a key component in the LTL-synthesis tool STRIX [27, 28], which is one of the top competitors in the annual synthesis competition SYNTCOMP [19]. On the other hand, QDPM is currently the top-performing mean-payoff game solver [2] while, remarkably, preserving state-of-the-art theoretical guarantees.
Although differences in the performances of these algorithms have been observed empirically [2], we lack a good understanding of how they compare to each other theoretically, and, more generally, of which fundamental algorithmic mechanisms lead to efficient game solvers in practice.
Contributions. Our contributions are as follows.
(1) Fast value iteration framework. We consider potential reductions, as introduced by Gurvich, Karzanov and Khachiyan [18], to design a systematic method for producing algorithms for energy games, which we call the fast value iteration framework. A potential is a mapping which assigns a non-negative (possibly infinite) value to each vertex. Such a potential naturally induces a transformation (a potential reduction) of the game, which preserves the weight of every cycle and thus the values in the mean-payoff game. The fast value iteration meta-algorithm (Algorithm 1) simply iterates potential reductions until a fixpoint is reached. This meta-algorithm can be instantiated on any given class of potentials, leading to different algorithms, whose correctness is automatically guaranteed under mild assumptions on the potentials (Theorem 4). Interestingly, the framework also provides a symmetric meta-algorithm, for which termination is observed in practice, but which we have not been able to prove theoretically.
The algorithms from the fast value iteration framework share some properties that make them convenient for practical applications. The main reason STRIX uses OSI is its support for modularity. Since games coming from LTL formulas are typically huge, an important feature is the ability to solve them piecewise, avoiding loading the entire game into memory. We show that all algorithms within the fast value iteration framework are well suited for this modular approach, which also opens exciting perspectives for parallelised implementations.
(2) Unifying and simplifying existing algorithms. We revisit various algorithms in the light of the above framework. Naturally, the classic SVI [6] is captured (Example 6), as well as the algorithms GKK [18] and DKZ [11] (Section 4.3), whose original presentations fit the potential reduction framework.
More interestingly, we also capture algorithms showcasing an excellent performance in practice, defying the common belief that VI algorithms are slow. We unify and simplify the algorithms OSI by Schewe [32] and the involved QDPM by Benerecetti et al. [2]. Our presentations are streamlined (see Section 4 for details), leading to immediate correctness proofs. They also allow us to isolate the core algorithmic idea underlying these two algorithms, which is a natural adaptation of Dijkstra’s algorithm to the two-player setting. We call the resulting reinterpretation of OSI and QDPM within the fast value iteration framework the Positive Path Iteration (PPI).
The abstraction provided by our approach sets the stage to easily craft new algorithms. Showcasing its applicability, we propose a dynamic version of PPI (DPPI), which provably breaks the theoretical barrier set by OSI and QDPM (Lemma 13). Many possibilities for future work are proposed in the conclusion.
(3) Empirical evaluation. We compare the implementations of the algorithms described in the fast value iteration framework to OSI and QDPM, as well as to the top parity game solvers. This evaluation shows: (i) fast value iteration algorithms are highly efficient in practice, and especially robust towards hard instances; (ii) alternating versions of the algorithms not only terminate, but are remarkably efficient.
2 Preliminaries
A game is a tuple \(\mathcal {G}=(G,w,V_{\textrm{Min}}, V_{\textrm{Max}})\), where \(G=(V,E)\) is a finite sinkless directed graph, \(w:E \rightarrow \mathbb Z\) is a labelling of its edges by integer weights, and \(V_{\textrm{Min}},V_{\textrm{Max}}\) is a partition of V. We set \(n=|V|,m=|E|\) and \(W=\max \limits _{e\in E} |w(e)|\). We say that vertices in \(V_{\textrm{Min}}\) belong to Min and that those in \(V_{\textrm{Max}}\) belong to Max. We now fix a game \(\mathcal {G}= (G,w,V_{\textrm{Min}},V_{\textrm{Max}})\).
We simply write \(vv'\) for an edge \((v,v')\in E\). A path is a (possibly empty, possibly infinite) sequence of edges \(\pi = e_0e_1 \dots \), with \(e_{i}=v_{i}v_{i}'\), such that \(v_i' = v_{i+1}\). We write \(v_0 \rightarrow v_1 \rightarrow \dots \) to denote such a path. The sum of a finite path \(\pi \) is the sum of the weights appearing along it. Given a finite or infinite path \(\pi =e_0e_1 \dots \) and an integer \(k \ge 0\), we let \(\pi _{< k} = e_0 e_1 \dots e_{k-1}\), and we let \(\pi _{\le k} = \pi _{< k+1}\). Note that \(\pi _{<0}\) is the empty path, and that \(\pi _{<k}\) has length k. By convention, the empty path starts and ends in all vertices.
A valuation is a map \(\textrm{val}: \mathbb Z^\omega \rightarrow \mathbb R\cup \{\infty \}\) assigning a potentially infinite value to infinite sequences of weights. We use \({\mathbb R^{\infty }}\), \({\mathbb Z^{\infty }}\) and \({\mathbb N^{\infty }}\) to denote respectively \(\mathbb R\cup \{\infty \}, \mathbb Z\cup \{\infty \}\) and \(\mathbb N\cup \{\infty \}\). The four valuations studied in this paper are the mean-payoff, energy, positive-energy, and first-if-positive valuations given by
\[ {\textrm{MP}}(w) = \limsup _{k \rightarrow \infty } \tfrac{1}{k}\sum _{i<k} w_i, \qquad {\textrm{En}}(w) = \sup _{k \ge 0} \sum _{i<k} w_i, \]
\[ {\textrm{En}^+}(w) = \sup _{k \le k_\text {neg}} \sum _{i<k} w_i, \qquad {\textrm{First}^+}(w) = \max (w_0,0), \]
where \(w=w_0w_1\dots \) is a sequence of weights and \(k_\text {neg}= \min \{k \mid w_k <0\} \in {\mathbb N^{\infty }}\) is the first index of a negative weight. For technical convenience, we will also consider games in which weights are potentially (positively) infinite. We extend the definitions of \({\textrm{En}},{\textrm{En}^+}\) and \({\textrm{First}^+}\) to words in \(({\mathbb Z^{\infty }})^\omega \), using the same formulas. Note that for any \(w \in ({\mathbb Z^{\infty }})^\omega \) we have \({\textrm{En}^+}(w)\le {\textrm{En}}(w)\). The four valuations are illustrated on a given sequence of weights in Figure 2.
Fig. 2.
The three valuations over a given sequence of weights. The mean-payoff value is given by the slope of the line, which corresponds to the long-term average. In this case, the mean-payoff is \(\le 0\), and both \({\textrm{En}}\) and \({\textrm{En}^+}\) are finite.
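To make the valuations concrete, here is a small sketch (the helper names are ours) evaluating \({\textrm{En}}\), \({\textrm{En}^+}\) and \({\textrm{First}^+}\) on a finite sequence of weights, following the definitions above; for an infinite play these are suprema over all prefixes.

```python
from itertools import accumulate

def energy(weights):
    """En: supremum of all prefix sums (the empty prefix contributes 0)."""
    return max([0] + list(accumulate(weights)))

def positive_energy(weights):
    """En+: supremum of prefix sums up to the first negative weight."""
    first_neg = next((i for i, w in enumerate(weights) if w < 0), len(weights))
    return energy(weights[:first_neg])

def first_if_positive(weights):
    """First+: the first weight if it is positive, and 0 otherwise."""
    return max(weights[0], 0) if weights else 0

# On 3, -1, 4, -2, ... : En = 6 (after the third weight), En+ = First+ = 3.
seq = [3, -1, 4, -2, -2, -2]
assert energy(seq) == 6 and positive_energy(seq) == 3 and first_if_positive(seq) == 3
```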
A (positional) strategy for Min is a map \(\sigma : V_{\textrm{Min}}\rightarrow E\) such that for all \(v \in V_{\textrm{Min}}\), \(\sigma (v)\) is an edge outgoing from v. We say that a (finite or infinite) path \(\pi = e_0e_1 \dots \) is consistent with \(\sigma \) if whenever \(e_i=v_iv_{i+1}\) is defined and \(v_i \in V_{\textrm{Min}}\), it holds that \(e_i = \sigma (v_i)\); we write \(\pi {\models }\sigma \) in this case. Strategies for Max are defined similarly and written \(\tau : V_{\textrm{Max}}\rightarrow E\). The theorem below states that the three valuations \({\textrm{MP}}\), \({\textrm{En}}\) and \({\textrm{En}^+}\) are determined with positional strategies. It is well known for \({\textrm{MP}}\) and \({\textrm{En}}\) and easy to prove for \({\textrm{En}^+}\). We remark that positional determinacy also holds for the two energy valuations \({\textrm{En}}\) and \({\textrm{En}^+}\) over games where we allow for infinite weights.
Theorem 1
([5, 12]). For each \(\textrm{val}\in \{{\textrm{MP}},{\textrm{En}},{\textrm{En}^+}\}\), there exist strategies \(\sigma _0\) for Min and \(\tau _0\) for Max such that for all \(v \in V\) we have
\[ \sup _{\pi \models \sigma _0} \textrm{val}(w(\pi )) \;=\; \inf _{\sigma }\, \sup _{\pi \models \sigma } \textrm{val}(w(\pi )) \;=\; \sup _{\tau }\, \inf _{\pi \models \tau } \textrm{val}(w(\pi )) \;=\; \inf _{\pi \models \tau _0} \textrm{val}(w(\pi )), \]
where \(\sigma , \tau \) and \(\pi \) respectively range over strategies for Min, strategies for Max, and infinite paths from v.
The quantity defined by the equilibrium above is called the value of v in the \(\textrm{val}\) game, and we denote it by \(\textrm{val}_\mathcal {G}(v)\); the strategies \(\sigma _0\) and \(\tau _0\) are called \(\textrm{val}\)-optimal; note that they do not depend on v. The two main algorithmic problems we are interested in are (i) computing the value \({\textrm{En}}_\mathcal {G}(v)\) of a given vertex v in a game, and (ii) deciding whether \({\textrm{MP}}_\mathcal {G}(v) \le 0\) (the threshold problem). The following result relates the values in the mean-payoff and energy games; this direct consequence of Theorem 1 was first stated in [6]:
\[ {\textrm{En}}_\mathcal {G}(v) < \infty \quad \text {if and only if} \quad {\textrm{MP}}_\mathcal {G}(v) \le 0. \]
Therefore computing the \({\textrm{En}}\)-values of a game is at least as hard as the mean-payoff threshold problem. It is easy to deduce \({\textrm{En}}\)-optimal strategies for Min from the knowledge of the \({\textrm{En}}\)-values: we select Min-edges that minimise the sum of the edge's weight and the energy of the destination. However, no knowledge is gained about Max strategies beyond the winning region (over which the \({\textrm{En}}\)-values are \(\infty \)). As explained in the introduction, all state-of-the-art algorithms for the threshold problem actually compute the \({\textrm{En}}\)-values. This shifts our focus from mean-payoff to energy games.
Attractors. Given a subset \(S \subseteq V\), the attractor to S in \(\mathcal {G}\) is defined to be the set of vertices v such that Max can ensure to reach S from v.
Simple games. A finite path \(v_0 \rightarrow v_1 \rightarrow \dots \rightarrow v_k\) is simple if there is no repetition in \(v_0,v_1,\dots ,v_{k-1}\); note that a cycle may be simple. A game is simple if all simple cycles have nonzero sum. The following result is folklore and states that one may reduce to a simple game at the cost of a linear blow-up on W. It holds thanks to the fact that positive mean-payoff values are \(\ge 1/n\) (with \(n=|V|\)), which is a well-known consequence of Theorem 1.
Lemma 3
Let \(\mathcal {G}=(G,w,V_{\textrm{Min}},V_{\textrm{Max}})\) be an arbitrary game. The game \(\mathcal {G}'=(G,w',V_{\textrm{Min}},V_{\textrm{Max}})\), with \(w' = (n+1)w-1\), is simple and has the same vertices with positive mean-payoff value as \(\mathcal {G}\).
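For illustration, the rescaling of Lemma 3 is a one-liner; in the sketch below, `weights` is a hypothetical map from edges to integers, not notation from this chapter.

```python
def make_simple(weights, n):
    """Rescale edge weights by w' = (n+1)*w - 1 (Lemma 3), with n = |V|."""
    return {edge: (n + 1) * w - 1 for edge, w in weights.items()}

# Example with n = 6: a cycle of weights 1 and -1 (sum 0) becomes a cycle of
# weights 6 and -8 (sum -2), so no simple cycle can keep a zero sum.
print(make_simple({("u", "v"): 1, ("v", "u"): -1}, 6))
```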
3 Fast value iteration: A meta-algorithm based on potential reductions
3.1 Potential reductions
Fix a game \(\mathcal {G}= (G=(V,E),w,V_{\textrm{Min}},V_{\textrm{Max}})\). A potential is a map \(\varphi :V \rightarrow {\mathbb N^{\infty }}\). Potentials are partially ordered coordinatewise. We write \(\varphi = 0\) if \(\varphi (v)=0\) for all \(v\in V\). Given an edge \(vv' \in E\), we define its \(\varphi \)-modified weight to be
\[ w_\varphi (vv') = w(vv') - \varphi (v) + \varphi (v'), \]
with the convention that \(w_\varphi (vv') = \infty \) whenever \(\varphi (v) = \infty \) or \(\varphi (v') = \infty \). The potential reduction of \(\mathcal {G}\) by \(\varphi \), denoted \(\mathcal {G}_\varphi \), is simply the game \((G,w_\varphi ,V_{\textrm{Min}},V_{\textrm{Max}})\); informally, all weights are replaced by the modified weights. Note that the underlying graph does not change; in particular, paths in \(\mathcal {G}\) and \(\mathcal {G}_\varphi \) are the same. Moving from \(\mathcal {G}\) to \(\mathcal {G}_\varphi \) for a given potential \(\varphi \) is called a potential reduction.
Weights of cycles are preserved by finite potential reductions, and therefore, as an easy consequence of positionality (Theorem 1), mean-payoff values are preserved. Note that any edge outgoing from a vertex v with potential \(\varphi (v) = \infty \) has weight \(\infty \) in the modified game, therefore v has \({\textrm{En}}\) and \({\textrm{En}^+}\)-values \(\infty \) in \(G_\varphi \). Note also that sequential applications of potential reductions correspond to reducing with respect to the sum of the potentials: \((\mathcal {G}_\varphi )_{\varphi '} = \mathcal {G}_{\varphi +\varphi '}\).
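As a small illustration, the following sketch applies a potential reduction and checks that the weight of a cycle is unchanged. The game encoding (a dict mapping each vertex to its owner and its outgoing (weight, target) edges) and the helper names are ours, and the modified-weight formula is the one reconstructed above.

```python
def reduce_by_potential(game, phi):
    """Replace each weight w(v,v') by w(v,v') - phi(v) + phi(v') (finite potentials)."""
    return {
        v: (owner, [(w - phi[v] + phi[t], t) for (w, t) in edges])
        for v, (owner, edges) in game.items()
    }

# A two-vertex game with a cycle u -> v -> u of weight 5 + (-3) = 2.
game = {"u": ("Min", [(5, "v")]), "v": ("Max", [(-3, "u")])}
reduced = reduce_by_potential(game, {"u": 2, "v": 0})

# The cycle weight is preserved: (5 - 2 + 0) + (-3 - 0 + 2) = 2.
assert (reduced["u"][1][0][0] + reduced["v"][1][0][0]
        == game["u"][1][0][0] + game["v"][1][0][0])
```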
Potential reductions were introduced by Gallai [17] for studying network-related problems such as shortest-paths problems. In the context of mean-payoff or energy games, they were introduced in [18] and later sometimes rediscovered.
3.2 The fast value iteration meta-algorithm
A potential assigner is a function \(\varPsi \) that assigns a potential \(\varPsi (\mathcal {G}):V \rightarrow {\mathbb N^{\infty }}\) to each game \(\mathcal {G}\). A potential assigner \(\varPsi \) induces a fast value iteration algorithm (called \(\varPsi \)-FVI) as follows: successively apply potential reductions using the potentials given by \(\varPsi \), until a game \(\mathcal {G}'\) is reached with \(\varPsi (\mathcal {G}')(V) \subseteq \{ 0, \infty \}\). For an arbitrary potential assigner, this algorithm might not terminate, or might provide a final game \(\mathcal {G}'\) carrying irrelevant information. However, we show that under mild hypotheses on \(\varPsi \), this algorithm terminates and \({\textrm{En}}_{\mathcal {G}'} = \varPsi (\mathcal {G}')\), with the vertices of \({\textrm{En}}\)-value 0 corresponding to the vertices with finite value in the original game. Moreover, the exact \({\textrm{En}}\)-values of the original game can be recovered from the sequence of potentials obtained during the computation.
We formalise this idea in Algorithm 1 and Theorem 4. To ensure termination, we need to artificially increase the potential of some vertices to \(\infty \) whenever a threshold is reached, and then remove Max’s attractor to \(\infty \). This technique is standard in value iteration algorithms, see e.g. [6].
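The sketch below is a minimal rendering of this meta-algorithm under the ad-hoc encoding used in the earlier sketch (a dict from vertex to owner and outgoing (weight, target) edges). The handling of infinite values via a cap at nW and an attractor removal follows the description above, but details such as the exact threshold are assumptions of this sketch rather than a faithful transcription of Algorithm 1.

```python
INF = float("inf")

def max_attractor(game, targets):
    """Vertices from which Max can force reaching `targets` (standard attractor)."""
    attr, changed = set(targets), True
    while changed:
        changed = False
        for v, (owner, edges) in game.items():
            if v in attr:
                continue
            succ_in = [t in attr for (_, t) in edges]
            if (owner == "Max" and any(succ_in)) or (owner == "Min" and all(succ_in)):
                attr.add(v)
                changed = True
    return attr

def fast_value_iteration(game, assigner):
    """Meta-algorithm sketch: iterate potential reductions given by `assigner`.

    `assigner(game)` must return a sound and complete potential (vertex -> int or INF).
    Returns a map vertex -> energy value (INF for vertices of infinite energy).
    """
    n = len(game)
    W = max((abs(w) for (_, edges) in game.values() for (w, _) in edges), default=0)
    total = {v: 0 for v in game}          # cumulative potential (the Phi of Algorithm 1)
    while game:
        phi = assigner(game)
        if all(phi[v] == 0 for v in game):
            break                          # fixpoint: remaining vertices keep value total[v]
        infinite = set()
        for v in game:                     # accumulate, declare infinite above the n*W bound
            total[v] = total[v] + phi[v]
            if total[v] > n * W:
                total[v] = INF
                infinite.add(v)
        infinite = max_attractor(game, infinite)
        for v in infinite:
            total[v] = INF
        # Apply the potential reduction and drop the vertices already decided infinite.
        game = {
            v: (owner, [(w - phi[v] + phi[t], t) for (w, t) in edges if t not in infinite])
            for v, (owner, edges) in game.items() if v not in infinite
        }
    return total
```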
Let us isolate two relevant properties of potential assigners: (1) soundness: for any game \(\mathcal {G}\), \(\varPsi (\mathcal {G}) \le {\textrm{En}_{\mathcal {G}}}\); (2) completeness: for any \(\mathcal {G}\), if \(\varPsi (\mathcal {G}) = 0\) then \({\textrm{En}_{\mathcal {G}}}=0\). We also say that a potential \(\varphi \) is sound over a given game if condition (1) is met. We may now state our first main result.
Theorem 4
Let \(\varPsi \) be a sound and complete potential assigner. Then Algorithm 1 terminates in at most \(n^2W\) iterations, and returns \(\varPhi = {\textrm{En}_{\mathcal {G}}}\).
Remark 5
Note that the hypotheses of the theorem are minimal. If a potential assigner \(\varPsi \) is not sound, there is a game \(\mathcal {G}\) for which the algorithm returns \(\varPhi \ge \varPsi (\mathcal {G}) > {\textrm{En}_{\mathcal {G}}}\). If it is not complete, there is a game for which the algorithm stops in the first iteration, returning the potential \(\varPhi = 0 \ne {\textrm{En}_{\mathcal {G}}}\).
Example 6
(Simple value iteration of Brim et al. [6]). Define the potential assigner \({\varPsi _{\textrm{First}^+}}\) by assigning the potential \({\textrm{First}^+_{\mathcal {G}}}(v)\), the first-if-positive value, to each vertex v. This potential is easily computed in linear time, as it coincides, for each Max (resp. Min) vertex v, with the maximal (resp. minimal) value of \(\max (w,0)\) where w ranges over the weights of the edges outgoing from v. Clearly \({\varPsi _{\textrm{First}^+}}\le {\textrm{En}_{\mathcal {G}}}\), since for any sequence of weights \(w_0w_1\dots \), it holds that \({\textrm{First}^+}(w_0w_1\dots ) \le {\textrm{En}}(w_0w_1\dots )\). Finally, if \({\varPsi _{\textrm{First}^+}}=0\), then from any vertex Min can ensure that no positive weight is ever seen, which entails \({\textrm{En}_{\mathcal {G}}}=0\). We conclude that \({\varPsi _{\textrm{First}^+}}\) is sound and complete; the fast value iteration algorithm coincides with that of [6].2
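Concretely, this assigner is a single pass over the edges; plugged into the `fast_value_iteration` sketch above it gives a rendering of SVI (the encoding and names are the same assumptions as in the previous sketches).

```python
def first_if_positive_assigner(game):
    """Potential of each vertex: max (Max vertices) or min (Min vertices)
    over its outgoing edges of max(w, 0); the game is assumed sinkless."""
    phi = {}
    for v, (owner, edges) in game.items():
        clipped = [max(w, 0) for (w, _) in edges]
        phi[v] = max(clipped) if owner == "Max" else min(clipped)
    return phi

# A Max vertex with weights -2 and 3 gets potential 3; a Min vertex with 1 and 4 gets 1.
game = {"a": ("Max", [(-2, "b"), (3, "b")]), "b": ("Min", [(1, "a"), (4, "a")])}
assert first_if_positive_assigner(game) == {"a": 3, "b": 1}
```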
Example 7
Any (determined) valuation \(\textrm{val}:\mathbb Z^\omega \rightarrow \mathbb R^{\infty }\) induces a potential assigner \({\varPsi _{\textrm{val}}}\), namely, the one that assigns to each game \(\mathcal {G}\) the potential given by \(\textrm{val}_\mathcal {G}(v)\). If the valuation satisfies \(\textrm{val}\le {\textrm{En}}\) over weight sequences, then \({\varPsi _{\textrm{val}}}\) is sound. Moreover, if \(\textrm{val}(w_0w_1\dots ) >0\) whenever \(w_0>0\), then \({\varPsi _{\textrm{val}}}\) is complete. This includes the previous example, and, more interestingly, the valuation \({\textrm{En}^+}\), which is the subject of Section 4.1.
Of course, an important requirement on \(\varPsi \) for Algorithm 1 to be relevant is that \(\varPsi (\mathcal {G})\) should be efficiently computable. Note that the potential assigner corresponding to the \({\textrm{En}}\)-values of a game satisfies all the required hypotheses, and makes Algorithm 1 terminate in a single iteration.
\(\infty \)-attraction. In many cases, the algorithm can be simplified by removing lines 6-8 and stopping when a fixpoint is reached (which can be implemented by replacing line 9 with “while \(\mathcal {G}_{\varPsi (\mathcal {G})} \ne \mathcal {G}\)”). We say that potential assigners with this property are \(\infty \)-attracting.
Modularity. The fast value iteration framework is especially well suited for a modular approach, allowing games to be solved piecewise, as we show next. A subgame is a pair \((\mathcal {G}',\mathcal {G})\), with \(\mathcal {G}'\subseteq \mathcal {G}\). Let \(\mathcal {S}\) be a class of subgames. We say that a potential assigner \(\varPsi \) is \(\mathcal {S}\)-sound if for all subgames \((\mathcal {G}',\mathcal {G})\in \mathcal {S}\) it holds that \(\varPsi (\mathcal {G}') \le {\textrm{En}_{\mathcal {G}}}|_{\mathcal {G}'}\), that is, the potential is sound when applied only to this part of the game. (We note, for instance, that any sound potential assigner is \(\mathcal {S}_{\textrm{Trap}}\)-sound for the class \(\mathcal {S}_{\textrm{Trap}}\) of subgames \((\mathcal {G}',\mathcal {G})\) such that \(\mathcal {G}'\) is a \(\textrm{Min}\)-trap.) Therefore, if \(\varPsi \) is \(\mathcal {S}\)-sound, we can solve subgames in \(\mathcal {S}\) partially, and apply the corresponding potential reduction in the whole game, progressing towards a computation of the \({\textrm{En}}\)-values.
The rest of the section is devoted to a proof of Theorem 4.
Termination. Termination of Algorithm 1 is ensured thanks to lines 6 and 7: the function \(\varPhi \) strictly increases in each non-terminating iteration, and it only takes values in \([0,nW]\cup \{\infty \}\), hence the bound \(n^2W\).
Correctness. We now state the key technical theorem enabling our framework. It describes the effect of sound potential reductions over energy values, allowing to combine them. From it, we easily derive compositionality of sound potentials.
Theorem 8
(Update of energy values). If \(\varphi \) is sound then \({\textrm{En}_{\mathcal {G}}}= \varphi + {\textrm{En}}_{\mathcal {G}_\varphi }\).
Corollary 9
(Compositionality). If \(\varphi \) is sound for \(\mathcal {G}\) and \(\varphi '\) is sound for \(\mathcal {G}_\varphi \) then \(\varphi + \varphi '\) is sound for \(\mathcal {G}\).
Proof
As \(\varphi '\) is sound for \(\mathcal {G}_\varphi \), we have that \(\varphi ' \le {\textrm{En}}_{\mathcal {G}_\varphi }\). Adding \(\varphi \) on both sides, we get \(\varphi + \varphi ' \le \varphi + {\textrm{En}}_{\mathcal {G}_\varphi }\). By Theorem 8, the right hand-side is equal to \({\textrm{En}_{\mathcal {G}}}\), as desired. \(\square \)
Proof (informal, of Theorem 4). Let \(\mathcal {G}_i\), \(\varphi _i=\varPsi (\mathcal {G}_i)\) and \(\varPhi _i=\varphi _1 + \dots + \varphi _{i-1}\) denote the game, potential and cumulative sum at the i-th iteration of the algorithm. Since \(\varPsi \) is sound, \(\varphi _i\) is sound for \(\mathcal {G}_i\) for all i. It thus follows by an easy induction using compositionality that \(\varPhi _i\) is sound for \(\mathcal {G}\) for all i. In particular, for the maximal i, Theorem 8 gives \(\varPhi _i + {\textrm{En}}_{\mathcal {G}_i} = {\textrm{En}_{\mathcal {G}}}\); moreover, since \(\varphi _i=0\), completeness gives \({\textrm{En}}_{\mathcal {G}_i}=0\), which concludes the proof. \(\square \)
3.3 Asymmetry and alternating fast value iteration
Fast value iteration is based on successive underapproximations of the energy valuation \({\textrm{En}}\), which is inherently asymmetric. However, the initial problem (solving mean-payoff games) is itself symmetric, which calls for the design of more symmetrical solutions, a recurring theme in the literature [20, 21, 33, 35].
Dual algorithm computing \(\textrm{Max}\)-values. Let \(\overline{\mathcal {G}}\) be the game obtained by swapping \(V_{\textrm{Min}}\) and \(V_{\textrm{Max}}\) and relabelling the weights by \(\overline{w} = -w\). The two games are essentially equivalent; for instance, the mean-payoff values in \(\overline{\mathcal {G}}\) and \(\mathcal {G}\) are opposite. However, asymmetric algorithms such as value iterations behave differently over each game; this is useful, for instance, if one wants to compute Max strategies in \(\mathcal {G}\), which are output by running value iterations in the dual. But this still does not provide a symmetric solution.
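The dualisation itself is a one-liner in the ad-hoc encoding of the earlier sketches: swap the two players and negate every weight; running a value iteration on the resulting game plays the role of the dual algorithm.

```python
def dualize(game):
    """Dual game: swap Min and Max and negate every weight.

    Mean-payoff values of the dual are the opposites of those of the original game."""
    swap = {"Min": "Max", "Max": "Min"}
    return {
        v: (swap[owner], [(-w, t) for (w, t) in edges])
        for v, (owner, edges) in game.items()
    }
```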
Alternating fast value iteration. We now consider alternating versions of the algorithm, by working with potentials in \(\varphi : V \rightarrow \mathbb Z\cup \{\pm \infty \}\). The algorithm applies potential reductions corresponding to \(\varPsi \) and its dualized version \(\overline{\varPsi }\) on the same game in an alternating fashion, until all vertices are sent to \(+\infty \) or \(-\infty \). Naturally, when a vertex is set to \(+\infty \) or \(-\infty \), the adequate attractor is computed and removed from the game.
Assuming the potential assigner \(\varPsi \) is sound, since sound potential reductions do not alter winning regions, the algorithm is correct and Min’s winning region is the preimage of \(-\infty \) by the final potential. Termination, however, is not easily guaranteed. Interestingly, we observe experimentally that, for some potential assigners, this alternating algorithm always terminates, and it is even remarkably fast (see Section 5). We leave as an interesting open problem to determine for which potential assigners (if any) this algorithm terminates (see conclusion).
4 Instances of fast value iteration and theoretical comparisons
We have already shown (Example 6) how SVI instantiates in our framework. In this section, we introduce further potential assigners to capture known efficient algorithms for energy games, and prove their soundness. This provides a streamlined and unified presentation of (versions of) the algorithms OSI [32] and QDPM [2] (Section 4.1), namely the positive path iteration algorithm (PPI). We also propose a dynamic variant DPPI, corresponding to a potential assigner generating potentials with provably larger values. At the end of the section we also discuss the GKK algorithm [18], and then provide formal comparisons between the four algorithms stated in our framework.
In all cases, we find that the algorithms are easier to explain over simple games, which we will assume without loss of generality (see Lemma 3); note also that simplicity is preserved by potential reductions.
4.1 The positive path iteration algorithm
We now study the fast value iteration algorithm corresponding to the potential assigner \({\varPsi _{\textrm{En}^+}}\). We call it the positive path iteration algorithm (PPI). It is immediate to check that the potential assigner \({\varPsi _{\textrm{En}^+}}\) is sound and complete (see Example 7), so Theorem 4 applies, directly giving correctness of PPI. Moreover, we can in this case simplify the algorithm by removing lines 6-7 in Algorithm 1, because \({\varPsi _{\textrm{En}^+}}\) is \(\infty \)-attracting.
Proposition 10
The potential assigner \({\varPsi _{\textrm{En}^+}}\) is \(\infty \)-attracting.
We let \(N_\mathcal {G}\) denote the set of vertices from which Min can ensure to immediately see a negative weight: \(v \in V_{\textrm{Max}}\) (resp. \(V_{\textrm{Min}}\)) belongs to \(N_\mathcal {G}\) if and only if all outgoing edges (resp. some outgoing edge) have weight \(<0\). Note that computing \({\textrm{En}^+}\)-values in \(\mathcal {G}\) corresponds to solving a variant of the energy game which stops whenever \(N_\mathcal {G}\) is reached. It turns out that this problem is (efficiently) tractable, thanks to two-player game extensions of Dijkstra's algorithm. In fact, two seemingly distinct algorithms are known, corresponding to OSI [32] and QDPM [2]. Remarkably, Khachiyan, Gurvich and Zhao [23] solved the same problem3 earlier and in a different context (with an algorithm similar to Schewe's).
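Computing \(N_\mathcal {G}\) is a single pass over the edges; a sketch with the same ad-hoc encoding as before:

```python
def immediate_negative(game):
    """N: vertices where Min forces a negative weight on the very first move."""
    N = set()
    for v, (owner, edges) in game.items():
        negative = [w for (w, _) in edges if w < 0]
        if (owner == "Max" and len(negative) == len(edges)) or \
           (owner == "Min" and negative):
            N.add(v)
    return N
```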
Two algorithms for computing \({\textrm{En}^+}\). We now describe the two algorithms, respectively extracted from [32] and [2]. We first introduce some notation. For a subset \(F\subseteq V\), a vertex \(v\in F\) and an edge \(vv'\), we define \({\textsf{esc}_F}(vv') = w(vv')\) if \(v'\notin F\), and \({\textsf{esc}_F}(vv') = \infty \) if \(v'\in F\). We define the escape value of a vertex \(v \in F\) as
\[ {\textsf{esc}_F}(v) = \begin{cases} \min _{vv' \in E} {\textsf{esc}_F}(vv') & \text {if } v\in V_{\textrm{Min}},\\ \max _{vv' \in E,\, w(vv')\ge 0} {\textsf{esc}_F}(vv') & \text {if } v\in V_{\textrm{Max}}. \end{cases} \]
We will only consider subsets \(F \subseteq {{N_{\mathcal {G}}}}^{\textsf{c}}\), so Max vertices have a non-negative outgoing edge and Min vertices have only non-negative outgoing edges; in particular \({\textsf{esc}_F}(v) \ge 0\). The escape value can be seen as the minimal weight that \(\textrm{Min}\) can force to see while leaving F immediately from v, or \(\infty \) if she cannot force to leave F in one step, assuming Max is constrained to playing non-negative edges. We further consider the set of vertices of F with finite \({\textsf{esc}_F}\)-value, as well as its intersections with \(V_{\textrm{Max}}\) and \(V_{\textrm{Min}}\). Last, for \(v\in V\), the notation \(\varphi _v(v) \leftarrow x\) indicates that \(\varphi _v\) is the potential defined by \(\varphi _v(v) = x\) and \(\varphi _v(v') = 0\) for \(v'\ne v\).
Theorem 11
(Adapted from [2, 23, 32]). Algorithms 2 and 3 both compute \({\textrm{En}^+_{\mathcal {G}}}\), and both can be implemented to run in \(O(m+n\log n)\) operations.
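For concreteness, the following is our own Dijkstra-style sketch of an \({\textrm{En}^+}\) computation on simple games (Lemma 3), using the ad-hoc encoding of the earlier sketches. It follows the idea described above, growing the attractor to \(N_{\mathcal {G}}\) backwards along non-negative edges, finalising Min vertices greedily and Max vertices once all their non-negative successors are finalised; it is not meant to reproduce Algorithm 2 or 3 line by line, nor their \(O(m+n\log n)\) bound.

```python
import heapq

def en_plus(game):
    """Sketch: En+ values of a simple game by a two-player Dijkstra from N."""
    INF = float("inf")
    # N: Min forces a negative first weight; these vertices have En+ value 0.
    N = {v for v, (o, es) in game.items()
         if (o == "Max" and all(w < 0 for w, _ in es))
         or (o == "Min" and any(w < 0 for w, _ in es))}

    preds = {v: [] for v in game}        # reverse non-negative edges
    pending = {}                         # Max: non-negative edges not yet finalised
    for u, (owner, edges) in game.items():
        nonneg = [(w, t) for (w, t) in edges if w >= 0]
        if owner == "Max":
            pending[u] = len(nonneg)
        for w, t in nonneg:
            preds[t].append((u, w))

    value, best = {}, {v: 0 for v in N}
    heap = [(0, v) for v in N]
    heapq.heapify(heap)
    while heap:
        val, v = heapq.heappop(heap)
        if v in value:
            continue                     # stale heap entry
        value[v] = val
        for u, w in preds[v]:            # propagate backwards along non-negative edges
            if u in value or u in N:
                continue
            if game[u][0] == "Min":      # Min: keep the cheapest known escape
                if w + val < best.get(u, INF):
                    best[u] = w + val
                    heapq.heappush(heap, (best[u], u))
            else:                        # Max: wait for all non-negative successors
                best[u] = max(best.get(u, 0), w + val)
                pending[u] -= 1
                if pending[u] == 0:
                    heapq.heappush(heap, (best[u], u))
    # Vertices never finalised lie outside the attractor and have value infinity.
    return {v: value.get(v, INF) for v in game}
```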
4.2 A new fast value iteration algorithm
Drawing inspiration from Algorithms 2 and 3 above, we introduce another potential assigner, \({\varPsi _{\textsf{DPPI}}}\), leading to a fast value iteration algorithm which we call dynamic positive path iteration (DPPI). Note that both algorithms above compute the Min attractor to \({N_{\mathcal {G}}}\) over non-negative edges, which corresponds exactly to the set of vertices with finite \({\textrm{En}^+}\), and obtain the values of \({\textrm{En}^+}\) by backtracking. We will also backtrack over the same attractor and, just as in Algorithm 3, we make potential updates on the fly. The difference lies in the precise way in which we choose the vertices, which enables, in our case, some of the potential updates to cause new edges to become positive; these are then taken into account, sometimes leading to a potential \(>{\textrm{En}^+}\).
Lemma 12
The potential assigner \({\varPsi _{\textsf{DPPI}}}\) is sound and complete for simple games.
We see DPPI as a marginal improvement over PPI, but an improvement nonetheless, showing that the barrier imposed by PPI can be broken, motivating future work. Figure 3 shows a game where DPPI performs fewer iterations than PPI, while Lemma 13 below proves that for any game \(\mathcal {G}\), \(\varPsi _\textsf{DPPI}(\mathcal {G}) \ge \varPsi _{{\textrm{En}^+}}(\mathcal {G})\).
Fig. 3.
A game \(\mathcal {G}\) (all vertices belong to \(\textrm{Max}\)) where DPPI performs a single iteration, as \({\varPsi _{\textsf{DPPI}}}(\mathcal {G})=[v_1 \rightarrow 10;v_2\rightarrow 9;v_N \rightarrow 0]={\textrm{En}}_\mathcal {G}\). In contrast, PPI requires two iterations since \({\textrm{En}^+}=[v_1 \rightarrow 10;v_2\rightarrow 5;v_N \rightarrow 0]\).
4.3 The GKK algorithm
We include a short discussion of the GKK algorithm; a more detailed modern exposition, including state-of-the-art upper bounds and a comparison with the related approach of Dorfman et al. [11], was proposed by Ohlmann [30].
The GKK algorithm is the \({\varPsi _{\textsf{GKK}}}\)-fast value iteration, where \({\varPsi _{\textsf{GKK}}}\) is the potential assigner defined as follows.
Let \(V_{-}\) be the set of vertices from which Min can ensure that a negative edge is seen before the first positive edge. (Note that \(V_{-}\) coincides with \(({\textrm{En}^+_{\mathcal {G}}})^{-1}(0)\).) Likewise, let \(V_{+}\) denote the set of vertices from which Max can ensure seeing a positive edge before a negative one; observe that in a simple game, \(V_{+}\) is the complement of \(V_{-}\). Consider the maximal value \({w_+}\) such that from any vertex of \(V_{+}\) Max can ensure to add up to \({w_+}\) before a negative weight is seen (alternatively, \({w_+}\) is the smallest nonzero value of \({\textrm{En}^+_{\mathcal {G}}}\)); dually for \({w_-}\). Note that, if from any vertex in \(V_{+}\), Max can ensure to remain in \(V_{+}\) while seeing only positive edges, then \({w_+}=\infty \). Clearly \({w_+}\le {\textrm{En}^+}\le {\textrm{En}}\) over \(V_{+}\). We define \({\varPsi _{\textsf{GKK}}}(\mathcal {G})(v)\) to be \(\min ({w_+},-{w_-})\) if \(v \in V_{+}\) and 0 otherwise. Soundness follows from the inequality above, and completeness is easy to prove. Moreover, \({\varPsi _{\textsf{GKK}}}\) is \(\infty \)-attracting.
The potential \({\varPsi _{\textsf{GKK}}}\) has a remarkable symmetric property: the assigned potentials are the same over \(\mathcal {G}\) and over its dual \(\overline{\mathcal {G}}\): \({\varPsi _{\textsf{GKK}}}= \overline{{\varPsi _{\textsf{GKK}}}}\).4 In particular, the algorithm and its alternating version coincide.
4.4 Comparing fast value iteration algorithms
We now propose formal comparisons between the above potential assigners. Intuitively, in order to minimise the number of iterations of a fast value iteration algorithm, we should seek potentials assigning large values to the vertices, so that a “big step” is made at each iteration. In this sense, if \(\varPsi \le \varPsi '\), the \(\varPsi '\)-FVI algorithm is expected to perform better. A priori, the sequences of games produced by the two algorithms diverge, impeding formal comparisons on the number of iterations. However, for monotone potential assigners, we can also compare the number of iterations of the induced FVI algorithms.
Lemma 13
For any game \(\mathcal {G}\), it holds that \({\varPsi _{\textrm{First}^+}}(\mathcal {G}) \le {\varPsi _{\textrm{En}^+}}(\mathcal {G}) \le {\varPsi _{\textsf{DPPI}}}(\mathcal {G})\).
Moreover, there are games making these inequalities strict. The potential assigners \({\varPsi _{\textrm{First}^+}}\) and \({\varPsi _{\textsf{GKK}}}\) are incomparable.
Let \(\varPsi , \varPsi '\) be two potential assigners. We say that \(\varPsi '\) is monotonically larger than \(\varPsi \), noted \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi '\), if for every game \(\mathcal {G}\) and all potentials \(\varphi \le \varphi '\) it holds that
\[ \varphi + \varPsi (\mathcal {G}_\varphi ) \;\le \; \varphi ' + \varPsi '(\mathcal {G}_{\varphi '}). \]
We say that \(\varPsi \) is monotone if \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi \). The next two lemmas are immediate.
Lemma 14
Let \(\varPsi , \varPsi '\) be sound and complete potential assigners, and assume \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi '\). Then over any input game \(\mathcal {G}\), the \(\varPsi '\)-FVI algorithm terminates in no more iterations than the \(\varPsi \)-FVI algorithm.
Lemma 15
Let \(\varPsi , \varPsi _1, \varPsi _2\) be potential assigners. It holds that if \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi _1\) and \(\varPsi _1 \le \varPsi _2\), then \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi _2\). In particular, if \(\varPsi \) is monotone and \(\varPsi \le \varPsi '\), then \(\varPsi \mathrel {{\le _{\textrm{mon}}}}\varPsi '\).
Proposition 16
The potential assigner \({\varPsi _{\textrm{First}^+}}\) is monotone. Therefore, PPI and DPPI terminate in no more iterations than SVI over any input game.
An interesting open question is whether the potential \({\varPsi _{\textsf{GKK}}}\) is monotone.
5 Experimental results
We focus on two distinct game-solving applications: energy game solving, which is the natural target for our algorithms, and parity game solving, which incurs a conversion cost to energy games but allows using established parity game benchmarks and comparison with other parity game solvers.
After explaining the technical aspects of our implementation, and choices of algorithms and benchmarks, we discuss the most remarkable behaviours that can be observed in the experiments.
Experiments were carried out on an Intel® Core™ i7-8700 CPU @ 3.20GHz paired with 16GiB of memory, each test being capped at 60 seconds and 10GiB of memory. Arithmetic operations over multiple-precision integers are carried out using the GNU Multiple Precision Arithmetic library (GMP). All games are available at: https://github.com/michaelcadilhac/game-benchmarks/tree/TACAS25.
Set of algorithms. We compare our implementations of PPI, DPPI and their alternating versions (PPI-alt and DPPI-alt)5 to four other algorithms: QDPM from [2], Zielonka's recursive algorithm (ZLK), Tangle learning (TL) and Recursive tangle learning (RTL) (winner of the latest edition of SYNTCOMP) from [8, 9]. Only one of them (QDPM) can be executed over general energy games; the other three are parity game solvers.
We remark that we do not include comparisons with SVI [6], nor with GKK-DKZ [11, 18], as these algorithms are known to be inefficient in practice [2] and incur frequent timeouts. Also, we have not compared to an independent implementation of OSI, as we have not found one computing winning regions consistent with the rest of the algorithms.
5.1 Parity game solving
We show the results of our experiments on parity games in Figure 4. For our benchmarks we rely on the yearly competition SYNTCOMP24, which has a competition track for parity game solvers, and on the benchmark suite of Keiren [22]. We subdivide the 779 benchmarks into two categories: synthetic games (crafted by researchers, usually with the intent of being hard for certain solving approaches) and organic games (the natural counterpart of the synthetic games). We note that the synthetic games include the “two counter games” examples [10], on which TL and RTL show an exponential behaviour. They also contain the family of examples by Friedmann [16], exponential for OSI. The organic games are essentially the ones provided by Keiren [22]; see therein for their origin.
Fig. 4.
Survival plot for parity games benchmarks, divided in organic and synthetic.
As is usual in this setting, we present the experimental results as a survival plot, which indicates how many tests are solved (x-axis) within a given time limit (y-axis, time per test). In order to solve input parity games with energy game solvers, we first need to convert the parity game into an energy one. This step is rather costly, as the priorities of the parity game suffer an exponential blow-up when converted to weights of an energy game. This cost is included in the runtime of our algorithms as well as in that of QDPM.
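The precise conversion is not detailed in this chapter; as an illustration, one textbook reduction (a sketch of ours, which may differ from the actual implementation in its exact conventions, including which player is identified with Min) turns a priority p into the weight \((-1)^p n^p\), which is where the exponential blow-up of weights comes from.

```python
def parity_to_energy_weights(priorities, n):
    """One textbook reduction: a vertex of priority p gets weight (-1)**p * n**p,
    so that the largest priority seen infinitely often dominates the mean payoff."""
    return {v: (-1) ** p * n ** p for v, p in priorities.items()}

# With n = 4 vertices, priorities 0..3 become weights 1, -4, 16, -64.
print(parity_to_energy_weights({"a": 0, "b": 1, "c": 2, "d": 3}, 4))
```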
5.2 Energy game solving
We show the results of our experiments on energy games in Figure 5. We modified Oink so that it would accept negative weights and implemented a strategy-checker for energy games — this boils down to checking that, in the game restricted to the strategy, Max-winning strongly connected components do not have infinite negative cycles, and symmetrically for Min.
We consider randomly generated bipartite graphs. The restriction to bipartite graphs is justified by the fact that, otherwise, the vast majority of vertices are part of winning cycles controlled by the same player, making the game (and its resolution) much easier. We separate instances into sparse ones (the out-degree of each vertex is 2) and dense ones (the number of edges is \(n^2 / 5\)).
Fig. 5.
Survival plot for energy games benchmarks, divided in sparse and dense.
5.3 Conclusions of the experiments
In light of the experiments above, we derive the following conclusions.
1.
Overall, the fast value iteration framework captures several algorithms (PPI, PPI-alt, QDPM) that perform competitively in standard benchmarks of parity games. Despite being less efficient than leading parity game solvers, they are remarkably robust against hard instances, particularly PPI-alt.
2.
The alternating versions of PPI and DPPI, for which we were unable to prove termination in theory, always terminate in practice. Moreover, over instances coming from parity game benchmarks, they are significantly faster than their asymmetric counterparts.
3.
While DPPI was introduced as a theoretically enhanced version of PPI, there is no significant difference in the running times of these algorithms. In fact, DPPI tends to be slightly slower, due to the increased cost of computing the potential.
4.
Although based on the same algorithmic ideas, QDPM consistently outperforms PPI, by almost an order of magnitude. This difference can be explained by two factors: (1) QDPM uses some smart implementation optimizations [2, Sect. 5], and (2) our implementation of PPI is tailored for (usual) edge-weighted games, whereas QDPM is implemented for vertex-labelled games (in which all weights outgoing from a given vertex are equal).
6 Conclusion and future work
We have presented a general framework to describe algorithms for energy games, capturing and providing simple descriptions and correctness proofs for many of them, including the top performing ones in practice. The fast value iteration framework raises numerous exciting questions; we outline some of them here.
New algorithms. The new framework provides a very easy way to propose new correct algorithms: it suffices to define a potential assigner which is sound, complete, and computable in polynomial time. We have isolated \(\varPsi _{{\textrm{En}^+}}\) as an important potential assigner, implicitly used by the two fastest algorithms solving energy games, and presented the potential assigner \(\varPsi _\textsf{DPPI}\) which, while still being computable in polynomial time, is \(\ge \varPsi _{{\textrm{En}^+}}\) in general.
Question 17
Does there exist a reasonable6 potential assigner which is sound, complete, computable in polynomial time and \(\ge \varPsi _\textsf{DPPI}\)?
Alternating algorithms. Our framework also allows the design of symmetric alternating algorithms, for which we are unable to prove termination using the currently available tools. Our empirical study shows that, in practice, these not only terminate, but are often considerably faster than their asymmetric counterparts.
Question 18
Do alternating fast value iterations terminate over simple games?
We stress the fact that the question is open for all sound and complete potential assigners (except for GKK, for which the alternating algorithm coincides with the normal one, see Section 4.3).
Lower bounds. Friedmann proposed notoriously involved constructions which provide exponentially many iterations for strategy improvement algorithms in [16]. Although these include OSI (see [16, Sect. 4.6.2]), our experiments show that PPI can solve these instances in linear time, and PPI-alt in just 2 iterations. Currently, we lack any family of examples on which PPI takes more than a linear number of iterations, although we expect that it should admit exponential lower bounds.
Question 19
Can one design superpolynomial lower bounds on the number of iterations for PPI? And (more challenging) for its alternating variant?
Randomized initialization. As remarked in Section 3.1, the weights of the cycles of \(\mathcal {G}\) and \({\mathcal {G}_{\varphi }}\) coincide for any finite potential \(\varphi \), so the threshold problem for the mean-payoff objective is equivalent over these games. Therefore, we can initialize a given game with an arbitrary potential \(\varphi \) and solve the “perturbed game”. This directly provides a randomized version of any algorithm: add a random perturbation before execution. This idea is not novel: it was studied empirically by Beffara and Vorobyov [1] for the GKK algorithm, and lower bounds were later derived by Lebedev [24] for the same algorithm.
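A sketch of such a randomized wrapper (the function names and the bound on the random potential are our assumptions; only the threshold answer is preserved, not the exact \({\textrm{En}}\)-values, since a random potential need not be sound):

```python
import random

def randomized_solve(game, solver, max_shift=10):
    """Apply a random finite potential reduction, then run `solver` on the result.

    Cycle weights are unchanged, so the mean-payoff threshold answer carries over;
    `max_shift` is an arbitrary bound on the random potential."""
    phi = {v: random.randint(0, max_shift) for v in game}
    perturbed = {
        v: (owner, [(w - phi[v] + phi[t], t) for (w, t) in edges])
        for v, (owner, edges) in game.items()
    }
    return solver(perturbed)
```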
Question 20
Is the randomized variant of PPI subexponential? More generally, can we design a potential assigner whose associated randomized fast value iteration is subexponential?
Smoothed analysis. An interesting parallel can be drawn with smoothed analysis [34], which considers small perturbations of the input (randomized initialization is different, since we obtain an equivalent input). In fact, it was recently established that there is a strategy improvement algorithm for mean-payoff games that is polynomial in the sense of smoothed analysis [25].
Question 21
Can the algorithm of [25] be recast as a fast value iteration?
Acknowledgments
We thank the authors of [2] for kindly providing their implementation of QDPM and Alexander Kozachinskyi for pointing out to us several important references.
Antonio Casares is supported by the Polish National Science Centre (NCN) grant “Polynomial finite state computation” (2022/46/A/ST6/00072).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
By a small abuse of notation, we allow summing potentials with different domains. If \(\varphi :V \rightarrow {\mathbb N^{\infty }}\) and \(\varphi ':V' \rightarrow {\mathbb N^{\infty }}\) with \(V'\subseteq V\), then \((\varphi + \varphi ')(v) = \varphi (v)\) for all \(v\notin V'\).
In favour of clarity, we omit the DPPI-alt plots, as they perform identically to PPI-alt. This is expected, given the similarity of the plots of PPI and DPPI.