1 Introduction
1.1 Our contribution
-
Determinacy In contrast to SG with multiple objectives, which are in general not determined, we establish determinacy of SG with a lexicographic combination of reachability and safety objectives.
-
Computational complexity For the associated decision problem we establish the following: (a) if the number of objectives is constant, then the decision problem lies in \(\textsf{NP}\cap \textsf{coNP}\), matching the current known bound for SG with a single objective; (b) in general, the decision problem is \(\textsf{PSPACE}\)-hard and can be solved in \(\textsf{NEXPTIME}\cap \textsf{coNEXPTIME}\).
-
Strategy complexity We show that there exist lexicographically optimal strategies that are deterministic and use finite memory. We also show that memory is only needed in order to remember the already satisfied and violated objectives, i.e., the memory is arena-independent [20].
-
Algorithm We present an algorithm that computes the unique lexicographic value and the witness lexicographically optimal strategies via a reduction to the computation of optimal strategies in a sequence of single-objective games.
-
Experimental results We have implemented the algorithm and present experimental results on several case studies.
-
Algorithm We present an algorithm that computes the unique lexicographic value and witness lexicographically optimal strategies for MDP with a tuple of lexicographic Streett objectives in polynomial time.
-
Computational complexity We establish that the associated decision problem is in \(\textsf{P}\). This result relies on the objectives being given as Streett-conditions. If they are in a different form, e.g. LTL formulae, we get the corresponding doubly exponential blow-up of transforming LTL to Streett.
-
Strategy complexity We show that there exist lexicographically optimal strategies. They need either randomization or finite memory, but only in a limited part of the state space, the so-called end components. In the rest of the MDP, memoryless deterministic strategies suffice.
-
Experimental results We have implemented the algorithm and present experimental results on several case studies, comparing our deterministic algorithm to the one from [18], which is based on reinforcement learning. Surprisingly, our algorithm often outperforms the learning-based approach.
1.2 Related work
1.3 Organization of this paper
2 Preliminaries
2.1 Games, strategies, and basic objectives
2.1.1 Stochastic games and Markov decision processes
2.1.2 Strategies
2.1.3 Markov chains and probability measures
2.1.4 Reachability and safety objectives
2.1.5 \(\omega \)-regular objectives and Streett conditions
2.2 Lexicographic objectives
2.2.1 Lex-value of actions and lex-optimal actions
2.2.2 Lex-optimal strategies
3 Lexicographic 2-player stochastic games
3.1 Lexicographic reach-safe SG with absorbing targets
3.1.1 Characterizing optimal strategies
-
\(\Omega _n = \textsf{Safe}\left( S_n\right) \). Consider the MC \(\mathcal {G}^{\sigma ,\tau }\). Since \({\sigma ,\tau }\) are both MD, this MC has the same finite state space S as the game and its transition probability function is defined as \(P^{\sigma ,\tau }(s,s') = P(s,\sigma (s),s')\) if \(s \in S_{\square }\) and \(P^{\sigma ,\tau }(s,s') = P(s,\tau (s),s')\) if \(s \in S_{\lozenge }\). In \(\mathcal {G}^{\sigma ,\tau }\), the safety probabilities \(\mathbb {P}_s(\Omega _n) = \mathbb {P}_s(\textsf{Safe}\left( S_n\right) )\) constitute the greatest fixed point of the operator$$\begin{aligned} \mathcal {S}:[0,1]^S \rightarrow [0,1]^S,\ \mathcal {S}(x)(s) = {\left\{ \begin{array}{ll} 0 &{}\text {if } s \in S_n\\ \sum _{s'} P^{\sigma ,\tau }(s,s') \cdot x(s') &{}\text {else} \end{array}\right. } \end{aligned}$$which is obtained from the operator \(\mathcal {R}\) for reachability using the relation \(\mathbb {P}_s(\textsf{Safe}\left( S_n\right) ) = 1 - \mathbb {P}_s(\textsf{Reach}\left( S_n\right) )\). Just like \(\mathcal {R}\), the operator \(\mathcal {S}\) is monotonic on the complete lattice \([0,1]^S\) and we can apply the well-known theorem of Knaster and Tarski: If we can prove that for all \(s \in S\)$$\begin{aligned} \textbf{v}^{\textsf{lex}}_n(s) \le \mathcal {S}(\textbf{v}^{\textsf{lex}}_n)(s) \end{aligned}$$(6)then this implies \(\textbf{v}^{\textsf{lex}}_n(s) \le (\textsf{gfp}\ \mathcal {S})(s) = \mathbb {P}_s^{\sigma ,\tau }(\Omega _n)\), where \(\textsf{gfp}\ \mathcal {S}\) denotes the greatest fixed point of \(\mathcal {S}\). Together with (5), this yields \(\textbf{v}^{\textsf{lex}}_n(s) = \mathbb {P}_s^{\sigma ,\tau }(\Omega _n)\), so \(\sigma \) is lex-optimal for \(\varvec{\Omega }= (\Omega _1,\ldots ,\Omega _n)\). To prove (6), we let \(s \in S\) and make another case distinction:
-
\(s \in S_n\). In this case, \(\textbf{v}^{\textsf{lex}}_n(s) = 0 \le \mathcal {S}(\textbf{v}^{\textsf{lex}}_n)(s)\) holds trivially.
-
\(s \in S_{\square }\setminus S_n\). Then$$\begin{aligned} \textbf{v}^{\textsf{lex}}(s)&= \max _{a \in \textsf{Act}(s)} \sum _{s'} P(s,a,s') \cdot \textbf{v}^{\textsf{lex}}(s') \quad \quad \quad (\text {by Lemma 1(b)})\\&= \sum _{s'} P(s,\sigma (s),s') \cdot \textbf{v}^{\textsf{lex}}(s') \quad \quad \quad (\sigma \, \text {is locally lex-optimal}) \end{aligned}$$and thus in particular \(\textbf{v}^{\textsf{lex}}_n(s) = \mathcal {S}(\textbf{v}^{\textsf{lex}}_n)(s)\).
-
\(s \in S_{\lozenge }\setminus S_n\). Let \(\textbf{v}^{\textsf{lex}}_{<n}(s)\) be the lex-value with respect to the first \(n-1\) objectives \(\varvec{\Omega }_{<n}\). Since \(\sigma \) is lex-optimal for \(\varvec{\Omega }_{<n}\) and \(\tau \) is a lex-optimal counter-strategy against \(\sigma \), we have that$$\begin{aligned} \textbf{v}^{\textsf{lex}}_{<n}(s)&= \min _{a \in \textsf{Act}(s)} \sum _{s'} P(s,a,s') \cdot \textbf{v}^{\textsf{lex}}_{<n}(s') \quad \quad \quad (\text {by Lemma 1(a)})\\&= \sum _{s'} P(s,\tau (s),s') \cdot \textbf{v}^{\textsf{lex}}_{<n}(s'). \end{aligned}$$Let \(\textsf{Act}_{<n}(s)\) be the lex-optimal actions available in s with respect to \(\varvec{\Omega }_{<n}\). By the previous equation, \(\tau (s) \in \textsf{Act}_{<n}(s)\). Therefore,$$\begin{aligned} \textbf{v}^{\textsf{lex}}_n(s)&= \min _{a \in \textsf{Act}_{<n}(s)} \sum _{s'} P(s,a,s') \cdot \textbf{v}^{\textsf{lex}}_n(s')\\&\le \sum _{s'} P(s,\tau (s),s') \cdot \textbf{v}^{\textsf{lex}}_n(s') = \mathcal {S}(\textbf{v}^{\textsf{lex}}_n)(s). \end{aligned}$$
-
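The greatest fixed point of \(\mathcal {S}\) can be approximated numerically by iterating the operator from the top element of the lattice \([0,1]^S\). The following minimal sketch illustrates this; the Markov chain, state names, and function name are made-up assumptions for illustration, not taken from the paper.

```python
# Value iteration from above for the safety operator S: start at the
# top element of [0,1]^S and iterate until (numerical) convergence
# toward the greatest fixed point.

def safety_gfp(states, P, avoid, tol=1e-10):
    """P[s] maps successors to probabilities; 'avoid' plays the role of S_n."""
    x = {s: 0.0 if s in avoid else 1.0 for s in states}
    while True:
        y = {s: 0.0 if s in avoid
             else sum(p * x[t] for t, p in P[s].items())
             for s in states}
        if max(abs(y[s] - x[s]) for s in states) < tol:
            return y
        x = y

# 'b' is a safe sink, 'bad' is the absorbing unsafe state; from 'a' the
# limit of v = 0.5*v + 0.3 gives safety value 0.6.
P = {'a': {'a': 0.5, 'b': 0.3, 'bad': 0.2},
     'b': {'b': 1.0},
     'bad': {'bad': 1.0}}
vals = safety_gfp(['a', 'b', 'bad'], P, avoid={'bad'})  # vals['a'] ≈ 0.6
```

For finite chains this iteration converges to the greatest fixed point; in general it may only converge in the limit, which is why the sketch uses a numerical tolerance rather than exact termination.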
-
\(\Omega _n = \textsf{Reach}\left( S_n\right) \). This case is similar to, but slightly more involved than, the previous one. As mentioned earlier, in \(\mathcal {G}^{\sigma ,\tau }\) the probabilities \(\mathbb {P}_s(\Omega _n)\) constitute the unique fixed point of the following monotonic operator:$$\begin{aligned} \mathcal {R}:[0,1]^S \rightarrow [0,1]^S,\ \mathcal {R}(x)(s) = {\left\{ \begin{array}{ll} 1 &{}\text {if } s \in S_n\\ 0 &{}\text {if } s \text { cannot reach } S_n \\ \sum _{s'} P(s,s') \cdot x(s') &{}\text {else} \end{array}\right. } \end{aligned}$$where the transition probability function P of the Markov chain \(\mathcal {G}^{\sigma ,\tau }\) is defined as before. As in the other case, we prove that \(\textbf{v}^{\textsf{lex}}_n(s) \le \mathcal {R}(\textbf{v}^{\textsf{lex}}_n)(s)\) for all \(s \in S\), which implies \(\textbf{v}^{\textsf{lex}}_n(s) \le (\textsf{gfp}\ \mathcal {R})(s) = \mathbb {P}_s^{\sigma ,\tau }(\Omega _n)\); notice that the greatest fixed point \(\textsf{gfp}\ \mathcal {R}\) is equal to the unique fixed point of \(\mathcal {R}\). Let \(s \in S\) and again make a case distinction:
-
If \(s \in S_n\), then \(\textbf{v}^{\textsf{lex}}_n(s) = 1 = \mathcal {R}(\textbf{v}^{\textsf{lex}}_n)(s)\).
-
The cases where s can reach \(S_n\) but \(s \notin S_n\) can be shown exactly as in the previous case where \(\Omega _n\) was safety.
-
Now suppose s cannot reach \(S_n\) in \(\mathcal {G}^{\sigma ,\tau }\), i.e., \(\mathbb {P}_s^{\sigma ,\tau }(\textsf{Reach}\left( S_n\right) ) = 0\). In this case we need to show that \(\textbf{v}^{\textsf{lex}}_n(s) = 0\), or equivalently, \(s \in \textsf{Zero}_n\). By condition (3), we have for all \(t \in S\) that$$\begin{aligned} 1&= \mathbb {P}_t^{\sigma ,\tau }(\textsf{Reach}\left( S_n \cup \textsf{Zero}_n \right) ) \\&= \mathbb {P}_t^{\sigma ,\tau }(\textsf{Reach}\left( S_n\right) ) + \mathbb {P}_t^{\sigma ,\tau }(\textsf{Reach}\left( \textsf{Zero}_n\right) )\\&= \mathbb {P}_t^{\sigma ,\tau }(\textsf{Reach}\left( S_n\right) ) + 1 - \mathbb {P}_t^{\sigma ,\tau }(\textsf{Safe}\left( \textsf{Zero}_n\right) ) \end{aligned}$$and thus \(\mathbb {P}_t^{\sigma ,\tau }(\textsf{Reach}\left( S_n\right) ) = \mathbb {P}_t^{\sigma ,\tau }(\textsf{Safe}\left( \textsf{Zero}_n\right) )\). Therefore, \(\sigma \) is also locally lex-optimal for the objective \((\Omega _1,\Omega _2,\ldots ,\textsf{Safe}\left( \textsf{Zero}_n\right) )\). But then we can show exactly as in the previous case that \(\textbf{v}^{\textsf{lex}}_n(t) \le \mathcal {S}(\textbf{v}^{\textsf{lex}}_n)(t)\), where \(\mathcal {S}\) is the fixed point operator for safety probabilities associated with the objective \(\textsf{Safe}\left( \textsf{Zero}_n\right) \). Thus \(\textbf{v}^{\textsf{lex}}_n(s) \le (\textsf{gfp}\ \mathcal {S})(s) = \mathbb {P}_s^{\sigma ,\tau }(\textsf{Safe}\left( \textsf{Zero}_n\right) ) = 0\).
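The role of the "cannot reach \(S_n\)" clause in \(\mathcal {R}\) can be made concrete: fixing those states to 0 is exactly what makes the fixed point unique (without it, e.g. the constant-1 vector would also be a fixed point on states that never reach the target). A minimal sketch, with a made-up example chain and assumed function names:

```python
# Sketch: reachability probabilities as the unique fixed point of R.
# Graph preprocessing fixes states that cannot reach the target to 0,
# ruling out spurious fixed points.

def can_reach(states, P, target):
    """Backward closure: states with positive probability of reaching target."""
    reach = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in reach and any(t in reach for t in P[s]):
                reach.add(s)
                changed = True
    return reach

def reach_fixpoint(states, P, target, tol=1e-10):
    pos = can_reach(states, P, target)
    x = {s: 1.0 if s in target else 0.0 for s in states}
    while True:
        y = {s: 1.0 if s in target
             else 0.0 if s not in pos
             else sum(p * x[t] for t, p in P[s].items())
             for s in states}
        if max(abs(y[s] - x[s]) for s in states) < tol:
            return y
        x = y

# From 'a': v = 0.5*v + 0.25, so the reachability value at 'a' is 0.5.
P = {'a': {'a': 0.5, 'goal': 0.25, 'dead': 0.25},
     'goal': {'goal': 1.0},
     'dead': {'dead': 1.0}}
vals = reach_fixpoint(['a', 'goal', 'dead'], P, target={'goal'})
```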
-
-
If \(s \in \textsf{Sinks}(\mathcal {G})\), then either \(s \in S_i\) for some \(i \in R\), or otherwise s is a sink which is not contained in any of the \(S_i\) with \(i \in R\) and therefore \(s \in \textsf{Zero}_i\) for all \(i \in R\). Thus \(s \in F\) by definition of F and \(v(s) = 1\), a contradiction.
-
Let \(s \notin \textsf{Sinks}(\mathcal {G})\). Let \(\sigma \) be an MD optimal strategy for \(\textsf{Reach}\left( F\right) \) in \(\widetilde{\mathcal {G}}\) and let \(\tau \) be an MD optimal counter-strategy. Notice that such strategies exist because we are only considering a single objective [1]. As usual, we consider the finite MC \(\widetilde{\mathcal {G}}^{\sigma ,\tau }\). Since \(v(s) < 1\), we have \(\mathbb {P}^{\sigma ,\tau }_s(\textsf{Reach}\left( F\right) ) < 1\), which means that there is a BSCC \(B \subseteq S\) in \(\widetilde{\mathcal {G}}^{\sigma ,\tau }\) such that \(B \cap F = \emptyset \) and \(\mathbb {P}^{\sigma ,\tau }_s(\textsf{Reach}\left( B\right) ) > 0\). Let \(t \in B\) be any state in the BSCC. Then clearly \(\mathbb {P}^{\sigma ,\tau }_t(\textsf{Reach}\left( F\right) ) = 0\) and thus \(v(t) = 0\) because \(\sigma \) is optimal for \(\textsf{Reach}\left( F\right) \). But since \(t \notin F\), we have by definition of F that \(\exists i \in R :t \notin \textsf{Zero}_i\), which means that \(\textbf{v}^{\textsf{lex}}_i(t) > 0\). Notice that here \(\textbf{v}^{\textsf{lex}}\) are the lex-values in the original game \(\mathcal {G}\); however, by Lemma 1(b), they coincide with the lex-values in \(\widetilde{\mathcal {G}}\). Thus, since \(\textbf{v}^{\textsf{lex}}_i(t) > 0\), there is a strategy of \(\textsf{Max}\) in \(\widetilde{\mathcal {G}}\) that reaches \(S_i\) with positive probability against all counter-strategies of \(\textsf{Min}\), and hence also reaches F with positive probability because \(S_i \subseteq F\). This contradicts \(v(t) = 0\).
3.1.2 Algorithm for SG with absorbing targets
-
\(\widetilde{\mathcal {G}}\)-invariant For \(i\ge 1\), in the i-th iteration of the loop, \(\widetilde{\mathcal {G}}\) is the original SG restricted to only those actions that are locally lex-optimal for the targets 1 to \((i{-}1)\); this is the case because Line 11 was executed for all previous targets.
-
Single-objective case The single objective solved in Line 5 can be either reachability or safety. We can use any (precise) single-objective solver as a black box, e.g. strategy iteration [1]. By Remark 1, calling a single-objective solver with a QRO poses no problem since there is a trivial reduction.
-
QRO for safety If an objective is of type reachability, no further steps need to be taken; if, on the other hand, it is safety, we need to ensure that the problem explained in Example 4 does not occur. Thus we compute the Final Set \(F_{<i}\) for the i-th target and then construct and solve the QRO as in Lemma 4.
-
Resulting strategy When storing the resulting strategy, we again need to avoid errors induced by the fact that locally lex-optimal actions need not be globally lex-optimal. This is why for a reachability objective, we only update the strategy in states that have a positive value for the current objective; if the value is 0, the current strategy does not have any preference, and we need to keep the old strategy. For safety objectives, we need to update the strategy in two ways: for all states in the Final Set \(F_{<i}\), we set it to the safety strategy \(\widetilde{\sigma }\) (from Line 5) as within \(F_{<i}\) we do not have to consider the previous reachability objectives and therefore must follow an optimal safety strategy. For all states in \(S \setminus F_{<i}\), we set it to the reachability strategy from the QRO \(\sigma _Q\) (from Line 9). This is correct, as \(\sigma _Q\) ensures almost-sure reachability of \(F_{<i}\) which is necessary to satisfy all preceding reachability objectives; moreover \(\sigma _Q\) prefers those states in \(F_{<i}\) that have a higher safety value (cf. Lemma 4).
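The two update rules described above can be distilled into a toy sketch; all names, the state identifiers, and the example strategies below are illustrative assumptions, not the paper's implementation.

```python
# Toy versions of the two strategy-update rules.

def update_reach(sigma_old, sigma_new, v):
    # Adopt the new choice only where the new value is positive; where it
    # is 0, the new strategy has no preference, so keep the old choice.
    return {s: sigma_new[s] if v[s] > 0 else sigma_old[s] for s in sigma_old}

def update_safe(sigma_safe, sigma_qro, F, states):
    # Inside the Final Set F follow the safety strategy; outside of it,
    # follow the QRO strategy that almost-surely reaches F.
    return {s: sigma_safe[s] if s in F else sigma_qro[s] for s in states}

sigma = update_reach({'s': 'a', 't': 'b'}, {'s': 'c', 't': 'd'},
                     {'s': 0.5, 't': 0.0})   # keeps 'b' at the 0-value state
sigma2 = update_safe({'s': 'x', 't': 'y'}, {'s': 'p', 't': 'q'},
                     {'s'}, ['s', 't'])      # safety choice inside F only
```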
-
Case 1 \(\Omega _n = \textsf{Reach}\left( S_n\right) \). By the I.H., \(\widetilde{\mathcal {G}}\) is the correct restriction of \(\mathcal {G}\) to lex-optimal actions for both players with respect to the first \(n-1\) objectives \(\varvec{\Omega }_{<n} = (\Omega _1,\ldots ,\Omega _{n-1})\), and \(\sigma \) is a lex-optimal MD strategy in \(\mathcal {G}\) with respect to \(\varvec{\Omega }_{<n}\). The algorithm computes an MD optimal strategy \(\widetilde{\sigma }\) in \(\widetilde{\mathcal {G}}\) with respect to the single objective \(\textsf{Reach}\left( S_n\right) \), and the single-objective values \(v(s)\) of this objective in \(\widetilde{\mathcal {G}}\), by calling \(\texttt{SolveSingleObj}\) (line 5). The strategy \(\sigma \in \Sigma _{\textsf{Max}}^{\textsf{MD}}\) is then updated as follows:$$\begin{aligned} \sigma (s) \quad = \quad {\left\{ \begin{array}{ll} \widetilde{\sigma }(s) &{}\text {if } \widetilde{v}(s) > 0\\ \sigma _{old}(s) &{}\text {if } \widetilde{v}(s) = 0. \end{array}\right. } \end{aligned}$$We claim that \(\sigma \) is lex-optimal for the whole lex-objective \(\varvec{\Omega }= (\Omega _1,\ldots ,\Omega _{n})\).
-
We first show that \(\sigma \) remains lex-optimal for the first \(n-1\) objectives \(\varvec{\Omega }_{<n}\) by applying the “if”-direction of Lemma 3: First observe that by definition, \(\sigma \) is locally lex-optimal with respect to \(\varvec{\Omega }_{<n}\). Therefore it only remains to show condition (3) in Lemma 3. Let \(i<n\) such that \(\Omega _i = \textsf{Reach}\left( S_i\right) \), let \(s \in S\) and let \(\tau \in \Sigma _{\textsf{Min}}^{\textsf{MD}}\) be a counter-strategy against \(\sigma \). If \(v(s) = 0\), then \(\mathbb {P}_s^{\sigma ,\tau }(\textsf{Reach}\left( S_i \cup \textsf{Zero}_i\right) ) = 1\) because from initial state s, \(\sigma \) behaves like \(\sigma _{old}\), which is lex-optimal for \(\varvec{\Omega }_{<n}\). Thus let \(v(s) > 0\). From the “only if”-direction of Lemma 3 applied to \(\widetilde{\sigma }\), we know that \(\mathbb {P}^{\sigma ,\tau }_s(\textsf{Reach}\left( S_n \cup \{s \in S \mid v(s) = 0\}\right) ) = 1\). Thus for a play \(\pi \) that starts in s and is consistent with \({\sigma ,\tau }\), almost-surely one of the following two cases occurs:
-
\(*\) If \(\pi \) reaches a state \(t \in S_n\), then since t is a sink, we have \(t \in S_i \cup \textsf{Zero}_i\).
-
\(*\) If t with \(v(t) = 0\) is reached in \(\pi \), then since we play according to \(\sigma _{old}\) from t, we either reach \(S_i\) or \(\textsf{Zero}_i\) by the “only if”-direction of Lemma 1 applied to \(\sigma _{old}\).

Thus \(\mathbb {P}_s^{\sigma ,\tau }(\textsf{Reach}\left( S_i \cup \textsf{Zero}_i\right) ) = 1\) and \(\sigma \) remains lex-optimal for \(\varvec{\Omega }_{<n}\).
-
-
To complete the proof that \(\sigma \) is lex-optimal for \(\varvec{\Omega }\), notice that \(v(s) = \textbf{v}^{\textsf{lex}}_n(s)\) for all \(s \in S\) by Lemma 1.
-
-
Case 2 \(\Omega _n = \textsf{Safe}\left( S_n\right) \). Since by the I.H. the values \(\textbf{v}^{\textsf{lex}}_{1},\ldots ,\textbf{v}^{\textsf{lex}}_{n-1}\) are the correct lex-values with respect to \(\varvec{\Omega }_{<n}\), the algorithm computes the final set \(F_{<n} = F_{<n+1} = F\). Next observe that \(v(s) = \textbf{v}^{\textsf{lex}}_n(s)\) for all \(s \in F\) because of the following:
-
First, \(\widetilde{\sigma }\) is locally lex-optimal w.r.t. \(\varvec{\Omega }_{<n}\) because it is defined in the subgame \(\widetilde{\mathcal {G}}\). Therefore by Lemma 1, \(\sigma \) is already (globally) lex-optimal for \(\varvec{\Omega }_{<n}\) from all \(s \in F\) because condition 2 is satisfied trivially. Thus \(v(s) \le \textbf{v}^{\textsf{lex}}_n(s)\).
-
Second, by the same argument, an MD lex-optimal strategy for \(\varvec{\Omega }\) is necessarily locally lex-optimal w.r.t. \(\varvec{\Omega }_{<n}\). The strategy \(\widetilde{\sigma }\) is locally lex-optimal w.r.t. \(\varvec{\Omega }_{<n}\) and moreover optimal for \(\Omega _n\) in the subgame \(\widetilde{\mathcal {G}}\). Thus \({v}(s) \ge \textbf{v}^{\textsf{lex}}_n(s)\).

Notice that it is very well possible that \({v}(s) > \textbf{v}^{\textsf{lex}}_n(s)\) for \(s \notin F\) because the strategy \(\widetilde{\sigma }\) does not necessarily reach F from \(s \notin F\). Applying Lemma 4 concludes the proof: The quantified reachability objective \(q_n\) constructed by the algorithm indeed satisfies \(q_n(s) = {v}(s) = \textbf{v}^{\textsf{lex}}_n(s)\) for all \(s \in F\), as we have just shown. The resulting strategy \(\sigma \) defined by the algorithm is$$\begin{aligned} \sigma (s) \quad = \quad {\left\{ \begin{array}{ll} \widetilde{\sigma }(s) &{}\text {if } s \in F\\ \sigma _{Q}(s) &{}\text {if } s\notin F \end{array}\right. } \end{aligned}$$where \(\sigma _{Q}\) is a lex-optimal strategy for \(\varvec{\Omega }'=(\Omega _1,\ldots ,\Omega _{n-1},\textsf{Reach}\left( q_n\right) )\). Thus with Lemma 4 and the above discussion, \(\sigma \) is lex-optimal from all states.
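The trivial reduction from a quantified reachability objective \(\textsf{Reach}\left( q\right) \) to an ordinary one, mentioned in Remark 1, can be sketched concretely: every state in the support of q branches to a fresh target with probability q(s) and to a fresh sink otherwise. The function below and the state names `GOAL`/`SINK` are illustrative assumptions:

```python
# Sketch of the reduction QRO -> plain reachability: entering a state s
# with quantity q(s) now leads to 'GOAL' with probability q(s) and to
# 'SINK' with the remaining probability.

def quantify(P, q):
    """P: dict state -> dict successor -> prob; q: dict state -> value."""
    newP = {'GOAL': {'GOAL': 1.0}, 'SINK': {'SINK': 1.0}}
    for s, succ in P.items():
        if s in q:  # replace s's behaviour by the quantified branch
            newP[s] = {'GOAL': q[s], 'SINK': 1.0 - q[s]}
        else:
            newP[s] = dict(succ)
    return newP

# From 'a' we reach 'f' with probability 1 and q(f) = 0.7, so the value
# of Reach(q) from 'a' equals the plain reachability of 'GOAL', i.e. 0.7.
P = {'a': {'f': 1.0}, 'f': {'f': 1.0}}
newP = quantify(P, {'f': 0.7})
```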
3.2 General lexicographic reach-safe SG
3.2.1 Reduction to absorbing reach-safe lex-objectives
3.2.2 Algorithm for general SG
-
Reduction to absorbing case We have just seen that once we have the quantitative objective vector \(\textsf{q}\varvec{\Omega }\), we can use the algorithm for absorbing SG (Line 13).
-
Computing the quantitative objective vector To compute \(\textsf{q}\varvec{\Omega }\), the algorithm calls itself recursively on all states in the union of all target sets (Lines 5-7). We annotated this recursive call with “with dynamic programming”, as we can reuse the results of earlier computations. In the worst case, we have to solve all \(2^n - 1\) possible non-empty stages. Finally, given the values \(^{\varvec{\Omega }(s)}\textbf{v}^{\textsf{lex}}\) for all \(s \in \bigcup _{j\le n} S_j\), we can construct the quantitative objective (Lines 10 and 12) that is used for the call to \(\texttt{SolveAbsorbingRS}\).
-
Termination Since there are finitely many objectives in \(\varvec{\Omega }\) and in every recursive call at least one objective is removed from consideration, eventually we have a simple objective that can be solved by \(\texttt{SolveSingleObj}\) (Line 3).
-
Resulting strategy The resulting strategy is composed in Line 14: It adheres to the strategy for the quantitative query \(^{\textsf{q}\varvec{\Omega }}\sigma \) until some \(s \in \bigcup _{j \le n} S_j\) is reached. Then, to achieve the values promised by \(q_i(s)\) for all i with \(s \notin S_i\), it adheres to \(^{\varvec{\Omega }(s)} \sigma \), the optimal strategy for stage \(\varvec{\Omega }(s)\) obtained by the recursive call.
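The stage recursion with memoization can be sketched abstractly: identifying a stage with the set of objective indices still under consideration makes it apparent that at most \(2^n - 1\) non-empty stages are ever solved, each exactly once. The skeleton below is a placeholder that only records the recursion structure, not the actual stage solving; all names are assumptions.

```python
# Skeleton of the stage recursion: a stage is the frozenset of indices
# of objectives still relevant at a state. Memoization (lru_cache)
# guarantees each of the at most 2^n - 1 non-empty stages is solved once.
from functools import lru_cache

calls = []  # records each distinct stage actually solved

@lru_cache(maxsize=None)
def solve_stage(stage):
    """stage: frozenset of objective indices still to be considered."""
    calls.append(stage)
    if len(stage) == 1:
        return stage  # base case: a single objective (Line 3)
    # recurse on every strictly smaller stage a target state may induce
    for i in sorted(stage):
        solve_stage(stage - {i})
    return stage

solve_stage(frozenset({1, 2, 3}))  # n = 3: exactly 2^3 - 1 = 7 stages solved
```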
3.2.3 Determinacy and complexity
4 MDP with lexicographic \(\omega \)-regular objectives
4.1 Lexicographic Streett in end components
4.1.1 Solving single Streett in end components
4.1.2 Algorithm for lexicographic Streett in end components
4.2 Lexicographic Streett in general MDP
4.3 On SG with lexicographic \(\omega \)-regular objectives
5 Experimental evaluation
5.1 Reach-safe lex-objective in SG: experiments
5.1.1 Implementation
5.1.2 Case studies
-
Dice This example is shipped with PRISM-games [42] and models a simple dice game between two players. The number of throws in this game is a configurable parameter, which we instantiate with 10, 20 and 50. The game has three possible outcomes: Player \(\textsf{Max}\) wins, Player \(\textsf{Min}\) wins or draw. A natural lex-objective is thus to maximize the winning probability and then the probability of a draw.
-
Charlton This case study [37] is also included in PRISM-games. It models an autonomous car navigating through a road network. A natural lex-objective is to minimize the probability of an accident (possibly damaging human life) and then maximize the probability to reach the destination.
-
Hallway (HW) This instance is based on the Hallway example which is standard in the AI literature [62, 63]. A robot can move north, east, south or west in a known environment, but each move only succeeds with a certain probability, and otherwise rotates or moves the robot in an undesired direction. We extend the example by a target wandering around based on a mixture of probabilistic and demonic non-deterministic behavior, thereby obtaining a stochastic game that models, for instance, a panicking person in a building on fire. Moreover, we assume a 0.01 probability of damaging the robot when executing certain movements; a damaged robot’s actions succeed with even smaller probability. The primary objective is to save the human, and the secondary objective is to avoid damaging the robot. We use grid-worlds of sizes \(5 {\times } 5\), \(8 {\times } 8\), and \(10 {\times } 10\).
-
Avoid the observer (AV) This case study is inspired by a similar example in [64]. It models a game between an intruder and an observer in a grid-world. The grid can have different sizes as in HW, and we use \(10 {\times } 10\), \(15 {\times } 15\), and \(20 {\times } 20\). The most important objective of the intruder is to avoid the observer, and its secondary objective is to exit the grid. We assume that the observer can only detect the intruder within a certain distance and otherwise makes random moves. At every position, the intruder moreover has the option to stay and search for a treasure. In our example, a treasure is found with probability 0.1 each time the intruder decides to search for it. Collecting a treasure is the tertiary (reachability) objective.
5.1.3 Experimental results
| Model | \(|S|\) | Time [s]: Lex | Time [s]: First | Time [s]: All | Avg. #actions: \(\mathcal {G}\) | Avg. #actions: \(\widetilde{\mathcal {G}}\) | #Stages |
|---|---|---|---|---|---|---|---|
| R – R | | | | | | | |
| Dice[10] | 4,855 | <1 | <1 | <1 | 1.42 | 1.41 | 1/3 |
| Dice[20] | 16,915 | <1 | <1 | <1 | 1.45 | 1.45 | 1/3 |
| Dice[50] | 96,295 | 3 | 2 | 2 | 1.48 | 1.48 | 1/3 |
| S – R | | | | | | | |
| Charlton | 502 | <1 | <1 | <1 | 1.56 | 1.07 | 3/3 |
| R – S | | | | | | | |
| HW[5\(\times \)5] | 25,000 | 10 | 7.15 | 7 | 2.44 | 1.02 | 3/3 |
| HW[8\(\times \)8] | 163,840 | 152 | 117 | 117 | 2.50 | 1.01 | 3/3 |
| HW[10\(\times \)10] | 400,000 | 548 | 435 | 435 | 2.52 | 1.01 | 3/3 |
| S – R – R | | | | | | | |
| AV[10\(\times \)10] | 106,524 | 15 | <1 | 10 | 2.17 | 1.55, 1.36 | 4/7 |
| AV[15\(\times \)15] | 480,464 | 85 | <1 | 50 | 2.14 | 1.52, 1.36 | 4/7 |
| AV[20\(\times \)20] | 1,436,404 | 281 | 3 | 172 | 2.13 | 1.51, 1.37 | 4/7 |
5.2 MDP with lexicographic LTL: experiments
5.2.1 Implementation
5.2.2 Case studies
-
Cleaningrobot A robot cleans a house with two floors. It starts in the upper floor. At some point, it can decide to try to clean the stairs, but with a probability of 0.5 it will fall down and not clean the stairs. After this, it can only clean the ground floor, and never go up to the first floor again. The LTL objectives that we check are (in this order) (1) \(\textbf{GF}\;clean_{first}\), (2) \(\textbf{F}\;clean_{stairs}\), and (3) \(\textbf{GF}\;clean_{ground}\).
-
Gridworld This example models a robot that moves around in a grid world, has to visit several outposts, avoid dangerous zones, and recharge every now and then. It can only move left, right, up and down. The uncertainty comes into play when the robot chooses to go into one direction: with probability 0.5, it will move one step, and with probability 0.5 it will move two steps. A visualization can be found in Fig. 3, together with the objectives that we checked.
-
Virus This example models a computer virus that is trying to breach a system. The system consists of a grid of nodes, 3\(\times \)3 in our case. If a node is infected, it can choose to attack its neighboring nodes. This will be detected with a probability of 0.5. After a node was successfully attacked, it will get infected with a probability of 0.5 as well. This model stems originally from [66]. The objectives to be verified are:

1. \(\textbf{F}s_{3,2} = 2 ~\wedge ~\textbf{G} ((s_{3,2}=2 \wedge s_{3,1}\ne 2) \implies \textbf{XX}s_{3,1} = 2)\), and
2. \(\textbf{F}s_{1,1} = 2~\wedge ~\textbf{G}(s_{2,2}\ne 2 \wedge s_{3,2}\ne 2)\).

The first one makes sure that node (3, 2) will eventually be infected, and once it is infected, node (3, 1) should be infected in at most two steps. The second objective defines a specific path to infect node (1, 1), without infecting nodes (2, 2) and (3, 2).
-
UAV This model considers an unmanned aerial vehicle (UAV) that moves on a network of roads, originally presented in [67]. There are waypoints that should be visited (modelled by the first objective), and restricted operation zones that should be avoided (modelled by the second objective). Formally, the objectives are (1) \(\bigwedge _i \textbf{G}\lnot roz_i\), and (2) \(\textbf{F}w_1\wedge \textbf{F}w_2\wedge \textbf{F}w_6\).
5.2.3 Experimental results
| Model | \(|S|\) | \(|MEC|\) | Order | Result | Time [s]: Ours | Time [s]: [18] |
|---|---|---|---|---|---|---|
| CleaningRobot | 5 | 4 | 1, 2, 3 | 1, 0, 0 | <1 | 2 |
| CleaningRobot | 5 | 4 | 2, 1, 3 | 0.5, 0.5, 0.5 | <1 | 3 |
| CleaningRobot | 5 | 4 | 3, 2, 1 | 1, 0.5, 0 | <1 | 2 |
| Gridworld[\(4{\times }4\)] | 16 | 14 | 1, 2, 3 | 1, 0, 1 | <1 | 13 |
| Gridworld[\(4{\times }4\)] | 16 | 14 | 2, 1, 3 | 1, 0, 1 | <1 | 4 |
| Gridworld[\(5{\times }5\)] | 25 | 2 | 1, 2, 3 | 1, 1, 1 | <1 | 26 |
| Gridworld[\(4{\times }7\)] | 28 | 46 | 1, 2, 3 | 1, 0.75, 0.25 | <1 | 44 |
| Gridworld[\(4{\times }7\)] | 28 | 46 | 2, 1, 3 | 1, 0.5, 0 | <1 | 24 |
| Gridworld[\(10{\times }10\)] | 100 | 2 | 1, 2, 3 | 1, 1, 1 | <1 | 5 |
| Gridworld[\(20{\times }20\)] | 400 | 2 | 1, 2, 3 | 1, 1, 1 | <1 | 7 |
| Gridworld[\(30{\times }30\)] | 900 | 2 | 1, 2, 3 | 1, 1, 1 | <1 | 9 |
| Virus | 809 | 1042 | 1, 2 | 1, 0 | 28 | 87 |
| Virus | 809 | 1042 | 2, 1 | 1, 0.25 | 28 | 405 |
| UAV | 11,448 | 6578 | 1, 2 | 1, 1 | 1367 | 213 |