Open Access 2025 | Original Paper | Book Chapter

Fixed Point Certificates for Reachability and Expected Rewards in MDPs

Authors: Krishnendu Chatterjee, Tim Quatmann, Maximilian Schäffeler, Maximilian Weininger, Tobias Winkler, Daniel Zilken

Published in: Tools and Algorithms for the Construction and Analysis of Systems

Publisher: Springer Nature Switzerland


Abstract

This chapter examines the critical need for certified algorithms in Markov decision processes (MDPs), focusing on the computation of reachability probabilities and expected rewards. It addresses weaknesses of existing MDP model checking tools, highlighting sources of error such as implementation bugs, unsound algorithms, numerical imprecision, and third-party dependencies. The authors propose a new standard for certified MDP model checking by introducing fixed point certificates, which provide concise, easily verifiable proofs of results. These certificates are sound and complete for arbitrary finite MDPs, ensuring rigorous correctness and precision. The chapter formalizes the theory in Isabelle/HOL and generates a formally verified certificate checker implementation. It also presents a practical implementation in the Storm model checker, demonstrating the applicability and extensibility of the proposed approach through extensive experiments on the Quantitative Verification Benchmark Set. The chapter concludes with a discussion of the scalability and efficiency of the certified pipeline, paving the way for future advances in probabilistic model checking.
Notes
This project has received funding from the ERC CoG 863818 (ForM-SMArt), the Austrian Science Fund (FWF) 10.55776/COE12, a KI-Starter grant from the Ministerium für Kultur und Wissenschaft NRW, the DFG RTG 378803395 (ConVeY), the EU’s Horizon 2020 research and innovation programmes under the Marie Sklodowska-Curie grant agreement Nos. 101034413 (IST-BRIDGE) and 101008233 (MISSION), and the DFG RTG 2236 (UnRAVeL). Experiments were performed with computing resources granted by RWTH Aachen University under project rwth1632.

1 Introduction

Markov decision processes (MDPs) [5, 7, 48] are the model for sequential decision making in probabilistic environments. Their many applications [32, 53] frequently require computing reachability probabilities towards an (un-)desired system state, as well as the expected rewards (or costs) accumulated until doing so. MDP model checking amounts to computing (approximations of) these quantities in a mathematically rigorous way, with a formal guarantee of their correctness and precision. Various mature MDP model checking tools such as PRISM  [42], mcsta  [28], and Storm  [33] exist. Figure 1 shows an example MDP.
Who checks the model checker? The possibility of errors in complex, human-engineered formal verification tools is a delicate issue: How formal is a verification result produced by an informal, i.e., unverified implementation? We highlight four sources of errors: (i) classic implementation bugs, (ii) unintentionally unsound algorithms [8, 24], optimizations, and heuristics, (iii) numerical errors due to floating point arithmetic [27], and (iv) errors in third-party back end libraries or tools, e.g., commercial LP solvers [29].
Certifying algorithms [44] are a paradigm for establishing trust in implementations. A certifying algorithm produces a concise, easily verifiable proof—a certificate—of its result. The certificate can be checked independently, possibly even by an external, simpler program amenable to formal verification, or by a third party. Formally verified certificate checkers are already employed in tool competitions on software verification [12] or SAT-solving [9]. Existing proposals for certifying MDPs [22, 34, 35], however, have some drawbacks (detailed further below) hindering wider adoption in the community and its competitions [3, 15, 26].
The goal of this paper is to establish a new standard for certified MDP model checking, with a focus on applicability and extensibility.
Our contributions towards this goal are as follows:
  • We present fixed point certificates for two-sided bounds on extremal reachability probabilities (Figure 1) and expected rewards. Our certificates are sound and complete for arbitrary finite MDPs without structural restrictions.
  • We formalize the theory in Isabelle/HOL  [46], proving soundness of our certificates, and generate a formally verified certificate checker implementation.
  • We implement a certifying variant of Interval Iteration [8] with floating point arithmetic in Storm [33]. Using this, we give certified reference results for the Quantitative Verification Benchmark Set [32].
Fig. 1.
An MDP with states \(S= \{z,s,t\}\), two actions (distinguished by solid and dashed edges), uniform probabilities, and target set \(T = \{t\}\). The annotations above and below each state are a certificate for upper and lower bounds on \(\mathbb {P}^{\min }(\lozenge T)\), resp.
Extensibility towards further properties is enabled by our simple, clean theory summarized as four guiding principles: (GP1) We characterize the quantities of interest as a fixed point of a basic, easy-to-evaluate Bellman-type operator [11]. The fundamental certification mechanism is to use fixed point induction for proving bounds on the least or greatest fixed point. (GP2) We certify qualitative reachability properties using ranking functions which are amenable to fixed point induction, too. (GP3) As the basic Bellman operators frequently have undesired, spurious fixed points [13, 24, 39] (often related to end components [2]), we consider slight modifications requiring qualitative reachability information which we certify following GP2. (GP4) When GP3 is insufficient or not applicable, we implicitly include a witness strategy in our certificate.
Technical challenges still arise in concretely applying these guiding principles. For instance, a key novelty of our paper is a ranking-function type certificate for not almost sure reachability (Proposition 2), which is surprisingly involved.
Table 1.
Our reachability certificates. Sound and complete for arbitrary finite MDPs.
Related work. Closest to our work are the previous proposals for certificates in MDP model checking: [34, Sec. 4] formally verifies a theory of certificates for reachability objectives, which is however limited to upper bounds on maximal and lower bounds on the minimal probabilities. [22] presents so-called “Farkas certificates” for reachability; however, it does not offer a formally verified certificate checker, is limited to MDPs without end components (ECs), and does not address certificate generation explicitly. With similar limitations, [6] provides Farkas certificates for multiple reachability or mean payoff objectives, which can be computed via linear programming. [35] suggests lifting the EC assumptions from [22] by certifying the full maximal EC decomposition. In contrast, our certificates are more concise as they handle ECs using at most one ranking function. Further, [35] proposes certificates for expected rewards, but they require the target to be reached almost surely, an assumption we do not have to make.
Witnessing subsystems [6, 22, 36, 54] are an alternative certification paradigm. However, their verification requires more computational effort than the simple, state-wise operations needed for checking Farkas or fixed point certificates. Still, they utilize similar ideas: The backward reachable states in [54, Sec. 3.3.3] essentially use ranking functions, as do the constraints in [37, Sec. 5.2.2].
The term “certifying algorithms” was coined in [38]. Previous work on certificates for other verification problems includes [20, 40, 41, 45, 47]. Further, certificates were recently investigated in hardware verification [21, 57–59] and approximate model counting [52]. Finally, we mention that Optimistic Value Iteration (OVI) [4, 31], the supermartingales in [1, 43, 51], the certificates for probabilistic pushdown systems from [55, 56], and a recent strategy synthesis method for infinite MDPs [10] follow fixed point induction principles similar to GP1.
Paper outline. After the background on MDPs and fixed point theory (Section 2), we introduce ranking functions for qualitative reachability (Section 3). Building on this, we discuss quantitative reachability (Section 4, Table 1) and expected rewards (Section 5, [17, Tab. 2]). We explain how to compute certificates (Section 6) and report on experiments (Section 7). Omitted pen-and-paper proofs are in [17, App. C–E]. All proofs regarding the soundness of the certificates, even standard results from the literature, are formalized in Isabelle/HOL.

2 Preliminaries

A Markov decision process (MDP) is a tuple \(\mathcal {M}= (S, {Act}, P)\) where \(S\) is a finite set of states, \({Act}\) is a finite set of actions, and \(P:S\times {Act}\times S\rightarrow [0,1]\) is a transition probability function with the property that \(\sum _{s' \in S} P(s,a,s') \in \{0,1\}\) for all \(s \in S\) and \(a \in {Act}\). For every \(s \in S\), the set of enabled actions \(a\in {Act}\) for which the above sum equals 1 is written \({Act}(s)\). It is required that \({Act}(s) \ne \emptyset \) for all \(s \in S\). For \(s \in S\) and \(a \in {Act}(s)\), we define the a-successors of s as \( Post (s,a) = \{s' \in S\mid P(s,a,s') > 0\}\). Notice that our MDPs do not have a distinguished initial state. See Figure 1 for an example MDP.
A (finite-state, discrete-time) Markov chain (DTMC) is the special case of an MDP with \(|{Act}(s)| =1\) for all \(s \in S\). A (memoryless and deterministic) strategy for an MDP \(\mathcal {M}= (S,{Act},P)\) is a function \(\sigma :S\rightarrow {Act}\) such that for all \(s \in S\) we have \(\sigma (s) \in {Act}(s)\). We may apply \(\sigma \) to \(\mathcal {M}\) to obtain the induced DTMC \(\mathcal {M}^\sigma = (S, {Act}, P^\sigma )\) which, intuitively, only retains the actions chosen by \(\sigma \). Formally, for all \(s, s' \in S\) we define \(P^\sigma (s,\sigma (s),s') = P(s,\sigma (s),s')\), and \(P^\sigma (s,a,s') = 0\) for all \(a \ne \sigma (s)\).
Reachability and Expected Rewards. Fix a DTMC \((S,{Act},P)\), a target set \(T\subseteq S\), and a reward function \(\text {rew}:S\rightarrow \mathbb {R}_{\ge 0}\). We define two random variables \(\lozenge T\) and \(\text {rew}^{\lozenge T}\) taking values in \(\{0,1\}\) and \(\overline{\mathbb {R}}_{\ge 0}\), respectively: For \(s_0s_1 ... \in S^\omega \) an infinite path, we set \(\lozenge T(s_0s_1\ldots ) = 1\) if and only if (iff) \(\exists i \in \mathbb {N}:s_i \in T\). Moreover:
$$\begin{aligned} \text {rew}^{\lozenge T}(s_0s_1\ldots ) ~=~ {\left\{ \begin{array}{ll} \sum _{k = 0}^{\min \{i \mid s_i \in T\}} \text {rew}(s_k) & \text {if } \exists i \in \mathbb {N}:s_i \in T\\ * & \text {else} \end{array}\right. } \end{aligned}$$
We consider both options \(* = \infty \) and \(* = \sum _{k =0}^\infty \text {rew}(s_k)\) [19]. We focus on the former in the main body, as it is standard in the literature [7, Def. 10.71] and tool competitions [15, 26]; we treat the latter in [17, App. F]. Intuitively, with \(* = \infty \), \(\text {rew}^{\lozenge T}\) assigns \(\infty \) to paths that never reach \(T\). The other paths receive the sum of rewards collected until reaching \(T\) for the first time. Given a state \(s \in S\), we define the reachability probability \(\mathbb {P}_{s}(\lozenge T)\) from s to \(T\) as the expected value (Lebesgue integral) of \(\lozenge T\) w.r.t. the probability measure \(\mathbb {P}_s\) on infinite paths of the DTMC with initial state fixed to s, see [7, Ch. 10] for the construction of \(\mathbb {P}_s\). Similarly, the expected reward \(\mathbb {E}_s(\text {rew}^{\lozenge T})\) from s to \(T\) is the expected value of \(\text {rew}^{\lozenge T}\). When \(\text {rew}\) is clear from context, we write \(\mathbb {E}_s(\lozenge T)\) instead of \(\mathbb {E}_s(\text {rew}^{\lozenge T})\).
Finally, given an MDP \(\mathcal {M}= (S, {Act}, P)\), a state \(s \in S\), a target set \(T\subseteq S\), a reward function \(\text {rew}:S\rightarrow \mathbb {R}_{\ge 0}\), and \(\operatorname {opt}\in \{\min , \max \}\) we define the optimal reachability probability \(\mathbb {P}^{\operatorname {opt}}_{s}(\lozenge T) = \operatorname {opt}_{\sigma } \mathbb {P}^{\sigma }_{s}(\lozenge T)\) and the optimal expected reward \(\mathbb {E}^{\operatorname {opt}}_{s}(\text {rew}^{\lozenge T}) = \operatorname {opt}_{\sigma } \mathbb {E}^{\sigma }_{s}(\text {rew}^{\lozenge T})\), where \(\mathbb {P}^{\sigma }_s(\lozenge T)\) and \(\mathbb {E}^{\sigma }_{s}(\text {rew}^{\lozenge T})\) are the reachability probabilities and expected rewards in the induced DTMC \(\mathcal {M}^\sigma \).
Fixed Point Theory. A partial order on a set \(X\) is a binary relation \(\preceq \) that is reflexive, transitive, and antisymmetric; in this case, the tuple \((X,\preceq )\) is called a poset. Given a poset \((X, \preceq )\), we call \(a \in X\) an upper bound on \(Y\subseteq X\) if for all \(b \in Y\) we have \(b \preceq a\). If an upper bound a on \(Y\) is minimal among all upper bounds, it is the unique supremum (or least upper bound) and denoted \(\sup {Y}\). Lower bounds and infima (or greatest lower bounds) are defined analogously.
The poset \((X, \preceq )\) is a complete lattice if \(\sup Y\) and \(\inf Y\) exist for every \(Y\subseteq X\). Every complete lattice has a least and greatest element \(\sup \emptyset \) and \(\inf \emptyset \), respectively. The following complete lattices are of interest in this paper:
  • \(({\overline{\mathbb {N}}}, \le )\) where \({\overline{\mathbb {N}}}= \mathbb {N}\cup \{\infty \}\) are the extended natural numbers and \(\le \) is the usual order on \(\mathbb {N}\) extended by \(a \le \infty \) for all \(a \in {\overline{\mathbb {N}}}\). Notice that for every \(Y\subseteq \mathbb {N}\), \(\sup Y= \infty \) iff \(Y\) is infinite.
  • Similarly, \((\overline{\mathbb {R}}_{\ge 0}, \le )\), with \(\overline{\mathbb {R}}_{\ge 0}= \mathbb {R}_{\ge 0}\cup \{\infty \}\) the extended non-negative reals, is a complete lattice. For every \(Y\subseteq \mathbb {R}_{\ge 0}\), \(\sup Y= \infty \) iff \(Y\) is unbounded.
  • \(([0,1], \le )\), the totally ordered set of real probabilities.
  • If \((X, \preceq )\) is an arbitrary complete lattice, then for all sets \(S\), \((X^S, \mathrel {\ddot{\preceq }})\) is a complete lattice, where \(X^S= \{f \mid f :S\rightarrow X\}\) is the set of functions from \(S\) to \(X\) and the partial order \(\mathrel {\ddot{\preceq }}\) is defined as \(f \mathrel {\ddot{\preceq }}g \iff \forall s \in S:f(s) \preceq g(s)\). In the following, we overload notation and write \(\preceq \) instead of \(\mathrel {\ddot{\preceq }}\). For example, if \(S\) is the set of states of an MDP, then we can think of \(([0,1]^S, \le )\) as the poset of “probability vectors” indexed by \(S\), partially ordered entry-wise.
Let \((X, \preceq )\) be a poset. A function \(\mathcal {F}:X\rightarrow X\) is called monotone if \(\forall a, b \in X:a \preceq b \implies \mathcal {F}(a) \preceq \mathcal {F}(b)\). The following is the key tool of this paper:
Theorem 1
(Knaster-Tarski). Let \((X, \preceq )\) be a complete lattice and \(\mathcal {F}:X\rightarrow X\) be monotone. Then, the set of fixed points \((\{a \in X\mid \mathcal {F}(a) = a\}, \preceq )\) is also a complete lattice. In particular, \(\mathcal {F}\) has a least and a greatest fixed point given by \(\operatorname {lfp}{\mathcal {F}} = \inf {\{a \in X\mid \mathcal {F}(a) \preceq a\}}\) and, dually, \(\operatorname {gfp}{\mathcal {F}} = \sup {\{a \in X\mid a \preceq \mathcal {F}(a)\}}\). As a consequence, the following fixed point induction rules are sound: \(\forall a \in X:\)
  • \(\mathcal {F}(a) \preceq a \implies \operatorname {lfp}{\mathcal {F}} \preceq a\) (fixed point induction)
  • \(a \preceq \mathcal {F}(a) \implies a \preceq \operatorname {gfp}{\mathcal {F}}\) (fixed point co-induction)
Elements \(a \in X\) with \(\mathcal {F}(a) \preceq a\) (or \(a \preceq \mathcal {F}(a)\)) are called (co-)inductive.
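On a finite lattice, least fixed points of monotone operators can be computed by iterating from the bottom element, and Theorem 1's induction rule turns any inductive element into a certified upper bound. A small Python sketch of this interplay, using a hypothetical monotone operator on the finite chain \(\{0,\dots ,5\}\):

```python
def kleene_lfp(op, bottom):
    """Iterate a monotone operator from the bottom element; on a finite
    lattice the iteration terminates in the least fixed point."""
    x = bottom
    while op(x) != x:
        x = op(x)
    return x

# Hypothetical monotone operator on the finite chain {0, ..., 5}.
F = lambda a: min(a + 1, 3)

lfp = kleene_lfp(F, 0)  # iterates 0 -> 1 -> 2 -> 3, and F(3) == 3

# Fixed point induction: F(4) = 3 <= 4, so 4 is inductive,
# and Theorem 1 guarantees lfp <= 4 without recomputing the lfp.
assert F(4) <= 4 and lfp <= 4
```

The point of the certificates in this paper is exactly the last two lines: checking inductivity is a single application of the operator, independent of how the candidate was obtained.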
Throughout the rest of the paper, we fix an MDP \(\mathcal {M}= (S,{Act},P)\), a set of target states \(T\subseteq S\) and, for Section 5, a reward function \(\text {rew}:S\rightarrow \mathbb {R}_{\ge 0}\). Moreover, we let \(\operatorname {opt}\in \{\min , \max \}\) and write \(\overline{\min }= \max \) and \(\overline{\max }= \min \).

3 Certifying Qualitative Reachability

Most of the certificates presented in the forthcoming Sections 4 and 5 enclose a certificate for a qualitative reachability property, e.g., \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) > 0\) or \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) < 1\). Our approach to this is summarized as follows:
Definition 1
(Distance Operator). Let \((S,{Act},P)\), \(T\), and \(\operatorname {opt}\) be the fixed MDP, target set and optimization direction, respectively. (We omit these quantifications in the rest of the paper.) We define the following distance operator:
$$\begin{aligned} \mathcal {D}^{\operatorname {opt}} :~ {\overline{\mathbb {N}}}^S\rightarrow {\overline{\mathbb {N}}}^S, ~ \mathcal {D}^{\operatorname {opt}} (r)(s) = {\left\{ \begin{array}{ll} 0 & \text { if } s \in T\\ 1 ~+~ \underset{a \in {Act}(s)}{\operatorname {opt}}\ ~ \underset{s' \in Post (s,a)}{\min }\ r(s') & \text { if } s \in S\setminus T\end{array}\right. } \end{aligned}$$
\(\mathcal {D}^{\operatorname {opt}} \) is a monotone Bellman-type operator on the complete lattice \(({\overline{\mathbb {N}}}^S, \le )\) and thus has a least fixed point by Theorem 1. In fact, we even have the following:
Lemma 1
(Unique Fixed Point). \(\mathcal {D}^{\operatorname {opt}} \) has a unique fixed point.
Intuitively, if \(r= \operatorname {fp}{\mathcal {D}^{\min } }\), then for every state \(s \in S\), \(r(s)\) is the length of a shortest path from s to \(T\), or \(r(s) = \infty \) if \(T\) is not reachable from s. For \(r= \operatorname {fp}{\mathcal {D}^{\max } }\), \(r(s)\) can be seen as the length of a shortest path in the DTMC induced by a strategy that aims to avoid \(T\) or reach it as late as possible. We formalize this intuition in Lemma 2 (using the notation \(\overline{\min }= \max \) and \(\overline{\max }= \min \)), and then in Proposition 1 apply Principle 1 to certify positive reachability.
Lemma 2
Let \(r= \operatorname {fp}{\mathcal {D}^{\operatorname {opt}} }\). Then for all \(s \in S\), \(r(s) = \infty \iff \mathbb {P}_s^{\overline{\operatorname {opt}}}(\lozenge T) = 0\).
Proposition 1
(Certificates for \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!>\! 0\)). A function \(r\in {\overline{\mathbb {N}}}^S\) is called a valid certificate for positive \(\operatorname {opt}\)-reachability if \(\mathcal {D}^{\overline{\operatorname {opt}}} (r) \!\le \! r\). If \(r\) is valid, then \(\forall s \!\in \! S:r(s) \!<\! \infty \!\implies \! \mathbb {P}_s^{\operatorname {opt}}(\lozenge T) \!>\! 0\).
Example 1
Consider the MDP in Figure 1 on page 1. The values on the bottom right of the states constitute a valid certificate \(r\) for positive \(\min \)-reachability. To check that \(\mathcal {D}^{\overline{\min }} (r) = \mathcal {D}^{\max } (r)\le r\) is indeed true, we verify the defining inequality at s, and similarly for z and t. As \(r(s), r(t) < \infty \), we conclude \(\mathbb {P}^{\min }_s(\lozenge T), \mathbb {P}^{\min }_t(\lozenge T) > 0\).
Remark 1
(Certificates for \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!=\! 0\)). While we do not need this in our paper, it is instructive to notice that with Lemmas 1 and 2 we can also certify zero reachability probability: By Knaster-Tarski, any \(r\) with \(r \le \mathcal {D}^{\overline{\operatorname {opt}}} (r)\) witnesses \(r\le \operatorname {fp}{\mathcal {D}^{\overline{\operatorname {opt}}} }\), hence if \(r(s) = \infty \) for a state s, then \(\mathbb {P}^{\operatorname {opt}}_s(\lozenge T)= 0\).
Certificates for non-almost-sure (a.s.) reachability, i.e., \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!<\! 1\), are needed in Section 5. Ranking function-based certificates for this property are—perhaps surprisingly—more involved. In Definition 2 below we define a complementary distance operator that captures (approximately) the distance to the states Z from which \(T\) is avoided surely, i.e., \(\mathbb {P}^{\operatorname {opt}}_s(\lozenge T) = 0\) for all \(s\in Z\). By Lemma 2, finite distance to Z witnesses positive \(\overline{\operatorname {opt}}\)-reachability of Z and thus non-a.s. \(\operatorname {opt}\)-reachability of \(T\). A major complication is that Z is not given explicitly. We address this by (i) considering the least fp, and (ii) letting the operator only increment the distance if two successors do not have the same rank. For this, we use Iverson bracket notation: \([\varphi ] = 1\) if \(\varphi \) is true; \([\varphi ] = 0\), else. Together, (i) and (ii) ensure that \(s \in S\) has rank 0 in the \(\operatorname {lfp} \) if and only if \(s \in Z\).
Definition 2
(Complementary Distance Operator). We define the complementary distance operator \(\overline{\mathcal {D}}^{\operatorname {opt}} :{\overline{\mathbb {N}}}^S\rightarrow {\overline{\mathbb {N}}}^S\), with
$$\begin{aligned} \overline{\mathcal {D}}^{\operatorname {opt}}(r)(s) ~=~ {\left\{ \begin{array}{ll} \infty & \text { if } s \in T\\[2pt] \underset{a \in {Act}(s)}{\operatorname {opt}}\ \bigg ( \underset{s' \in Post (s,a)}{\min }\ r(s') \,+\, \big [\exists u,v \!\in \! Post (s,a):r(u)\!\ne \!r(v)\big ] \bigg ) & \\ [-3pt] & \text { if } s \in S\setminus T\end{array}\right. } \end{aligned}$$
Note that unlike the distance operator \(\mathcal {D}^{\operatorname {opt}} \) from Definition 1, \(\overline{\mathcal {D}}^{\operatorname {opt}}\) does not have a unique \(\operatorname {fp} \): The constant \(r= \vec{\infty }\) is always a trivial fixed point.
Lemma 3
Let \(r= \operatorname {lfp}{\overline{\mathcal {D}}^{\operatorname {opt}}}\). Then for all \(s \in S\), \(r(s) = \infty \iff \mathbb {P}_s^{\operatorname {opt}}(\lozenge T) = 1\).
Proposition 2
(Certificates for \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!<\! 1\)). A function \(r\!\in \! {\overline{\mathbb {N}}}^S\) is called a valid certificate for non-a.s. \(\operatorname {opt}\)-reachability if \(\overline{\mathcal {D}}^{\operatorname {opt}}(r) \!\le \! r\). If \(r\) is valid, then \(\forall s \!\in \! S:r(s) \!<\! \infty \!\implies \! \mathbb {P}_s^{\operatorname {opt}}(\lozenge T) \!<\! 1\).
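The condition of Proposition 2 is equally mechanical to check. A Python sketch of \(\overline{\mathcal {D}}^{\operatorname {opt}}\) on a hypothetical toy MDP, where the Iverson bracket becomes a test whether an action has successors of differing rank:

```python
import math

# Hypothetical toy MDP: from s the single action reaches the target t or
# the T-avoiding sink z with probability 1/2 each; T = {"t"}.
P = {
    "s": {"a": {"t": 0.5, "z": 0.5}},
    "z": {"a": {"z": 1.0}},
    "t": {"a": {"t": 1.0}},
}
T = {"t"}

def comp_dist_op(opt, r):
    """Complementary distance operator of Definition 2."""
    out = {}
    for s, acts in P.items():
        if s in T:
            out[s] = math.inf
        else:
            vals = []
            for dist in acts.values():
                ranks = [r[sp] for sp in dist]
                # Iverson bracket: +1 only if two successors differ in rank.
                vals.append(min(ranks) + (1 if len(set(ranks)) > 1 else 0))
            out[s] = opt(vals)
    return out

# r is inductive for the operator, and r(s) = 1 < inf, so by
# Proposition 2 the min-reachability probability of T from s is < 1.
r = {"t": math.inf, "z": 0, "s": 1}
Dr = comp_dist_op(min, r)
assert all(Dr[s] <= r[s] for s in P)
```

Observe that the sure-avoid state z keeps rank 0, matching the intuition that the rank measures distance to the (implicit) set Z.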
Remark 2
(Certificates for \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!=\! 1\)). Since \(\overline{\mathcal {D}}^{\operatorname {opt}}\) does not have a unique fp, we cannot use the trick from Remark 1 to certify \(\mathbb {P}^{\operatorname {opt}}(\lozenge T) \!=\! 1\) with ranking functions. Sections 4.2 and 4.3 present certificates for general lower bounds.

4 Certificates for Quantitative Reachability

This section presents our certificates for bounds on minimal and maximal reachability probabilities (Table 1). They are characterized via a Bellman operator:
Definition 3
(Bellman Operator for Reachability). We define the Bellman operator for reachability \(\mathcal {B}^{\operatorname {opt}}:[0,1]^S\rightarrow [0,1]^S\) as usual:
$$\begin{aligned} \mathcal {B}^{\operatorname {opt}}(x)(s) ~=~ {\left\{ \begin{array}{ll} 1 & \text { if } s \in T\\ \underset{a \in {Act}(s)}{\operatorname {opt}}\ \sum \limits _{s' \in Post (s,a)} P(s, a, s') \cdot x(s') & \text { if } s \in S\setminus T\end{array}\right. } \end{aligned}$$
Similar to \(\mathcal {D}^{\operatorname {opt}} \) from Section 3, \(\mathcal {B}^{\operatorname {opt}}\) is a monotone function on the complete lattice \(([0,1]^S, \le )\). Thus, \(\mathcal {B}^{\operatorname {opt}}\) has a least fixed point by Theorem 1.
Theorem 2
([16, Sec. 3.5]). For all \(s \in S\), \((\operatorname {lfp}{\mathcal {B}^{\operatorname {opt}}})(s) = \mathbb {P}^{\operatorname {opt}}_s(\lozenge T)\).
We stress that \(\mathcal {B}^{\operatorname {opt}}\) has multiple fixed points in general. For instance, \(x= \vec{1}\) is always a trivial fixed point. Theorem 2 states that the reachability probabilities are characterized as the least fixed point.

4.1 Upper Bounds on Optimal Reachability Probabilities

Following Principle 1, we obtain the following by Theorem 2:
Proposition 3
(Certificates for Upper Bounds on \(\mathbb {P}^{\operatorname {opt}}(\lozenge T)\)). A probability vector \(x\in [0,1]^S\) satisfying \(\mathcal {B}^{\operatorname {opt}}(x) \le x\) is a valid certificate for upper bounds on \(\operatorname {opt}\)-reachability. If \(x\) is valid, then \(\forall s \in S:\mathbb {P}^{\operatorname {opt}}_s(\lozenge T) \le x(s)\).
Example 2
We verify that the numbers \(x\) above the states in Figure 1 on page 1 are a valid certificate for upper bounds on \(\min \)-reachability: for s we check the defining inequality \(\mathcal {B}^{\min }(x)(s) \le x(s)\), and similarly for z and t. Thus Proposition 3 yields \(\mathbb {P}^{\min }_{s}(\lozenge T) \le \frac{1}{2}\). This particular certificate remains valid when changing x(s) to any probability in \([\frac{1}{2}, 1]\). In general, however, increasing individual values may break inductivity.
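Validity per Proposition 3 is one application of \(\mathcal {B}^{\operatorname {opt}}\) followed by a pointwise comparison. A Python sketch on a hypothetical toy MDP (not the MDP of Figure 1):

```python
# Hypothetical toy MDP: P[s][a] maps successors to probabilities; T = {"t"}.
P = {
    "s": {"a": {"t": 0.5, "z": 0.5}, "b": {"t": 0.5, "s": 0.5}},
    "z": {"a": {"z": 1.0}},
    "t": {"a": {"t": 1.0}},
}
T = {"t"}

def bellman(opt, x):
    """Bellman operator B^opt for reachability (Definition 3)."""
    return {s: 1.0 if s in T else
            opt(sum(p * x[sp] for sp, p in dist.items())
                for dist in P[s].values())
            for s in P}

def valid_upper_bound_cert(opt, x):
    """Proposition 3: x is valid if B^opt(x) <= x pointwise."""
    Bx = bellman(opt, x)
    return all(Bx[s] <= x[s] for s in P)

# In this toy MDP, x certifies that the min-reachability
# probability of T from s is at most 1/2.
x = {"t": 1.0, "z": 0.0, "s": 0.5}
assert valid_upper_bound_cert(min, x)
```

A checker along these lines would also need to guard against floating point rounding in the sums; the formally verified checker described in the paper addresses exactly such concerns rigorously.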

4.2 Lower Bounds on Minimal Reachability Probabilities

With Theorem 1, we can only certify lower bounds on greatest fixed points. Lower bounds on reachability probabilities—which constitute the least fixed point of \(\mathcal {B}^{\operatorname {opt}}\)—are thus more involved. Following GP3, we modify the operator so that it has a unique fixed point, using qualitative reachability information certified as in Section 3. We focus on \(\min \)-reachability first and modify \(\mathcal {B}^{\min }\) as follows:
$$\begin{aligned} \tilde{\mathcal {B}}^{\min }:[0,1]^S\rightarrow [0,1]^S, ~ \tilde{\mathcal {B}}^{\min }(x)(s) = {\left\{ \begin{array}{ll} \mathcal {B}^{\min }(x)(s) & \text { if } \mathbb {P}^{\min }_s(\lozenge T) > 0 \\ 0 & \text { if } \mathbb {P}^{\min }_s(\lozenge T) = 0 \end{array}\right. } \end{aligned}$$
Lemma 4
(Unique Fixed Point [7, Thm. 10.109]). \(\tilde{\mathcal {B}}^{\min }\) has a unique fixed point \(\operatorname {fp}{\tilde{\mathcal {B}}^{\min }} = \operatorname {lfp}{\mathcal {B}^{\min }}\).
By Lemma 4 and Theorem 2, any probability vector \(x\le \tilde{\mathcal {B}}^{\min }(x)\) witnesses that \(x(s) \le \mathbb {P}^{\min }_s(\lozenge T)\) for all \(s \in S\). However, evaluating \(\tilde{\mathcal {B}}^{\min }(x)\) is not straightforward as it requires determining, for each \(s \in S\), whether \(\mathbb {P}^{\min }_{s}(\lozenge T) > 0\). Hence, we include an additional certificate for positive reachability from Section 3:
Proposition 4
(Certificates for Lower Bounds on \(\mathbb {P}^{\min }(\lozenge T)\)). A tuple of probability vector and ranking function \((x, r) \in [0,1]^S\times {\overline{\mathbb {N}}}^S\) is a valid certificate for lower bounds on \(\min \)-reachability if
$$\begin{aligned} 1)~\mathcal {D}^{\max } (r) \le r, \quad 2)~x\le \mathcal {B}^{\min }(x), \quad 3)~\forall s \in S\setminus T:x(s) > 0 \implies r(s) < \infty . \end{aligned}$$
If \((x, r)\) is valid, then \(\forall s \in S:\mathbb {P}^{\min }_s(\lozenge T) \ge x(s)\).
Example 3
We apply Proposition 4 to the MDP in Figure 1. The pairs \(x(v) \mid r(v)\) below each state v constitute a valid certificate \((x, r)\) for lower bounds on \(\min \)-reachability. Indeed, we have shown in Example 1 that it satisfies Condition 1) \(\mathcal {D}^{\max } (r) \le r\). Condition 2) \(x\le \mathcal {B}^{\min }(x)\) holds as well; in fact, we even have \(x= \mathcal {B}^{\min }(x)\), see Example 2. For the additional Condition 3), notice that s is the only state in \(S\setminus T\) with \(x(s) > 0\), and that \(r(s) < \infty \) holds as required. We conclude that \(\mathbb {P}^{\min }_s(\lozenge T) \ge \frac{1}{2}\).
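The three conditions of Proposition 4 can be checked together. A Python sketch on a hypothetical toy MDP (not the MDP of Figure 1), combining the distance and Bellman operators from the earlier definitions:

```python
import math

# Hypothetical toy MDP with target T = {"t"} and a T-avoiding sink z.
P = {
    "s": {"a": {"t": 0.5, "z": 0.5}, "b": {"t": 0.5, "s": 0.5}},
    "z": {"a": {"z": 1.0}},
    "t": {"a": {"t": 1.0}},
}
T = {"t"}

def dist_op(opt, r):   # D^opt (Definition 1)
    return {s: 0 if s in T else
            1 + opt(min(r[sp] for sp in d) for d in P[s].values())
            for s in P}

def bellman(opt, x):   # B^opt (Definition 3)
    return {s: 1.0 if s in T else
            opt(sum(p * x[sp] for sp, p in d.items()) for d in P[s].values())
            for s in P}

def valid_lower_min_cert(x, r):
    """Proposition 4: conditions 1)-3) for lower bounds on min-reachability."""
    cond1 = all(v <= r[s] for s, v in dist_op(max, r).items())  # D^max(r) <= r
    cond2 = all(x[s] <= v for s, v in bellman(min, x).items())  # x <= B^min(x)
    cond3 = all(r[s] < math.inf                                 # x(s) > 0 needs
                for s in P if s not in T and x[s] > 0)          # a finite rank
    return cond1 and cond2 and cond3

x = {"t": 1.0, "z": 0.0, "s": 0.5}
r = {"t": 0, "z": math.inf, "s": 1}
assert valid_lower_min_cert(x, r)  # in this toy MDP: lower bound 1/2 at s
```

Each condition is a single pass over the states, which is what makes these certificates cheap to verify compared to re-running a model checker.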

4.3 Lower Bounds on Maximal Reachability Probabilities

Our approach for lower bounds on \(\mathbb {P}^{\min }(\lozenge T)\) from Section 4.2 does not immediately extend to \(\max \)-reachability because \(\tilde{\mathcal {B}}^{\max }\) (a modification of \(\mathcal {B}^{\max }\) analogous to \(\tilde{\mathcal {B}}^{\min }\)) does not have a unique fixed point in general, see [17, App. D.3] for a concrete counterexample. This problem is caused by end components [2, Def. 3.13]. Towards a solution, we observe that, essentially by definition,
$$\begin{aligned} \forall s\in S:\quad \mathbb {P}^{\max }_s(\lozenge T) \ge x(s) \iff \exists \text { Strategy } \sigma :\mathbb {P}^{\sigma }_s(\lozenge T) \ge x(s) ~. \end{aligned}$$
In words, a lower bound on a \(\max \)-reachability probability is always witnessed by some strategy. Hence we adopt the following:
Certificates with an Explicit Witness Strategy.
Recall from Section 2 that given a strategy \(\sigma :S\rightarrow {Act}\) for MDP \(\mathcal {M}\), we can consider the induced DTMC \(\mathcal {M}^\sigma \). We write \(\mathcal {B}^{\sigma }\) for the Bellman operator associated with \(\mathcal {M}^\sigma \) (notice that a DTMC is just a special case of an MDP). Further, we let \(\tilde{\mathcal {B}}^{\sigma }\) be the corresponding modified Bellman operator. By Theorem 2 and Lemma 4:
Lemma 5
\(\tilde{\mathcal {B}}^{\sigma }\) has a unique fixed point \((\operatorname {fp}{\tilde{\mathcal {B}}^{\sigma }})(s) = \mathbb {P}^{\sigma }_s(\lozenge T)\) for all \(s \in S\).
Thus, we can certify lower bounds similar to Proposition 4 (we write \(\mathcal {D}^{\sigma } \) for the distance operator \(\mathcal {D}^{\operatorname {opt}} \) in the DTMC induced by \(\sigma \)):
Proposition 5
(Certificates for Lower Bounds on \(\mathbb {P}^{\max }(\lozenge T)\)+ Strategy). A triple \((x, r, \sigma ) \in [0,1]^S\times {\overline{\mathbb {N}}}^S\times {Act}^S\) is a valid certificate for lower bounds on \(\max \)-reachability with witness strategy if
$$\begin{aligned} 1)~\mathcal {D}^{\sigma } (r) \le r, \quad 2)~x\le \mathcal {B}^{\sigma }(x), \quad 3)~\forall s \in S\setminus T:x(s) > 0 \implies r(s) < \infty . \end{aligned}$$
If \((x, r, \sigma )\) is valid, then \(\forall s \in S:\mathbb {P}^{\max }_s(\lozenge T) \ge \mathbb {P}^{\sigma }_s(\lozenge T) \ge x(s)\).
Certificates without a Witness Strategy.
Increasing the size of the certificate by including the strategy can be avoided, as it can be “read off” from the certifying probability vector \(x\in [0,1]^S\). To this end, we define the \(x\)-increasing actions of state \(s \in S\): \({Act}_{x}^{\uparrow }(s)=\{a\in {Act}(s) \mid x(s) {\le } \sum _{s' \in Post {(s,a)}} P(s, a, s') {\cdot } x(s') \}\). If \(x\le \mathcal {B}^{\max }(x)\), then \({Act}_{x}^{\uparrow }(s)\) contains at least one action. Next, we define a variant of the distance operator which only considers \(x\)-increasing actions:
$$\begin{aligned} \mathcal {D}^{\min }_{x\uparrow } :{\overline{\mathbb {N}}}^S\rightarrow {\overline{\mathbb {N}}}^S, ~ \mathcal {D}^{\min }_{x\uparrow }(r)(s) = {\left\{ \begin{array}{ll} 0 & \text { if } s \in T\\ 1 ~+~ \min \limits _{a \in {Act}_{x}^{\uparrow }(s)} ~ \min \limits _{s' \in Post (s,a)} r(s') & \text { if } s \in S\setminus T\end{array}\right. } \end{aligned}$$
Proposition 6
(Certificates for Lower Bounds on \(\mathbb {P}^{\max }(\lozenge T)\)). A tuple \((x,r) \in [0,1]^S\times {\overline{\mathbb {N}}}^S\) is a valid certificate for lower bounds on \(\max \)-reachability if
$$\begin{aligned} 1)~\mathcal {D}^{\min }_{x\uparrow }(r) \le r, \quad 2)~x\le \mathcal {B}^{\max }(x), \quad 3)~\forall s \in S\setminus T:x(s) > 0 \implies r(s) < \infty . \end{aligned}$$
If \((x, r)\) is valid, then \(\forall s \in S:\mathbb {P}^{\max }_s(\lozenge T) \ge x(s)\).
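For illustration, the three conditions of Proposition 6 can be checked mechanically. The following Python sketch does so on a hypothetical three-state MDP; the states, actions, and helper names are ours, not part of the paper:

```python
from math import inf

# Toy MDP (hypothetical): non-target states 0, 1; target T = {2}.
# P[s][a] lists only positive-probability successors (s', probability).
P = {
    0: {"a": [(0, 0.5), (1, 0.5)]},
    1: {"b": [(2, 1.0)], "c": [(1, 1.0)]},  # "c" loops, "b" reaches T
}
T = {2}

def increasing_actions(x, s):
    """Act_x^{up}(s): actions whose one-step expectation is at least x(s)."""
    return [a for a, succ in P[s].items()
            if sum(p * x[t] for t, p in succ) >= x[s]]

def valid_lower_bound_cert(x, r):
    for s in P:                        # all non-target states
        acts = increasing_actions(x, s)
        if not acts:                   # condition 2) x <= B^max(x) violated
            return False
        # condition 1): D^min_{x up}(r)(s) = 1 + min over x-increasing
        # actions of the minimal successor rank must not exceed r(s)
        if 1 + min(min(r[t] for t, _ in P[s][a]) for a in acts) > r[s]:
            return False
        if x[s] > 0 and r[s] == inf:   # condition 3) positive bound, finite rank
            return False
    return True

x = {0: 1.0, 1: 1.0, 2: 1.0}           # claims P^max(reach T) >= 1 everywhere
print(valid_lower_bound_cert(x, {0: 2, 1: 1, 2: 0}))  # True: valid ranking
print(valid_lower_bound_cert(x, {0: 1, 1: 1, 2: 0}))  # False: rank at 0 too small
```

Note how the looping action "c" is harmless: it is \(x\)-increasing, but the minimum in \(\mathcal {D}^{\min }_{x\uparrow }\) may pick "b", which is exactly the strategy being "read off".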

5 Certificates for Expected Rewards

We present certificates for bounds on expected rewards in the “\(* = \infty \)” semantics, which assigns infinite reward to paths not reaching \(T\); the other case is treated in [17, App. F]. We employ the reward variant of the Bellman operator:
Definition 4
(Bellman Operator for Expected Rewards). We define the Bellman operator for expected rewards \(\mathcal {E}^{\operatorname {opt}}:\overline{\mathbb {R}}_{\ge 0}^{S}\rightarrow \overline{\mathbb {R}}_{\ge 0}^{S}\) as follows:
$$\begin{aligned} \mathcal {E}^{\operatorname {opt}}(x)(s) ~=~ {\left\{ \begin{array}{ll} 0 & \text { if } s \in T\\ \text {rew}(s) + \underset{a \in {Act}(s)}{\operatorname {opt}}\ \sum \limits _{s' \in Post (s,a)} P(s, a, s') \cdot x(s') & \text { if } s \in S\setminus T\end{array}\right. } \end{aligned}$$
The above definition assumes that multiplication by \(\infty \) absorbs positive numbers, i.e., \(p \cdot \infty = \infty \) for all \(p > 0\), and \(a + \infty = \infty + a = \infty \) for all \(a \in \overline{\mathbb {R}}_{\ge 0}\).
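Under these conventions, one application of \(\mathcal {E}^{\operatorname {opt}}\) is straightforward to implement. A minimal Python sketch on a hypothetical toy MDP (note that Python's `0 * inf` is `nan`, which is harmless here because `Post(s, a)` only lists positive-probability successors):

```python
from math import inf

# Toy MDP (hypothetical): state 1 self-loops forever, so it never reaches
# the target 2; its lfp value is 0 while the gfp assigns it inf.
rew = {0: 1.0, 1: 0.0}
P = {0: {"a": [(1, 0.5), (2, 0.5)]}, 1: {"b": [(1, 1.0)]}}
T = {2}

def bellman_rew(x, opt=max):
    """One application of E^opt: 0 on targets, rew(s) + optimal
    one-step expectation elsewhere; p * inf = inf since p > 0."""
    y = {s: 0.0 for s in T}
    for s, acts in P.items():
        y[s] = rew[s] + opt(sum(p * x[t] for t, p in succ)
                            for succ in acts.values())
    return y

print(bellman_rew({0: 0.0, 1: 0.0, 2: 0.0}))    # state 0 picks up its reward
print(bellman_rew({s: inf for s in (0, 1, 2)})) # inf absorbs off-target values
```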
Again, \(\mathcal {E}^{\operatorname {opt}}\) is a monotone function on the complete lattice \((\overline{\mathbb {R}}_{\ge 0}^{S}, \le )\) and thus has a least and a greatest fixed point by Theorem 1. Unfortunately, as it turns out, the sought-after expected rewards \(\mathbb {E}^{\operatorname {opt}}_s(\lozenge T)\), \(s \in S\), are neither of these two fixed points. Indeed, \(\operatorname {lfp}{\mathcal {E}^{\operatorname {opt}}}\) corresponds to the expected rewards in the semantics considered in [17, App. F], and \(\operatorname {gfp}{\mathcal {E}^{\operatorname {opt}}}\) is a trivial upper bound assigning \(\infty \) to all states, see the example in Section 5.1.
Remark 3
(Asymmetry and Duality). In Section 4, an asymmetry between upper and lower bounds arose as the reachability probabilities are a least fixed point. Further, for the case of maximizing reachability, spurious fixed points occurred and we required a witness strategy to “make progress” towards the targets (the fact that this case requires special treatment of end components is well established in literature, e.g., [31]). For safety objectives, where the goal is to avoid a set of bad states, the situation is dual: The safety probabilities are a greatest fixed point, so the lower bound case is simple, and when minimizing the upper bound, we require a witness strategy. The \(* = \infty \) semantics for expected rewards share some similarities with a safety objective, since the value is maximized (i.e., is infinite) when the target set is avoided. This section thus differs from Section 4 in two ways: (i) Everything is dual, as \(* = \infty \) is “safety-like”, and (ii) additional complications arise from the trivial greatest fixed point \(\operatorname {gfp}{\mathcal {E}^{\operatorname {opt}}} = \vec{\infty }\), see below.
Fig. 2.
A DTMC.

5.1 Lower Bounds on Optimal Expected Rewards

Due to the absorptive property of multiplication by \(\infty \), \(\operatorname {gfp}{\mathcal {E}^{\operatorname {opt}}}\) may assign \(\infty \) to states that actually have finite value: For instance, in the DTMC in Figure 2, the gfp assigns \(\infty \) to s because \(\infty = \text {rew}(s) + \frac{1}{2} \cdot \infty + \frac{1}{2} \cdot 0\), while in fact \(\mathbb {E}_s(\lozenge T) = 2 \cdot \text {rew}(s) < \infty \). To address this, we force the values of states that a.s. reach the target to be finite as follows:
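Both values can be verified by hand. A tiny Python sketch of the Figure 2 situation (the concrete reward value is our assumption):

```python
from math import inf

# Fig. 2-style DTMC (hypothetical numbers): from s, collect reward rew_s,
# then with prob 1/2 stay in s and with prob 1/2 move to the target t.
rew_s = 3.0

def is_fixed_point(v):
    # v must satisfy v = rew_s + 1/2 * v + 1/2 * 0
    return v == rew_s + 0.5 * v

print(is_fixed_point(2 * rew_s))  # True: the actual expected reward, 6.0
print(is_fixed_point(inf))        # True as well: the spurious gfp value
```

Both `2 * rew_s` and `inf` solve the equation, which is exactly why the gfp alone cannot serve as a certificate and the finiteness side condition of Lemma 6 below is needed.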
Lemma 6
Let \(x\in \overline{\mathbb {R}}_{\ge 0}^{S}\) be such that 1) \(x\le \mathcal {E}^{\operatorname {opt}}(x)\) and 2) for all \(s \in S\): \(\mathbb {P}^{\overline{\operatorname {opt}}}_s(\lozenge T) = 1 \implies x(s) < \infty \) . Then it holds for all \(s \in S\) that \(x(s) \le \mathbb {E}^{\operatorname {opt}}_s(\lozenge T)\).
Intuitively, Lemma 6 requires that a lower bound on \(\mathbb {E}^{\min }_s(\lozenge T)\) can only be infinite if \(T\) cannot be reached a.s., i.e. \(\mathbb {P}^{\max }_s(\lozenge T) < 1\) (dually for \(\mathbb {E}^{\max }\)). Combining Lemma 6 and a certificate for non-a.s. reachability (Section 3) yields:
Proposition 7
(Certificates for Lower Bounds on \(\mathbb {E}^{\operatorname {opt}}(\lozenge T)\)). A tuple \((x, r) \in \overline{\mathbb {R}}_{\ge 0}^{S}\times {\overline{\mathbb {N}}}^S\) is a valid certificate for lower bounds on \(\operatorname {opt}\)-exp. rewards if
$$\begin{aligned} 1)~\overline{\mathcal {D}}^{\overline{\operatorname {opt}}}(r) \le r, \quad 2)~x\le \mathcal {E}^{\operatorname {opt}}(x), \quad 3)~\forall s \in S:x(s) = \infty \implies r(s) < \infty . \end{aligned}$$
If \((x,r)\) is valid, then \(\forall s\in S:\mathbb {E}^{\operatorname {opt}}_s(\lozenge T) \ge x(s)\).

5.2 Upper Bounds on Maximal Expected Rewards

Next we focus on upper bounds on maximal expected rewards. Using Principle 3 as for lower bounds on minimal reachability probabilities (Section 4.2), we obtain such certificates via a modified Bellman operator:
$$\begin{aligned} \tilde{\mathcal {E}}^{\max }:\overline{\mathbb {R}}_{\ge 0}^{S}\rightarrow \overline{\mathbb {R}}_{\ge 0}^{S}, \quad \tilde{\mathcal {E}}^{\max }(x)(s) ~=~ {\left\{ \begin{array}{ll} \mathcal {E}^{\max }(x)(s) & \text { if } \mathbb {P}^{\min }_s(\lozenge T) > 0 \\ \infty & \text { if } \mathbb {P}^{\min }_s(\lozenge T) = 0 \end{array}\right. } \end{aligned}$$
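A sketch of \(\tilde{\mathcal {E}}^{\max }\), assuming the set of states with \(\mathbb {P}^{\min }_s(\lozenge T) = 0\) has been precomputed by qualitative graph analysis (the toy MDP and all names are hypothetical):

```python
from math import inf

# Toy MDP (hypothetical): 2 = target; state 1 self-loops and avoids it.
rew = {0: 1.0, 1: 0.0}
P = {0: {"a": [(2, 1.0)]}, 1: {"b": [(1, 1.0)]}}
T = {2}
PMIN_ZERO = {1}   # qualitative analysis: state 1 never reaches T

def bellman_max(x):
    y = {s: 0.0 for s in T}
    for s, acts in P.items():
        y[s] = rew[s] + max(sum(p * x[t] for t, p in succ)
                            for succ in acts.values())
    return y

def bellman_tilde_max(x):
    """E~^max: pin P^min = 0 states to inf, apply E^max elsewhere."""
    y = bellman_max(x)
    for s in PMIN_ZERO:
        y[s] = inf
    return y

# VI from 0: state 1 is pinned to inf immediately; state 0 keeps its
# finite value 1 (one reward, then straight to the target).
x = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(10):
    x = bellman_tilde_max(x)
print(x)
```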
Lemma 7
For all \(s \in S\), \((\operatorname {lfp}{\tilde{\mathcal {E}}^{\max }})(s) = \mathbb {E}^{\max }_s(\lozenge T)\).
We stress that unlike \(\tilde{\mathcal {B}}^{\min }\) from Section 4.2, \(\tilde{\mathcal {E}}^{\max }\) does not have a unique fixed point, see Figure 2. Nonetheless, with Lemma 7, Principle 1, and the certificates for positive reachability from Proposition 1, we obtain:
Proposition 8
(Certificates for Upper Bounds on \(\mathbb {E}^{\max }(\lozenge T)\)). A tuple \((x, r) \in \overline{\mathbb {R}}_{\ge 0}^{S}\times {\overline{\mathbb {N}}}^S\) is a valid certificate for upper bounds on \(\max \)-exp. rewards if
$$\begin{aligned} 1)~\mathcal {D}^{\max } (r) \le r, \quad 2)~\mathcal {E}^{\max }(x) \le x, \quad 3)~\forall s \in S:x(s) < \infty \implies r(s) < \infty . \end{aligned}$$
If \((x,r)\) is valid, then \(\forall s\in S:\mathbb {E}^{\max }_s(\lozenge T) \le x(s)\).

5.3 Upper Bounds on Minimal Expected Rewards

Our approach for this case parallels the one for lower bounds on maximal reachability probabilities from Section 4.3. The modified Bellman operator \(\tilde{\mathcal {E}}^{\min }\) (defined analogously to \(\tilde{\mathcal {E}}^{\max }\) from above) does not characterize the minimal expected rewards as its least fixed point. The problem is, again, end components; see [17, App. E.5] for a counterexample. Following Principle 4 and Section 4.3, we can, however, certify upper bounds on \(\mathbb {E}^{\min }(\lozenge T)\) by including a witness strategy (see [17, App. E.6]).
As with lower bounds on \(\max \)-reachability, it is also possible to avoid this explicit witness strategy: We define the \(x\)-decreasing actions of \(s\) as \({Act}_{x}^{\downarrow }(s) = \{a\in {Act}(s) \mid x(s) \ge \text {rew}(s) + \sum _{s' \in Post (s,a)} P(s, a, s') \cdot x(s') \}\). If \(\mathcal {E}^{\min }(x) \le x\), then \({Act}_{x}^{\downarrow }(s) \ne \emptyset \). We define a distance operator \(\mathcal {D}^{\min }_{x\downarrow }\) that only considers \(x\)-decreasing actions, completely analogous to \(\mathcal {D}^{\min }_{x\uparrow }\) from Section 4.3.
Proposition 9
(Certificates for Upper Bounds on \(\mathbb {E}^{\min }(\lozenge T)\)). A tuple \((x, r) \in \overline{\mathbb {R}}_{\ge 0}^{S}\times {\overline{\mathbb {N}}}^S\) is a valid certificate for upper bounds on \(\min \)-exp. rewards if
$$\begin{aligned} 1)~\mathcal {D}^{\min }_{x\downarrow }(r) \le r, \quad 2)~\mathcal {E}^{\min }(x) \le x, \quad 3)~\forall s \in S:x(s) < \infty \implies r(s) < \infty . \end{aligned}$$
If \((x, r)\) is valid, then \(\forall s \in S:\mathbb {E}^{\min }_s(\lozenge T) \le x(s)\).
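Computing \({Act}_{x}^{\downarrow }(s)\) amounts to a one-line filter. A Python sketch on a hypothetical two-state MDP with unit rewards:

```python
# x-decreasing actions (toy MDP, names hypothetical): an action qualifies if
# its reward-plus-expected-successor-value does not exceed x(s).
rew = {0: 1.0, 1: 1.0}
P = {0: {"a": [(1, 1.0)], "b": [(0, 1.0)]}, 1: {"c": [(2, 1.0)]}}  # 2 = target

def decreasing_actions(x, s):
    return [a for a, succ in P[s].items()
            if x[s] >= rew[s] + sum(p * x[t] for t, p in succ)]

x = {0: 2.0, 1: 1.0, 2: 0.0}        # the exact minimal expected rewards
print(decreasing_actions(x, 0))     # ['a']: 'b' would cost 1 + x(0) = 3 > 2
print(decreasing_actions(x, 1))     # ['c']
```

The self-loop "b" is excluded, so the distance operator restricted to \(x\)-decreasing actions can only make progress towards the target, mirroring the role of \({Act}_{x}^{\uparrow }\) for lower bounds.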

6 Computing Certificates

In Sections 3 to 5 we described what certificates are and discussed their verification conditions. We now elaborate on how to compute certificates. To this end, we first discuss computation of (co-)inductive value vectors \(x\) and then focus on the ranking functions \(r\) required by some certificates (see Figure 1). We stress that a sound certificate checker detects any wrong results produced by buggy implementations of the methods discussed in this section. Indeed, during implementation of the certificate computation algorithms in Storm, checking the certificates helped find and resolve implementation bugs.
As we enter the realm of numeric computation, some remarks are in order. For computational purposes we assume that the transition probabilities are rational numbers, i.e., fractions of integers. Moreover:
Our goal is to compute a certificate with a rational value vector \(x\) and to check it with exact, arbitrary-precision rational number arithmetic.
Certificates via Exact Algorithms. The conceptually easiest certifying MDP model checking algorithm is to compute the rational reachability probabilities or expected rewards exactly. The resulting value vector is both inductive and co-inductive. Thus, exact algorithms yield a certificate essentially as a by-product. We refer to [29, 30] for an in-depth comparison of exact algorithms based on Policy Iteration (PI), Rational Search (RS), and Linear Programming (LP). The practically most efficient algorithm is PI with exact LU decomposition as the linear equation solver; see [30, Secs. 2.2 and 4.2] for a description of the algorithm.
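On a toy example, the "certificate for free" effect of exact computation is easy to see. A Python sketch with exact rationals (the DTMC is our own, hypothetical):

```python
from fractions import Fraction as F

# Tiny DTMC (hypothetical): from state s, prob 1/3 to the target,
# prob 1/3 back to s, prob 1/3 to an absorbing sink.
p_target, p_self = F(1, 3), F(1, 3)

# Reachability satisfies x = p_target * 1 + p_self * x; solve exactly:
x = p_target / (1 - p_self)
print(x)   # 1/2, with no rounding error

# The exact value is a fixed point of the Bellman equation, hence both
# inductive and co-inductive: the certificate comes as a by-product.
assert x == p_target + p_self * x
```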
Certificates via Approximate Algorithms. In practice, most probabilistic model checkers use algorithms that are not exact but approximate: They employ approximate, fixed-precision floating point arithmetic and use a variant of VI that only returns an approximate result, namely for each state an interval \([\ell ,u]\) containing the exact value, such that \(|\ell - u| \le \varepsilon \) for a given \(\varepsilon \) (typically \(10^{-6}\)). They do this because (i) when using exact arithmetic, fractions can grow very large, hindering scalability, (ii) VI-based algorithms often outperform PI, albeit not as dramatically as folklore claimed [29, 30], and (iii) approximate results usually suffice. We now exemplify with the VI-variant Interval Iteration (II) [8, 24] how to make an approximate, floating point-based algorithm certifying, leaving other variants such as optimistic VI [31] and Sound VI [49] for future work.
II for reachability\(^{3}\) works by first collapsing end components [13, 24] of the MDP to ensure that \(\mathcal {B}^{\operatorname {opt}}\) has a unique fixed point. II then runs two instances of VI in parallel, starting from \(x^{(0)} = \vec{0}\) and \(y^{(0)} = \vec{1}\):
$$\begin{aligned} \vec{0} = x^{(0)} \le \mathcal {B}^{\operatorname {opt}}(x^{(0)}) = x^{(1)} \le ~\ldots ~ \operatorname {fp}{\mathcal {B}^{\operatorname {opt}}} ~\ldots ~ \le y^{(1)} = \mathcal {B}^{\operatorname {opt}}(y^{(0)}) \le y^{(0)} = \vec{1} \end{aligned}$$
Both sequences contain (co-)inductive vectors only and converge to the fixed point. The iteration can be stopped when the difference is as small as desired.
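The iteration scheme above can be sketched in a few lines of Python, assuming end components have already been collapsed and target/sink values fixed by graph preprocessing (toy MDP, hypothetical names):

```python
# Interval Iteration sketch for max-reachability. States 2 (target) and
# 3 (sink) are preprocessed; only states in P are updated.
P = {0: {"a": [(1, 0.5), (2, 0.25), (3, 0.25)]},
     1: {"b": [(2, 1.0)]}}
lo = {0: 0.0, 1: 0.0, 2: 1.0, 3: 0.0}   # x^(0) = 0 on unknown states
hi = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0}   # y^(0) = 1 on unknown states

def step(x):
    y = dict(x)                          # keeps target/sink entries fixed
    for s, acts in P.items():
        y[s] = max(sum(p * x[t] for t, p in succ) for succ in acts.values())
    return y

eps = 1e-6
while max(hi[s] - lo[s] for s in lo) > eps:
    lo, hi = step(lo), step(hi)

print(lo[0], hi[0])   # both 0.75 = P^max_0(reach target)
```

Every `lo` iterate is co-inductive and every `hi` iterate is inductive in exact arithmetic; the floating point caveat discussed next is precisely that this property can be lost to rounding.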
However, as we demonstrate experimentally (Section 7), inexact floating point arithmetic usually breaks (co-)inductivity of the elements in the II sequences, as was already reported in [55] in a similar setting. More precisely, let \(\mathcal {B}^{\operatorname {opt}}_\mathbb {F}\) be a “floating point variant” of \(\mathcal {B}^{\operatorname {opt}}\), i.e., the (exact) result of each operation is rounded to a nearest float. This is the default rounding mode in IEEE 754. Let \(x_{\mathbb {F}}^{(i)}\) be the i-th element, \(i > 0\), in the lower VI sequence of \(\mathcal {B}^{\operatorname {opt}}_\mathbb {F}\) starting from \(\vec{0}\). Then, due to rounding errors, \(x_{\mathbb {F}}^{(i)} \le \mathcal {B}^{\operatorname {opt}}(x_{\mathbb {F}}^{(i)})\) does not hold in general, i.e., \(x_{\mathbb {F}}^{(i)}\) might not be co-inductive. We propose two ways to mitigate this problem: Safe rounding [27] and Smooth II.
First, safe rounding amounts to configuring the IEEE 754 rounding mode so that results of floating point computations are always rounded towards 0 when iterating from below, and towards \(\infty \) when iterating from above. While safe rounding provably yields sound bounds [27], it may slow down or even prevent convergence of II. Nonetheless, in practice, II with safe rounding finds significantly more certificates than II with default rounding (Section 7).
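Python offers no portable way to switch the IEEE 754 rounding mode, but the effect of directed rounding can be emulated conservatively by nudging each result one ulp in the safe direction; a hedged sketch:

```python
import math

# Emulated safe rounding: after each floating point operation, move the
# result one ulp toward 0 (iterating from below) or toward inf (from
# above). This over-approximates what a real rounding-mode switch does.
def round_down(v):
    return math.nextafter(v, 0.0)

def round_up(v):
    return math.nextafter(v, math.inf)

s = 0.1 + 0.2            # round-to-nearest overshoots: 0.30000000000000004
print(round_down(s))     # 0.3: a genuine lower bound on the exact sum
print(round_up(s) > s)   # True: a genuine upper bound
```

Applying `round_down` after every Bellman update keeps the lower iterates genuinely below the exact fixed point, restoring co-inductivity at the price of slightly weaker bounds.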
Second, for Smooth II we define the \(\gamma \)-smooth Bellman operator (\(\gamma \in [0,1)\))
$$ \mathcal {B}^{\operatorname {opt}}_\gamma (x) ~=~ \gamma \cdot x+ (1-\gamma ) \cdot \mathcal {B}^{\operatorname {opt}}(x) ~, $$
where scalar multiplication and addition are component-wise. \(\mathcal {B}^{\operatorname {opt}}_\gamma \) and \(\mathcal {B}^{\operatorname {opt}}\) have the same fixed points, and every (co-)inductive value vector w.r.t. \(\mathcal {B}^{\operatorname {opt}}_\gamma \) is also (co-)inductive w.r.t. \(\mathcal {B}^{\operatorname {opt}}\) [17, App. G.1]. The key property of \(\mathcal {B}^{\operatorname {opt}}_\gamma \) compared to \(\mathcal {B}^{\operatorname {opt}}\) is that the former enforces ultimately strictly monotonic VI sequences. This mitigates the floating point rounding issues. Notice, however, that smoothing slows down convergence. Smoothing and safe rounding may be combined.
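A sketch of the smooth operator, applied to a hypothetical one-state toy operator with fixed point 1:

```python
# gamma-smooth Bellman operator: B_gamma(x) = gamma*x + (1-gamma)*B(x),
# component-wise. The toy operator b below is ours, not from the paper.
def smooth(bellman, gamma):
    def b_gamma(x):
        bx = bellman(x)
        return {s: gamma * x[s] + (1 - gamma) * bx[s] for s in x}
    return b_gamma

b = lambda x: {0: 0.5 * x[0] + 0.5}   # toy operator, unique fixed point 1.0
bg = smooth(b, 0.9)

print(bg({0: 1.0}))   # fixed points of B and B_gamma coincide
# Iterates from below are strictly increasing, mitigating rounding stalls:
x = {0: 0.0}
for _ in range(3):
    x = bg(x)
    print(x[0])       # strictly monotone, converging (slowly) toward 1.0
```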
Computing Ranking Functions. We briefly outline how to obtain the unique and least fixed points of \(\mathcal {D}^{\operatorname {opt}} \) and \(\overline{\mathcal {D}}^{\operatorname {opt}}\), respectively (see Definitions 1 and 2).
First, \(\operatorname {fp}{\mathcal {D}^{\operatorname {opt}} }\) can be computed via VI from \(r^{(0)} = \vec{\infty }\). This iteration converges in finitely many steps. Second, to compute \(\operatorname {lfp}{\overline{\mathcal {D}}^{\operatorname {opt}} }\) we propose to perform VI from \(r^{(0)}\) with \(r^{(0)}(s) = [\mathbb {P}^{\overline{\operatorname {opt}}}_s(\lozenge T) = 1] \cdot \infty \) for all \(s \in S\). The condition in the Iverson bracket can be evaluated using standard graph analysis [7, Section 10.6.1]. This iteration converges in finitely many steps as well, see [17, Apps. G.2 and G.3] for details and a practically more efficient algorithm.
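The first of these iterations can be sketched as follows (toy MDP and all names hypothetical; shown for the min-distance variant, where the rank is 1 plus the minimum over actions and successors):

```python
from math import inf

# VI for ranking functions: start from r^(0) = inf everywhere (0 on targets)
# and iterate the min-distance operator until stabilization.
P = {0: {"a": [(1, 1.0)], "b": [(0, 1.0)]}, 1: {"c": [(2, 1.0)]}}
T = {2}

def dist_step(r):
    y = {s: 0 for s in T}
    for s, acts in P.items():
        y[s] = 1 + min(min(r[t] for t, _ in succ) for succ in acts.values())
    return y

r = {s: 0 if s in T else inf for s in (0, 1, 2)}
while (nxt := dist_step(r)) != r:   # converges in at most |S| rounds
    r = nxt
print(r)   # ranks 2, 1, 0: shortest-path distances to the target
```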

7 Experimental Evaluation

Implementation. We implemented certificate computation as discussed in Section 6 in Storm  [33]. Given a higher-level model description (PRISM  [42] or Jani  [14]), and a reachability probability or expected reward query, our implementation proceeds in three steps: First, Storm builds an explicit MDP from the description. Second, it computes a certificate for both lower and upper bounds, such that the relative difference between the two values is at most \(\varepsilon = 10^{-6}\) for each MDP state. Finally, Storm checks the validity of the certificate.
Following the discussion in Section 6, we consider the following algorithms: Regarding exact computation, we use PI with exact LU decomposition, called PI\(^X\). For approximate computation with floating point arithmetic, we employ II. Further, to investigate the impact of the rounding error mitigation techniques from Section 6, we complement II with either safe rounding, smoothing with parameter \(\gamma \) (we consider \(\gamma \in \{0.05,0.8,0.9,0.95\}\)), or a combination of both. Overall, we compare PI\(^X\) and seven variants of II. We employ additional standard modifications of the algorithms, namely: We eliminate end components whenever possible, apply topological optimizations for PI\(^X\) and II, and apply Gauß-Seidel Bellman updates for II  [8, 29].
In all three steps of the implementation, we represent numbers as arbitrary precision rationals implemented in GMP [23]—except when running II in the second step (in which case we convert rationals to their nearest floats, potentially yielding invalid certificates). We thus certify reachability probabilities and expected rewards with respect to the exact MDP without rounding errors.
The MDP and the certificate computed with Storm can be exported and checked by an independent formally verified certificate checker. To construct the latter, we verified the correctness of the certificate checking algorithms in the interactive proof assistant Isabelle/HOL  [46], extending previous work on MDPs [34, 50] by total rewards and qualitative reachability properties. Based on this library, we proved the soundness of the certificates described in Sections 3 to 5. We used Isabelle/HOL ’s code export mechanism [25] to obtain a verified, executable Standard ML implementation that employs exact rational arithmetic. The construction of the MDP from a PRISM or Jani model as well as export and parsing of MDPs and certificates are currently not verified.
Benchmarks and Setup. We use all 366 benchmark instances from the Quantitative Verification Benchmark Set (QVBS) [32] that (i) consider an MDP with a reachability or reward objective and (ii) for which Storm can build an explicit representation within 5 minutes. Additionally, since the QVBS contains no models exhibiting non-trivial ECs, we include 71 structurally diverse models from various sources detailed in [30, Sec. 5.3]. Overall, we consider the complete alljani set from [30]. We invoke Storm for each combination of benchmark instance and certificate algorithm and report the overall runtime (walltime). All experiments ran on Intel Xeon 8468 Sapphire 2.1 GHz systems. We used Slurm to limit the individual executions to 4 CPU cores and 16GB of RAM, with a time limit of 900s. Next, we discuss our findings by answering three research questions (RQs).
Fig. 3.
RQ1: Runtime for computing certificates of PI\(^X\) and several combinations of mitigation techniques with II (left); and detailed comparison of PI\(^X\) and II with smoothing and safe rounding (right).
RQ1: Best algorithm for certificate generation? Figure 3 (left) compares the runtimes of PI\(^X\) and our seven II variants. A point \((x, y)\) for algorithm A indicates that there are x instances for which A computes a valid certificate within y seconds (including time for model construction but excluding time for exporting the certificate files). The triples in the legend indicate that the algorithm produced a total of v valid and w invalid certificates (with invalidity likely due to floating point issues), while for the remaining u instances no result was found within the resource limits. As expected, all certificates produced by the exact PI\(^X\) are valid, while standard II produces many invalid certificates. Safe rounding and smoothing improve the number of valid certificates. Notably, II with only smoothing performs best for \(\gamma \) values close to 1, while the performance of II with both smoothing and safe rounding is less sensitive towards \(\gamma \); see [17, App. H] for more details. Among all II variants, the combination of smoothing and safe rounding shows the best overall performance.
The scatter plot in Figure 3 (right) further compares PI\(^X\) and II with smoothing and safe rounding. A data point \((x, y)\) corresponds to one benchmark instance, where x and y are the runtimes of the two algorithms, respectively. A point at \(\ge 300\) indicates a runtime between 300 and 900 seconds, inval means an invalid certificate, and n/a denotes a computation aborted due to time/memory limits. Many instances that PI\(^X\) cannot solve are solved by the II variant and vice versa. This is already the case without computing certificates, as the structure of a benchmark affects the performance of the algorithms differently [29, 30]. Thus, as in the case without computing certificates, there is no “best algorithm”, and both PI\(^X\) and variants of II can be considered. Overall, 396 out of 447 instances are correctly solved and certified by PI\(^X\) or II with smoothing and safe rounding (or both). We highlight that PI\(^X\) is not only a complete certifying algorithm, but also practically efficient, even though it uses exact arithmetic.
Fig. 4.
RQ2: Runtime overhead of certified MDP model checking for PI\(^X\) (left) and II (middle), and scalability of both with respect to the number of states (right).
Fig. 5.
RQ3: Runtime overhead of the certified pipeline (left) and runtime of certificate checking (right).
RQ2: Runtime overhead of certificate generation? Figure 4 (left/middle) reports the runtime overhead of generating a certificate for PI\(^X\) and II with smoothing and safe rounding. For PI\(^X\), the overhead is typically within a factor of 2, often significantly less. It is sometimes faster due to implementation differences in the certifying variant of PI\(^X\). For the II variant, the overhead is slightly larger, typically a factor of 1.5 to 4. This is partly due to the slower convergence caused by smoothing. Figure 4 (right) investigates the scalability of certificate generation with respect to the number of states. For MDPs with up to \(10^5\) states, certificate generation usually completes within a minute (often much less); for more than \(10^7\) states, it usually times out.
RQ3: Scalability of the formally verified certificate checker? Figure 5 (left) compares the runtime of the full pipeline including certificate generation and verification using our formally verified checker (PI\(^X\) +cert) with plain, uncertified MDP model checking based on PI\(^X\). Compared to Figure 4 (left), the added verification of the certificates causes additional time/memory outs, and roughly doubles the runtime of the other instances. Figure 5 (right) reveals that parsing is currently a major bottleneck in the verified checker. Nonetheless, the checker completes within a few seconds on MDPs with up to \(\approx 10^5\) states, and usually within 30s for instances with up to \(\approx 10^6\) states.

8 Conclusion and Future Work

We proposed fixed point certificates as a new standard for certified model checking of reachability and expected reward properties in MDPs. The soundness of these certificates was formalized in Isabelle/HOL, increasing their trustworthiness and enabling us to generate a formally verified certificate checker, applicable to non-trivial practically relevant instances. Our certificates can be generated with moderate overhead via minor, yet careful, modifications of established algorithms like II or PI. This allows tool developers and competitions [15]—for which our certificates provide formally verified reference results—to adopt our proposal with relatively low effort. Future work is to develop a more efficient certificate format. Further, we plan to extend our theory to other quantitative verification settings [3], e.g., stochastic games and \(\omega \)-regular properties, and make it amenable to techniques such as symbolic model checking and partial exploration.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Footnotes
1
Aka. scheduler or policy. We do not define more general strategies, as memoryless deterministic ones suffice for optimizing reachability probabilities and expected rewards.
 
2
Dually, an upper bound on a \(\min \)-reachability probability is also witnessed by a strategy, but our corresponding certificates from Proposition 3 do not rely on this.
 
3
For expected rewards, II additionally requires computing an upper bound, see [8].
 
References
2. de Alfaro, L.: Formal verification of probabilistic systems. Ph.D. thesis, Stanford University, USA (1997)
3. Andriushchenko, R., Bork, A., Budde, C.E., Češka, M., Hahn, E.M., Hartmanns, A., Israelsen, B., Jansen, N., Jeppson, J., Junges, S., Köhl, M.A., Könighofer, B., Křetínský, J., Meggendorfer, T., Parker, D., Pranger, S., Quatmann, T., Ruijters, E., Taylor, L., Volk, M., Weininger, M., Zhang, Z.: Tools at the frontiers of quantitative verification: QComp 2023 competition report. In: International TOOLympics Challenge, pp. 90–146. Springer (2024)
7. Baier, C., Katoen, J.: Principles of model checking. MIT Press (2008)
8. Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In: CAV (1). Lecture Notes in Computer Science, vol. 10426, pp. 160–180. Springer (2017). https://doi.org/10.1007/978-3-319-63387-9_8
10.
13. Brázdil, T., Chatterjee, K., Chmelik, M., Forejt, V., Kretínský, J., Kwiatkowska, M.Z., Parker, D., Ujma, M.: Verification of Markov decision processes using learning algorithms. In: ATVA. Lecture Notes in Computer Science, vol. 8837, pp. 98–114. Springer (2014). https://doi.org/10.1007/978-3-319-11936-6_8
15. Budde, C.E., Hartmanns, A., Klauck, M., Kretínský, J., Parker, D., Quatmann, T., Turrini, A., Zhang, Z.: On correctness, precision, and performance in quantitative verification - QComp 2020 competition report. In: ISoLA (4). Lecture Notes in Computer Science, vol. 12479, pp. 216–241. Springer (2020). https://doi.org/10.1007/978-3-030-83723-5_15
17.
23. Granlund, T., Team, G.D.: GNU MP 6.0 Multiple Precision Arithmetic Library. Samurai Media Limited (2015)
26. Hahn, E.M., Hartmanns, A., Hensel, C., Klauck, M., Klein, J., Kretínský, J., Parker, D., Quatmann, T., Ruijters, E., Steinmetz, M.: The 2019 comparison of tools for the analysis of quantitative formal models - (QComp 2019 competition report). In: TACAS (3). Lecture Notes in Computer Science, vol. 11429, pp. 69–92. Springer (2019). https://doi.org/10.1007/978-3-030-17502-3_5
33. Hensel, C., Junges, S., Katoen, J., Quatmann, T., Volk, M.: The probabilistic model checker Storm. Int. J. Softw. Tools Technol. Transf. 24(4), 589–610 (2022). https://doi.org/10.1007/s10009-021-00633-z
35. Jantsch, S.: Certificates and Witnesses for Probabilistic Model Checking. Ph.D. thesis, Dresden University of Technology, Germany (2022)
Metadata
Title: Fixed Point Certificates for Reachability and Expected Rewards in MDPs
Authors: Krishnendu Chatterjee, Tim Quatmann, Maximilian Schäffeler, Maximilian Weininger, Tobias Winkler, Daniel Zilken
Copyright year: 2025
DOI: https://doi.org/10.1007/978-3-031-90653-4_7