Abstract
We consider the problem of refuting equivalence of probabilistic programs, i.e., the problem of proving that two probabilistic programs induce different output distributions. We study this problem in the context of programs with conditioning (i.e., with observe and score statements), where the output distribution is conditioned by the event that all the observe statements along a run evaluate to true, and where the probability densities of different runs may be updated via the score statements. Building on a recent work on programs without conditioning, we present a new equivalence refutation method for programs with conditioning. Our method is based on weighted restarting, a novel transformation of probabilistic programs with conditioning to the output equivalent probabilistic programs without conditioning that we introduce in this work. Our method is the first to be both a) fully automated, and b) providing provably correct answers. We demonstrate the applicability of our method on a set of programs from the probabilistic inference literature.
1 Introduction
Probabilistic programs. Probabilistic programs (PPs) are standard programs extended with probabilistic instructions, such as sampling of variable values from probability distributions. PPs are used in numerous application domains, such as networking [33], privacy and security [7, 8], or planning and robotics [35, 58], to name a few. Unlike deterministic programs, which (assuming they terminate) for each concrete input produce a single output, PPs induce output distributions, i.e. probability distributions over the space of possible outputs.
Conditioning. While PPs with sampling instructions can implement various randomized algorithms, the recent proliferation of probabilistic programming is driven largely by advances in machine learning, where PPs serve as tools for Bayesian inference. For this, a PP has to be equipped with observe statements, which can be used to condition the data in the program’s variables through observations, as well as score statements, which can be used to assign different probability densities to different program executions. The probability of the non-blocked runs is then renormalized [12, 43, 53]. In this way, the program’s output distribution is conditioned by the event that the program’s run satisfies all observe-statement assertions it encounters, and all program run probability densities are renormalized according to their cumulative score values.
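To make the renormalization concrete, the conditional semantics can be mimicked by weighted Monte Carlo sampling: run the program many times, weight each run by the product of its score values (with weight 0 for runs blocked by a failed observe), and normalize by the total weight. The following toy two-dice program and estimator are our own illustration, not part of the paper:

```python
import random

def run_once(rng):
    """One run of a toy PP: sample two dice, observe that their sum is >= 8,
    and score the run by the value of the first die."""
    x = rng.randint(1, 6)
    y = rng.randint(1, 6)
    weight = 1.0
    # observe(x + y >= 8): blocked runs get weight 0
    if not (x + y >= 8):
        weight = 0.0
    # score(x): reweight the run's density by x
    weight *= x
    return (x, y), weight

def normalized_output_probability(pred, n=200_000, seed=0):
    """Monte Carlo estimate of the conditioned output probability:
    P[pred] = E[1_pred * W] / E[W], with W the run's cumulative weight."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        out, w = run_once(rng)
        den += w
        if pred(out):
            num += w
    return num / den
```

For this toy program the exact conditional probability of the first die being 6 is 30/70 (each unblocked pair with first die x contributes weight x, and there are x - 1 such pairs), and the estimator approaches this value as the number of samples grows.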
Static and relational analysis of PPs. Given their inherent randomness, PPs are extremely difficult to debug. This has spurred demand for formal analyses of PPs so as to obtain provable assurances of their correctness. Over the past decade, numerous static analyses for PPs were developed, focusing mostly on properties of individual programs: termination (e.g. [1, 14, 19, 22, 25, 26, 32, 46, 47, 49, 56, 57]), safety [10, 11, 20, 25, 54, 56, 63], runtime and cost analysis [5, 21, 29, 52, 60, 62], or input-output behavior [27]. Recently, attention has been increasingly given to verifying relational properties of PPs, i.e., properties of program pairs, such as sensitivity [2, 6, 42, 61] or differential privacy [3, 7].
Refuting equivalence of PPs. In this work, we study the relational property of program equivalence. Two PPs with fixed inputs are called equivalent if they induce the same output distributions [45, 48, 51]. Equivalence analysis allows one to check, e.g., whether compiler optimizations preserve the input-output behavior of the program, or if a sampler follows a desired distribution [15]. We focus on a bug-hunting aspect of the equivalence property, i.e., equivalence refutation: given two PPs, prove that their output distributions differ. Recently, the work [23] presented an automated method for equivalence refutation in PPs that has formal correctness guarantees. However, the method of [23] works only for PPs without conditioning. In this paper, we focus on refuting equivalence of PPs with conditioning, i.e., proving the inequality of their conditional output distributions.
Our contributions. We present a new method for refuting equivalence of PPs. Our method is the first to have all of the following properties: (i) it is fully automated, (ii) it is applicable to infinite-state, discrete and continuous, Turing-complete PPs with conditioning, and (iii) it provides provably correct refutations.
We build on the notions of upper and lower expectation supermartingales (U/LESMs) presented in [23]. U/LESMs map program states to real numbers, providing upper and lower bounds on the expected value of some function \( f \) at output. During the refutation process, the concrete function \( f \) is synthesized along with the corresponding UESM and LESM to provide the required refutation proof: for a given pair \( (P_1, P_2) \) of PPs, we seek \( f \) whose expected value at the output of \( P_1 \) is provably different (due to the bounds provided by U/LESMs) from its expected value at the output of \( P_2 \); this witnesses the difference of the underlying output distributions. The computation of \( f \) and the U/LESMs proceeds via polynomial constraint solving. However, the method is restricted to PPs without conditioning, and its extension to PPs with conditioning was left as an open problem (see [23, Section 9]). Our contributions are as follows:
1.
Equivalence refutation for PPs with conditioning. We extend the above to the much more general setting of PPs with conditioning, allowing for both observe and score statements. We achieve this via the notion of weighted restarting in PPs that we introduce in this work. Weighted restarting is a construction that translates a PP with conditioning into a PP without conditioning, whose output distribution is equivalent to that of the original PP. Hence, by applying weighted restarting to two PPs with conditioning, we can reduce their equivalence refutation to refuting equivalence of a pair of PPs without conditioning, for which we can utilize the method of [23].
2.
Soundness and completeness guarantees. While the method of [23] was proved to be sound, completeness of the proof rule was not studied and, to the best of our knowledge, no sound and complete proof rule for equivalence refutation in PPs has been proposed. In this work, we show that the proof rule based on U/LESMs is sound and complete for refuting equivalence. This result is novel both in the settings of PPs with and without conditioning. Moreover, it also allows us to prove that the polynomial constraint solving algorithm for computing a function f over program outputs together with the U/LESMs is semi-complete, in the sense that it is guaranteed to be able to refute equivalence of PPs whenever equivalence can be witnessed by a function f and a pair of U/LESMs that can be specified via polynomial functions of some maximal polynomial degree D. Note that this is a significant guarantee, given that the problem of equivalence checking of two programs is undecidable [36], hence no algorithm can be both sound and complete.
3.
Experimental evaluation. We evaluate our approach on benchmarks from the well-known PSI inference tool [34], as well as on several loopy programs. We compare with a PSI+Mathematica [65] pipeline. Our method significantly outperforms the baseline in terms of the number of refuted equivalences.
Finally, while the focus of this work is on equivalence refutation, weighted restarting also allows us to design a new method for similarity refutation in PPs with conditioning. The similarity refutation problem is concerned with computing a lower bound on the distance between output distributions of two PPs. The work [23] proposed a method for similarity refutation in PPs without conditioning, and in this work we generalize it to the much more general setting of PPs with conditioning. Due to space restrictions, we defer this result to the extended version of the paper [24]. However, in our experiments in Section 6, for each refuted equivalence we also report a lower bound on the Kantorovich distance [59] computed by our method.
Related work. The study of PP equivalence is partly motivated by the necessity to reason about the correctness and accuracy of probabilistic inference algorithms. Previous approaches to this issue [9, 13, 16, 28, 30, 38, 39] employ sampling methods that provide statistical guarantees on the correctness of the result.
Relational analyses of PPs mostly focused on sensitivity properties [2, 6, 42, 61], often using coupling proofs [3, 7]. Intuitively, given a probabilistic program and its two inputs, the goal here is to bound the distance of output distributions corresponding to these inputs. In particular, the relational analysis in this context focuses on relations between different inputs, rather than different programs. Indeed, the previous approaches to sensitivity analysis typically exploit the "aligned" control flow of two executions on sufficiently close inputs (with a notable exception of the recent work [37], which aims to dispel the alignment requirement). Our method makes no such assumptions. Moreover, none of the aforementioned works except for [42] considers programs with conditioning.
The work [66] uses “potential functions” to analyze resource usage difference of a program pair. Potential functions for the two programs are computed simultaneously, similarly to our approach. However, the work does not involve probabilistic programs and its overall aim (cost analysis) differs from ours.
2 Preliminaries
We assume that the reader is familiar with notions from probability theory such as probability measure, random variable and expected value. The term probability distribution refers to a probability measure over a subset of Euclidean space \(\mathbb {R}^n\). We use boldface notation to represent vectors, e.g. \(\textbf{x}\) or \(\textbf{y}\).
Syntax. We consider imperative arithmetic probabilistic programs (PPs) allowing standard programming constructs such as variable assignments, if-branching, sequential composition, go-to statements, and loops. All program variables are assumed to be integer- or real-valued. For PP semantics to be well defined, we assume that all arithmetic expressions are Borel-measurable functions [64].
In addition, we allow two types of probabilistic instructions, namely sampling and conditioning. Sampling instructions appear on the right-hand-side of program variable assignments and are represented in our syntax by a command \({\textbf {sample}}(d)\), where d is a probability distribution. We do not impose specific restrictions and allow sampling instructions from both discrete and continuous distributions. Importantly, we allow the two standard conditioning instructions: \({\textbf {observe}}(\phi )\), where \(\phi \) is a boolean predicate over program variables, and \( {\textbf {score}}(e) \), where \( e \) is a non-negative expression over program variables. A program is conditioning-free if it does not contain any conditioning instructions.
Probabilistic branching instructions, i.e. \({\textbf {if prob}}(p)\), can be obtained as syntactic sugar by using a sampling instruction followed by conditional branching.
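For instance, \({\textbf {if prob}}(p)\) can be desugared as follows (with \( r \) a fresh program variable; the concrete encoding is not fixed by the paper):

$$\begin{aligned} & r := {\textbf {sample}}(\textit{Bernoulli}(p)); \\ & {\textbf {if }} r = 1 {\textbf { then }} \{\dots \} {\textbf { else }} \{\dots \} \end{aligned}$$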
Probabilistic control flow graphs (pCFGs). In order to formalize the semantics of our PPs and to present our results and algorithms, we consider an operational representation of PPs called probabilistic control-flow graphs (pCFGs) extended with the concept of weighting as in [63]. The use of pCFGs in the analysis of PPs is standard, see e.g. [1, 19]. Hence, we keep this exposition brief. Formally, a pCFG is a tuple \(\mathcal {C}=( L ,V,V_{ out },\ell _{ in },\textbf{x}_{ in },\mapsto ,G, Up ,w,\ell _{ out })\), where:
\( L \) is a finite set of locations;
\(V=\{x_1,\dots ,x_{|V|}\}\) is a finite set of program variables;
\(V_{ out }= \{x_1,\dots ,x_{|V_{ out }|}\} \subseteq V\) is a finite set of output variables;
\(\ell _{ in }\in L \) is the initial location and \(\textbf{x}_{ in }\in \mathbb {R}^{|V|}\) the initial variable valuation;
\(\mapsto \,\subseteq L \times L \) is a finite set of transitions;
\(G\) is a map assigning to each transition \(\tau \in \,\mapsto \) a guard \(G(\tau )\), which is a predicate over V specifying whether \(\tau \) can be executed;
\( Up \) is a map assigning to each transition \(\tau \in \,\mapsto \) an update \( Up (\tau )=(u_1,\dots ,u_{|V|})\) where for each \(j\in \{1,\dots ,|V|\}\) the variable update \(u_j\) is either
the bottom element \(\bot \) denoting no variable update, or
a Borel-measurable arithmetic expression \(u:\mathbb {R}^{|V|}\rightarrow \mathbb {R}\), or
a probability distribution \(u = \delta \);
\( w:\mapsto \times \mathbb {R}^{|V|} \rightarrow [0,\infty )\) is a weighting function, and
\(\ell _{ out }\in L \) the terminal location.
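As an illustration only (the paper treats pCFGs purely mathematically), the tuple can be mirrored by a simple data structure; all names below are our own:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Valuation = Tuple[float, ...]   # a point in R^{|V|}
Transition = Tuple[str, str]    # (source location, target location)

@dataclass
class PCFG:
    """Sketch of the pCFG tuple (L, V, V_out, l_in, x_in, ->, G, Up, w, l_out)."""
    locations: List[str]                                    # L
    variables: List[str]                                    # V
    out_variables: List[str]                                # V_out, a subset of V
    initial_location: str                                   # l_in
    initial_valuation: Valuation                            # x_in
    transitions: List[Transition]                           # ->
    guard: Dict[Transition, Callable[[Valuation], bool]]    # G
    update: Dict[Transition, Tuple]                         # Up: per-variable updates
    weight: Callable[[Transition, Valuation], float]        # w (1 unless conditioning)
    terminal_location: str                                  # l_out
```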
Translation of PPs into pCFGs is standard [1, 19, 63], hence we omit the details. We denote by \( \mathcal {C}(P) \) the pCFG induced by a program \( P \). Intuitively, locations of \( \mathcal {C}(P) \) correspond to program instructions, while the transition(s) out of a location encode the effects of the respective instruction. The weight of a transition is 1 unless the transition encodes one of the conditioning instructions: if \( \tau \) encodes a \( {\textbf {score}}(e) \) instruction, we put \( w(\tau ,\textbf{x}) = e(\textbf{x}) \); if \( \tau \) encodes an \( {\textbf {observe}}(\phi ) \) statement, we put \( w(\tau ,\textbf{x}) = 1 \) if \( \textbf{x} \models \phi \) and \( w(\tau ,\textbf{x}) = 0 \) otherwise. Note that if \( P \) is conditioning-free then in \( \mathcal {C}(P) \) we have \( w(\tau ,\textbf{x}) = 1 \) for all \( \tau \) and \( \textbf{x} \).
Example 1
(Running example). Fig. 1 shows a PP pair that will serve as our running example. Program variables in both PPs are \(V = \{x,y,c,r\}\), with output variables \(V_{ out }= \{x,y\}\). In each loop iteration, with probability 0.5 the variable x is decremented by a random value sampled according to a beta distribution, whereas with probability 0.5 the variable y is incremented by 1. Hence, y tracks the number of loop iterations in which x is not decremented. Upon loop termination, the score command is used to multiply the probability density of each run by the value of y upon termination. Thus, larger probability density is given to program runs that modify x a smaller number of times. The programs differ only in the parameters of the beta distribution (highlighted in blue).
Fig. 1.
Running example. Labels \(\ell _{ in }\), \(\ell _1, \dots , \ell _7, \ell _{ out }\) correspond to locations in the probabilistic control-flow graphs (pCFGs) of both programs.
States, paths and runs. A state in a pCFG \(\mathcal {C}\) is a tuple \((\ell ,\textbf{x})\), where \(\ell \) is a location and \(\textbf{x}\in \mathbb {R}^{|V|}\) is a variable valuation. A transition \(\tau =(\ell ,\ell ')\) is enabled at a state \((\ell ,\textbf{x})\) if \(\textbf{x}\models G(\tau )\). A state \((\ell ',\textbf{x}')\) is a successor of \((\ell ,\textbf{x})\), if there exists an enabled transition \(\tau =(\ell ,\ell ')\) in \(\mathcal {C}\) such that \(\textbf{x}'\) is a possible result of applying the update of \(\tau \) to \(\textbf{x}\). The state \((\ell _{ in },\textbf{x}_{ in })\) is the initial state. A state \((\ell ,\textbf{x})\) is said to be terminal, if \(\ell =\ell _{ out }\). We use \( State ^\mathcal {C}\) to denote the set of all states in \(\mathcal {C}\).
A finite path in \(\mathcal {C}\) is a sequence of successor states \((\ell _0,\textbf{x}_0),(\ell _1,\textbf{x}_1),\dots ,(\ell _k,\textbf{x}_k)\) with \((\ell _0,\textbf{x}_0)=(\ell _{ in },\textbf{x}_{ in })\). A state \((\ell ,\textbf{x})\) is reachable in \(\mathcal {C}\) if there exists a finite path in \(\mathcal {C}\) whose last state is \((\ell ,\textbf{x})\). A run (or an execution) in \(\mathcal {C}\) is an infinite sequence of states each of whose finite prefixes is a finite path. We use \( Fpath ^\mathcal {C}\), \( Run ^\mathcal {C}\), and \( Reach ^\mathcal {C}\) to denote the sets of all finite paths, runs, and reachable states in \(\mathcal {C}\).
A run \(\rho \in Run ^\mathcal {C}\) is terminating if it reaches some terminal state \((\ell _{ out },\textbf{x})\). We use \( Term \subseteq Run ^{\mathcal {C}}\) to denote the set of terminating runs in \( Run ^{\mathcal {C}}\). We define the termination time of \(\rho \) via \(T(\rho ) = \inf _{i \ge 0}\{i \mid \ell _i = \ell _{ out }\}\), with \(T(\rho ) = \infty \) if \(\rho \) is not terminating. The cumulative weight of a terminating run \( \rho = (\ell _0,\textbf{x}_0),(\ell _1,\textbf{x}_1),\dots \) is \( W(\rho ) = \prod _{i=0}^{T(\rho )-1} w((\ell _i,\ell _{i+1}),\textbf{x}_i), \) with an empty product being equal to 1.
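The cumulative weight is a simple product over the transitions taken before termination; the following helper (our illustration, not from the paper) computes it for a run given as a list of (location, valuation) pairs:

```python
def cumulative_weight(run, weight, T):
    """W(rho) = prod_{i=0}^{T-1} w((l_i, l_{i+1}), x_i), empty product = 1.

    `run`    -- list of (location, valuation) pairs along the run
    `weight` -- the pCFG weighting function w(transition, valuation)
    `T`      -- the termination time of the run
    """
    W = 1.0
    for i in range(T):
        (l_i, x_i), (l_next, _) = run[i], run[i + 1]
        W *= weight((l_i, l_next), x_i)
    return W
```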
Semantics. The pCFG semantics are formalized as Markov chains with weights [63]. In particular, a pCFG \(\mathcal {C}\) defines a discrete-time Markov process over the states of \(\mathcal {C}\) whose trajectories correspond to runs in \(\mathcal {C}\). Each trajectory starts in the initial state \((\ell _{ in },\textbf{x}_{ in })\) with weight 1. Then, at each time step \(i \in \mathbb {N}_0\), if the trajectory is in state \((\ell _i,\textbf{x}_i)\), the next trajectory state \((\ell _{i+1},\textbf{x}_{i+1})\) is defined according to the unique pCFG transition \(\tau _i\) enabled at \((\ell _i,\textbf{x}_i)\), and the weight of the trajectory is multiplied by the value of the weight function \(w(\tau _i,\textbf{x}_i)\).
This Markov process gives rise to a probability space \(( Run ^\mathcal {C}, \mathcal {F}^\mathcal {C}, \mathbb {P}^\mathcal {C})\) over the set of all pCFG runs, formally defined via the cylinder construction [50]. We use \(\mathbb {E}^\mathcal {C}\) to denote the expectation operator in this probability space.
Almost-sure termination. We restrict our attention to almost-surely terminating programs, which is necessary for them to define valid probability distributions on output. A program P (or, equivalently, its pCFG \( \mathcal {C}(P) \)) terminates almost-surely (a.s.) if \(\mathbb {P}^{\mathcal {C}(P)}[ Term ] = 1\). A.s. termination can be automatically verified in arithmetic PPs, by e.g. synthesizing a ranking supermartingale (RSM) [14, 17].
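For instance (our example, in the spirit of [14, 17]): for the loop \({\textbf {while }} x > 0: x := x - {\textbf {sample}}(\textit{Uniform}(0,2))\), the state function \(\eta (\ell ,x) = x\) is nonnegative whenever the loop guard holds and decreases in expectation in every iteration,

$$\begin{aligned} \mathbb {E}\big [\eta (\ell , x')\big ] = x - 1 \le \eta (\ell , x) - \varepsilon \quad \text {with } \varepsilon = 1, \end{aligned}$$

hence it is an RSM certifying a.s. termination of the loop.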
3 Equivalence Refutation Problem
For each variable valuation \(\textbf{x} \in \mathbb {R}^{|V|}\), we use \(\textbf{x}^{ out }\) to denote its projection to the components of output variables \(V_{ out }\). For a terminating run \(\rho \) that reaches a terminal state \((\ell _{ out },\textbf{x})\), we say that \(\textbf{x}^{ out }\) is the output of \( \rho \). In an a.s. terminating pCFG \( \mathcal {C}\), the probability measure \( \mathbb {P}^\mathcal {C}\) over runs naturally extends to a probability measure over sets of outputs, weighted by the cumulative weights of individual runs and normalized by the total expected weight over all runs. Formally, the normalized output distribution (NOD) of \(\mathcal {C}\) is the probability distribution over all output variable valuations \(\mathbb {R}^{|V_{ out }|}\) defined by
$$\begin{aligned} \mu ^{\mathcal {C}}[B] = \frac{\mathbb {E}^{\mathcal {C}}\big [\mathbb {I}_{ Output (B)} \cdot W\big ]}{\mathbb {E}^{\mathcal {C}}\big [W\big ]} \end{aligned}$$
where (i) B is any Borel-measurable subset of \( \mathbb {R}^{|V_{ out }|} \), (ii) \( \mathbb {I}_{A} \) is the indicator variable of a Borel-measurable set of runs \( A \), and (iii):
$$\begin{aligned} Output (B) = \big \{ \rho \in Run ^\mathcal {C}\mid \rho \textit{ reaches a terminal state } (\ell _{ out },\textbf{x}) \text { s.t. } \textbf{x}^{ out } \in B \big \}. \end{aligned}$$
For the NOD \( \mu ^{\mathcal {C}} \) to be well-defined, we restrict our attention to pCFGs where \( 0 < \mathbb {E}^{\mathcal {C}}[W] < \infty \), in which case we call \( \mathcal {C}\) integrable (in line with [63]).
Problem assumptions. For a pair of PPs \(P^1\) and \(P^2\), we assume that: (1) both \(\mathcal {C}(P^1)\) and \(\mathcal {C}(P^2)\) are a.s. terminating and integrable, for their output distributions to be well defined, and (2) \(\mathcal {C}(P^1)\) and \(\mathcal {C}(P^2)\) share the same set of output variables \(V_{ out }\), for their output distributions to be defined over the same sample space so that their equivalence is well defined. Our algorithm will automatically check (1), and (2) is trivially checked by comparing the two input pCFGs.
Problem. Let \(P^1\) and \(P^2\) be two PPs with pCFGs \(\mathcal {C}(P^1)\) and \(\mathcal {C}(P^2)\) being a.s. terminating, integrable, and having the same set of output variables \(V_{ out }\). Our goal is to prove that \(P^1\) and \(P^2\) are not (output) equivalent, i.e. that there exists a Borel-measurable set \(B \subseteq \mathbb {R}^{|V_{ out }|}\) such that \(\mu ^{\mathcal {C}(P^1)}[B] \ne \mu ^{\mathcal {C}(P^2)}[B]\).
4 Equivalence Refutation Proof Rule
In this section, we first introduce the weighted restarting transformation, and then use it to formulate a proof rule for equivalence refutation in programs with conditioning. We also show the completeness of this proof rule.
4.1 Weighted Restarting
We introduce weighted restarting, which transforms a PP into an output-equivalent conditioning-free PP. Our method of weighted restarting requires PPs and their induced pCFGs to satisfy the bounded cumulative weight property.
Assumption: Bounded cumulative weight property. We say that a program \( P \) (or equivalently, its pCFG \( \mathcal {C}(P) \)) satisfies the bounded cumulative weight property, if there exists \(M > 0\) such that the cumulative weight of every terminating run \(\rho \) is bounded from above by M, i.e. \(W(\rho ) \le M\). Note that, in an a.s. terminating pCFG \(\mathcal {C}\), the bounded cumulative weight property also implies the upper bound in the integrability property, since if \(W(\rho ) \le M\) holds for all terminating runs then we also have \(\mathbb {E}^{\mathcal {C}(P)}[W] \le M < \infty \). Our algorithm for equivalence refutation in Section 5 will formally verify this property and compute the value of M.
Weighted restarting. Consider a PP P which is a.s. terminating, integrable, and satisfies the bounded cumulative weight property with bound \(M>0\). Weighted restarting produces an output-equivalent conditioning-free PP \(P_{\text {res}}\). To achieve this, it introduces a fresh program variable W, which we call the weight variable, to capture all information encoded via observe and score instructions. This allows us to remove all conditioning instructions from the PP. Then, for the output distribution to be equivalent to the normalized output distribution of the original PP, it adds a block of code that terminates a program run with probability proportional to its cumulative weight, and otherwise restarts it and moves it back to the initial program state. This construction is formalized as follows:
1.
Introduce weight variable W. Introduce a fresh program variable W. The value of W is initially set to \(W=1\).
2.
Remove observe instructions. Each \({\textbf {observe}}(\phi )\) instruction is replaced by the following block of code, which sets the cumulative weight of a run to 0 if the predicate in the observe instruction is false:
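The listing is omitted in this excerpt; reconstructed from the surrounding description, it amounts to

$$\begin{aligned} {\textbf {if }} \lnot \phi {\textbf { then }} W := 0 \end{aligned}$$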
3.
Remove score instructions. Each \({\textbf {score}}(e)\) instruction is replaced by the following block of code, which multiplies the weight variable W by the value of the expression in the score instruction:
$$\begin{aligned} W := W \cdot e \end{aligned}$$
4.
Add restarting upon termination. Finally, the following block of code is added at the end of the PP, which with probability \(1 - W/M\) "restarts" a program run by moving it back to the initial state and resetting its weight to \(W=1\). We write "\(\textbf{x} := \textbf{x}_{ in }\)" as a shorthand for the block of code that restarts each program variable value to that specified by the initial variable valuation \(\textbf{x}_{ in }\):
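The listing is omitted in this excerpt; reconstructed from the description above, the added block amounts to

$$\begin{aligned} {\textbf {if prob}}(1 - W/M) {\textbf { then }} \{\ \textbf{x} := \textbf{x}_{ in };\ W := 1;\ {\textbf {go-to }} \text {the initial location}\ \} \end{aligned}$$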
The work [53] also proposed a "restarting" procedure for PPs with observe, but without score, instructions. This is achieved by introducing a new boolean variable \(\texttt {unblocked}\) whose value is initially true, but is set to false if any \({\textbf {observe}}(\phi )\) condition is violated. The program is then embedded into a "\({\textbf {while }} \texttt {unblocked}\):" loop. However, it is not clear how to extend this translation to PPs with score instructions, whereas our method supports both observe and score instructions.
Theorem 1
(Correctness of weighted restarting, proof in [24]). Consider a PP P which is a.s. terminating, integrable, and satisfies the bounded cumulative weight property with bound \(M>0\). Let \(P_{\text {res}}\) be a conditioning-free PP produced by the weighted restarting procedure. Then, \(P_{\text {res}}\) is also a.s. terminating, integrable, and satisfies the bounded cumulative weight property. Moreover, \(\mathcal {C}(P)\) and \(\mathcal {C}(P_{\text {res}})\) are output equivalent, i.e. \( \mu ^{\mathcal {C}(P)} = \mu ^{\mathcal {C}(P_{\text {res}})} \).
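The output equivalence of Theorem 1 can be checked numerically on a toy example (our illustration, not from the paper): a program that flips a fair coin \( x \) and then executes \({\textbf {score}}(1+x)\), so that \( M = 2 \) and the normalized probability of \( x = 1 \) is \( 2/3 \). Below, the first estimator computes the NOD directly, and the second simulates the restarted, conditioning-free program:

```python
import random

M = 2.0  # bound on the cumulative weight of the toy program below

def run_toy(rng):
    """Toy PP: x ~ Bernoulli(1/2); score(1 + x). Returns (output, weight)."""
    x = rng.randint(0, 1)
    return x, 1.0 + x

def nod_by_normalization(rng, n):
    """NOD of the toy program, estimated directly: E[1_{x=1} * W] / E[W]."""
    num = den = 0.0
    for _ in range(n):
        x, w = run_toy(rng)
        den += w
        num += w if x == 1 else 0.0
    return num / den

def nod_by_weighted_restart(rng, n):
    """Output distribution of the restarted, conditioning-free program:
    terminate with probability W/M, otherwise restart from the initial state."""
    hits = 0
    for _ in range(n):
        while True:
            x, w = run_toy(rng)
            if rng.random() < w / M:  # keep the run with probability W/M ...
                break                 # ... otherwise restart it
        hits += (x == 1)
    return hits / n
```

Both estimators converge to 2/3, illustrating \( \mu ^{\mathcal {C}(P)} = \mu ^{\mathcal {C}(P_{\text {res}})} \) for this example.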
Example 2
Fig. 2 shows the PPs produced by the weighted restarting procedure applied to PPs in Fig. 1. Notice that the PPs in Fig. 1 are a.s. terminating, integrable, and satisfy the bounded cumulative weight property with \(M=1000\), as \(y \le 1000\) holds upon loop termination. Weighted restarting produces the PPs in Fig. 2. In the fourth step of the procedure, the restarting upon termination block of code is added to each program in lines \(7-10\). It can be realized using a single pCFG transition, hence we only need one pCFG location \(\ell _8\) for it.
Fig. 2.
PPs produced by weighted restarting applied to the PPs in Fig. 1. The changes introduced by the weighted restarting procedure are highlighted in red.
4.2 Equivalence Refutation in PPs with Conditioning
Theorem 1 effectively reduces the task of equivalence refutation for a PP pair \( (P^1, P^2) \) to the respective task for the pair \( (P_{\text {res}}^1, P_{\text {res}}^2) \) of PPs without conditioning. Following such a reduction, we can apply the refutation rule from [23]. We briefly revisit this proof rule. We additionally show its completeness in settings both with and without conditioning, a result not proven previously.
In what follows, let \(\mathcal {C}=( L ,V,V_{ out },\ell _{ in },\textbf{x}_{ in },\mapsto ,G, Up ,w,\ell _{ out })\) be an a.s. terminating pCFG. We use the following terminology:
A state function \(\eta \) in \(\mathcal {C}\) is a function which to each location \(\ell \in L \) assigns a Borel-measurable function \(\eta (\ell ):\mathbb {R}^{|V|} \rightarrow \mathbb {R}\) over program variables.
A predicate function \(\Pi \) in \(\mathcal {C}\) is a function which to each location \(\ell \in L \) assigns a predicate \(\Pi (\ell )\) over program variables. It naturally induces a set of states \(\{(\ell ,\textbf{x}) \mid \textbf{x} \models \Pi (\ell )\}\), which we also refer to via \(\Pi \). \( \Pi \) is an invariant if it contains all reachable states in \(\mathcal {C}\), i.e. if \(\textbf{x} \models \Pi (\ell )\) for each \((\ell ,\textbf{x}) \in Reach ^{\mathcal {C}}\).
Let \(f: \mathbb {R}^{|V_{ out }|} \rightarrow \mathbb {R}\) be a Borel-measurable function over the pCFG outputs. A UESM (resp. LESM) for f is a state function \(U_f\) (resp. \(L_f\)) that satisfies, in every reachable state, certain conditions which together ensure that it evaluates to an upper (resp. lower) bound on the expected value of f on the pCFG output. Since it is generally not feasible to exactly compute the set of reachable states, UESMs and LESMs are defined with respect to a supporting invariant that over-approximates the set of all reachable states in the pCFG. This is done in order to later allow for a fully automated computation of UESMs and LESMs in PPs.
Definition 1
(Upper expectation supermartingale (UESM) [23]). Let \(\mathcal {C}\) be an a.s. terminating pCFG, I be an invariant in \(\mathcal {C}\) and \(f:\mathbb {R}^{|V_{ out }|} \rightarrow \mathbb {R}\) be a Borel-measurable function over the output variables of \(\mathcal {C}\). An upper expectation supermartingale (UESM) for f with respect to the invariant I is a state function \(U_f\) satisfying the following two conditions:
1.
Zero on output. For every \(\textbf{x} \models I(\ell _{ out })\), we have \(U_f(\ell _{ out },\textbf{x}) = 0\).
2.
Expected f-decrease. For every location \(\ell \in L \), transition \(\tau = (\ell ,\ell ') \in \,\mapsto \), and variable valuation \( \textbf{x}\) s.t. \(\textbf{x}\models I(\ell ) \wedge G(\tau )\), we require the following: denoting by \( \textbf{N} \) the random valuation vector obtained by performing transition \( \tau \) from state \( (\ell ,\textbf{x}) \), it holds
$$\begin{aligned} U_f(\ell ,\textbf{x}) + f(\textbf{x}^{ out }) \ge \mathbb {E}\Big [U_f(\ell ', \textbf{N}) + f(\textbf{N}^{ out }) \Big ] \end{aligned}$$
(1)
with \( \textbf{N}^{ out } \) the projection of the random vector \( \textbf{N} \) onto the \( V_{ out } \)-indexed variables.
Intuitively, a UESM is required to be equal to 0 on output and, in every computation step, any increase in the \( f \)-value is exceeded in expectation by the decrease in the UESM value. A lower expectation submartingale (LESM) \( L_f \) for \( f \) is defined analogously, with (1) replaced by the dual expected \( f \)-increase condition:
$$\begin{aligned} L_f(\ell ,\textbf{x}) + f(\textbf{x}^{ out }) \le \mathbb {E}\Big [L_f(\ell ', \textbf{N}) + f(\textbf{N}^{ out }) \Big ]. \end{aligned}$$
(2)
Example 3
Consider the two PPs obtained by weighted restarting in Fig. 2. Their output variables are \(V_{ out }= \{x,y\}\). By Theorem 1, both PPs are a.s. terminating. For a function over outputs \(f(x,y) = x+y\), one can check by inspection that the state function \(U_f\) below defines a UESM for f in the first program, and the state function \(L_f\) below defines an LESM for f in the second program. These are also the U/LESMs produced by our prototype implementation:
U/LESMs can be used to refute equivalence in conditioning-free PPs under so-called OST-soundness conditions. The name is due to the fact that the correctness of the proof rule in [23] is proven via the Optional Stopping Theorem (OST) [64] from martingale theory. The first three conditions below are imposed by the classical OST, and the fourth condition is derived from the Extended OST [62].
Definition 2
(OST-soundness). Let \( \mathcal {C}\) be a pCFG, \( \eta \) be a state function in \( \mathcal {C}\), and \( f :\mathbb {R}^{|V_{ out }|} \rightarrow \mathbb {R}\) be a Borel-measurable function. Let \( S_i \) be the \( i \)-th state and \( \textbf{x}_i \) the \( i \)-th variable valuation along a run. Define \( Y_i\) by \( Y_i := \eta (S_i) + f(\textbf{x}_i^{ out })\) for any \( i \ge 0 \). We say that the tuple \( (\mathcal {C}, \eta , f) \) is OST-sound if \( \mathbb {E}[|Y_i|] < \infty \) for every \( i\ge 0 \) and moreover, at least one of the following conditions (C1)–(C4) holds:
(C1)
There exists a constant \( c \) such that \( T\le c \) with probability 1.
(C2)
There exists a constant \( c \) such that for each \( t \in \mathbb {N}\) and run \( \rho \) we have \(|Y_{\min \{t, T(\rho )\}}(\rho )| \le c\) (i.e., \( Y_{i}(\rho ) \) is uniformly bounded along the run).
(C3)
\( \mathbb {E}[T] < \infty \) and there exists a constant \( c \) such that for every \( t \in \mathbb {N}\) it holds \( \mathbb {E}[|Y_{t+1}-Y_{t}|\mid \mathcal {F}_t] \le c \) (i.e., the expected one-step change of \( Y_i \) is uniformly bounded over runtime).
(C4)
There exist real numbers \( M, c_1, c_2, d \) such that (i) for all sufficiently large \( n \in \mathbb {N}\) it holds \( \mathbb {P}(T> n) \le c_1 \cdot e^{-c_2 \cdot n} \); and (ii) for all \( n \in \mathbb {N}\) it holds \( |Y_{n+1} - Y_n| \le M\cdot n^d \).
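As background for how these conditions are used: the OST guarantees that for a martingale \( Y_i \) and a stopping time \( T \) satisfying one of (C1)–(C4), \( \mathbb {E}[Y_T] = Y_0 \). A small self-contained Monte Carlo illustration (our own toy example, not from the paper) with a bounded symmetric random walk, which satisfies condition (C2):

```python
import random

random.seed(1)

def stopped_walk(start=3, lo=0, hi=10):
    # Symmetric +/-1 random walk X_i (a martingale), stopped at T, the first
    # hitting time of {lo, hi}. Since lo <= X_{min(i,T)} <= hi holds
    # uniformly, condition (C2) applies and the OST is sound here.
    x = start
    while lo < x < hi:
        x += random.choice((-1, 1))
    return x

n = 200_000
est = sum(stopped_walk() for _ in range(n)) / n
print(est)  # OST predicts E[X_T] = X_0 = 3
```

Dropping the uniform boundedness (e.g., an unbounded walk stopped at the first hit of 0) is the standard example where the naive conclusion \( \mathbb {E}[X_T] = X_0 \) fails, which is why one of (C1)–(C4) must be verified.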
We now present our new proof rule for equivalence refutation in PPs with conditioning. Given a pair \( P^1,P^2 \) of a.s. terminating PPs that satisfy the bounded cumulative weight property, our proof rule applies the weighted restarting procedure to produce the pCFGs \( \mathcal {C}(P_{\text {res}}^1) \) and \( \mathcal {C}(P_{\text {res}}^2) \) and then applies the proof rule of [23] for PPs without conditioning. The following theorem formalizes the instantiation of the proof rule of [23] to the pCFG pair \( \mathcal {C}(P_{\text {res}}^1) \) and \( \mathcal {C}(P_{\text {res}}^2) \).
Theorem 2
(Equivalence refutation for PPs with conditioning). Let \( P^1,P^2 \) be two a.s. terminating programs (possibly with conditioning) satisfying the bounded cumulative weight property. Let the initial states of \( \mathcal {C}(P_{\text {res}}^1) \) and \( \mathcal {C}(P_{\text {res}}^2) \) be \( (\ell _{ in }^1,\textbf{x}_{ in }^1) \) and \( (\ell _{ in }^2,\textbf{x}_{ in }^2) \), respectively. Assume that there exist a Borel-measurable function \( f :\mathbb {R}^{|V_ out |} \rightarrow \mathbb {R}\) and two state functions, \( U_f \) for \( \mathcal {C}(P_{\text {res}}^1) \) and \( L_f \) for \( \mathcal {C}(P_{\text {res}}^2) \) such that the following holds:
i)
\( U_f \) is a UESM for \( f \) in \( \mathcal {C}(P_{\text {res}}^1) \) such that \( (\mathcal {C}(P_{\text {res}}^1), U_f, f) \) is OST-sound;
ii)
\( L_f \) is an LESM for \( f \) in \( \mathcal {C}(P_{\text {res}}^2) \) such that \( (\mathcal {C}(P_{\text {res}}^2), L_f, f) \) is OST-sound;
iii)
\( U_f(\ell _{ in }^1, \textbf{x}_{ in }^1) + f((\textbf{x}_{ in }^1)^ out ) < L_f(\ell _{ in }^2, \textbf{x}_{ in }^2) + f((\textbf{x}_{ in }^2)^ out ) \).
Then \( P^1,P^2 \) are not output equivalent.
The proof of the theorem follows immediately from the correctness of weighted restarting (Theorem 1) and the correctness of the proof rule in [23, Theorem 5.5].1
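To give concrete intuition for the semantics that weighted restarting preserves, here is a small Monte Carlo sketch (our own illustration, not the programs of Fig. 2): a failed observe sends control back to the start of the program, score statements accumulate into a weight variable \( W \), and the conditioned expectation of an output is the \( W \)-normalized mean.

```python
import random

random.seed(7)

def restarted_run():
    # One run of a hypothetical restarted program: score(c) multiplies the
    # weight variable w by c; a failed observe restarts from the beginning.
    while True:                        # restart loop added by the transformation
        w = 1.0
        x = random.random()            # x ~ Uniform(0, 1)
        w *= 2.0 if x > 0.5 else 1.0   # score(2) on one branch, score(1) on the other
        if x < 0.9:                    # observe(x < 0.9); on failure: restart
            return x, w

# Conditioned, score-weighted expectation of x is E[W * x] / E[W].
runs = [restarted_run() for _ in range(200_000)]
est = sum(w * x for x, w in runs) / sum(w for _, w in runs)
print(est)  # analytically 0.685 / 1.3 ≈ 0.527
```

The bounded cumulative weight property corresponds here to \( w \) staying below a fixed constant on every run (at most 2 in this sketch), which keeps the normalization well defined.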
While [23] establishes soundness of the above proof rule for PPs without conditioning, which we generalize here to PPs with conditioning, to the best of our knowledge there is no known proof rule that is also complete. In what follows, we show that the proof rule in Theorem 2 is not only sound, but also complete. This result is new both for PPs with and without conditioning.
Theorem 3
(Completeness of the proof rule). Let \( P^1, P^2 \) be two a.s. terminating programs satisfying the bounded cumulative weight property. Assume that they are not output equivalent. Then there exist a Borel-measurable function \( f :\mathbb {R}^{|V_ out |} \rightarrow \mathbb {R}\) and two state functions, \( U_f \) for \( f \) in \( \mathcal {C}(P_{\text {res}}^1) \) and \( L_f \) for \( f \) in \( \mathcal {C}(P_{\text {res}}^2) \), satisfying the conditions of Theorem 2.
Proof
Since \( \mu ^{\mathcal {C}(P^1)} \ne \mu ^{\mathcal {C}(P^2)}\), by Theorem 1 also \( \mu ^{\mathcal {C}(P_{\text {res}}^1)} \ne \mu ^{\mathcal {C}(P_{\text {res}}^2)}\). Hence, there is a Borel-measurable set \( B \subseteq \mathbb {R}^{|V_{ out }|} \) such that, w.l.o.g., \( \mu ^{\mathcal {C}(P_{\text {res}}^1)}[B] > \mu ^{\mathcal {C}(P_{\text {res}}^2)}[B] \). Let \(p^i_B(\ell ,\vec{x}) = \mathbb {P}^{\mathcal {C}(P_{\text {res}}^i)}[ Output (B) \mid \text {run initialized in } (\ell ,\vec{x})]\) be the probability of outputting a vector from \( B \) if \( \mathcal {C}(P_{\text {res}}^i) \) starts in \( (\ell ,\vec{x}) \), for \(i \in \{1,2\}\). We define:
A function \( f = \mathbb {I}_{B} \) over outputs;
A state function \( U_f(\ell ,\vec{x}) = p^1_B(\ell ,\vec{x}) - f(\vec{x}^ out ) \) in \( \mathcal {C}(P_{\text {res}}^1) \);
A state function \( L_f(\ell ,\vec{x}) = p^2_B(\ell ,\vec{x}) - f(\vec{x}^ out ) \) in \( \mathcal {C}(P_{\text {res}}^2) \).
We prove that \( U_f \) is an UESM for \( f \) in \( \mathcal {C}(P_{\text {res}}^1) \); the argument for \( L_f \) is analogous. For any terminal state \( (\ell ,\vec{x}) \) it holds \( p^1_B(\ell ,\vec{x}) = \mathbb {I}_{B}(\vec{x}^ out ) = f(\vec{x}^ out ) \), hence \( U_f \) is zero on output. Moreover, for any state \( (\ell ,\vec{x}) \) it holds \( U_f(\ell ,\vec{x}) + f(\vec{x}^ out ) = p^1_B(\ell ,\vec{x}) \). Since \( p^1_B(\ell ,\vec{x}) = \mathbb {E}[p^1_B(\ell ',\textbf{N})] \) (the standard flow conservation of reachability probabilities), we get that \( U_f \) satisfies the expected \( f \)-decrease condition.
Regarding OST-soundness, note that \( U_f(\ell ,\vec{x}) + f(\vec{x}^ out ) = p_B^1(\ell ,\vec{x}) \in [0,1] \) for any state \( (\ell ,\vec{x}) \). Hence, \( (\mathcal {C}(P_{\text {res}}^1),U_f,f) \) is OST-sound (it satisfies condition (C2)), and similarly for \( (\mathcal {C}(P_{\text {res}}^2),L_f,f) \).
It remains to verify condition iii) from Theorem 2. We have
$$\begin{aligned} U_f(\ell _{ in }^1, \textbf{x}_{ in }^1) + f((\textbf{x}_{ in }^1)^ out ) &= p^1_B(\ell _{ in }^1,\textbf{x}_{ in }^1) = \mu ^{\mathcal {C}(P_{\text {res}}^1)}[B] < \mu ^{\mathcal {C}(P_{\text {res}}^2)}[B] \\ &= p^2_B(\ell _{ in }^2,\textbf{x}_{ in }^2) = L_f(\ell _{ in }^2, \textbf{x}_{ in }^2) + f((\textbf{x}_{ in }^2)^ out ). \end{aligned}$$
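The construction in this proof can be checked numerically on a small example. Below is a sketch (our own, on a simplified toy chain where each terminal state carries its output value directly, and the current output value at non-terminal states is assumed to be 0) verifying that \( U_f = p_B - f \) is zero on output and satisfies flow conservation:

```python
# A toy Markov chain: non-terminal states map to [(successor, probability)];
# terminal states carry their output value.
chain = {"init": [("mid", 0.5), ("out0", 0.5)],
         "mid": [("out1", 0.6), ("out0", 0.4)]}
output = {"out0": 0, "out1": 1}

B = {1}                                  # Borel set distinguishing outputs
f = lambda v: 1.0 if v in B else 0.0     # f = indicator of B (over outputs)

def p_B(s):
    # probability of outputting a value in B when started in state s
    if s in output:
        return f(output[s])
    return sum(p * p_B(t) for t, p in chain[s])

def U(s):
    # U_f(s) = p_B(s) - f(current output value)
    return p_B(s) - (f(output[s]) if s in output else 0.0)

assert U("out0") == 0.0 and U("out1") == 0.0   # zero on output
for s in chain:
    lhs = U(s) + 0.0                           # U(s) + f(current output = 0)
    rhs = sum(p * (U(t) + (f(output[t]) if t in output else 0.0))
              for t, p in chain[s])
    assert abs(lhs - rhs) < 1e-12              # flow conservation = expected f-decrease
print("UESM conditions hold; p_B(init) =", p_B("init"))
```

Since \( U + f = p_B \) is a probability, it is uniformly bounded in \([0,1]\), which is exactly the (C2) argument used above.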
Example 4
To conclude this section, we illustrate our proof rule on our running example in Fig. 1. Consider the PPs \((P_{\text {res}}^1,P_{\text {res}}^2)\) obtained by weighted restarting in Fig. 2, and the function f over outputs, the UESM \(U_f\) for f in \(\mathcal {C}(P_{\text {res}}^1)\), and the LESM \(L_f\) for f in \(\mathcal {C}(P_{\text {res}}^2)\) constructed in Example 3. The triples \((\mathcal {C}(P_{\text {res}}^1),U_f,f)\) and \((\mathcal {C}(P_{\text {res}}^2),L_f,f)\) are both OST-sound, satisfying condition (C2) in Definition 2, as all variable values in \(P_{\text {res}}^1\) and \(P_{\text {res}}^2\) are bounded. Moreover, we have \(U_f(\ell _{ in }^1, \textbf{x}_{ in }^1) + f((\textbf{x}_{ in }^1)^ out ) = 500 < 600 = L_f(\ell _{ in }^2, \textbf{x}_{ in }^2) + f((\textbf{x}_{ in }^2)^ out )\). Hence, \((f,U_f,L_f)\) satisfies all conditions of Theorem 2, and the PPs in Fig. 1 are not output equivalent.
5 Automated Equivalence Refutation
We now present our algorithm for equivalence refutation in PPs that may contain conditioning instructions. Given a PP pair \( (P^1, P^2) \), the algorithm first applies weighted restarting to translate them into output equivalent conditioning-free PPs \( (P_{\text {res}}^1, P_{\text {res}}^2) \). Then, the algorithm applies the equivalence refutation procedure of [23] for PPs without conditioning. Hence, given that our algorithm builds on that of [23], we only present an overview of the algorithm and refer the reader to [23] for details. However, this section presents two important novel results:
1.
Semi-completeness guarantees. Building on our result in Section 4 which shows that our proof rule is sound and complete, we show that our algorithm for equivalence refutation is sound and semi-complete. In particular, the algorithm is guaranteed to refute equivalence of PPs whenever non-equivalence can be witnessed by a function f and a pair of U/LESMs that can be specified via polynomial functions of some maximal polynomial degree D, and that satisfy OST-soundness condition (C2) in Definition 2.
2.
Streamlined algorithm for bounded termination time PPs. We show that the algorithm can be streamlined for the class of PPs whose (1) termination time is bounded by some constant value \(T > 0\), and (2) all sampling instructions consider probability distributions of bounded support. The first assumption is common and satisfied by most PPs considered in statistical inference literature, including all our benchmarks in Section 6.
Algorithm assumptions. In order to allow for a fully automated equivalence refutation, we need to impose several assumptions on our PPs:
Polynomial arithmetic. We require all arithmetic expressions appearing in PPs to be polynomials over program variables. This restriction is necessary for the full automation of our method. We also require that each probability distribution appearing in sampling instructions in pCFGs has finite moments, i.e., for each \( p \in \mathbb {N}\), the \( p \)-th moment \(m_{\delta }(p) = \mathbb {E}_{X \sim \delta }[|X|^p]\) is finite and can be computed by the algorithm. This is a standard assumption in static PP analysis and allows for most standard probability distributions.
Almost-sure termination. For the equivalence refutation problem to be well-defined and for our proof rule to be applicable, we require both PPs to be a.s. terminating. In polynomial arithmetic PPs, this requirement can be automatically checked by using the method of [17].
Bounded cumulative weight property. To automatically check this property, we can use an off-the-shelf invariant generator, e.g. [18, 31, 55], to generate an invariant which proves that \(W \le M\) holds for the weight variable W and some sufficiently large constant value M.
Supporting linear invariants. Recall that we defined UESMs and LESMs with respect to supporting invariants. To that end, before proceeding to equivalence refutation, our algorithm first synthesizes linear invariants \(I_1\) and \(I_2\). This can be fully automated by using existing linear invariant generators [31, 55], as done in our implementation. For the purpose of invariant generation, sampling instructions are semantically over-approximated via non-determinism, hence ensuring that the computed invariants are sound.
Template-based synthesis algorithm. Our algorithm first uses weighted restarting to produce \( P_{\text {res}}^1 \) and \( P_{\text {res}}^2 \), and constructs their pCFGs \( \mathcal {C}(P_{\text {res}}^1) \) and \( \mathcal {C}(P_{\text {res}}^2) \). It then follows a template-based synthesis approach and simultaneously synthesizes the triple \((f, U_f, L_f)\) required by our proof rule in Theorem 2. The algorithm proceeds in four steps. In Step 1, it fixes templates for f, \(U_f\) and \(L_f\), where the templates are symbolic polynomials over pCFG variables of some maximal polynomial degree D, an algorithm parameter provided by the user. In Step 2, the algorithm collects a system of constraints over the symbolic polynomial template variables, which together encode that \((f, U_f, L_f)\) satisfy all the requirements and give rise to a correct instance of the proof rule in Theorem 2. At this point, the resulting constraints contain polynomial expressions and quantifier alternation, which makes their solving challenging. Thus, in Step 3, the algorithm translates the collected system of constraints into a purely existentially quantified system of linear constraints over the symbolic template variables. This is done via Handelman’s theorem [41], analogously to previous template-based synthesis PP analyses [4, 62, 66]. In Step 4, the algorithm solves the system of linear constraints via a linear programming (LP) tool. If the system is feasible, its solution yields a concrete instance of \((f, U_f, L_f)\) from Theorem 2 and the algorithm reports “Not output-equivalent”. Due to reducing the synthesis to an LP instance, the runtime complexity of our algorithm is polynomial in the size of the pCFGs, for any fixed value of the polynomial degree parameter D. Finally, the proof of Theorem 3 establishes completeness of our proof rule with (C2) OST-soundness condition, and here we show semi-completeness of our algorithm.
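To illustrate the idea behind Step 3: Handelman's theorem represents a polynomial that is positive over a compact polyhedron as a nonnegative combination of products of the polyhedron's defining linear constraints; matching monomial coefficients on both sides then yields purely linear constraints on the (unknown) combination multipliers. A minimal hand-worked sketch of such a certificate check (the multipliers below are fixed by hand; in the algorithm they are LP unknowns):

```python
# Polynomials in (x, y) encoded as {(i, j): coeff} for coeff * x^i * y^j.
def pmul(p, q):
    r = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            key = (i + k, j + l)
            r[key] = r.get(key, 0) + a * b
    return r

def padd(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

# Goal: certify g(x, y) = 2 - x - y + x*y >= 0 over the box given by the
# linear constraints x >= 0, y >= 0, (1 - x) >= 0, (1 - y) >= 0.
g = {(0, 0): 2, (1, 0): -1, (0, 1): -1, (1, 1): 1}
one_minus_x = {(0, 0): 1, (1, 0): -1}
one_minus_y = {(0, 0): 1, (0, 1): -1}

# Handelman certificate: g = 1 (constant) + 1 * (1-x)*(1-y), a nonnegative
# combination of products of the box constraints.
cert = padd({(0, 0): 1}, pmul(one_minus_x, one_minus_y))
assert cert == g
print("Handelman certificate verified")
```

In the actual algorithm the multipliers of each constraint product are fresh LP variables, so feasibility of the resulting linear system is decided by the LP solver in Step 4.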
Theorem 4
(Algorithm correctness, complexity and semi-completeness, proof in [24]). The algorithm is correct. If the algorithm outputs “Not output-equivalent”, then \( (P^1, P^2) \) are indeed not output-equivalent and the computed triple \((f, U_f, L_f)\) forms a correct instance of the proof rule in Theorem 2.
Furthermore, the runtime complexity of the algorithm is polynomial in the size of the programs’ pCFGs, for any fixed value of the polynomial degree D.
Finally, the algorithm is semi-complete: if there exists a valid triple \((f, U_f, L_f)\) that can be specified via polynomial expressions of degree at most D and that satisfies OST-soundness condition (C2) in Definition 2, then the algorithm is guaranteed to output “Not output-equivalent”.
Streamlined algorithm for bounded termination time PPs. We conclude this section by showing that the algorithm can be streamlined for the class of PPs whose (1) termination time is bounded by some \(T > 0\), and (2) all sampling instructions consider probability distributions of bounded support. We show that for this class of PPs, any polynomial function f over outputs and any polynomial UESM \(U_f\) (resp. LESM \(L_f\)) for f are guaranteed to be (C2) OST-sound. Hence, the algorithm does not need to check OST-soundness, which significantly simplifies the system of constraints that needs to be solved by the LP solver.
Theorem 5
(Proof in [24]). Let P be a PP with (1) termination time bounded by some \(T > 0\), and (2) all sampling instructions considering probability distributions of bounded support. Then, for any polynomial function f over outputs, polynomial UESM \(U_f\) for f in \(\mathcal {C}(P_{\text {res}}^1)\) and LESM \(L_f\) for f in \(\mathcal {C}(P_{\text {res}}^2)\), \((\mathcal {C}(P_{\text {res}}^1),U_f,f)\) and \((\mathcal {C}(P_{\text {res}}^2),L_f,f)\) satisfy the (C2) OST-soundness condition.
6 Experimental Results
Implementation. We implemented a prototype of our equivalence refutation algorithm. The prototype tool is implemented in Java. Gurobi [40] is used for solving the LP instances while STING [55] and ASPIC [31] are used for generating supporting invariants. For each input program pair, the tool tries to perform polynomial template-based synthesis by using polynomial degrees 1 to 5 in order. All experiments were conducted on an Ubuntu 24.04 machine with an 11th Gen Intel Core i5 CPU and 16 GB RAM with a timeout of 10 minutes.
Baseline. To the best of our knowledge, no prior work has considered formal and automated equivalence refutation for PPs with conditioning. However, a viable alternative approach to refuting PP equivalence would be to first compute the output distributions of the two PPs via exact symbolic inference, and then to use mathematical computation software to show that these distributions are different. Hence, we compare our method against such a baseline. We use the state-of-the-art exact inference tool PSI [34] for computing output distributions, and Mathematica [65] for proving that they are different.
Table 1.
Experimental results showing the performance of our tool and the baseline for refuting equivalence of PPs. A \(\checkmark \) in the “Eq. Ref.” column represents that the tool successfully refuted equivalence of the two input programs, whereas “TO” and “NA” stand for “Timeout” and “Not Applicable”, respectively. The Distance column shows the Kantorovich distance lower bound computed by our tool for the similarity refutation problem (see Section 1).
Benchmarks. While our method is applicable to PPs with unbounded loops, PSI (and hence our baseline) is restricted to PPs with statically bounded loops. Hence, for all our benchmarks, we bound the maximal number of iterations of each loop by 1000. Note that, while this modification ensures that all loops admit an upper bound on the number of iterations, it still allows loops to terminate early (i.e., in strictly fewer than 1000 iterations) if the termination condition is met early. Hence, the executions of the two programs are not aligned by this modification, which makes the pairs structurally challenging for relational analysis.
We ran our tool and the baseline on three sets of benchmarks. For each benchmark in the first two sets, we obtain a pair of non-output equivalent PPs by slightly perturbing exactly one of the sampling instructions appearing in the original PP. Benchmarks in the third set already come in pairs:
1.
We consider 12 PPs with discrete probability distributions from the PSI benchmark suite (taken from the repository of [11]). We selected benchmarks supported by our syntax, e.g., avoiding reading data from external .csv files.
2.
We consider 18 loopy programs from [23] on equivalence refutation in PPs without conditioning (benchmarks originating from [44, 62]). In order to introduce conditioning statements, to each program we add an observe statement at the end, which observes a bound on one of the output variables.
3.
Lastly, we consider the benchmarks used in [63] for symbolic inference in PPs with conditioning. Specifically, we consider pairs of benchmarks that already have several versions in the repository of [63] (for example, there are four versions of the rdwalk program that differ in only one parameter; we ran our tool on all 6 possible program pairs). All the programs in this benchmark set contain score statements. Note that PSI (and hence the baseline) does not support score statements. However, some benchmarks from this repository contain “if (g) score(1) else score(0) fi” fragments, which we replace by \(\texttt {observe(g)}\) in order to make PSI applicable.
Results. Our experimental results are shown in Table 1. We observe the following:
On the first benchmark set, in which PPs terminate in a small number of steps, both our method and the baseline manage to refute equivalence of all 12 PP pairs. Our method is faster in 8/12 cases, whereas the baseline is faster in 4/12 cases. In some of these cases our method is significantly slower (e.g., coinBiasSmall), because the synthesis algorithm needed degree 5 polynomial templates to refute equivalence; overall, however, our method shows better performance than the baseline. Moreover, our method successfully computes Kantorovich distance lower bounds in 11/12 cases.
On the second benchmark set, in which PPs take much longer to terminate (e.g., loops may be executed for up to 1000 iterations), our method refutes equivalence in 12/18 cases, whereas the baseline fails in all cases. This shows an advantage of our method: the baseline is based on exact symbolic inference, meaning that it needs to symbolically execute the program, which is a computational bottleneck when the program runtime is large. In contrast, our method is a static analysis method and does not suffer from this limitation. When it comes to computing lower bounds on the Kantorovich distance, our tool succeeds in only 4/18 cases. We believe this is due to (i) the STING invariant generation tool failing to compute invariants in several cases, or (ii) the required UESM/LESMs being non-polynomial (e.g., for Species Fight).
On the third benchmark set, our tool successfully refutes equivalence and computes lower bounds on distance in 10/11 cases. These results demonstrate the applicability of our method in the presence of both observe and score statements. The baseline solves no cases: it fails on 3, and is not applicable in the remaining 8 as PSI does not support score statements.
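The Kantorovich distance lower bounds reported above rest on the duality view: for any 1-Lipschitz function \(f\), the gap \(|\mathbb {E}_{\mu ^1}[f] - \mathbb {E}_{\mu ^2}[f]|\) lower-bounds the Kantorovich distance between \(\mu ^1\) and \(\mu ^2\). A toy numeric check of this principle (our own illustration, unrelated to the benchmark programs):

```python
import random

random.seed(0)

n = 100_000
xs = [random.randint(0, 9) for _ in range(n)]    # outputs of "program 1"
ys = [random.randint(1, 10) for _ in range(n)]   # outputs of "program 2"

# f(v) = v is 1-Lipschitz, so the difference of its expectations lower-bounds
# the Kantorovich (1-Wasserstein) distance; here the true distance is 1
# (the second distribution is the first shifted by one unit).
lb = abs(sum(xs) / n - sum(ys) / n)
print(lb)  # close to 1.0
```

The tool searches over a template for such an \(f\), so the reported value is always a certified lower bound, never an overestimate of the distance.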
Limitations. In our experiments, we also observed a limitation of our algorithm, addressing which is an interesting direction of future work. Supporting invariants play an important role in guiding the U/LESM synthesis algorithm; without good supporting invariants, the synthesis might fail. Generating good invariants can sometimes be time-consuming. For example, our tool takes 169 seconds to solve the coupon4 benchmark in our experiments, because the STING and ASPIC invariant generators take more than 2 minutes to generate an invariant.
7 Conclusion
We presented a new method for refuting equivalence of PPs with conditioning. To the best of our knowledge, ours is the first method that is fully automated, provides formal guarantees, and is sound in the presence of conditioning instructions. Moreover, we showed that our proof rule is complete and that our algorithm is semi-complete. Our prototype implementation demonstrates the applicability of our method to a range of pairs of PPs with conditioning. An interesting direction of future work would be to consider the automation of our method for non-polynomial programs. Another interesting direction would be to consider the applicability of the weighted restarting procedure to other static analyses of PPs with conditioning, towards reducing them to the conditioning-free setting.
Acknowledgments
This work was partially supported by ERC CoG 863818 (ForM-SMArt) and Austrian Science Fund (FWF) 10.55776/COE12. Petr Novotný is supported by the Czech Science Foundation grant no. GA23-06963S.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
We remark that the program syntax considered in [23] does not include go-to statements. However, the definition of U/LESMs and the proof rule in [23] are formalized with respect to pCFGs. Each go-to statement in a program can simply be modeled as a new edge in the program’s pCFG. Since [23] does not make any restriction on the topology of pCFGs, their result remains correct even for pCFGs induced by conditioning-free PPs with go-to statements.