Open Access 2025 | Original Paper | Book Chapter

Verifying Algorithmic Versions of the Lovász Local Lemma

Authors: Rongen Lin, Hongjin Liang, Xinyu Feng

Published in: Programming Languages and Systems

Publisher: Springer Nature Switzerland

Abstract

The Lovász Local Lemma (LLL) is a powerful tool in combinatorics that guarantees the existence of combinatorial objects with certain properties in a probability space. This chapter focuses on algorithmic versions of the LLL, in particular the Moser-Tardos (MT) algorithm, which yields a constructive proof of the LLL by iteratively searching the probability space. The text examines the importance of formally verifying the correctness and efficiency of these algorithms, highlighting the challenges of handling infinite execution traces and the need for rigorous proof methods. It introduces two proof recipes: loop truncation and resampling-table-based coupling. Loop truncation transforms loops so that only finite execution traces need to be handled, which makes it easier to verify the correctness of the algorithms. Resampling-table-based coupling formalizes the informal proof ideas behind existing algorithms and provides a concise, formal method for verifying relational properties. The chapter verifies several important results from the existing literature and presents a new result related to the MT algorithm, demonstrating the applicability and effectiveness of the proposed proof recipes. Furthermore, it discusses the relationship between these proof recipes and existing formal proof methods, offering a comprehensive overview of the current state of probabilistic program verification in the context of the Lovász Local Lemma.

1 Introduction

The Lovász Local Lemma [19, 57] (LLL) is a powerful tool in combinatorics. It guarantees the existence of a combinatorial object with certain properties in a probability space. It has also been helpful for proving the existence of solutions to numerous significant problems in computer science, such as the Boolean Satisfiability Problem and the Graph Coloring Problem, since these problems can be viewed as instances of the problem of finding some combinatorial objects.
Besides proving the solution’s existence, we also want to efficiently construct a solution. To this end, people have devised algorithmic versions of the Lovász Local Lemma (ALLLs). The most notable one is the Moser-Tardos (MT) algorithm proposed by Moser and Tardos in their Gödel Prize-winning paper [50]. The algorithm searches the probability space for the desired combinatorial object iteratively, bringing us a constructive proof for LLL. It is efficient in that the expected total number of iterations is bounded. Since then, a huge number of works have emerged: some explore the power of the MT algorithm [1, 31, 37, 42, 43, 53], some find variants of the MT algorithm [13, 16, 25, 29–31, 34, 36], and some utilize the MT algorithm to solve problems in various areas of computer science [9, 14, 15, 23, 26, 27, 31, 33, 43, 55], including applications in real-world systems [2, 39].
Therefore it is of great importance to formally verify the (total) correctness of ALLLs, in particular, that the MT algorithm and its variants almost surely terminate (i.e. terminate with probability 1) and their expected iteration times have certain upper bounds. Previous works (e.g. [50]) have given proofs for the correctness of ALLLs, though these proofs are rather informal. Therefore, a natural choice is to formally verify ALLLs by formalizing existing informal proofs.
However, we encounter a challenge when verifying ALLLs by following existing proofs. We propose Proof Recipe 1 to circumvent this challenge, and propose Proof Recipe 2 for completing the verification after applying Proof Recipe 1.
Challenge: Handling infinite execution traces.
It is challenging to formulate some subgoals in ALLLs’ existing informal proofs using distribution-based semantics, which is commonly used in the literature of probabilistic program verification. The reason is that, on the one hand, these subgoals are about complex properties of the algorithm’s execution traces, and we have to take infinite traces into account until we prove their absence. On the other hand, distribution-based semantics can only describe certain simple properties of these infinite traces, e.g. their overall probability.
Proof Recipe 1. We propose a proof recipe called loop truncation to circumvent the above challenge. For a loop in an ALLL, we transform it to a set of arbitrarily truncated loops. Now we have a set of “truncated algorithms”, which can only generate finite execution traces. Then, instead of directly verifying the original algorithm, we prove a common bound of the expected iteration times for all the truncated algorithms. The latter can be proved following existing proofs, and now we do not have to handle infinite traces when formulating the subgoals.
Proof Recipe 2. A crucial step commonly found in many proofs of ALLLs, is to prove an inequality between probabilities involving two programs. Specifically, for the original ALLL program \(C_1\) and a property \(\textbf{p}\), one constructs a program \(C_2\) and a property \(\textbf{q}\), and shows that the probability of \(\textbf{p}\) holding after \(C_1\)’s execution is not greater than the probability of \(\textbf{q}\) holding after \(C_2\)’s execution.
To prove this inequality, existing informal proofs introduce variants of \(C_1\) and \(C_2\), say \(C'_1\) and \(C'_2\), that use a new random source called resampling table. By assuming that \(C_1\) and \(C_2\) are respectively equivalent to \(C'_1\) and \(C'_2\), they reduce the original inequality to a similar inequality that involves \(C'_1\) and \(C'_2\), and prove the latter. We elaborate on these proofs in Sec. 2.1.
Following the above proof idea, we propose a proof recipe called resampling-table-based coupling to formally prove the aforementioned inequality. At the core of this proof recipe is a new measure-theoretic semantics for probabilistic programs, which we call a resampling-table-based semantics. This semantics formalizes the resampling table in existing proofs as a built-in structure. We formulate \(C'_1\) (\(C'_2\)) by giving \(C_1\) (\(C_2\)) this new semantics without changing its syntax, and express the equivalence between \(C_1\) and \(C'_1\) (\(C_2\) and \(C'_2\)) as the equivalence between a classic probabilistic semantics and the new semantics. We prove the semantics equivalence once and for all, instead of repeatedly proving the equivalence between every pair of programs. Then it remains to prove the inequality involving \(C'_1\) and \(C'_2\), which is now an inequality on the new semantics.
Our proof recipe, resampling-table-based coupling, further reduces the problem to verifying the two programs \(C'_1\) and \(C'_2\) individually. The idea is to introduce an intermediate assertion specifying the resampling table as the common random source to bridge the two programs’ unary verification. The unary verification can be done using a simple Hoare-style program logic.
Contributions.
Using the above two proof recipes, we have successfully verified several ALLL-related results. In summary, we make the following contributions:
  • We verify six important results from [31, 42, 50, 53] for the first time. They include all three “probabilistic” results from Moser and Tardos’s Gödel Prize-winning paper [50].
  • We propose a proof recipe called loop truncation, which circumvents the challenge when verifying ALLLs with classic distribution-based semantics.
  • We propose a proof recipe called resampling-table-based coupling. It expresses the informal proof idea of an important inequality in a formal and concise way, taking a perspective of semantics equivalence and Hoare-style reasoning.
  • We propose a new result related to the Moser-Tardos algorithm, with results from [42, 50, 53] as its corollaries. The statement and the proof of this result are formal, and the proof is done by applying our proof recipes.
Our proof recipes can also be used to prove general properties (i.e. total correctness and inequalities between probabilities) of probabilistic programs beyond ALLLs (see Ex. 1 and Ex. 2). We also discuss the relationship between our proof recipes and existing formal proof methods for positive almost sure termination and asynchronous coupling in Sec. 7.
Outline. We review the original informal proof of the MT algorithm, and introduce the challenge and our main ideas in Sec. 2. We then give the mathematical preliminaries in Sec. 3, and define the programming language, including our new semantics, in Sec. 4. Then we introduce our two proof recipes in Sec. 5. By applying these recipes, we verify six existing important ALLL-related results and a new result in Sec. 6. We finally discuss related work in Sec. 7.
The technical report [46] contains the full formal details of this work, including all the definitions and all the proofs for lemmas, theorems and examples.

2 Informal Development

To formally verify the ALLL-related results, a natural choice is to follow their original informal proofs. Below we first provide a brief overview of the original informal proof of Moser and Tardos’s seminal result [50], which serves as an example for understanding the ideas behind the original proofs of many ALLL-related results. We then explain the verification challenge and our proof recipes.

2.1 Moser and Tardos’s Proof

The Moser-Tardos (MT) algorithm efficiently constructs a solution for the following problem. Given \(N\) program variables \(X_1, \dots , X_N\) and \(M\) events \(\eta _1, \dots , \eta _M\), where each variable is associated with some random distribution and each event depends on some of \(X_1, \dots , X_N\), we would like to construct an assignment of \(X_1, \dots , X_N\) such that none of the \(M\) events occurs. The Lovász Local Lemma [19, 57] provides the Erdős-Lovász condition which sufficiently ensures the existence of such assignments. The MT algorithm finds such an assignment as shown in Fig. 1. Here “(re-)sample \(X_i\)” means the following: sample from the random distribution with which \(X_i\) is associated, and assign the result to \(X_i\).
Fig. 1.
The MT algorithm
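For readers who prefer code, the following Python sketch conveys the shape of Fig. 1 under assumed interfaces: `D[i]()` draws from the distribution associated with \(X_{i+1}\), `events[j](X)` tests whether event \(\eta _{j+1}\) occurs under the assignment `X`, and `vbl(j)` returns the indices of the variables that \(\eta _{j+1}\) depends on. These names are ours, not the paper’s; the authoritative code is Fig. 1 itself.

```python
def mt_algorithm(D, events, vbl):
    """A sketch of the MT algorithm (cf. Fig. 1), under assumed interfaces:
    D[i]() samples the distribution of variable i, events[j] tests event j,
    and vbl(j) lists the variable indices event j depends on."""
    X = [D[i]() for i in range(len(D))]       # sample every variable once
    cnt = 0                                   # outer-loop iteration counter
    while True:
        occurring = [j for j in range(len(events)) if events[j](X)]
        if not occurring:                     # no event occurs:
            return X, cnt                     #   X is the desired assignment
        j = occurring[0]                      # choose some occurring event
        for i in vbl(j):
            X[i] = D[i]()                     # re-sample exactly its variables
        cnt += 1
```

For instance, with `D = [lambda: random.randint(0, 1)] * 2` (after `import random`), a single event `lambda X: X[0] == 1 and X[1] == 1`, and `vbl = lambda j: [0, 1]`, the loop resamples until the event fails.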
Fig. 2.
The \(\text {check}(wt)\) algorithm
Moser and Tardos prove that, under the Erdős-Lovász condition, the expectation of the total iteration number of the algorithm’s outer loop is no more than a real number \(r_\textsf{EL}\), and thus the algorithm almost surely terminates. (Here we do not expose the definitions of the Erdős-Lovász condition and \(r_\textsf{EL}\), which can be found in Thm. 4.) In the remainder of this subsection, we sketch their proof.
Restatement of the proof goal. Moser and Tardos restate their proof goal using execution logs. For every execution of the algorithm, its execution log \(\varLambda \) is a sequence of events \(\eta _j\), which are dynamically chosen at the beginning of the outer loop iterations. We write \({\varLambda }\langle {i}\rangle \) for the i-th element of \(\varLambda \), which is the event chosen at the i-th iteration. We write \(|\varLambda |\) for the length of \(\varLambda \), so it specifies the total number of the outer loop iterations. If the loop does not terminate in an execution, then \(|\varLambda | = \infty \). Now, Moser and Tardos restate their proof goal as
$$\begin{aligned} \mathbb {E}[|\varLambda |] \le r_\textsf{EL} . \end{aligned}$$
(1)
That is, the expected length of the execution log has an upper bound \(r_\textsf{EL}\), where the randomness of \(\varLambda \) comes from the randomness of the MT algorithm. From (1), Moser and Tardos conclude that the program almost surely terminates. The proof of (1) can be divided into three stages, which will be discussed in turn.
Stage 1. In this stage, Moser and Tardos rewrite \(\mathbb {E}[|\varLambda |]\) by defining a special mathematical structure called witness trees. A witness tree \(wt\) is a tree with some special properties, where each node is labeled with an event from \(\eta _1, \dots , \eta _M\). One can construct a witness tree \(wt\) from an execution log \(\varLambda \) following some specific procedure, and we write \(wt= f_\textsf{WT}(\varLambda )\) for this. From the concrete definitions and properties of \(wt\) and \(f_\textsf{WT}\) (which we omit here), Moser and Tardos rewrite \(\mathbb {E}[|\varLambda |]\) as the infinite series in (2). It enumerates all witness trees \(wt\), and sums the probabilities that \(wt\) can be constructed from some prefix of \(\varLambda \) (that is, there exists a sequence \(\varLambda '\) such that: \(\varLambda '\) is a prefix of \(\varLambda \), and \(wt= f_\textsf{WT}(\varLambda ')\) holds).
$$\begin{aligned} \mathbb {E}[|\varLambda |] = \sum _{wt} \Pr [wt= f_\textsf{WT}(\text {some prefix of} \ \varLambda )] \end{aligned}$$
(2)
Stage 2. Next, Moser and Tardos give an upper bound of the probability in (2). That is, for all witness trees \(wt\), they prove that
$$\begin{aligned} \Pr [wt= f_\textsf{WT}(\text {some prefix of} \ \varLambda )] \le p(wt), \end{aligned}$$
(3)
where \(p(wt)\) is a specific real number related to \(wt\), whose definition we omit. Instead of directly proving (3) (which is challenging), Moser and Tardos construct a program \(\text {check}(wt)\), which outputs either 0 or 1, and then prove the following:
(a)
The \(\text {check}(wt)\) algorithm outputs 1 with probability \(p(wt)\).
 
(b)
\(\Pr [wt\!=\! f_\textsf{WT}(\text {some prefix of} \ \varLambda )] \le \Pr [\text {check}(wt)\text { outputs }1]\).
 
(3) then follows from the above two properties. The proof of (a) is not difficult. What is really interesting is the proof of (b). To see this, we present the \(\text {check}(wt)\) algorithm in Fig. 2, where \(g_\textsf{WT}(wt)\) gives us an event sequence collecting the labels of \(wt\)’s nodes in a certain order (in fact, a reversed BFS ordering of \(wt\)).
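A Python sketch of \(\text {check}(wt)\) in the same assumed interface as before may help; `g_WT` is the helper described above, returning the event indices of \(wt\)’s node labels in reversed BFS order, and again all names are ours rather than the paper’s.

```python
def check(wt, D, events, vbl, g_WT):
    """A sketch of check(wt) (cf. Fig. 2). It walks the events recorded in
    wt in reversed BFS order; each event is tested right after its variables
    are freshly re-sampled, and the output is 1 iff every test passes
    (which, per property (a), happens with probability p(wt))."""
    X = [D[i]() for i in range(len(D))]   # initial samples, as in Fig. 1
    for j in g_WT(wt):
        for i in vbl(j):
            X[i] = D[i]()                 # re-sample the event's variables
        if not events[j](X):
            return 0                      # some test fails
    return 1                              # all tests pass
```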
To prove (b), Moser and Tardos observe that whenever \(wt\) can be generated by the MT algorithm and \(\text {check}(wt)\) is run on the same random source, \(\text {check}(wt)\) outputs 1. They capture this observation by specifying the random sources using resampling tables (RT) and letting the algorithms explicitly use the tables.
Specifically, Moser and Tardos give an RT-MT algorithm, and assume that it is “equivalent” to the MT algorithm, i.e., the two algorithms produce the same distribution of execution logs. The idea of the RT-MT algorithm is to turn the lazy samplings in the MT algorithm into eager ones: the RT-MT algorithm performs all the samplings ahead of time and stores the results in a table (the RT), so that it can interpret all subsequent samplings as deterministic table queries.
Fig. 3.
The RT-MT algorithm
Fig. 4.
The \(\text {RT-check}(wt)\) algorithm
The RT-MT algorithm is shown in Fig. 3, where we highlight the difference with Fig. 1 in blue. At the beginning, the RT-MT algorithm randomly generates a resampling table \(RT\), which has \(N\) rows and an infinite number of columns. For all \(i \in [1, N]\), this step independently samples \(X_i\) an infinite number of times, and fills the i-th row of \(RT\) with these samples. Subsequently, every sampling step of the MT algorithm is replaced by a table-query step in the RT-MT algorithm. For instance, resampling \(X_i\) is replaced by reading the leftmost unread element from the i-th row of \(RT\), and assigning the result to \(X_i\).
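The eager-sampling idea can again be sketched in Python. A genuinely infinite table cannot be materialized, so the sketch below extends each row lazily while preserving the reading “all samples are drawn ahead of time”; the class and method names are ours. The per-row heads recording the leftmost unread columns correspond to the \(\iota \) of Sec. 4.2.

```python
from collections import defaultdict

class ResamplingTable:
    """A lazily materialized N x infinity resampling table (cf. Fig. 7).
    Conceptually row i is pre-filled with i.i.d. samples from D[i]; rows
    grow on demand only because an infinite table cannot be stored. The
    table entries are never overwritten; only the heads advance."""

    def __init__(self, D):
        self.D = D
        self.rows = defaultdict(list)   # row i -> the samples drawn so far
        self.head = [0] * len(D)        # leftmost unread column of each row

    def query(self, i):
        """Read the leftmost unread entry of row i and advance its head."""
        row, j = self.rows[i], self.head[i]
        while len(row) <= j:
            row.append(self.D[i]())     # fill the row up to column j
        self.head[i] += 1
        return row[j]

def rt_mt_algorithm(D, events, vbl, rt):
    """RT-MT (cf. Fig. 3): identical to mt_algorithm above, except that
    every sampling step is replaced by a table query."""
    X = [rt.query(i) for i in range(len(D))]
    cnt = 0
    while any(eta(X) for eta in events):
        j = next(j for j in range(len(events)) if events[j](X))
        for i in vbl(j):
            X[i] = rt.query(i)
        cnt += 1
    return X, cnt
```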
Similarly, Moser and Tardos give the \(\text {RT-check}(wt)\) algorithm as shown in Fig. 4, and assume that it is “equivalent” to \(\text {check}(wt)\), i.e., the two algorithms have the same output distribution.
Since the MT algorithm and \(\text {check}(wt)\) are “equivalent” to their RT-based counterparts respectively, to prove (b), we only need to show that,
(b’)
\(\Pr [wt\!=\! f_\textsf{WT}(\text {some prefix of} \ \varLambda \ \text {of RT-MT})] \le \Pr [\text {RT-check}(wt)\text { outputs }1]\).
Note that the first lines of the RT-MT algorithm and \(\text {RT-check}(wt)\) are the same, and all other parts of these two programs are non-probabilistic. Thus, we couple the random sources of the RT-MT algorithm and \(\text {RT-check}(wt)\), or rather, let the first lines of these two programs generate the same \(RT\). Then it remains to prove that, for any \(RT\), if \(wt\) can be generated from the RT-MT algorithm using this \(RT\), then \(\text {RT-check}(wt)\) with the same \(RT\) must output 1.
The proof is based on the following observation. If \(wt\) can be generated from the RT-MT algorithm using \(RT\), then in retrospect \(RT\) must have some crucial properties, and these properties will make \(\text {RT-check}(wt)\) output 1. More precisely, for all events \(\eta _j\) in \(wt\), at the time \(\eta _j\) is chosen in the execution of the RT-MT algorithm, it must hold under the current assignment formed by some of \(RT\)’s entries. Then, during the execution of \(\text {RT-check}(wt)\), when the program tests \(\eta _j\), the test passes because the current assignment must be formed by (almost) the same entries of \(RT\).
Stage 3. Finally, Moser and Tardos prove that,
$$\begin{aligned} \sum _{wt} p(wt) \le r_\textsf{EL} . \end{aligned}$$
(4)
It can be proved in a purely mathematical (i.e. program-independent) yet simple way, as pointed out by Srinivasan [58].
Combining all three stages above, Moser and Tardos obtain (1):
$$\begin{aligned} \mathbb {E}[|\varLambda |] & = \sum _{wt} \Pr [wt= f_\textsf{WT}(\text {some prefix of} \ \varLambda )] &\text {Stage 1, (2)}\\ & \le \sum _{wt} p(wt) &\text {Stage 2, (3)}\\ & \le r_\textsf{EL}. &\text {Stage 3, (4)} \end{aligned}$$
Two parts in Moser and Tardos’s reasoning that need more careful formalization.
First, Moser and Tardos restate their ultimate proof goal as (1) using \(|\varLambda |\), the length of the execution log \(\varLambda \). However, their restatement is ambiguous, since without defining the program semantics, it is unclear how programs are executed and generate execution logs. Similar ambiguity arises when stating those subgoals that also involve quantities related to \(\varLambda \), e.g. (2) and (3).
Second, Moser and Tardos’s original proof of Stage 2 is far from rigorous. To prove (b), they assume that the MT algorithm and \(\text {check}(wt)\) are “equivalent” to their RT-based variants, but they do not strictly define and prove these “equivalences”. Besides, they do not give a rigorous proof of (b’) with these RT-based variants strictly defined.
In the next subsections, we show how we formally state and verify Moser and Tardos’s result. We illustrate the proof path in Fig. 5, which is also explained below.

2.2 Stating Proof Goals Using Distribution-Based Semantics

To formally state Moser and Tardos’s ultimate proof goal, we must formulate the program semantics and the expected total number of iterations (or equivalently, the expected length of the execution log \(\varLambda \)).
We use a classic distribution-based semantics as the formal program semantics. This semantics (and other equivalent semantics, e.g. the probabilistic \(\textsf{wp}\)-semantics [45, 48] and Kozen’s “Semantics 2” [44]) is commonly used in the literature of probabilistic program verification (e.g. [3, 7, 21, 45, 48]). It interprets the execution result of a program \(C\) as a sub-distribution \(\mu \) over states. For any state \(\sigma \), this final state sub-distribution \(\mu \) specifies the probability that the program \(C\) terminates at \(\sigma \).
For specifying the expected total number of iterations, we introduce a fresh program variable \( cnt \) that records the number of iterations. Our code of the MT algorithm, \(C_\textsf{MT}( cnt )\), sets \( cnt \) to zero at the beginning, and increments it in each iteration of the outer loop. Consequently, when \(C_\textsf{MT}( cnt )\) terminates, the value of \( cnt \) is the total number of iterations.
Now, our proof goal can be stated as the following total correctness Hoare triple (assuming that the Erdős-Lovász condition holds on the probability space):
$$\begin{aligned} \vDash \left[ \textbf{true}\right] C_\textsf{MT}( cnt ) \left[ \mathbb {E}[ cnt ] \le r_\textsf{EL}\right] . \end{aligned}$$
(5)
Informally it says, the execution of \(C_\textsf{MT}( cnt )\) in the distribution-based semantics almost surely terminates (i.e., terminates with probability 1), and the expectation of the value of \( cnt \) (represented as \(\mathbb {E}[ cnt ]\)) at the final state sub-distribution is no greater than \(r_\textsf{EL}\). The goal is shown on the top of Fig. 5.
Fig. 5.
Our proof path of Moser and Tardos’s result, where \(\textbf{q}_1 = \textsf{Gen}(wt, cnt , K)\) and \(\textbf{q}_2 = \textsf{Succ}\)
For proving (5), we follow the original proof. That is, we formulate the subgoals in the three stages in Sec. 2.1 using distribution-based semantics, and then prove them. However, we encounter a challenge when formulating (2) and (3).
Challenge: Handling infinite execution traces. The problem arises when formulating the probability (6), which appears in both (2) and (3).
$$\begin{aligned} \Pr [wt= f_\textsf{WT}(\text {some prefix of} \ \varLambda )] \end{aligned}$$
(6)
Let \(\mu \) be the final state sub-distribution of \(C_\textsf{MT}( cnt )\). Then, it is challenging to formulate (6) using \(\mu \). Note that (6) can be positive even when \(C_\textsf{MT}( cnt )\) never terminates. But if we simply define (6) as the probability of some event on \(\mu \), this probability must be 0 if \(C_\textsf{MT}( cnt )\) never terminates, since \(\mu \) is now a null sub-distribution (which specifies that \(C_\textsf{MT}( cnt )\) terminates at \(\sigma \) with probability 0 for any \(\sigma \)). Other definition attempts using \(\mu \) may also fail.
The difficulty in formulating (6) lies in the following facts. On the one hand, (6) is the total probability of \(C_\textsf{MT}( cnt )\)’s possibly infinite execution traces on which \(wt= f_\textsf{WT}(\text {some prefix of} \ \varLambda )\) holds. This is a complex property that may involve only some of \(C_\textsf{MT}( cnt )\)’s infinite traces. On the other hand, distribution-based semantics can only express certain simple properties of infinite traces, and thus cannot express (6). From \(\mu \), all we know about \(C_\textsf{MT}( cnt )\)’s infinite traces is their overall probability \(1 - |\mu |\), where \(|\mu |\) is the weight of \(\mu \) (see Sec. 3.1).
One should not simply rule out infinite traces by strengthening (2) and (3) to include almost sure termination of \(C_\textsf{MT}( cnt )\), since in Sec. 2.1 the termination has not been derived until the ultimate goal is fully proved (also, it is not easy to prove the termination alone, as discussed in Sec. 7).

2.3 Proof Recipe 1: Loop Truncation

We circumvent the aforementioned challenge by proposing loop truncation. Our idea is to do a code transformation on loops, so that the code after transformation does not generate infinite traces. For the main loop in \(C_\textsf{MT}( cnt )\), our transformation introduces a loop bound \(K\) whose value is an arbitrary natural number, and turns the original loop \(\textbf{while}\;({b})\;\textbf{do}\;{C}\) into the set of truncated loops \(\{\textbf{while}\;({b\wedge cnt < K})\;\textbf{do}\;{C} \mid K\in \mathbb {N}\}\). Since we increment \( cnt \) in the loop body \(C\), each truncated loop \(\textbf{while}\;({b\wedge cnt < K})\;\textbf{do}\;{C}\) terminates in at most \(K\) rounds, and thus can only generate finite execution traces.
Soundness of this transformation can be captured by Lem. 1 below (we will show the more general form in Thm. 2 in Sec. 5.1). It says, the original loop guarantees almost sure termination and its expected total iteration number is bounded by r, as long as all the truncated loops terminate and their expected total iteration numbers have the same upper bound r. Here \(\left\lceil { cnt \ge 0}\right\rceil \) says, \( cnt \), the number of iterations, is always non-negative after \(\textbf{while}\;({b\wedge cnt < K})\;\textbf{do}\;{C}\)’s execution. Without this condition the transformation is unsound.
Lemma 1
For all \(P, b, C, r\), if
$$\begin{aligned} \forall K\in \mathbb {N}.\ \ \vDash \left[ P\right] \textbf{while}\;({b\wedge cnt < K})\;\textbf{do}\;{C} \left[ \mathbb {E}[ cnt ] \le r \wedge \left\lceil { cnt \ge 0}\right\rceil \right] , \end{aligned}$$
then \(\vDash \left[ P\right] \textbf{while}\;({b})\;\textbf{do}\;{C} \left[ \mathbb {E}[ cnt ] \le r\right] \).
Using this transformation, we can reduce (5) to proving the total correctness of \(C'_\textsf{MT}( cnt , K)\) for all \(K\), where \(C'_\textsf{MT}( cnt , K)\) is the resulting code after transforming the main loop of \(C_\textsf{MT}( cnt )\) to a truncated one. That is, we prove (7) for all \(K\).
$$\begin{aligned} \vDash \left[ \textbf{true}\right] C'_\textsf{MT}( cnt , K) \left[ \mathbb {E}[ cnt ] \le r_\textsf{EL} \wedge \left\lceil { cnt \ge 0}\right\rceil \right] \end{aligned}$$
(7)
We show this as Step ① in Fig. 5. The double arrow represents logical implication. Then we can prove (7) following Moser and Tardos’s proof ideas explained in Sec. 2.1. We formulate subgoals (2) and (3) for \(C'_\textsf{MT}( cnt , K)\); however, we will not encounter the aforementioned challenge, since \(C'_\textsf{MT}( cnt , K)\) does not have infinite execution traces.
Serving as a proof method for PAST. Lem. 1 is itself a general proof method for positive almost sure termination (PAST) [11], whenever we use \( cnt \) to record the number of program steps. The PAST property says, the program terminates not only almost surely, but also within a finite number of steps in expectation. We give an example in Ex. 1 in Sec. 5.1.

2.4 Proof Recipe 2: Resampling-Table-Based Coupling

Following the ideas in Sec. 2.1, we prove (7) in three stages. The most challenging part is proving (b) in Stage 2, which is an inequality between probabilities involving two programs.
We first formally specify the inequality. To this end, we introduce the tuple \(\vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\). Here P is a predicate specifying state distributions \(\mu \), while \(\textbf{q}_1\) and \(\textbf{q}_2\) are predicates over states \(\sigma \). The tuple says that, the probability of \(\textbf{q}_1\) holding at the terminating states of \(C_1\) is not greater than the probability of \(\textbf{q}_2\) holding at the terminating states of \(C_2\), where \(C_1\) and \(C_2\)’s executions start from the same \(\mu \) satisfying P and use the distribution-based semantics. Then, we can formulate (b) for \(C'_\textsf{MT}( cnt , K)\) and \(\text {check}(wt)\) as follows.
$$\begin{aligned} \vDash \{\textbf{true}\} C'_\textsf{MT}( cnt , K) \le \text {check}(wt) \{\textsf {Gen}(wt, cnt , K), \ \textsf {Succ}\} \end{aligned}$$
(8)
Here \(\textsf {Gen}(wt, cnt , K)\) roughly says that \(wt\) can be generated and is well-formed with respect to \( cnt \) and \(K\). The predicate \(\textsf {Succ}\) says that the output \( succ \) is 1. See Step ② in Fig. 5.
Following Moser and Tardos’s proof in Sec. 2.1, we introduce the RT-MT algorithm (now with a truncated loop) and the \(\text {RT-check}(wt)\) algorithm. We need to give strict definitions of these variants, and to prove that they are indeed equivalent to the original \(C'_\textsf{MT}( cnt , K)\) and \(\text {check}(wt)\) respectively.
Resampling-table-based semantics. Instead of introducing the RT-MT algorithm and the \(\text {RT-check}(wt)\) algorithm with explicit statements for generating the \(RT\) and accessing it, our approach is to keep the program syntax unchanged but re-interpret the code using a new semantics. Our \(RT\) is a built-in structure of the new semantics, and it is randomly generated before programs start execution.
More specifically, we re-interpret (8) using the novel RT-based semantics. In this semantics, we let a program execute with a resampling table \(RT\), which stores all sampling results of the program in advance, and serves as an oracle for the sampling statements in the program. Each sampling statement is interpreted as a query to \(RT\). So this semantics is deterministic given a specific \(RT\).
Our RT-based semantics is equivalent to the classic distribution-based semantics explained in Sec. 2.2. By specifying and proving the semantics equivalence, we essentially show that all programs (including the MT algorithm and \(\text {check}(wt)\) in Sec. 2.1) are “equivalent” to their RT-based variants.
Based on the semantics equivalence, we can show the equivalence between \(\vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\) and \(\vDash _\textrm{RT} \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\). The latter specifies the same relational property as the former but uses the RT-based semantics for execution. See Step ③ in Fig. 5.
Resampling-table-based coupling. Our proof recipe reduces the relational verification for \(\vDash _\textrm{RT} \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\) to unary verification of each of \(C_1\) and \(C_2\) in the RT-based semantics.
Specifically, we couple the random sources of \(C_1\) and \(C_2\), i.e. let them use the same \(RT\) in their executions. We prove: for all \(RT\), if \(C_1\) using \(RT\) terminates on a state satisfying \(\textbf{q}_1\), then \(C_2\) using the same \(RT\) must also terminate on a state satisfying \(\textbf{q}_2\).
To prove this, we introduce an intermediate assertion \(\textbf{R}\) to describe what kind of \(RT\) can make \(\textbf{q}_1\) hold after the execution of \(C_1\). Usually \(\textbf{R}\) specifies that “some entries in \(RT\) have some properties”. With \(\textbf{R}\), we can split the goal into the following two subgoals:
  • For all \(RT\), if \(C_1\) using \(RT\) terminates at a state satisfying \(\textbf{q}_1\), then in retrospect \(RT\) must satisfy \(\textbf{R}\). This is formulated as the Hoare-triple
    $$\begin{aligned} \vDash _\text {RT} \left\{ \, \cdots \right\} C_1 \left\{ \textbf{q}_1 \Rightarrow \textbf{R}\right\} . \end{aligned}$$
    (9)
    The post-condition reflects this retrospective reasoning. We omit the pre-condition, which usually degenerates to a regular state assertion. Then we only need classical (non-probabilistic) Hoare-style proofs for the Hoare triple.
  • Starting with any \(RT\) satisfying \(\textbf{R}\), the execution of \(C_2\) must terminate at a final state satisfying \(\textbf{q}_2\), that is,
    $$\begin{aligned} \vDash _\text {RT} \left[ \, \cdots \wedge \textbf{R}\right] C_2 \left[ \textbf{q}_2\right] . \end{aligned}$$
    (10)
    Here \(\textbf{R}\) is in the precondition. We omit the remaining parts of the precondition.
Note that the first subgoal (9) only needs to be partial correctness. It says, for any execution of \(C_1\), if it terminates and the final state satisfies \(\textbf{q}_1\), then \(RT\) must satisfy \(\textbf{R}\). Then the total correctness of \(C_2\) (the second subgoal (10)) says, starting from the same \(RT\), \(C_2\) terminates at a final state satisfying \(\textbf{q}_2\). This way we can prove that the probability of \(\textbf{q}_1\) at the end of \(C_1\) is not greater than the probability of \(\textbf{q}_2\) at the end of \(C_2\). Step ④ in Fig. 5 shows this reduction of the relational reasoning to unary proofs of the two programs separately.
Our reasoning above benefits from a key novelty of our RT-based semantics with respect to existing random-source-based semantics (e.g. Kozen’s “Semantics 1” [44] and those in [10, 17]). That is, our \(RT\) is an immutable structure that never changes during program execution. In particular, used samples are not popped out of \(RT\). Therefore the assertion \(\textbf{R}\) derived from the post-condition of (9) must also hold over the \(RT\) at the beginning of the execution. So we can use it in the precondition in (10).
Finding such an \(\textbf{R}\) is not difficult in many cases, especially when verifying ALLLs. We give another example in Sec. 5.2.
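As a miniature illustration of the recipe (ours, not taken from the paper), consider two hypothetical one-sample programs: \(C_1\) is \(x := \textsf{Sample}({1})\) with \(\textbf{q}_1 \triangleq (x = 1)\), and \(C_2\) is \(y := \textsf{Sample}({1})\) with \(\textbf{q}_2 \triangleq (y = 1)\). Taking \(\textbf{R} \triangleq (\textsf{RT}[1][0] = 1)\), the two subgoals can be checked exhaustively in Python, since only one table entry matters:

```python
# A toy instance of RT-based coupling (all names hypothetical). Both programs
# read their first sample from row 1 of the table, so only RT[1][0] matters.
def run_c1(rt):                 # C1 = (x := Sample(1)), RT-based execution
    return {"x": rt[(1, 0)]}

def run_c2(rt):                 # C2 = (y := Sample(1)), RT-based execution
    return {"y": rt[(1, 0)]}

q1 = lambda s: s["x"] == 1      # postcondition of C1
q2 = lambda s: s["y"] == 1      # postcondition of C2
R = lambda rt: rt[(1, 0)] == 1  # intermediate assertion, on the table only

for v in (0, 1):                # enumerate every relevant resampling table
    rt = {(1, 0): v}
    # subgoal (9): if q1 holds after C1, then in retrospect R holds
    assert (not q1(run_c1(rt))) or R(rt)
    # subgoal (10): from any table satisfying R, C2 establishes q2
    if R(rt):
        assert q2(run_c2(rt))
# Coupling the two programs on the same table yields Pr[q1] <= Pr[q2].
print("toy coupling checked")
```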

3 Preliminaries

In this section, we review some fundamentals of probability theory in two stages. We first introduce some basics of discrete probability theory without mentioning their measure-theoretic extensions, serving as the foundation of our distribution-based semantics in Sec. 4.1. Then we turn to the measure-theoretic probability theory, which forms the basis of our RT-based semantics in Sec. 4.2.

3.1 Discrete Probability Theory

We use notations from [3, 21]. A (discrete) sub-distribution over a set A is defined as a function \(\mu : A \rightarrow [0, 1]\) that satisfies the following two conditions: (1) the support of \(\mu \), denoted by \( supp (\mu ) = \{a \in A : \mu (a) > 0\}\), is countable; (2) \(|\mu | \le 1\), where \(|\mu | = \sum _{a \in A} \mu (a)\) is \(\mu \)’s weight.
A sub-distribution \(\mu \) is called a distribution if \(|\mu | = 1\). We denote by \(\mathbb{S}\mathbb{D}_{A}\) all of the sub-distributions over A, and by \(\mathbb {D}_{A}\) all of the distributions over A. We write \(\Pr _{a \sim \mu }[E(a)]\), which is defined as \(\sum _{a\in A : E(a)} \mu (a)\), for the probability of \(E : A \rightarrow \textrm{Prop}\) on the sub-distribution \(\mu \). We write \(\mathbb {E}_{a \sim \mu }[V(a)]\), which is defined as \(\sum _{a\in A} \mu (a) \cdot V(a)\), for the expected value of \(V : A \rightarrow \mathbb {R}\) on \(\mu \).
For an infinite sequence \(\vec{\mu }\) of sub-distributions over A, we define \(\lim \vec{\mu }\) as the sub-distribution \(\mu \) such that \(\lim _{n\rightarrow \infty } \sum _{a \in A} | \vec{\mu }[n](a) - \mu (a)| = 0\). One can prove that such a \(\mu \) is unique if it exists; if it does not exist, we leave \(\lim \vec{\mu }\) undefined.
For \(\mu \in \mathbb{S}\mathbb{D}_{A}\) and function \(f \in A \rightarrow \mathbb{S}\mathbb{D}_{B}\), we define the expected sub-distribution \(\mathbb {E}_{a \sim \mu } \{f(a)\} \in \mathbb{S}\mathbb{D}_{B}\) as \(\lambda b. \ \sum _{a \in A} \mu (a) \cdot f(a)(b)\).
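To make these definitions concrete, here is a small Python rendering for finite-support sub-distributions, represented as dicts from values to probabilities. It is only a didactic stand-in for the definitions above, with `bind` implementing the expected sub-distribution \(\mathbb {E}_{a \sim \mu } \{f(a)\}\).

```python
def weight(mu):                       # |mu| = sum_a mu(a)
    return sum(mu.values())

def pr(mu, E):                        # Pr_{a ~ mu}[E(a)]
    return sum(p for a, p in mu.items() if E(a))

def ev(mu, V):                        # E_{a ~ mu}[V(a)]
    return sum(p * V(a) for a, p in mu.items())

def bind(mu, f):                      # expected sub-distribution E_{a ~ mu}{f(a)}
    out = {}
    for a, p in mu.items():
        for b, q in f(a).items():
            out[b] = out.get(b, 0.0) + p * q
    return out

fair = {0: 0.5, 1: 0.5}               # a fair coin: a distribution (weight 1)
assert weight(fair) == 1.0
assert pr(fair, lambda a: a == 1) == 0.5 and ev(fair, lambda a: a) == 0.5
```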

3.2 Measure-Theoretic Probability Theory

A set of subsets of a set \(\varOmega \), say \(\mathcal {F}\), is a \(\sigma \)-algebra on \(\varOmega \) if it contains \(\varOmega \) and is closed under complement and countable union. A measurable space is defined as a pair \((\varOmega , \mathcal {F})\), where \(\mathcal {F}\) is a \(\sigma \)-algebra on \(\varOmega \). We call \(\varOmega \) the sample space.
A function \(\mathcal {M}: \mathcal {F}\rightarrow [0, \infty )\) is called a (finite) measure on measurable space \((\varOmega , \mathcal {F})\) if it satisfies \(\mathcal {M}(\varnothing ) = 0\) and is countably additive. A measure space is defined as a triple \((\varOmega , \mathcal {F}, \mathcal {M})\), where \(\mathcal {M}\) is a measure on measurable space \((\varOmega , \mathcal {F})\). \((\varOmega , \mathcal {F}, \mathcal {M})\) is called a probability space if \(\mathcal {M}(\varOmega ) = 1\).
A discrete distribution \(\mu \) can be lifted to a measure-theoretic probability space \((\varOmega , \mathcal {F}, \mathcal {M})\), where \(\varOmega = supp (\mu )\), \(\mathcal {F}= \mathcal {P}( supp (\mu ))\), and \(\mathcal {M}(A) = \sum _{a \in A} \mu (a)\) for all \(A \subseteq supp (\mu )\).
Let \(\{(\varOmega _i, \mathcal {F}_i, \mathcal {M}_i) : i \in I\}\) be a collection of probability spaces for some possibly infinite set I. We denote by \(\prod _{i \in I} (\varOmega _i, \mathcal {F}_i, \mathcal {M}_i)\) the product probability space of \(\{(\varOmega _i, \mathcal {F}_i, \mathcal {M}_i) : i \in I\}\), defined as \((\varOmega , \mathcal {F}, \mathcal {M})\), where: (1) \(\varOmega = \prod _{i \in I} \varOmega _i\), (2) \(\mathcal {F}\) is the smallest \(\sigma \)-algebra containing all \(\prod _{i\in I} A_i\) such that \(A_i \in \mathcal {F}_i\) and \(\{j : A_j \subsetneq \varOmega _j\}\) is finite, and (3) \(\mathcal {M}(\prod _{i\in I} A_i) = \prod _{j \in J} \mathcal {M}_j(A_j)\) when \(A_i \in \mathcal {F}_i\) and \(J = \{j : A_j \subsetneq \varOmega _j\}\) is finite. The above \((\varOmega , \mathcal {F}, \mathcal {M})\) exists and is unique (see [54]).
Fig. 6.
Syntax of the programming language

4 Two Semantics of the Language

In this section we define the programming language. We first define the language syntax, and then give two equivalent semantics in Sec. 4.1 and Sec. 4.2.
Global parameters. Throughout the paper, we assume four global parameters for programs: \(N\), \(M\), \(\mathcal {D}\) and \(\mathcal {E}\). They are viewed as meta-variables, and can be configured differently for different programs.
As defined at the top of Fig. 6, \(\mathcal {D}\) and \(\mathcal {E}\) represent the “\(N\) distributions” and “\(M\) events” in ALLL’s setting (see Sec. 2.1) respectively. Each event \(\eta _j\) in \(\mathcal {E}\) takes \(N\) reals as input, and outputs a boolean value. Each \(\kappa _i\) in \(\mathcal {D}\) is a distribution over reals, and is associated with the i-th argument of every \(\eta _j\) in \(\mathcal {E}\).
Fig. 6 also gives important notations related to \(\mathcal {D}\) and \(\mathcal {E}\), which are used in the statements and the formal proofs of ALLL-related results. \(\textrm{vbl}({\eta }, {j})\) holds iff the event \(\eta \) depends on its j-th argument. \(\textrm{P}(\eta )\) is the probability of the event \(\eta \) occurring, given that its \(N\) arguments are independently distributed according to \(\mathcal {D}[1], \dots , \mathcal {D}[N]\) respectively. \(\varGamma ({j})\) is the index set of events that depend on some argument that \(\mathcal {E}[j]\) also depends on, except \(\mathcal {E}[j]\) itself.
Syntax of the programming language. As shown at the bottom of Fig. 6, we use customized program statements, expressions and boolean expressions to formulate ALLLs’ code. We write \(x := \textsf{Sample}({e})\) to sample from the distribution \(\mathcal {D}[e]\) and store the result in the program variable x. The boolean expression \(\textsf{hold}({e},{e_1, \dots , e_N})\) tests if the event \(\mathcal {E}[e]\) holds with arguments \(e_1, \dots , e_N\). Moreover, \(\textsf{vbl}({e_1}, {e_2})\) tests if the event \(\mathcal {E}[e_1]\) depends on its \(e_2\)-th argument.
We use arrays to formulate the \(N\) variables \(X_1, \dots , X_N\) in ALLLs. We use \(a[e]\) to represent the element of array a with index \(e\), and use \(a[e_1] := e_2\) for the in-place update.
We use lists to formulate the execution logs in ALLLs. To access and manipulate the execution log, we introduce list-related expressions. We use \({e_1}\langle {e_2}\rangle \) for the \(e_2\)-th element of list \(e_1\), use \(\textsf{len}({e})\) for the length of list \(e\), and use \(\textsf{app}({e_1}, {e_2})\) for appending an element \(e_2\) to list \(e_1\).
Using the syntax in Fig. 6, we can formulate the code of the MT algorithm, \(C_\textsf{MT}( cnt )\), in Fig. 12 in Sec. 6.
States and state distributions. As defined below, a state \(\sigma \) maps each program variable in \( PVar \) to some value \(v\). For simplicity, we view each array element as a program variable. A value \(v\) is either a real \(r\) or a list \(\varLambda \) of natural numbers.
\( \begin{array}{rcclcrccl} (\textit{State}) & \ \ \sigma & \ \ \in & \ \ \textit{PVar} \rightarrow \textit{Val} & \qquad & (\textit{DState}) & \ \ \mu & \ \ \in & \ \ \mathbb {D}_{\textit{State}} \end{array} \)
State distributions \(\mu \) are used to specify that, with probability \(\mu (\sigma )\), the program state before or after the execution of a program is exactly \(\sigma \). We write \(\llbracket {e} \rrbracket _{\sigma }\) and \(\llbracket {b} \rrbracket _{\sigma }\) for the evaluation of \(e\) and \(b\) in a state \(\sigma \).
Below we give two equivalent probabilistic semantics of our language, a classic distribution-based semantics and an RT-based semantics. We use n for natural numbers and \(p, r\) for reals. Throughout this paper, we assume that the program’s execution does not get stuck, and the evaluation of expressions does not abort.

4.1 Distribution-Based Semantics

Following [3, 21], we first define the semantic function \(\llbracket {C} \rrbracket (\sigma ) \in \mathbb{S}\mathbb{D}_{ State }\). Here \(\llbracket {C} \rrbracket (\sigma )(\sigma ')\) represents the probability of \(C\)’s execution from \(\sigma \) finally reaching \(\sigma '\). For example, for the sampling operation \(x := \textsf{Sample}({e})\) that samples from the distribution \(\mathcal {D}[i]\) and gets r as the result, the probability is \(\mathcal {D}[i](r)\). That is,
$$ \llbracket {x := \textsf{Sample}({e})} \rrbracket (\sigma )(\sigma ') = {\left\{ \begin{array}{ll} \mathcal {D}[i](r) & \text {if} \ \llbracket {e} \rrbracket _\sigma = i \in [1, N] \ \text {and} \ \sigma ' = \sigma \{x \leadsto r\}\\ 0 & \textrm{otherwise}\end{array}\right. }. $$
We give the full definition in [46]. We further define \(\llbracket {C} \rrbracket (\mu ) \in \mathbb{S}\mathbb{D}_{ State }\) (where \(\mu \in DState \)) by lifting \(\llbracket {C} \rrbracket (\sigma )\), using the expected sub-distribution in Sec. 3.1:
$$\begin{aligned} \llbracket {C} \rrbracket (\mu ) \ \ \triangleq \ \ \mathbb {E}_{\sigma \sim \mu } \{\llbracket {C} \rrbracket (\sigma )\}. \end{aligned}$$
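For illustration, the clause for \(x := \textsf{Sample}({e})\) and the lifting to distributions can be rendered in Python as follows, with states as dicts and \(\mathcal {D}[i]\) as a dict from values to probabilities. This is our illustrative rendering, not the formal definition, which is in [46].

```python
def freeze(sigma):                        # states as hashable snapshots
    return tuple(sorted(sigma.items()))

def sem_sample(x, i, D, sigma):
    """[[x := Sample(i)]](sigma): a sub-distribution over successor states."""
    out = {}
    for r, p in D[i].items():
        key = freeze({**sigma, x: r})     # sigma{x ~> r}
        out[key] = out.get(key, 0.0) + p
    return out

def sem_lift(sem, mu):
    """[[C]](mu) = E_{sigma ~ mu}{[[C]](sigma)}, as in the display above."""
    out = {}
    for sigma, p in mu.items():
        for sigma2, q in sem(dict(sigma)).items():
            out[sigma2] = out.get(sigma2, 0.0) + p * q
    return out

D = {1: {0: 0.5, 1: 0.5}}                 # N = 1, a fair coin
mu0 = {freeze({"y": 1, "cnt": 0}): 1.0}
mu1 = sem_lift(lambda s: sem_sample("y", 1, D, s), mu0)
# mu1 assigns probability 1/2 to each of y = 0 and y = 1
```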

4.2 Resampling-Table-Based Semantics

Informally, in our new RT-based semantics, a program first randomly generates a resampling table (RT); with this table, the program then starts its deterministic execution. Below we first give the definition of an RT, and specify how the semantics “generates” an RT. Then we define an RT-based operational semantics, which describes the deterministic execution of the program with a certain RT. Finally, we combine all the above definitions into the RT-based semantic functions \(\llbracket {C} \rrbracket _\textrm{RT}(\sigma )\) and \(\llbracket {C} \rrbracket _\textrm{RT}(\mu )\).
The resampling table is defined as follows.
A resampling table \(RT\) is a matrix with size \(N\times \infty \). An example of such a table is shown in Fig. 7, where \(N= 2\) and \(RT[i][j] = r_{ij}\) for \(i \in [1, 2]\) and \(j \in Nat \). Intuitively, as described in Sec. 2.1, the i-th row of \(RT\) stores the ahead-of-time samples from the distribution \(\mathcal {D}[i]\). Additionally, we require that \(\text {generable}({RT})\) holds. That is, every entry in the i-th row of \(RT\) must be a value that can be sampled from the distribution \(\mathcal {D}[i]\). This accords with the intuition of the RT.
Fig. 7.
A resampling table \(RT\) with \(N= 2\)
We specify how the semantics “generates” an RT. To this end, we define the probability space of all (generable) RTs as \((\varOmega , \mathcal {F}, \mathcal {M})\), and thus \(\mathcal {M}(\{RT\mid \cdots \,\})\) represents the probability of some RT from set \(\{RT\mid \cdots \,\}\) being generated. The definition is shown below:
$$\begin{aligned} (\varOmega , \mathcal {F}, \mathcal {M}) \ \ \triangleq \ \ \prod _{(i, j) \in [1, N] \times Nat } (\varOmega _{i,j}, \mathcal {F}_{i,j}, \mathcal {M}_{i,j}), \end{aligned}$$
where \((\varOmega _{i,j}, \mathcal {F}_{i,j}, \mathcal {M}_{i,j})\) is lifted from the discrete distribution \(\mathcal {D}[i]\) (see Sec. 3.2). Note that \(\varOmega = RTable \), i.e., the sample space is indeed the set of all RTs.
Below we explain our construction of \((\varOmega , \mathcal {F}, \mathcal {M})\). The probability space of all RTs is the infinite product of probability spaces of all entries, since an RT is generated by filling all of its entries by an infinite number of independent samples. For the entry in row i and (arbitrary) column j, its probability space is lifted from \(\mathcal {D}[i]\), from which the entry is sampled.
We then define the RT-based operational semantics, with selected semantics rules shown in Fig. 8. The definition is almost standard, except that it interprets sampling operations as table queries. Recall that, when the program performs a sampling from the distribution \(\mathcal {D}[i]\), it reads the leftmost unread entry in the i-th row of \(RT\) as the result. To keep track of these entries, we maintain the heads \(\iota \) in the program configuration to record their column numbers.
$$\begin{aligned} (\textit{Heads}) \ \ \iota \ \ {:}{:}=\ \ (n_1, \dots , n_N) \end{aligned}$$
\(\iota \) is an \(N\)-tuple. Its i-th component, \(\iota [i]\), represents the column number of the leftmost unread entry in the i-th row of \(RT\). Now, \(RT\vdash (C, \sigma , \iota ) \xrightarrow {\ }^{\!*} (C', \sigma ', \iota ')\) says that, starting from the program state \(\sigma \), with the leftmost unread entries of \(RT\) initially specified by \(\iota \), \(C\) deterministically executes to \(C'\) using \(RT\), where the result state is \(\sigma '\) and finally the leftmost unread entries in \(RT\) are specified by \(\iota '\). When the program performs a sampling from \(\mathcal {D}[i]\), it takes \(RT[i][\iota [i]]\) as the result and increments \(\iota [i]\). In other program steps, \(\iota \) remains unchanged.
Fig. 8.
RT-based operational semantics
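The sampling rule of Fig. 8 can be sketched as a pure Python function: given an immutable table, it produces the successor state and heads. All names here are illustrative.

```python
def step_sample(x, i, rt, sigma, iota):
    """One RT-based step of x := Sample(i) (cf. Fig. 8): read RT[i][iota[i]],
    assign it to x, and advance head i. rt is a function (i, j) -> RT[i][j]
    and is never modified; only sigma and iota change."""
    sigma2 = {**sigma, x: rt(i, iota[i])}
    iota2 = {**iota, i: iota[i] + 1}
    return sigma2, iota2

rt = lambda i, j: (i + j) % 2             # a fixed, fully determined table
sigma, iota = {"x": 0}, {1: 0}
sigma, iota = step_sample("x", 1, rt, sigma, iota)
assert sigma["x"] == 1 and iota[1] == 1   # read RT[1][0] = 1, head advanced
```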
Now the RT-based semantic functions are defined below, where \(\iota _\textsf{init}= (0, \dots , 0)\) represents the initial positions of the heads:
$$\begin{aligned} \llbracket {C} \rrbracket _\textrm{RT}(\sigma )(\sigma ') \ \ \triangleq \ \ \mathcal {M}(\{RT\mid RT\vdash (C, \sigma , \iota _\textsf{init}) \xrightarrow {\ }^{\!*} (\textbf{skip}, \sigma ', \texttt {\_})\}), \qquad \llbracket {C} \rrbracket _\textrm{RT}(\mu ) \ \ \triangleq \ \ \mathbb {E}_{\sigma \sim \mu } \{\llbracket {C} \rrbracket _\textrm{RT}(\sigma )\}. \end{aligned}$$
Informally, the probability of \(C\)’s execution from \(\sigma \) finally reaching \(\sigma '\), i.e. \(\llbracket {C} \rrbracket _\textrm{RT}(\sigma )(\sigma ')\), is the probability of generating some \(RT\) satisfying the following property: starting from \(\sigma \), \(C\)’s execution using \(RT\) finally reaches \(\sigma '\). This property is formally stated as \(RT\vdash (C, \sigma , \iota _\textsf{init}) \xrightarrow {\ }^{\!*} (\textbf{skip}, \sigma ', \texttt {\_})\), with the help of the operational semantics.
Lem. 2 shows that the RT-based semantics is indeed well-defined.
Lemma 2
For all \(C, \sigma , \sigma ', \iota \), \( \{RT\mid RT\vdash (C, \sigma , \iota ) \xrightarrow {\ }^{\!*} (\textbf{skip}, \sigma ', \texttt {\_})\} \in \mathcal {F}\).
To conclude this subsection, we give the following theorem, which states the equivalence between the distribution-based semantics defined in Sec. 4.1 and the RT-based semantics.
Theorem 1 (Semantics Equivalence)
For all \(C\) and \(\mu \), \(\llbracket {C} \rrbracket (\mu ) = \llbracket {C} \rrbracket _\textrm{RT}(\mu )\).

5 Proof Recipes

Our ultimate proof goals are formulated as total correctness Hoare triples \(\vDash [P] C[Q]\) using the distribution-based semantics of Sec. 4.1.
Fig. 9.
Assertions over states and state distributions
Before showing the definition of \(\vDash [P] C[Q]\), we first define assertions in Fig. 9, following the assertion language in [21]. We write \(\textbf{p}, \textbf{q}, \textbf{r}\) for non-probabilistic assertions on program states, and PQR for probabilistic assertions on state distributions. The assertion \(\left\lceil {\textbf{q}}\right\rceil \) holds on the distribution \(\mu \) iff \(\textbf{q}\) holds on all states in the support of \(\mu \). We write \(\textbf{true}\) as a shorthand for \(\left\lceil {\textrm{true}}\right\rceil \). The expression \(\Pr [\textbf{q}]\) represents the probability that \(\textbf{q}\) holds, and \(\mathbb {E}[e]\) represents the expected value of \(e\). The assertion \(\exists X. \, Q\) holds on \(\mu \), if Q holds on \(\mu '\) obtained by assigning some constant \(v\) to X in all states in \(\mu \) (here \(\delta \) gives the Dirac distribution).
Then, \(\vDash [P] C[Q]\) says that, starting from a state distribution satisfying P, \(C\)’s execution terminates with probability 1, and thus the sub-distribution of the result states is actually a state distribution, which satisfies Q. We show the definition in Def. 1.
Definition 1 (Total Correctness)
For all \(P, C, Q\), \(\vDash [P] C[Q]\) holds iff
$$ \forall \mu . \ \ \mu \vDash P \implies |\llbracket {C} \rrbracket (\mu )| = 1 \wedge \llbracket {C} \rrbracket (\mu )\vDash Q. $$
In the following subsections, we formalize our two proof recipes, loop truncation and RT-based coupling.

5.1 Loop Truncation

We have explained a specialized form of loop truncation in Lem. 1 in Sec. 2.3. Below we show the more general theorem (Thm. 2).
Theorem 2 (Loop Truncation)
For all \(P, b, C, \textbf{E}, Q, e\) and r, if
$$ \begin{array}{l} \forall K\in \mathbb {N}.\ \ \vDash \left[ P\right] \, \textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}] \left[ Q \wedge \mathbb {E}[e] \le r \wedge \left\lceil {e\ge 0}\right\rceil \right] , \end{array} $$
\(\textbf{modbf}(\textbf{E}, e)\) and \(t\text{- } \textbf{closed}(Q)\), then \(\vDash [P] \, \textbf{E}[\textbf{while}\;({b})\;\textbf{do}\;{C}] \, [Q]\).
Here \(\textbf{E}\) is a program context, and \(\textbf{E}[\textbf{while}\;({b})\;\textbf{do}\;{C}]\) fills the hole in \(\textbf{E}\) with the loop \(\textbf{while}\;({b})\;\textbf{do}\;{C}\).
Thm. 2 says that, to prove total correctness of \(\textbf{E}[\textbf{while}\;({b})\;\textbf{do}\;{C}]\), we transform the code to \(\textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}]\) with a specific \(e\). How to choose \(e\) is application-dependent. Usually we choose as \(e\) the loop counter incremented in the loop body, such as \( cnt \) in \(C_\textsf{MT}( cnt )\) (see Sec. 2.2 and Fig. 12). With an inappropriate \(e\), the first premise of the theorem may be invalid or still hard to prove, though how \(e\) is chosen does not affect the validity of the theorem.
In addition to \(e\), the first premise also asks users to find a common bound r (a real number) that can bound \(\mathbb {E}[e]\) at the end of \(\textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}]\) for all \(K\). Usually the postcondition Q can help us find such an r. Besides the upper bound r, we require that evaluating \(e\) at the end of \(\textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}]\) must result in a non-negative real number. These two bounds are crucial for ensuring almost sure termination of \(\textbf{E}[\textbf{while}\;({b})\;\textbf{do}\;{C}]\).
The second premise, \(\textbf{modbf}(\textbf{E}, e)\), rules out those contexts \(\textbf{E}\) that make \(\mathbb {E}[e] \le r\) hold at the end of \(\textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}]\) vacuously, e.g. those that modify the program variables in \(e\) at the end of the context and make \(e= r\) hold. \(\textbf{modbf}(\textbf{E}, e)\) syntactically restricts \(\textbf{E}\) such that the variables in \(e\) can be modified in \(\textbf{E}\) only before the code in the hole of \(\textbf{E}\) is executed. For example, \(\textbf{modbf}(C'; [\,], e)\) holds for any \(C'\) and \(e\), since only \(C'\), which is executed before the hole, can modify the variables in \(e\) in the context. Similarly, \(\textbf{modbf}([\,], e)\) holds. We give the definition of \(\textbf{modbf}(\textbf{E}, e)\) in [46].
The third premise, \(t\text{- } \textbf{closed}(Q)\), is for deriving the postcondition Q of \(\textbf{E}[\textbf{while}\;({b})\;\textbf{do}\;{C}]\) from the same Q of \(\textbf{E}[\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}]\). We say an assertion Q is t-closed [3], denoted by \(t\text{- } \textbf{closed}(Q)\), if for all infinite state distribution sequences \(\vec{\mu }\), if Q holds on \(\vec{\mu }[i]\) for each i and \(\lim \vec{\mu } = \mu \), then Q holds on \(\mu \). Many assertions are t-closed. For example, we can prove that \(t\text{- } \textbf{closed}(\mathbb {E}[e] \le r \wedge \left\lceil {e\ge 0}\right\rceil )\) always holds for any \(e\) and any r.
Since \(\textbf{modbf}([\,], e)\) and \(t\text{- } \textbf{closed}(\mathbb {E}[e] \le r \wedge \left\lceil {e\ge 0}\right\rceil )\) both hold, Lem. 1 can be derived from Thm. 2.
Proof (Sketch of Thm. 2). Due to the space limit, below we only show the case of \(\textbf{E} = [\,]\). We prove 1) almost sure termination and 2) the establishment of the postcondition Q, respectively.
For 1), assuming that \(\textbf{while}\;({b})\;\textbf{do}\;{C}\) terminates with probability \(p < 1\), we derive a contradiction. From the premise we know \(\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}\) almost surely terminates, so it terminates in a state where \(e\ge K\) with probability at least \(1 - p\). Thus, by the semantics of \(\mathbb {E}[e]\) (and since the value of \(e\) is non-negative), we know \(\mathbb {E}[e] \ge (1 - p) K\) holds at the end of \(\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}\). Therefore, we can find a sufficiently large \(K\) such that \(\mathbb {E}[e] \ge (1 - p) K> r\), which contradicts the premise.
For 2), the key is proving that, for all \(\mu \models P\),
\( \llbracket {\textbf{while}\;({b})\;\textbf{do}\;{C}} \rrbracket (\mu ) = \lim _{K\rightarrow \infty } \llbracket {\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}} \rrbracket (\mu ). \)
Then, we can establish Q for \(\textbf{while}\;({b})\;\textbf{do}\;{C}\), from \(t\text{- } \textbf{closed}(Q)\) and that Q is the postcondition for each \(\textbf{while}\;({b\wedge e< K})\;\textbf{do}\;{C}\).    \(\square \)
We apply Thm. 2 for the verification of the MT algorithm and its variants in Sec. 6. Here we show another example beyond ALLLs, which is taken from [41] (with slight modifications).
Example 1
Let \(N= 1\) and \(\mathcal {D}[1] = \{(0, \frac{1}{2}), (1, \frac{1}{2})\}\). The code \(C_\text {flip}\) is defined as \(\textbf{while}\;({y = 1})\;\textbf{do}\;{\{\ y := \textsf{Sample}({1}); cnt := cnt + 1; \} }\). We prove:
$$\begin{aligned} \vDash [\left\lceil { cnt = 0 \wedge y = 1}\right\rceil ] \ C_\text {flip} \ [\mathbb {E}[ cnt ] \le 2]. \end{aligned}$$
(11)
Here \(C_\text {flip}\) repeatedly flips a fair coin by sampling from \(\mathcal {D}[1]\), until it gets heads (\(y=0\)). We use \( cnt \) to record the number of coin flips. Then our proof goal (11) says that \(C_\text {flip}\) almost surely terminates, and it flips at most twice in expectation.
To prove (11), by Thm. 2 (or Lem. 1), we only need to prove that, for all \(K\in \mathbb {N}\), \(\vDash [\left\lceil { cnt = 0 \wedge y = 1}\right\rceil ]\ C_\text {flip}'(K)\ [\mathbb {E}[ cnt ] \le 2 \wedge \left\lceil { cnt \ge 0}\right\rceil ]\), where \(C_\text {flip}'(K)\) is defined as \(\textbf{while}\;({y = 1 \wedge cnt < K})\;\textbf{do}\;{\{y := \textsf{Sample}({1}); cnt := cnt + 1; \} }\). We adapt the program logic Ellora [3] to complete the proof.
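As a sanity check (ours, not part of the paper’s proof), the premise can be estimated by simulation: for every \(K\), the truncated loop’s expected \( cnt \) should stay below the common bound 2.

```python
import random

def c_flip_truncated(K):
    """One run of while (y = 1 && cnt < K) { y := Sample(1); cnt := cnt + 1 }."""
    y, cnt = 1, 0
    while y == 1 and cnt < K:
        y = random.randint(0, 1)          # a fair coin: D[1]
        cnt += 1
    return cnt

# Monte Carlo estimates of E[cnt] for several truncation bounds K; the exact
# values are 2 * (1 - 2**(-K)), all below the common bound r = 2.
for K in (1, 5, 50):
    est = sum(c_flip_truncated(K) for _ in range(100_000)) / 100_000
    print(f"K = {K:2d}: estimated E[cnt] = {est:.3f}")
```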

5.2 Resampling-Table-Based Coupling

As informally explained in Sec. 2.4, our RT-based coupling is for proving the relational tuple \(\vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\), an intermediate proof goal that appears in ALLLs’ verification. We show the formal definition of \(\vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\) in Def. 2. Note that in this definition we neither require nor assume the termination of \(C_1\) and \(C_2\)’s executions.
Definition 2 (Inequality between Probabilities)
For all \(P, C_1, C_2, \textbf{q}_1, \textbf{q}_2\), \(\vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\) holds iff
$$\begin{aligned} \forall \mu . \ \ \mu \vDash P \implies \Pr \nolimits _{\sigma \sim \llbracket {C_1} \rrbracket (\mu )}[\sigma \vDash \textbf{q}_1] \ \le \ \Pr \nolimits _{\sigma \sim \llbracket {C_2} \rrbracket (\mu )}[\sigma \vDash \textbf{q}_2]. \end{aligned}$$
Here the probabilities are taken over the final state sub-distributions \(\llbracket {C_1} \rrbracket (\mu )\) and \(\llbracket {C_2} \rrbracket (\mu )\), so non-terminating executions contribute to neither side.
Our RT-based coupling reduces the verification of the relational tuple to proving unary properties of \(C_1\) and \(C_2\)’s executions in the RT-based semantics respectively (i.e. the subgoals (9) and (10) in Sec. 2.4). We show the formal theorem in Thm. 3.
Theorem 3 (RT-Based Coupling)
For all \(\textbf{p}, C_1, C_2, \textbf{q}_1, \textbf{R}, \textbf{q}_2\), if
  • \(\textbf{RTonly}({\textbf{R}})\);
  • \(\vDash _\textrm{RT} \{\textbf{p}\wedge \textsf{hdinit}\} C_1 \{\textbf{q}_1 \Rightarrow \textbf{R}\}\);
  • \(\vDash _\textrm{RT} [\textbf{p}\wedge \textbf{R}\wedge \textsf{hdinit}] C_2 [\textbf{q}_2]\);
then \(\vDash \{\left\lceil {\textbf{p}}\right\rceil \} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}.\)
We apply Thm. 3 for verifying ALLLs, which we will explain in Sec. 6. Below we explain Thm. 3 in four aspects: (1) requiring \(\left\lceil {\textbf{p}}\right\rceil \) as the precondition in the relational tuple; (2) the assertions \(\textbf{R}\), \(\textsf{hdinit}\) and the requirement \(\textbf{RTonly}({\textbf{R}})\); (3) the RT-based unary triples \(\vDash _\textrm{RT}\); and (4) its proof ideas. We also show another example beyond ALLLs, and briefly discuss an extension of Thm. 3 at the end.
Lifting state assertions as preconditions. The relational tuples we prove are in a restricted form, namely that the precondition P is in the form of \(\left\lceil {\textbf{p}}\right\rceil \), where \(\textbf{p}\) is an assertion over states. Recall that \(\left\lceil {\textbf{p}}\right\rceil \) holds over \(\mu \) iff \(\textbf{p}\) holds over any \(\sigma \) such that \(\sigma \in supp (\mu )\) (see Fig. 9). Therefore the precondition \(\left\lceil {\textbf{p}}\right\rceil \) says we are only interested in the executions of \(C_1\) and \(C_2\) with the initial states satisfying \(\textbf{p}\). So we can fill the omitted part of the two subgoals (9) and (10) with \(\textbf{p}\), and turn them into classical (deterministic) Hoare triples \( \vDash _\text {RT} \left\{ \textbf{p}\right\} C_1 \left\{ \textbf{q}_1 \Rightarrow \textbf{R}\right\} \) and \( \vDash _\text {RT} \left[ \textbf{p}\wedge \textbf{R}\right] C_2 \left[ \textbf{q}_2\right] . \)
Fig. 10. Non-probabilistic assertions on RT-extended states
Assertions over RT-extended states. Thm. 3 requires us to find an “intermediate assertion” \(\textbf{R}\) that describes (and only describes) the (non-probabilistic) properties of the resampling table \(RT\). Since we need explicit reasoning about \(RT\), the assertions used in the classical reasoning of \(\vDash _\textrm{RT}\) actually specify \(RT\) and the heads \(\iota \) as well as the states \(\sigma \).
In Fig. 10, we define non-probabilistic assertions \(\textbf{P}, \textbf{Q}, \textbf{R}\) over the extended states \((\sigma , RT, \iota )\). Besides using \(\textbf{q}\) to describe \(\sigma \) in the extended states, we introduce RT-expressions to specify \(RT\) and \(\iota \). We use \(\textsf{RT}[E_1][E_2]\) to represent the entry at row \(E_1\) and column \(E_2\) of \(RT\), and use \(\textsf{hd}_{n}\) to represent the n-th head \(\iota [n]\), where \(n \in [1, N]\).
The assertion \(\textsf{hdinit}\) (defined as a shorthand in Fig. 10) says that all of the heads \(\iota \) point to the first column of \(RT\). It specifies the initial heads before program execution, so it appears in the preconditions of the two \(\vDash _\textrm{RT}\) triples in Thm. 3.
The requirement \(\textbf{RTonly}({\textbf{R}})\) (defined in Fig. 10) says that changing \(\sigma \) and/or \(\iota \) in the extended state does not affect whether \(\textbf{R}\) holds. That is, \(\textbf{R}\) describes \(RT\) only. One can check that \(\textbf{RTonly}({\textbf{R}})\) holds if \(\textbf{R}\) does not syntactically contain any free program variables or \(\textsf{hd}_{n}\)'s.
RT-based unary triples. Now we can define the RT-based unary triples, \(\vDash _\textrm{RT} [\textbf{P}] C[\textbf{Q}]\) and \(\vDash _\textrm{RT} \{\textbf{P}\} C\{\textbf{Q}\}\). They are standard Hoare triples for total correctness and partial correctness respectively, using the RT-based operational semantics (in Fig. 8 of Sec. 4.2) for program execution.
Definition 3 (Total Correctness in RT-Based Operational Semantics)
For all \(\textbf{P}, C, \textbf{Q}\), \(\vDash _\textrm{RT} [\textbf{P}] C[\textbf{Q}]\) holds iff
$$\begin{aligned} &\forall \sigma , RT, \iota . \ \ (\sigma , RT, \iota ) \vDash \textbf{P}\, \implies \\ &\qquad \qquad \quad \exists \sigma ', \iota '. \ RT\vdash (C, \sigma , \iota ) \xrightarrow {\ }^{\!*} (\textbf{skip}, \sigma ', \iota ') \wedge (\sigma ', RT, \iota ') \vDash \textbf{Q}. \end{aligned}$$
Definition 4 (Partial Correctness in RT-Based Operational Semantics)
For all \(\textbf{P}, C, \textbf{Q}\), \(\vDash _\textrm{RT} \{\textbf{P}\} C\{\textbf{Q}\}\) holds iff
$$\begin{aligned} & \forall \sigma , RT, \iota , \sigma ', \iota '. \ \ (\sigma , RT, \iota ) \vDash \textbf{P}\wedge RT\vdash (C, \sigma , \iota ) \xrightarrow {\ }^{\!*} (\textbf{skip}, \sigma ', \iota ')\\ &\qquad \qquad \qquad \quad \implies (\sigma ', RT, \iota ') \vDash \textbf{Q}. \end{aligned}$$
For total correctness, Def. 3 says there exists a terminating execution of \((C, \sigma , \iota )\) under \(RT\). This essentially ensures the absence of non-terminating executions, because the RT-based operational semantics is deterministic.
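To illustrate why the RT-based semantics is deterministic, the following minimal Python sketch models an RT-extended state \((\sigma , RT, \iota )\) (our own illustration; finite rows stand in for the conceptually infinite table): every \(\textsf{Sample}({n})\) reads the entry under the n-th head and advances that head, so a whole run is a pure function of the fixed \(RT\).
```python
from dataclasses import dataclass

@dataclass
class RTState:
    """An RT-extended state (sigma, RT, iota) of the RT-based semantics."""
    sigma: dict   # the program state: variables to values
    rt: list      # rt[n-1] is the row of presampled values for D[n]
    heads: list   # iota: heads[n-1] is the column index hd_n

def hdinit(nrows: int) -> list:
    """All heads point to the first column (the assertion hdinit)."""
    return [0] * nrows

def sample(st: RTState, n: int):
    """Sample(n): deterministically read RT[n][hd_n], then advance hd_n."""
    v = st.rt[n - 1][st.heads[n - 1]]
    st.heads[n - 1] += 1
    return v

# A run of C_flip from Ex. 1 under a fixed table: the outcome is fully
# determined by rt, so re-running always gives the same cnt.
st = RTState(sigma={"y": 1, "cnt": 0}, rt=[[1, 1, 0, 1, 0]], heads=hdinit(1))
while st.sigma["y"] == 1:
    st.sigma["y"] = sample(st, 1)
    st.sigma["cnt"] += 1
print(st.sigma["cnt"])  # always 3 for this table
```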
We can use a classical Hoare-style program logic to prove the \(\vDash _\textrm{RT}\) triples. We show the logic in [46].
Proof ideas of the theorem. To prove Thm. 3, we need to bridge two gaps between the \(\vDash _\textrm{RT}\) triples in the premises and the \(\vDash \) tuple in the conclusion. First, the \(\vDash _\textrm{RT}\) triples use the RT-based semantics, while the \(\vDash \) tuple uses the distribution-based semantics. Second, the \(\vDash _\textrm{RT}\) triples are unary, while the \(\vDash \) tuple is relational.
The key to bridging the gaps is reduction through the following RT-based tuple as an intermediate form, which is the counterpart of Def. 2 in the RT-based semantics.
Definition 5 (Inequality between Pr. in RT-Based Semantics)
For all \(P, C_1, C_2, \textbf{q}_1, \textbf{q}_2\), \(\vDash _\textrm{RT} \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}\) holds iff, for any initial distribution satisfying P, the probability, now taken over the initial states and the random choice of the resampling table \(RT\), that \(C_1\)'s execution terminates in a state satisfying \(\textbf{q}_1\) is no more than the corresponding probability for \(C_2\) and \(\textbf{q}_2\).
Lemma 3 shows the equivalence between the two relational tuples, which follows from the semantic equivalence (Thm. 1). This lemma bridges the first gap and is interesting in its own right.
Lemma 3
For all \(P, C_1, C_2, \textbf{q}_1, \textbf{q}_2\),
$$ \vDash \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\} \iff \, \vDash _\textrm{RT} \{P\} C_1 \le C_2 \{\textbf{q}_1, \textbf{q}_2\}. $$
Our “intermediate assertion” \(\textbf{R}\) allows us to split the \(\vDash _\textrm{RT}\) relational tuple into two unary \(\vDash _\textrm{RT}\) triples, bridging the second gap.
Fig. 11. The code \(C_\textrm{PRF}^\textsf{bad}\) in Ex. 2
Example 2
This example is adapted from an intermediate goal in [6]’s proof of the PRP/PRF switching lemma.3 Let \(k \ge 1\). For any \(n_1, \dots , n_k\), we prove that
$$\begin{aligned} \begin{array}{l} \vDash \{\left\lceil {\textsf{inp}}\right\rceil \} \, C_\text {PRF}^\textsf{bad} \le C_\text {PRF} \\ \quad \{ bad = 1, \ \ \exists X_1, X_2, Y. \ X_1 \ne X_2 \wedge \textsf{find}(L, (X_1, Y)) \wedge \textsf{find}(L, (X_2, Y))\}. \end{array} \end{aligned}$$
(12)
We show the code of \(C_\text {PRF}^\textsf{bad}\) in Fig. 11, and the code of \(C_\text {PRF}\) results from removing lines 2 and 6 from the figure. The assertion \(\textsf{inp}\) says that \(n_1, \dots , n_k\) are the inputs stored in \(x[1], \dots , x[k]\), which is defined as \(\bigwedge _{i \in [1, k]}. \; x[i] = n_i\).
By extending the programming language, we implement a map in the program variable L, which stores some key-value pairs. One can insert a pair into the map by writing \(\textsf{app}({L}, {(e_1, e_2)})\), and query for the existence of a key, a value or a pair by writing \(\textsf{findkey}(L, e)\), \(\textsf{findval}(L, e)\) or \(\textsf{find}(L, (e_1, e_2))\).
\(C_\text {PRF}^\textsf{bad}\) and \(C_\text {PRF}\) do the following: for \(n = x[1], \dots , x[k]\), the programs check if n has been inserted in L as a key; if not, they sample a value y from \(\mathcal {D}[1]\), and then insert the key-value pair (ny) into L; if y has been inserted in L as a value, \(C_\text {PRF}^\textsf{bad}\) marks \( bad \).
(12) then says that the probability of \(C_\text {PRF}^\textsf{bad}\) terminating with \( bad = 1\) is no more than the probability of \(C_\text {PRF}\) terminating with two key-value pairs sharing the same value in L.
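For intuition, here is a hedged Python sketch of the two programs as we read their description above (Fig. 11's exact code is not reproduced here; the list-based map and the rt_row parameter are our stand-ins for L and the first row of \(RT\)). Both programs consume the same row of presampled values, so a duplicated entry in that row is exactly what lets \(C_\text {PRF}^\textsf{bad}\) set \( bad \).
```python
def run_prf(xs, rt_row, with_bad: bool):
    """Sketch of C_PRF^bad (with_bad=True) and C_PRF (with_bad=False) run
    under the RT-based semantics; rt_row stands for the first row of RT."""
    L, bad, hd = [], 0, 0                # L: key-value pairs; hd: head into rt_row
    for n in xs:                         # for n = x[1], ..., x[k]
        if all(k != n for k, _ in L):    # findkey(L, n) fails: n is a fresh key
            y = rt_row[hd]; hd += 1      # y := Sample(1), read from the table
            if with_bad and any(v == y for _, v in L):
                bad = 1                  # findval(L, y): mark bad on a collision
            L.append((n, y))             # app(L, (n, y))
    return L, bad

# With two identical entries in the row (the assertion coll), C_PRF^bad
# ends with bad = 1, and C_PRF ends with two pairs sharing the same value.
print(run_prf([3, 7, 5], rt_row=[4, 4, 9], with_bad=True))   # (..., 1)
print(run_prf([3, 7, 5], rt_row=[4, 4, 9], with_bad=False))  # pairs (3,4),(7,4),(5,9)
```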
To prove (12), we apply Thm. 3. We take \(\textbf{R}= \textsf{coll}\), where
$$\begin{aligned} \textsf{coll} \ \triangleq \ \bigvee _{0 \le i < j < |\{n_1, \dots , n_k\}|}. \ \textsf{RT}[1][i] = \textsf{RT}[1][j]. \end{aligned}$$
\(\textsf{coll}\) says that there exist two identical entries in the first row of \(RT\), which are picked as samples in the executions of both \(C_\text {PRF}^\textsf{bad}\) and \(C_\text {PRF}\). Therefore \(\textsf{coll}\) specifies the kind of \(RT\) that can make \( bad = 1\) hold after the execution of \(C_\text {PRF}^\textsf{bad}\).
We can check that \(\textbf{RTonly}({\textsf{coll}})\) holds. Then, by applying Thm. 3 (with \(\textbf{p}= \textsf{inp}\) and \(\textbf{R}= \textsf{coll}\)), it remains to prove the following two unary \(\vDash _\text {RT}\) triples:
$$ \vDash _\textrm{RT} \{\textsf{inp} \wedge \textsf{hdinit}\}\ C_\text {PRF}^\textsf{bad}\ \{ bad = 1 \Rightarrow \textsf{coll}\}, $$
$$ \vDash _\textrm{RT} [\textsf{inp} \wedge \textsf{coll} \wedge \textsf{hdinit}]\ C_\text {PRF}\ [\exists X_1, X_2, Y. \ X_1 \ne X_2 \wedge \textsf{find}(L, (X_1, Y)) \wedge \textsf{find}(L, (X_2, Y))]. $$
We prove them using a simple Hoare-style program logic.
An extension of RT-based coupling. In [46], we give another relational proof recipe that extends Thm. 3. It asks users to provide two intermediate assertions \(\textbf{R}_1\) and \(\textbf{R}_2\) for splitting the \(\vDash _\textrm{RT}\) relational tuple, and provides more flexibility for reasoning about inequalities between probabilities.

6 Case Studies

We show the usefulness of our proof recipes (Thm. 2 and Thm. 3) by verifying several representative existing results about ALLLs and a new result about the MT algorithm. Below we first give a brief survey of several important research lines on ALLLs. Then we summarize the existing ALLL-related results that we have verified, and show how we verify Theorem 1.2 of [50] as an example. Finally, we explain our new result about the MT algorithm.
Research lines of ALLLs. The MT algorithm was first proposed in [50], where the expected number of iterations of the algorithm is bounded under the Erdős-Lovász condition [19, 57] and the Erdős-Spencer condition [20]. Following [50], some works [1, 31, 37, 42, 43, 53] further analyze the termination property and the iteration times of the MT algorithm under other conditions. Besides analyzing the iteration times of the MT algorithm, a number of works (including [50]) also analyze other sequential ALLLs [29, 31, 34, 36], explore properties of output distributions of ALLLs [29, 31, 32, 35], or design parallel and distributed ALLLs [13, 16, 25, 30, 50]. However, the proofs in all these works are relatively informal.
Existing results we verify. As listed below, we verify six representative results that cover the aforementioned research lines.
First, we verify the termination and the expected iteration times of the MT algorithm, under the Erdős-Lovász condition [19, 57], the cluster expansion condition [8], Shearer's condition [56], and the Erdős-Spencer condition [20]. These four results are proposed and informally proved in Theorem 1.2 of [50], Theorem 1.4 of [53], Theorem 4 of [42] and Theorem 6.1 of [50], respectively.
Second, we verify (the second part of) Theorem 2.2 of [31] that estimates the output distribution of the MT algorithm under the Erdős-Lovász condition. This result can also be viewed as estimating the output distribution of a sequential ALLL that only executes on core events (see Theorem 3.3 of [31]).
Finally, we verify the termination and a tail bound of the iteration times of a parallelizable version of the MT algorithm, under the Erdős-Lovász condition with \(\epsilon \)-slack. This variant and the tail bound are given in Theorem 1.3 of [50].
It is worth noting that we verify all three "probabilistic" results from Moser and Tardos's Gödel Prize-winning paper [50].4
Fig. 12. The code of the MT algorithm, \(C_\textsf{MT}( cnt )\)
Verifying Theorem 1.2 of [50]. As an example, we explain in more detail how we verify Theorem 1.2 of [50], which we informally described in Sec. 2.
Fig. 12 shows \(C_\textsf{MT}( cnt )\), the code of the MT algorithm that we verify. It first performs independent samplings and stores the results in \(x[1], \ldots , x[N]\) (line 1), where d and a are temporary variables. For the main loop (lines 3-13), we introduce \( flag \) to indicate whether a required assignment has been found, \( cnt \) to record the number of iterations, and \( lst \) to collect the indexes of the events in the execution log. They are initialized at line 2. In the main loop, we use z for the index of the chosen event, i.e. an event that holds under the current \(x[1], \dots , x[N]\) (lines 4-7). If no such event exists, the code marks \( flag \) (line 8) and exits the loop (line 3). Otherwise, it resamples from \(\mathcal {D}[d]\) for every d such that \(\textsf{vbl}({z}, {d})\) holds, and updates the corresponding x[d] (lines 10-13).
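To complement this walkthrough, the following Python sketch mirrors the iteration structure of Fig. 12 (the interfaces sample, events and vbl are hypothetical stand-ins for \(\mathcal {D}\), \(\mathcal {E}\) and the \(\textsf{vbl}\) relation, and the figure's \( flag \) is compressed into a break):
```python
def mt_algorithm(N, M, sample, events, vbl):
    """Sketch of C_MT(cnt). sample(d) draws from D[d]; events[z](x) tests
    whether event E[z] holds under assignment x; vbl(z, d) says E[z]
    depends on variable d. All three are assumed interfaces."""
    x = {d: sample(d) for d in range(1, N + 1)}   # line 1: initial samplings
    cnt, lst = 0, []                              # line 2: initialization
    while True:
        # lines 4-7: z := index of some event that holds under x (if any)
        z = next((i for i in range(1, M + 1) if events[i](x)), None)
        if z is None:
            break                                 # line 8: flag set, loop exits
        cnt += 1                                  # one more iteration
        lst.append(z)                             # extend the execution log
        for d in range(1, N + 1):                 # lines 10-13: resample the
            if vbl(z, d):                         # variables E[z] depends on
                x[d] = sample(d)
    return x, cnt, lst
```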
With the code of the MT algorithm defined, Moser and Tardos's result (Theorem 1.2 of [50]) is formally stated as Thm. 4. Note that \(N, M, \mathcal {D}\) and \(\mathcal {E}\) are global parameters and thus not fixed in Thm. 4, and \(r_\textsf{EL}\) is parametrized by \(M\).
Theorem 4
For all reals \(\alpha _1, \dots , \alpha _M\in (0, 1)\), if the Erdős-Lovász condition [19, 57] holds, i.e. \(\forall i \in [1, M]. \ \textrm{P}(\mathcal {E}[i]) \le \alpha _i \prod \nolimits _{j \in \varGamma ({i})} (1 - \alpha _{j})\), then, letting \(r_\textsf{EL} = \sum _{i \in [1, M]} \alpha _i (1 - \alpha _i)^{-1}\), we have \(\vDash \left[ \textbf{true}\right] C_\textsf{MT}( cnt ) \left[ \mathbb {E}[ cnt ] \le r_\textsf{EL}\right] \).
Proof
Sketch. Our proof follows the path in Fig. 5. Due to the space limit, here we only explain our construction of \(\textbf{R}\), used in the two RT-based triples at the bottom of Fig. 5. Let \(\varLambda = g_\textsf{WT}(wt)\). Then \(\textbf{R}\) is defined in terms of \(\varLambda \) and an auxiliary assertion \(\textsf{RTAssign}\).
Informally, \(\textbf{R}\) says that every event in \(wt\) (denoted by \({\varLambda }\langle {l}\rangle \)) must hold under any assignment of \(V_1, \dots , V_N\) satisfying \(\textsf {RTAssign}\). \(\textsf {RTAssign}\) says that the assignment contains the "relevant" entries of \(\textsf{RT}\), which make the event \({\varLambda }\langle {l}\rangle \) hold when it is chosen in the execution of \(C'_\textsf{MT}( cnt , K)\). For each such entry, its row number i corresponds to a variable that the event depends on (i.e. \(\textsf{vbl}({{\varLambda }\langle {l}\rangle }, {i})\) holds), and its column number is computed by \(\textsf{ve}(i, g_\textsf{WT}(wt), l-1)\). Note that our \(\textbf{R}\) only talks about the \(RT\) (and the \(wt\)), not about the actual execution of \(C'_\textsf{MT}( cnt , K)\).
We prove the remaining intermediate proof goals in Fig. 5 by adapting the program logic Ellora [3] (for proving \(\models \) triples) and using a classical Hoare-style logic (for proving \(\models _\text {RT}\) triples).    \(\square \)
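As a quick numerical sanity check on Thm. 4 (not part of the formal proof), one can test the Erdős-Lovász condition and evaluate the bound \(r_\textsf{EL}\) on a concrete instance; the function below does so, with Gamma a hypothetical stand-in for the dependency neighborhoods \(\varGamma ({i})\).
```python
def erdos_lovasz_bound(p, alpha, Gamma):
    """Return r_EL if the Erdős-Lovász condition holds, else None.
    p[i] = P(E[i]); alpha[i] in (0, 1); Gamma(i) = dependency neighborhood of i."""
    for i in p:
        bound = alpha[i]
        for j in Gamma(i):
            bound *= (1.0 - alpha[j])
        if p[i] > bound:          # condition violated at event i
            return None
    return sum(alpha[i] / (1.0 - alpha[i]) for i in p)

# Three events of probability 1/8, each depending on the other two, alpha_i = 1/4:
# the condition 1/8 <= (1/4)(3/4)^2 = 9/64 holds, and r_EL = 3 * (1/4)/(3/4) = 1.
p = {1: 0.125, 2: 0.125, 3: 0.125}
alpha = {1: 0.25, 2: 0.25, 3: 0.25}
print(erdos_lovasz_bound(p, alpha, lambda i: [j for j in p if j != i]))  # ~1.0
```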
Our new result. Thm. 4 shows the MT algorithm's total correctness with \(r_\textsf{EL}\) as the upper bound on the expected number of iterations, under the Erdős-Lovász condition. Many works [1, 37, 42, 43, 53] informally study similar properties of the MT algorithm under other conditions. Most of these results analyze the algorithm with ideas similar to Moser and Tardos's, except that they introduce other witness-tree-like structures for the analysis and derive various bounds. Like [50], they generate their witness-tree-like structures \(ds\) from prefixes of the execution log, enumerate the events in \(ds\) in some specific order, and bound a sum over all such structures to obtain their final upper bounds.
We unify these results into a general one. With our new result, to bound the expected number of iterations of the MT algorithm one no longer needs to carry out the complete proof following Moser and Tardos's idea; it suffices to instantiate the required witness-tree-like structures and prove some relevant mathematical side conditions. We show that Theorem 1.2 of [50], Theorem 1.4 of [53] and Theorem 4 of [42] are corollaries of our new result. We give the details of our new result and its proofs in [46].
(Positive) almost sure termination. Existing proof methods for almost sure termination (AST) can be roughly classified into the following two categories: “direct” methods [11, 12, 24, 38, 4749], which prove termination by constructing probabilistic ranking functions, and “indirect” methods [40, 41, 51, 52], which infer finite bounds on the expected runtime and then imply the termination.
However, these methods may not apply to ALLLs' termination. To construct the structures (e.g. ranking supermartingales [12, 24] and upper \(\omega \)-invariants [41]) required by these methods, we need to understand what occurs during each iteration of the algorithm's outer loop, which is, however, not yet well understood. For example, [50] only analyzes properties of the entire MT algorithm (e.g. (2)), not of each individual iteration.
In Sec. 2.3, we emphasize Lem. 1 as a general proof method for positive almost sure termination (PAST) [11]. Lem. 1 also serves as a fallback plan for proving (P-)AST. Informally, some existing methods [12, 24, 41, 49] have premises stronger than Lem. 1's. These premises are easier to prove in most scenarios, but not for ALLLs. For most programs, one can still apply these existing methods; for programs like ALLLs, one should take a step back and apply Lem. 1.
Asynchronous coupling. In Sec. 2.4, we apply the RT-based coupling proof recipe to (8), which involves \(C'_\textsf{MT}( cnt , K)\) and \(\text {check}(wt)\). Existing probabilistic relational program logics [4–6] support couplings, but none of them can prove (8). Specifically, these works only provide proof rules for synchronous couplings. Their rules say that, when the two programs sample from the same distribution synchronously, we can reason as if the two sampling statements return the same value. But it may not be possible to synchronize the sampling statements in \(C'_\textsf{MT}( cnt , K)\) and \(\text {check}(wt)\), for the following reason. Given an execution log's prefix \(\varLambda \) and the corresponding witness tree \(wt= f_\textsf{WT}(\varLambda )\), \(C'_\textsf{MT}( cnt , K)\) resamples the variables that \(\eta _j\) depends on for every event \(\eta _j\) in \(\varLambda \), and \(\text {check}(wt)\) does similar resamplings, but its events are taken from the sequence \(g_\textsf{WT}(wt)\). However, \(g_\textsf{WT}(wt)\) can differ from \(\varLambda \), since the construction of \(wt\) (i.e. \(f_\textsf{WT}(\varLambda )\)) may drop some events in \(\varLambda \) and lose some of \(\varLambda \)'s ordering information, which \(g_\textsf{WT}(wt)\) cannot recover.
Recently, [28] proposed a probabilistic relational program logic that supports asynchronous couplings. They introduce presampling tapes, a new kind of ghost state, which store sampling results ahead of time. Our work was developed independently, with the more focused goal of verifying ALLLs. Technically, our RTs look similar to their tapes, but there are two key differences, as follows.
First, we give an RT-based operational semantics, where all the samples (which could be infinitely many) are generated at once and stored in the \(RT\) before programs start execution, and the \(RT\) is immutable during the program execution. By contrast, sample values are added into their tapes one at a time and on demand by ghost operations in the logical reasoning, and are popped out at sampling statements. We think their approach is more flexible, but ours is more suitable for complicated examples like ALLLs. In particular, as we explain at the end of Sec. 2.4, we can use an intermediate assertion \(\textbf{R}\) to specify the whole sampling history. \(\textbf{R}\) can be derived as the post-condition of the unary reasoning of one program, and then used as the pre-condition of the other, thanks to the immutability of \(RT\). With dynamically changing tapes, they would need ghost variables to track the popped samples, and write complicated assertions to describe the correspondence between the tapes used by the two programs. We give a more detailed comparison in [46].
Second, the two works have different focuses. We mainly focus on verifying ALLLs, so we verify almost sure termination as well as a restricted form of relational properties (like (8)). Their work verifies contextual refinement, but does not verify termination.
Other related works. [22] proposes the guard strengthening proof rule for verifying lower bounds on expected values at the end of while loops. This rule introduces a loop with a strengthened loop guard, similar to the truncated loop in the premise of our loop truncation (Lem. 1 and Thm. 2). However, the two methods have different focuses: their rule targets lower bounds, while our loop truncation targets general total correctness and PAST, the latter giving an upper bound on the expected runtime.
We have discussed other related works in Sec. 2.2 and Sec. 2.4, including: the semantics that are equivalent to the distribution-based semantics [44, 45, 48], and the semantics with explicit random sources [10, 17, 44]. In the future, we would like to test our proof recipes on more applications, such as the other ALLL-related results mentioned in Sec. 6. We also plan to mechanize our work in a proof assistant like Coq, as [18] has mechanized the classical (i.e. non-constructive) proof of the Lovász Local Lemma in Isabelle/HOL. Mechanizing our work requires a measure-theoretic library that supports infinite products of measure spaces, which, to the best of our knowledge, is still lacking for Coq.

Acknowledgments

We thank anonymous referees for their suggestions and comments on earlier versions of this paper. This work is supported in part by National Natural Science Foundation of China (NSFC) under Grant No. 62232015.

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Footnotes
1. In [50], Moser and Tardos did not explicitly introduce new algorithms (RT-MT and \(\text {RT-check}\)). The algorithm here is a possible interpretation of their prose description.
2. The name "\(\textsf{vbl}\)" is short for "variables". Moser and Tardos [50] used \(\text {vbl}(\eta )\) to denote the minimal set of variables (i.e. the arguments of the event) that determine \(\eta \).
3. In [6], \(C_\text {PRF}^\textsf{bad}\) and \(C_\text {PRF}\) are defined using procedure calls. We adapt the code here.
4. In [50], Moser and Tardos propose four results: three related to the MT algorithm and its probabilistic variants, and one related to a deterministic variant.
References
7. Batz, K., Kaminski, B.L., Katoen, J.P., Matheja, C.: Relatively complete verification of probabilistic programs: an expressive language for expectation-based reasoning. Proc. ACM Program. Lang. 5(POPL), 1–30 (2021). https://doi.org/10.1145/3434320
13.
19. Erdős, P., Lovász, L.: Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and Finite Sets 10(2), 609–627 (1975)
21. Fan, W., Liang, H., Feng, X., Jiang, H.: A program logic for concurrent randomized programs in the oblivious adversary model. To appear in ESOP 2025
22.
28. Gregersen, S.O., Aguirre, A., Haselwarter, P.G., Tassarotti, J., Birkedal, L.: Asynchronous probabilistic couplings in higher-order separation logic. Proc. ACM Program. Lang. 8(POPL), 753–784 (2024). https://doi.org/10.1145/3632868
38.