We analyzed several benchmarks representative of different classes of problems. In the following, we discuss the peculiarities and the complexity of modeling the algorithms belonging to each class. In Appendix A in the supplementary material, three models are described in detail. Most of the verified algorithms are synchronous and round-based, i.e. they assume that, within a round, each process performs the computation of the algorithm for that round and all messages sent in the round are delivered to their destinations by the end of the same round. The few exceptions are highlighted. Only safety properties of the algorithms are verified with the proposed tool; liveness properties are considered only when they can be rewritten as safety properties (one way to obtain such a safety reformulation is to strengthen a liveness property by requiring that a desired event happen within a specified number of rounds).
All experiments have been conducted on a PC equipped with an Intel Core i7-7700 processor at 3.60 GHz, running 64-bit Linux Ubuntu 18.04.
6.1.1 Agreement Algorithms with Either Omission or Malicious Failures
The algorithms of this class consider a set of N processes residing on different hosts and communicating through a data network. Processes must reach an agreement on some value, in spite of possible failures of some of them. Faulty processes may either (i) omit to send or receive some of the messages considered by the algorithm (benign failures), or (ii) fail maliciously (byzantine failures), reporting fake information. In the latter case, malicious processes might also coalesce in order to fool honest processes. In the former case, a process may also crash, that is, from a certain point on it no longer sends or receives any message.
In order to express our problems within the restricted language explained in Sect. 5 (i.e. in the \(\textsc {ArcaSim}\) format), all the models consider only the behavior of correct processes, while faulty ones are abstracted away (a technique also used in [34, 35, 37–39]).
Both omission and byzantine failures are modeled by using a global variable f, whose value may be upper bounded, e.g. by the algorithm resilience t. The number of correct processes is thus \(N-f\), and every message sent by a correct process is received by all other correct processes. Using the counter abstraction, we disregard the identities of the processes sending a certain message; we simply impose that, if cm is the number of correct processes sending a certain kind of message, then each correct process receives between cm and \(cm+f\) messages of that kind, where cm corresponds to the worst case of all faulty processes actually failing, and \(cm+f\) to the best case of all faulty processes behaving correctly. As far as omission failures are concerned, faulty processes may or may not send their messages. As far as byzantine failures are concerned, faulty processes may, independently of their state, send whatever message they want (or none at all), and even send different messages to different destinations.
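The effect of the interval \([cm, cm+f]\) on threshold guards can be illustrated with a small sketch; this is our Python rendering, not \(\textsc {ArcaSim}\) syntax, and all names are hypothetical:

```python
# With cm correct senders of a message kind and f faulty processes, each
# correct receiver may count anywhere between cm (all faulty silent) and
# cm + f (all faulty behave correctly) messages of that kind.

def received_range(cm: int, f: int) -> range:
    """All message counts a correct process may observe for one kind."""
    return range(cm, cm + f + 1)

def guard_may_fire(threshold: int, cm: int, f: int) -> bool:
    """Can some correct process pass a 'received >= threshold' guard?"""
    return any(r >= threshold for r in received_range(cm, f))

def guard_must_fire(threshold: int, cm: int, f: int) -> bool:
    """Does every correct process pass the guard, whatever the faults do?"""
    return all(r >= threshold for r in received_range(cm, f))

# Example: N = 4 with f = 1 byzantine process and 3 correct senders.
assert guard_must_fire(3, cm=3, f=1)      # even the worst case reaches 3
assert guard_may_fire(4, cm=3, f=1)       # only if the faulty one also sends
assert not guard_must_fire(4, cm=3, f=1)  # worst case stops at 3 messages
```

The distinction between `may` and `must` mirrors the nondeterminism the abstraction introduces: a sound safety check must consider every count in the interval.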
The verification results for this set of algorithms are reported in Table 1, where we show the considered algorithm, the property to be verified and the conditions under which it is verified, the number of transitions produced by \(\textsc {ArcaSim}\) and its running time, the running time of Z3 to process the file produced by \(\textsc {ArcaSim}\), and the outcome of the verification. To properly interpret the results in the table, recall that when the \(\mu Z\) module of Z3 gives a sat answer, there exists a safety invariant for the abstracted counter system (so that the original system is also safe); on the contrary, an unsat answer by Z3 means that the safety condition for the abstracted counter system is violated (which likely, but not necessarily, implies that the original system is not safe).
Table 1: Agreement algorithms with either benign or malicious failures

| Algorithm | Property | Condition | #Trans. | \(\textsc{ArcaSim}\) time (s) | Z3 time (s) | Result |
|---|---|---|---|---|---|---|
| | Agreement | \(t \ge 2N/3\) | 11 | 0.58 | 0.50 | sat |
| | Agreement | \(t < 2N/3\) | 14 | 0.80 | 0.01 | unsat |
| | Weak Validity | \(t \ge 2N/3\) | 11 | 0.62 | 0.01 | sat |
| | Weak Validity | \(t < 2N/3\) | 14 | 0.81 | 0.01 | sat |
| | Irrevocability | \(t \ge 2N/3\) | 22 | 1.61 | 0.87 | sat |
| | Irrevocability | \(t < 2N/3\) | 28 | 2.23 | 0.07 | unsat |
| | Correctness | \(t < N/3\) | 7 | 0.43 | 0.04 | sat |
| | Correctness | \(t \le N/3\) | 7 | 0.41 | 0.03 | sat |
| | Unforgeability | \(t < N/3\) | 7 | 0.40 | 0.03 | sat |
| | Unforgeability | \(t \le N/3\) | 7 | 0.41 | 0.02 | unsat |
| | Relay (I) | \(t < N/3\) | 7 | 0.42 | 0.01 | sat |
| | Relay (I) | \(t \le N/3\) | 7 | 0.42 | 0.01 | sat |
| | Relay (II) | \(t < N/3\) | 6 | 0.38 | 0.03 | sat |
| | Relay (II) | \(t \le N/3\) | 6 | 0.39 | 0.03 | sat |
| | Integrity | – | 21 | 15.36 | 0.07 | sat |
| | Agreement | no \(\mathcal{P}_{nosplit}\) | 21 | 19.85 | 0.03 | unsat |
| | Irrevocability | no \(\mathcal{P}_{nosplit}\) | 36 | 28.62 | 0.16 | unsat |
| | Integrity | – | 15 | 288.66 | 0.14 | sat |
| | Agreement | no \(\mathcal{P}_{nosplit}\) | 15 | 333.29 | 0.10 | sat |
| | Irrevocability | no \(\mathcal{P}_{nosplit}\) | 23 | 408.05 | 0.04 | unsat |
| | Integrity | – | 11 | 257.81 | 0.04 | sat |
| | Agreement | no \(\mathcal{P}_{nosplit}\) | 11 | 284.13 | 0.06 | sat |
| | Irrevocability | no \(\mathcal{P}_{nosplit}\) | 19 | 410.09 | 1.04 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(\alpha =0 \wedge \lnot \mathcal{P}_{safe}\) | 17 | 7.95 | 0.04 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(\alpha =0 \wedge \mathcal{P}_{safe}\) | 30 | 13.14 | 0.10 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(\alpha =1 \wedge \lnot \mathcal{P}_{safe}\) | 14 | 6.74 | 0.05 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(1 \le \alpha < N/2 \wedge \lnot \mathcal{P}_{safe}\) | 14 | 6.75 | 0.06 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(\alpha =1 \wedge \mathcal{P}_{safe}\) | 26 | 11.52 | 0.13 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Integrity | \(1 \le \alpha < N/2 \wedge \mathcal{P}_{safe}\) | 26 | 11.52 | 0.21 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(\alpha =0 \wedge \lnot \mathcal{P}_{safe}\) | 17 | 10.73 | 0.10 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(\alpha =0 \wedge \mathcal{P}_{safe}\) | 30 | 17.66 | 0.90 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(\alpha =1 \wedge \lnot \mathcal{P}_{safe}\) | 14 | 8.99 | 0.12 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(1 \le \alpha < N/2 \wedge \lnot \mathcal{P}_{safe}\) | 14 | 8.86 | 0.06 | unsat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(\alpha =1 \wedge \mathcal{P}_{safe}\) | 26 | 15.25 | 0.56 | sat |
| \(\mathcal{U}_{T,E,\alpha}\) [10] | Agreement | \(1 \le \alpha < N/2 \wedge \mathcal{P}_{safe}\) | 26 | 15.29 | 0.85 | sat |
| | Correctness | \(t < N/2 \wedge \ge (f+1)\) inits | 9 | 1.25 | 0.02 | sat |
| | Correctness | \(t < N/2 \wedge \ge f\) inits | 9 | 1.25 | 0.01 | unsat |
| | Unforgeability | \(t < N/2 \wedge \ge (f+1)\) inits | 9 | 1.25 | 0.02 | sat |
| | Relay (I) | \(t < N/2 \wedge \) no echo | 9 | 1.26 | 0.02 | sat |
| | Relay (II) | \(t < N/2 \wedge \) 1 echo | 9 | 1.61 | 0.02 | sat |
| | Validity | \(t < N/5\) | 13 | 0.56 | 0.02 | sat |
| | Validity | \(t \le N/3\) | 21 | 0.89 | 0.13 | unsat |
| | Agreement | \(t < N/5\) | 13 | 0.83 | 0.08 | sat |
| | Agreement | \(t \le N/3\) | 21 | 1.30 | 0.23 | unsat |
The One-Third (OT) algorithm [11] solves the Consensus problem in the case of benign failures. Formally, the problem is defined as follows: each process starts with its own initial value, and by the end of the algorithm processes must decide on one of those values so that the following properties are satisfied:
- Agreement: whenever two processes have reached a decision, the values they have decided on must be equal.
- Integrity (Weak Validity): if all processes propose the same initial value, they must decide on that value.
- Irrevocability: if a process has decided on a value, it does not revoke its decision later.
No upper bound is needed on the number of faulty processes. As for all other Consensus algorithms considered in this work, we limited the set of possible initial values to \(\{0, 1\}\) (this is often not a limitation; see the 0-1 theorems from [46]).
The Irrevocability property requires checking that no revocation ever occurs in the future. For OT and the other Consensus algorithms mentioned in the sequel, we verified it by re-formulating Irrevocability as a safety property: an integer global variable dec is used, which is initialized to 0 and whose value becomes 1 whenever a process having already decided for a value v takes a decision for a value \({\overline{v}} \ne v\). The unsafe condition is \(dec > 0\), and through backward search we check whether a state with \(dec = 1\) can be reached from the initial condition with all processes undecided and with no constraint on their initial values.
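As an illustration of this reformulation (our sketch, not the \(\textsc {ArcaSim}\) model), dec can be read as a monitor over decision events:

```python
class IrrevocabilityMonitor:
    """Safety monitor: dec becomes 1 iff some process revokes a decision."""

    def __init__(self, n):
        self.decided = [None] * n  # current decision of each process, or None
        self.dec = 0               # global flag; the unsafe condition is dec > 0

    def on_decide(self, p, v):
        if self.decided[p] is not None and self.decided[p] != v:
            self.dec = 1           # process p revoked its earlier decision
        self.decided[p] = v

m = IrrevocabilityMonitor(3)
m.on_decide(0, 1); m.on_decide(1, 1)
assert m.dec == 0      # no revocation so far: still safe
m.on_decide(0, 0)      # process 0 switches from 1 to 0
assert m.dec == 1      # the unsafe condition dec > 0 is now reached
```

Proving the safety property then amounts to showing that no run of the algorithm can drive this monitor to `dec == 1`.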
Formal verification highlighted something that was not evident in the original formulation of some problems: it allowed us to discover that some properties in the problem statements are in fact “trivial”, since they cannot be violated for any number of faulty processes. This is the case for the Weak Validity property of OT: if all processes own the same initial value, there is no way to decide for a different value, considering that processes cannot lie.
The Uniform Voting algorithm (UV) [16] similarly solves the Consensus problem in the presence of benign failures. The solved problem is analogous to that defined for OT, reformulating the Integrity property as follows: “Any decision value is the initial value of some process” (also known as Strong Validity). In order to guarantee the Agreement and Irrevocability properties, UV requires the system to satisfy a \(\mathcal{P}_{nosplit}\) condition imposing that communication failures must not partition the network, that is, there cannot exist two subsets of processes such that processes belonging to the same subset communicate amongst themselves while processes belonging to different subsets do not. We did not include the \(\mathcal{P}_{nosplit}\) property in our models because, by abstracting away process identities and just counting both the number of processes performing a certain action and the number of messages received regardless of their sources, we cannot represent the identities of communicating processes; thus there is no way to model the property within the restricted language of \(\textsc {ArcaSim}\). As a consequence, the model verifications expectedly result in an unsafe outcome.
The Coordinated UV algorithm (CoUV) [16] derives from UV and solves the same problem under the same conditions. Differently from UV, which adopts a distributed communication pattern where each process communicates with all the others, CoUV adopts a rotating-coordinator paradigm: at each round one process acts as coordinator, the other processes send it their values, and the coordinator tries to help them decide. The Simplified CoUV (SiCoUV) [16] shortens the algorithm execution with slight modifications of the process computation that reduce the number of rounds needed to decide. In both cases, the rotating coordinators have been modeled by using a local variable C[x] that assumes three values, indicating whether x has already been, currently is, or has never yet been a coordinator. At the beginning of each round, a coordinator is nondeterministically taken from the not-yet-elected processes.
These two algorithms show the impact of central coordination on correctness. Both satisfy Agreement also in the presence of partitions: processes in the same partition as the coordinator decide according to its indications, while partitioned processes do not decide. Yet, in CoUV, partitioned processes may retain some value different from that chosen by the coordinator. If partitions change later and the new coordinator is a previously partitioned process holding a value different from that disseminated by the previous coordinator, then a process may change its decision, thus violating Irrevocability. By contrast, in SiCoUV the simplification, which consists in voting only for the value received from the coordinator of the current phase or for no value at all, prevents processes owning values from previous coordinators from voting for those values, which could otherwise induce inconsistent decisions on stale values.
\(\mathcal{U}_{T,E,\alpha}\) [10] solves the Consensus problem without the Irrevocability property, in the presence of byzantine failures. It requires the system to fulfill two properties, namely that the number of malicious messages received by each process in each round is \(\le \alpha < N/2\) (the \(\mathcal{P}_{\alpha}\) property), and that the total number of received correct messages is \(> N/2\) (the \(\mathcal{P}_{safe}\) property). Various combinations of \(\alpha\) and \(\mathcal{P}_{safe}\) have been considered in our verifications.
In [9], an algorithm is proposed to solve Consensus in asynchronous systems with byzantine failures and a resilience \(t < N/5\). Such an algorithm may also work in synchronous systems if the number of faulty processes is upper bounded by \(O(\sqrt{N})\). In this case, we modeled an asynchronous system in the sense that no time quantization is reproduced, but just a division in phases as in the original algorithm. In the first phase, a process waits until it receives at least \(N-t\) messages; it decides the value to diffuse subsequently and switches to the next phase. In the second phase, a process waits until it receives at least \(N-t\) messages and tries to decide. If this is not possible, the process goes back to the first phase. Leveraging the backward search, we started from the configurations in which the considered property is violated. The algorithm runs in which a sufficient number of messages is not received, and which therefore do not lead to any action, are unimportant; we verified whether the cases in which actions are undertaken by the processes can lead to an unsafe state. As resilience we used \(t < N/5\) as in the original paper, which indeed leads to a safe result. Moreover, since the article ignores the lower bound on the resilience for byzantine failures (\(t < N/3\) [48]), we considered the case of violation of this lower bound, which correctly gives unsafe results.
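The two-phase, wait-for-\(N-t\) structure can be sketched as follows; this is our simplification with hypothetical names, not the original pseudocode:

```python
# One abstract step of a process: it acts only once it has collected at
# least N - t messages in the current phase (asynchronous: no round clock).
def step(phase, received, N, t, can_decide):
    """Return (next_phase, decided)."""
    if received < N - t:
        return phase, False            # keep waiting: nothing happens
    if phase == 1:
        return 2, False                # choose a value to diffuse, move on
    # phase 2: try to decide; otherwise fall back to phase 1
    return (2, True) if can_decide else (1, False)

N, t = 11, 2                           # a resilience with t < N/5
assert step(1, received=8, N=N, t=t, can_decide=False) == (1, False)   # 8 < 9
assert step(1, received=9, N=N, t=t, can_decide=False) == (2, False)
assert step(2, received=9, N=N, t=t, can_decide=False) == (1, False)
assert step(2, received=10, N=N, t=t, can_decide=True) == (2, True)
```

Runs stuck below the \(N-t\) threshold take no action, which is why the backward search can ignore them.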
The Byzantine Broadcast Primitive (BBP) [55] aims at achieving agreement among the processes about the messages to deliver. This algorithm tolerates byzantine failures and requires that the number t of faulty processes be such that \(N > 3t\). BBP is a round-based algorithm operating in synchronous systems. It fulfills the following properties:
- Correctness: if a correct process p broadcasts (p, m, k) in round k, then every correct process accepts (p, m, k) in the same round.
- Unforgeability: if process p is correct and does not broadcast (p, m, k), then no correct process ever accepts (p, m, k).
- Relay: if a correct process accepts (p, m, k) in round \(r > k\), then every other correct process accepts (p, m, k) in round \(r + 1\) or earlier.
The Relay property would require checking all the states reachable from the configurations satisfying its hypothesis; as this is not possible, we had to rewrite it as two separate safety properties that are verified sequentially. In Appendix A.1 in the supplementary material, we explain this procedure in detail.
Formal verification reveals that the Correctness and Relay properties cannot be violated for any number of faulty processes. The Correctness property cannot be violated since the threshold for acceptance equals the minimum number of correct processes, and the initial broadcast is performed by a correct process and thus received by all correct processes; hence, there is no way for a correct process not to receive enough echoes. The Relay property cannot be violated because a correct accepting process must have received at least \(N-2t\) echoes from correct processes; those echoes are received by every correct process, all correct processes send their own echo, and as a consequence \(N-t\) correct echoes circulate, which allow each correct process to decide.
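The counting argument above can be checked exhaustively for small N; the loop below is our encoding of the proof sketch, not part of the paper's models:

```python
# For every resilience with N > 3t, an accepting correct process received
# at least N - t echoes, of which at most t may come from faulty processes.
for N in range(4, 60):
    for t in range((N - 1) // 3 + 1):    # all t with N > 3t
        assert N > 3 * t
        from_correct = (N - t) - t       # = N - 2t echoes surely correct
        # N > 3t gives N - 2t >= t + 1: the faulty processes alone can
        # never forge enough echoes, while the correct echoes eventually
        # sent by all N - t correct processes let everyone accept
        assert from_correct >= t + 1
```

The key inequality \(N - 2t \ge t + 1\) is exactly where the resilience bound \(N > 3t\) enters the argument.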
The Send Receive Broadcast Primitive (SRBP) algorithm from [54] is proposed as a basis for clock synchronization in systems affected by benign failures. It requires an upper bound t on the number f of faulty processes, i.e., \(N > 2t\) and \(t \ge f\). The algorithm satisfies the same properties as BBP, with the broadcast message consisting of a time signal \((round\ k)\). The same two-step verification of the Relay property is adopted as for BBP. In this case as well, formal verification shows that the Unforgeability and Relay properties cannot be violated. The former trivially follows from the fact that no message is ever sent and faulty processes cannot lie. The latter follows from considerations similar to those for BBP.
6.1.2 Agreement and Reliable Multicast Algorithms with Crash Failures
The algorithms considered for this category share a failure model in which processes behave correctly until they possibly fail, but from the failure on, no further action is taken and no message is sent or received (fail-stop model). Crash failures may partially disrupt the broadcast transmission of a message, in the sense that the message may reach just a subset of its destinations.
Although this may seem counter-intuitive, these failures are harder to model than both omission and byzantine failures. We describe the state of each process with a local variable F that can assume three values, indicating whether the process is (so far) correct, whether it crashed in the past, or whether it is crashing in the current algorithm step, in which case it will send only part of the messages scheduled to be transmitted. In the last case, by the end of the step the process is moved to the crashed processes and will do nothing in the future. At every step, the sum of crashing and crashed processes must not exceed the algorithm resilience.
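A minimal sketch (ours) of this three-valued crash model; the actual model uses \(\textsc {ArcaSim}\) counters rather than explicit per-process state:

```python
CORRECT, CRASHING, CRASHED = 0, 1, 2    # the three values of F (encoding ours)

def end_of_step(F, t):
    """Close one algorithm step: a crashing process, which sent only part
    of its scheduled messages, joins the crashed ones for good."""
    # the resilience bound covers crashing and crashed processes together
    assert sum(1 for s in F if s != CORRECT) <= t
    return [CRASHED if s == CRASHING else s for s in F]

F = [CORRECT, CRASHING, CORRECT, CRASHED]
F = end_of_step(F, t=2)
assert F == [CORRECT, CRASHED, CORRECT, CRASHED]
# from the failure on, a process neither sends nor receives:
assert [x for x, s in enumerate(F) if s == CORRECT] == [0, 2]
```

The transient CRASHING state is what lets a broadcast reach only a subset of its destinations within a single step.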
The results for these algorithms are reported in Table 2.
FloodSet [45] is a Consensus algorithm that satisfies the same properties as OT, with no resilience bound, terminating after \(f+1\) rounds, with f the number of crashed processes in the current run. Processes circulate the values received in the previous rounds; in the last round, each process decides either for the unique value it received, or for a default value if more values have been observed.
FloodMin [45] is a Consensus algorithm that solves the k-agreement problem: the algorithm runs for \(\lfloor f/k \rfloor +1\) rounds, with f defined as before, after which the values decided upon by the processes are required to lie in a set of cardinality at most k. This reduces to classical Consensus for \(k=1\). FloodMin guarantees both the Weak Validity and the Strong Validity properties defined before. We performed experiments with two values of k, and with a set V of initial values of cardinality either 2 or 3.
FC [50] is a Consensus algorithm that guarantees both Agreement and Strong Validity as defined above; it terminates at round \(t+1\), with \(t < N\) the algorithm resilience and \(f \le t\) the number of actually crashed processes; the decision is the smallest received value.
EDAC [15] solves Consensus as well, but it considers the Weak Validity property. This algorithm is early-deciding in the sense that, differently from the three algorithms above, a process decides as soon as it detects no new failures in the current round. The problem of modeling this algorithm is that it would require each process to record the observed crashes (a process is assumed to have crashed when no expected message is received from it in the current round), which should be modeled with bi-dimensional arrays, not available in \(\textsc {ArcaSim}\). Considering the syntactic constraints of \(\textsc {ArcaSim}\), the fact that (i) no message is received from a process after it crashes, and (ii) a crashed process is detected by all alive processes within at most the following round, we abstracted this part of the algorithm by using a global variable cnum that counts the crashed processes and is incremented for all processes as soon as a new crash is observed by the first process.
Table 2: Agreement and Reliable multicast algorithms with crash failures

| Algorithm | Property | Condition | #Trans. | \(\textsc{ArcaSim}\) time (s) | Z3 time (s) | Result |
|---|---|---|---|---|---|---|
| | Weak Validity | – | 11 | 4.43 | 0.02 | sat |
| | Agreement | – | 11 | 5.18 | 0.03 | sat |
| | Weak Validity | \(k=1 \wedge |V|=2\) | 9 | 1.37 | 0.01 | sat |
| | Strong Validity | \(k=1 \wedge |V|=2\) | 9 | 1.42 | 0.01 | sat |
| | k-Agreement | \(k=1 \wedge |V|=2\) | 9 | 1.59 | 0.03 | sat |
| | Weak Validity | \(k=1 \wedge |V|=3\) | 17 | 11.84 | 0.05 | sat |
| | Strong Validity | \(k=1 \wedge |V|=3\) | 17 | 10.30 | 0.02 | sat |
| | k-Agreement | \(k=1 \wedge |V|=3\) | 17 | 13.03 | 0.08 | sat |
| | Weak Validity | \(k=2 \wedge |V|=3\) | 17 | 11.78 | 0.06 | sat |
| | Strong Validity | \(k=2 \wedge |V|=3\) | 17 | 10.31 | 0.02 | sat |
| | k-Agreement | \(k=2 \wedge |V|=3\) | 17 | 13.06 | 0.05 | sat |
| | (Strong) Validity | \(|V|=2\) | 9 | 13.92 | 0.03 | sat |
| | Agreement | \(|V|=2\) | 9 | 15.29 | 0.07 | sat |
| | (Weak) Validity | \(|V| = 2\) | 35 | 46.95 | 0.03 | sat |
| | Agreement | \(|V| = 2\) | 35 | 49.66 | 0.02 | sat |
| | Validity | – | 2 | 0.13 | 0.06 | sat |
| | Unif. Agreem. | – | 2 | 0.19 | 0.00\(^{15}\) | sat |
| | Integrity (I) | – | 2 | 0.16 | 0.00\(^{15}\) | sat |
| | Integrity (II) | – | 2 | 0.15 | 0.00\(^{15}\) | sat |
UTRB1 [8] solves the Uniform Timed Reliable Broadcast problem, which guarantees the following properties:
- Validity: if a correct process broadcasts a message m, then all correct processes eventually deliver m.
- Integrity: for any message m, each process delivers m at most once and only if some process actually broadcasts m.
- Uniform Agreement: if any process delivers a message m, then all correct processes eventually deliver m.
- Timeliness: there exists a known constant \(\varDelta \) such that if the broadcast of a message m is initiated at time t, no process delivers m after \(t + \varDelta \).
Following the pen-and-paper correctness proof of UTRB1, we applied the Timeliness property to instantiate the “eventual” attribute and verified that both Uniform Agreement and Validity are reached by round \(f+1\). As far as Integrity is concerned, we checked its two components separately, by verifying that (i) if no process broadcasts m then no process delivers it, and (ii) each process delivers m at most once; both conditions are checked at round \(f+1\), when the algorithm run terminates.
6.1.3 Cache Coherence and Mutual Exclusion Algorithms
The MESI [47] and MOESI [53] algorithms have been designed to guarantee cache coherence. This problem affects shared-memory multiprocessor systems, where more than one process at a time may access the same memory location and copy its content to its processor's cache; the algorithms must guarantee that at most one process at a time is allowed to modify its copy, and that successive read operations on the same location return the most updated value. The difficulty in verifying these algorithms w.r.t. the safety property above lies in the fact that several processes may be in the same state, and thus satisfy the same guard, but just one of them is allowed to fire at a time; this feature differentiates the algorithms in question from pure counter-based algorithms. In order to model this characteristic, we used both a global variable flag, which is initialized to 0, becomes 1 when one of the processes in either the invalid or the shared state prepares to fire, and returns to 0 after the process has performed its operation, and a local array variable F[x], which is initialized to 0, becomes 1 for a process in either the invalid or the shared state that is selected to fire, and returns to 0 when that process fires. We constrain the system such that \(\# \{x \mid F[x]=1\} \le 1\), that is, at most one process at a time, in either the invalid or the shared state, may be selected to perform some operation. By contrast, just one process at a time may be in either the modified or the exclusive state, and hence the constraint is not applied to those processes.
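The serialization of firings can be sketched as follows; this is our Python rendering of the flag/F[x] mechanism, while the real model expresses it through \(\textsc {ArcaSim}\) constraints:

```python
INVALID, SHARED, EXCLUSIVE, MODIFIED = "I", "S", "E", "M"

def select_to_fire(state, F, flag, x):
    """Select process x (in state I or S) to fire; the global flag and the
    constraint #{x | F[x] = 1} <= 1 serialize these otherwise-enabled firings."""
    if flag == 0 and state[x] in (INVALID, SHARED):
        F[x] = 1
        flag = 1
    assert sum(F) <= 1          # at most one selected process at a time
    return flag

def fire(F, x):
    """The selected process performs its operation and releases the flag."""
    assert F[x] == 1
    F[x] = 0
    return 0                    # the global flag returns to 0

state = [SHARED, SHARED, INVALID]
F, flag = [0, 0, 0], 0
flag = select_to_fire(state, F, flag, 0)
flag = select_to_fire(state, F, flag, 1)   # refused: flag already taken
assert F == [1, 0, 0]
flag = fire(F, 0)
assert flag == 0 and F == [0, 0, 0]
```

Processes in the modified or exclusive state need no such constraint, since at most one process can occupy those states anyway.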
Dekker [20] is a classical mutual exclusion algorithm guaranteeing that no more than one process is in the critical section. We used local variables T[x] to record which process has the turn, and WE[x] as the local flag recording that process x wants to enter the critical section. As before, we use a constraint imposing that \(\# \{x \mid T[x]=1\} \le 1\), that is, it is the turn of at most one process at a time; this way T[x] takes the place of the global variable turn in Dekker's algorithm. A variable ST[x] records whether x is currently in the critical section.
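A much-simplified sketch (ours) of how these variables interact; it models only the entry test for two processes, not Dekker's full busy-waiting and turn handover:

```python
def turn_of(T):
    """T[x] replaces Dekker's global variable turn; the constraint
    #{x | T[x] = 1} <= 1 means it is at most one process's turn."""
    holders = [x for x, v in enumerate(T) if v == 1]
    assert len(holders) <= 1
    return holders[0] if holders else None

# two processes; WE[x] = x wants to enter; ST[x] = x is in critical section
T, WE, ST = [1, 0], [0, 0], [0, 0]

def try_enter(x):
    WE[x] = 1
    other = 1 - x
    if not WE[other] or turn_of(T) == x:
        ST[x] = 1               # enter the critical section

try_enter(0)
try_enter(1)                    # blocked: process 0 wants in and has the turn
assert ST == [1, 0]             # mutual exclusion: only one process inside
```

The safety property verified is precisely that \(\# \{x \mid ST[x]=1\} \le 1\) in every reachable state.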
The results achieved with these algorithms are shown in Table 3.
Table 3: Cache coherence and mutual exclusion algorithms

| Algorithm | Property | Condition | #Trans. | \(\textsc{ArcaSim}\) time (s) | Z3 time (s) | Result |
|---|---|---|---|---|---|---|
| | Cache coherence | – | 15 | 0.21 | 0.05 | sat |
| | Cache coherence | – | 20 | 0.41 | 0.07 | sat |
| | Mutual exclusion | – | 5 | 0.04 | 0.00 | sat |