This chapter addresses the generation of interpolants from DRUP proofs and focuses on their application in SAT-based model checking. It introduces Drup2Itp, a framework built on top of the CaDiCaL 2.0 SAT solver that uses the Tracer API to enable incremental, in-memory DRUP proof logging without modifying the solver's internals. This design allows flexible and efficient interpolant computation tailored to model checking applications. The chapter covers the implementation details of Drup2Itp, including its decoupled architecture, specialized unit propagation and conflict analysis procedures, and the introduction of a proof minimization algorithm. The evaluation presents a thorough comparison of Drup2Itp with existing tools using benchmarks from the Hardware Model Checking Competitions (HWMCC'19 and HWMCC'20). The results show that Drup2Itp, when integrated with CaDiCaL, significantly improves the performance of the Avy model checker, solving more instances and achieving better runtime than the vanilla version. The chapter also examines the impact of proof minimization on interpolant generation and offers insights into the potential for further improvements in interpolation techniques and their use in model checking.
AI-Generated
This summary of the chapter's content was generated with the help of AI.
Abstract
We present our implementation of DRUP-based interpolants in CaDiCaL 2.0, and evaluate performance in the bit-level model checker Avy using the Hardware Model Checking Competition benchmarks.
CaDiCaL is a state-of-the-art, open-source SAT solver known for its efficiency and flexibility. In its latest release, version 2.0, CaDiCaL introduces a new proof tracer API. This paper presents a tool that leverages this API to implement the DRUP-based algorithm for generating interpolants.
By integrating this algorithm into CaDiCaL, we enable its use in model-checking workflows that require interpolants. Our experimental evaluation shows that integrating CaDiCaL with DRUP-based interpolants in Avy results in better performance (both runtime and number of solved instances) when compared to Avy with Glucose as the main SAT solver.
Our implementation is publicly available and can be used by the formal methods community to further develop interpolation-based algorithms using the state-of-the-art SAT solver CaDiCaL. Since our implementation uses the Tracer API, it should be maintainable and applicable to future releases of CaDiCaL.
1 Introduction
Boolean satisfiability (SAT) is the problem of deciding whether a given logical formula has a satisfying assignment. Modern SAT solvers can produce either a satisfying assignment or a proof of unsatisfiability for satisfiable or unsatisfiable formulas, respectively. There has been growing interest in certifying the results returned by SAT solvers [8, 14, 15, 26]. Unlike checking satisfying assignments, certifying unsatisfiable instances is more involved and requires modifications to the SAT solver. This has led to extensive research proposing proof formats that are, on the one hand, easy to generate (namely, without modifying the SAT solver extensively and without negatively affecting its performance) and, on the other hand, easy to check.
In this paper we focus on proofs of unsatisfiability and their usage in SAT-based model checking (MC) [3, 5, 19, 22, 23]. SAT-based MC reduces model checking to instances of SAT, where in most cases a satisfiable instance indicates the existence of a counterexample, while an unsatisfiable instance corresponds to bounded correctness. Hence, in the case of an unsatisfiable instance, a proof of unsatisfiability corresponds to a bounded proof of correctness. These kinds of proofs have many uses in model checking, such as abstraction-refinement [20], interpolation [19, 22, 23], debugging, and more.
We present an implementation of a framework that enables interpolant computation from DRUP proofs, which originally appeared in [13], using CaDiCaL as a SAT solver. CaDiCaL is a state-of-the-art open-source SAT solver renowned for its efficiency and flexibility. Our implementation, Drup2Itp, leverages the new Tracer API introduced in CaDiCaL 2.0 [4] to trace proof-related events during solver execution, enabling in-memory incremental DRUP proof logging. Unlike existing proof tracers in CaDiCaL, which are primarily designed for proof certification, Drup2Itp is tailored for model checking applications: it facilitates proof core extraction and interpolant computation, making CaDiCaL 2.0 applicable to abstraction-based and SAT-based model checking algorithms.
Drup2Itp, presented in this paper, is more than a re-implementation of [13] in a modern SAT solver. It also incorporates various enhancements due to better engineering and more efficient algorithms. From an engineering point of view, Drup2Itp is decoupled from the SAT solver and does not require any modifications to it. This adds flexibility and allows the proof maintained by Drup2Itp to differ from the proof CaDiCaL maintains, even during an incremental execution. This design is enabled mainly by the Tracer API. As a result, it allows us to develop a minimization algorithm, which modifies the proof generated by the solver and “prepares” it for interpolant computation. This algorithmic improvement greatly affects the performance of Drup2Itp.
The development of Drup2Itp enables the integration of CaDiCaL as the underlying SAT solver in Avy [17, 23, 24], a bit-level model checker that requires sequence interpolants. We evaluate how Drup2Itp with CaDiCaL affects the performance of Avy using the HWMCC’19 and HWMCC’20 benchmarks. The results clearly demonstrate that the overall runtime performance of Avy, when using CaDiCaL and Drup2Itp, is improved when compared to vanilla Avy (using Glucose with DRUP-based interpolants). Moreover, it can solve instances that the vanilla version cannot solve within a one-hour time limit. Furthermore, the proof minimization algorithm used in Drup2Itp improves interpolant generation when using CaDiCaL incrementally.
Lastly, our implementation of Drup2Itp and its integration in Avy is publicly available and can be used by other members of the formal methods community to further improve interpolation techniques and their usage in model checking and other applications.
1.1 Related Work
Aside from the interpolation support described in [4], we are not aware of other tools that allow generating interpolants using CaDiCaL. Unlike the approach we present in this paper, the generation of interpolants in [4] is based on CaDiCaL producing the antecedents of every learned clause. Conceptually, this approach is similar to generating the resolution proof and using it to compute the interpolant. Even if the entire resolution proof is not constructed and saved during the execution of the solver, and partial interpolants are computed on-the-fly per learned clause, partial interpolants may be computed for clauses that do not participate in the derivation of the empty clause. Lastly, the antecedent-based approach may suffer from a similar performance issue in the case of incremental solving (as we noticed in our experiments when the minimization technique is not enabled, see Section 4.2).
When compared to [13], the implementation presented in this paper does not require any modifications to the internals of the SAT solver. This fact makes Drup2Itp easily maintainable and applicable to future releases of CaDiCaL. Moreover, this separation enables application-specific modifications to the proof (e.g. minimization), without modifying the proof maintained by the SAT solver.
2 Preliminaries
2.1 CNF, Resolution, and UP
Given a set U of Boolean variables, a literal \(\ell\) is either a variable \(u\in U\) or its negation \(\lnot u\), a clause is a disjunction of literals, and a formula in Conjunctive Normal Form (CNF) is a conjunction of clauses.
The resolution rule is a fundamental inference rule that captures many of the steps performed by SAT solvers. It allows deriving a new clause by resolving two clauses containing complementary literals. Formally, applying the resolution rule to the clauses \(C_1 = (A \vee l)\) and \(C_2 = (B \vee \lnot l)\), where A and B are clauses (that do not contain complementary literals) and l and \(\lnot l\) are complementary literals, derives the clause \(C_3 = (A \vee B)\). The derived clause \(C_3\) is implied by \(C_1 \wedge C_2\), and can therefore be added to the CNF while preserving equivalence.
As defined in [13], a resolution derivation of a clause \(\alpha \) from a CNF formula G is a sequence \(\pi =(\alpha _1, \alpha _2,..., \alpha _n\equiv \alpha )\), where each clause \(\alpha _k\) is either an original clause of G or is derived by applying the resolution rule to clauses \(\alpha _i, \alpha _j\) with \(i,j<k\). A resolution derivation \((\alpha _1, \alpha _2,..., \alpha _k)\) is trivial if all variables resolved on are distinct, and each \(\alpha _i\), for \(i > 2\), is either an original clause of G or derived by resolving \(\alpha _{i-1}\) with an original clause. The chain resolution rule, written \(\alpha _1, \alpha _2, ..., \alpha _k \vdash ^{\vec{x}}_{\text {TVR}}\alpha \), states that \(\alpha \) can be derived from \(\alpha _1, \alpha _2, ..., \alpha _k\), the chain, by a trivial resolution derivation, where \(\vec{x}=(x_1,x_2,...,x_k)\) are the chain pivots. A chain derivation is a sequence \(\pi =(\alpha _1,\alpha _2,...,\alpha _n)\) where each \(\alpha _k\) is either an original clause of G or is derived by chain resolution from preceding clauses. A derivation witness of a chain derivation \(\pi \) is a total function D from clauses of \(\pi \) to sub-sequences of \(\pi \) such that each clause derived by chain resolution is mapped to the chain of preceding clauses from which it is derived.
A resolution proof of unsatisfiability of G is a chain derivation of the empty clause from G. We refer to a derivation of the empty clause as a proof.
Given a partial assignment \(\tau \), a clause c is a unit clause under \(\tau \) if there exists exactly one literal \(l\in c\) such that \(l,\lnot l\not \in \tau \), while the rest of the literals in c are assigned to false, i.e., \(\forall l^{'}\in c, l^{'}\not = l: l^{'}\not \in \tau , \lnot l^{'}\in \tau \). Given a formula G and an assignment \(\tau \), Unit Propagation (a.k.a. Boolean Constraint Propagation, or BCP) w.r.t. G and \(\tau \) is the process of repeatedly extending \(\tau \) with unit literals from G until a fixed point is reached. If for some literal l, both l and \(\lnot l\) are in \(\tau \), we say Unit Propagation derives a conflict.
2.2 Validating Clausal Proofs
In [12], the authors show that the sequence of all clauses learned by a CDCL SAT solver, in the order they are learned, forms a chain derivation. Moreover, they show that the chain derivation can be validated using the Unit Propagation (UP) facilities of the solver.
Definition 1
(RUP). Let G be a CNF formula and c a clause over \( Var (G)\). If UP derives a conflict w.r.t. \(G\wedge \lnot c\) then c is deducible from G, denoted as \(G\vdash _{\textsc {UP}}c\). Clauses that are deducible via UP are also known to have the Reverse Unit Propagation (RUP) property.
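For a concrete illustration (our own small example, not taken from the cited works), let \(G = (a \vee b) \wedge (\lnot a \vee b)\) and \(c = (b)\). Adding the negation of c, i.e., the unit \(\lnot b\), and running UP first forces a via \((a \vee b)\) and then falsifies \((\lnot a \vee b)\), so UP derives a conflict and hence \(G\vdash _{\textsc {UP}}c\).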
The following lemma relates RUP and trivial resolution.
Lemma 1
([2]). Given a CNF formula G and a clause c, c is deducible from G by UP iff c is deducible from G by trivial resolution. That is, \(G\vdash _{\textsc {UP}}c \iff G\vdash _{\textsc {TVR}}c\).
Two algorithms were introduced in [12]: one for forward and one for backward proof validation. The forward validation algorithm replays the proof forward, checking that each clause has the RUP property w.r.t. the preceding clauses. Backward validation walks the proof backwards, removing clauses and checking that each removed clause has the RUP property w.r.t. the remaining clauses.
Backward proof validation was improved in [14], where a new clausal proof format, Delete Reverse Unit Propagation (DRUP), was introduced. DRUP is a sequence \(\pi =((\alpha _0, d_0), \ldots, (\alpha _n, d_n)\equiv (\square , \bot ))\), where each \(d_k\) is a Boolean flag indicating whether the clause \(\alpha _k\) is deleted, and \(\alpha _k\) is either an original clause or is derived by chain resolution from the set of k-active clauses \(\{\alpha _j \mid (j < k \wedge d_j = \bot ) \wedge (\forall j < i < k: \alpha _i \not = \alpha _j)\}\). Including clause deletion information makes validating a DRUP proof much more efficient, since the validation of a clause \(\alpha _k\) depends only on the k-active clauses. In backward validation [14], it is possible to trim the proof by removing all clauses that do not participate in the derivation of the empty clause.
2.3 Interpolation and Colors
Craig Interpolant Given a pair of CNF formulae \(\langle A,B\rangle \) such that \(A \wedge B\) is unsatisfiable, a Craig interpolant is a formula I such that (i) \(A \models I\), (ii) \(I \models \lnot B\), and (iii) I is defined over the common variables of A and B. It is known that an interpolant can be computed in polynomial time from a resolution proof of unsatisfiability of \(\langle A,B\rangle \) [19, 21].
Definition 2
(N-Colored CNF). An N-colored CNF is a pair \((G, \kappa )\) of a CNF formula G and a coloring function \(\kappa : G \rightarrow [1,\ldots ,N]\) that assigns to every clause \(\alpha \in G\) a color between 1 and N.
The coloring extends naturally to variables. Let \(G_i = \{\alpha \in G \mid \kappa(\alpha) = i\}\) denote the clauses of color i. For each \(v \in Var (G)\), we define its minimum and maximum color as follows:
$$\begin{aligned} \kappa _\downarrow (v) &= \min \{ i \mid \exists \alpha \in G_i \cdot v \in \alpha \}, & (2)\\ \kappa _\uparrow (v) &= \max \{ i \mid \exists \alpha \in G_i \cdot v \in \alpha \}. & (3) \end{aligned}$$
Sequence Interpolant A sequence interpolant for an N-colored unsatisfiable striped CNF \((G, \kappa )\) is a sequence of formulas \((\top \equiv I_0, \ldots , I_{N} \equiv \bot )\) such that for all \(1 \le i \le N\):
$$\begin{aligned} I_{i-1} \wedge G_i \Rightarrow I_{i}, \qquad \forall v \in Var (I_i) \cdot \kappa _\downarrow (v) = i \wedge \kappa _\uparrow (v) = i+1. \end{aligned}$$
2.4 The Model Checker Avy
Avy is a SAT-based model checker [17, 23, 24] that combines sequence interpolants [22] and Property Directed Reachability (PDR) [5, 10]. It uses bounded model checking (BMC) [3] to search for a counterexample of length k. If such a counterexample does not exist, it extracts a sequence interpolant for the unsatisfiable BMC formula, and uses PDR to generalize the sequence interpolant and transform it to CNF. Since a key part of the algorithm requires sequence interpolants, we use Avy to evaluate the DRUP interpolant framework implemented using CaDiCaL.
3 Implementing DRUP-based Interpolants in CaDiCaL
In this section we present the implementation of DRUP-based interpolation in CaDiCaL. In CaDiCaL 2.0 [4], a specialized mechanism is added for online interaction with proof-related operations performed by the solver. This is done via the Tracer API, which allows users to implement various techniques that relate to proof events (e.g. the addition or deletion of a clause), without modifying the SAT solver itself. A user-defined tracer is connected to the internal solver of CaDiCaL and can react online (via callbacks) to proof-related events.
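To give a sense of this mechanism, the following sketch shows a user-defined tracer that merely counts proof events. It is only an approximation: the method names and signatures follow our reading of the Tracer API described in [4] and may differ from the actual CaDiCaL 2.0 headers.

  #include "cadical.hpp"   // public CaDiCaL interface (assumed include name)
  #include <cstdint>
  #include <vector>

  // A minimal tracer that counts clause additions and deletions reported by the solver.
  class CountingTracer : public CaDiCaL::Tracer {
    uint64_t added = 0, deleted = 0;
  public:
    void add_derived_clause (uint64_t /*id*/, bool /*redundant*/,
                             const std::vector<int> & /*lits*/,
                             const std::vector<uint64_t> & /*antecedents*/) override {
      added++;     // a learned clause was appended to the proof
    }
    void delete_clause (uint64_t /*id*/, bool /*redundant*/,
                        const std::vector<int> & /*lits*/) override {
      deleted++;   // a clause was deleted from the proof
    }
  };

  // Usage sketch: connect the tracer before adding clauses, then solve.
  //   CaDiCaL::Solver solver;
  //   CountingTracer tracer;
  //   solver.connect_proof_tracer (&tracer, /*antecedents=*/false);
  //   ... add clauses, solver.solve (), solver.disconnect_proof_tracer (&tracer);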
3.1 The Drup2Itp class
To implement the approach presented in [13], we developed Drup2Itp, an extension of the Tracer class. Figure 1 presents a partial view of the Drup2Itp class, showing only what is required for what follows. Drup2Itp maintains a stack of DRUP clausal proofs and uses its own independent clause database, which is stored in a hash table (the clauses member). Additionally, it implements specialized Unit Propagation (UP) and Conflict Analysis procedures (UP() and analyze_core(), respectively), and maintains its own trail, reasons, and watches data structures. These are required for an efficient implementation of UP(). As in the standard CDCL SAT algorithm [18], the trail is a queue that keeps track of all currently implied and assumed (decided) literals, in the order they were assigned. The watches data structure efficiently tracks two unassigned literals in each clause, allowing the solver to quickly identify when a clause becomes unit or falsified during UP. The reasons data structure maps each assigned variable to the clause that caused its assignment (i.e., its reason clause). Drup2Itp also features the RUP() procedure, which performs Reverse Unit Propagation on a given clause and analyzes the resulting conflict (by calling analyze_core()). Since Drup2Itp logs a DRUP proof, every learned clause is guaranteed to have the RUP property, meaning that invoking RUP() on it always derives a conflict.
We note that there exists an implementation of the Tracer API which is used to check DRAT proofs. This implementation includes a hash table for storing clauses and implements unit propagation. Drup2Itp uses this implementation as a starting point but extends it to include interpolation specific features.
Fig. 1.
The Drup2Itp class
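Since Figure 1 is not reproduced here, the following sketch lists the members of Drup2Itp that are mentioned in the surrounding text; only the names come from the paper, while the exact types and signatures are our assumptions.

  // Partial view of Drup2Itp, reconstructed from the surrounding description (types assumed).
  class Drup2Itp : public CaDiCaL::Tracer {
    std::unordered_map<uint64_t, Clause *> clauses;  // independent clause database, keyed by clause id
    std::vector<ProofEntry> proof;                   // DRUP clausal proof stack (additions and deletions)
    std::vector<int> trail;                          // implied and assumed literals, in assignment order
    std::vector<Clause *> reasons;                   // reason clause of each assigned variable
    std::vector<std::vector<Clause *>> watches;      // two watched literals per clause

    bool UP ();                                      // specialized unit propagation over 'clauses'
    void analyze_core ();                            // conflict analysis: marks the clauses of the chain as core
    void RUP (Clause *c);                            // assumes the negation of c, runs UP(), then analyze_core()

  public:
    void set_current_color (int color);              // color given to subsequently added original clauses
    void trim (ItpClauseIterator *it = nullptr);     // backward traversal identifying the proof core
    void replay (ResolutionProofIterator &it);       // forward traversal of the core, reporting chain resolutions
  };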
Figure 2 presents the clause object used by Drup2Itp. It contains the clause’s literals, identifier (id), range, a pointer to the next clause in the hash table, and three Boolean flags: original, garbage, and core. Upon each clause addition notification from the solver, the Tracer API provides Drup2Itp with relevant details about the clause, including its id, whether it is an original or learned clause, and its literals. Using this information, Drup2Itp creates a new clause object, assigns it the clause’s id and literals, sets the original flag accordingly, and stores it in its clause database. A clause can either be an original clause from the SAT problem or a clause learned by the solver. If the core flag is set to true, the clause is part of the computed proof core, whereas if the garbage flag is set to true, the clause is deleted (or "detached"): it is no longer watched and is skipped during Unit Propagation (UP).
Fig. 2.
Clause object.
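As Figure 2 is likewise not reproduced, here is a sketch of the clause object with the fields named in the text; the concrete types are assumptions.

  // Clause object used by Drup2Itp (field names follow the text, field types are assumed).
  struct Range {
    int min_color, max_color;    // the pair (kappa_down, kappa_up); a singleton for original clauses
  };

  struct Clause {
    uint64_t id;                 // identifier shared with the matching clause inside CaDiCaL
    Range range;                 // color range of the clauses used to derive this clause
    Clause *next;                // next clause in the same hash-table bucket
    bool original;               // original input clause (as opposed to a learned clause)
    bool garbage;                // deleted ("detached"): no longer watched, skipped during UP
    bool core;                   // participates in the derivation of the empty clause
    std::vector<int> literals;   // the clause's literals
  };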
As in the original algorithm presented in [13], in order to support sequence interpolants, clauses of the input formula are assigned different colors, with each original clause assigned a specific color. The Range class corresponds to the pair \((\kappa _\downarrow , \kappa _\uparrow )\); hence, for original clauses, the range of a clause object (see Figure 2) is always a singleton. For example, if an original clause c is assigned the color k, then its range is set to (k, k) (since \(\kappa _\downarrow (c)=\kappa _\uparrow (c)=k\)). For learned clauses, the range is determined during the replay() procedure (as described in the next section) and reflects the range of colors of all the clauses used to derive that learned clause in the given derivation witness (following the definition of \(\kappa _\downarrow \) and \(\kappa _\uparrow \)). The user is responsible for providing the coloring information to Drup2Itp. This is done by setting a specific member in Drup2Itp via the set_current_color() method to track the current color, ensuring that each original clause added has its range set to the current color (as a singleton range). Algorithm 1, presented in Section 3.5, illustrates how colors are provided via set_current_color() when interpolants are required.
Although Drup2Itp maintains an independent clause database, it remains synchronized with the solver’s clause database via clause identifiers (id). This guarantees that any clause object in Drup2Itp and a clause object in CaDiCaL sharing the same id always contain the same literals. As a result, Drup2Itp can efficiently look up the matching clause in its hash table by computing the hash index directly from that identifier.
The separation of the implementation of Drup2Itp from the solver itself provides several benefits, including non-intrusive integration with the solver, simplified development, and greater flexibility in manipulating the proof independently of the proof maintained by the solver in CaDiCaL. Additionally, because Drup2Itp logs a DRUP clausal proof, it does not rely on CaDiCaL to supply antecedent information.
3.2 Generating Interpolants On-The-Fly
The process of computing an interpolant involves two main steps: (1) trim(): the proof is trimmed, and a proof core is identified; (2) replay(): the proof core is replayed, and interpolants are computed on-the-fly. The declarations of these methods appear in Figure 1. We do not go into the details of the interpolation system used to produce interpolants. Instead, we focus solely on the implementation details. We emphasize that the user can choose an interpolation system [9, 13, 16, 19] and implement it (through the ResolutionProofIterator class, discussed later).
We now describe the operation of Drup2Itp during the different stages of execution: initialization, solving, trimming, and replaying. In what follows, we assume the SAT solver is given the CNF formula \(\varphi \).
Initialization During initialization, a Drup2Itp instance is created, initializing only an empty hash table (clauses) for storing clauses and an empty clausal proof stack (proof). It is then connected to a CaDiCaL instance, which from here on we refer to as the solver. Initialization of Drup2Itp is performed right at the beginning, before the clauses of \(\varphi \) are added to the solver.
Solving While solving, Drup2Itp is notified by the solver of every new clause addition and every existing clause deletion. Throughout the execution of the solver, Drup2Itp tracks these notifications. Namely, for a clause addition notification it creates and stores a new clause object, while for a clause deletion notification it marks the corresponding clause as garbage. All notifications are logged in the DRUP clausal proof stack. In addition, any unit literals (from newly added unary clauses) are appended to the trail, and their reason clauses are recorded in reasons. These units are only added to the trail and are not propagated during the execution of the solver.
Trimming The Proof In case the solver derives the empty clause, Drup2Itp is then required to trim the proof. Before trim() is executed, Drup2Itp propagates its trail and derives the conflict on its own. Note that Drup2Itp derives the same conflict the solver derives. This is because it mirrors the derivation performed by the solver, i.e., it propagates all units in the same order before it propagates any newly discovered units, if any.
Let us denote the DRUP proof generated by the solver (and stored in Drup2Itp.proof) as \(\Pi =\langle c_1,\ldots ,c_n\rangle \). trim() is responsible for removing clauses that do not contribute to the derivation of the empty clause. This is achieved by marking the clauses that participate in the derivation of the empty clause as core clauses.
To better understand how trim() works, recall that in a valid DRUP proof for a CNF formula \(\varphi \), a learned clause \(c_i\), where \(1\le i\le n\), must satisfy the RUP property (Definition 1). Let us denote by \(\alpha (\Pi , i)\) all clauses \(c_1,\ldots ,c_{i-1}\) that are not marked as garbage. Then, for every i, it holds that \(\varphi , \alpha (\Pi , i)\vdash _{\textsc {UP}}c_i\). Equivalently, executing UP() when the active clauses only include \(\varphi \) and \(\alpha (\Pi , i)\) must result in a conflict. This property is used while trimming the proof.
The procedure begins by marking the empty clause as core. Note that at this point, all original and learned clauses that are not marked as garbage are active. trim() proceeds by traversing the DRUP proof stack proof backwards. Let us denote the clause that is currently being processed during the backward traversal as c. If c is marked as garbage (i.e., it is deleted), it is activated (and thus revived). Otherwise, c is deactivated. Deactivation of a clause requires updating the trail. Hence, in case c is a reason clause for a literal on the trail, that literal and all literals assigned after it are removed from the trail. Assume that d is a reason clause for a literal l removed from the trail during this process. It is important to note that if d is marked as core, then the reason clauses of all other literals \(l'\in d\), where \(l'\ne l\), are marked as core.
If c is marked as core (and not garbage) and not marked with original, literals in \(\lnot c\) are added as assumptions to the trail and UP() is called. Once a conflict is reached (recall c satisfies the RUP property), Drup2Itp performs conflict analysis (all these steps are done by invoking RUP()). During conflict analysis the chain resolution leading to the derivation of c is identified and its clauses are marked as core.
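The following pseudocode summarizes the backward traversal just described. It is a simplified sketch of trim(); the helper names (mark_core, activate, deactivate, unassign_literals_implied_by) are hypothetical.

  // Simplified sketch of trim(): walk the DRUP proof stack backwards and mark the core.
  void Drup2Itp::trim () {
    mark_core (empty_clause);                        // start from the empty clause
    for (auto it = proof.rbegin (); it != proof.rend (); ++it) {
      Clause *c = it->clause;
      if (c->garbage) { activate (c); continue; }    // walking backwards revives deleted clauses
      deactivate (c);                                // c may not be used by earlier RUP checks
      unassign_literals_implied_by (c);              // shrink the trail if c was a reason clause
      if (c->core && !c->original)
        RUP (c);                                     // assume the negation of c and run UP();
                                                     // conflict analysis marks c's chain as core
    }
  }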
Replaying The Proof Once the proof is trimmed and the core is identified, replay() is responsible for traversing the DRUP proof core forward, applying local transformations on the proof, and constructing an interpolant on-the-fly. To facilitate this, replay() accepts an instance of ResolutionProofIterator (see Figure 1 and Figure 3), which it notifies with resolution steps.
Fig. 3.
Proof iterator
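Figure 3 is not reproduced here; the sketch below reconstructs the ResolutionProofIterator interface from the description in this section, with assumed types.

  // Abstract proof iterator notified by replay() (reconstruction; types are assumptions).
  struct ChainDerivation {
    std::vector<Clause *> clauses;   // clauses participating in the chain resolution
    std::vector<int> pivots;         // the chain pivots
  };

  class ResolutionProofIterator {
  public:
    ChainDerivation chain;                                  // reset by replay() before every callback
    virtual ~ResolutionProofIterator () {}
    virtual void chain_resolution (Clause *resolvent) = 0;  // resolvent is a learned clause
    virtual void chain_resolution (int unit) = 0;           // resolvent is a propagated unit literal
  };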
In order to support interpolants, clauses in Drup2Itp are marked with colors (Definition 2). The colors of clauses and variables determine the way in which interpolants are constructed, which depends on the interpolation method of choice. This is beyond the scope of this paper, and we refrain from discussing these details. Yet, different interpolation procedures can be implemented via different implementations of the abstract class ResolutionProofIterator.
replay() traverses the DRUP proof forwards, skipping over clauses that are not marked as core. It notifies the ResolutionProofIterator with every clause marked as core. For a learned clause \(c_i\), replay() invokes RUP() and analyzes the conflict, similar to what is done in trim(). During the conflict analysis, the chain derivation for \(c_i\) is traversed on-the-fly using ResolutionProofIterator.
The class ResolutionProofIterator (see Figure 3) offers two methods, both of which indicate a chain resolution step. To report the complete chain derivation, replay() uses these two methods to notify ResolutionProofIterator of all chain resolutions in a specific order, ensuring that each new clause is only reported after all the clauses used to derive it have been reported. The method chain_resolution(Clause*) is used to report a resolvent that is a learned clause, whereas chain_resolution(int) is used to report a resolvent that is a unit clause propagated on the trail. In both cases, the chain member of ResolutionProofIterator contains all the clauses and pivots participating in the chain resolution that derives the given resolvent. Note that chain is reset before every call to chain_resolution(). The traversal concludes when replay() notifies ResolutionProofIterator with a nullptr resolvent, indicating the empty clause. At this point, ResolutionProofIterator has traversed the complete chain derivation.
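As a minimal usage illustration (ours, hypothetical), an iterator that only counts the reported resolution steps could look as follows; an actual interpolation procedure would instead update partial interpolants in these callbacks.

  // A trivial ResolutionProofIterator that counts chain-resolution steps.
  class CountingIterator : public ResolutionProofIterator {
  public:
    size_t steps = 0;
    void chain_resolution (Clause *resolvent) override {
      // A nullptr resolvent signals the empty clause and the end of the traversal.
      if (resolvent) steps++;
    }
    void chain_resolution (int /*unit*/) override { steps++; }
  };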
3.3 Exploring The Space of Proofs
The Tracer API provides an infrastructure that allows interacting with proof events generated by the solver, without modifying the solver. As a result, Drup2Itp implements procedures like UP() and analyze_core(), even though such procedures exist in CaDiCaL. In this context, it is important to note that a derivation witness is not unique, and different implementations of procedures such as UP() and analyze_core(), can lead to different chain derivations of the empty clause. For example, in model checking applications, smaller proofs are sometimes preferred. In this case, core clauses can be prioritized when executing UP() during the trimming, as suggested in [14].
Another notable example, also in the context of model checking, concerns the shape of the produced interpolants. For various reasons (beyond the scope of this paper), it is often desirable for interpolants to be in CNF [6, 23, 25]. CNF interpolants can be derived from colorable chain refutations [13]. Since the SAT solver does not necessarily generate colorable chain refutations, one can try to reorder the proof during replay(). In general, reordering the entire proof may be infeasible; hence, in our implementation of replay() we use a specialized propagation procedure called color-ordered propagation. In color-ordered propagation, propagation is performed in iterations, where in the \(i^{th}\) iteration only i-colored clauses are propagated. This propagation procedure is designed to increase the colorability of the constructed chain refutation, i.e., to heuristically and efficiently bring the proof closer to a colorable chain refutation, with the goal of producing interpolants in which a part of the interpolant can be represented in CNF. More details about this approach can be found in [13].
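A minimal sketch of the idea follows, assuming a helper propagate_only_color() that performs standard UP restricted to the clauses of a single color (the helper and its exact behavior are our assumptions):

  // Color-ordered propagation: the i-th iteration propagates only i-colored clauses.
  // Whether lower colors are revisited after new assignments is an implementation choice.
  bool Drup2Itp::color_ordered_propagate (int max_color) {
    for (int color = 1; color <= max_color; color++)
      if (!propagate_only_color (color))   // UP restricted to clauses of this color
        return false;                      // conflict reached
    return true;                           // no conflict within any color
  }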
These are only two examples, but many other heuristics can be experimented with due to this architecture.
3.4 Incrementality and Proof Minimization
In many applications, and in model checking in particular, the underlying SAT solver is used incrementally. Incrementally trimming and replaying the proof, however, presents several challenges. One issue is that if the DRUP proof is not compressed after each invocation of the solver, the cumulative DRUP proof can become quite large. This may negatively impact both memory usage and the time required for trimming and replaying. While the Tracer API provides flexibility, it makes compressing the proof and managing the internal clause database of the solver quite tricky. Moreover, implementing such functionality goes against the separation between the proof and the state of the internal solver.
For example, one might attempt to compress the clause database that is maintained by Drup2Itp after trimming, keeping only the clauses that are marked as core. While this approach is theoretically sound, it is difficult in practice. Any modification to the Drup2Itp clause database must be mirrored in the internal solver’s clause database. If the clause databases in Drup2Itp and the solver are not synchronized, the solver may learn new clauses using clauses that are not “visible” to Drup2Itp. Such a scenario makes trimming and replaying more complicated.
To address this issue, we introduce a new class called Minimizer. The Minimizer has two key members: a solver object min_solver and a Drup2Itp object min_drup2itp. We now briefly describe its operation.
During trim(), the Minimizer is notified about all clauses that are marked as core and adds them to min_solver. These notifications are handled by an ItpClauseIterator that is passed to trim() (see Figure 1). When trimming is done, the Minimizer invokes min_solver (to “re-solve” only the original core clauses), and then invokes min_drup2itp.trim() and min_drup2itp.replay() to compute the interpolant.
It is worth noting that solving the core subset of the problem is often empirically much faster than solving the original problem. This approach further minimizes the proof core, leads to a more compact interpolant, and avoids the major degradation involved in incremental replay() as demonstrated by our experimental evaluation (Section 4).
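The following sketch outlines the Minimizer flow described above. The ItpClauseIterator callback name and all signatures are assumptions; in particular, forwarding clause colors to min_drup2itp is omitted for brevity.

  // Sketch of the Minimizer: collect the core, re-solve it, then trim and replay.
  class Minimizer : public ItpClauseIterator {
    CaDiCaL::Solver min_solver;   // re-solves only the core clauses
    Drup2Itp min_drup2itp;        // traces min_solver and replays the minimized proof
  public:
    Minimizer () { min_solver.connect_proof_tracer (&min_drup2itp, false); }
    // Called by the main Drup2Itp::trim() for every clause marked as core.
    bool clause (const std::vector<int> &lits) override {
      for (int lit : lits) min_solver.add (lit);
      min_solver.add (0);
      return true;
    }
    void compute_interpolant (ResolutionProofIterator &itp) {
      min_solver.solve ();          // "re-solve" the core subset (expected to be UNSAT)
      min_drup2itp.trim ();         // trim the proof of the smaller problem
      min_drup2itp.replay (itp);    // compute the interpolant from the minimized proof
    }
  };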
UNSAT Core Extraction While somewhat unrelated to interpolants, the above can also be used to extract an UNSAT core and minimize it, specifically when the solver is used incrementally. This is useful in many applications, such as proof-based abstraction [20] or counterexample-guided abstraction refinement [7].
3.5 Demonstration
We present an example demonstrating how Drup2Itp is used to compute an interpolant in Algorithm 1 and Algorithm 2. Moreover, we show how the Minimizer is used to compute interpolants in Algorithm 3. In the following pseudocode, we assume that the class Interpolate is derived from the base class ResolutionProofIterator. It computes an interpolant iteratively through two pure virtual methods that represent resolution steps in the proof produced by Drup2Itp during replay(), with each method being triggered by a callback from Drup2Itp.
Algorithm 1. Set a Drup2Itp instance to trace a CaDiCaL instance to solve a colored-CNF formula and compute an interpolant.
Algorithm 2. Interpolant computation with Drup2Itp.
Algorithm 3. Interpolant computation with Minimizer.
Recall that replay() traverses a derivation witness for the empty clause and notifies the ResolutionProofIterator instance of each learned clause and its chain resolution, ensuring that a clause is only reported after all the clauses used to derive it have been reported. This order is particularly well-suited for SAT-based interpolation algorithms that compute interpolants from a resolution proof, such as McMillan's interpolation system [19].
As explained in Section 3.4, a freshly allocated Minimizer is passed to the main Drup2Itp object, which then trims the instance and notifies the Minimizer of the original core. The Minimizer then proceeds to solve, trim, and replay only the original core.
It is important to note that, in practice, minimization can involve more than one step. This means adding additional trim and solve steps, and other heuristics, to further minimize the proof that is replayed for interpolation. However, for simplicity, we include only one minimization step in the pseudocode.
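Since the bodies of Algorithms 1-3 are not reproduced here, the sketch below illustrates the overall flow they describe. Only the names taken from the text (Drup2Itp, set_current_color(), trim(), replay(), Interpolate, Minimizer) are real; the helper add_clause(), the partition container, and the exact call sequence are our assumptions.

  // End-to-end usage sketch: solve a colored CNF and compute a sequence interpolant.
  CaDiCaL::Solver solver;
  Drup2Itp drup2itp;
  solver.connect_proof_tracer (&drup2itp, /*antecedents=*/false);

  for (int color = 1; color <= N; color++) {          // add the N-colored CNF
    drup2itp.set_current_color (color);               // subsequent original clauses get this color
    for (const std::vector<int> &cls : partition[color])
      add_clause (solver, cls);                       // hypothetical helper: add literals followed by 0
  }

  if (solver.solve () == 20) {                        // 20 = UNSAT (IPASIR convention)
    Interpolate itp;                                  // derived from ResolutionProofIterator (Algorithm 2)
    // Algorithm 2 style: trim the proof and replay the core directly.
    drup2itp.trim ();
    drup2itp.replay (itp);
    // Algorithm 3 style: pass a Minimizer to trim(), then let it re-solve,
    // trim, and replay the original core on its own.
    //   Minimizer minimizer;
    //   drup2itp.trim (&minimizer);
    //   minimizer.compute_interpolant (itp);
  }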
4 Experimental Evaluation
Drup2Itp is implemented in our fork of CaDiCaL. By default, CaDiCaL emits a DRAT proof. However, in the latest version available at the time of writing (2.1.2), CaDiCaL does not actually produce RAT clauses, so we can safely treat the emitted proof as a DRUP proof. If future releases of CaDiCaL introduce techniques that add RAT clauses to the proof, those features should either be disabled to ensure a DRUP proof is emitted, or specialized support for DRAT proofs should be implemented.
4.1 Avy
In order to evaluate Drup2Itp, we integrated CaDiCaL 2.0 [4] with Drup2Itp into Avy [17, 23, 24], a bit-level model checker that combines sequence interpolants and Property Directed Reachability (PDR). Both Drup2Itp and its integration in Avy can be found on our group’s GitHub page. Prior to the integration of CaDiCaL and Drup2Itp, Avy supported only two SAT solvers: Minisat [11] and Glucose [1], both using DRUP-based interpolants [13]. The introduction of Drup2Itp allows Avy to utilize CaDiCaL 2.0, a state-of-the-art SAT solver.
We conducted a series of experiments comparing the performance of Avy across various configurations, using benchmarks from the Hardware Model Checking Competitions (HWMCC’19 includes 317 instances and HWMCC’20 includes 324 instances). The vanilla configuration of Avy uses Glucose (including its DRUP-based interpolants) as the SAT solver. All other configurations use CaDiCaL as the SAT solver. In all CaDiCaL-based configurations, when solving the BMC formula, CaDiCaL is invoked with pre-processing and in-processing techniques. We tested several configurations that affect Drup2Itp, namely, with respect to sequence-interpolant computation:
1. Drup2Itp with and without minimization.
2. Minimizer with and without pre-/in-processing.
The same Avy switches are used for all different SAT solver configurations. All experiments were executed on machines with AMD EPYC 74F3 CPU and 32GB of memory. The timeout was set to 3600 seconds.
Table 1.
Evaluation results. Average runtime is in seconds, and Average Depth represents the average convergence depth for Avy.
Set        Solver        SAT Instances  UNSAT Instances  Unique  Avg Runtime  Avg Depth
HWMCC’19   CaDiCaL                31              172        2       1504.6       21.2
           Minimizer              27              179        4       1427.6       22.4
           Glucose                29              176        2       1447.5       25.5
           Virtual Best           32              184        8       1337.4       19.2
HWMCC’20   CaDiCaL                34              169        4       1501.1       19.2
           Minimizer              31              175        6       1446.5       19.8
           Glucose                33              168        6       1511.6       23.0
           Virtual Best           36              182       16       1363.8       18.0
Table 1 summarizes the results. The rows labeled CaDiCaL correspond to CaDiCaL operating in incremental mode with pre-/in-processing enabled. The rows labeled Minimizer are the same as CaDiCaL with the addition of proof minimization (see Section 3.4). In the presented configuration, the Minimizer is used with pre-/in-processing techniques enabled; this configuration yields the best results for Avy on the benchmark sets we used. The results demonstrate that, compared to the base version, Avy with CaDiCaL and minimization performs better overall and solves instances that the other configurations are unable to solve within the one-hour time limit.
Another notable observation was that different configurations of CaDiCaL with Drup2Itp solved different sets of instances, leading to a higher virtual best score when combined. This suggests that there is significant potential in tailoring configurations for specific types of instances.
Fig. 4.
Comparing performance with and without Minimizer.
To further illustrate the impact minimization has on performance, we compare the runtime of CaDiCaL and Minimizer. This comparison appears in the scatter plots in Figure 4a and Figure 4b, for HWMCC’19 and HWMCC’20, respectively. The results indicate that using the Minimizer leads to better runtime performance, as the majority of instances lie below the diagonal of parity.
4.2 Incrementality and Minimization
In Section 3.4, we discuss the degradation in interpolant computation time and size associated with incremental solving in Drup2Itp, and explain how the Minimizer addresses it. To illustrate this, we use the HWMCC’19 and HWMCC’20 benchmarks and run the two variants described above: CaDiCaL and Minimizer. During this run, we force the Avy model checker to compute sequence interpolants at every bound. Specifically, for each bound k, both variants unroll the model to depth k, solve a BMC(k) instance, and compute a sequence interpolant. Consequently, at every bound k, both variants start the process of computing a sequence interpolant from the same DRUP proof. The only difference is that the Minimizer invokes replay() on the minimized proof.
For each test and each bound k, we record both the replayed proof size and the resulting interpolant size (after simplification). In Figures 6a and 6b, we show, for each test, the sum of proof sizes over all bounds for both variants. Similarly, we report the sum of interpolant sizes in Figures 6e and 6f, and the total time it takes to compute an interpolant in Figures 6c and 6d. To keep comparisons fair, we compute these values over mutual bounds only and ignore any additional bounds one variant completes beyond the other within the same one-hour time limit. For instance, if the Minimizer variant executes BMC up to bound 200 but the CaDiCaL variant only reaches 150, we compute the values over the first 150 bounds only.
Fig. 5.
Proof size comparison while disabling pre-/in-processing during minimization.
Fig. 6.
Proof size, interpolant size, and interpolant computation time comparison.
Analyzing the results, we observe that the replayed proof from which an interpolant is computed is significantly smaller in the Minimizer variant than in the CaDiCaL variant. Moreover, in \(70\%\) of the cases, the resulting interpolant is, on average, smaller in Minimizer than in CaDiCaL. Furthermore, Figures 6c and 6d show that the overhead of the Minimizer is not only small but also pays off, reducing the total runtime on average. While CaDiCaL simply replays the main trimmed proof, Minimizer re-solves a core subset of the original clauses, trims the resulting proof, and then replays this minimized proof. These results also suggest that replay() time strongly correlates with the size of the replayed proof.
Another observation is that, by disabling pre-/in-processing during the minimization step (i.e., in the second solve call of each interpolant computation), the Minimizer produces even smaller proofs, as shown in Figures 5a and 5b. However, the resulting interpolants did not consistently show a comparable reduction in size. Moreover, although this approach remains competitive, on the Avy benchmarks from Section 4.1 its overall performance is slightly behind the Minimizer variant that keeps pre-/in-processing enabled. This suggests that enabling pre-/in-processing in the minimization step creates a trade-off between proof size and solve time.
5 Conclusions
In this work we present an implementation of DRUP-based interpolants in CaDiCaL 2.0. Our implementation uses the Tracer API and therefore does not require any modifications to the internals of the SAT solver. This architecture allows us (and others) to further explore possible improvements to interpolant computation in CaDiCaL.
As future work, we plan to further develop this framework and extend it to support other proof formats (e.g., DRAT and LRAT). Moreover, we intend to further investigate how model checking algorithms that require interpolants can be improved by harnessing the full potential of modern SAT solvers such as CaDiCaL.
Acknowledgment
This research was partially supported by the Israeli Science Foundation (ISF) grant No. 2875/21.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.