Skip to main content
Erschienen in: Journal of Automated Reasoning 1-4/2018

Open Access 12.03.2018

A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality

verfasst von: Jasmin Christian Blanchette, Mathias Fleury, Peter Lammich, Christoph Weidenbach

Erschienen in: Journal of Automated Reasoning | Ausgabe 1-4/2018

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We developed a formal framework for conflict-driven clause learning (CDCL) using the Isabelle/HOL proof assistant. Through a chain of refinements, an abstract CDCL calculus is connected first to a more concrete calculus, then to a SAT solver expressed in a functional programming language, and finally to a SAT solver in an imperative language, with total correctness guarantees. The framework offers a convenient way to prove metatheorems and experiment with variants, including the Davis–Putnam–Logemann–Loveland (DPLL) calculus. The imperative program relies on the two-watched-literal data structure and other optimizations found in modern solvers. We used Isabelle’s Refinement Framework to automate the most tedious refinement steps. The most noteworthy aspects of our work are the inclusion of rules for forget, restart, and incremental solving and the application of stepwise refinement.

1 Introduction

Researchers in automated reasoning spend a substantial portion of their work time developing logical calculi and proving metatheorems about them. These proofs are typically carried out with pen and paper, which is error-prone and can be tedious. Today’s proof assistants are easier to use than their predecessors and can help reduce the amount of tedious work, so it makes sense to use them for this kind of research.
In this spirit, we started an effort, called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figa_HTML.gif (Isabelle Formalization of Logic) [4], that aims at developing libraries and a methodology for formalizing modern research in the field, using the Isabelle/HOL proof assistant [45, 46]. Our initial emphasis is on established results about propositional and first-order logic. In particular, we are formalizing large parts of Weidenbach’s forthcoming textbook, tentatively called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figb_HTML.gif . Our inspiration for formalizing logic is the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figc_HTML.gif (Isabelle Formalization of Rewriting) project [55], which focuses on term rewriting.
The objective of formalization work is not to eliminate paper proofs, but to complement them with rich formal companions. Formalizations help catch mistakes, whether superficial or deep, in specifications and theorems; they make it easy to experiment with changes or variants of concepts; and they help clarify concepts left vague on paper.
This article presents our formalization of CDCL (conflict-driven clause learning) based on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figd_HTML.gif , derived as a refinement of Nieuwenhuis, Oliveras, and Tinelli’s abstract presentation of CDCL [43]. It is the algorithm implemented in modern propositional satisfiability (SAT) solvers. We start with a family of formalized abstract DPLL (Davis–Putnam–Logemann–Loveland) [17] and CDCL [3, 6, 40, 42] transition systems from Nieuwenhuis et al. (Sect. 3). Some of the calculi include rules for learning and forgetting clauses and for restarting the search. All calculi are proved sound and complete, as well as terminating under a reasonable strategy.
The abstract CDCL calculus is refined into the more concrete calculus presented in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fige_HTML.gif and recently published [57] (Sect. 4). The latter specifies a criterion for learning clauses representing first unique implication points [6, Chapter 3], with the guarantee that learned clauses are not redundant and hence derived at most once. The correctness results (soundness, completeness, termination) are inherited from the abstract calculus. The calculus also supports incremental solving.
The concrete calculus is refined further to obtain a verified, but very naive, functional program extracted using Isabelle’s code generator (Sect. 5). The final refinement step derives an imperative SAT solver implementation with efficient data structures, including the well-known two-watched-literal optimization (Sect. 6).
Our work is related to other verifications of SAT solvers, which largely aimed at increasing their trustworthiness (Sect. 7). This goal has lost some of its significance with the emergence of formats for certificates that are easy to generate, even in highly optimized solvers, and that can be processed efficiently by verified checkers [16, 33]. In contrast, our focus is on formalizing the metatheory of CDCL, with the following objectives:
  • Develop a basic library of formalized results and a methodology aimed at researchers who want to experiment with calculi.
  • Study and connect the members of the CDCL family, including newer extensions.
  • Check the proofs in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figf_HTML.gif and provide a formal companion to the book.
  • Assess the suitability of Isabelle/HOL for formalizing logical calculi.
Compared with the other verified SAT solvers, the most noteworthy features of our framework are the inclusion of rules for forget, restart, and incremental solving and the application of stepwise refinement [59] to transfer results. The framework is available as part of the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figg_HTML.gif repository [20].
Any formalization effort is a case study in the use of a proof assistant. We depended heavily on the following features of Isabelle:
  • Isar [58] is a textual proof format inspired by the pioneering Mizar system [41]. It makes it possible to write structured, readable proofs—a requisite for any formalization that aims at clarifying an informal proof.
  • Sledgehammer [7, 48] integrates superposition provers and SMT (satisfiability modulo theories) solvers in Isabelle to discharge proof obligations. The SMT solvers, and one of the superposition provers [56], are built around a SAT solver, resulting in a situation where SAT solvers are employed to prove their own metatheory.
  • Locales [2, 25] parameterize theories over operations and assumptions, encouraging a modular style. They are useful to express hierarchies of concepts and to reduce the number of parameters and assumptions that must be threaded through a formal development.
  • The Refinement Framework [30] can be used to express refinements from abstract data structures and algorithms to concrete, optimized implementations. This allows us to reason about simple algebraic objects and yet obtain efficient programs. The Sepref tool [31] builds on the Refinement Framework to derive an imperative program, which can be extracted to Standard ML and other programming languages. For example, Isabelle’s algebraic lists can be refined to mutable arrays in ML.
An earlier version of this work was presented at IJCAR 2016 [11]. This article extends the conference paper with a description of the refinement to an imperative implementation (Sects. 2.4 and 6) and of the formalization of Weidenbach’s DPLL calculus (Sect. 4.1). To make the paper more accessible, we expanded the background material about Sledgehammer (Sect. 2.1) and Isar (Sect. 2.2).

2 Isabelle/HOL

Isabelle [45, 46] is a generic proof assistant that supports several object logics. The metalogic is an intuitionistic fragment of higher-order logic (HOL) [15]. The types are built from type variables https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq1_HTML.gif and n-ary type constructors, normally written in postfix notation (e.g, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq2_HTML.gif ). The infix type constructor https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq3_HTML.gif is interpreted as the (total) function space from https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq4_HTML.gif to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq5_HTML.gif . Function applications are written in a curried style without parentheses (e.g., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq6_HTML.gif ). Anonymous functions \(x \mapsto t_x\) are written \(\lambda x.\; t_x\). The notation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq9_HTML.gif indicates that term t has type \(\tau \). Propositions are terms of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq11_HTML.gif , a type with at least two values. Symbols belonging to the signature (e.g., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq12_HTML.gif ) are uniformly called constants, even if they are functions or predicates. No syntactic distinction is enforced between terms and formulas. The metalogical operators are universal quantification https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq13_HTML.gif , implication https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq14_HTML.gif , and equality https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq15_HTML.gif . The notation \({\bigwedge }x.\; p_x\) abbreviates \({\bigwedge }\;(\lambda x.\; p_x)\) and similarly for other binder notations.
Isabelle/HOL is the instantiation of Isabelle with HOL, an object logic for classical HOL extended with rank-1 (top-level) polymorphism and Haskell-style type classes. It axiomatizes a type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq18_HTML.gif of Booleans as well as its own set of logical symbols (\(\forall \), \(\exists \), https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figh_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figi_HTML.gif , \(\lnot \), \(\wedge \), \(\vee \), \(\longrightarrow \), https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq25_HTML.gif , \(=\)). The object logic is embedded in the metalogic via a constant https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq27_HTML.gif , which is normally not printed. In practice, the distinction between the two logical levels is important operationally but not semantically.
Isabelle adheres to the tradition that started in the 1970s by the LCF system [22]: All inferences are derived by a small trusted kernel; types and functions are defined rather than axiomatized to guard against inconsistencies. High-level specification mechanisms let us define important classes of types and functions, notably inductive datatypes, inductive predicates, and recursive functions. Internally, the system synthesizes appropriate low-level definitions and derives the user specifications via primitive inferences.
Isabelle developments are organized as collections of theory files that build on one another. Each file consists of definitions, lemmas, and proofs expressed in Isar [58], Isabelle’s input language. Isar proofs are expressed either as a sequence of tactics that manipulate the proof state directly or in a declarative, natural-deduction format inspired by Mizar. Our formalization almost exclusively employs the more readable declarative style.

2.1 Sledgehammer

The Sledgehammer subsystem [7, 48] integrates automatic theorem provers in Isabelle/HOL, including CVC4, E, LEO-II, Satallax, SPASS, Vampire, veriT, and Z3. Upon invocation, it heuristically selects relevant lemmas from the thousands available in loaded libraries, translates them along with the current proof obligation to SMT-LIB or TPTP, and invokes the automatic provers. In case of success, the machine-generated proof is translated to an Isar proof that can be inserted into the formal development, so that the external provers do not need to be trusted.
Sledgehammer is part of most Isabelle users’ workflow, and we invoke it dozens of times a day (according to the log files it produces). For example, while formalizing some results that depend on multisets, we found ourselves needing the basic property where A and B are finite multisets, \(\cup \) denotes union defined such that for each element x, the multiplicity of x in \(A \cup B\) is the maximum of the multiplicities of x in A and B, \(\cap \) denotes intersection, and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq32_HTML.gif denotes cardinality. This lemma was not available in Isabelle’s underdeveloped multiset library, so we invoked Sledgehammer. Within 30 s, the tool came back with a brief proof text invoking a suitable tactic with a list of ten lemmas from the library, which we could insert into our formalization: The generated proof refers to 10 library lemmas by name and applies the metis search tactic.

2.2 Isar

Without Sledgehammer, proving the above property could easily have taken 5–15 min. A manual proof, expressed in Isar’s declarative style, might look like this: The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq41_HTML.gif function returns the multiplicity of an element in a multiset. The \(\uplus \) operator denotes the disjoint union operation—for each element, it computes the sum of the multiplicities in the operands (as opposed to the maximum for \(\cup \)).
In Isar proofs, intermediate properties are introduced using https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figm_HTML.gif and proved using a tactic such as simp and auto. Proof blocks ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fign_HTML.gif \(\;\ldots \;\) https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figo_HTML.gif ) can be nested. The advantage of Isar proofs over one-line metis proofs is that we can follow and understand the steps. However, for lemmas about multisets and other background theories, we are usually content if we can get a proof automatic and carry on with formalizing the more interesting foreground theory.

2.3 Locales

Isabelle locales are a convenient mechanism for structuring large proofs. A locale fixes types, constants, and assumptions within a specified scope. A schematic example follows: The definition of locale https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq50_HTML.gif implicitly fixes a type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq51_HTML.gif , explicitly fixes a constant https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq52_HTML.gif whose type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq53_HTML.gif may depend on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq54_HTML.gif , and states an assumption https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq55_HTML.gif over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq56_HTML.gif and  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq57_HTML.gif . Definitions made within the locale may depend on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq58_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq59_HTML.gif , and lemmas proved within the locale may additionally depend on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq60_HTML.gif . A single locale can introduce several types, constants, and assumptions. Seen from the outside, the lemmas proved in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figq_HTML.gif are polymorphic in type variable https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq61_HTML.gif , universally quantified over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq62_HTML.gif , and conditional on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq63_HTML.gif .
Locales support inheritance, union, and embedding. To embed https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figr_HTML.gif into https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figs_HTML.gif , or make https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figt_HTML.gif a sublocale of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figu_HTML.gif , we must recast an instance of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figv_HTML.gif into an instance of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figw_HTML.gif , by providing, in the context of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figx_HTML.gif , definitions of the types and constants of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figy_HTML.gif together with proofs of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figz_HTML.gif ’s assumptions. The command emits the proof obligation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq65_HTML.gif , where \(\upsilon \) and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq67_HTML.gif may depend on types and constants available in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figab_HTML.gif . After the proof, all the lemmas proved in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figac_HTML.gif become available in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figad_HTML.gif , with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq68_HTML.gif and  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq69_HTML.gif instantiated with \(\upsilon \) and  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq71_HTML.gif .

2.4 Refinement Framework

The Refinement Framework [30] provides definitions, lemmas, and tools that assist in the verification of functional and imperative programs via stepwise refinement [59]. The framework defines a programming language that is built on top of a nondeterminism monad. A program is a function that returns an object of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq72_HTML.gif : The Isabelle syntax is similar to that of Standard ML and other typed functional programming languages: The type is freely generated by its two constructors, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq77_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq78_HTML.gif . The set X in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq79_HTML.gif specifies the possible values that can be returned. The return statement is defined as a constant https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq80_HTML.gif and specifies a single value, whereas https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq81_HTML.gif indicates that an unspecified positive number is returned. The simplest program is a semantic specification of the possible outputs, encapsulated in a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figaf_HTML.gif constructor. The following example is a nonexecutable specification of the function that subtracts 1 from every element of the list https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq82_HTML.gif (with \(0 - 1\) defined as 0 on natural numbers): Program refinement uses the same source and target language. The refinement relation \(\le \) is defined by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq87_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq88_HTML.gif for all r. For example, the concrete program https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq89_HTML.gif refines (\(\le \)) the abstract program https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq91_HTML.gif , meaning that all concrete behaviors are possible in the abstract version. The bottom element https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq92_HTML.gif is an unrefinable program; the top element https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figah_HTML.gif represents a run-time failure (e.g., a failed assertion) or divergence.
Refinement can be used to change the program’s data structures and algorithms, towards a more deterministic and usually more efficient program for which executable code can be generated. For example, we can refine the previous specification to a program that uses a ‘while’ loop: The program relies on the following constructs:
  • The ‘do’ construct is a convenient Haskell-inspired syntax for expressing monadic computations (here, on the nondeterminism monad).
  • The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq104_HTML.gif combinator takes a condition, a loop body, and a start value. In our example, the loop’s state is a pair of the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq105_HTML.gif . The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq106_HTML.gif subscript in the combinator’s name indicates that the loop must not diverge. Totality is necessary for code generation.
  • The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq107_HTML.gif statement takes an assertion that must always be true when the statement is executed.
  • The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq108_HTML.gif operation returns the \((i + 1)\)st element of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq110_HTML.gif , and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq111_HTML.gif replaces the \((i + 1)\)st element by y.
To prove the refinement lemma https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq113_HTML.gif , we can use the refine_vcg proof method provided by the Refinement Framework. This method heuristically aligns the statements of the two programs and generates proof obligations, which are passed to the user. If the abstract program has the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq114_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq115_HTML.gif , as is the case here, refine_vcg applies Hoare-logic-style rules to generate the verification conditions. For our example, two of the resulting proof obligations correspond to the termination of the ‘while’ loop and the correctness of the assertion. We can use the measure https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq116_HTML.gif to prove termination.
In a refinement step, we can also change the types. For our small program, if we assume that the natural numbers in the list are all nonzero, we can replace them by integers and use the subtraction operation on integers (for which \(0 - 1 = -1 \not = 0\)). The program remains syntactically identical except for the type annotation:
We want to establish the following relation: If all elements in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq120_HTML.gif are nonzero and the elements of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq121_HTML.gif are positionwise numerically equal to those of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq122_HTML.gif , then any list of integers returned by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq123_HTML.gif is positionwise numerically equal to some list returned by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq124_HTML.gif . The framework lets us express preconditions and connections between types using higher-order relations called relators: The relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq125_HTML.gif relates natural numbers with their integer counterparts (e.g., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq126_HTML.gif ). The syntax of relators mimics that of types; for example, if R is the relation for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq127_HTML.gif , then https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq128_HTML.gif is the relation for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq129_HTML.gif , and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq130_HTML.gif is the relation for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq131_HTML.gif . The ternary relator \([p]\,R \rightarrow S\), for functions https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq133_HTML.gif , lifts the relations R and S for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq134_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq135_HTML.gif under precondition p.
The Imperative HOL library [14] defines a heap monad that can express imperative programs with side effects. On top of Imperative HOL, a separation logic, with assertion type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq136_HTML.gif , can be used to express relations https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq137_HTML.gif between plain values, of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq138_HTML.gif , and data structures on the heap, of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq139_HTML.gif . For example, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq140_HTML.gif relates lists of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq141_HTML.gif elements with mutable arrays of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq142_HTML.gif elements, where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq143_HTML.gif is used to relate the elements. The relation between the ! operator on lists and its heap-based counterpart https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figal_HTML.gif can be expressed as follows: The arguments’ relations are annotated with \(^{\mathrm {k}}\) (“keep”) or \(^{\mathrm {d}}\) (“destroy”) superscripts that indicate whether the previous value can still be accessed after the operation has been performed. Reading an array leaves it unchanged, whereas updating it destroys the old array.
The Sepref tool automates the transition from the nondeterminism monad to the heap monad. It keeps track of the values that are destroyed and ensures that they are not used later in the program. Given a suitable source program, it can automatically generate the target program and prove the corresponding refinement lemma automatically. The main difficulty is that some low-level operations have side conditions, which we must explicitly discharge by adding assertions at the right points in the source program to guide Sepref.
The following command generates a heap program called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq146_HTML.gif from the source program https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq147_HTML.gif : The generated array-based program is The end-to-end refinement theorem, obtained by composing the refinement lemmas, is
If we want to execute the program efficiently, we can translate it to Standard ML using Isabelle’s code generator [23]. The following imperative code, including its dependencies, is generated (in slightly altered form):
The ML idiom \(\texttt {(fn () => \ldots ) ()}\) is inserted to delay the evaluation of the body, so that the side effects occur in the intended order.

3 Abstract CDCL

The abstract CDCL calculus by Nieuwenhuis et al. [43] forms the first layer of our refinement chain. The formalization relies on basic Isabelle libraries for lists and multisets and on custom libraries for propositional logic. Properties such as partial correctness and termination (given a suitable strategy) are inherited by subsequent layers.

3.1 Propositional Logic

The DPLL and CDCL calculi distinguish between literals whose truth value has been decided arbitrarily and those that are entailed by the current decisions; for the latter, it is sometimes useful to know which clause entails it. To capture this information, we introduce a type of annotated literals, parameterized by a type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figar_HTML.gif of propositional variables and a type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figas_HTML.gif of clauses:
The simpler calculi do not use https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figau_HTML.gif ; they take https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq162_HTML.gif , a singleton type whose unique value is (). Informally, we write A, \(\lnot \,A\), and \(L^\dag \) for positive, negative, and decision literals, and we write \(L^C\) (with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq166_HTML.gif ) or simply L (if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq167_HTML.gif or if the clause C is irrelevant) for propagated literals. The unary minus operator is used to negate a literal, with \(- (\lnot \,A) = A\).
As is customary in the literature [1, 57], clauses are represented by multisets, ignoring the order of literals but not repetitions. A https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq169_HTML.gif is a (finite) multiset over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq170_HTML.gif . Clauses are often stored in sets or multisets of clauses. To ease reading, we write clauses using logical symbols (e.g., \(\bot \), L, and \(C \vee D\) for \(\emptyset \), \(\{L\}\), and \(C \uplus D\)). Given a clause C, we write \(\lnot \,C\) for the formula that corresponds to the clause’s negation.
Given a set or multiset I of literals, \(I \vDash C\) is true if and only if C and I share a literal. This is lifted to sets and multisets of clauses or formulas: https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq178_HTML.gif . A set or multiset is satisfiable if there exists a consistent set or multiset of literals I such that \(I \vDash N\). Finally, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq180_HTML.gif These notations are also extended to formulas.

3.2 DPLL with Backjumping

Nieuwenhuis et al. present CDCL as a set of transition rules on states. A state is a pair https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq181_HTML.gif , where M is the trail and N is the multiset of clauses to satisfy. In a slight abuse of terminology, we will refer to the multiset of clauses as the “clause set.” The trail is a list of annotated literals that represents the partial model under construction. The empty list is written https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq182_HTML.gif . Somewhat nonstandardly, but in accordance with Isabelle conventions for lists, the trail grows on the left: Adding a literal L to M results in the new trail \(L \cdot M\), where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq184_HTML.gif . The concatenation of two lists is written \(M \mathbin {@} M'\). To lighten the notation, we often build lists from elements and other lists by simple juxtaposition, writing \(M L M'\) for \(M \mathbin {@} L \cdot M'\).
The core of the CDCL calculus is defined as a transition relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq188_HTML.gif , an extension of classical DPLL [17] with nonchronological backtracking, or backjumping. The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figav_HTML.gif part of the name refers to Nieuwenhuis, Oliveras, and Tinelli. The calculus consists of three rules, starting from an initial state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq189_HTML.gif . In the following, we abuse notation, implicitly converting \(\vDash \)’s first operand from a list to a set and ignoring annotations on literals:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figaw_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq191_HTML.gif
    if N contains a clause \(C\vee L\) such that \(M \vDash \lnot \, C\) and L is undefined in M (i.e., neither \(M \vDash L\) nor \(M \vDash - L\))
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figax_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq196_HTML.gif
    if the atom of L occurs in N and is undefined in M
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figay_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq197_HTML.gif
    if N contains a conflicting clause C (i.e., \(M'L^\dag M\vDash \lnot \, C\)) and there exists a clause \(C'\vee L'\) such that \(N\vDash C'\vee L'\), \(M \vDash \lnot \, C'\), and \(L'\) is undefined in M but occurs in N or in \(M'L^\dag \)
The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figaz_HTML.gif rule is more general than necessary for capturing DPLL, where it suffices to negate the leftmost decision literal. The general rule can also express nonchronological backjumping, if \(C'\vee L'\) is a new clause derived from N (but not necessarily in N).
We represented the calculus as an inductive predicate. For the sake of modularity, we formalized the rules individually as their own predicates and combined them to obtain https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq205_HTML.gif : Since there is no call to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq210_HTML.gif in the assumptions, we could also have used a plain https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbb_HTML.gif here, but the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbc_HTML.gif command provides convenient introduction and elimination rules. The predicate operates on states of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq211_HTML.gif . To allow for refinements, this type is kept as a parameter of the calculus, using a locale that abstracts over it and that provides basic operations to manipulate states: where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbe_HTML.gif converts an abstract state of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq223_HTML.gif to a pair (MN). Inside the locale, states are compared extensionally: https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq224_HTML.gif is true if the two states have identical trails and clause sets (i.e., if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq225_HTML.gif ). This comparison ignores any other fields that may be present in concrete instantiations of the abstract state type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq226_HTML.gif .
Each calculus rule is defined in its own locale, based on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbf_HTML.gif and parameterized by additional side conditions. Complex calculi are built by inheriting and instantiating locales providing the desired rules. For example, the following locale provides the predicate corresponding to the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbg_HTML.gif rule, phrased in terms of an abstract DPLL state:
Following a common idiom, the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbi_HTML.gif calculus is distributed over two locales: The first locale, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbj_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbk_HTML.gif , defines the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbl_HTML.gif calculus; the second locale, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbm_HTML.gif , extends it with an assumption expressing a structural invariant over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbn_HTML.gif that is instantiated when proving concrete properties later. This cannot be achieved with a single locale, because definitions may not precede assumptions.
Theorem 1
(Termination [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbo_HTML.gif ]) The relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq238_HTML.gif is well founded.
Termination is proved by exhibiting a well-founded relation \(\prec \) such that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq240_HTML.gif whenever https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq241_HTML.gif . Let https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq242_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq243_HTML.gif with the decompositions
$$\begin{aligned} M&= \smash {M_n L_n^\dag \cdots M_1 L_1^\dag M_0}&M'&= \smash {M'_{n'} \smash {L'}_{\!\!n\smash {'}}^\dag \cdots M'_1 \smash {L'}_{\!\!\!1}^\dag M'_0} \end{aligned}$$
where the trail segments \(M_0,\ldots ,M_n,M'_0,\ldots ,M'_{n\smash {'}}\) contain no decision literals. Let V be the number of distinct variables occurring in the initial clause set N. Now, let \(\nu \,M = V - \left| M\right| \), indicating the number of unassigned variables in the trail M. Nieuwenhuis et al. define \(\prec \) such that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq247_HTML.gif if
(1)
there exists an index \(i \le n, n'\) such that \([\nu \, M'_0,\, \cdots ,\, \nu \, M'_{i-1}] = [\nu \, M_0,\, \cdots ,\, \nu \, M_{i-1}]\) and \(\nu \,M'_i < \nu \,M_i\); or
 
(2)
\([\nu \, M_0,\, \cdots ,\, \nu \, M_{n}]\) is a strict prefix of \([\nu \, M'_0,\, \cdots ,\, \nu \, M'_{n'}]\).
 
This order is not to be confused with the lexicographic order: We have https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq253_HTML.gif by condition (2), whereas https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq254_HTML.gif . Yet the authors justify well-foundedness by appealing to the well-foundedness of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq255_HTML.gif on bounded lists over finite alphabets. In our proof, we clarify and simplify matters by mapping states https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq256_HTML.gif to lists \(\bigl [\left| M_0\right| , \cdots ,\left| M_n\right| \bigr ]\), without appealing to \(\nu \). Using the standard lexicographic order, states become larger with each transition: The lists corresponding to possible states are bounded by the list \([V, \dots , V]\) consisting of V occurrences of V,  thereby delimiting a finite domain \(D = \{[k_1,\ldots ,k_n] \mid k_1,\cdots ,k_n,n \le V\}\). We take \(\prec \) to be the restriction of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq266_HTML.gif to D. A variant of this approach is to encode lists into a measure https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbq_HTML.gif and let https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq267_HTML.gif , building on the well-foundedness of > over bounded sets of natural numbers.
A final state is a state from which no transitions are possible. Given a relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq268_HTML.gif , we write https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq269_HTML.gif for the right-restriction of its reflexive transitive closure to final states (i.e., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq270_HTML.gif if and only if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq271_HTML.gif ).
Theorem 2
(Partial Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbr_HTML.gif ]) If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq272_HTML.gif , then N is satisfiable if and only if \(M\vDash N.\)
We first prove structural invariants on arbitrary states https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq274_HTML.gif reachable from https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq275_HTML.gif , namely: (1) each variable occurs at most once in \(M'\); (2) if \(M' = M_2 L M_1\) where L is propagated, then \(M_1, N \vDash L\). From these invariants, together with the constraint that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq279_HTML.gif is a final state, it is easy to prove the theorem.

3.3 Classical DPLL

The locale machinery allows us to derive a classical DPLL calculus from DPLL with backjumping. We call this calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq280_HTML.gif . It is achieved through a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq281_HTML.gif locale that restricts the Backjump rule so that it performs only chronological backtracking:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbs_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq282_HTML.gif
    if N contains a conflicting clause and \(M'\) contains no decision literals
Because of the locale parameters, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq284_HTML.gif is strictly speaking a family of calculi.
Lemma 3
(Backtracking [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbt_HTML.gif ]) The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbu_HTML.gif rule is a special case of the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbv_HTML.gif rule.
The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbw_HTML.gif rule depends on two clauses: a conflict clause C and a clause \(C'\vee L'\) that justifies the propagation of \(L'\!.\) The conflict clause is specified by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbx_HTML.gif . As for \(C'\vee L'\), given a trail \(M'L^\dag M\) decomposable as \(M_nL^\dag M_{n-1}L_{n\smash {-1}}^\dag \cdots M_1 L_1^ \dag M_0\) where \(M_0,\cdots ,M_n\) contain no decision literals, we can take \(C' = -L_1\vee \cdots \vee -L_{n-1}\).
Consequently, the inclusion https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq292_HTML.gif holds. In Isabelle, this is expressed as a locale instantiation: https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq293_HTML.gif is made a sublocale of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq294_HTML.gif , with a side condition restricting the application of the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figby_HTML.gif rule. The partial correctness and termination theorems are inherited from the base locale. https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq295_HTML.gif instantiates the abstract state type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq296_HTML.gif with a concrete type of pairs. By discharging the locale assumptions emerging with the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figbz_HTML.gif command, we also verify that these assumptions are consistent. Roughly:
If a conflict cannot be resolved by backtracking, we would like to have the option of stopping even if some variables are undefined. A state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq310_HTML.gif is conclusive if \(M \vDash N\) or if N contains a conflicting clause and M contains no decision literals. For https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq312_HTML.gif , all final states are conclusive, but not all conclusive states are final.
Theorem 4
(Partial Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcb_HTML.gif ])
If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcc_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq313_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq314_HTML.gif is a conclusive state, N is satisfiable if and only if \(M\vDash N\).
The theorem does not require stopping at the first conclusive state. In an implementation, testing \(M\vDash N\) can be expensive, so a solver might fail to notice that a state is conclusive and continue for some time. In the worst case, it will stop in a final state—which is guaranteed to exist by Theorem 1. In practice, instead of testing whether \(M\vDash N\), implementations typically apply the rules until every literal is set. When N is satisfiable, this produces a total model.

3.4 The CDCL Calculus

The abstract CDCL calculus extends https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcd_HTML.gif with a pair of rules for learning new lemmas and forgetting old ones:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figce_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq318_HTML.gif    if \(N\vDash C\) and each atom of C is in N or M
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcf_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq320_HTML.gif    if \(N\vDash C\)   
In practice, the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcg_HTML.gif rule is normally applied to clauses built exclusively from atoms in M, because the learned clause is false in M. This property eventually guarantees that the learned clause is not redundant (e.g., it is not already contained in N).
We call this calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq322_HTML.gif . In general, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq323_HTML.gif does not terminate, because it is possible to learn and forget the same clause infinitely often. But for some instantiations of the parameters with suitable restrictions on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figch_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figci_HTML.gif , the calculus always terminates.
Theorem 5
(Termination [20,   https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcj_HTML.gif ])
   Let https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq324_HTML.gif be an instance of the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq325_HTML.gif calculus (i.e., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq326_HTML.gif ). If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq327_HTML.gif admits no infinite chains consisting exclusively of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figck_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcl_HTML.gif transitions, then https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq328_HTML.gif is well founded.
In many SAT solvers, the only clauses that are ever learned are the ones used for backtracking. If we restrict the learning so that it is always done immediately before backjumping, we can be sure that some progress will be made between a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcm_HTML.gif and the next https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcn_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figco_HTML.gif . This idea is captured by the following combined rule:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcp_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq329_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq330_HTML.gif , L, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq331_HTML.gif , M, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq332_HTML.gif , N satisfy https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcq_HTML.gif ’s side conditions
The calculus variant that performs this rule instead of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcr_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcs_HTML.gif is called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figct_HTML.gif . Because a single https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcu_HTML.gif transition corresponds to two transitions in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq333_HTML.gif , the inclusion https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq334_HTML.gif does not hold. Instead, we have https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq335_HTML.gif . Each step of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcv_HTML.gif corresponds to a single step in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcw_HTML.gif or a two-step sequence consisting of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcx_HTML.gif followed by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcy_HTML.gif .

3.5 Restarts

Modern SAT solvers rely on a dynamic decision literal heuristic. They periodically restart the proof search to apply the effects of a changed heuristic. This helps the calculus focus on a part of the initial clauses where it can make progress. Upon a restart, some learned clauses may be removed, and the trail is reset to  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq336_HTML.gif . Since our calculus has a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figcz_HTML.gif rule, the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figda_HTML.gif rule needs only to clear the trail. Adding https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdb_HTML.gif to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq337_HTML.gif yields https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdc_HTML.gif . However, this calculus does not terminate, because https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdd_HTML.gif can be applied infinitely often.
A working strategy is to gradually increase the number of transitions between successive restarts. This is formalized via a locale parameterized by a base calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figde_HTML.gif and an unbounded function  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq338_HTML.gif . Nieuwenhuis et al. require f to be strictly increasing, but unboundedness is sufficient.
The extended calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdf_HTML.gif operates on states of the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq339_HTML.gif , where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq340_HTML.gif is a state in the base calculus and n counts the number of restarts. To simplify the presentation, we assume that bases states https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq341_HTML.gif are pairs (MN). The calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdg_HTML.gif starts in the state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq342_HTML.gif and consists of two rules:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdh_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq343_HTML.gif    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq344_HTML.gif and \(m \ge f\>n\)
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdi_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq346_HTML.gif    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq347_HTML.gif
The symbol https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq348_HTML.gif represents the base calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdj_HTML.gif transition relation, and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq349_HTML.gif denotes an m-step transition in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdk_HTML.gif . The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdl_HTML.gif in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdm_HTML.gif reminds us that we count the number of transitions; in Sect. 4.5, we will review an alternative strategy based on the number of conflicts or learned clauses. Termination relies on a measure \(\mu _V\) associated with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdn_HTML.gif that may not increase from restart to restart: If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq351_HTML.gif , then https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq352_HTML.gif . The measure may depend on V,  the number of variables occurring in the problem.
We instantiated the locale parameter https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq353_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq354_HTML.gif and f with the Luby sequence (\(1, 1, 2, 1, 1, 2, 4, \cdots \)) [35], with the restriction that no clause containing duplicate literals is ever learned, thereby bounding the number of learnable clauses and hence the number of transitions taken by  https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdo_HTML.gif .
Figure 1a summarizes the syntactic dependencies between the calculi reviewed in this section. An arrow https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq356_HTML.gif indicates that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq357_HTML.gif is defined in terms of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq358_HTML.gif . Figure 1b presents the refinements between the calculi. An arrow https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq359_HTML.gif indicates that we proved https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq360_HTML.gif or some stronger result—either by locale embedding ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdp_HTML.gif ) or by simulating https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdq_HTML.gif ’s behavior in terms of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdr_HTML.gif .

4 A Refined CDCL Towards an Implementation

The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figds_HTML.gif calculus captures the essence of modern SAT solvers without imposing a policy on when to apply specific rules. In particular, the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdt_HTML.gif rule depends on a clause \(C' \vee L'\) to justify the propagation of a literal, but does not specify a procedure for coming up with this clause. For https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdu_HTML.gif , Weidenbach developed a calculus that is more specific in this respect, and closer to existing solver implementations, while keeping many aspects unspecified [57]. This calculus, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdv_HTML.gif , is also formalized in Isabelle and connected to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdw_HTML.gif .

4.1 The New DPLL Calculus

Independently from the previous section, we formalized DPLL as described in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdx_HTML.gif . The calculus operates on states (MN), where M is the trail and N is the initial clause set. It consists of three rules:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdy_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq362_HTML.gif    if \(C\vee L \in N \uplus U\), \(M \vDash \lnot \, C\), and L is undefined in M
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figdz_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq365_HTML.gif    if L is undefined in M and occurs in N
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figea_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq366_HTML.gif
    if N contains a conflicting clause and \(M'\) contains no decision literals
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeb_HTML.gif performs chronological backtracking: It undoes the last decision and picks the opposite choice. Conclusive states for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq368_HTML.gif are defined as for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq369_HTML.gif (Sect. 3.3).
The termination and partial correctness proofs given by Weidenbach depart from Nieuwenhuis et al. We also formalized them:
Theorem 6
(Termination [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figec_HTML.gif ]) The relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq370_HTML.gif is well founded.
Termination is proved by exhibiting a well-founded relation that includes https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq371_HTML.gif . Let V be the number of distinct variables occurring in the clause set N. The weight \(\nu \,L\) of a literal L is 2 if L is a decision literal and 1 otherwise. The measure is
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Equ2_HTML.gif
Lists are compared using the lexicographic order, which is well founded because there are finitely many literals and all lists have the same length. It is easy to check that the measure decreases with each transition:
Theorem 7
(Partial Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figee_HTML.gif ]) If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq379_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq380_HTML.gif is a conclusive state, N is satisfiable if and only if \(M\vDash N.\)
The proof is analogous to the proof of Theorem 2. Some lemmas are shared between both proofs. Moreover, we can link Weidenbach’s DPLL calculus with the version we derived from https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq382_HTML.gif in Sect. 3.3:
Theorem 8
(DPLL [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figef_HTML.gif ]) For all states https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq383_HTML.gif that satisfy basic structural invariants, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq384_HTML.gif if and only if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq385_HTML.gif
This provides another way to establish Theorems 6 and 7. Conversely, the simple measure that appears in the above termination proof can also be used to establish the termination of the more general https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq386_HTML.gif calculus (Theorem 1).

4.2 The New CDCL Calculus

The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeg_HTML.gif calculus operates on states https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq387_HTML.gif , where M is the trail; N and U are the sets of initial and learned clauses, respectively; and D is a conflict clause, or the distinguished clause \(\top \) if no conflict has been detected.
In the trail M,  each decision literal L is marked as such (\(L^\dag \)—i.e., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq390_HTML.gif ), and each propagated literal L is annotated with the clause C that caused its propagation (\(L^C\)—i.e., https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq392_HTML.gif ). The level of a literal L in M is the number of decision literals to the right of the atom of L in M, or 0 if the atom is undefined. The level of a clause is the highest level of any of its literals, with 0 for \(\bot \), and the level of a state is the maximum level (i.e., the number of decision literals). The calculus assumes that N contains no clauses with duplicate literals and never produces clauses containing duplicates.
The calculus starts in a state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq394_HTML.gif . The following rules apply as long as no conflict has been detected:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeh_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq395_HTML.gif
    if \(C\vee L \in N \uplus U\), \(M \vDash \lnot \, C\), and L is undefined in M
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figei_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq398_HTML.gif    if L is undefined in M and occurs in N
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figej_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq399_HTML.gif    if \(D \in N \uplus U\) and \(M \vDash \lnot \, D\)
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figek_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq402_HTML.gif    if \(M \not \vDash N\)
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figel_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq404_HTML.gif    if \(M \not \vDash N\) and M contains no literal \(L^C\)
The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figem_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figen_HTML.gif rules generalize their https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeo_HTML.gif counterparts. Once a conflict clause has been detected and stored in the state, the following rules cooperate to reduce it and backtrack, exploring a first unique implication point [6, Chapter 3]:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figep_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq407_HTML.gif    if \(D \notin \{\bot ,\top \}\) and \(-L\) does not occur in D
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeq_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq410_HTML.gif
    if D has the same level as the current state
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figer_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq411_HTML.gif
    if L has the level of the current state, D has a lower level, and K and D have the same level
Exhaustive application of these three rule corresponds to a single step by the combined learning and nonchronological backjumping rule https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figes_HTML.gif from https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figet_HTML.gif . The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figeu_HTML.gif rule is even more general and can be used to express learned clause minimization [54].
In https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figev_HTML.gif , \(C \cup D\) is the same as \(C \vee D\) (i.e., \(C \uplus D\)), except that it keeps only one copy of the literals that belong to both C and D. When performing propagations and processing conflict clauses, the calculus relies on the invariant that clauses never contain duplicate literals. Several other structural invariants hold on all states reachable from an initial state, including the following: The clause annotating a propagated literal of the trail is a member of \(N \uplus U.\) Some of the invariants were not mentioned in the textbook (e.g., whenever \(L^C\) occurs in the trail, L is a literal of C). Formalization helped develop a better understanding of the data structure and clarify the book.
Like https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figew_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figex_HTML.gif has a notion of conclusive state. A state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figey_HTML.gif is conclusive if \(D = \top \) and \(M\vDash N\) or if \(D = \bot \) and N is unsatisfiable. The calculus always terminates, but without a suitable strategy, it can block in an inconclusive state. At the end of the following derivation, neither https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figez_HTML.gif nor https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfa_HTML.gif can process the conflict further:

4.3 A Reasonable Strategy

To prove correctness, we assume a reasonable strategy: https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfc_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfd_HTML.gif are preferred over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfe_HTML.gif ; https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figff_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfg_HTML.gif are not applied. (We will lift the restriction on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfh_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfi_HTML.gif in Sect. 4.5.) The resulting calculus, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq420_HTML.gif , refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfj_HTML.gif with the assumption that derivations are produced by a reasonable strategy. This assumption is enough to ensure that the calculus can backjump after detecting a nontrivial conflict clause other than \(\bot \). The crucial invariant is the existence of a literal with the highest level in any conflict, so that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfk_HTML.gif can be applied. The textbook suggests preferring https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfl_HTML.gif to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfm_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfn_HTML.gif to the other rules. While this makes sense in an implementation, it is not needed for any of our metatheoretical results.
Theorem 9
(Partial Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfo_HTML.gif ]) If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfp_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq422_HTML.gif and N contains no clauses with duplicate literals, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq423_HTML.gif is a conclusive state.
Once a conflict clause has been stored in the state, the clause is first reduced by a chain of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfq_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfr_HTML.gif transitions. Then, there are two scenarios: (1) the conflict is solved by a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfs_HTML.gif , at which point the calculus may resume propagating and deciding literals; (2) the reduced conflict is \(\bot \), meaning that N is unsatisfiable—i.e., for unsatisfiable clause sets, the calculus generates a resolution refutation.
The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq425_HTML.gif calculus is designed to have respectable complexity bounds. One of the reasons for this is that the same clause cannot be learned twice:
Theorem 10
(No Relearning [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figft_HTML.gif ]) 
  If we have https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfu_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq426_HTML.gif then no https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfv_HTML.gif transition is possible from the latter state causing the addition of a clause from \(N \uplus U\) to U.
The formalization of this theorem posed some challenges. The informal proof in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfw_HTML.gif is as follows (with slightly adapted notations):
Proof  By contradiction. Assume CDCL learns the same clause twice, i.e., it reaches a state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq428_HTML.gif where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfx_HTML.gif is applicable and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq429_HTML.gif More precisely, the state has the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfy_HTML.gif where the \(K_i\), \(i>1\) are propagated literals that do not occur complemented in D, as otherwise D cannot be of level i. Furthermore, one of the \(K_i\) is the complement of L. But now, because https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq433_HTML.gif is false in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq434_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq435_HTML.gif instead of deciding https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq436_HTML.gif the literal L should be propagated by a reasonable strategy. A contradiction. Note that none of the \(K_i\) can be annotated with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq438_HTML.gif . \(\square \)
Many details are missing. To find the contradiction, we must show that there exists a state in the derivation with the trail \(M_2K^\dag M_1\), and such that \(D\vee L \in N \uplus U.\) The textbook does not explain why such a state is guaranteed to exist. Moreover, inductive reasoning is hidden under the ellipsis notation (\(K_n\cdots K_2\)). Such a high-level proof might be suitable for humans, but the details are needed in Isabelle, and Sledgehammer alone cannot fill in such large gaps, especially if induction is needed. The first version of the formal proof was over 700 lines long and is among the most difficult proofs we carried out.
We later refactored the proof. Following the book, each transition in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq443_HTML.gif was normalized by applying https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figfz_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figga_HTML.gif exhaustively. For example, we defined https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgb_HTML.gif so that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq444_HTML.gif if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgc_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgd_HTML.gif cannot be applied to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq445_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq446_HTML.gif for some state T. However, normalization is not necessary. It is simpler to define https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq447_HTML.gif as https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq448_HTML.gif , with the same condition on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq449_HTML.gif as before. This change shortened the proof by about 200 lines. In a subsequent refactoring, we further departed from the book: We proved the invariant that all propagations have been performed before deciding a new literal. The core argument (“the literal L should be propagated by a reasonable strategy”) remains the same, but we do not have to reason about past transitions to argue about the existence of an earlier state. The invariant also makes it possible to generalize the statement of Theorem 10: We can start from any state that satisfies the invariant, not only from an initial state. The final version of the proof is 250 lines long.
Using Theorem 10 and assuming that only backjumping has a cost, we get a complexity of \(\mathrm {O}(3^V)\), where V is the number of different propositional variables. If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figge_HTML.gif is always preferred over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgf_HTML.gif , the learned clause is never redundant in the sense of ordered resolution [57], yielding a complexity bound of \(\mathrm {O}(2^V)\). We have not formalized this yet.
In https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgg_HTML.gif , and in our formalization, Theorem 10 is also used to establish the termination of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgh_HTML.gif . However, the argument for the termination of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgi_HTML.gif also applies to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgj_HTML.gif irrespective of the strategy, a stronger result. To lift this result, we must show that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgk_HTML.gif refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgl_HTML.gif .

4.4 Connection with Abstract CDCL

It is interesting to show that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgm_HTML.gif refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgn_HTML.gif , to establish beyond doubt that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgo_HTML.gif is a CDCL calculus and to lift the termination proof and any other general results about https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgp_HTML.gif . The states are easy to connect: We interpret a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgq_HTML.gif tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq452_HTML.gif as a https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgr_HTML.gif pair https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq453_HTML.gif , ignoring C.
The main difficulty is to relate the low-level conflict-related https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgs_HTML.gif rules to their high-level counterparts. Our solution is to introduce an intermediate calculus, called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgt_HTML.gif , that combines consecutive low-level transitions into a single transition. This calculus refines both https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq454_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgu_HTML.gif and is sufficiently similar to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgv_HTML.gif so that we can transfer termination and other properties from https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq455_HTML.gif to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq456_HTML.gif through it.
Whenever the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgw_HTML.gif calculus performs a low-level sequence of transitions of the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq457_HTML.gif , the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgx_HTML.gif calculus performs a single transition of a new rule that subsumes all four low-level rules:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgy_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq458_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq459_HTML.gif
When simulating https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figgz_HTML.gif in terms of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figha_HTML.gif , two interesting scenarios arise. First, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighb_HTML.gif ’s behavior may comprise a backjump: The rule can be simulated using https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighc_HTML.gif ’s https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighd_HTML.gif rule. The second scenario arises when the conflict clause is reduced to \(\bot \), leading to a conclusive final state. Then, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighe_HTML.gif has no counterpart in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighf_HTML.gif . The two calculi are related as follows: If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq461_HTML.gif , either https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq462_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq463_HTML.gif is a conclusive state. Since https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighg_HTML.gif is well founded, so is https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighh_HTML.gif . This implies that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighi_HTML.gif without https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighj_HTML.gif terminates.
Since https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighk_HTML.gif is mostly a rephrasing of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighl_HTML.gif , it makes sense to restrict it to a reasonable strategy that prefers https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighm_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighn_HTML.gif over https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figho_HTML.gif , yielding https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighp_HTML.gif . The two strategy-restricted calculi have the same end-to-end behavior:
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Equ3_HTML.gif

4.5 A Strategy with Restart and Forget

We could use the same strategy for restarts as in Sect. 3.5, but we prefer to exploit Theorem 10, which asserts that no relearning is possible. Since only finitely many different duplicate-free clauses can ever be learned, it is sufficient to increase the number of learned clauses between two restarts to ensure termination. This criterion is the norm in modern SAT solvers. The lower bound on the number of learned clauses is given by an unbounded function https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq464_HTML.gif . In addition, we allow an arbitrary subset of the learned clauses to be forgotten upon a restart but otherwise forbid https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighq_HTML.gif . The calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighr_HTML.gif that realizes these ideas is defined by the two rules
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighs_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq465_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq466_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq467_HTML.gif
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fight_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq468_HTML.gif    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq469_HTML.gif
We formally proved that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighu_HTML.gif is totally correct. Figure 2 summarizes the situation, following the conventions of Fig. 1.

4.6 Incremental Solving

SMT solvers combine a SAT solver with theory solvers (e.g., for uninterpreted functions and linear arithmetic). The main loop runs the SAT solver on a clause set. If the SAT solver answers “unsatisfiable,” the SMT solver is done; otherwise, the main loop asks the theory solvers to provide further, theory-motivated clauses to exclude the current candidate model and force the SAT solver to search for another one. This design crucially relies on incremental SAT solving: The possibility of adding new clauses to the clause set C of a conclusive satisfiable state and of continuing from there.
As a step towards formalizing SMT, we designed a calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighv_HTML.gif that provides incremental solving on top of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq470_HTML.gif :
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighw_HTML.gif \(_{\,C}\) https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq472_HTML.gif
    if \(M \not \vDash \lnot \, C\) and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq474_HTML.gif
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighx_HTML.gif \(_{\,C}\) https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq476_HTML.gif
    if \(L M \vDash \lnot \, C\), \(-L \in C\), \(M'\) contains no literal of C, and
    https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq480_HTML.gif
We first run the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq481_HTML.gif calculus on a clause set N, as usual. If N is satisfiable, we can add a nonempty, duplicate-free clause C to the set of clauses and apply one of the two above rules. These rules adjust the state and relaunch https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq482_HTML.gif .
Theorem 11
(Partial Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighy_HTML.gif ]) If state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq483_HTML.gif is conclusive and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq484_HTML.gif , then https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq485_HTML.gif is conclusive.
The key is to prove that the structural invariants that hold for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq486_HTML.gif still hold after adding the new clause to the state. Then the proof is easy because we can reuse the invariants we have already proved about https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq487_HTML.gif .

5 A Naive Functional Implementation of CDCL

Sections 3 and 4 presented variants of DPLL and CDCL as parameterized transition systems, formalized using locales and inductive predicates. We now present a deterministic SAT solver that implements https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq488_HTML.gif , expressed as a functional program in Isabelle.
When implementing a calculus, we must make many decisions regarding the data structures and the order of rule applications. Our functional SAT solver is very naive and does not feature any optimizations beyond those already present in the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq489_HTML.gif calculus; in Sect. 6, we will refine the calculus further to capture the two-watched-literal optimization and present an imperative implementation relying on mutable data structures.
For our functional implementation, we choose to represent states by tuples https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq490_HTML.gif , where propositional variables are coded as natural numbers and multisets as lists. Each transition rule in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq491_HTML.gif is implemented by a corresponding function. For example, the function that implements the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Fighz_HTML.gif rule is given below:
The functions corresponding to the different rules are combined into a single function that performs one step. The combinator https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq500_HTML.gif takes a list of functions implementing rules and tries to apply them in turn, until one of them has an effect on the state: The main loop applies https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figic_HTML.gif until the transition has no effect: The main loop is a recursive program, specified using the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figie_HTML.gif command [27]. For Isabelle to accept the recursive definition of the main loop as a terminating program, we must discharge a proof obligation stating that its call graph is well founded. This is a priori unprovable: The solver is not guaranteed to terminate if starting in an arbitrary state.
To work around this, we restrict the input by introducing a subset type that contains a strong enough structural invariant, including the duplicate-freedom of all the lists in the data structure. With the invariant in place, it is easy to show that the call graph is included in the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq507_HTML.gif calculus, allowing us to reuse its termination argument. The partial correctness theorem can then be lifted, meaning that the SAT solver is a decision procedure for propositional logic.
The final step is to extract running code. Using Isabelle’s code generator [23], we can translate the program to Haskell, OCaml, Scala, or Standard ML. The resulting program is syntactically analogous to the source program in Isabelle, including its dependencies, and uses the target language’s facilities for datatypes and recursive functions with pattern matching. Invariants on subset types are ignored; when invoking the solver from outside Isabelle, the caller is responsible for ensuring that the input satisfies the invariant. The entire program is about 520 lines long in Standard ML. It is not efficient, due to its extensive reliance on lists, but it satisfies the need for a proof of concept.

6 An Imperative Implementation of CDCL

As an impure functional language, Standard ML provides assignment and mutable arrays. We use these features to derive an imperative SAT solver that is much more efficient than the functional implementation. We start by integrating the two-watched-literal optimization into https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq508_HTML.gif . Then we refine the calculus to apply rules deterministically, and we generate code that uses arrays to represent clauses and clause sets.
The resulting SAT solver is orders of magnitude faster than the naive functional implementation described in the previous section. However, it is one to two orders of magnitude slower than DPT 2.0 [21], the fastest imperative OCaml solver we know of, because it does not implement restarts or any sophisticated heuristics for learned clause minimization. We expect that many missing heuristics will be straightforward to implement. Due to inefficient memory handling, our solver is not competitive with state-of-the-art solvers.

6.1 The Two-Watched-Literal Scheme

The two-watched-literal (2WL or TWL) scheme [42] is a data structure that makes it possible to efficiently identify candidate clauses for unit propagation and conflict. In each non-unit clause, we distinguish two watched literals—the other literals are unwatched. Initially, any of a non-unit clause’s literals can be chosen to be watched. In the simplest version of the scheme, the solver maintains the following invariant for each non-unit clause:
  • (\(\alpha \)) A watched literal may be false only if all the unwatched literals are false.
As a consequence of this invariant, setting an unwatched literal will never yield a candidate for propagation or conflict, because the two watched literals can then only be true or unset.
For each literal L, the clauses that contain a watched L are chained together in a list (typically a linked list). When a literal L becomes true, the solver needs only to iterate through the list associated with \(-L\) to find candidates for propagation or conflict. For each candidate clause, there are four possibilities:
1.
If some of the unwatched literals are not false, we restore the invariant by updating the clause: We start watching one of the non-false unwatched literals instead of \(-L\).
 
2.
Otherwise, we consider the clause’s other watched literal:
2.1.
If it is not set, we can propagate it.
 
2.2.
If it is false, we have found a conflict.
 
2.3.
If it is true, there is nothing to do.
 
 
In https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figif_HTML.gif , a weaker invariant is used, inspired by MiniSat [18]:
  • (\(\beta \)) A watched literal may be false only if the other watched literal is true or all the unwatched literals are false.
This invariant is easier to establish than (\(\alpha \)): If the other watched literal is true, there is nothing to do, regardless of the truth value of the unwatched literals. The four-step procedure above can easily be adapted, by pulling step 2.3 to the front.
To illustrate how the solver maintains the invariant, whether (\(\alpha \)) or (\(\beta \)), we consider the small problem shown in Fig. 3. The clauses are numbered from 1 to 4. Gray cells identify watched literals. Thus, clause 1 is \(\lnot \,B\vee C \vee A\), where \(\lnot \,B\) and C are watched.
1.
We start with an empty trail and an arbitrary choice of watched literals (Fig. 3a).
 
2.
We decide to make A true. The trail becomes \(A^\dag \). In clauses 2 and 3, we exchange \(\lnot \,A\) with another literal to restore the invariant (Fig. 3b).
 
3.
We propagate B from clause 4. The trail becomes https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq520_HTML.gif . In clause 1, we exchange \(\lnot \,B\) with A to restore the invariant (Fig. 3c).
 
4.
From clauses 2 and 3, we find out that we can propagate \(\lnot \,C\) and C. We choose C. The trail becomes https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq523_HTML.gif . Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.
 
Upon backtracking, there is no need to update the data structure. A key property for the data structure’s efficiency is that the invariant is preserved when we remove literals from the trail.
In MiniSat and other implementations, propagation is performed immediately whenever a suitable clause is discovered, and when a conflict is detected, the solver stops updating the data structure and processes the conflict. Using this more efficient strategy, the following scenario is possible for the example of Fig. 3:
1.
We start with an empty trail and the same watched literals as before (Fig. 3a).
 
2.
We decide to make A true. The trail becomes \(A^\dag \).
 
3.
We propagate B from clause 4. The trail becomes https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq525_HTML.gif .
 
4.
We propagate C from clause 3. The trail becomes https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq526_HTML.gif . Clause 2 is in conflict. The decision made in step 2 was wrong, so we backtrack.
 
By making the right arbitrary choices, we could go from propagation to propagation without having to update the clauses. However, neither invariant holds for clauses 1 to 3 after step 3. To capture the new state of affairs, we need a more precise invariant and a richer notion of state that take into account any pending updates. The new invariant is as follows:
  • (\(\gamma \)) If there are no pending updates for the clause and no conflict is being processed, invariant (\(\beta \)) holds.
An update is represented by a pair (LC), where L is a literal that has become false and C is a clause that has L as one of its watched literals. Each time a literal L is added to the trail, all possible updates \((-L,\, C)\) are added to the set of pending updates, which is initially empty. Whenever a conflict is detected, the updates are reset to \(\emptyset \). Pending updates can be processed at any time by the calculus.

6.2 The CDCL Calculus with Watched Literals

CDCL with the 2WL data structure is defined as an abstract calculus https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq531_HTML.gif that refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq532_HTML.gif . Nonunit clauses are represented as https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq533_HTML.gif , where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq534_HTML.gif is the multiset of watched literals (of cardinality 2) and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq535_HTML.gif the multiset of unwatched literals. Unit clauses are represented as singleton multisets. The state must also keep track of pending updates. States have the form https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq536_HTML.gif , where
  • M is the trail;
  • N is the initial nonunit clause set in 2WL format;
  • U is the learned nonunit clause set in 2WL format;
  • D is a conflict clause or \(\top \);
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq538_HTML.gif is the initial unit clause set;
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq539_HTML.gif is the learned unit clause set;
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq540_HTML.gif is a multiset of literal–clause pairs (LC) indicating that clause C must be updated with respect to literal L;
  • Q is a set of literals for which further updates are pending.
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq541_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq542_HTML.gif do not influence the calculus; they are ghost components that are useful for connecting a 2WL state to the format expected by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figig_HTML.gif :
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Equ4_HTML.gif
The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq543_HTML.gif function converts a 2WL clause set to a standard clause set.
The first two rules of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figih_HTML.gif have direct counterparts in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figii_HTML.gif :
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figij_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq544_HTML.gif
    https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq545_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq546_HTML.gif , \(L'\) is not set in M, and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq548_HTML.gif
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figik_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq549_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq550_HTML.gif , \(-L' \in M\), and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq552_HTML.gif
For both rules, the side condition https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq553_HTML.gif is necessary because invariant (\(\beta \)) is not required to hold for C while a (LC) update is pending.
The next rules manipulate the state’s 2WL-specific components, without affecting the state’s semantics as seen through https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq555_HTML.gif :
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figil_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq556_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq557_HTML.gif , and \(N'\) and \(U'\) are obtained from N and U by replacing https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq560_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq561_HTML.gif
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figim_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq562_HTML.gif
    if https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq563_HTML.gif and \(L' \in M\)
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figin_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq565_HTML.gif
    https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq566_HTML.gif
As in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq567_HTML.gif , propagations and conflicts are preferred over decisions. This is achieved by checking that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq568_HTML.gif and \( Q \) are empty when making a decision:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figio_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq570_HTML.gif
        if L is not defined in M and appears in N
The restriction on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figip_HTML.gif is enough to ensure that the reasonable strategy is applied in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiq_HTML.gif . https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figir_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figis_HTML.gif are as before, except that they also preserve the 2WL-specific components of the state. The https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figit_HTML.gif rule is replaced by two rules, because of the distinction between unit and nonunit clauses:
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiu_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq571_HTML.gif
    https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq572_HTML.gif
    if \(D \ne \bot \) and L satisfies the conditions on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiv_HTML.gif
  • https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiw_HTML.gif https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq574_HTML.gif
    https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq575_HTML.gif    if L satisfies the conditions on https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figix_HTML.gif
Theorem 12
(Invariant [20, cdcl_twl_stgy_twl_struct_invs]) If state https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq576_HTML.gif satisfies invariant (\(\gamma \)) and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq578_HTML.gif , then T satisfies invariant (\(\gamma \)).
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiy_HTML.gif refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq580_HTML.gif in the following sense:
Theorem 13
(Refinement [20, full_cdcl_twl_stgy_cdcl\(_W\)_stgy]) Let https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq582_HTML.gif be a state that satisfies invariant (\(\gamma \)). If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq584_HTML.gif , then https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq585_HTML.gif
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figiz_HTML.gif refines https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq586_HTML.gif ’s end-to-end behavior and produces final states that are also final states for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq587_HTML.gif . We can apply Theorem 9 to establish partial correctness.

6.3 Derivation of an Executable List-Based Program

The next step is to refine the calculus with watched literals to an executable program. The state is a tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq588_HTML.gif , where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq589_HTML.gif is a list (instead of a set) of clauses containing first n initial nonunit clauses followed by the learned nonunit clauses, where clauses are represented as lists of literals starting with the watched ones; M uses indices in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq590_HTML.gif to represent clause annotations; and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq591_HTML.gif uses indices in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq592_HTML.gif to represent clauses. The D, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq593_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq594_HTML.gif , and Q components are as before.
The program’s main loop invokes functions that implement specific rules or set of rules. The function for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figja_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjb_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjc_HTML.gif , and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjd_HTML.gif is presented below: The values https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjf_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjg_HTML.gif , and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjh_HTML.gif correspond to positive, negative, and undefined polarity, respectively. As we refine the program, we must provide additional invariants for the data structure—for example, indices in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq595_HTML.gif are valid and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq596_HTML.gif is a valid index. The assertion corresponding to the latter, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq597_HTML.gif , is not shown above, but it is needed for code generation.
The main loop is called https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq598_HTML.gif . Although it imposes an order on rule applications, it is not fully deterministic—for example, it does not specify which literal to choose in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figji_HTML.gif . The following theorem connects it to the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjj_HTML.gif calculus:
Theorem 14
(Refinement [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjk_HTML.gif ]) If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq599_HTML.gif is a well-formed state and invariant (\(\gamma \)) holds for all clauses occurring in its https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq601_HTML.gif component, then
https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Equ5_HTML.gif
where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq602_HTML.gif translates program states to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq603_HTML.gif states.
The state returned by the program is final for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjl_HTML.gif , which means by Theorem 13 that it is also final for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjm_HTML.gif . We conclude that the program is a partially correct implementation of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq604_HTML.gif . In addition, since the specification always specifies a non- https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjn_HTML.gif result, the program always terminates normally.
In a further refinement step not presented here, we extend the state with watch lists that map from a literal to the clauses that are watched, instead of recalculating them each time. The watch lists are modeled by a function https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq605_HTML.gif such that https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq606_HTML.gif and update it in when required.

6.4 Generation of Imperative Code

To be complete in a practical sense, an executable SAT solver must first initialize the 2WL data structure, run the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjo_HTML.gif calculus, and return “satisfiable” (with a model) or “unsatisfiable,” depending on whether a conflict has been found. The initialization step is necessary not only to run the program on actual problems but also to ensure that it is possible to create a 2WL state that satisfies invariant (\(\gamma \)) for any input.
The input is a list of clauses, where each clause is itself a list. We require that the lists are nonempty and contain no duplicates. For each clause C, we perform the following steps:
1.
If C is a unit clause L:
1.1
Add L to the state’s https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq608_HTML.gif component.
 
1.2
If \(-L\) is in the trail, set the state’s D component to L and stop the procedure.
 
1.3
Otherwise, add L to the state’s M and Q components, unless this has already been done.
 
 
2.
Otherwise, add C to https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq610_HTML.gif Its first two literals are watched.
 
The result is a well-formed state that satisfies invariant (\(\gamma \)). If a conflict is found in step 1.2, the program can answer “unsatisfiable” immediately.
Before we can generate imperative code, we must first eliminate the remaining nondeterminism, notably the choice of literal in https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjp_HTML.gif . We implement the variable-move-to-front heuristic [5]. During initialization, we create a list containing all the literals. This list is used to initialize the doubly linked list needed by the heuristic. We also extract the maximal atom in the list to allocate the list of the polarity-checking optimization (Sect. 6.5) with the correct length.
Second, we must specify the data structures to use the generated code. Lists of clauses are refined to resizable arrays of nonresizable arrays. The dynamic aspect is required for adding learned clauses. Within a clause, only the order of literals needs to change. We had to formalize the data structure ourselves; for technical reasons, the resizable arrays from the Imperative Collection Framework [29, 31] cannot contain arrays. We were able to reuse some of the theorems proved on the separation logic level.
We used Sepref to refine the code of the SAT solver, including initialization. We restrict the type of the atoms https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjq_HTML.gif to natural numbers https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjr_HTML.gif . In our first version, we also used (unbounded) natural number to literals in the generated code: The literals https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq612_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq613_HTML.gif are encoded by the numbers \(2\cdot i\) and \(2\cdot i +1\), respectively. However, the extraction of an atom from the literals (the integer division by 2) was inefficient in Standard ML. Therefore, we changed our representation to 32-bits unsigned integers (so only \(2^{31}\) atoms are allowed). The extraction of atoms now becomes bit-shifting.
The end-to-end refinement theorem, relating a semantic satisfiability check on the input problem ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq617_HTML.gif that returns https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq618_HTML.gif if unsatisfiable) to the Imperative HOL heap code ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq619_HTML.gif ), is stated below, where the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq620_HTML.gif relation refines a multiset of multisets of literals to a list of lists of 32-bit unsigned integers, and the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjs_HTML.gif relation refines the model that is returned as a list of literals.
Theorem 15
(End-to-End Correctness [20, https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjt_HTML.gif ])
The following refinement relation holds:

6.5 Fast Polarity Checking

The imperative code described in the previous subsection suffers from a crippling inefficiency: The solver often needs to compute the polarity of a literal, and currently this is achieved by traversing the trail M, which may be very large. In practice, solvers employ a map from atoms to their current polarity.
Using stepwise refinement, we integrate this optimization into the imperative data structure used for the trail. This refinement step is isolated from the rest of the development, which only relies on its final result: a more efficient implementation of the trail and its operations. As Lammich observed elsewhere [32], this kind of modularity is invaluable when designing complex data structures.
Since the atoms are natural numbers, we enrich the trail data structure with a list of polarities (of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq621_HTML.gif ), such that the \((i + 1)\)st element gives the polarity of atom i. The new https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjv_HTML.gif function is defined as follows:
Given \(N_1\) the set of all valid literals (i.e., the positive and negative version of all atoms that appear in the problem), the refinement relation between the trail with the list of polarities and the simple trail is defined as follows: This invariant ensures that the list https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq633_HTML.gif is long enough and contains the polarities. We can link the new polarity function to the simpler one. If https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq634_HTML.gif , then
In a subsequent refinement step, we use Sepref to implement the list of polarities by an array, and atoms are mapped to 32-bits unsigned integers ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figjz_HTML.gif ), as in Sect. 6.4. Accordingly, we define two auxiliary relations:
  • The relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq635_HTML.gif refines a literal with natural number atoms by a literal encoded as a 32-bit unsigned integer.
  • The relation https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq636_HTML.gif refines the trail data structure to use an array of polarities (instead of a list) and annotated literals of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figka_HTML.gif , using the 32-bit representation of literals. The clause indices of type https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkb_HTML.gif remain unbounded unsigned integers.
Sepref generates the imperative program https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq637_HTML.gif and derives the following refinement theorem: The precondition, in square brackets, ensures that we can only take the polarity of a literal that is within bounds. The term after the arrow is the refinement for the result, which is trivial here because the data structure for polarities remains https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq638_HTML.gif .
Composing the refinement steps (1) and (2) yields the theorem where https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq639_HTML.gif combines the two refinement relations for trails https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq640_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq641_HTML.gif . The precondition https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq642_HTML.gif is a consequence of \(L \in N_1\) and the invariant https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq644_HTML.gif . If we invoke Sepref now and discharge https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figke_HTML.gif ’s preconditions, all occurrences of the unoptimized https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkf_HTML.gif function are be replaced by https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkg_HTML.gif . After adapting the initialization to allocate the array for https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq645_HTML.gif of the correct size, we can prove end-to-end correctness as before with respect to the optimized code (cf. Theorem 15).
Our formalization of the DPLL and CDCL calculi consists of about 28 000 lines of Isabelle text. The work was done over a period of 10 months almost entirely by Fleury, who also taught himself Isabelle during that time. It covers nearly all of the metatheoretical material of Sections 2.6 to 2.11 of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkh_HTML.gif and Section 2 of Nieuwenhuis et al., including normal form transformations and ground unordered resolution [19]. The refinement to an imperative program is about 20 000 lines long and took about 6 months to perform.
It is difficult to quantify the cost of formalization as opposed to paper proofs. For a sketchy argument, formalization may take an arbitrarily long time; indeed, Weidenbach’s eight-line proof of Theorem 10 initially took 700 lines of Isabelle. In contrast, given a very detailed paper proof, one can sometimes obtain a formalization in less time than it took to write the paper proof [60]. A frequent hurdle to formalization is the lack of suitable libraries. We spent considerable time adding definitions, lemmas, and automation hints to Isabelle’s multiset library, and the refinement to resizable arrays of arrays required an elaborate setup, but otherwise we did not need any special libraries. We also found that organizing the proof at a high level, especially locale engineering, is more challenging, and perhaps even more time consuming, than discharging proof obligations.
One of our initial motivations for using locales, besides the ease with which it lets us express relationships between calculi, was that it allows abstracting over the concrete representation of the state. However, we discovered that this is often too restrictive, because some data structures need sophisticated invariants, which we must establish at the abstract level. We found ourselves having to modify the base locale each time we attempted to refine the data structure, an extremely tedious endeavor.
In contrast, the Refinement Framework, with its focus on functions, allows us to exploit local assumptions. Consider the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figki_HTML.gif function (Sect. 3.2), which adds a literal to the trail. Whenever the function is called, the literal is not already set and appears in the clauses. The polarity-checking optimization (Sect. 6.5) relies on the latter property to avoid checking bounds when updating the atom-to-polarity map. With the Refinement Framework, there are enough assumptions in the context to establish the property. With a locale, we would have to restrict the specification of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkj_HTML.gif to handle only those cases where the literals is in the set of clauses, leading to changes in the locale definition itself and to all its uses, well beyond the polarity-checking code.
While refining to the heap monad, we discovered several issues with our program. We had forgotten several assertions (especially array bound checks) and sometimes mixed up the https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq646_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_IEq647_HTML.gif annotations, resulting in large, hard-to-interpret proof obligations. Sepref is a very useful tool, but it provides few safeguards or hints when something goes wrong. Moreover, the Isabelle/jEdit user interface can be unbearably slow at displaying large proof obligations.
Given the varied level of formality of the proofs in the draft of https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkk_HTML.gif , it is unlikely that Fleury will ever catch up with Weidenbach. But the insights arising from formalization have already enriched the textbook in many ways. For the calculi described in this paper, the main issues were that fundamental invariants were omitted and some proofs may have been too sketchy to be accessible to the book’s intended audience. We also found a major mistake in an extension of CDCL using the branch-and-bound principle: Given a weight function, the calculus aims at finding a model of minimal weight. In the course of formalization, Fleury came up with a counterexample that invalidates the main correctness theorem, whose proof confused partial and total models.
For discharging proof obligations, we relied heavily on Sledgehammer, including its facility for generating detailed Isar proofs [10] and the SMT-based smt tactic [13]. We found the SMT solver CVC4 particularly useful, corroborating earlier empirical evaluations [50]. In contrast, the counterexample generators Nitpick and Quickcheck [8] were seldom useful. We often discovered flawed conjectures by observing Sledgehammer fail to solve an easy-looking problem. As one example among many, we lost perhaps one hour working from the hypothesis that converting a set to a multiset and back is the identity. Because Isabelle’s multisets are finite, the property does not hold for infinite sets A; yet Nitpick and Quickcheck fail to find a counterexample, because they try only finite values for A (and Quickcheck cannot cope with underspecification anyway).
At the calculus level, we followed Nieuwenhuis et al. (Sect. 3) and Weidenbach (Sect. 4), but other accounts exist. In particular, Krstić and Goel [28] present a calculus that lies between https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkl_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkm_HTML.gif on a scale from abstract to concrete. Unlike Nieuwenhuis et al., they have a concrete https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkn_HTML.gif rule. On the other hand, whereas Weidenbach only allows to resolve the conflict ( https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figko_HTML.gif ) with the clause that was used to propagate a literal, Krstić and Goel allow any clause that could have cause the propagation (rule https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkp_HTML.gif ). Another difference is that their https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkq_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figkr_HTML.gif rules must explicitly check that no clause is learned twice (cf. Theorem 10).
Formalizing metatheoretical results about logic in a proof assistant is an enticing, if somewhat self-referential, prospect. Shankar’s proof of Gödel’s first incompleteness theorem [52], Harrison’s formalization of basic first-order model theory [24], and Margetson and Ridge’s formalized completeness and cut elimination theorems [36] are some of the landmark results in this area. Recently, SAT solvers have been formalized in proof assistants. Marić [37, 38] verified a CDCL-based SAT solver in Isabelle/HOL, including two watched literals, as a purely functional program. The solver is monolithic, which complicates extensions. In addition, he formalized the abstract CDCL calculus by Nieuwenhuis et al. and, together with Janičić [37, 39], the more concrete calculus by Krstić and Goel [28]. Marić’s methodology is quite different from ours, without the use of refinements, inductive predicates, locales, or even Sledgehammer.
In his Ph.D. thesis, Lescuyer [34] presents the formalization of the CDCL calculus and the core of an SMT solver in Coq. He also developed a reflexive DPLL-based SAT solver for Coq, which can be used as a tactic in the proof assistant. Another formalization of a CDCL-based SAT solver, including termination but excluding two watched literals, is by Shankar and Vaucher in PVS [53]. Most of this work was done by Vaucher during a two-month internship, an impressive achievement. Finally, Oe et al. [47] verified an imperative and fairly efficient CDCL-based SAT solver, expressed using the Guru language for verified programming. Optimized data structures are used, including for two watched literals and conflict analysis. However, termination is not guaranteed, and model soundness is achieved through a run-time check and not proved.

8 Conclusion

The advantages of computer-checked metatheory are well known from programming language research, where papers are often accompanied by formalizations and proof assistants are used in the classroom [44, 49]. This article, like its predecessors and relatives [9, 12, 51], reported on some steps we have taken to apply these methods to automated reasoning. Compared with other application areas of proof assistants, the proof obligations are manageable, and little background theory is required.
We presented a formal framework for DPLL and CDCL in Isabelle/HOL, covering the ground between an abstract calculus and a verified imperative SAT solver. Our framework paves the way for further formalization of metatheoretical results. We intend to keep following https://static-content.springer.com/image/art%3A10.1007%2Fs10817-018-9455-7/MediaObjects/10817_2018_9455_Figks_HTML.gif , including its generalization of ordered ground resolution with CDCL, culminating with a formalization of the full superposition calculus and extensions. Thereby, we aim at demonstrating that interactive theorem proving is mature enough to be of use to practitioners in automated reasoning, and we hope to help them by developing the necessary libraries and methodology.
The CDCL algorithm, and its implementation in highly efficient SAT solvers, is one of the jewels of computer science. To quote Knuth [26, p. iv], “The story of satisfiability is the tale of a triumph of software engineering blended with rich doses of beautiful mathematics.” What fascinates us about CDCL is not only how or how well it works, but also why it works so well. Knuth’s remark is accurate, but it is not the whole story.

Acknowledgements

Open access funding was provided by the Max Planck Society. Stephan Merz made this work possible in the first place. Dmitriy Traytel remotely cosupervised Fleury’s M.Sc. thesis and provided copious advice on using Isabelle. Andrei Popescu gave us his permission to reuse, in a slightly adapted form, the succinct description of locales he cowrote on a different occasion [9]. Simon Cruanes, Anders Schlichtkrull, Mark Summerfield, Dmitriy Traytel, and the reviewers suggested many textual improvements. The work has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (Grant Agreement No. 713999, Matryoshka).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Literatur
1.
Zurück zum Zitat Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. I, pp. 19–99. Elsevier, Amsterdam (2001)CrossRef Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. I, pp. 19–99. Elsevier, Amsterdam (2001)CrossRef
3.
Zurück zum Zitat Bayardo Jr., R.J., Schrag, R.: Using CSP look-back techniques to solve exceptionally hard SAT instances. In: Freuder, E.C. (ed.) CP96. LNCS, vol. 1118, pp. 46–60. Springer, Berlin (1996) Bayardo Jr., R.J., Schrag, R.: Using CSP look-back techniques to solve exceptionally hard SAT instances. In: Freuder, E.C. (ed.) CP96. LNCS, vol. 1118, pp. 46–60. Springer, Berlin (1996)
4.
Zurück zum Zitat Becker, H., Blanchette, J.C., Fleury, M., From, A.H., Jensen, A.B., Lammich, P., Larsen, J.B., Michaelis, J., Nipkow, T., Popescu, A., Schlichtkrull, A., Tourret, S., Traytel, D., Villadsen, J.: IsaFoL: Isabelle Formalization of Logic. https://bitbucket.org/isafol/isafol/. Accessed 13 Feb 2018 Becker, H., Blanchette, J.C., Fleury, M., From, A.H., Jensen, A.B., Lammich, P., Larsen, J.B., Michaelis, J., Nipkow, T., Popescu, A., Schlichtkrull, A., Tourret, S., Traytel, D., Villadsen, J.: IsaFoL: Isabelle Formalization of Logic. https://​bitbucket.​org/​isafol/​isafol/​. Accessed 13 Feb 2018
5.
Zurück zum Zitat Biere, A., Fröhlich, A.: Evaluating CDCL variable scoring schemes. In: Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 5584, pp. 237–243. Springer, Berlin (2015) Biere, A., Fröhlich, A.: Evaluating CDCL variable scoring schemes. In: Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 5584, pp. 237–243. Springer, Berlin (2015)
6.
Zurück zum Zitat Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press, Amsterdam (2009) Biere, A., Heule, M., van Maaren, H., Walsh, T. (eds.): Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 185. IOS Press, Amsterdam (2009)
7.
Zurück zum Zitat Blanchette, J.C., Böhme, S., Paulson, L.C.: Extending Sledgehammer with SMT solvers. J. Autom. Reason. 51(1), 109–128 (2013)MathSciNetCrossRefMATH Blanchette, J.C., Böhme, S., Paulson, L.C.: Extending Sledgehammer with SMT solvers. J. Autom. Reason. 51(1), 109–128 (2013)MathSciNetCrossRefMATH
8.
Zurück zum Zitat Blanchette, J.C., Bulwahn, L., Nipkow, T.: Automatic proof and disproof in Isabelle/HOL. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) FroCoS 2011. LNCS, vol. 6989, pp. 12–27. Springer, Berlin (2011) Blanchette, J.C., Bulwahn, L., Nipkow, T.: Automatic proof and disproof in Isabelle/HOL. In: Tinelli, C., Sofronie-Stokkermans, V. (eds.) FroCoS 2011. LNCS, vol. 6989, pp. 12–27. Springer, Berlin (2011)
9.
Zurück zum Zitat Blanchette, J.C., Popescu, A.: Mechanizing the metatheory of Sledgehammer. In: Fontaine, P., Ringeissen, C., Schmidt, R.A. (eds.) FroCoS 2013. LNCS, vol. 8152, pp. 245–260. Springer, Berlin (2013) Blanchette, J.C., Popescu, A.: Mechanizing the metatheory of Sledgehammer. In: Fontaine, P., Ringeissen, C., Schmidt, R.A. (eds.) FroCoS 2013. LNCS, vol. 8152, pp. 245–260. Springer, Berlin (2013)
10.
Zurück zum Zitat Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi-intelligible Isar proofs from machine-generated proofs. J. Autom. Reason. 56(2), 155–200 (2016)MathSciNetCrossRefMATH Blanchette, J.C., Böhme, S., Fleury, M., Smolka, S.J., Steckermeier, A.: Semi-intelligible Isar proofs from machine-generated proofs. J. Autom. Reason. 56(2), 155–200 (2016)MathSciNetCrossRefMATH
11.
Zurück zum Zitat Blanchette, J.C., Fleury, M., Weidenbach, C.: A verified SAT solver framework with learn, forget, restart, and incrementality. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS, vol. 9706, pp. 25–44. Springer, Berlin (2016) Blanchette, J.C., Fleury, M., Weidenbach, C.: A verified SAT solver framework with learn, forget, restart, and incrementality. In: Olivetti, N., Tiwari, A. (eds.) IJCAR 2016. LNCS, vol. 9706, pp. 25–44. Springer, Berlin (2016)
12.
Zurück zum Zitat Blanchette, J.C., Popescu, A., Traytel, D.: Soundness and completeness proofs by coinductive methods. J. Autom. Reason. 58(1), 149–179 (2017)MathSciNetCrossRefMATH Blanchette, J.C., Popescu, A., Traytel, D.: Soundness and completeness proofs by coinductive methods. J. Autom. Reason. 58(1), 149–179 (2017)MathSciNetCrossRefMATH
13.
Zurück zum Zitat Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 179–194. Springer, Berlin (2010) Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 179–194. Springer, Berlin (2010)
14.
Zurück zum Zitat Bulwahn, L., Krauss, A., Haftmann, F., Erkök, L., Matthews, J.: Imperative functional programming with Isabelle/HOL. In: Mohamed, O.A., Muñoz, C.A., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 134–149. Springer, Berlin (2008) Bulwahn, L., Krauss, A., Haftmann, F., Erkök, L., Matthews, J.: Imperative functional programming with Isabelle/HOL. In: Mohamed, O.A., Muñoz, C.A., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 134–149. Springer, Berlin (2008)
16.
Zurück zum Zitat Cruz-Filipe, L., Heule, M.J.H., Jr., W.A.H., Kaufmann, M., Schneider-Kamp, P.: Efficient certified RAT verification. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 220–236. Springer, Berlin (2017) Cruz-Filipe, L., Heule, M.J.H., Jr., W.A.H., Kaufmann, M., Schneider-Kamp, P.: Efficient certified RAT verification. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 220–236. Springer, Berlin (2017)
17.
18.
Zurück zum Zitat Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Berlin (2003) Eén, N., Sörensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Berlin (2003)
22.
Zurück zum Zitat Gordon, M.J.C., Milner, R., Wadsworth, C.P.: Edinburgh LCF: A Mechanised Logic of Computation, LNCS, vol. 78. Springer, Berlin (1979)MATH Gordon, M.J.C., Milner, R., Wadsworth, C.P.: Edinburgh LCF: A Mechanised Logic of Computation, LNCS, vol. 78. Springer, Berlin (1979)MATH
23.
Zurück zum Zitat Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 103–117. Springer, Berlin (2010) Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In: Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp. 103–117. Springer, Berlin (2010)
24.
Zurück zum Zitat Harrison, J.: Formalizing basic first order model theory. In: Grundy, J., Newey, M. (eds.) TPHOLs ’98. LNCS, vol. 1479, pp. 153–170. Springer, Berlin (1998) Harrison, J.: Formalizing basic first order model theory. In: Grundy, J., Newey, M. (eds.) TPHOLs ’98. LNCS, vol. 1479, pp. 153–170. Springer, Berlin (1998)
25.
Zurück zum Zitat Kammüller, F., Wenzel, M., Paulson, L.C.: Locales—a sectioning concept for Isabelle. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs ’99. LNCS, vol. 1690, pp. 149–166. Springer, Berlin (1999) Kammüller, F., Wenzel, M., Paulson, L.C.: Locales—a sectioning concept for Isabelle. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs ’99. LNCS, vol. 1690, pp. 149–166. Springer, Berlin (1999)
26.
Zurück zum Zitat Knuth, D.E.: The Art of Computer Programming, vol. 4, Fascicle 6: Satisfiability. Addison-Wesley, Boston (2015) Knuth, D.E.: The Art of Computer Programming, vol. 4, Fascicle 6: Satisfiability. Addison-Wesley, Boston (2015)
27.
Zurück zum Zitat Krauss, A.: Partial recursive functions in higher-order logic. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 589–603. Springer, Berlin (2006) Krauss, A.: Partial recursive functions in higher-order logic. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 589–603. Springer, Berlin (2006)
28.
Zurück zum Zitat Krstić, S., Goel, A.: Architecting solvers for SAT modulo theories: Nelson-Oppen with DPLL. In: Konev, B., Wolter, F. (eds.) FroCoS 2007. LNCS, vol. 4720, pp. 1–27. Springer, Berlin (2007) Krstić, S., Goel, A.: Architecting solvers for SAT modulo theories: Nelson-Oppen with DPLL. In: Konev, B., Wolter, F. (eds.) FroCoS 2007. LNCS, vol. 4720, pp. 1–27. Springer, Berlin (2007)
30.
Zurück zum Zitat Lammich, P.: Automatic data refinement. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 84–99. Springer, Berlin (2013) Lammich, P.: Automatic data refinement. In: Blazy, S., Paulin-Mohring, C., Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 84–99. Springer, Berlin (2013)
31.
Zurück zum Zitat Lammich, P.: Refinement to imperative/HOL. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 253–269. Springer, Berlin (2015) Lammich, P.: Refinement to imperative/HOL. In: Urban, C., Zhang, X. (eds.) ITP 2015. LNCS, vol. 9236, pp. 253–269. Springer, Berlin (2015)
32.
Zurück zum Zitat Lammich, P.: Refinement based verification of imperative data structures. In: Avigad, J., Chlipala, A. (eds.) CPP 2016, pp. 27–36. ACM, New York (2016)CrossRef Lammich, P.: Refinement based verification of imperative data structures. In: Avigad, J., Chlipala, A. (eds.) CPP 2016, pp. 27–36. ACM, New York (2016)CrossRef
33.
Zurück zum Zitat Lammich, P.: Efficient verified (UN)SAT certificate checking. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 237–254. Springer, Berlin (2017) Lammich, P.: Efficient verified (UN)SAT certificate checking. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 237–254. Springer, Berlin (2017)
34.
Zurück zum Zitat Lescuyer, S.: Formalizing and implementing a reflexive tactic for automated deduction in Coq. Ph.D. thesis, Université Paris-Sud (2011) Lescuyer, S.: Formalizing and implementing a reflexive tactic for automated deduction in Coq. Ph.D. thesis, Université Paris-Sud (2011)
35.
38.
Zurück zum Zitat Marić, F.: Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL. Theor. Comput. Sci. 411(50), 4333–4356 (2010)MathSciNetCrossRefMATH Marić, F.: Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL. Theor. Comput. Sci. 411(50), 4333–4356 (2010)MathSciNetCrossRefMATH
40.
Zurück zum Zitat Marques-Silva, J.P., Sakallah, K.A.: GRASP–a new search algorithm for satisfiability. In: ICCAD ’96, pp. 220–227. IEEE Computer Society Press, Silver Spring (1996) Marques-Silva, J.P., Sakallah, K.A.: GRASP–a new search algorithm for satisfiability. In: ICCAD ’96, pp. 220–227. IEEE Computer Society Press, Silver Spring (1996)
41.
Zurück zum Zitat Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mech. Math. Appl. 4(1), 3–24 (2005) Matuszewski, R., Rudnicki, P.: Mizar: the first 30 years. Mech. Math. Appl. 4(1), 3–24 (2005)
42.
Zurück zum Zitat Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: DAC 2001, pp. 530–535. ACM, New York (2001) Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: DAC 2001, pp. 530–535. ACM, New York (2001)
43.
Zurück zum Zitat Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories: from an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM 53(6), 937–977 (2006)MathSciNetCrossRefMATH Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories: from an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM 53(6), 937–977 (2006)MathSciNetCrossRefMATH
44.
Zurück zum Zitat Nipkow, T.: Teaching semantics with a proof assistant: no more LSD trip proofs. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012. LNCS, vol. 7148, pp. 24–38. Springer, Berlin (2012) Nipkow, T.: Teaching semantics with a proof assistant: no more LSD trip proofs. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012. LNCS, vol. 7148, pp. 24–38. Springer, Berlin (2012)
45.
Zurück zum Zitat Nipkow, T., Klein, G.: Concrete Semantics: With Isabelle/HOL. Springer, Berlin (2014)CrossRefMATH Nipkow, T., Klein, G.: Concrete Semantics: With Isabelle/HOL. Springer, Berlin (2014)CrossRefMATH
46.
Zurück zum Zitat Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL: A Proof Assistant for Higher-Order Logic, LNCS, vol. 2283. Springer, Berlin (2002)MATH Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL: A Proof Assistant for Higher-Order Logic, LNCS, vol. 2283. Springer, Berlin (2002)MATH
47.
Zurück zum Zitat Oe, D., Stump, A., Oliver, C., Clancy, K.: versat: a verified modern SAT solver. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012, LNCS, vol. 7148, pp. 363–378. Springer, Berlin (2012) Oe, D., Stump, A., Oliver, C., Clancy, K.: versat: a verified modern SAT solver. In: Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012, LNCS, vol. 7148, pp. 363–378. Springer, Berlin (2012)
48.
Zurück zum Zitat Paulson, L.C., Blanchette, J.C.: Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In: Sutcliffe, G., Schulz, S., Ternovska, E. (eds.) IWIL-2010. EPiC, vol. 2, pp. 1–11. EasyChair (2012) Paulson, L.C., Blanchette, J.C.: Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In: Sutcliffe, G., Schulz, S., Ternovska, E. (eds.) IWIL-2010. EPiC, vol. 2, pp. 1–11. EasyChair (2012)
49.
Zurück zum Zitat Pierce, B.C.: Lambda, the ultimate TA: using a proof assistant to teach programming language foundations. In: Hutton, G., Tolmach, A.P. (eds.) ICFP 2009, pp. 121–122. ACM, New York (2009) Pierce, B.C.: Lambda, the ultimate TA: using a proof assistant to teach programming language foundations. In: Hutton, G., Tolmach, A.P. (eds.) ICFP 2009, pp. 121–122. ACM, New York (2009)
50.
Zurück zum Zitat Reynolds, A., Tinelli, C., de Moura, L.: Finding conflicting instances of quantified formulas in SMT. In: Claessen, K., Kuncak, V. (eds.) FMCAD 2014, pp. 195–202. IEEE Computer Society Press, Silver Spring (2014) Reynolds, A., Tinelli, C., de Moura, L.: Finding conflicting instances of quantified formulas in SMT. In: Claessen, K., Kuncak, V. (eds.) FMCAD 2014, pp. 195–202. IEEE Computer Society Press, Silver Spring (2014)
51.
Zurück zum Zitat Schlichtkrull, A.: Formalization of the resolution calculus for first-order logic. In: Blanchette, J.C., Merz, S. (eds.) ITP 2016. LNCS, vol. 9807, pp. 341–357. Springer, Berlin (2016) Schlichtkrull, A.: Formalization of the resolution calculus for first-order logic. In: Blanchette, J.C., Merz, S. (eds.) ITP 2016. LNCS, vol. 9807, pp. 341–357. Springer, Berlin (2016)
52.
Zurück zum Zitat Shankar, N.: Metamathematics, Machines, and Gödel’s Proof, Cambridge Tracts in Theoretical Computer Science, vol. 38. Cambridge University Press, Cambridge (1994)CrossRef Shankar, N.: Metamathematics, Machines, and Gödel’s Proof, Cambridge Tracts in Theoretical Computer Science, vol. 38. Cambridge University Press, Cambridge (1994)CrossRef
53.
Zurück zum Zitat Shankar, N., Vaucher, M.: The mechanical verification of a DPLL-based satisfiability solver. Electr. Notes Theor. Comput. Sci. 269, 3–17 (2011)MathSciNetCrossRefMATH Shankar, N., Vaucher, M.: The mechanical verification of a DPLL-based satisfiability solver. Electr. Notes Theor. Comput. Sci. 269, 3–17 (2011)MathSciNetCrossRefMATH
54.
Zurück zum Zitat Sörensson, N., Biere, A.: Minimizing learned clauses. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 9340, pp. 237–243. Springer, Berlin (2009) Sörensson, N., Biere, A.: Minimizing learned clauses. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 9340, pp. 237–243. Springer, Berlin (2009)
56.
Zurück zum Zitat Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 696–710. Springer, Berlin (2014) Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 696–710. Springer, Berlin (2014)
57.
Zurück zum Zitat Weidenbach, C.: Automated reasoning building blocks. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design: Symposium in Honor of Ernst-Rüdiger Olderog on the Occasion of His 60th Birthday. LNCS, vol. 9360, pp. 172–188. Springer, Berlin (2015)CrossRef Weidenbach, C.: Automated reasoning building blocks. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design: Symposium in Honor of Ernst-Rüdiger Olderog on the Occasion of His 60th Birthday. LNCS, vol. 9360, pp. 172–188. Springer, Berlin (2015)CrossRef
58.
Zurück zum Zitat Wenzel, M.: Isabelle/Isar—a generic framework for human-readable proof documents. In: Matuszewski, R., Zalewska, A. (eds.) From Insight to Proof: Festschrift in Honour of Andrzej Trybulec, Studies in Logic, Grammar, and Rhetoric, vol. 10(23). University of Białystok (2007) Wenzel, M.: Isabelle/Isar—a generic framework for human-readable proof documents. In: Matuszewski, R., Zalewska, A. (eds.) From Insight to Proof: Festschrift in Honour of Andrzej Trybulec, Studies in Logic, Grammar, and Rhetoric, vol. 10(23). University of Białystok (2007)
59.
60.
Zurück zum Zitat Woodcock, J., Banach, R.: The verification grand challenge. J. Univers. Comput. Sci. 13(5), 661–668 (2007) Woodcock, J., Banach, R.: The verification grand challenge. J. Univers. Comput. Sci. 13(5), 661–668 (2007)
Metadaten
Titel
A Verified SAT Solver Framework with Learn, Forget, Restart, and Incrementality
verfasst von
Jasmin Christian Blanchette
Mathias Fleury
Peter Lammich
Christoph Weidenbach
Publikationsdatum
12.03.2018
Verlag
Springer Netherlands
Erschienen in
Journal of Automated Reasoning / Ausgabe 1-4/2018
Print ISSN: 0168-7433
Elektronische ISSN: 1573-0670
DOI
https://doi.org/10.1007/s10817-018-9455-7

Weitere Artikel der Ausgabe 1-4/2018

Journal of Automated Reasoning 1-4/2018 Zur Ausgabe