1 Introduction
DPR-trim tool [28] to efficiently convert \(\textsf{PR}\) proofs into LPR format, and (iii) cake_lpr, an efficient verified LPR proof checker with correctness guarantees, including for steps (a)–(c) enumerated above. The cake_lpr tool was used to validate the unsatisfiability proofs for the 2020 SAT Competition because of its strong trust story combined with easy compilation and usage. Moreover, the stronger \(\textsf{PR}\) proof system could be supported in future competitions. The tool is publicly available at: https://github.com/tanyongkiam/cake_lpr

We extend cake_lpr to support the compositional proof format. The new correctness result for cake_lpr allows users to exploit verified compositional proof checking by running parallel instances of the tool to check very large unsatisfiability proofs, such as those typically found in SAT-solver-aided proofs of mathematical results [23, 29, 35]. In particular, we explain how compositional proofs can be conveniently generated from the cube-and-conquer [22] SAT solving technique, which is naturally parallelizable. Together with our verified checker, this enables a fully parallel pipeline for SAT solving, proof generation, and verified proof checking. To the best of our knowledge, this is the first verification result for a proof checker that formally accounts for multiple, separate executions of the checker.

2 Background
2.1 HOL4 and CakeML

the CakeML proof-producing translator [46] automatically synthesizes verified source code from pure algorithmic specifications;

the CakeML characteristic formula (CF) framework [19] provides a separation logic which can be used to manually verify (more efficient) imperative code for performance-critical parts of the proof checker.
2.2 SAT Problems and Clausal Proofs
3 Linear Propagation Redundancy
DPR-trim tool^{1} to efficiently add hints to \(\textsf{PR}\) proofs, thereby turning them into LPR proofs. Throughout the section, we emphasize how LPR can be viewed as a clean and minimal extension of the existing LRAT proof format, which thereby enables its straightforward implementation in existing LRAT tools.

The DPR-trim tool [28, Sect. 6] is built on top of DRAT-trim and facilitates verification of DPR proofs, a generalization of DRAT proofs with \(\textsf{PR}\) clause addition. If a clause addition step includes a witnessing assignment \(\omega \), then the PR redundancy check is performed. Otherwise, DPR-trim falls back on the RAT check from the DRAT-trim code base. Deletion steps are not validated. DPR-trim has features similar to DRAT-trim, including backward checking, extraction of unsatisfiable cores, and proof optimization.

DPR-trim enriches input \(\textsf{PR}\) proofs by finding and adding all required unit clause hints to produce LPR proofs. Most of the changes generalize the code that produces LRAT proofs in DRAT-trim. The main \(\textsf{PR}\)-specific optimization shrinks the witness \(\omega \) where possible: every literal in \(\omega \cap \alpha \) is removed, as is any literal in \(\omega \) that is implied by unit propagation. The shrinking was shown to be correct [28], but has not been implemented so far. We observed that the witnesses in the \(\textsf{PR}\) proofs produced by SaDiCaL [27] can be substantially compressed using this method.

Fig. 2 (left) shows an example CNF formula in DIMACS format. A DIMACS file begins with a header line “p cnf
” followed by the number of variables and the number of clauses. The non-comment lines (those not starting with “c”) represent clauses, and each ends with “0”. Positive integers denote positive literals, while negative integers denote negative literals. By convention (following LRAT), the clauses are implicitly indexed according to their order of appearance in the file, starting from index 1. Fig. 2 (right) shows a corresponding proof in LPR format.^{2} Deletion lines in LPR are formatted identically to LRAT [10] (not shown here). For clause addition lines, the LPR format differs from LRAT only when the clause to be added has \(\textsf{PR}\) but not \(\textsf{RAT}\) redundancy. A clause addition line in LPR format consists of three parts. The first part is the first integer on the line, which denotes the index of the new clause. The second part exactly matches the \(\textsf{PR}\) proof format [28]: it consists of the redundant clause and its witness, where the first group of literals is the clause, and the (potentially empty) witness runs from the second occurrence of the clause's first literal until the first 0 that separates the unit clause hints. The third part (after the first 0) is the unit clause hints, which exactly match the LRAT format [10].
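To make the three-part structure concrete, the following Python sketch parses a clause addition line into its parts. This is an illustrative reimplementation of the format description above, not cake_lpr's verified parser.

```python
def parse_lpr_addition(line):
    """Parse an LPR clause-addition line into (index, clause, witness, hints).

    Expected shape (per the description above):
      <index> <clause literals> [<witness literals>] 0 <unit-clause hints> 0
    The witness, if present, starts at the second occurrence of the
    clause's first literal and runs until the first 0.
    """
    toks = [int(t) for t in line.split()]
    index, rest = toks[0], toks[1:]
    first_zero = rest.index(0)
    body, hints = rest[:first_zero], rest[first_zero + 1:]
    if hints and hints[-1] == 0:
        hints = hints[:-1]  # strip the trailing terminator
    # Split the body into clause and witness at the second occurrence
    # of the first literal (if any); otherwise the witness is empty.
    if body and body[0] in body[1:]:
        split = body[1:].index(body[0]) + 1
        clause, witness = body[:split], body[split:]
    else:
        clause, witness = body, []
    return index, clause, witness, hints
```

For example, the line `9 1 2 1 3 0 5 7 0` parses to clause index 9, clause [1, 2], witness [1, 3], and hints [5, 7].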
4 Compositional Proofs
4.1 Compositional Proof Checking

The indexes \(i_0,i_1,\dots ,i_k\) can be chosen arbitrarily by tools, as long as they cover all steps of the top-level proof with \(i_0=0\) and \(i_k=n\).

Different clausal proof formats and corresponding proof checkers can be used for each underlying proof.

The underlying proofs can be checked separately, in any order, and in parallel by running multiple instances of proof checkers on several machines.
4.2 Compositional Proof Generation
DRAT-trim can be used to convert such partial proofs into LRAT proofs that end with the addition of clause \(\lnot {H_i}\). In addition, when generating the LRAT proof for \(\lnot {H_i}\), the compositional proof checking algorithm (Fig. 5) allows us to assume all previously added clauses, i.e., starting with \(F \wedge \lnot {H_1} \wedge \dots \wedge \lnot {H_{i-1}}\), which may help to simplify the LRAT proof.
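To illustrate the formulas involved, the following Python sketch builds the formula assumed when generating the proof for the i-th cube. The function names are our own, and clauses and cubes are represented as lists of integer literals following the DIMACS convention.

```python
def negate_cube(cube):
    """A cube H = l1 AND ... AND lk negates to the clause -l1 OR ... OR -lk."""
    return [-lit for lit in cube]

def formula_for_cube(formula, cubes, i):
    """Formula assumed when generating the proof that ends with the
    addition of clause -H_i: F together with -H_1, ..., -H_{i-1}.

    `formula` is a list of clauses; `cubes` is the list H_1, ..., H_m
    (1-indexed in the text, so cube i is cubes[i - 1])."""
    return formula + [negate_cube(h) for h in cubes[:i - 1]]
```

For the first cube, no extra clauses are assumed; each later cube's proof may use the negations of all earlier cubes.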
5 CakeML Proof Checking
This section presents cake_lpr, our verified CakeML LPR proof checker. Section 5.1 focuses on the high-level verification strategy which we used to reduce the verification task to mostly routine low-level proofs (details of the latter are omitted). Section 5.2 explains the main verified theorems for the proof checker. Section 5.3 highlights some of the verified performance optimizations used in the proof checker.

5.1 Implementation and Verification Strategy

The verification of cake_lpr proceeds in three refinement steps, where each step progressively produces a more concrete and performant implementation of the proof checker. These refinements are visualized in the three columns of Fig. 9 for LPR proof checking. A similar verification process was used to add support for compositional proof checking. Step 2 implements LPR proof checking, which outputs a YES or NO answer according to the algorithm from Fig. 3. The correctness theorem for Step 2 shows that LPR proof checking correctly refines the \(\textsf{PR}\) proof system, i.e., if it outputs YES, then there exists a valid \(\textsf{PR}\) proof for the input (lifted) CNF; by Step 1, this implies that the CNF is unsatisfiable. If the output is NO, the input CNF could still be unsatisfiable, but the input LPR proof is not valid according to the algorithm in Fig. 3. The final refinement step produces CakeML source code for cake_lpr; this code is subsequently compiled with the verified CakeML compiler [54]. Composing the correctness theorem for source cake_lpr with CakeML's compiler correctness theorem yields the corresponding correctness theorem for the cake_lpr binary.

The cake_lpr binary can be invoked from the command line in five ways, each with associated soundness proofs (Section 5.2):

1. cake_lpr <DIMACS>
2. cake_lpr <DIMACS> <LPR>
3. cake_lpr <DIMACS1> <LPR> <DIMACS2>: DIMACS1 is checked to imply DIMACS2 (satisfiability-wise) using the LPR proof [10].
4. cake_lpr <DIMACS> <COMP> i-j <LPR>: checks the compositional proof COMP for the range i-j and the underlying LPR proof LPR.
5. cake_lpr <DIMACS> <COMP> check <OUTPUT>

For compositional proof checking, cake_lpr needs to be executed (with option 4) for a set of ranges covering the lines of the compositional (top-level) proof COMP, and we expect users to exploit this compositionality by running separate instances of cake_lpr in parallel on several machines. Accordingly, an important correctness caveat for compositional proof checking is that users correctly set up separate executions for their respective systems.

cake_lpr outputs the following string for each successful run of option 4, where md5 takes the MD5 hash of the input files:

s VERIFIED RANGE md5(DIMACS) md5(COMP) i-j

Users can compare the md5 hashes of their input DIMACS and proof files to check that the correct files were used on all machines. By concatenating these outputs into an output file OUTPUT and executing cake_lpr with option 5, cake_lpr can be used to check that the output file contains the correct hashes and that the specified ranges appropriately cover the entire compositional proof. Our implementation checks range coverage (e.g., that ranges 0–2, 4–8, 2–4, 8–12 can be strung together to check range 0–12) by reusing a verified reachability checking function originally developed for a verified compiler optimization in the CakeML compiler [32].
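For illustration, the following Python sketch mimics the shape of the option 4 output line and the range coverage check. The function names are ours, and a simple sort-and-chain check stands in for the verified reachability-based check used in cake_lpr.

```python
import hashlib

def md5_hex(path):
    """MD5 hash of a file, as printed in the output lines."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def verified_range_line(dimacs, comp, i, j):
    # Shape of the option-4 success line:
    #   s VERIFIED RANGE md5(DIMACS) md5(COMP) i-j
    return f"s VERIFIED RANGE {md5_hex(dimacs)} {md5_hex(comp)} {i}-{j}"

def covers(ranges, n):
    """Check that the ranges (in any order) can be chained to cover 0..n,
    e.g. [(0, 2), (4, 8), (2, 4), (8, 12)] covers 0..12."""
    pos = 0
    for lo, hi in sorted(ranges):
        if lo > pos:
            return False  # gap before this range
        pos = max(pos, hi)
    return pos >= n
```

An option-5-style check would then verify that every concatenated output line carries the expected hashes and that the collected ranges satisfy `covers`.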
5.2 Correctness Theorems
The correctness theorem for cake_lpr in HOL4 is shown in Fig. 10. The first line (2) summarizes routine assumptions for compiled CakeML programs that use its basis library. Briefly, it assumes that the command-line \( cl \) and file system \( fs \) models are well-formed and that the compiled code is placed in (and executed from) the code memory of a machine state \( ms \) according to CakeML's x64 machine model \( mc \). The observable outputs of cake_lpr are strings printed on standard output \( out \) and standard error \( err \). To minimize user confusion, cake_lpr is designed to print all error messages to standard error and only success messages on standard output. The theorem then describes the behavior of cake_lpr for each command-line option.^{3} When cake_lpr is given one command-line argument (\(\textsf {\small length}\; cl \;\textsf {\small =}\;2\))^{4}, it attempts to read and parse the file before printing (if successful) the parsed formula to standard output. The DIMACS parser (parse_dimacs) is proved to be a left inverse of the DIMACS printer (print_dimacs): for any well-formed formula \( fml \), printing that formula into DIMACS format and then parsing it yields the formula back. All parsed formulas are proved to be well-formed (not shown here).
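As an illustration of the left-inverse property, here is a simplified Python printer/parser pair for which the same round-trip holds. These are our own sketches, not the HOL4 definitions; comment handling and error checking are minimal.

```python
def print_dimacs(clauses, num_vars):
    """Serialize a CNF (list of integer-literal clauses) to DIMACS text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines) + "\n"

def parse_dimacs(text):
    """Parse DIMACS text back into (clauses, num_vars); None on error."""
    clauses, num_vars = [], 0
    for line in text.splitlines():
        toks = line.split()
        if not toks or toks[0] == "c":
            continue  # skip blank and comment lines
        if toks[0] == "p":
            if toks[1:2] != ["cnf"]:
                return None
            num_vars = int(toks[2])
            continue
        lits = [int(t) for t in toks]
        if lits[-1] != 0:
            return None  # clauses must be 0-terminated
        clauses.append(lits[:-1])
    return clauses, num_vars
```

Printing a formula and parsing the result returns the original formula, mirroring the verified left-inverse theorem.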
If the string “s VERIFIED UNSAT\n” is printed on standard output, then cake_lpr was provided with a file (in its first argument), and the file parses in DIMACS format to a formula \( fml \) whose lifted representation (\(\textsf {\small interp}\; fml \)) is unsatisfiable.

The correctness theorem for cake_lpr's compositional proof checking is shown in Fig. 11. The definition check_successful_par characterizes a successful set of runs of cake_lpr using command-line options 4 and 5 on input strings \( fmlstr \) and \( pfstr \). The lines in (6) say that there is a list of output strings \( outs \) such that every entry of this list is produced from the standard output of an execution of cake_lpr with command-line option 4 (with appropriate setup of each machine's file system and command-line arguments).^{5} The lines in (7) say that \( outs \) is successfully checked by an execution of cake_lpr with command-line option 5 and that the corresponding success string (concat [\(\dots \)]) is printed to standard output. Using this definition, the correctness theorem (8) says that on a successful set of runs, satisfiability of the formula \( fml \) parsed from \( fmlstr \) implies satisfiability of the final formula \(\textsf {\small run\_proof}\; fml \; pf \) obtained by running all lines of the parsed proof \( pf \) on \( fml \); i.e., if \( fml \) is \(F_0\) and \( pf \) has n instructions, then \(\textsf {\small run\_proof}\; fml \; pf \) computes the accumulated formula \(F_n\) as defined in Section 4.1.

5.3 Verified Optimizations
This section highlights verified optimizations implemented in cake_lpr. Our design decisions are based on empirical observations about the LPR proof checking algorithm. These are explained below with reference to specific steps in the algorithm outlined in Fig. 3.

5.3.1 Array-based representations
5.3.2 Proof checking exceptions
Proof checking can fail (with answer NO) if the input proof is invalid, e.g., in Step 3.3. In a purely functional implementation, results are represented with an option type: None indicating failure and Some \( res \) indicating success with result \( res \). While conceptually simple, this means that common-case (successful) intermediate results are always boxed within a Some constructor and then immediately unboxed with pattern matching before being used again. In cake_lpr, failures instead raise exceptions which are handled directly at the top level. Thus, successful results can be passed directly, i.e., as \( res \), without any boxing. Support for verifying the use of exceptions is a unique feature of CakeML's CF framework [19].

5.3.3 Hashtables
5.3.4 Buffered I/O streams
5.3.5 Producing MD5 hashes
6 Benchmarks
The first benchmark suite consists of the \(\textsf{PR}\) benchmarks from SaDiCaL [27]; the second suite consists of unsatisfiable problems from the SAT Race 2019 competition.^{7} The \(\textsf{RAT}\) microbenchmark consists of proofs for large mutilated chessboards generated by a BDD-based SAT solver [8]. The unsatisfiability proofs for the Erdős discrepancy properties are generated by cube-and-conquer (see Section 4.2). Raw timing data used to produce the figures and tables in this section is available from the cake_lpr repository.

Our checker is labeled cake_lpr (default 4 GB heap and stack space), while the other proof checkers used are labeled acl2-lrat (verified in ACL2 [20]), coq-lrat (verified in Coq [10]), and GRATchk (verified in Isabelle/HOL [39]), respectively. All experiments were run on identical nodes with Intel Xeon E5-2695 v3 CPUs (35 MB cache, 2.30 GHz) and 128 GB RAM; more specific configuration options for each benchmark suite are reported below.

6.1 SaDiCaL \(\textsf{PR}\) Benchmarks
The SaDiCaL solver produces \(\textsf{PR}\) proofs for hard SAT problems in its benchmark suite [27], and it is experimentally much faster than a plain DRAT-based CDCL solver on those problems [27, Section 7]. The \(\textsf{PR}\) proofs are directly checked by cake_lpr after conversion into LPR format with DPR-trim. For all other checkers, the \(\textsf{PR}\) proofs were first converted to DRAT format using pr2drat (as in an earlier approach [27]), and then into LRAT and GRAT formats using the DRAT-trim and GRATgen^{8} tools respectively. All tools were run with a timeout of 10000 seconds and all timings are reported in seconds (to one decimal place). Results are summarized in Tables 2 and 3.

SaDiCaL solved all benchmarks except mchess19, which exceeded the time limit. For the remaining benchmarks, generating and checking LPR proofs required a comparable (1–2.5x) amount of time to solving the problems, except for mchess, for which LPR generation and checking is much faster than solving the problems (Table 2). Unsurprisingly, direct checking of LPR proofs is much faster than the circuitous route of converting into DRAT and then into either LRAT or GRAT (Table 3). Unlike LPR, checking \(\textsf{PR}\) proofs via the LRAT route is 5–60x slower than solving those problems; this is a significant drawback to using that route in practice for certifying solver results. Compared to an unverified C implementation of LPR proof checking, cake_lpr is approximately an order of magnitude slower on these benchmarks; a detailed comparison against unverified proof checking is in Section 6.3.

In Table 2, mchess19 is omitted because SaDiCaL timed out, and timings for the Urquhart U.s3* benchmarks are omitted because they took a negligible amount of time (\(< 1.0\) s total). In Table 3, the mchess19 and U.s3* benchmarks are omitted as in Table 2.

Prob.  pr2drat  DRAT-trim  cake_lpr (LRAT)  acl2-lrat  coq-lrat  GRATgen  GRATchk  Total (LPR)  Total (LRAT)  Total (GRAT)

hole20  0.8  4.4  18.5  7.9  966.7  4.6  18.2  2.2  14.2  24.6 
hole30  6.8  61.4  180.4  105.9  Fail(T)  24.5  647.9  15.4  181.0  686.1 
hole40  32.4  460.0  1039.5  711.8  Fail(T)  101.3  Fail(T)  66.3  1235.5   
hole50  108.6  2663.0  4697.4  3292.2  Fail(T)  337.2  Fail(T)  225.1  6165.5   
mchess15  7.7  48.2  49.3  36.2  Fail(T)  48.4  2023.1  21.7  110.6  2097.7 
mchess16  9.0  62.0  59.8  53.2  Fail(T)  55.2  2903.8  25.0  145.9  2989.6 
mchess17  14.5  105.0  97.3  88.5  Fail(T)  86.1  7050.9  39.8  242.7  7186.3 
mchess18  25.1  195.0  152.7  296.8  Fail(T)  135.9  Fail(T)  67.2  432.5   
U.s4b1  0.5  2.5  3.6  3.3  135.7  3.6  44.8  1.6  7.0  49.7 
U.s4b2  0.2  0.8  1.4  1.0  23.2  1.7  8.2  0.8  2.3  10.4 
U.s4b3  0.3  1.3  2.0  1.5  49.2  2.4  16.2  1.0  3.5  19.3 
U.s4b4  0.3  1.1  1.8  1.4  38.3  2.0  10.3  1.1  3.1  12.9 
U.s5b1  4.2  13.6  16.7  12.5  3048.7  17.4  933.2  4.7  32.8  957.3 
U.s5b2  1.7  5.6  7.3  5.5  614.7  7.7  189.6  2.4  13.9  200.2 
U.s5b3  5.0  18.4  26.3  22.2  8750.5  21.1  2316.3  6.8  48.8  2345.6 
U.s5b4  11.3  34.2  36.9  30.1  Fail(T)  40.6  Fail(T)  10.1  81.0   
The LRAT checking performance of cake_lpr is also shown in Table 3, where it is used to check the generated LRAT proofs. Among the LRAT checkers, acl2-lrat is fastest, followed by cake_lpr (LRAT checking) and coq-lrat. Although cake_lpr (LRAT checking) is on average 1.3x slower than acl2-lrat, it scales better on the mchess problems and is actually much faster than acl2-lrat on mchess18. We also observed that the GRAT toolchain (summing SaDiCaL, pr2drat, GRATgen, and GRATchk times) is much slower than the LRAT toolchains (summing SaDiCaL, pr2drat, DRAT-trim, and the fastest LRAT checking times). This is in contrast to the SAT Race 2019 benchmarks below (see Fig. 12), where we observed the opposite relationship. We believe that the difference in checking speed is due to the various checkers having different optimizations for checking the expensive \(\textsf{RAT}\) proof steps produced by conversion from \(\textsf{PR}\) proofs.

6.2 SAT Race 2019 Benchmarks
Results are reported for the problems solved by CaDiCaL. The N/A row counts problems that timed out or failed in an earlier step of the respective toolchains.

Status  DRAT-trim  acl2-lrat  cake_lpr  coq-lrat  GRATgen  GRATchk
Success  97  96  97  36  100  100
Timeout  5  0  0  61  0  0
Other Failures  0  1  0  0  2  0
N/A  0  5  5  5  0  2
DRAT proofs were first generated by CaDiCaL before conversion into the LRAT or GRAT formats. Proofs generated by CaDiCaL on this suite rarely require \(\textsf{RAT}\) (or \(\textsf{PR}\)) steps, so the checkers are stress-tested on their implementation of file I/O, parsing, and reverse unit propagation based on the annotated hints (Step 3.1 from Fig. 3); cake_lpr is the only tool with a formally verified implementation of the former two steps. All tools were run with the SAT competition solver timeout of 5000 seconds.

All proofs generated by CaDiCaL were checked by at least one verified checker. The acl2-lrat checker fails with a parse error on one problem even though none of the other checkers reported such an error; GRATgen aborted on two problems for an unknown reason. Plots comparing LRAT proof checking time and overall proof generation and checking time (LRAT and GRAT) are shown in Fig. 12. From Fig. 12 (top), the relative order of LRAT checking speeds remains the same as in Table 3, where cake_lpr is on average 1.2x slower than acl2-lrat, although cake_lpr is faster on 28 benchmarks. From Fig. 12 (bottom), both LRAT toolchains are slower than the GRAT toolchain (on average 3.5x slower for cake_lpr and 3.4x for acl2-lrat). Part of this speedup for the GRAT toolchain comes from GRATgen, which can be run in parallel (with 8 threads). This suggests that adding native support for GRAT-based input to cake_lpr or adding LRAT support to GRATgen could be worthwhile future extensions.

6.3 Mutilated Chessboard \(\textsf{RAT}\) Microbenchmarks
The proofs in this suite are generated by pgbdd, a BDD-based SAT solver [8]. Unlike the previous benchmark suites, LRAT proofs are emitted directly by the solver, so additional DRAT-trim conversion is not needed. All tools were run with a timeout of 10000 seconds and all timings are reported in seconds (to one decimal place). For additional scaling comparison, we also report results for lrat-check, an unverified LRAT proof checker implemented in C.

The results demonstrate the effectiveness of cake_lpr's \(\textsf{RAT}\) optimizations (Section 5.3). Notably, cake_lpr scales essentially linearly in the size of the proofs (up to \(\approx 10\) million proof steps). As a result, cake_lpr is significantly faster than acl2-lrat and coq-lrat on these \(\textsf{RAT}\)-heavy proofs, and it comes within a 5x factor of the unverified lrat-check tool.

Problem  pgbdd  lrat-check  cake_lpr  acl2-lrat  coq-lrat  LRAT Steps  File Size

mchess20  3.9  0.5  0.5  19.6  3405.2  125752  5 
mchess40  47.5  1.0  3.5  453.4  Fail(T)  769287  36 
mchess60  311.7  2.7  10.6  4885.2  Fail(T)  2300522  114 
mchess80  1164.1  4.8  22.6  Fail(T)  Fail(T)  5089457  259 
mchess100  3599.0  9.3  44.2  Fail(T)  Fail(T)  9506092  499 
6.4 Erdős Discrepancy Properties
We evaluated cake_lpr's compositional proof checking option on a cube-and-conquer proof generated by iGlucose, a version of Glucose 3.0 with support for iCNF files, and the technique reported in Section 4.2. The technique yields a 5226-line compositional top-level proof where each line is justified by an underlying LRAT proof; see Fig. 8. The underlying proofs consist of 20 million clause addition lines in total and they vary widely in size, ranging from 88 bytes to 110 MB.

The first thread uses cake_lpr to check the underlying proofs for ranges \((i_0,i_1), (i_2,i_3), \dots \), while the second thread checks all even-numbered proof steps for ranges \((i_1,i_2), (i_3,i_4), \dots \), and similarly for more parallel threads. Results are summarized in Table 6 with wall-clock execution times reported in seconds and relative speedup against a single thread. All values are rounded to one decimal place.

Threads  cake_lpr  Speedup

1  3871.7   
2  1985.4  2.0 
4  998.2  3.9 
8  517.4  7.5 
16  273.2  14.2 
32  136.4  28.4 
64  76.5  50.6 
128  60.2  64.3 
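The round-robin division of ranges among parallel workers described above can be sketched as follows. This is illustrative Python: check_range is a hypothetical stand-in for invoking cake_lpr with option 4, and the thread pool decides the actual interleaving of work.

```python
from concurrent.futures import ThreadPoolExecutor

def check_range(lo, hi):
    # Stand-in for: cake_lpr <DIMACS> <COMP> lo-hi <LPR>
    return f"s VERIFIED RANGE ... {lo}-{hi}"

def check_parallel(splits, num_threads):
    """Check all ranges (splits[j], splits[j+1]) across a pool of threads.

    `splits` is the index list i_0, i_1, ..., i_k covering the top-level
    proof; pool.map preserves the order of results."""
    ranges = list(zip(splits, splits[1:]))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda r: check_range(*r), ranges))

outs = check_parallel([0, 2, 4, 8, 12], num_threads=2)
```

The collected output lines would then be concatenated and passed to a final option-5-style check of hashes and range coverage.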
7 Related Work
Most existing toolchains preprocess proofs with DRAT-trim or GRATgen. Alternative preprocessing tools are available, e.g., based on the recently proposed FRAT format [4]. The DRAT format is itself an extension of the DRUP format [21]; the Coq checker is based on a predecessor verified checker for the GRIT [11] format. The ACL2 checker can be efficiently and directly executed (without extraction) using imperative primitives native to the ACL2 kernel [20]. However, the implementation of these features in ACL2 itself must be trusted to trust the proof checking results [50], hence the yellow background in Table 1. SMTCoq [2, 14] is another certificate-based checker for SAT and SMT problems in Coq. Its resolution-based proof certificates can be checked natively using native computation extensions of the Coq kernel.

The cake_lpr checker is designed to handle the latter two challenges effectively. For the first challenge, the SAT encoding of mathematical problems can also be verified within proof assistants. This was done for the Boolean Pythagorean Triples problem, building on the Coq proof checker [12].

8 Conclusion
We presented cake_lpr, a performant proof checker supporting both formats down to its machine-code implementation. Given the strength of the \(\textsf{PR}\) proof system, there is ongoing research into the design of satisfaction-driven clause learning techniques [27, 28, 49] for SAT solvers based on \(\textsf{PR}\) clauses. Our proof checker opens up the possibility of using a verified checker to help check and debug the implementation of these new techniques. It also gives future SAT competitions the option of providing \(\textsf{PR}\) as the default (verified) proof system for participating solvers. Another interesting direction is to add cake_lpr support for other proof formats [5, 39]. In particular, this would allow users to build compositional proofs that utilize different underlying proof formats for separate parts of the proof.

Acknowledgements
GRATgen and Stefan O'Rear for help with profiling CakeML programs.