1 Prologue: Larry Wos, A Recollection
My first encounter with Larry Wos dates to when I was a PhD student at the State University of New York at Stony Brook and Larry was the head of the theorem-proving group at the Argonne National Laboratory. I had submitted to CADE (the international Conference on Automated Deduction) a paper, co-authored with Siva Anantharaman of the Université d’Orléans, about proving mechanically with the theorem prover SBR3 a theorem in Łukasiewicz many-valued logic [2].
Łukasiewicz had conjectured that a set of five axioms, together with modus ponens, constitutes an axiomatization of the many-valued logic [171] that would later be named after him. The conjecture had been proved first by Wajsberg, and then independently by Rose and Rosser [153] and by Chang [62, 64]. Then Meredith [129] and Chang [63], also independently, derived the fifth axiom from the other four. The latter problem had been brought to my attention by Daniele Mundici of the Università degli Studi di Milano, as a challenge for automated theorem provers. My CADE submission with Siva presented a mechanical proof of the dependency of the fifth axiom in an equivalent equational formulation in Wajsberg algebras, a class of algebras connected to Łukasiewicz many-valued logic [85, 152]. The submission was rejected, and a referee report said that the reason for rejection was that they had given the problem to Otter, the theorem prover developed at Argonne, and Otter also had succeeded. Siva and I continued the investigation of many-valued logic and Wajsberg algebras, proving other problems with SBR3 [3].
Several months after the CADE rejection, I read in the Newsletter of the Association for Automated Reasoning (AAR) an article by Larry Wos [182], presenting the dependency of the fifth axiom, in both formulations, as a challenge for theorem provers, and citing my unpublished work on the topic. After consulting with my advisor Jieh Hsiang, I wrote an email to Larry. To my great surprise, Larry called me on the phone, and we talked for quite a while. Larry encouraged me to send to the AAR Newsletter my own presentation of both the original and the equational formulations of the problem, which appeared several months later [41]. Łukasiewicz many-valued logic became a source of challenge problems for theorem provers at Argonne. Larry and I became friends with that phone call and remained friends ever since.
When Bill McCune served on my PhD defense committee, he invited me to visit Argonne during the interval between my departure from Stony Brook and the start of a postdoc at INRIA Lorraine. During that visit I worked with Bill, but I often had the chance to go for lunch or otherwise meet informally with Larry. Larry talked fondly and often of his wife Nancy, who, like him, could not see. Larry loved to make fun of people around him, but would never make fun of Bill. I thought that Larry had so much respect for Bill, as the author of Otter, that he would not tease him. It is also possible that Larry knew that he would not get much satisfaction out of Bill’s reserved personality. At times I was a bit worried that Larry would tease me, but that never happened. Perhaps Larry guessed that I would have neither reacted nor let my feelings show, and hence there was not much fun in trying, or else he respected me for my conduct in relation to the events around the Łukasiewicz problem.
A few years later, when I was on the faculty of the University of Iowa, Larry invited me to succeed Bob Veroff as Secretary of the AAR, a job that I was glad to accept. The following summer I returned to Argonne to discuss research with Bill, but unfortunately I did not get to meet with Larry. I was delighted when Larry resumed calling me on the phone to discuss AAR matters, and all the more so because that was after I had moved to the Università degli Studi di Verona: a different time zone was obviously no deterrent for Larry when he wanted to talk on the phone. During one of these conversations he once told me: “Don’t you ever leave CADE and the AAR: we can’t do it without you.” I knew it could not be true, but it is an eloquent example of Larry’s style in letting people know how much they were appreciated.
My last interaction with Larry was when the late Mark Stickel and I took the initiative of editing a volume in memory of Bill McCune: Mark and I were thrilled that Larry contributed the opening chapter in the volume [183]. While Larry’s preference for the telephone over email was most likely due to the fact that he could not see, I think that it was also a kind of wisdom, as email lacks the information that the human voice can convey.
2 Introduction
Larry Wos broke new ground in the design of both fundamental components of a theorem-proving strategy or proof procedure, namely the inference system and the search plan. His seminal contributions include the set-of-support strategy for resolution-based theorem proving [187], and the demodulation [188] and paramodulation [147] inference rules for equational reasoning. Each of them had a profound impact on theorem proving.
The idea of set of support was a first step in controlling resolution [149] with semantic knowledge (i.e., the knowledge that a part of the problem is satisfiable), opening the way to semantic or semantically-guided, supported, and goal-sensitive strategies (e.g., [17, 59, 60, 140, 142, 161, 162]). The set-of-support strategy is also at the origin of the given-clause algorithm implemented first in Otter (e.g., [127, 128]) and then in many theorem provers (e.g., [66, 96, 110, 146, 157, 159, 170, 175, 180]).
With demodulation, Larry Wos posed the problem of well-founded replacement of equals by equals in theorem proving. With the concept of paramodulation, he challenged the field with the Wos-Robinson conjecture on the refutational completeness of paramodulation without paramodulating into variables and without functionally reflexive axioms. The successful solution of these problems involved decades of research, leading to a merger of resolution-based and completion-based [6-9, 54, 98, 101, 107] theorem proving that shaped the field of theorem proving for first-order logic with equality. The resulting inference systems combine resolution, paramodulation, superposition, well-founded demodulation, and subsumption (e.g., [10, 99, 133, 154, 155]). These inference systems have been called completion-based, rewrite-based, saturation-based, or ordering-based [45], given the key role played by well-founded orderings on terms, literals, and clauses.
Larry was interested mainly in devising inference rules and search plans, and refining them through experiments with the Argonne provers. Accordingly, this survey focuses on the history of inference rules and search plans, and covers neither that of fundamental concepts in theorem proving, such as completeness, fairness, saturation, redundancy, and canonicity, nor that of completeness proof techniques, such as semantic trees, proof orderings, transfinite semantic trees, and rewrite models.
Furthermore, this survey is about Larry Wos’ contributions, and hence it considers resolution-based theorem proving and its extensions to equality reasoning leading to the above-mentioned ordering-based inference systems. There are several other fundamental principles for theorem proving in first-order logic, including model elimination [118, 119], matings [4] or connections [33], all three formalized as tableaux-based strategies (e.g., [18, 93, 114, 116]), instance-based methods (e.g., [15, 38, 88, 112]), tableaux with instance generation (e.g., [14, 16, 20]), model evolution [21, 22], and SGGS (Semantically-Guided Goal-Sensitive reasoning), which is model-based, instance-based, and conflict-driven [59, 60], and the list is certainly incomplete.
The interested reader can find complementary material in books (e.g., [34, 35, 65, 113, 120, 141, 163]), surveys about theorem proving in general (e.g., [37, 45, 48, 117, 136, 139]), surveys about resolution, rewriting, and equational reasoning (e.g., [11, 12, 76, 78, 83, 130, 137]), and surveys of tableaux-based strategies (e.g., [18, 93, 114, 116]), instance-based strategies (e.g., [103, 108]), model-based methods [51], and conflict-driven methods [47]. This article has historic contents, but given its focus on one scientist, Larry Wos, it cannot be a well-rounded account of the early history of theorem proving. Sources dedicated to the early history of the field (e.g., [36, 68, 185]) and to its roots in the general history of mathematics and computer science (e.g., [69]) are available.
This article is organized as follows. Section 3 introduces the theorem-proving problem, reconstructing the state of the art prior to Larry Wos’ work, and outlining some of his ideas about the properties that inference rules, search plans, and proofs ought to have. Sections 4, 5, and 6 are devoted to the set-of-support strategy, demodulation, and paramodulation, respectively. For each of them, the main phases of their evolution from Larry’s time to their standardization are outlined, showing the impact of Larry’s ideas. Section 7 discusses a selection of subsequent research directions whose origins can be connected to Larry’s work.
3 Setting the Stage: Resolution-Based Theorem Proving
The theorem-proving problem is about determining whether a formula \(\varphi \) is a logical consequence of a set of formulae H, written \(H \models ^? \varphi \), or, equivalently, whether the formula \(H\supset \varphi \) is valid, written \(\models ^? H\supset \varphi \). Mechanical theorem proving approaches this problem refutationally, by trying to determine whether \(H \cup \{\lnot \varphi \}\) is unsatisfiable, and in clausal form, by turning \(H \cup \{\lnot \varphi \}\) into an equisatisfiable set S of clauses. A clause is a disjunction of literals with variables implicitly universally quantified. A set of clauses is understood logically as their conjunction, where each clause has its own variables. A clause is a unit clause, if it contains exactly one literal; it is a positive clause, if all its literals are positive; it is a negative clause, if all its literals are negative; it is a Horn clause, if it contains at most one positive literal. An ordering is a binary relation that is irreflexive and transitive. A quasi-ordering is a binary relation that is reflexive and transitive. An ordering is well-founded, if it admits no infinite descending chain.
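These clause classes can be checked mechanically. The following sketch (not from the article; literals are plain strings with a hypothetical `~` prefix marking negation) classifies a clause according to the definitions above:

```python
def is_positive(lit):
    """A literal is positive when it does not carry the negation prefix."""
    return not lit.startswith("~")

def classify(clause):
    """Classify a clause, given as a list of literal strings."""
    pos = sum(1 for lit in clause if is_positive(lit))
    return {
        "unit": len(clause) == 1,        # exactly one literal
        "positive": pos == len(clause),  # all literals positive
        "negative": pos == 0,            # all literals negative
        "horn": pos <= 1,                # at most one positive literal
    }
```

For instance, classify(["~P", "~Q", "R"]) reports a Horn clause that is neither unit, nor positive, nor negative.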
3.1 Expansion and Contraction Inference Rules
In the context of refutational clausal theorem proving, an inference system \(\varGamma \) is a collection of inference rules that transform sets of clauses. In resolution-based theorem proving the most important such rule is the binary resolution inference rule, which combines unification with resolving upon literals of opposite sign [149]. According to [36], the idea of propositional resolution appeared as early as 1937 [39], was rediscovered in 1955 [132], and applied to theorem proving in the Davis-Putnam procedure [70] as well as in another procedure [82]. The basic idea of unification already appeared in the work of Herbrand [95] and Prawitz [143]. Nonetheless, it was Alan Robinson who understood how to merge these two ideas in the resolution principle for first-order theorem proving [149]. Binary resolution generates from two clauses, termed parents, a new clause, termed binary resolvent, and adds it to the set, where \(L_1\) and \(L_2\) are the literals resolved upon that unify and have opposite sign, and C and D are disjunctions of literals. Here, and in the sequel, unifiers are most general unifiers, abbreviated mgu’s. Binary resolution is accompanied by factoring, which generates from a clause, termed parent, a new clause, termed factor, and adds it to the set [149]. In the original presentation [149], binary resolution and factoring were integrated in the resolution inference rule, so that a resolvent of two clauses C and D is either a binary resolvent of C and D, or a binary resolvent of a factor of C and D, or a binary resolvent of C and a factor of D, or a binary resolvent of a factor of C and a factor of D [65]. Thus, factoring only needs to be applied to resolution parents. Larry Wos advocated considering binary resolution and factoring as distinct inference rules (e.g., [187]), so that binary resolution can be called simply resolution, as we will do from now on.
$$\begin{aligned} Binary\ Resolution :\ \ \ \begin{array}{c}{S\cup \{L_1 \vee C,\ L_2 \vee D\}}\\ \hline {S\cup \{L_1 \vee C,\ L_2 \vee D,\ (C \vee D)\sigma \}}\end{array}\ \ \ L_1\sigma = \lnot L_2\sigma , \end{aligned}$$
$$\begin{aligned} Factoring :\ \ \ \begin{array}{c}{S\cup \{L_1 \vee \ldots \vee L_k \vee C\}}\\ \hline {S\cup \{L_1 \vee \ldots \vee L_k \vee C,\ (L_1 \vee C)\sigma \}}\end{array} \ \ \ L_1\sigma = L_2\sigma = \ldots = L_k\sigma . \end{aligned}$$
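A minimal propositional rendering of binary resolution may help fix ideas (a sketch, not from the article; clauses are frozensets of literal strings with `~` marking negation, so unification degenerates to complement matching, and the merging effect of factoring is built into the set representation):

```python
def negate(lit):
    """Complement of a propositional literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def binary_resolvents(c, d):
    """All binary resolvents of clauses c and d (frozensets of literals):
    for each pair of complementary literals, drop them and merge the rest."""
    out = []
    for lit in c:
        if negate(lit) in d:
            out.append(frozenset((c - {lit}) | (d - {negate(lit)})))
    return out
```

For example, binary_resolvents(frozenset({"P", "Q"}), frozenset({"~P", "R"})) yields the single resolvent {"Q", "R"}. In the first-order case the complement test is replaced by unification, and the mgu is applied to the merged literals.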
If a parent is a unit clause, resolution is unit resolution, also a feature of the Davis-Putnam procedure [70]. Unit resolution is advantageous, because the resolvent is one literal shorter than the non-unit parent. In general, the resolvent inherits all the literals of its parents, except the two literals resolved upon, so that inferred clauses grow longer and longer, and hence more expensive to process. This well-known disadvantage of resolution was later studied as duplication by combination [138, 141].
Resolution and factoring are expansion inference rules, as they fit in the expansion inference scheme, where \(S\subset S^\prime \) says that the existing set S of clauses is expanded by adding some clause. Symmetrically, contraction inference rules are rules that contract the set of clauses, because they delete clauses or replace them by smaller ones according to the contraction inference scheme, where \(S\not \subseteq S^\prime \) says that something has been deleted, and \(S^\prime \mathop {{\prec }}_{mul} S\) says that \(S^\prime \) is smaller than S in the multiset extension [77] of a well-founded ordering \(\prec \) on clauses. The multiset extension of a well-founded ordering is also well-founded [77]. The double inference line [49] emphasizes the diversity of contraction with respect to the traditional notion of inference in logic (e.g., natural deduction). Contraction rules that only delete clauses are also called deletion rules, whereas contraction rules that delete clauses and replace them by smaller ones are also called replacement rules. Expansion rules and replacement rules together are called generative inference rules [57], because they are those that generate clauses.
$$\begin{aligned} Expansion :\ \ \ \begin{array}{c}{S}\\ \hline {S^\prime }\end{array} \ \ \ \begin{array}{c} S\subset S^\prime , \end{array} \end{aligned}$$
$$\begin{aligned} Contraction :\ \ \ \begin{array}{c}{S}\\ \hline \hline {S^\prime }\end{array} \ \ \ \begin{array}{ccc} S\not \subseteq S^\prime ,&\,&S^\prime \mathop {{\prec }}_{mul} S,\end{array} \end{aligned}$$
Assuming \(Th(S) = \{C \mathop {\,:\,}S \models C\}\), a generative inference rule is sound, if \(S^\prime \subseteq Th(S)\): whatever is added is a logical consequence of what preexisted. A contraction rule is adequate [49], if \(S\subseteq Th(S^\prime )\): whatever is deleted is a logical consequence of what remains. Adequacy implies monotonicity [53]: a contraction rule is monotonic if \(Th(S)\subseteq Th(S^\prime )\). Soundness and adequacy together imply \(Th(S) = Th(S^\prime )\). An inference system \(\varGamma \) is sound and adequate if its rules are.
The contraction inference rules for resolution-based theorem proving that were known prior to Larry Wos’ contributions are tautology deletion, purity deletion, and subsumption, all three deletion rules. Tautology deletion appeared in the Davis-Putnam procedure [70]. Purity deletion appeared first for propositional logic in the Davis-Putnam procedure [70], and, according to [36], in the procedure in [81]. It was generalized to first-order logic by integrating it with unification [149], where literal \(L_1\) is pure in \(S\cup \{L_1 \vee C\}\) if S contains no clause \(L_2 \vee D\) such that \(L_1\sigma = \lnot L_2\sigma \). Subsumption also appeared in [149], where \(\sigma \) is a matching substitution, \(\subseteq \) is the subset relation between clauses viewed as sets of literals, and \(\vert C \vert \) is the number of literals in clause C. The original definition [149] did not require that \(\vert C \vert \le \vert D \vert \), because factoring was integrated into resolution. If resolution and factoring are treated as separate inference rules, either the \(\vert C \vert \le \vert D \vert \) condition must be added, or clauses must be treated as multisets of literals, in order to prevent a clause from subsuming its factors; in the latter formulation, \(\subseteq \) is the subset relation between clauses viewed as multisets of literals. From now on, clauses are considered as multisets of literals. On the other hand, a factor can subsume its parent, and the combination of factoring and subsumption, where a factor is generated and then subsumes its parent, is known as condensation [105, 127]. If C is a unit clause, subsumption is called unit subsumption.
$$\begin{aligned} Tautology\ Deletion :\ \ \ \begin{array}{c}{S\cup \{L \vee \lnot L \vee C\}}\\ \hline \hline {S}\end{array} \end{aligned}$$
$$\begin{aligned} Purity\ Deletion :\ \ \ \begin{array}{c}{S\cup \{L_1 \vee C\}}\\ \hline \hline {S}\end{array} \ \ \ if \ L_1\ is\ pure\ in \ S\cup \{L_1 \vee C\}, \end{aligned}$$
$$\begin{aligned} Subsumption :\ \ \ \begin{array}{c}{S\cup \{C,\ D\}}\\ \hline \hline {S\cup \{C\}}\end{array} \ \ \ \begin{array}{ccc} C\sigma \subseteq D,&\,&\vert C \vert \le \vert D \vert ,\end{array} \end{aligned}$$
$$\begin{aligned} Subsumption :\ \ \ \begin{array}{c}{S\cup \{C,\ D\}}\\ \hline \hline {S\cup \{C\}}\end{array}\ \ \ C\sigma \subseteq D, \end{aligned}$$
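For ground clauses, the multiset reading of subsumption can be sketched directly (not from the article; in the general case one must first search for a matching substitution \(\sigma \)):

```python
from collections import Counter

def subsumes(c, d):
    """Ground multiset subsumption: C subsumes D if every literal of C
    occurs in D at least as many times as it occurs in C (hence |C| <= |D|)."""
    need, have = Counter(c), Counter(d)
    return all(have[lit] >= n for lit, n in need.items())
```

With clauses as literal lists, subsumes(["P"], ["P", "Q"]) holds, while subsumes(["P", "P"], ["P"]) fails; under the set reading the latter would succeed, which is exactly how a clause could subsume its own factor.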
Subsumption and unit resolution can be combined into a replacement rule named Clausal Simplification [155], where the resolvent C, produced by unit resolution of \(L_1\) and \(L_2\vee C\), subsumes its non-unit parent \(L_2\vee C\), because \(\sigma \) is a matching substitution that does not instantiate the variables in the literals of C.
$$\begin{aligned} Clausal\ Simplification :\ \ \ \begin{array}{c}{S\cup \{L_1,\ L_2\vee C\}}\\ \hline \hline {S\cup \{L_1,\ C\}}\end{array}\ \ \ L_1\sigma = \lnot L_2, \end{aligned}$$
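In the ground case the rule reduces to deleting the complement of the unit literal (a sketch, not from the article; first-order clausal simplification requires \(L_1\sigma = \lnot L_2\) for a matching substitution \(\sigma \)):

```python
def negate(lit):
    """Complement of a propositional literal, with "~" marking negation."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def clausal_simplify(unit, clause):
    """Simplify `clause` (a list of literals) by the unit clause `unit`:
    every literal complementary to `unit` is resolved away."""
    comp = negate(unit)
    return [lit for lit in clause if lit != comp]
```

For example, clausal_simplify("P", ["~P", "Q"]) returns ["Q"]: the unit P resolves ~P away, and the result subsumes the parent ~P ∨ Q.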
Contraction inference rules use matching, whereas expansion inference rules use unification. When contraction is applied to delete or simplify a newly generated clause with respect to previously existing clauses, it is called forward contraction. When contraction is applied to delete or simplify previously existing clauses by a newly generated clause, it is called backward contraction. Showing his appreciation of the importance of contraction, Larry Wos wrote that subsumption should have been considered even more important than resolution among Alan Robinson’s contributions [186].
3.2 Derivations and Refutational Completeness
Given input set S of clauses and inference system \(\varGamma \), a derivation by \(\varGamma \), or \(\varGamma \)-derivation, is a sequence of the form \(S_0 \mathop {\vdash }_{\varGamma } S_1 \mathop {\vdash }_{\varGamma } \ldots S_i \mathop {\vdash }_{\varGamma } S_{i+1} \mathop {\vdash }_{\varGamma } \ldots \), where \(S_0 = S\), and \(\forall i, i\ge 0\), \(S_{i+1}\) is derived by applying to \(S_i\) an inference rule of \(\varGamma \). An inference system discovers that S is unsatisfiable by showing that S is inconsistent, that is, by deriving from S a contradiction, represented in clausal form by the empty clause, denoted \(\Box \). A derivation with input S is a refutation of S, if there exists a k, \(k\ge 0\), such that \(\Box \in S_k\). An inference system \(\varGamma \) is refutationally complete, if for all unsatisfiable clause sets S given as input there exists at least one \(\varGamma \)-derivation that is a refutation of S. A derivation is characterized by the set \(S^* = \bigcup _{i\ge 0} S_i\) of all input or generated clauses and the set \(S_\infty = \bigcup _{j\ge 0} \bigcap _{i\ge j} S_i\) of all persistent clauses. The latter set is called the limit of the derivation.
The inference system with resolution and factoring as expansion inference rules, and tautology deletion, purity deletion, subsumption, and clausal simplification as contraction inference rules, is sound and adequate, and it is refutationally complete [149], provided forward subsumption is applied before backward subsumption [53, 111, 120]. As remarked in [120], the reason for this proviso is that the subsumption ordering defined by \(C\mathrel {\dot{\lessdot }}D\) if \(C\sigma \subseteq D\) is not well-founded. More precisely, \(\mathrel {\dot{\lessdot }}\) is not an ordering, it is a quasi-ordering, and the induced equivalence relation, \(C\doteq D\) if \(C\mathrel {\dot{\lessdot }}D\) and \(D\mathrel {\dot{\lessdot }}C\), admits equivalence classes of infinite cardinality. Thus, a derivation can generate an infinite series of equivalent clauses and an infinite series of subsumption steps that preempt the resolution steps leading to a contradiction. A solution is to restrict subsumption as follows (e.g., [155]), where the strict subsumption ordering \(\lessdot \), defined by \(C\lessdot D\) if \(C\mathrel {\dot{\lessdot }}D\) and \(C\not \doteq D\), is well-founded. However, this solution prevents the inference system from subsuming one out of two clauses C and D such that \(C\doteq D\). Such clauses can be similar clauses or variants. They are similar [60], if \(C\sigma \subseteq D\) and \(D\rho \subseteq C\) by substitutions \(\sigma \) and \(\rho \) that replace variables by variables, possibly replacing distinct variables by the same. They are variants, if they are equal up to variable renaming, that is, \(C\sigma \subseteq D\) and \(D\rho \subseteq C\) by substitutions \(\sigma \) and \(\rho \) that are variable renamings, meaning that they replace variables by variables, without replacing distinct variables by the same. Similar clauses have factors that are variants.
$$\begin{aligned} Proper\ Subsumption :\ \ \ \begin{array}{c}{S\cup \{C,\ D\}}\\ \hline \hline {S\cup \{C\}}\end{array}\ \ \ C\lessdot D, \end{aligned}$$
Example 1
Clauses \(C = P(x)\vee P(y)\vee Q(y)\) and \(D = P(w)\vee Q(w)\vee Q(v)\) are similar, as they satisfy \(C\sigma \subseteq D\) with \(\sigma = \{x\leftarrow w, y \leftarrow w\}\) and \(D\rho \subseteq C\) with \(\rho = \{w \leftarrow y, v\leftarrow y\}\). Clauses \(P(x)\vee Q(y)\) and \(P(w)\vee Q(v)\) are variants.
Assume that all clauses in \(S^*\) are given distinct increasing natural numbers as identifiers, as happens in implementations. For clauses C and D such that \(C\doteq D\), the solution is to take the lexicographic combination of \(\lessdot \) and the ordering on the natural numbers and apply it to pairs (C, n), where n is the identifier of clause C. This ordering is well-founded, because a lexicographic combination of well-founded orderings is well-founded. Given pairs (C, n) and (D, m) such that \(C\doteq D\), clause C subsumes D if \(n < m\), that is, if C was generated before D. Applying forward subsumption before backward subsumption implements this concept.
3.3 Search Plans, Fairness, Strategies, and Proof Reconstruction
Given a set S of clauses and an inference system \(\varGamma \), there is in general more than one way to apply rules in \(\varGamma \) to S. This means that \(\varGamma \) is nondeterministic, so that the \(\varGamma \)-derivation with input S is not unique. In order to obtain a deterministic procedure, \(\varGamma \) is coupled with a search plan \(\varSigma \) that chooses at each stage of the derivation which rule to apply to which clauses. A search plan \(\varSigma \) is fair for an inference system \(\varGamma \), if for all input clause sets S such that there exist refutations of S by \(\varGamma \), the \(\varGamma \)-derivation driven by \(\varSigma \) is a refutation of S.
A theorem-proving strategy, or proof procedure, is a pair \({{\mathcal {P}}} = \langle \varGamma , \varSigma \rangle \), where \(\varGamma \) is an inference system and \(\varSigma \) is a search plan. Given input set S of clauses, \({{\mathcal {P}}}\) generates the unique \(\varGamma \)-derivation driven by \(\varSigma \). A theorem-proving strategy \({{\mathcal {P}}} = \langle \varGamma , \varSigma \rangle \) is complete, if for all unsatisfiable input clause sets S the \(\varGamma \)-derivation generated by \(\mathcal{P}\) is a refutation of S. Thus, if \(\varGamma \) is refutationally complete and \(\varSigma \) is fair, \(\mathcal{P} = \langle \varGamma , \varSigma \rangle \) is complete, and it is a semidecision procedure for validity in first-order logic.
As both \(S^*\) and \(S_\infty \) contain many clauses that are unrelated to the generation of \(\Box \), when \(\Box \in S_k\), the strategy reconstructs the generated proof in the form of the ancestor-graph [56] of \(\Box \), denoted \(\varPi (\Box )\). The reconstruction starts from \(\Box \) and proceeds backward until it reaches input clauses, following the applications of generative rules, whereas applications of deletion rules do not appear in proofs. Since a clause may be used as premise more than once, \(\varPi (C)\) is a tree, if different nodes are allowed to have the same clause as label, and a directed acyclic graph otherwise. The ancestor-graph \(\varPi (C)\) is defined for all \(C\in S^*\). If C is an input clause, \(\varPi (C)\) has one node labeled by C itself. If C is generated by a generative rule from premises \(D_1,\ldots ,D_n\), the ancestor-graph \(\varPi (C)\) has root labeled C and subtrees \(\varPi (D_1),\ldots ,\varPi (D_n)\). Since every clause has its own variables, and variants are treated as distinct clauses, no clause is generated twice, and \(\varPi (C)\) is uniquely defined for all \(C\in S^*\).
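Proof reconstruction along these lines can be sketched as a backward traversal of parent links (not from the article; the clause identifiers and the `parents` map are hypothetical bookkeeping that an implementation would maintain for generative inferences):

```python
def ancestor_graph(root, parents):
    """Collect the clauses in the ancestor-graph of `root`.
    `parents` maps each derived clause id to the tuple of ids of its
    premises; input clauses do not appear as keys, so the walk stops there."""
    reached, stack, seen = [], [root], set()
    while stack:
        c = stack.pop()
        if c in seen:      # a clause used as premise more than once
            continue       # is visited once: the result is a DAG
        seen.add(c)
        reached.append(c)
        stack.extend(parents.get(c, ()))
    return reached
```

With parents = {"empty": ("r2", "a1"), "r2": ("r1", "a2"), "r1": ("goal", "a1")}, ancestor_graph("empty", parents) collects exactly the clauses the refutation depends on, leaving every other generated clause out of the proof.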
3.4 The Theorem Proving Challenge
Theorem proving is a challenging problem, because it requires balancing contrasting requirements. The theorem-proving strategy should be complete, but also efficient in the use of time and memory, as Larry Wos emphasized in his seminal papers [147, 187]. Thus, the inference system should be refutationally complete, while featuring powerful contraction rules to counter the combinatorial explosion of expansion inferences. The search plan should be fair, but not exhaustive as in a breadth-first search. Achieving completeness and efficiency simultaneously is so difficult in theorem proving that not only in Larry Wos’ time, but also nowadays, it is standard practice to establish completeness in theory, and then include incomplete strategies in implementations and experiments. Proving the refutational completeness of an inference system is crucial to understanding it, while playing with incomplete strategies allows the experimenter to prove more theorems by machine and may give ideas for new, complete, and more efficient strategies.
For theorem-proving strategies, Larry Wos proposed sensitivity [187], later renamed goal-sensitivity [141]. The transformation of a problem of the form \(H \models ^? \varphi \) into a problem of the form \(S \mathop {\vdash }_{\varGamma }^{?} \Box \), where S is the clausal form of \(H \cup \{\lnot \varphi \}\), loses the information about the distinction between H and \(\varphi \), information that may be useful for the efficiency of the search. A strategy is goal-sensitive, if it generates only, or preferably, clauses C such that at least one leaf of \(\varPi (C)\) is labeled by a clause in the clausal form of \(\lnot \varphi \). Since being goal-sensitive may not be beneficial for all problems, a strategy ought to be flexible with respect to goal-sensitivity [60].
For inference rules, Larry Wos suggested immediacy, convergence, and generality [147]. The first two properties mean that the inference rule generates neither intermediate results nor their consequences, a requirement fulfilled by hyperinferences in Larry’s time, as we shall see in the next section. The combined usage of most general unifiers in expansion inferences, and of a well-founded subsumption ordering for subsumption, fulfilled Larry’s notion of generality, in the sense of avoiding reasoning with instances when it is possible to reason with more general clauses.
For proofs, Larry Wos stressed brevity and naturalness [147, 186, 187]. The quest for shorter proofs was a main driver of Larry’s experimental work with Otter, as described in another article of this issue [25]. Naturalness means that the mechanical proof ought to resemble a human proof. Although this is a recurring concern in the theorem proving literature (e.g., [151]), the development of automated theorem proving has rather led to the discovery of forms of mechanical reasoning that are different from human reasoning.
4 The Set of Support Strategy
The set-of-support strategy was motivated by the objective of reducing irrelevant inferences [187], as advocated also in [67]. Larry Wos did not define formally in [187] the notion of irrelevant inference, but most likely he meant inferences that do not appear in any proof. The set-of-support strategy was inspired by theorem-proving problems \(H \models ^? \varphi \) where H contains the axioms of a mathematical theory. Indeed, mathematics was Larry’s preferred field of application for theorem proving. Since H is known to be satisfiable, generating logical consequences of H alone cannot lead to the discovery of a contradiction. Therefore, the idea of the set-of-support strategy is to forbid resolution inferences where both parents are in the clausal form of H. While the original description and completeness proof of the set-of-support strategy [187] were given for resolution and factoring only, in the next section the set-of-support strategy is presented for an inference system including the contraction rules encountered thus far, which are known to preserve the refutational completeness of resolution.
4.1 The Set of Support Strategy with Contraction
Given the input set S of clauses, obtained by transforming \(H \cup \{\lnot \varphi \}\) into clausal form, the set-of-support strategy partitions S into the set A of the clauses in the clausal form of H, and the set \( SOS \) (acronym of set of support) of the clauses in the clausal form of \(\lnot \varphi \). Therefore, one can write \(S = A \uplus SOS \), where \(\uplus \) denotes the union of disjoint sets, or \(A = S \setminus SOS \), where \(\setminus \) denotes set subtraction. If H is satisfiable, hence consistent, so is A. If \(\varphi \) is an implication \(\psi _1 \supset \psi _2\), so that \(\lnot \varphi \) is \(\psi _1 \wedge \lnot \psi _2\), one can also put in \( SOS \) only the clauses in the clausal form of \(\lnot \psi _2\), leaving in A the clauses in the clausal form of \(H\cup \{\psi _1\}\), provided the resulting A is consistent [188]. Then, only resolution steps with at most one parent from the complement of the set of support are allowed, so that the set-of-support strategy is goal-sensitive. All resolvents are added to the set of support, leading to derivations of the form given below, where \(\forall i, i\ge 0\), \(S_i = A_i \uplus SOS _i\). For the first component, \(A_0 = A\); for the second component, \( SOS _0 = SOS \); and, \(\forall i, i\ge 0\), \(A_{i+1}\) and \( SOS _{i+1}\) are derived from \((A_i; SOS _i)\) by one of the rules listed below. Clauses in \(\bigcup _{i\ge 0} SOS _i\) are said to be supported, and a resolution inference is supported if at least one parent is. In the original presentation of the set-of-support strategy [187], all the factors of clauses in A are added to \(A_0\) in a preprocessing step, so that \(\forall i, i\ge 0\), \(A_i = A_0\), and only the set of support is expanded. Adding contraction while preserving the completeness of the set-of-support strategy requires distinguishing between deletion rules and replacement rules.
The addition of a deletion rule is unproblematic: at any stage of the derivation a clause in either the set of support or its complement can be deleted by the deletion rule. Since a replacement rule is a generative rule, it can be added to the setofsupport strategy only in a way that preserves the consistency of the complement of the set of support: for clausal simplification this means that when applying \(L_1\) to simplify \(L_2 \vee C\) to C, the new clause C can be placed in \(A_{i+1}\) only if both \(L_1\) and \(L_2 \vee C\) are in \(A_i\), and must be placed in \( SOS _{i+1}\) otherwise [55].
$$\begin{aligned} (A_0;{SOS}_0)\mathop {\vdash }_{\varGamma }(A_1;{SOS}_1)\mathop {\vdash }_{\varGamma }\ldots (A_i;{SOS}_i)\mathop {\vdash }_{\varGamma }(A_{i+1};{SOS}_{i+1})\mathop {\vdash }_{\varGamma }\ldots , \end{aligned}$$

Add a factor of a clause in \(A_i\);

Delete a clause in \(A_i\) by tautology deletion or by purity deletion with respect to \(A_i \uplus SOS _i\);

Subsume a clause in \(A_i\) by a clause in \(A_i \uplus SOS _i\);

Apply clausal simplification to simplify a clause in \(A_i\) by a clause in \(A_i\) putting the simplified clause in \(A_{i+1}\).

Add a resolvent of a clause in \(A_i\) and a clause in \( SOS _i\);

Add a resolvent of two clauses in \( SOS _i\);

Add a factor of a clause in \( SOS _i\);

Delete a clause in \( SOS _i\) by tautology deletion or by purity deletion with respect to \(A_i \uplus SOS _i\);

Subsume a clause in \( SOS _i\) by a clause in \(A_i \uplus SOS _i\);

Apply clausal simplification to simplify a clause in \( SOS _i\) by a clause in \(S_i\) putting the simplified clause in \( SOS _{i+1}\);

Apply clausal simplification to simplify a clause in \(A_i\) by a clause in \( SOS _i\) putting the simplified clause in \( SOS _{i+1}\).
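As a toy illustration of the supported restriction (a propositional sketch of ours, not Argonne code; clauses are frozensets of literal strings, with `~` for negation), resolvents must have at least one supported parent and always join the set of support, so A is never expanded on its own:

```python
# Minimal propositional sketch of the set-of-support restriction:
# only inferences with at least one parent in SOS are performed,
# and every resolvent is added to SOS.

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All binary resolvents of two clauses (frozensets of literals)."""
    out = []
    for lit in c1:
        if neg(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {neg(lit)}))
    return out

def sos_refute(A, SOS, max_rounds=10):
    """Saturate under the set-of-support strategy; True iff the empty clause is derived."""
    A, SOS = set(A), set(SOS)
    for _ in range(max_rounds):
        new = set()
        for c1 in SOS:                     # supported parent
            for c2 in A | SOS:
                for r in resolvents(c1, c2):
                    if not r:
                        return True        # empty clause: refutation found
                    if r not in A | SOS:
                        new.add(r)
        if not new:
            return False                   # saturated without contradiction
        SOS |= new                         # all resolvents are supported
    return False

# H = {p, p -> q}, conjecture q; SOS holds the negated conjecture ~q.
A = {frozenset({"p"}), frozenset({"~p", "q"})}
SOS = {frozenset({"~q"})}
print(sos_refute(A, SOS))   # prints True
```

Note that the clauses of A alone are never resolved with one another, which is exactly the invariant the strategy maintains.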
Larry Wos suggested two incomplete refinements of the set-of-support strategy [187]: one is based on a level bound and forbids generating a clause C if the depth of \(\varPi (C)\) is higher than the bound; the other is based on a literal bound and forbids generating a clause C if \(\vert C\vert \) is higher than the bound.
4.2 Other Supported Strategies
Other supported strategies [45, 140, 141] can be obtained by giving different definitions of the initial set of support \( SOS \). In resolution with forward support, \( SOS \) contains the positive input clauses. Thus, A contains the non-positive input clauses and is satisfied by the all-negative interpretation \(\mathcal{I}^-\) that satisfies all negative literals. In resolution with backward support, \( SOS \) contains the negative input clauses. Thus, A contains the non-negative input clauses and is satisfied by the all-positive interpretation \(\mathcal{I}^+\) that satisfies all positive literals. In resolution with user support, \( SOS \) contains any subset of S chosen by the user, provided that its complement A is satisfiable. Larry Wos’ set-of-support strategy is an instance of resolution with user support.
Supported strategies where the initial set of support is defined based on sign are related to sign-based refinements of resolution. Positive resolution, also known as the P1-strategy [99, 148] or P1-deduction [141], requires that every resolution step has a positive parent. Negative resolution, also known as all-negative resolution [141], requires that every resolution step has a negative parent. Positive resolution is more restrictive than resolution with forward support, because the latter also allows resolutions between generated non-positive parents, as long as at least one of them is supported. The same holds for negative resolution and resolution with backward support, except in the special case of Horn clauses, where the two coincide: a resolution between a negative clause and a non-negative Horn clause generates a negative clause, so that only negative clauses are supported.
4.3 Semantic Strategies and Hyperinferences
The concept of not expanding a satisfiable subset of the set of clauses connects the set-of-support strategy with semantic resolution [161]. Semantic resolution restricts resolution by assuming a fixed interpretation \(\mathcal{I}\) for semantic guidance. The input set S is partitioned into the subset \(A = \{C \mathop {\,:\,}\mathcal{I}\models C\}\) of clauses satisfied by \(\mathcal{I}\), and its complement \( SOS = S\setminus A\), called \( SOS \) by analogy with the set-of-support strategy. However, semantic resolution moves the restriction from the parents (i.e., at most one from A) to the resolvent, by requiring that no resolvent C such that \(\mathcal{I}\models C\) is generated. Given a clause \(N = L_1 \vee \ldots \vee L_k\vee C\), termed nucleus, and k clauses \(E_1 = M_1 \vee D_1,\ldots ,E_k = M_k \vee D_k\), termed electrons or satellites, where C and \(D_i\), for \(i = 1\ldots k\), are disjunctions of literals, if there is a simultaneous mgu \(\sigma \) such that \(L_i\sigma = \lnot M_i\sigma \) for \(i = 1\ldots k\), semantic resolution generates the semantic resolvent \(R = (C \vee D_1 \vee \ldots \vee D_k)\sigma \):
$$\begin{aligned} Semantic\ Resolution :\ \ \ \begin{array}{c}{S\cup \{N,\ E_1,\ldots ,E_k\}}\\ \hline {S\cup \{N,\ E_1,\ldots ,E_k,\ R\}}\end{array} \ \ \ \mathcal{I}\not \models R, \end{aligned}$$
provided that \(\mathcal{I}\not \models R\). Since it embeds multiple resolution steps, semantic resolution is a hyperinference, and it fulfills Larry’s desiderata of immediacy and convergence [147], because the intermediate resolvents are not generated. Hyperresolution [148] is an instance of semantic resolution. If \(\mathcal{I}\) is \(\mathcal{I}^-\), semantic resolution yields positive hyperresolution, which resolves away all negative literals in the nucleus with positive satellites to generate a positive hyperresolvent. If \(\mathcal{I}\) is \(\mathcal{I}^+\), semantic resolution yields negative hyperresolution, which resolves away all positive literals in the nucleus with negative satellites to generate a negative hyperresolvent.
Resolution with set of support [187] does not work by hyperinferences, but it fits into the paradigm of semantic resolution, assuming an ad hoc interpretation \(\mathcal{I}\) such that \(\mathcal{I}\models A\) and \(\mathcal{I}\not \models SOS \).
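The semantic restriction can be illustrated with a small propositional sketch of ours (the interpretation is encoded simply as the set of atoms it makes true): every candidate resolvent satisfied by \(\mathcal{I}\) is discarded, and with the all-negative interpretation only positive clauses survive, which is the flavor of positive hyperresolution.

```python
# Toy propositional illustration of semantic guidance: fix an
# interpretation I and discard any resolvent that I satisfies.

def satisfies(I, clause):
    """I is the set of true atoms; a literal '~p' is true iff p not in I."""
    return any((lit[1:] not in I) if lit.startswith("~") else (lit in I)
               for lit in clause)

def semantic_filter(I, candidates):
    """Keep only the resolvents not satisfied by I."""
    return [r for r in candidates if not satisfies(I, r)]

I = set()  # all-negative interpretation: every atom is false
candidates = [frozenset({"q"}),           # positive clause: kept
              frozenset({"~p", "q"}),     # satisfied by I: discarded
              frozenset({"~r"})]          # satisfied by I: discarded
kept = semantic_filter(I, candidates)
print(kept == [frozenset({"q"})])   # prints True
```

The empty clause is satisfied by no interpretation, so it always passes the filter, as required for refutation completeness.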
Larry Wos recognized that semantic resolution is more restrictive than resolution with set of support, and that completeness of the latter can be derived from completeness of the former [188], but he was mostly concerned with the risk that neither hyperresolution nor resolution with set of support suffice in practice [188]. He was interested in enlarging what he called the unit sections [188] of a derivation, that is, the stretches of a derivation where the resolution steps are unit resolution steps. He had already proposed the unit-preference strategy [184], where unit resolution steps have priority over other resolution steps. The next move was to devise an inference rule to generate unit clauses. To this end, Larry Wos applied the hyperinference concept towards a syntactic property (i.e., being a unit clause) rather than a semantic one. The result was unit-resulting resolution [125, 188], or UR resolution for short. UR resolution is a hyperinference geared to generate unit clauses. Given a nucleus \(N = L_1 \vee \ldots \vee L_k \vee L_{k+1}\) with \(k+1\) literals, and k unit satellites \(M_1,\ldots , M_k\) (\(k\ge 1\)), UR resolution generates a unit resolvent:
$$\begin{aligned} UR\ Resolution :\ \ \ \begin{array}{c}{S\cup \{N,\ M_1,\ldots ,M_k\}}\\ \hline {S\cup \{N,\ M_1,\ldots ,M_k,\ L_{k+1}\sigma \}}\end{array} \ \ \ \forall i,\ 1{\le }i{\le }k,\ L_i\sigma = \lnot M_i\sigma . \end{aligned}$$
If literal \(L_{k+1}\) is allowed to be absent, UR resolution is allowed to generate \(\Box \). This inference rule appears at the bottom of page 702 and is the main object of the first definition on page 703 of [188]. The name unit-resulting resolution appeared only much later [125], leading to the erroneous belief (e.g., [36]) that UR resolution appeared for the first time in [125]. In reality, and not surprisingly, UR resolution appeared in the same “milieu” as hyperresolution. This is also attested by the footnote on page 702 of [188], which relates the concept of UR resolution to that of clash in [150]. The term “clash” refers to the simultaneous resolution of multiple literals as in hyperresolution and semantic resolution (e.g., it is used systematically to present hyperinference rules in [65]).
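A ground, unification-free sketch of ours may clarify the hyperinference: k unit satellites resolve away k literals of a (k+1)-literal nucleus in one step, leaving a unit resolvent (real provers compute a simultaneous mgu instead of testing literal equality).

```python
# Ground sketch of UR resolution: clauses are frozensets of literal
# strings, unit clauses are single literals, '~' is negation.

from itertools import combinations

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def ur_resolvents(nucleus, units):
    """Unit resolvents of a nucleus against a set of unit clauses."""
    out = set()
    k = len(nucleus) - 1
    for removed in combinations(nucleus, k):
        # every removed literal must clash with an available unit clause
        if all(neg(lit) in units for lit in removed):
            (rest,) = nucleus - set(removed)   # the single surviving literal
            out.add(rest)
    return out

nucleus = frozenset({"~p", "~q", "r"})   # encodes p & q -> r
units = {"p", "q"}
print(ur_resolvents(nucleus, units))     # prints {'r'}
```

The intermediate binary resolvents (here, the two-literal clause obtained from resolving only against p) are never generated, matching the immediacy and convergence desiderata mentioned above.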
According to [36], UR resolution was invented independently by Gerd Veenker in his PhD thesis in 1966, and published the following year [173], the same year as [188]. While the main contribution of Veenker’s thesis was a complete procedure that can be considered an early forerunner of connection-based methods [4, 33, 34], Veenker also proposed a strategy, which he called the NEU strategy, combining unit resolution and UR resolution as in Wos’ work [188]. An inference system including only unit resolution and UR resolution is incomplete, something that was well known to both Wos and Veenker. Nonetheless, UR resolution is widely adopted as a useful enhancement, because it accelerates the generation of unit clauses, which in turn trigger unit subsumption inferences that eliminate clauses and unit resolution inferences that generate shorter resolvents.
4.4 The GivenClause Algorithm
The set-of-support strategy is also at the origin of the main algorithm inside most resolution-based theorem provers, up to those that represent the state of the art today (e.g., the E prover [157, 159], Spass [180], Vampire [110, 146], Waldmeister [96], Zipperposition [66], and GKC [170]). The reason is that the set-of-support strategy was built into Argonne’s provers AURA, LMA/ITP [122], and Otter [127, 128], and Otter’s main algorithm, called the given-clause algorithm, inspired most subsequent developers.
The given-clause algorithm maintains two lists of clauses, originally named axioms and sos. If axioms and sos are initialized with the clauses in A and \( SOS \), respectively, the given-clause algorithm implements the set-of-support strategy, and it satisfies the invariant that no expansion inference whose premises are all in the initial axioms will ever be performed. If axioms is initialized to be empty, and sos is initialized to contain all input clauses, the given-clause algorithm performs all possible inferences, and it satisfies the above invariant vacuously. Thus, the connection between the given-clause algorithm and the set-of-support strategy was weakened by renaming axioms as usable in Otter and its successor Prover9 [126].
The given-clause algorithm executes a loop, exiting when either a proof is found, or sos becomes empty, which means that the input set of clauses is satisfiable, or the prover hits a predefined threshold of time or memory. At every iteration, the prover selects from sos a clause, termed the given clause, moves it from sos to usable, and performs all applicable expansion inferences having as premises the given clause and clauses in usable. The fact that the given clause moves from sos to usable means that even if usable and sos initially contain the clauses in A and \( SOS \), respectively, the given-clause algorithm does not maintain the invariant that the clauses in sos are supported and those in usable are not, another reason for departing from the names axioms and sos.
If the given clause is the best clause according to some heuristic evaluation function, the given-clause algorithm performs a best-first search. For example, the notion of weight of a clause, defined as the sum of the user-defined weights of the symbols occurring in the clause, was introduced for this purpose in Otter [127, 128]. Another feature of Otter that became a fixture of the given-clause algorithm in most provers (e.g., [160]) is the pick-given-ratio parameter, which allows the strategy to mix best-first and breadth-first search. If the value of this parameter is k, the given-clause algorithm picks as given clause the oldest rather than the best clause once every \(k+1\) choices. The description of the given-clause algorithm will be extended to include contraction after introducing demodulation.
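The loop just described can be sketched schematically (a toy version of ours, not Otter’s code: the weight heuristic, the clause encoding, and the resolution-based expand function are all simplifications for illustration):

```python
# Schematic given-clause loop with a pick-given-ratio mix of
# best-first (lightest clause) and breadth-first (oldest clause) search.

from collections import deque

def weight(clause):
    # toy weight heuristic: total number of characters over all literals
    return sum(len(lit) for lit in clause)

def given_clause_loop(sos_init, usable_init, expand, is_empty, ratio=4,
                      max_iterations=10_000):
    """expand(given, usable) returns the newly generated clauses."""
    usable = list(usable_init)
    sos = deque(sos_init)                    # arrival order, for age-based picks
    seen = set(usable) | set(sos)
    picks = 0
    while sos:
        if max_iterations <= 0:
            return "timeout"                 # resource threshold reached
        max_iterations -= 1
        if ratio and picks % (ratio + 1) == ratio:
            given = sos.popleft()            # breadth-first: oldest clause
        else:
            given = min(sos, key=weight)     # best-first: lightest clause
            sos.remove(given)
        picks += 1
        usable.append(given)                 # the given clause changes list
        for new in expand(given, usable):
            if is_empty(new):
                return "proof"               # empty clause derived
            if new not in seen:
                seen.add(new)
                sos.append(new)
    return "satisfiable"                     # sos exhausted: saturation

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def expand(given, usable):
    # binary resolution of the given clause against all usable clauses
    return [(given - {l}) | (c - {neg(l)})
            for c in usable for l in given if neg(l) in c]

A = [frozenset({"p"}), frozenset({"~p", "q"})]
SOS = [frozenset({"~q"})]
print(given_clause_loop(SOS, A, expand, is_empty=lambda c: not c))  # prints proof
```

With ratio k, one pick out of every k+1 is by age, mirroring the pick-given-ratio behavior described above; contraction is deliberately omitted here, as the text adds it only after introducing demodulation.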
5 The Demodulation Inference Rule
Larry Wos was very interested in applying theorem proving to mathematics, and since the vast majority of such problems involve equality, he proposed demodulation [188] as a contraction inference rule to replace equals by equals.
5.1 The Original Definition of Demodulation
Given an equality unit clause, or equation, \(l\simeq r\), and a clause \(C[l\sigma ]\) containing as subterm an instance \(l\sigma \) of the side l of \(l\simeq r\), Larry Wos called \(C[r\sigma ]\) an immediate modulant of \(C[l\sigma ]\) [188]. Then a k-modulant, for \(k > 0\), is the result of k such replacement steps, and a modulant is any k-modulant [188]. As a clause has infinitely many modulants in general, but only finitely many k-modulants for a fixed k, Larry Wos defined k-modulation as the generation of a resolvent of parents \(C_k\) and \(D_k\), where \(C_k\) and \(D_k\) are k-modulants of clauses C and D [188]. However, Larry Wos also defined demodulation as replacement by a modulant, where each immediate modulant has strictly fewer symbols than its predecessor, and the final modulant has no immediate modulant with fewer symbols [188]. Thus, we can formalize his rule as follows:
$$\begin{aligned} Demodulation :\ \ \ \begin{array}{c}{S\cup \{l\simeq r,\ C[l\sigma ]\}}\\ \hline \hline {S\cup \{l\simeq r,\ C[r\sigma ]\}}\end{array} \ \ \ {\parallel \! C[l\sigma ] \!\parallel }\ >\ {\parallel \! C[r\sigma ] \!\parallel }, \end{aligned}$$
where \(l\simeq r\) is called a demodulant, \(\parallel \! C\! \parallel \) is the number of symbols in C, and demodulation is defined as performing only one equational replacement step, according to the standard style for replacement rules. Subsequently, and especially in implementations, the name demodulator was also used in place of demodulant.
However, the intended notion of number of symbols was not made explicit. If a term is viewed as a string, parentheses also contribute to the symbol count, whereas they do not if a term is viewed as a tree. Also, the number of symbols is ambiguous with respect to how repeated occurrences of the same symbol are counted.
The size of an atom is the number of occurrences of predicate, function, constant, and variable symbols. For example, \(\parallel \! P(f(a),g(a))\!\parallel \ = 5\). Assume that the number of symbols in a clause is defined as the sum of the sizes of the atoms that occur in the clause. Then, the ordering whereby C is smaller than D if \(\parallel \! C\!\parallel \ <\ \parallel \! D\!\parallel \) is well-founded. The ordering based on size was implemented in Otter, and remained available alongside more sophisticated orderings such as recursive path orderings [74] and Knuth–Bendix orderings [107] that were introduced later (cf. Sect. 4.2). Since there are infinitely many variants of a clause and they all have the same size, variants have to be eliminated by subsumption (cf. Sect. 2.2). Indeed, theorem provers such as Otter apply subsumption before demodulation, so that if two clauses are variants, one is deleted by subsumption.
Nonetheless, the size-based ordering does not allow the system to apply as demodulants many equations that it would be useful to apply, because the two sides of the equation have the same number of symbols. Also, this ordering may not allow the system to apply an equation in the desired direction. For example, \(\parallel \! x * (y + z) \!\parallel \ = 5\) and \(\parallel \! x * y + x * z \!\parallel \ = 7\), so that the distributivity law \(x * (y + z) \simeq x * y + x * z\) would be applied from right to left.
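The size-based test can be sketched with a toy term encoding of ours, where a term is a tuple (symbol, *args): size is then the number of tree nodes, so parentheses do not count while repeated occurrences do, matching the atom-size convention above.

```python
# Sketch of size-based demodulation on ground terms: replace one
# instance of l by r only if the symbol count strictly decreases.

def size(t):
    """Number of symbol occurrences in a term (symbol, *args)."""
    return 1 + sum(size(a) for a in t[1:])

def demodulate_once(t, l, r):
    """One rewrite of ground equation l ~ r in t; None if no shrinking step."""
    if t == l and size(l) > size(r):
        return r
    for i, a in enumerate(t[1:], start=1):
        a2 = demodulate_once(a, l, r)
        if a2 is not None:
            return t[:i] + (a2,) + t[i + 1:]
    return None

# || P(f(a), g(a)) || = 5, as in the text:
P = ("P", ("f", ("a",)), ("g", ("a",)))
print(size(P))                                    # prints 5
# f(a) ~ a shrinks the term, so it is applied left to right;
# the reversed direction a ~ f(a) is rejected:
print(demodulate_once(P, ("f", ("a",)), ("a",)))  # prints ('P', ('a',), ('g', ('a',)))
print(demodulate_once(P, ("a",), ("f", ("a",))))  # prints None
```

The second rejected call is exactly the directionality problem noted above: the size test fixes which way an equation may be applied, whether or not that is the desired direction.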
In summary, Larry Wos’ definition of demodulation is well-founded, but the problem of well-founded demodulation, in the sense of finding more and better well-founded orderings to enable the demodulation of clauses, remained open.
5.2 Well-Founded Demodulation by Rewrite Rules
The discovery of a solution to the problem of well-founded demodulation was advanced significantly in the context of the Knuth–Bendix completion procedure [101, 107]. This procedure works with rewrite rules, where a rewrite rule is an equation \(l\simeq r\) that is written \(l\rightarrow r\) because \(l\succ r\) in a well-founded ordering \(\succ \) on terms. A rewrite rule reduces or rewrites a term \(t[l\sigma ]_u\) to \(t[r\sigma ]_u\), where \(\sigma \) is a substitution, the notation \(t[l\sigma ]_u\) means that \(l\sigma \) occurs as a subterm in t at position u, and \(t[r\sigma ]_u\) is the term obtained by replacing the occurrence of \(l\sigma \) at position u with \(r\sigma \). Positions are strings of natural numbers: if terms are viewed as trees and arcs are labeled with natural numbers, every subterm has a position defined as the string of natural numbers from the root to the subterm. From now on positions are omitted for simplicity.
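A single rewrite step \(t[l\sigma ]_u \rightarrow t[r\sigma ]_u\) can be sketched as follows (an encoding of ours: terms are (symbol, *args) tuples, variables are lowercase strings, and a position is a tuple of 1-based argument indices from the root):

```python
# One rewrite step at a given position, with one-way matching to
# compute the substitution sigma.

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by subst equals term, else None."""
    if isinstance(pattern, str):                  # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def apply_subst(t, subst):
    if isinstance(t, str):
        return subst[t]
    return (t[0],) + tuple(apply_subst(a, subst) for a in t[1:])

def rewrite_at(t, pos, l, r):
    """Apply rule l -> r at position pos of t; None if l does not match there."""
    if not pos:
        s = match(l, t, {})
        return None if s is None else apply_subst(r, s)
    i = pos[0]                                    # argument i is t[i]
    a2 = rewrite_at(t[i], pos[1:], l, r)
    return None if a2 is None else t[:i] + (a2,) + t[i + 1:]

# f(e, g(b)) rewritten at position (2,) by g(x) -> x gives f(e, b):
t = ("f", ("e",), ("g", ("b",)))
print(rewrite_at(t, (2,), ("g", "x"), "x"))   # prints ('f', ('e',), ('b',))
```

Matching is one-way unification: only the rule’s variables are instantiated, which is what distinguishes rewriting from the unification used in expansion inferences.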
Knuth and Bendix defined a well-founded ordering on terms, since then called the Knuth–Bendix ordering, or KBO for short [107, 121, 124]. A KBO orders terms based on a precedence and a weighting function. A precedence is an ordering on symbols that may be partial or total. A weighting function assigns non-negative weights to symbols. Since the definition is parametric with respect to precedence and weighting function, it defines a family of orderings.
A KBO is a reduction ordering, meaning that it is well-founded, stable (\(t \succ u\) implies \(t\sigma \succ u\sigma \) for all substitutions \(\sigma \)), and monotonic (\(t \succ u\) implies \(c[t] \succ c[u]\) for all contexts c, where a context is a term with a hole). Another reduction ordering is the recursive path ordering [74], or RPO for short, which orders terms based on a precedence and a status (either multiset [74] or lexicographic [106]) of every symbol. If the status is lexicographic for all symbols, the ordering is called the lexicographic path ordering, or LPO for short. Here too, since the definitions are parametric with respect to precedence and status, one gets families of orderings. The interested reader may find more information about orderings in surveys on rewriting [75, 76, 78]. Since weights are non-negative, KBO's correlate well with size, and therefore incorporate the intuition in Larry Wos’ definition of demodulation, whereby clauses are made simpler by reducing the number of symbols.
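For ground terms the weight-then-precedence scheme of a KBO can be sketched as follows (a simplification of ours: the variable condition of the full definition plays no role on ground terms and is omitted, and the weights and precedence below are arbitrary choices):

```python
# Ground KBO sketch: compare total weight first, then the precedence
# of the head symbols, then the arguments lexicographically.

def kbo_greater(s, t, weight, prec):
    """True iff s is greater than t in this ground KBO."""
    def w(u):
        return weight[u[0]] + sum(w(a) for a in u[1:])
    if w(s) != w(t):
        return w(s) > w(t)
    if s[0] != t[0]:
        return prec[s[0]] > prec[t[0]]
    for a, b in zip(s[1:], t[1:]):
        if a != b:
            return kbo_greater(a, b, weight, prec)
    return False                         # s and t are equal

weight = {"f": 1, "g": 1, "a": 1, "b": 1}
prec = {"f": 3, "g": 2, "a": 1, "b": 0}
s = ("f", ("a",))          # weight 2
t = ("g", ("b",))          # weight 2, but f > g in the precedence
print(kbo_greater(s, t, weight, prec))   # prints True
```

The weight comparison comes first, which is why KBO's correlate with size; the precedence only breaks ties between terms of equal weight.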
The Knuth–Bendix completion procedure was formalized as an inference system [6‐8] that transforms pairs (E; R), where E is a set of equations, and R is a set of rewrite rules, such that for all rules \(l\rightarrow r\in R\) it holds that \(l\succ r\) in a given reduction ordering \(\succ \) on terms. The inference rules of completion are seen as transforming the equational proofs of the theorems in \(Th(E \cup R)\) with respect to a proof ordering, that is, a stable, monotonic (with respect to replacement of subproofs), and well-founded ordering > on proofs [6‐8, 49]. A key property of completion is that the inference rules are proof-reducing [54] or good [49]: an inference rule deriving \((E^\prime ;R^\prime )\) from (E; R) is good if for all theorems \(s\simeq t\in Th(E \cup R)\) and for all proofs \(\pi \) of \(s\simeq t\) in \(E \cup R\) there exists a proof \(\pi ^\prime \) of \(s\simeq t\) in \(E^\prime \cup R^\prime \) such that \(\pi \ge \pi ^\prime \).
Since the state of the derivation is a pair (E; R), there are three contraction inference rules that realize well-founded demodulation by reducing a side of an equation or a side of a rewrite rule. Simplify reduces a side of an equation:
$$\begin{aligned} Simplify :\ \ \ \begin{array}{c}{(E\cup \{p[l\sigma ] \simeq q\};R\cup \{l \rightarrow r\})}\\ \hline \hline {(E\cup \{p[r\sigma ] \simeq q\};R\cup \{l \rightarrow r\})}\end{array} \end{aligned}$$
where \(\simeq \) is symmetric. Compose reduces the right-hand side of a rewrite rule, so that another rewrite rule is produced:
$$\begin{aligned} Compose :\ \ \ \begin{array}{c}{(E;R\cup \{p \rightarrow q[l\sigma ],\ l \rightarrow r\})}\\ \hline \hline {(E;R\cup \{p \rightarrow q[r\sigma ],\ l \rightarrow r\})}\end{array} \end{aligned}$$
Collapse reduces the left-hand side of a rewrite rule, so that an equation is produced:
$$\begin{aligned} Collapse :\ \ \ \begin{array}{c}{(E;R\cup \{p[l\sigma ] \rightarrow q,\ l \rightarrow r\})}\\ \hline \hline {(E\cup \{p[r\sigma ] \simeq q\};R\cup \{l \rightarrow r\})}\end{array} \ \ \ l \mathrel {\dot{\lhd }} p[l\sigma ], \end{aligned}$$
where \(\mathrel {\dot{\lhd }}\) is the strict encompassment ordering on terms. If an equation in E has the form \(s \simeq s\), the Delete inference rule removes it. If an equation \(p\simeq q\) in E is such that \(p\succ q\), the Orient inference rule removes \(p\simeq q\) from E and adds \(p\rightarrow q\) to R.
The encompassment ordering is obtained by combining the subterm ordering and the subsumption ordering on terms. The subterm ordering is defined by \(t \unlhd s\) if \(s = c[t]\) for some context c. The subsumption ordering on terms is defined by \(s \mathrel {\dot{\le }} t\) if \(t = s\vartheta \) for some substitution \(\vartheta \). Terms s and t are variants, written \(s\doteq t\), if \(s \mathrel {\dot{\le }} t\) and \(t \mathrel {\dot{\le }} s\). The encompassment ordering is defined by \(s \mathrel {\dot{\unlhd }} t\) if \(t = c[s\vartheta ]\) for some context c and substitution \(\vartheta \). The strict encompassment ordering is defined by \(s \mathrel {\dot{\lhd }} t\) if \(s \mathrel {\dot{\unlhd }} t\) and \(s \not \doteq t\), that is, \(t = c[s\vartheta ]\) where either the context c is not empty or the substitution \(\vartheta \) is not a variable renaming.
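The encompassment test can be sketched with the same toy encoding of ours (terms as (symbol, *args) tuples, variables as lowercase strings): s is encompassed by t if some subterm of t is an instance of s, and mutual encompassment characterizes variants.

```python
# Sketch of the encompassment and strict encompassment checks.

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by subst equals term, else None."""
    if isinstance(pattern, str):                  # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def subterms(t):
    yield t
    if not isinstance(t, str):
        for a in t[1:]:
            yield from subterms(a)

def encompasses(t, s):
    """t = c[s theta] for some context c and substitution theta."""
    return any(match(s, u, {}) is not None for u in subterms(t))

def variants(s, t):
    return encompasses(s, t) and encompasses(t, s)

def strictly_encompasses(t, s):
    return encompasses(t, s) and not variants(s, t)

f_b = ("f", ("b",))
f_x = ("f", "x")
print(strictly_encompasses(f_b, f_x))          # f(b) = f(x){x <- b}: prints True
print(strictly_encompasses(("f", "y"), f_x))   # f(y) and f(x) are variants: prints False
```

The second call is exactly the case the Collapse side condition rules out: a rule whose left-hand side is a variant of the redex may not be used for the reduction.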
The purpose of the strict encompassment condition of the Collapse inference rule is to prevent \(l\rightarrow r\) from reducing \(p[l\sigma ]\) if l and \(p[l\sigma ]\) are variants. The reason is that such a step is not good (in the above sense of proof-reducing) [6‐8]. In the Knuth–Bendix procedure the coexistence of two rewrite rules whose left-hand sides are variants is avoided by giving Simplify higher priority than Orient. If \(p\simeq q\) and \(l\simeq r\) are two equations such that \(p\succ q\), \(l\succ r\), and \(p \doteq l\), one of them, say \(p\simeq q\), gets oriented first into \(p\rightarrow q\), so that \(p\rightarrow q\) simplifies \(l\simeq r\) to \(q\simeq r\) before \(l\simeq r\) may get oriented into \(l\rightarrow r\).
If an equation in E can be neither simplified, nor deleted, nor oriented, the procedure fails. Thus, Knuth–Bendix completion provided only a partial solution to the problem of well-founded demodulation.
5.3 Well-Founded Demodulation by Equations
Knuth–Bendix completion solved the problem of well-founded demodulation at the price of considering as demodulants only those equations that can be oriented into rewrite rules by the adopted ordering. This limitation was removed with the inception of unfailing [98] or ordered [6, 7, 9] completion, henceforth completion for short. Completion allows the inference system to use equations as demodulants provided the applied instance is oriented by the ordering.
Completion is a theorem-proving strategy for problems of the form \(E \models ^? \forall {\bar{x}}. s\simeq t\), where E is a set of equations, the presentation of an equational theory, and \({\bar{x}}\) is the vector of all variables in \(s\simeq t\) [9, 42, 52, 54, 98, 101]. The negation of the conjecture yields \({\hat{s}}\not \simeq {\hat{t}}\), where \({\hat{s}}\) and \({\hat{t}}\) are s and t, respectively, with all the variables in \({\bar{x}}\) replaced by Skolem constants. The given ordering \(\succ \) on terms is assumed to be a reduction ordering [6, 7, 9], or a complete simplification ordering (CSO) [98]. A simplification ordering is stable, monotonic, and with the subterm property, which means that it includes the strict subterm ordering (i.e., \(p\rhd l\) implies \(p\succ l\)). A simplification ordering is well-founded [74], hence it is a reduction ordering. A complete simplification ordering is also total on ground terms. KBO's, RPO's, and LPO's are simplification orderings. KBO's and LPO's are CSO's if the precedence is total, but not all RPO's are CSO's [10].
As it is no longer necessary to separate equations and rewrite rules, and completion is seen as theorem proving, the inference system can be written [42, 52, 54] as transforming pairs \((E;{\hat{s}}\not \simeq {\hat{t}})\), where \({\hat{s}}\not \simeq {\hat{t}}\) is called the target. The inference rules of completion are good [49] or proof-reducing [42, 54] with respect to all ground theorems, which is enough for theorem proving, since the target is ground. The objective of the derivation is to reduce \({\hat{s}} \) and \({\hat{t}}\) to a common form so as to discover a contradiction with \(x\simeq x\), the clausal form of the reflexivity axiom for equality. Accordingly, one can distinguish between Simplification of the target:
$$\begin{aligned} \begin{array}{c}{(E\cup \{l \simeq r\};{\hat{s}}[l\sigma ]\not \simeq {\hat{t}})}\\ \hline \hline {(E\cup \{l \simeq r\};{\hat{s}}[r\sigma ]\not \simeq {\hat{t}})}\end{array} \ \ \ l\sigma \succ r\sigma , \end{aligned}$$
and Simplification of the presentation:
$$\begin{aligned} \begin{array}{c}{(E\cup \{p[l\sigma ] \simeq q,\ l \simeq r\};{\hat{s}}\not \simeq {\hat{t}})}\\ \hline \hline {(E\cup \{p[r\sigma ] \simeq q,\ l \simeq r\};{\hat{s}}\not \simeq {\hat{t}})}\end{array} \ \ \ l\sigma \succ r\sigma ,\ \ l \mathrel {\dot{\lhd }} p[l\sigma ]\ \hbox {or}\ q\succ p[r\sigma ], \end{aligned}$$
where \(l \simeq r\) is called a simplifier, and the second condition incorporates the side condition of Collapse. This side condition for simplification lets \(l \simeq r\) simplify \(p[l\sigma ] \simeq q\) when \(p[l\sigma ]\) is a variant of l, but q is not a variant of r, provided that \(q\succ p[r\sigma ]\), or, equivalently, \(q\succ r\sigma \) (if \(p[l\sigma ]\doteq l\), the context p is empty, \(\sigma \) is a variable renaming, \(p[l\sigma ] = l\sigma \), and \(p[r\sigma ] = r\sigma \)). For example, simplifying \(f(e,x)\simeq x\) by \(f(e,y)\simeq y\) is not allowed; simplifying \(f(e,x)\simeq h(x)\) by \(f(e,y)\simeq y\) is allowed as \(h(x)\succ x\); simplifying \(f(e,y)\simeq y\) by \(f(e,x)\simeq h(x)\) is not allowed as \(y\not \succ h(y)\).
The next challenge was to generalize simplification to clauses, as intended in Larry Wos’ definition of demodulation, while preserving as much as possible the behavior of simplification in completion. This requires extending the ordering \(\succ \) beyond terms. A step in this direction was achieved with the inference system in [155]. This system assumes that the ordering \(\succ \) is a CSO on terms and atoms that satisfies two additional properties. First, for all terms l, r, p, and q, such that \(l\succeq r\), and for all atoms A, (i) if \(l\unlhd A\) and the predicate symbol of A is not \(\simeq \), then \((l\simeq r) \prec A\); and (ii) if \(l\lhd p\) or \(l\lhd q\), then \((l\simeq r) \prec (p\simeq q)\). Second, for all ground terms l, r, and s, and for all ground atoms A, if \(l\succ r\), \(l\succ s\), and \((l\simeq r) \prec A \prec (l\simeq s)\), then A has the form \(l\simeq t\) for some ground term t.
This definition is illustrated in [155] with a predicate-first extension of a CSO \(\succ \) on terms to atoms. It assumes a total precedence on predicate symbols such that \(\simeq \) is the smallest predicate symbol. Then, \(P(s_1,\ldots ,s_m) \prec Q(t_1,\ldots ,t_n)\) holds if P is smaller than Q in the precedence, or \(P = Q \ne {{\simeq }}\) and \((s_1,\ldots ,s_m) \prec _{lex} (t_1,\ldots ,t_n)\), or \(P = Q = {{\simeq }}\) and \((s_1,s_2) \prec _{mul} (t_1,t_2)\), where \(\prec _{lex}\) and \(\prec _{mul}\) are the lexicographic and multiset extensions of \(\prec \), respectively.
The inference system in [155] includes a simplification inference rule that allows a simplifier \(l \simeq r\) to simplify a clause \(C[l\sigma ]\) to \(C[r\sigma ]\), if \(l\sigma \succ r\sigma \) and \(C[l\sigma ]\) contains an atom A such that \(A\succ (l\sigma \simeq r\sigma )\). While the simplification rule of [155] allows some simplification, it does not preserve the behavior of simplification in completion.
Example 2
Given equations \(\{(1)\ f(x)\simeq g(x),\ (2)\ g(h(y))\simeq k(y)\}\), target theorem \(f(h(b))\not \simeq k(b)\), and precedence \(f> g> h> k > b\), the Simplification rule of completion allows equation (1) to simplify the target to \(g(h(b))\not \simeq k(b)\), with matching substitution \(\sigma = \{x\leftarrow h(b)\}\), since \(f(h(b)) \succ g(h(b))\) and \(f(x) \mathrel {\dot{\lhd }} f(h(b))\). Another step by the same simplification rule applies equation (2) to simplify \(g(h(b))\not \simeq k(b)\) to \(k(b)\not \simeq k(b)\), with matching substitution \(\vartheta = \{y\leftarrow b\}\), since \(g(h(b)) \succ k(b)\) and \(g(h(y)) \mathrel {\dot{\lhd }} g(h(b))\). On the other hand, the simplification rule of [155] cannot perform these steps, and hence cannot yield a refutation by simplification. For example, for the first step, \(\{f(h(b)), k(b)\} \succ _{mul} \{f(h(b)), g(h(b))\}\) does not hold.
The footnote on page 2 of [10] says^{1} that the method of [155] “does discuss simplification to some extent, but for practical purposes his simplification techniques are inadequate even for the very simplest case – completion of sets of universally quantified equations.” The issue is the generalization of the ordering beyond terms. The inference system of the superposition calculus [10]—henceforth \(\mathcal{SP}\)—offered a solution with a systematic way to extend a reduction ordering on terms to atoms, literals, and clauses. The reduction ordering is assumed to be complete or completable, which means it is included in a complete ordering. All RPO's are completable [10]. The first step is to treat non-equational literals as equational literals by treating non-equational atoms like terms, and reading a positive literal L as \(L\simeq \top \) and a negative literal \(\lnot L\) as \(L\not \simeq \top \), where \(\top \) is a new symbol such that \(t \succ \top \) for all terms t.
The second step is to extend the ordering \(\succ \) on terms to literals. This can be done in one of two ways. One way is to treat an equation \(p\simeq q\) as the multiset \(\{p, q\}\), a negated equation \(p\not \simeq q\) as the multiset \(\{p, p, q, q\}\), and compare literals in the multiset extension of the ordering on terms. The other one is to treat an equation \(p\simeq q\) as the multiset of multisets \(\{\{p\}, \{q\}\}\), a negated equation \(p\not \simeq q\) as the multiset of multisets \(\{\{p, \bot \}, \{q, \bot \}\}\), where \(\bot \) is a new symbol such that \(t \succ \bot \succ \top \) for all terms t, and compare literals by taking twice the multiset extension of the ordering on terms. The third step is to extend the ordering on literals to clauses by taking once more the multiset extension.
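The multiset extension itself can be sketched as follows (a toy version of ours, with ground atoms named by strings and an explicit rank standing in for the term ordering; the two multisets compared are those of a negative literal and of a simplifier instance, as in the first step of Example 3 below):

```python
# Dershowitz-Manna multiset extension: M >_mul N iff, after cancelling
# the shared occurrences, M has leftovers and every leftover of N is
# dominated by some leftover of M.

from collections import Counter

def mul_greater(M, N, greater):
    m, n = Counter(M), Counter(N)
    common = m & n
    m, n = m - common, n - common            # cancel shared occurrences
    if not m:
        return False                         # nothing left to dominate with
    return all(any(greater(x, y) for x in m) for y in n)

# With f(h(b)) > g(h(b)) > k(b), as induced by the precedence of Example 2:
rank = {"f(h(b))": 2, "g(h(b))": 1, "k(b)": 0}
gt = lambda x, y: rank[x] > rank[y]
neg_lit = ["f(h(b))", "f(h(b))", "k(b)", "k(b)"]   # f(h(b)) != k(b) as a multiset
eqn = ["f(h(b))", "g(h(b))"]                       # instance f(h(b)) = g(h(b))
print(mul_greater(neg_lit, eqn, gt))   # prints True
```

Doubling the terms of a negated equation is what makes a negative literal larger than the corresponding positive one in this extension, so simplification below a negation is easier to justify.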
Simplification appears in \(\mathcal{SP}\) as an instance of an inference rule called contextual reductive rewriting [10]. If the simplifier is a unit equational clause, contextual reductive rewriting yields the following rule:
$$\begin{aligned} Simplification :\ \ \ \begin{array}{c}{S\cup \{C[l\sigma ],\ l \simeq r\}}\\ \hline \hline {S\cup \{C[r\sigma ],\ l \simeq r\}}\end{array} \ \ \ \begin{array}{ccc} l\sigma \succ r\sigma ,&\,&C[l\sigma ]\succ (l\sigma \simeq r\sigma ).\end{array} \end{aligned}$$
The second side condition requires that the applied instance of the simplifier be smaller than the clause it simplifies. This condition is only superficially similar to that of the simplification rule in [155], as the difference lies in the ordering.
Example 3
Consider the problem in Example 2. The simplification rule of \(\mathcal{SP}\) allows both simplification steps, because \(\{f(h(b)), f(h(b)), k(b), k(b)\} \succ _{mul} \{f(h(b)), g(h(b))\}\) holds for the first step, and \(\{g(h(b)), g(h(b)), k(b), k(b)\} \succ _{mul} \{g(h(b)), k(b)\}\) holds for the second step.
However, Simplification of \(\mathcal{SP}\) and Simplification of completion do not in general behave in the same way in the purely equational case.
Example 4
If \(b \succ c\), both the Collapse rule of Knuth–Bendix completion and the Simplification rule of completion allow \(f(x) \rightarrow b\) to simplify \(f(b) \rightarrow c\) to \(b \rightarrow c\), with matching substitution \(\sigma = \{x\leftarrow b\}\), because \(f(b) \succ b\) and \(f(x) \mathrel {\dot{\lhd }} f(b)\). On the other hand, \(\{f(b), c\} \succ _{mul} \{f(b), b\}\) does not hold, so that Simplification of \(\mathcal{SP}\) does not allow the step.
A comparison of the second condition for Simplification in completion with the second condition for Simplification of \(\mathcal{SP}\) explains the difference. Assume that the ordering on terms is a CSO. The second condition for Simplification in completion is that either \(p[l\sigma ]\) properly encompasses l or \(q \succ p[r\sigma ]\) holds. The second condition for Simplification in \(\mathcal{SP}\) when an equation \(l\simeq r\) simplifies a unit positive equational clause \(p[l\sigma ] \simeq q\) is \(\{p[l\sigma ], q\}\succ _{mul} \{l\sigma , r\sigma \}\).
If the Simplification rule of completion applies because \(p[l\sigma ]\) properly encompasses l, as p is not empty, we have \(p[l\sigma ] \succ l\sigma \succ r\sigma \) by the subterm property and the condition \(l\sigma \succ r\sigma \) in both simplification rules. Thus, \(\{p[l\sigma ], q\}\succ _{mul} \{l\sigma , r\sigma \}\) follows and Simplification of \(\mathcal{SP}\) applies.
If the Simplification rule of completion applies because \(q \succ p[r\sigma ]\) holds, while \(p[l\sigma ]\) does not properly encompass l (p is empty and \(\sigma \) is a variable renaming), we have \(p[l\sigma ] = l\sigma \), \(p[r\sigma ] = r\sigma \), and hence \(q \succ r\sigma \), so that \(\{p[l\sigma ], q\}\succ _{mul} \{l\sigma , r\sigma \}\) follows and Simplification of \(\mathcal{SP}\) applies.
If the Simplification rule of completion applies because \(p[l\sigma ]\) properly encompasses l, as p is empty but \(\sigma \) is not a variable renaming, and \(q \succ p[r\sigma ]\) does not hold, then \(\{p[l\sigma ], q\}\succ _{mul} \{l\sigma , r\sigma \}\) does not follow, and Simplification of \(\mathcal{SP}\) does not apply. Example 4 illustrates this situation, where completion lets a simplifier (i.e., \(f(x) \rightarrow b\)) rewrite a proper instance of its left-hand side (i.e., f(b)) even if the right-hand side of the simplifier (i.e., b) is larger than the right-hand side of the rewrite rule to be simplified (i.e., c).
It is also interesting to see how simplification is implemented. For instance, the E prover [157] distinguishes between Simplification of negative literals and Simplification of positive literals. The former is
$$\begin{aligned} \begin{array}{c}{S\cup \{p[l\sigma ] \not \simeq q \vee D,\ l \simeq r\}}\\ \hline \hline {S\cup \{p[r\sigma ] \not \simeq q \vee D,\ l \simeq r\}}\end{array} \ \ \ l\sigma \succ r\sigma , \end{aligned}$$
because \(l\sigma \succ r\sigma \) implies \(\{p[l\sigma ], p[l\sigma ], q, q\}\succ _{mul} \{l\sigma , r\sigma \}\). Indeed, if p is not empty, \(\{p[l\sigma ], p[l\sigma ], q, q\}\succ _{mul} \{l\sigma , r\sigma \}\) follows from \(p[l\sigma ] \succ l\sigma \succ r\sigma \) as discussed above. If p is empty, \(\{l\sigma , l\sigma , q, q\}\succ _{mul} \{l\sigma , r\sigma \}\) follows from \(l\sigma \succ r\sigma \). Thus, it suffices to consider the literal being rewritten to establish the second condition of Simplification of \(\mathcal{SP}\). On the other hand, Simplification of positive literals embeds conditions from the inference rules of completion: besides \(l\sigma \succ r\sigma \), its second condition is a disjunction of three disjuncts.^{2}
Consider the second condition. If the first disjunct is true, there is a literal M in the clause such that either \(\{M, \top \} \succ _{mul} \{p[l\sigma ], q\}\) or \(\{M, M, \top , \top \} \succ _{mul} \{p[l\sigma ], q\}\). Either way, \(M\succ p[l\sigma ]\) and \(M\succ q\), so that \(M\succ p[l\sigma ] \succeq l\sigma \succ r\sigma \) holds. Thus, either \(\{M, \top \} \succ _{mul} \{l\sigma , r\sigma \}\) or \(\{M, M, \top , \top \} \succ _{mul} \{l\sigma , r\sigma \}\) holds and the second condition of Simplification of \(\mathcal{SP}\) is fulfilled. If the second disjunct \(p[l\sigma ] \not \succ q\) is true, the step is an instance of Simplify or Compose. If the third disjunct is true, the step is an instance of Collapse. Although the second disjunct \(q \succ p[r\sigma ]\) in the second condition of Simplification in completion does not appear, this confirms that the conditions for simplification from completion are important in practice.
Larry Wos’ intuition of demodulation as comprising multiple steps, until no further step can be applied, was captured in the context of completion and rewriting by the notion of normalization, or reduction to normal form. A clause C is in normal form with respect to a set S of clauses if no unit equational clause in S can simplify it; equivalently, C is irreducible with respect to S, or S-irreducible. The normal form of C with respect to S is denoted \(C\!\downarrow _S\), where \({C \downarrow _S}\ = C\) if C is S-irreducible.
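As a concrete illustration, the following Python sketch reduces a ground term to its normal form by rewriting until no step applies. The nested-tuple term syntax, the rule set, and the helper names are illustrative assumptions; termination is taken for granted because the rules are assumed oriented by a reduction ordering.

```python
# Ground terms as nested tuples, e.g. ("f", ("a",)) for f(a).

def rewrite_once(term, rules):
    """Apply the first applicable rule at the outermost position,
    else recurse into subterms; return (new_term, changed)."""
    for lhs, rhs in rules:
        if term == lhs:
            return rhs, True
    head, *args = term
    new_args, changed = [], False
    for a in args:
        na, ch = rewrite_once(a, rules)
        new_args.append(na)
        changed = changed or ch
    return (head, *new_args), changed

def normalize(term, rules):
    """Rewrite until no rule applies: the result is irreducible
    (in normal form) with respect to the rules."""
    changed = True
    while changed:
        term, changed = rewrite_once(term, rules)
    return term

# Illustrative ground rules: g(a) -> a and f(a) -> b.
rules = [(("g", ("a",)), ("a",)), (("f", ("a",)), ("b",))]
print(normalize(("h", ("f", ("g", ("a",)))), rules))  # ('h', ('b',))
```

The loop mirrors the definition: `normalize` returns its input unchanged exactly when the term is already irreducible.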
5.4 Demodulation and the Given-Clause Algorithm
Larry Wos was interested in the application of demodulation in the context of the set-of-support strategy [188], which leads to the more general issue of the application of demodulation in the given-clause algorithm. The goal is to ensure that the given-clause algorithm implements an eager-contraction search plan, namely one where contraction has priority over expansion (e.g., [56]). In other words, the objective is to prevent a clause that can be deleted or replaced from playing the role of parent in an expansion inference.
In the given-clause algorithm, when a new clause C is generated by expansion, C is subject to forward contraction, that is, contraction with respect to a set S of already existing clauses. The prover tries the deletion rules first. Thus, C may be deleted by tautology deletion, or by purity deletion, or by subsumption by a clause in S (forward subsumption), or because it is a unit equational clause \(s\simeq s\).
If clause C survives these tests, the prover tries the replacement rules. Thus, C may be simplified by clausal simplification by a clause in S, or reduced to \(C\!\downarrow _S\) by demodulation with the demodulators in S. Let \(C\!\downarrow _S\) represent the final result of the application of all applicable replacement rules. If C, and hence \(C\!\downarrow _S\), is an equation, the test to determine whether it can be oriented is applied to \(C\!\downarrow _S\). Thus, the implementation of contraction respects the requirement from completion of orienting equations only after their sides have been normalized (cf. Sect. 4.2).
Only at this stage does clause \(C\!\downarrow _S\) receive an identifier and join the sos list. Therefore, forward contraction is part of the generation of a new clause; indeed, in Otter this phase is called preprocessing of a clause. The test for the generation of the empty clause also happens during preprocessing: if \(C\!\downarrow _S\) is a unit clause, the prover tests whether it generates the empty clause with a unit clause in usable or sos, because one wants to obtain the empty clause as soon as possible. Thus, the test for a contradiction is applied as soon as a unit clause is generated, without waiting until it is selected as given clause.
For backward contraction the prover tests whether \(C\!\downarrow _S\) can contract a previously existing clause \(D\in S\); in Otter this phase is called postprocessing of a clause. Every \(D\in S\) for which this is the case is treated as if it were a newly generated clause, and subjected to forward contraction as described above. The resulting \(D\!\downarrow _S\) gets a new identifier and is appended to the sos list. Thus, a clause generated by backward contraction is treated as a clause generated by expansion.
There are two versions of the given-clause algorithm, named after the Otter prover [127, 128] and the E prover [73, 157‐159], respectively. The two versions differ primarily in the implementation of backward contraction. In both versions the set S of clauses in the above description of forward contraction is given by \(\mathtt {usable}\cup \mathtt {sos}\), meaning the union of the set of clauses in usable and the set of clauses in sos. On the other hand, the set S of clauses in the above description of backward contraction is \(\mathtt {usable}\cup \mathtt {sos}\) in the Otter version, whereas it is usable in the E version.
The Otter version of the given-clause algorithm aims at maintaining the set \(\mathtt {usable}\cup \mathtt {sos}\) interreduced or, more generally, contracted [49]. Suppose that the expansion inferences between a given clause C and the clauses in usable generate a batch of new clauses, each of which is subjected to forward contraction as described above, so that clauses \(C_0,\ldots ,C_k\) get appended to sos. In the Otter version, the prover tests whether \(C_i\), for all i, \(0\le i\le k\), can backward-contract any clause in \(\mathtt {usable}\cup \mathtt {sos}\). Suppose that for all i, \(0\le i\le k\), backward contraction by \(C_i\) appends clauses \(D^i_0,\ldots ,D^i_{n_i}\) to sos. Then, for all i, \(0\le i\le k\), and for all j, \(0\le j\le n_i\), the prover tests whether \(D^i_j\) can backward-contract any clause in \(\mathtt {usable}\cup \mathtt {sos}\). The process continues until no more contraction applies.
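The propagation just described can be sketched as a worklist algorithm. In this Python sketch all names (`contracts`, `forward`, and the clause representation) are hypothetical placeholders for the prover's actual inference tests; the point is only the fixpoint discipline of the Otter version.

```python
from collections import deque

def otter_backward_contraction(new_clauses, usable, sos, contracts, forward):
    """Otter-style propagation: every kept clause is tried against usable
    and sos; any clause it contracts is removed, forward-contracted like a
    newly generated clause, appended to sos, and enqueued in turn, until
    no more contraction applies. contracts(c, d) returns the contracted
    form of d using c, or None; forward(c, S) returns the forward-contracted
    form of c, or None if c is deleted."""
    work = deque(new_clauses)
    while work:
        c = work.popleft()
        for pool in (usable, sos):
            for d in list(pool):
                d2 = contracts(c, d)
                if d2 is not None:
                    pool.remove(d)
                    d3 = forward(d2, usable | sos)
                    if d3 is not None:            # None models deletion
                        sos.add(d3)   # gets a new identifier in a prover
                        work.append(d3)  # may itself contract other clauses
    return usable, sos
```

With toy string-rewriting stand-ins for `contracts` and `forward`, this reproduces the cascade of Example 5 below: a new clause back-demodulates one equation, whose contracted form in turn back-demodulates another.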
Example 5
Suppose that sos contains \((1)\ f(g(x)) \simeq b\) and \((2)\ h(f(y)) \simeq c\), and Otter derives \(g(z) \simeq z\) by some inference and appends it to sos as \((3)\ g(z) \simeq z\). Otter applies (3) to back-demodulate (1) to \(f(x) \simeq b\), removes (1), and appends \((4)\ f(x) \simeq b\) to sos. Then Otter applies (4) to back-demodulate (2) to \(h(b) \simeq c\), removes (2), and appends \((5)\ h(b) \simeq c\) to sos.
The E version of the given-clause algorithm aims at maintaining usable contracted. The prover tests whether a clause C can backward-contract any clause in usable only when C is selected as given clause and moved from sos to usable. As usable may have changed since the time when C was subjected to forward contraction, the prover first applies the clauses in usable to contract C, and then applies C to contract the clauses in usable, before trying the expansion inferences between C and the clauses in usable. If a clause in usable is removed by backward contraction, its descendants in sos are deleted as orphans. Except for orphan deletion, all backward contraction happens in usable. The rationale is that maintaining usable contracted is good enough, because the premises of expansion inferences come from usable.
Example 6
Given the initial situation as in Example 5, E applies no backward demodulation in sos. Suppose that E selects \((3)\ g(z) \simeq z\) as given clause before (1) and (2). Thus, (3) moves from sos to usable. E applies (3) to back-demodulate \((1)\ f(g(x)) \simeq b\) only when (1) is selected as given clause and joins (3) in usable. As a result, E deletes (1) and appends \((4)\ f(x) \simeq b\) to sos. Suppose that E selects (4) as given clause before (2), so that (4) moves from sos to usable. E applies (4) to back-demodulate \((2)\ h(f(y)) \simeq c\) only when (2) is selected as given clause and joins (4) in usable. As a result, E deletes (2) and appends \((5)\ h(b) \simeq c\) to sos. If E selects (2) as given clause before (4), E applies (4) to back-demodulate (2) only when (4) is selected as given clause and joins (2) in usable. As a result, E deletes (2), appends \((5)\ h(b) \simeq c\) to sos, and deletes any orphan of (2) in sos.
In the E version of the given-clause algorithm the lists usable and sos were renamed active and passive, respectively. The E version was born primarily from a concern that the cost of backward contraction as in the Otter version could outweigh its benefits. For example, the prover may spend a lot of time doing backward contraction when it would be more beneficial to go ahead with expansion, because an expansion inference with the next given clause would generate a unit clause that yields the contradiction. On the other hand, the delay in backward contraction in the E version may cause the passive list to grow too much, reaching a memory limit, or it may delay finding a proof. For example, the prover may go ahead to do more expansion, postponing backward demodulation steps in sos that would generate a unit clause yielding the contradiction.
In practice, most clauses that get deleted are deleted by forward contraction. Then, expansion and backward demodulation can be seen as two ways to generate clauses that need to be balanced. One could say that the Otter version leans toward prioritizing backward demodulation and the E version leans toward prioritizing expansion. There is no conclusive evidence that one is better than the other in general. Most theorem provers feature both versions of the givenclause algorithm, because one pays off on some problems and the other on others.
6 The Paramodulation Inference Rule
Adding demodulation to resolution does not suffice for refutational completeness in first-order logic with equality. Larry Wos started the research on paramodulation [147] precisely to complement resolution and demodulation with an expansion inference rule for equality that would yield a refutationally complete inference system for first-order logic with equality. This quest turned out to be one of the most fascinating in the history of automated theorem proving.
6.1 The Original Definition of Paramodulation
Prior to the inception of paramodulation, the only way to reason about equality in resolution-based theorem proving was to add to the input set the clausal form of the axioms of equality:
$$\begin{aligned} x \simeq x&(Reflexivity) \nonumber \\ x \not \simeq y \vee y \simeq x&(Symmetry) \nonumber \\ x \not \simeq y \vee y \not \simeq z \vee x \simeq z&(Transitivity) \\ \bigvee _{i=1}^n x_i \not \simeq y_i \vee f({\bar{x}}) \simeq f({\bar{y}})&(Function\ Substitutivity) \\ \bigvee _{i=1}^n x_i \not \simeq y_i \vee \lnot P({\bar{x}}) \vee P(\bar{y})&{ (Predicate\ Substitutivity) } \end{aligned}$$
for all function symbols f and predicate symbols P of arity n, where \({\bar{x}}\) and \({\bar{y}}\) stand for \(x_1,\ldots ,x_n\) and \(y_1,\ldots ,y_n\), respectively. It soon emerged that these axioms are so general that their presence causes resolution to generate so many clauses that the efficiency of the inference system is unbearably compromised in most cases. Thus, George A. Robinson and Larry Wos introduced paramodulation [147] as a generalization of resolution with equality built in:
$$\begin{aligned} Paramodulation :\ \ \ \begin{array}{c}{S\cup \{l\simeq r \vee C,\ M[t] \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ M[t] \vee D,\ (C \vee M[r] \vee D)\sigma \}}\end{array}\ \ \ l\sigma = t\sigma , \end{aligned}$$
where \(\simeq \) is regarded as symmetric, \(\sigma \) is the mgu of a side l of the equation \(l\simeq r\) and a subterm t of a literal M in a clause \(M[t] \vee D\), and C and D are disjunctions of literals. Clause \(l\simeq r \vee C\) is called the clause paramodulated from, or parafrom clause for short, and \(l\simeq r\) is the literal paramodulated from, or parafrom literal for short. Clause \(M[t] \vee D\) is called the clause paramodulated into, or parainto clause for short, and M[t] is the literal paramodulated into, or parainto literal for short. The generated clause \((C \vee M[r] \vee D)\sigma \) is termed a paramodulant.
While the appearance of paramodulation represented a breakthrough, a proof of refutational completeness could be obtained only under the assumption that the input set includes not only \(x\simeq x\), but also the functionally reflexive axioms, that is, the instances of reflexivity of the form \(f({\bar{x}}) \simeq f({\bar{x}})\), for all function symbols f. Furthermore, the original paramodulation inference rule is very prolific, because the term t paramodulated into can be a variable, which unifies with any term. However, paramodulation into variables could not be excluded, because it was necessary to prove a paramodulation lifting lemma [147] analogous to the lifting lemma used in the proof of completeness of resolution [65, 149]. In order to show that to every paramodulation between ground instances of clauses corresponds a paramodulation between the general clauses themselves, paramodulation into variables was needed, because the ground term paramodulated into could be the instance of a variable. The conjecture that paramodulation is refutationally complete without the functionally reflexive axioms and without paramodulating into variables became known as the Wos–Robinson conjecture.
A first step towards settling the Wos–Robinson conjecture was represented by the modification method [61]. This method consists of preprocessing the input set of clauses with respect to the equality axioms (the “modification” in the name), and then applying resolution and factoring to the modified set of clauses, without including axioms other than \(x\simeq x\). The completeness of resolution, factoring, and paramodulation without functionally reflexive axioms follows via a simulation argument, provided some paramodulations into variables are allowed. The Wos–Robinson conjecture was still considered open, because a direct proof of the refutational completeness of resolution, factoring, and paramodulation, without functionally reflexive axioms and with no paramodulation into variables, had not been given. Another challenge that remained open was to prove refutational completeness in the presence of demodulation and other contraction inference rules.
6.2 Superposition Between Rewrite Rules or Equations
Unaware of paramodulation,^{3} Knuth and Bendix coined the name superposition for a related inference rule, which is the main mechanism of the Knuth–Bendix completion procedure [101, 107]. In the formalization of completion as an inference system [6‐8], derivation states have the form (E; R), where E is a set of equations, and R is a set of rewrite rules oriented by a reduction ordering \(\succ \) on terms (cf. Sect. 4.2). Then, superposition is defined as follows:
$$\begin{aligned} Superposition :\ \ \ \begin{array}{c}{(E;R\cup \{l\rightarrow r,\ p[t] \rightarrow q\})}\\ \hline {(E\cup \{p[r]\sigma \simeq q\sigma \}; R\cup \{l\rightarrow r,\ p[t] \rightarrow q\})}\end{array} \ \ \ t\not \in X,\ l\sigma {=}t\sigma , \end{aligned}$$
where \(\sigma \) is the mgu of the left-hand side l of a rewrite rule and a non-variable subterm t of the left-hand side of another rewrite rule, X is the set of variable symbols, and the generated equation \(p[r]\sigma \simeq q\sigma \) is called a critical pair. If a critical pair cannot be simplified, deleted, or oriented, the procedure fails.
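Critical-pair computation can be sketched in a few lines of Python. Terms are nested tuples with variables as plain strings; `unify`, `positions`, and the other helper names are illustrative assumptions, and the two rules are assumed to already have disjoint variables (renamed apart).

```python
def walk(t, sub):
    # follow variable bindings
    while isinstance(t, str) and t in sub:
        t = sub[t]
    return t

def occurs(v, t, sub):
    t = walk(t, sub)
    return t == v or (isinstance(t, tuple) and
                      any(occurs(v, a, sub) for a in t[1:]))

def unify(s, t):
    """Most general unifier of s and t as a dict, or None."""
    sub, stack = {}, [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, sub), walk(b, sub)
        if a == b:
            continue
        if isinstance(a, str):                    # variable
            if occurs(a, b, sub):
                return None
            sub[a] = b
        elif isinstance(b, str):
            if occurs(b, a, sub):
                return None
            sub[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None
    return sub

def apply_sub(sub, t):
    t = walk(t, sub)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(apply_sub(sub, a) for a in t[1:])

def positions(t, path=()):
    yield path, t
    if isinstance(t, tuple):
        for i, a in enumerate(t[1:], 1):
            yield from positions(a, path + (i,))

def replace_at(t, path, s):
    if not path:
        return s
    i = path[0]
    return t[:i] + (replace_at(t[i], path[1:], s),) + t[i + 1:]

def critical_pairs(rule1, rule2):
    """Superpose l1 -> r1 into non-variable subterms of rule2's left side."""
    (l1, r1), (l2, r2) = rule1, rule2
    cps = []
    for path, t in positions(l2):
        if isinstance(t, str):        # no superposition into variables
            continue
        sub = unify(l1, t)
        if sub is not None:
            cps.append((apply_sub(sub, replace_at(l2, path, r1)),
                        apply_sub(sub, r2)))
    return cps

rule1 = (("f", ("g", "x")), "x")      # f(g(x)) -> x
rule2 = (("g", ("f", "y")), "y")      # g(f(y)) -> y
print(critical_pairs(rule1, rule2))   # [(('g', 'x'), ('g', 'x'))]
```

The single critical pair \(g(x) \simeq g(x)\) is trivial and would be removed by the deletion rule for equations of the form \(s\simeq s\).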
As with demodulation, unfailing [98] or ordered [6, 7, 9] completion removed the limitation of working only with rewrite rules, leading to Superposition of equations:
$$\begin{aligned} \begin{array}{c}{E\cup \{l\simeq r,\ p[t] \simeq q\}}\\ \hline {E\cup \{l\simeq r,\ p[t] \simeq q,\ p[r]\sigma \simeq q\sigma \}}\end{array} \ \ \ t\not \in X,\ l\sigma {=}t\sigma ,\ l\sigma \not \preceq r\sigma , \ p[t]\sigma \not \preceq q\sigma , \end{aligned}$$
where E is a set of equations, and the equations \(l\simeq r\) and \(p[t] \simeq q\) are allowed to superpose only if their instances according to the mgu \(\sigma \) are either orientable (i.e., \(l\sigma \succ r\sigma \)) or uncomparable (i.e., \(l\sigma \not \succ r\sigma \ \wedge \ r\sigma \not \succ l\sigma \ \wedge \ l\sigma \ne r\sigma \), abbreviated \(l\sigma \mathrel {\#} r\sigma \)). The ordering \(\succ \) on terms is a CSO [98] or a reduction ordering [6, 7, 9].
This superposition inference rule is less general than paramodulation, as it applies only to unit equational clauses, but it avoids superposition into variables, is restricted by the ordering, and is refutationally complete for problems of the form \(E \models ^? \forall {\bar{x}}. s\simeq t\) also in the presence of contraction. The contraction rules of completion are deletion of equations of the form \(s\simeq s\), simplification, subsumption, and another subsumption rule for equations based on the encompassment ordering [98], whereby an equation \(p\simeq q\) is subsumed by an equation \(l\simeq r\) if \(p = c[l\vartheta ]\), \(q = c[r\vartheta ]\), and either the context c is not empty or the substitution \(\vartheta \) is not a variable renaming.
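The encompassment test can be made concrete. The sketch below (terms as nested tuples, variables as strings; `match` and the other names are illustrative assumptions) checks whether a term p properly encompasses a term l, that is, whether an instance of l occurs in p within a non-empty context or via a matcher that is not a variable renaming.

```python
def match(pat, t, sub):
    """One-way matching: extend sub so that pat instantiated by sub is t."""
    if isinstance(pat, str):                 # pattern variable
        if pat in sub:
            return sub if sub[pat] == t else None
        return {**sub, pat: t}
    if isinstance(t, tuple) and pat[0] == t[0] and len(pat) == len(t):
        for pa, ta in zip(pat[1:], t[1:]):
            sub = match(pa, ta, sub)
            if sub is None:
                return None
        return sub
    return None

def subterms(t):
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subterms(a)

def is_renaming(sub):
    vals = list(sub.values())
    return all(isinstance(v, str) for v in vals) and len(set(vals)) == len(vals)

def properly_encompasses(p, l):
    """True iff some subterm of p is an instance of l, and either the
    context is non-empty (a proper subterm) or the matcher is not a
    variable renaming."""
    for i, s in enumerate(subterms(p)):
        sub = match(l, s, {})
        if sub is not None and (i > 0 or not is_renaming(sub)):
            return True
    return False

fb = ("f", ("b",))                           # ground term f(b)
fx = ("f", "x")                              # f(x)
print(properly_encompasses(fb, fx))          # True: f(b) is a proper instance
print(properly_encompasses(("f", "y"), fx))  # False: f(y) is only a variant
```

This is exactly the test separating the cases of Example 4: f(b) properly encompasses f(x), while a variant f(y) does not.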
A challenge related to the Wos–Robinson conjecture was how to obtain an inference system for first-order logic with equality that avoids paramodulating or superposing into variables, is restricted by the ordering, is refutationally complete also in the presence of contraction, and reduces to completion if given an input of the form \(E\cup \{{\hat{s}}\not \simeq {\hat{t}}\}\).
6.3 Paramodulation and Superposition
The next step towards settling the Wos–Robinson conjecture and related challenges was a proof that an inference system with resolution, factoring, paramodulation, subsumption, and simplification is refutationally complete for first-order logic with equality, without adding equality axioms other than \(x\simeq x\) and without paramodulating into variables [133]. A key feature of this approach is a CSO on terms and atoms that is order-isomorphic to the positive integers. This ordering is used for simplification and, in the proof of completeness, to build semantic trees based on an enumeration of the Herbrand base, where an equation \(l\simeq r\) appears before any atom that \(l\simeq r\) can simplify. The issue encountered by Wos and Robinson with the paramodulation lifting lemma is solved by showing that it suffices to consider substitutions that replace variables by irreducible terms, so that a substitution cannot replace a variable with a ground term that can be simplified, or, equivalently, paramodulated into [99, 133, 154, 155].
A KBO is order-isomorphic to the positive integers, provided weights are positive [133], but RPOs and most other orderings are not. Thus, the Wos–Robinson conjecture was considered truly solved only when this requirement on the ordering was lifted. This result was reached with the proof of refutational completeness of an inference system called the ordered-literal strategy [99, 154, 155]. The ordered-literal strategy, or, rather, the ordered-literal inference system, features resolution, factoring, paramodulation, and superposition as expansion inference rules, and tautology deletion, subsumption, clausal simplification, demodulation, and functional subsumption as contraction inference rules.
A key characteristic of this inference system, and the reason for its name, is that the expansion inference rules are restricted to work on literals that are strictly maximal in a CSO on terms and atoms. Since clauses are multisets of literals, a literal L is maximal in a clause C if \(\lnot (\exists M\in C.\ M \succ L)\), or, equivalently, \(\forall M\in C.\ L \not \prec M\). In other words, the other literals can only be smaller, equal, or uncomparable. A literal L is strictly maximal in a clause C if \(\lnot (\exists M\in C.\ M \succeq L)\), or, equivalently, \(\forall M\in C.\ L \not \preceq M\). In other words, the other literals can only be smaller or uncomparable. If the ordering is defined on atoms as in [99, 154, 155], literals are identified with their atoms when applying the ordering. The proof that the ordered-literal inference system is refutationally complete without adding equality axioms other than \(x\simeq x\), without paramodulating into variables, and without the requirement that the CSO on terms and atoms is order-isomorphic to the positive integers, was obtained by working with transfinite semantic trees [99, 154, 155].
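The difference between maximal and strictly maximal can be made concrete with a small Python sketch, where the (possibly partial) literal ordering is given as an explicit set of (greater, smaller) pairs so that uncomparable literals are allowed; the encoding and names are illustrative.

```python
def gt(pairs, a, b):
    # strict ordering given extensionally as (greater, smaller) pairs
    return (a, b) in pairs

def maximal(pairs, i, clause):
    """clause[i] is maximal: no literal of the clause is strictly greater."""
    lit = clause[i]
    return not any(gt(pairs, m, lit) for j, m in enumerate(clause) if j != i)

def strictly_maximal(pairs, i, clause):
    """clause[i] is strictly maximal: no other occurrence is greater or
    equal; a duplicate occurrence of the same literal already defeats it."""
    lit = clause[i]
    return not any(m == lit or gt(pairs, m, lit)
                   for j, m in enumerate(clause) if j != i)

order = {("Q", "P")}                  # Q > P; R uncomparable with both
print(maximal(order, 1, ["P", "Q", "Q"]))           # True
print(strictly_maximal(order, 1, ["P", "Q", "Q"]))  # False: duplicate Q
print(strictly_maximal(order, 1, ["P", "Q", "R"]))  # True: R only uncomparable
```

Since clauses are multisets, the index-based encoding distinguishes occurrences: the duplicate Q is maximal but not strictly maximal, exactly as in the definitions above.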
Resolution and factoring are restricted to resolve upon strictly maximal literals:
$$\begin{aligned} \begin{array}{lcl} Resolution :&{} \begin{array}{c}{S\cup \{L_1 \vee C,\ L_2 \vee D\}}\\ \hline {S\cup \{L_1 \vee C,\ L_2 \vee D,\ (C \vee D)\sigma \}}\end{array} &{} L_1\sigma = \lnot L_2\sigma ,\ (1),\ (2) \\ &{} &{} \\ Factoring :&{} \begin{array}{c}{S\cup \{L_1 \vee \ldots \vee L_k \vee C\}}\\ \hline {S\cup \{L_1 \vee \ldots \vee L_k \vee C,\ (L_1 \vee C)\sigma \}}\end{array}&L_1\sigma = L_2\sigma = \ldots = L_k\sigma ,\ (1) \end{array} \end{aligned}$$
where (1) is \(\forall M\in C.\ L_1\sigma \not \preceq M\sigma \) and (2) is \(\forall M\in D.\ L_2\sigma \not \preceq M\sigma \). These ordering-based restrictions on resolution and factoring appeared in [99] and have remained in the subsequent ordering-based inference systems, including the superposition calculus \(\mathcal{SP}\), where a reduction ordering on terms is extended to literals as seen in Sect. 4.3.
For paramodulation and superposition, the challenge of solving the Wos–Robinson conjecture was intertwined with the challenge of obtaining inference rules for first-order logic with equality that reduce to the superposition rule of completion in the purely equational case. In completion, superposition is restricted to work on maximal sides of equations (cf. Sect. 6.2). Thus, collecting the restrictions on literals and those on sides of equational literals, one gets four ordering-based conditions. In order to state them, we recall some terminology and notation that applies to all versions of paramodulation and superposition in this section. The parafrom clause is written \(l\simeq r \vee C\), where \(l\simeq r\) is the parafrom literal. The parainto clause is written \(M[t] \vee D\), where M[t] is the parainto literal. If the inference system distinguishes the case where the parainto literal is an equational literal, the parainto clause is written \(p[t] \simeq q \vee D\) or \(p[t] \not \simeq q \vee D\), where \(p[t] \simeq q\) or \(p[t] \not \simeq q\) is the parainto literal, respectively. One can also say superposed-from and superposed-into with the analogous meanings. The subterm t is not a variable (i.e., \(t\not \in X\), where X is the set of variable symbols), and the substitution \(\sigma \) is the mgu of the terms l and t (i.e., \(l\sigma = t\sigma \)).
The four ordering-based conditions involved in restricting paramodulation and superposition are the following:
(i)
The parafrom literal is strictly maximal in the instance of the parafrom clause: \(\forall Q\in C.\ (l\simeq r)\sigma \not \preceq Q\sigma \);
(ii)
The left-hand side of the parafrom literal is strictly maximal in the instance of the parafrom literal: \(l\sigma \not \preceq r\sigma \);
(iii.a)
The parainto literal is strictly maximal in the instance of the parainto clause: \(\forall Q\in D.\ M[t]\sigma \not \preceq Q\sigma \) or \(\forall Q\in D.\ (p[t] \simeq q)\sigma \not \preceq Q\sigma \);
(iii.b)
If the parainto literal is a negative equational literal \(p[t] \not \simeq q\), it is maximal in the instance of the parainto clause: \(\forall Q\in D.\ (p[t] \not \simeq q)\sigma \not \prec Q\sigma \);
(iv)
If the parainto literal is a positive equational literal \(p[t] \simeq q\), its left-hand side is strictly maximal in the instance of the parainto literal: \(p[t]\sigma \not \preceq q\sigma \).
The ordered-literal inference system in [99] added to resolution and factoring as above the following Paramodulation inference rule:
$$\begin{aligned} \begin{array}{c}{S\cup \{l\simeq r \vee C,\ M[t] \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ M[t] \vee D,\ (C \vee M[r] \vee D)\sigma \}}\end{array} \ \ \ (i),\ (ii),\ (iii.a). \end{aligned}$$
Similar to the original paramodulation inference rule (cf. Sect. 6.1), this inference rule does not distinguish whether the parainto literal is equational or not. The requirement that \(t\not \in X\) and three of the four ordering-based conditions represented major restrictions with respect to the paramodulation inference rule of Robinson and Wos.
Aiming at the challenge of lifting to first-order logic superposition as in completion, the inference system of [155] replaced the paramodulation inference rule of [99] with two rules, one called superposition and one called paramodulation. Superposition applies if the parainto literal is a positive equational literal:
$$\begin{aligned} \begin{array}{c}{S\cup \{l\simeq r \vee C,\ p[t] \simeq q \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ p[t] \simeq q \vee D,\ (C \vee p[r] \simeq q \vee D)\sigma \}}\end{array} \ \ \ (ii),\ (iii.a),\ (iv). \end{aligned}$$
Paramodulation was used if the parainto literal M[t] is a nonequational literal or a negative equational literal:
$$\begin{aligned} \begin{array}{c}{S\cup \{l\simeq r \vee C,\ M[t] \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ M[t] \vee D,\ (C \vee M[r] \vee D)\sigma \}}\end{array} \ \ \ (ii),\ (iii.a). \end{aligned}$$
Thus, Superposition has Conditions (ii) and (iv) from superposition in completion (cf. Sect. 6.2), but both rules had to drop Condition (i). The inference system in [155] includes the contraction inference rules.^{4} However, as discussed in Sect. 4.3, due to the choice of the ordering, demodulation as in [155] does not reproduce the behavior of the simplification rule of completion in the equational case. Therefore, the inference system of [155] generalized completion only as far as the superposition inference rule is concerned.
The conjecture as to whether an ordering-based inference system is still refutationally complete if all four ordering-based conditions are imposed remained open. It was answered affirmatively with the development of the superposition calculus \(\mathcal{SP}\) [10]. As already discussed in Sect. 4.3 for demodulation, a basic, but crucial, ingredient is the appropriate extension of the ordering on terms to literals. Another key ingredient is the addition of a new expansion inference rule [10, 130]:
$$\begin{aligned} Equational\ Factoring \ \ \ \begin{array}{c}{C\vee u\simeq s \vee u^\prime \simeq s^\prime }\\ \hline {(C\vee s\not \simeq s^\prime \vee u\simeq s^\prime )\sigma }\end{array}\ \ \ u\sigma = u^\prime \sigma ,\ u\sigma \not \preceq s\sigma ,\ (v), \end{aligned}$$
where Condition (v) is \(\forall Q\in C\cup \{u^\prime \simeq s^\prime \}.\ (u\simeq s)\sigma \not \prec Q\sigma \). This rule is a generalization of factoring that can be seen as a conditional factoring rule. If it holds that \(u\sigma = u^\prime \sigma \) and \(s\sigma = s^\prime \sigma \), that is, \((u\simeq s)\sigma = (u^\prime \simeq s^\prime )\sigma \), factoring can be applied. Equational factoring tests only \(u\sigma = u^\prime \sigma \), provided \(u\sigma \not \preceq s\sigma \), and adds \(s\sigma \simeq s^\prime \sigma \) as a condition, hence negated, in the generated clause.
The superposition calculus \(\mathcal{SP}\) uses only the name superposition [10, 130]. In \(\mathcal{SP}\) even resolution becomes a special case of superposition, because all literals are transformed into equational literals as seen in Sect. 4.3. However, for continuity, we refrain from subsuming resolution into superposition, and we still use the name paramodulation when the parainto literal is not equational. For Paramodulation the three applicable ordering-based conditions are restored:
$$\begin{aligned} \begin{array}{c}{S\cup \{l\simeq r \vee C,\ M[t] \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ M[t] \vee D,\ (C \vee M[r] \vee D)\sigma \}}\end{array} \ \ \ (i),\ (ii),\ (iii.a). \end{aligned}$$
Superposition affords all four ordering-based conditions:
$$\begin{aligned} \begin{array}{cl} \begin{array}{c}{S\cup \{l\simeq r \vee C,\ p[t] \simeq q \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ p[t] \simeq q \vee D,\ (C \vee p[r] \simeq q \vee D)\sigma \}}\end{array} &{} (i),\ (ii),\ (iii.a),\ (iv), \\ &{} \\ \begin{array}{c}{S\cup \{l\simeq r \vee C,\ p[t] \not \simeq q \vee D\}}\\ \hline {S\cup \{l\simeq r \vee C,\ p[t] \not \simeq q \vee D,\ (C \vee p[r] \not \simeq q \vee D)\sigma \}}\end{array}&(i),\ (ii),\ (iii.b),\ (iv), \end{array} \end{aligned}$$
with the weaker version of the third one (Condition (iii.b) in place of Condition (iii.a)) when the parainto literal is negative. These versions of paramodulation and superposition, together with resolution, factoring, equational factoring, tautology deletion, subsumption, and simplification form the refutationally complete inference system \(\mathcal{SP}\) for first-order logic with equality. The proof of refutational completeness was obtained by an approach based on rewrite models [10] that became a standard (e.g., [123]) and was also reformulated in terms of semantic trees [92].
In summary, the superposition calculus [10‐12] imposed the strongest known ordering-based restrictions on expansion rules, and met the challenge of getting a refutationally complete inference system for first-order logic with equality that reduces to completion if the input is purely equational.^{5} For these reasons, the superposition calculus became the standard ordering-based inference system.
7 Discussion
With the set of support [187], demodulation [188], and paramodulation [147], Larry Wos contributed three fundamental ideas that have nurtured research on theorem proving for decades and are still fruitful today.
The set-of-support strategy influenced both search plan design, as witnessed by the given-clause algorithm [127, 128], and inference rule design, beginning with semantic resolution [161]. Since the given-clause algorithm is at the heart of contemporary provers (e.g., [66, 96, 110, 159, 170, 180]), it continues to be an object of study. The design of heuristic evaluation functions for the selection of the given clause has been an active research topic (e.g., [1, 44, 71, 72, 127, 128, 160]). For example, the search may employ multiple evaluation functions by maintaining multiple priority queues with either parallel search [46] or interleaving [159]. The weight of clauses can be used to break ties when the best clause according to an evaluation function is not unique [91].
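As a rough sketch, the given-clause loop can be rendered as follows. The clause representation (lists of literal strings), the symbol-count `weight` heuristic, and the abstract `generate` parameter (all expansion inferences between the given clause and the processed set) are simplifying assumptions for illustration, not the design of any particular prover:

```python
import heapq
from itertools import count

def weight(clause):
    # Symbol-count heuristic: total number of symbols over all literals.
    return sum(len(literal.split()) for literal in clause)

def given_clause_loop(input_clauses, generate, max_steps=1000):
    """Saturation by the given-clause algorithm: repeatedly extract the
    best clause from the passive queue, move it to the processed set, and
    push all conclusions of inferences between it and the processed
    clauses back onto the passive queue."""
    tie = count()          # age-based tie-breaker; keeps heap entries comparable
    passive = []           # priority queue ordered by (weight, age)
    for c in input_clauses:
        heapq.heappush(passive, (weight(c), next(tie), c))
    processed = []
    for _ in range(max_steps):
        if not passive:
            return "saturated", processed
        _, _, given = heapq.heappop(passive)   # best clause by the evaluation function
        if given == []:                        # empty clause: contradiction derived
            return "unsatisfiable", processed
        processed.append(given)
        for conclusion in generate(given, processed):
            heapq.heappush(passive, (weight(conclusion), next(tie), conclusion))
    return "timeout", processed
```

A prover maintaining multiple priority queues would keep several such heaps, each ordered by a different evaluation function, and alternate between them according to a pick ratio; the single heap above is the minimal case.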
The ideas in the set-of-support strategy and in semantic resolution were generalized and developed into notions of semantic guidance, goal-sensitivity, and hyperinference, which have had an impact also beyond resolution-based theorem proving, including tableaux-based methods (e.g., [14, 17, 19]), instance-based methods (e.g., [142]), and SGGS [59, 60] (see [51] for a survey with an emphasis on these features).
The challenge of getting the theorem-proving strategy to focus on the conjecture to be proved, which Larry Wos meant to address with the set-of-support strategy, is more relevant than ever, given the growth of large and very large knowledge bases in mathematics and other domains (e.g., [145, 172]). The existence of such knowledge bases also poses the problem of applying theorem proving to check their consistency: the meaning and impact of semantic guidance and goal-sensitivity for this problem is still uncharted territory.
Larry’s concept of irrelevant, or, dually, relevant, inferences and clauses was formalized and generalized [94], and his notion that inferences should be general resurfaced in investigations of abstraction in resolution theorem proving (e.g., [135]). The already mentioned instance-based methods (see [45, 51, 103, 108] for surveys) explore a complementary direction that is often most fruitful for model building given a satisfiable input.
Larry’s UR resolution hyperinference rule [188] became a standard feature of resolution-based theorem provers and beyond. For example, UR resolution was used to generate unit lemmas for PTTP (Prolog Technology Theorem Proving) provers [167] that implemented model elimination [118, 119] on top of a Prolog engine such as the Warren Abstract Machine [179]. A similar idea was pursued in a parallel setting [169]. Both sequential and parallel tableaux-based theorem provers such as SETHEO [115] and CPTHEO [86] preprocessed the input with respect to UR resolution, unit resolution, and unit subsumption.
The thread of research that Larry Wos opened with the notion of demodulation has been a major one in theorem proving and continues to the present. Well-founded demodulation is a fundamental inference rule for equality in all reasoning contexts. Under the name of simplification or rewriting, it was generalized to conditional rewriting, or reasoning in Horn equational logic (e.g., [10, 50, 109]), and to contextual rewriting (e.g., [90, 97, 181, 189]), with applications also beyond theorem proving. Furthermore, Larry’s notion of applying the rule for a predefined number k of steps, as in k-modulation, may still be useful in practice. For example, it may be employed as a form of preprocessing when no suitable well-founded ordering orients defining equations in the desired direction [174].
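The distinction between well-founded demodulation and k-modulation can be sketched in a few lines. The sketch below assumes ground demodulators only (terms as nested tuples, no variable matching) purely for brevity; the helper names are invented for this illustration:

```python
def rewrite_once(term, rules):
    """Apply the first matching demodulator at the outermost position,
    else recurse into subterms; return (new_term, True) on success."""
    for lhs, rhs in rules:
        if term == lhs:
            return rhs, True
    if isinstance(term, tuple):
        head, *args = term
        for i, arg in enumerate(args):
            new_arg, changed = rewrite_once(arg, rules)
            if changed:
                return (head, *args[:i], new_arg, *args[i+1:]), True
    return term, False

def demodulate(term, rules, k=None):
    """Rewrite to normal form; with k set, stop after k steps (k-modulation).
    Termination without the bound is guaranteed only when the rules decrease
    a well-founded ordering; the step bound k makes the rule safe even when
    no such ordering orients the equations in the desired direction."""
    steps, changed = 0, True
    while changed and (k is None or steps < k):
        term, changed = rewrite_once(term, rules)
        steps += changed
    return term
```

For instance, with the chain of demodulators a -> b and b -> c, unbounded demodulation normalizes a to c, whereas 1-modulation stops at b.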
The efficient implementation of demodulation, and more generally contraction, is an active research topic, because theorem provers may spend a lot of time performing contraction (e.g., [97]). As is typical in theorem proving, the issue is one of finding a good balance between the eagerness and the cost of contraction. For example, one can distinguish between full-fledged contraction and cheap simplification (e.g., demodulation by rewrite rules) [158, 159] or light simplification (e.g., demodulation by ground rewrite rules) [80], which are less expensive and can be applied more eagerly. If given clause C generates a set N of new clauses, immediate simplification [80] consists of interreducing N and then applying it to backward-contract the clauses in the usable set. If clause C itself is deleted in the process, all clauses in N can be deleted as orphans, except those that justify the deletion of C.
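A schematic rendering of immediate simplification may help fix the bookkeeping. Here `simplify(clause, by)` is an assumed black-box returning the simplified clause (or None if the clause is deleted) together with the clauses of `by` that were used; all names are illustrative, not taken from any real prover:

```python
def immediate_simplification(given, new_clauses, usable, simplify):
    """Interreduce the set N of clauses generated by `given`, then use N to
    backward-contract the usable clauses; if `given` itself is deleted, its
    offspring in N are orphans, except the clauses justifying the deletion."""
    # Step 1: interreduce the newly generated clauses among themselves.
    n = []
    for i, c in enumerate(new_clauses):
        c2, _ = simplify(c, new_clauses[:i] + new_clauses[i+1:])
        if c2 is not None:
            n.append(c2)
    # Step 2: backward-contract the usable clauses with the interreduced N,
    # remembering which clauses of N justify a deletion of `given`.
    kept_usable, justifiers, given_deleted = [], [], False
    for c in usable:
        c2, used = simplify(c, n)
        if c2 is None:
            if c is given:
                given_deleted, justifiers = True, used
        else:
            kept_usable.append(c2)
    if given_deleted:
        # Orphan deletion: drop the offspring of the deleted given clause,
        # keeping only the clauses that justified its deletion.
        n = [c for c in n if c in justifiers]
    return n, kept_usable
```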
Larry Wos pioneered a paramodulation principle for building equality into resolution, that many other researchers, over several decades, endeavoured to bring to maturity, merging it successfully with completion-based theorem proving (e.g., [6, 7, 9, 49, 54, 98, 101, 107]). The resulting ordering-based inference systems (e.g., [10, 99, 133, 155]) are refutationally complete, combining expansion inference rules such as resolution, factoring, paramodulation, and superposition, with contraction inference rules such as subsumption and well-founded demodulation or simplification. The number of years and people involved, starting from different angles and with different motivations, shows the greatness of Larry’s original insight.
Subsumption and simplification are based on distinct well-founded orderings. An abstract framework treating these two contraction principles in a unified manner was developed [178]. Another area of investigation is the reproduction and verification of the proofs of refutational completeness of ordering-based inference systems (e.g., [12]) in proof assistants [156, 178].
Ordering-based inference systems were implemented first in the Otter theorem prover [127, 128], which Larry used for his experiments throughout his long career, and then in most subsequent resolution-based theorem provers, up to those that represent the state of the art for first-order logic with equality (e.g., the E prover [159], Spass [180], Vampire [110], Waldmeister [96], Zipperposition [174], and GKC [170]). The growth of superposition-based theorem proving was a main reason for the evolution of the given-clause algorithm from an implementation of the set-of-support strategy into a general algorithm for implementing multiple strategies. Indeed, the set-of-support strategy is not complete in general for either ordered resolution in first-order logic, or paramodulation and superposition in first-order logic with equality, unless the complement of the set of support is saturated with respect to the inference system in a preprocessing phase [10], which defeats the spirit of the strategy. Making reasoning goal-sensitive, or target-oriented, is more challenging in the presence of equality [54].
The power and flexibility of ordering-based inference systems are witnessed by the fact that they allow some theory reasoning (e.g., [13, 54, 79, 89, 100, 104, 134, 166, 168, 177]), yield decision procedures (e.g., [5, 84, 87, 102, 164, 165]), can be interfaced with other reasoning paradigms (e.g., [23, 58, 80, 144]), form the basis of approaches to parallel theorem proving (e.g., [42, 43, 46, 53]; see [48] for a survey), and are generalized to higher-order logic as in lambda-superposition [24, 26-28, 40, 131, 175, 176] and in combinatory superposition [29-32].
Acknowledgements
Parts of this work were done while the author was participating in a program at the Simons Institute for the Theory of Computing, and visiting the Computer Science Laboratory of SRI International, whose support is greatly appreciated. The author thanks very much the anonymous reviewers for their precious technical remarks, and Wolfgang Bibel for his comments on the early history of theorem proving.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.