## 1 Introduction

The bottom-up model generation (BUMG) paradigm encompasses a wide family of calculi and proof procedures designed to construct models of given clause sets by reading clauses as rules and applying them in a bottom-up way until saturation. Variants of (positive) hyperresolution, grounding tableau calculi, hypertableau calculi and instantiation-based methods belong to this family. BUMG methods have been known for a long time to be useful for proving theorems and the dual task of computing models for satisfiable problems, which is recognized as important to solving mathematical problems [96, 99], program synthesis [59], in software engineering [30], model checking, and other applications for fault analysis [9] and debugging of logical specifications [18, 70].

This paper discusses several ways of enhancing the methods in the BUMG paradigm for first-order logic with equality (function symbols are allowed). Our presentation is focused on the use of hyperresolution, or the positive strategy or selection-based refinement in the framework of resolution and superposition defined over complete simplification orders and other refinements [4, 48, 68, 75] as a BUMG method. We assume that interpretations are Herbrand interpretations defined by ground atom sets (which is a commonly used definition). To compute such models (i) clauses must be transformed into range-restricted form and (ii) these methods must be enhanced with splitting of positive ground clauses.

Anzeige

These observations are based on the following commonly exploited insights, e.g. [24, 42, 53, 63]. A clause is range-restricted when every variable occurring in the positive part of the clause (the head) also occurs in the negative part of the clause (the body). In this class all positive clauses (those without any negative literals) are ground. The class of range-restricted clauses is a reduction class for first-order logic; this means that every first-order formula (and also every first-order clause set) can be transformed in an equi-satisfiability and model-preserving way into a set of range-restricted clauses. In general, hyperresolution resolves n positive clauses with the main clause and the resolvent is again a positive clause. Consequently, on range-restricted clauses, hyperresolution resolves n ground, positive clauses with the main clause, always producing a positive, ground resolvent. In a calculus with splitting, the ground clauses are split into ground positive units and derivations are tree derivations. The positive units in an open branch in such a derivation can be seen to define a (partial) Herbrand model of the input clause set.

One of the contributions of the paper is the introduction to first-order logic of blocking techniques partially inspired by techniques successfully used in description and modal logic tableau-based theorem proving [2, 50, 79]. In this context blocking is an important mechanism to discover periodicity and merging worlds, or objects, which creates loops in the model reducing the otherwise infinite, freely generated (Herbrand) model to a finite model.

^{1}A common form of blocking or loop checking in this context is ancestor blocking. In [16] we introduced a first-order version of ancestor blocking and showed how blocking techniques of description and modal logic tableau-based theorem provers can be generalized to full first-order logic with equality. In the present paper this blocking method is named subterm domain blocking. An unrestricted version of this blocking technique, called unrestricted blocking, was introduced in [79] showing that it allows semantic tableau methods to decide the expressive description logic \(\mathcal{ALBO}\). Unrestricted blocking was shown to be a strong technique to devise general tableau-based decision procedures for modal-type logics and expressive description logics with complex role operators [80, 82] and is an important ingredient in the tableau synthesis framework and the prover generator MetTeL [56, 81].Subterm domain blocking and unrestricted blocking drop the requirement to check for a loop in the partially constructed model and allows for the merging of terms to be undone [16, 79], which is an important departure from loop-checking. Roughly, the idea of unrestricted blocking is to add a clause \(x\approx y \vee x\not \approx y\) to the input set and use it via a range-restriction transformation and hyperresolution to produce instances \(s\approx t \vee s\not \approx t\) for all ground terms s and t. However these clauses are not treated as tautologies, because \(\not \approx \) in the literal \(x\not \approx y\) is a new predicate constrained by the clause \(\lnot (x \not \approx y) \vee \lnot (x\approx y)\). This means \(x\not \approx y\) should be thought as being a positive literal and \(x\approx y \vee x\not \approx y\) is a positive clause. Splitting on such a clause creates one branch where \(s\approx t\) holds and a second branch where it does not hold, but \(s\not \approx t\) holds. In the left branch, \(s\approx t\) is handled by rewriting; this means that s and t are merged (and reduced to an irreducible term with respect to all the equalities present on the current branch). We can think of the larger of the two terms as being blocked by the other one. When no models can be found in the left subderivation rooted by the equation \(s\approx t\), standard backtracking reverts the blocking of s and t and the inference process continues with the right branch asserting \(s\not \approx t\). This avoids that the two terms are equated again in subsequent steps of the derivation.

Since the clause \(x\approx y \vee x\not \approx y\) (together with the ‘definition’ \(\lnot (x \not \approx y) \vee \lnot (x\approx y)\) of \(\not \approx \)) is tautologous it can be added to any set of clauses; consequently blocking in this form is generally sound and can be widely used. Any instance of the clause is a tautology (taking into account the ‘definition’ of \(\not \approx \)), thus blocking for any specific terms s and t is sound. Even with a constraint \(\mathcal{B}\) the clause \(\mathcal{B}\rightarrow x\approx y \vee x\not \approx y\) preserves soundness and thus realizes a form of blocking but is triggered only when the constraint \(\mathcal{B}\) holds. For instance, the constraint could trigger merging of two terms s and t only when s is a subterm of t. This generalises dynamic anywhere ancestor blocking from the description logic literature to first-order logic.

Anzeige

In this paper we present four different blocking techniques for varying the kinds of models that can be found. The difference between the four techniques is how restrictive the blocking is. Unrestricted domain blocking ensures that domain minimal models are generated. With subterm domain blocking or subterm predicate blocking larger models are produced because two terms are only merged if one is a subterm of the other. With unrestricted predicate blocking and subterm predicate blocking two terms are merged if they both belong to the extension of a unary predicate symbol, the intention being that less constrained (and thus larger) finite model can be found. This is useful for applications with types.

The second contribution of the paper is a refinement of the ‘transformation to range-restricted form’ introduced by Winker [96] as a trick to find small counter-examples for solving axiom independence problems. This technique has been used by Manthey and Bry [63] to build the SATCHMO prover based on hyperresolution and splitting. An improved range-restriction transformation can be found in [15]. These range-restricting transformations have the property that they force BUMG methods to enumerate the entire Herbrand universe and are therefore non-terminating except in the simplest cases. Two solutions are presented in this paper: One solution is to combine classical range-restriction transformations [63, 96] with blocking. Another solution is to modify the range-restricting transformation so that new terms are created only when needed. All the BUMG enhancements introduced in this paper are defined as transformations of the clause sets and are shown to preserve satisfiability equivalence or equivalence with respect to E-satisfiability, where E is the theory of equality.

Since the enhancements are encoded on the clausal level for use with standard hyperresolution techniques, we have implemented them using the SPASS theorem prover [93, 94]. The third contribution is a comprehensive evaluation of the different techniques on the TPTP problem library [89] with an analysis of the results leading to interesting findings.

The transformations enable standard resolution theorem proving approaches to decide the Bernays-Schönfinkel class (with or without equality).

^{2}For this class blocking is not essential. Moreover, we claim that BUMG methods with blocking can decide fragments of first-order logic with the finite model property and can construct finite models for finitely satisfiable formulae.The structure of the paper is as follows. Definitions of basic terminology and notation can be found in Sect. 2. In Sect. 3 we recall the characteristic properties of BUMG methods and recall the definition of hyperresolution with splitting. The main part of the paper are Sects. 4 to 10. Sections 4, 5, 6 and 7 define new techniques for generating small models and generating them more efficiently. The techniques are based on a series of transformations including a refined range-restricting transformation (Sect. 4), its modification for equality (Sect. 5), instances of standard renaming and flattening (Sect. 6), and the introduction of blocking in various forms via amendments of the clause set and standard saturation-based equality reasoning (Sect. 7). The transformations are shown to be sound and complete in Sect. 8. One consequence of the results is a general decidability result of the Bernays–Schönfinkel class for all BUMG methods and related approaches. This is presented in Sect. 9. In Sect. 10 we present and discuss results of an evaluation carried out with our implementation in SPASS-Yarralumla on clausal problems in the TPTP library. Sect. 11 summarizes and discusses related work.

This paper is an extended and improved version of [16].

## 2 Basic Definitions

We use standard terminology from automated reasoning. We assume as given a signature \(\Sigma = \Sigma _f \cup \Sigma _P\) of function symbols \(\Sigma _f\) and predicate symbols \(\Sigma _P\) of given arities. We assume \(\Sigma _f\) contains at least one constant symbol. As we are working (also) with equality, we assume \(\Sigma _P\) contains a distinguished binary predicate symbol \(\approx \). Terms, atoms, literals and formulas over \(\Sigma \) and a given (denumerable) set of variables V are defined as usual. In particular, every term is either a variable or a functional term\(f(t_1,\ldots ,t_n)\), i.e., the application of an n-ary function symbol f to n terms \(t_1,\ldots ,t_n\), for some \(n \ge 0\). A function symbol c of arity 0 is also called a constant and we just write c instead of the functional term c(). A functional term is called compound (or proper) if it is not a constant. An atom is of the form \(P(t_1,\ldots ,t_n)\) where P is an n-ary predicate symbol, for some \(n \ge 0\). Equations are atoms of the form \({\approx } (t_1,t_2)\), which we always write using infix notation as \(t_1 \approx t_2\). A literal is an atom or the negation of an atom. When L is a literal we write L[t] to indicate that L contains a subterm t at a position left implicit.

We treat the Boolean operators \(\wedge \) and \(\vee \) as multi-arity operators that ignore the order of their arguments, which is justified by their associativity and commutativity properties. In other words, for instance, a disjunction \(H_1 \vee \cdots \vee H_n\) of atoms is treated as the (possibly empty) multiset \(\{H_1,\ldots ,H_n\}\); similarly for conjunctions.

A clause is a (finite) implicitly universally quantified disjunction \(\lnot B_1\vee \cdots \vee \lnot B_k \vee H_1 \vee \cdots \vee H_m\) of literals, where \(m,k \ge 0\). To emphasize their bottom-up operational semantics we write clauses in implication style \(B_1 \wedge \cdots \wedge B_k \rightarrow H_1 \vee \cdots \vee H_m\). The part \(H_1 \vee \cdots \vee H_m\) is called the head of the clause and \(B_1 \wedge \cdots \wedge B_k\) is its body. We write \(\bot \) for the empty head. Each \(H_i\), if any, is called a head atom, and each \(B_j\), if any, is called a body atom. Clauses with empty head are called negative clauses, and clauses with empty body are called positive clauses. Positive clauses are written as \(H_1 \vee \cdots \vee H_m\), instead of \({} \rightarrow H_1 \vee \cdots \vee H_m\). Notice the empty clause is written as \(\bot \).

When writing expressions such as \(B \wedge \mathcal{B}\rightarrow H \vee \mathcal{H}\) we mean any clause whose head literals are H and those in the disjunction (or multiset) of literals \(\mathcal{H}\), and whose body literals are B and those in the conjunction (or multiset) of literals \(\mathcal{B}\).

Let F be a term, literal, head, body or clause. By \({{\,\mathrm{var}\,}}(F)\) we denote the set of variables occurring in F. If \({{\,\mathrm{var}\,}}(F) = \emptyset \) then F is ground. As usual, a substitution is a mapping from variables to terms that is the identity almost everywhere. We identify a substitution \(\sigma \) with its homomorphic extension to terms, literals, heads, bodies and clauses. Consequently, \(\sigma (F)\) denotes the result of simultaneously replacing in F every occurrence of every variable \(x \in {{\,\mathrm{var}\,}}(F)\) by \(\sigma (x)\). We use the notion of (most general) unifiers in the standard way. Given two terms or atoms \(e_1\) and \(e_2\) we write \(\sigma ={{{\,\mathrm{mgu}\,}}}(e_1,e_2)\) to indicate that the substitution \(\sigma \) is a most general unifier of \(e_1\) and \(e_2\). We also need the notion of simultaneous most general unifiers: given a set \(E = \{(e_1^1, e_2^1),\ldots ,(e_1^n, e_2^n) \}\) of pairs of terms or atoms, where \(n \ge 0\), we say that \(\sigma \) is a simultaneous most general unifier ofE and write \(\sigma = {{\,\mathrm{smgu}\,}}(E)\) iff \(\sigma \) is a most general substitution such that \(\sigma (e_1^i) = \sigma (e_2^i)\) for all \(i\in \{1,\ldots ,n\}\). It is well-known that a simultaneous most general unifier can always be computed (if it exists) by iterated most general unifier computation.

For a given atom \(P(t_1,\ldots ,t_n)\), the terms \(t_1,\ldots ,t_n\) are called the top-level terms of \(P(t_1,\ldots ,t_n)\) (P being \(\approx \) is permitted). This notion generalizes to clause bodies, clause heads and clauses as expected. For example, for a clause \( \mathcal{B}\rightarrow \mathcal{H}\) the top-level terms of its body \(\mathcal{B}\) are exactly the top-level terms of its body atoms.

A (standard first-order) interpretationI consists of a non-empty domain, denoted as |I|, and interpretation functions \(P^I\) and \(f^I\) on |I|, for every \(P \in \Sigma _P\) and \(f \in \Sigma _f\), as usual. A valuation\(\mu \) is a (total) mapping from the set of variables to |I|. We form pairs \((I,\mu )\) for evaluating terms, atoms and formulas in the usual way and write, e.g., \((I,\mu )(t)\) for the result of evaluating the term t. We write \((I,\mu ) \models F\) to denote the fact that \((I,\mu )\) satisfies F. If a term t does not have free variables we may write I(t) instead of \((I,\mu )(t)\) without loosing anything. Similarly for atoms, formulas and clauses.

As usual, the Herbrand universe\({{\,\mathrm{HU}\,}}_\Sigma \)of\(\Sigma \) is the set of all ground terms of the signature \(\Sigma _f\), and the Herbrand base\({{\,\mathrm{HB}\,}}_\Sigma \)of\(\Sigma \) is the set of all ground atoms of \(\Sigma \). A Herbrand interpretationI is identified with a subset of ground atoms in \({{\,\mathrm{HB}\,}}_\Sigma \), namely, those that are true in the interpretation. Using the above terminology, for every Herbrand interpretation I we have \(|I| = {{\,\mathrm{HU}\,}}_\Sigma \) and \(I(t) = t\) for every \(t \in {{\,\mathrm{HU}\,}}_\Sigma \).

We work mostly, but not always, with Herbrand interpretations. Of particular interest are standard first-order interpretations that behave like Herbrand interpretations for a subset of the Herbrand universe. They are introduced as quasi-Herbrand interpretations below. Indeed, the purpose of our transformations is to enable BUMG methods to construct such interpretations with a domain as small as possible (and ideally a finite one).

Let us say that a set \(U \subseteq {{\,\mathrm{HU}\,}}_\Sigma \) of ground terms is subterm-closed if with every \(t \in U\) the set U also contains every subterm of t.

Definition 1

(Quasi-Herbrand interpretation) A quasi-Herbrand interpretation is a standard first-order interpretation I with domain \(|I| \subseteq {{\,\mathrm{HU}\,}}_\Sigma \) such that |I| is subterm-closed and \(I(t) = t\) for every \(t \in |I|\). Furthermore, |I| must contain at least one constant c.

Every Herbrand interpretation is a quasi-Herbrand interpretation by taking \(|I| = {{\,\mathrm{HU}\,}}_\Sigma \). Unlike in Herbrand interpretations, function symbols are not always trivially interpreted as total functions in quasi-Herbrand interpretations. For instance, in the presence of a constant \(\mathsf{a}\), a unary function symbol \(\mathsf{f}\), and the domain \(|I| = \{ \mathsf{a}, \mathsf{f}(\mathsf{a}) \}\), say, one has to assign a value in the interpretation to every term. However \(\mathsf{f}(\mathsf{f}(\mathsf{a}))\), for instance, cannot be evaluated to itself, as \(\mathsf{f}(\mathsf{f}(\mathsf{a})) \notin |I|\) and, hence, one has to have \(\mathsf{f}^I(\mathsf{f}(\mathsf{a})) = \mathsf{a}\) or \(\mathsf{f}^I(\mathsf{f}(\mathsf{a})) = \mathsf{f}(\mathsf{a})\). But still we can specify quasi-Herbrand interpretations I as sets of atoms, as with Herbrand interpretations, however in conjunction with interpretation functions \(f^I\) for every \(f \in \Sigma _f\).

Satisfiability and validity in a Herbrand interpretation of ground literals, clauses, and clause sets are defined as expected. We write \(I \models F\) to denote that I satisfies F, where F is a ground literal, clause or clause set (read conjunctively). For instance, for a ground clause \(\mathcal{B}\rightarrow \mathcal{H}\) it holds that \(I \models \mathcal{B}\rightarrow \mathcal{H}\) iff \(I \not \models \mathcal{B}\) or \(I \models \mathcal{H}\). The satisfaction relation \(\models \) carries over to non-ground clauses and clause sets by taking a clause as the set of its ground instances and a clause set as the union of their sets of ground instances, respectively. If M is a clause set and \(I \models M\) we say that Iis a Herbrand model ofM.

If I is instead a quasi-Herbrand interpretation then, as the domain |I| consists of terms, we can still define satisfaction of non-ground clauses (and clause sets) in terms of substitutions instead of using pairs \((I,\mu )\). For instance, \(I \models \mathcal{B}\rightarrow \mathcal{H}\) iff for all substitutions \(\gamma \) with codomain |I| it holds that \(I \models (\mathcal{B}\rightarrow \mathcal{H})\gamma \).

An E-interpretation is a standard first-order interpretation I such that \((I,\mu ) \models s \approx t\) if and only if \((I,\mu )(s) = (I,\mu )(t)\). We use the notion of E-(un)satisfiability in the expected way, as (un)satisfiability with respect to E-interpretations. In particular, if M is a clause set and \(I \models M\) we say that Iis an E-model ofM.

An E-interpretation is not necessarily a Herbrand interpretation, but E-satisfiability is the same as satisfiability in Herbrand interpretations that interpret \(\approx \) as a congruence relation. In such Herbrand interpretations the following is true: for every ground term t, \(t \approx t \in I\), and for every ground atom A (including ground equations) whenever \(I \models A[s]\) and \(I \models s\approx t\), then \(I \models A[t]\). Alternatively, given such a Herbrand interpretation I its induced E-interpretation\(I_\approx \) is given by the domain \(I_\approx = |I|/{\approx }\), the partitioning of |I| into its equivalence classes modulo \(\approx \). Let \([t]_\approx \), or just [t], denote the equivalence class modulo \(\approx \) of a ground term \(t \in |I|\) (\(= {{\,\mathrm{HU}\,}}_\Sigma \)). Each n-ary function \(f \in \Sigma _f\) is interpreted in \(I_\approx \) as \(f^{I_\approx }([t_1],\ldots ,[t_n]) = [f(t_1,\ldots ,t_n)]\), and each n-ary predicate symbol \(P \in \Sigma _P\) is interpreted in \(I_\approx \) as \(P^{I_\approx }([t_1],\ldots ,[t_n]) = P^I(t_1,\ldots ,t_n)\). This is a well-known construction and it holds that a clause set M is satisfied by a model I iff it is E-satisfied by \(I_\approx \). See [36] for details.

Yet another characterization is to add to a given clause set M its equality axioms \(\text {EAX}(\Sigma )\), that is, the axioms expressing that \(\approx \) is a congruence relation on the terms and atoms induced by the predicate symbols \(\Sigma _P\) and function symbols \(\Sigma _f\) occurring in M. It is well-known that M is E-satisfiable iff \(M \cup \text {EAX}(\Sigma )\) is satisfiable [36].

## 3 BUMG Methods

Reasoners based on model generation approaches establish the satisfiability of a problem by trying to build a model for the problem. In this paper we are interested in bottom-up model generation approaches (BUMG) for first-order logic with equality and function symbols. BUMG approaches use a forward reasoning approach where implications or clauses, \(\mathcal{B}\rightarrow \mathcal{H}\), are read as rules and are repeatedly used to derive (instances of) the head \(\mathcal{H}\) from (instances of) the body \(\mathcal{B}\) until saturation is achieved.

The family of BUMG approaches includes many familiar calculi and proof procedures such as Smullyan-type semantic tableaux [86], SATCHMO [41, 63], positive unit hyperresolution (PUHR) tableaux [23, 24], the model generation theorem prover MGTP [37] and hypertableaux [12]. A well-established and widely known method for BUMG is hyperresolution [75]. In the framework of resolution and superposition-based on complete simplification orders, positive strategies are BUMG methods (also known as negative selection-based refinements). Relevant references are [32] for the Horn case and [4, 48] for the general case. As outlined in the Introduction, BUMG can be realized by simple combinations of insights and results based on such hyperresolution-style methods. In a nutshell, what is exploited is that any first-order formula can be effectively transformed into a class of range-restricted clauses, and on range-restricted clauses these methods are instantiation methods, the positive premises are always positive ground clauses and the conclusion is a positive ground clause. With splitting they produce models represented as sets of positive ground atoms, which is a common representation of Herbrand models (e.g. [42, 63]).

In this paper we describe methods for transforming a given clause set into another clause set so that BUMG methods benefit from the transformation. Our theoretical results are all of a semantic nature, in essence, equi-satisfiability results, which are complemented by practical experiments with a BUMG prover. Strictly speaking, we do not need to get into the details of the calculi or proof procedures behind BUMG methods for the purpose of this paper. For the sake of self-containedness, however, we summarize one BUMG method, the hyperresolution calculus extended with a splitting rule. A good in-depth overview of all aspects of the following brief exposition can be found in [6, 92], see also [31, 42, 51, 53, 63, 73] where splitting is used in resolution calculi, including hyperresolution calculi. The methods introduced below can be adapted to other BUMG methods without great effort.

In its most basic form, the hyperresolution calculus consists of two inference rules, positive hyperresolution (HRes) and positive factoring (Fact):

×

On ground clauses, positive factoring amounts to the elimination of duplicate head atoms. In the sequel we often omit the inference rule qualifier ‘positive’.

A derivation then is a (possibly infinite) sequence of clauses that starts with the given clauses, in any order, and that is extended step by step with the conclusion of an inference rule applied to premise clauses earlier in the sequence. A derivation that contains (ends with) the empty clause is called a refutation, which indicates that the given clause set is unsatisfiable.

Proof procedures for resolution do not directly search for derivations (say, by backtracking) and instead are saturation-based. That is, a clause set initially comprised of the given clauses is closed under application of the inference rules modulo deletion of redundant clauses (e.g., subsumed clauses, tautological clauses). Once the empty clause is derived a refutation can be extracted easily from the history of inference rule applications. If the empty clause is not derived then the given clause set is satisfiable. In the finite case, a (Herbrand) model can be extracted from the positive clauses of the saturated state. Specifically, for each such clause \(\mathcal{H}\) it suffices to pick an atom \(H \in \mathcal{H}\) and assign true in the model to all ground instances of H. While this is a crude scheme (and can be improved in practice by ordering restrictions) it is helpful for getting an understanding of the transformations below.

Most BUMG methods for first-order logic include some form of \(\beta \)-rule (in tableaux parlance) or splitting rule. If \(C \vee D\) is a clause such that C and D are both non-empty and do not share variables then we say that \(C \vee D\) is splittable. The terminology ‘splitting \(C \vee D\)’ refers to deriving from a clause set \(M \cup \{ C \vee D \}\) the two clause sets \(M \cup \{ C \}\) and \(M \cup \{ D \}\). By variable-disjointness, \(M \cup \{ C \vee D \}\) is equi-satisfiable with the conjunction of the two latter clause sets. Notice that derivations (or saturations) with a splitting rule enabled are tree-shaped now, in accordance with the splittings applied in the course of a derivation. Moreover, the empty clause has to be derived in every branch to make a refutation. As for model generation, it suffices to derive one branch in a saturated state and extract a model from that.

The splitting rule can be defined with arbitrary restrictions, as deemed useful, without loosing refutational completeness. For instance, the SPASS prover [93, 94] implements a version that splits on positive parts of clauses only. In the hyperresolution context it can be defined (approximately) as follows:

×

Making the Split rule mandatory, however, can be useful for model computation. In particular, if all given and derived positive, non-unit clauses \(H \vee \mathcal{H}\) are splittable into a unit clause H and a rest-clause \(\mathcal{H}\) then every branch not containing the empty clause in its saturated state specifies a model simply by these unit clauses. Indeed, our transformations below make sure it is always the case that all derived positive non-unit clauses\(H \vee \mathcal{H}\)are splittable in this way, and, moreover, that the (ground) unit clausesHin a saturated state not containing the empty clause specify a model of the given clause set.

Our experiments show the splitting rule is useful for BUMG (cf. Sect. 10). Moreover, for our blocking transformations (considered later in Sect. 7), splitting on positive ground clauses (those with an empty body \(\mathcal{B}\) in the Split rule) is in fact mandatory to make it effective. Most BUMG procedures support this splitting technique.

Another crucial requirement for the effective use of blocking is support of equality reasoning (for example, ordered paramodulation, rewriting or superposition [5, 48, 68]), in combination with simplification techniques based on orderings. We refer to [5, 7, 20] for general notions of redundancy in superposition-based theorem proving approaches. However, for the sake of self-containedness we include inference rules for equality below:The EqRes rule eliminates a body equation \(s \approx t\) whose sides can be made equal by means of a most general unifier \(\sigma \). The ParaUnitPos rule replaces in a head atom (equational or otherwise) a subterm s by t in presence of a unit clause \(s \approx t\). The ParaNeg rule is similar, but it targets a body atom and it uses unification to make the target term u and s equal. In both rules the right premise \(s \approx t\) stands also for its symmetric version \(t \approx s\).

These inference rules are slimmed-down versions of more sophisticated inference rules for equality reasoning as exhibited, e.g., in the superposition calculus. First, all inference rules can be restricted by applying them only if certain ordering restrictions are met. We refer the reader to the literature cited above. Second, the inference rule ParaUnitPos does not use unification. This is not necessary in our context as all positive unit clauses, derived or given, are always ground (so that unifiability coincides with syntactic equality). We also do not need to formulate the rule on non-unit clauses, as (positive) non-unit clauses are always split eagerly into unit clauses. The ParaNeg rule, however, is more general by being applicable to clauses of arbitrary shape and by using unification.

^{3}To sum up, the inference rules of our prototypical BUMG method are HRes, Fact (which is actually not necessary in our context), Split, EqRes, ParaUnitPos and ParaNeg. We assume that the Split rule is applied eagerly.

^{4}The hypertableau calculus with equality [13, 14] features similarly structured inference rules, but includes ordering restrictions and redundancy elimination techniques.Figure 1 shows a deliberately simple sample derivation. It exhibits an initially given clause set M, as stated, and its \(\text {rr}\) transformed version \(\text {rr}(M)\) for the derivation, proper. The \(\text {rr}\) transformation is introduced only below; how it is obtained is not important yet. The purpose of the example is to convey the flavour and some important aspects of the approach.

×

Referring to Fig. 1, if we develop the derivation tree in a depth-first left-to-right fashion, say, then the branch with Case (1) leads to the empty clause. The derivation hence continues by branching into Case (2), Case (2.1) and then Case (2.1.1). No new clause is derivable here and the derivation stops. (Other branches are infinite.) The positive unit clauses in this saturated state are

×

They can be interpreted as a model of M in two ways. First, as a Herbrand model whose domain is the set of all ground terms and that assigns true to all atoms in the saturated state. Notice that \(\approx \) is interpreted as a congruence relation. (Corollary 1 below has the formal result.) Second, as an E-interpretation that is derived from the quasi-Herbrand model with a (finite) domain. The (finite) domain of that quasi-Interpretation J is \(|J| = \{ \mathsf{a}, \mathsf{b}, \mathsf{f}(\mathsf{a}), \mathsf{f}(\mathsf{b}) \}\) as specified by the \(\mathsf{dom}\)-atoms in the saturated state. The induced E-interpretation \(J_\approx \) has a domain consisting of equivalence classes determined by the congruence relation obtained from J and the interpretation function \(f^J\). In this construction, ground terms not in |J| are put into equivalence classes by uniformly evaluating them to a default constant. In the example, if \(\mathsf{a}\) is that constant, we get \(\mathsf{f}(\mathsf{f}(\mathsf{a})) \approx \mathsf{a}\), \(\mathsf{f}(\mathsf{f}(\mathsf{b})) \approx \mathsf{a}\) and \(|J_\approx | = \{ \{ \mathsf{a}, \mathsf{b}, \mathsf{f}(\mathsf{f}(\mathsf{a})), \mathsf{f}(\mathsf{f}(\mathsf{b})), \mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{a})))), \ldots \},\ \{\mathsf{f}(\mathsf{a}), \mathsf{f}(\mathsf{b}),\mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{a}))), \ldots \}\}\). Every element in \(|J_\approx |\) is represented by an element from |J|, but not vice versa. The construction is explained in detail in Sect. 5.

## 4 Range-Restricting Transformations

Definition 2

(Range-restricted clause (set) [63]) A clause \(\mathcal{B}\rightarrow \mathcal{H}\) is range-restricted iff \({{\,\mathrm{var}\,}}(\mathcal{H}) \subseteq {{\,\mathrm{var}\,}}(\mathcal{B})\), i.e., the body \(\mathcal{B}\) contains all the variables in the clause. A clause set is range-restricted iff it contains only range-restricted clauses. \(\square \)

This means that a positive clause \(\mathcal{H}\) is range-restricted only if it is a ground clause. A negative clause \(\mathcal{B}\rightarrow \bot \) is always range-restricted. Following [63], the algorithm \(\text {crr}\) (for classical range-restriction) below transforms a given clause set into a range-restricted set.

×

It is not difficult to see that \(\text {crr}(M)\) is indeed range-restricted for any clause set M. In Step (3) the arity n of \(f \in \Sigma _f\) can be 0, that is, constants are included. The transformation is sound and complete, that is, M is satisfiable iff \(\text {crr}(M)\) is satisfiable [24, 63]. The size of \(\text {crr}(M)\) is linear in the size of M and can be computed in linear time.

Perhaps the easiest way to understand the transformation achieved by the classical range restriction algorithm \(\text {crr}\) is to imagine we use a BUMG method, for example, hyperresolution as described in Sect. 3. The idea is to build the model(s) during the derivation. The clause used in the initialization in Step (1) ensures that the domain of interpretation given by the domain predicate \(\mathsf{dom}\) is non-empty. (Recall, we assume \(\Sigma _f\) contains at least one constant.) The loop in Step (2) turns clauses into range-restricted clauses. This is done by shielding the variables \(\{x_1, \ldots , x_k \}\) in the head, that do not occur negatively, with the added \(\mathsf{dom}\)-literals in the body. Clauses that are already range-restricted are unaffected by this step. The loop in Step (3) ensures that all elements of the Herbrand universe of the (original) clause set are added to the domain via hyperresolution inference steps. Notice that all positive non-unit clauses given in \(\text {crr}(M)\) or derived from these are splittable (because hyperresolvents on range restricted clauses are positive ground clauses), which suggests the use of the Split rule.

As a consequence a clause set M with at least one non-nullary function symbol causes hyperresolution derivations on \(\text {crr}(M)\) to be unbounded, unless M is unsatisfiable. This is a negative aspect of the classical range-restricting transformation. However, the method has been shown to be useful for (domain-)minimal model generation when combined with other techniques [23, 24]. In particular, [23] use splitting and the \(\delta ^*\)-rule to generate domain minimal models. In the present research we have evaluated the combination of blocking techniques (introduced later in Sect. 7) with the classical range-restricting transformation \(\text {crr}\). This has shown promising empirical results as presented in Sect. 10.

Next, we introduce an improved transformation to range-restricted form. Instead of enumerating the generally infinite Herbrand universe in a bottom-up fashion, the intuition is that it generates terms only as needed. A major difference to the crr transformation is that it involves term abstraction for body atoms.

Definition 3

(Term abstraction) Let \(P(t_1,\ldots ,t_n)\) be an atom and \(x_1,\ldots ,x_ n\) distinct, fresh variables. For all \(i \in \{1,\ldots ,n\}\), let \(s_i = t_i\), if \(t_i\) is a variable, and \(s_i = x_i\), otherwise. The atom \(P(s_1,\ldots ,s_n)\) is called the term abstraction of\(P(t_1,\ldots ,t_n)\). \(\square \)

For example, the term abstraction of \(\mathsf{P}(x, x, \mathsf{a}, \mathsf{f}(x))\) is \(\mathsf{P}(x, x, x_3, x_4)\).

The range-restricting transformation of a clause set M, denoted by \(\text {rr}(M)\), is defined by the following algorithm (explanations and an example are given afterwards).

×

The intuition of the rr transformation reveals itself if we think of what happens when using hyperresolution with splitting. The idea is again to build model(s) during the derivation, but this time terms are added to the domain only as necessary.

The initialization of the dom-relation in Step (1) is the same as in the definition of \(\text {crr}\). The auxiliary clauses collected in N in Step (2) cause functional terms that occur in clause bodies to be inserted into the domain. Below, after Definition 4 we explain why these clauses are needed. Step (3) is the classical range-restrictedness transformation applied to the given clauses and the auxiliary clauses N. Step (4) ensures that positively occurring functional terms are added to the domain, and Step (5) ensures that the domain is closed under subterms.

To illustrate the steps of the rr transformation consider the clause set M comprised of the single clause Suppose the clause initializing res in Step (1) is \(\mathsf{dom}(\mathsf{a})\). In Step (2) the term abstraction of the body literal of clause (\(\dagger \)) is \(\mathsf{P}( x_1, x_2, x)\). The clauses added to N are the following: Notice that among the three clauses in \(M \cup N\) the clauses (\(\dagger \)) and (\(\ddagger \)) are not range-restricted. They are however added as range-restricted clauses to res in Step (3), namely: Step (4) generates clauses responsible for inserting the terms that occur in the heads of clauses into the domain. That is, for each \(i \in \{1,2,3\}\) and each \(j \in \{1,2\}\) these clauses are added.Now, to explain the effect of the rr transformation assume additional clauses so that the instance \(\mathsf{Q}(\mathsf{a},\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{a}, \mathsf{a})))\) of one of the head atoms of the clause (\(\dagger \)) is derived (by hyperresolution with splitting, say). With the available clauses, \(\mathsf{dom}(\mathsf{a})\) and \(\mathsf{dom}(\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{a}, \mathsf{a})))\) are also derived. It is not necessary to insert the terms of the instance of the other head atom \(\mathsf{R}(y,z)\) into the domain. The reason is that it does not matter how these (extra) terms are evaluated, or whether the atom is evaluated to true or false in order to satisfy the disjunction.

$$\begin{aligned} \mathsf{P}(x_1, x_2, x_3) \rightarrow {}&\mathsf{dom}(x_i) \\ \mathsf{Q}(x_1, x_2) \rightarrow {}&\mathsf{dom}(x_j) \\ \mathsf{R}(x_1, x_2) \rightarrow {}&\mathsf{dom}(x_j) \end{aligned}$$

×

×

×

The clauses added in Step (4) are not sufficient, however. For each term in the domain all its subterms have to be in the domain, too. This is achieved with the clauses in Step (5). That is, for each \(j \in \{1,2\}\) these clauses are added.In the example, from \(\mathsf{dom}(\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{a},\mathsf{a})))\) and the second of these clauses, the clause \(\mathsf{dom}(\mathsf{f}(\mathsf{a},\mathsf{a}))\) is derivable. Notice that, \(\mathsf{dom}(\mathsf{g}(\mathsf{a},\mathsf{a}))\) for example is not derivable. Indeed, \(\mathsf{g}(\mathsf{a},\mathsf{a})\) it is not a subterm of \(\mathsf{dom}(\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{a}, \mathsf{a})))\).

$$\begin{aligned} \mathsf{dom}(\mathsf{f}(x_1, x_2)) \rightarrow {}&\mathsf{dom}(x_j) \\ \mathsf{dom}(\mathsf{g}(x_1, x_2)) \rightarrow {}&\mathsf{dom}(x_j) \end{aligned}$$

We are interested in quasi-Herbrand interpretations that mimic a given Herbrand interpretation for \(\text {rr}(M)\) as close as possible, in the following way.

Definition 4

(Induced quasi-Herbrand interpretation) Let M be a clause set and suppose that \(\text {rr}(M)\) is satisfied by a Herbrand interpretation I. Let c be the constant in Step (1) of \(\text {rr}\) (so that \(\mathsf{dom}(c) \in \text {rr}(M)\)). The quasi-Herbrand interpretationJinduced byMandI is defined as follows:\(\square \)

(i)

\(|J| = \{ t \mid t\text { is a ground term and }I \models \mathsf{dom}(t) \}\).

(ii)

For every n-ary \(f \in \Sigma _f\), the interpretation function \(f^J: |J| \times \cdots \times |J| \mapsto |J|\) is defined as follows, for all \(d_1,\ldots ,d_n\in |J|\):

$$\begin{aligned} f^J(d_1,\ldots ,d_n) = {\left\{ \begin{array}{ll} f(d_1,\ldots ,d_n) &{} \text {if }f(d_1,\ldots ,d_n) \in |J|,\text { and} \\ c &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

(iii)

For every n-ary \(P \in \Sigma _P\), the interpretation function \(P^J: |J| \times \cdots \times |J| \mapsto \{ false , true \}\) is defined as follows, for all \(d_1,\ldots ,d_n\in |J|\):

$$\begin{aligned} P^J(d_1,\ldots ,d_n) = P^I(d_1,\ldots ,d_n). \end{aligned}$$

Some comments are due. Let J be a quasi-Herbrand interpretation induced by some M and I. We wish to verify that J is indeed a quasi-Herbrand interpretation in the sense of Definition 1. For that, first, observe that |J| is subterm-closed. This is achieved by the clauses added in Step (5) in \(\text {rr}\) and the fact that \(I \models \text {rr}(M)\). Second, item (ii) in Definition 4 fixes the interpretation of function symbols such that \(J(t) = t\), for all \(t \in |J|\). This can be shown by induction on the term structure, in the induction step using the fact that |J| is subterm-closed.

^{5}Third, |J| contains at least one constant per Step (1) in \(\text {rr}\) and item (i) in Definition 4.Item (iii) makes J the same as I with respect to predicate symbols on the domain |J|.

The clauses added in Step (2) in the \(\text {rr}\) transformation may seem unintuitive. We are now in a position to explain why they are needed, from a completeness perspective. More precisely, if I is a Herbrand model of \(\text {rr}(M)\) then we wish to conclude that the quasi-Herbrand interpretation J induced by M and I is a model of M, cf. Proposition 1 below.

By means of an example assume a clause set M that contains the clause \(\mathsf{P}(\mathsf{f}(x),y,z,z) \rightarrow \mathsf{Q}(y)\). Let \(\mathsf{c}\) be the constant picked in Step (1) and assume as given a quasi-Herbrand interpretation J with \(|J| = \{\mathsf{b},\mathsf{c}, \ldots \}\) induced by M and Herbrand model I of \(\text {rr}(M)\). That is, |J| contains at least the constants \(\mathsf{b}\) and \(\mathsf{c}\) and other terms unspecified for now.

Suppose, for example, that J satisfies the body of the clause instance \(\mathsf{P}(\mathsf{f}(\mathsf{c}),\mathsf{c},\mathsf{b}, \mathsf{b}) \rightarrow \mathsf{Q}(\mathsf{c})\). For completeness, we need an argument that J also satisfies the head instance \(\mathsf{Q}(\mathsf{c})\).

That J satisfies \(\mathsf{P}(\mathsf{f}(\mathsf{c}),\mathsf{c},\mathsf{b}, \mathsf{b})\) can only be the case because I satisfies \(\mathsf{P}(\mathsf{f}(\mathsf{c}),\mathsf{c},\mathsf{b}, \mathsf{b})\)orI satisfies \(\mathsf{P}(\mathsf{c},\mathsf{c},\mathsf{b}, \mathsf{b})\). The latter is the case if \(\mathsf{f}(\mathsf{c})\) is interpreted as the default element c in J, cf. Definition 4 (iii). In fact, the clauses added in Step (2) make this case impossible (for all ground instances that matter).

More precisely, the \(\text {rr}\) transformation adds in Step (2) the clause \(\mathsf{P}(x_1,y,z,z) \rightarrow \mathsf{dom}(\mathsf{f}(x))\) which gets range-restricted as \(C = \mathsf{dom}(x) \wedge \mathsf{P}(x_1,y,z,z) \rightarrow \mathsf{dom}(\mathsf{f}(x))\) in Step (3). From Definition 4(i) and the assumption \(\mathsf{c} \in |J|\) it follows that \(I \models \mathsf{dom}(c)\). Regarding the other body literal \(\mathsf{P}(x_1,y,z,z)\) of C, notice that the variable \(x_1\) is unconstrained. This means that we can instantiate \(x_1\) with \(\mathsf{f}(\mathsf{c})\) or \(\mathsf{c}\) as required to make \(\mathsf{P}(x_1,\mathsf{c},\mathsf{b},\mathsf{b})\) true in I. But then, because I is a model of \(\text {rr}(M)\) it follows that I must satisfy the instantiated head \(\mathsf{dom}(\mathsf{f}(c))\). Again from Definition 4(i), \(\mathsf{f}(\mathsf{c}) \in |J|\) follows. That is, \(\mathsf{f}(\mathsf{c})\) is not interpreted as the default element c, as indicated above.

But then, by Definition 4(ii) and 4(iii) the body literal \(\mathsf{P}(\mathsf{f}(\mathsf{c}),\mathsf{c},\mathsf{b},\mathsf{b})\) is evaluated to the same truth value under I and J, which is \( true \), as J satisfies it. As I satisfies \(\mathsf{P}(\mathsf{f}(\mathsf{c}),\mathsf{c},\mathsf{b},\mathsf{b}) \rightarrow \mathsf{Q}(\mathsf{c})\), it satisfies \(\mathsf{Q}(\mathsf{c})\). Once more from Definition 4(iii) conclude that J satisfies \(\mathsf{Q}(\mathsf{c})\) as well, as desired.

The following lemma and proposition give the general argument.

Lemma 1

Let M, I and J be defined as in Definition 4.

(i)

If \(t_1,\ldots ,t_n \in |J|\) then \(J(P(t_1,\ldots ,t_n)) = I(P(t_1,\ldots ,t_n))\).

(ii)

For every \(\mathcal{B}\rightarrow \mathcal{H}\in M\) and substitution \(\gamma \) with codomain |J|: if \(J \models \mathcal{B}\gamma \) then \(I \models \mathcal{B}\gamma \).

Proof

(i) Assume \(t_1,\ldots ,t_n \in |J|\) and let \(P \in \Sigma _P\) arbitrary. For all \(1 \le i \le n\) let \(d_i = J(t_i)\). By definition of quasi-Herbrand interpretation \(d_i = t_i\). From Definition 4(iii) it follows(ii) Take any \(\mathcal{B}\rightarrow \mathcal{H}\in M\) and choose an arbitrary substitution \(\gamma \) with codomain |J|. Assume \(J \models \mathcal{B}\gamma \). Take any body atom \(P(t_1,\ldots ,t_n) \in \mathcal{B}\). It follows that \(J \models P(t_1,\ldots ,t_n)\gamma \). To prove the lemma statement it suffices to show \(I \models P(t_1,\ldots ,t_n)\gamma \).

$$\begin{aligned} J(P(t_1,\ldots ,t_n))= & {} P^J(J(t_1),\ldots ,J(t_n)) = P^J(d_1,\ldots ,d_n) \\= & {} P^I(d_1,\ldots ,d_n) = P^I(t_1,\ldots ,t_n) \\= & {} P^I(I(t_1),\ldots ,I(t_n)) = I(P(t_1,\ldots ,t_n)). \end{aligned}$$

The \(\text {rr}\) transformation adds in Step (2) the clauses \(P(s_1,\ldots ,s_n) \rightarrow \mathsf{dom}(t_j)\), for all \(j \in \{1 \le i \le n \mid t_i\text { is not a variable}\}\) where \(P(s_1,\ldots ,s_n)\) is the term abstraction of \(P(t_1,\ldots ,t_n)\). Let \(C_j = \mathsf{dom}(y_1) \wedge \cdots \wedge \mathsf{dom}(y_k) \wedge P(s_1,\ldots ,s_n) \rightarrow \mathsf{dom}(t_j)\) be the range-restricted versions of these clauses, obtained in Step (3), where \(\{y_1,\ldots ,y_k\} = {{\,\mathrm{var}\,}}(t_j) \setminus {{\,\mathrm{var}\,}}(P(s_1,\ldots ,s_n))\).

The first subgoal is to show \(J(s_i\gamma ) = J(t_i\gamma )\) for all \(i \in \{ 1,\ldots , n \}\). In the first case \(t_i\) is a variable. By construction \(s_i = t_i\) and hence, trivially, \(J(s_i\gamma ) = J(t_i\gamma )\). In the second case \(t_i\) is not a variable and \(s_i\) is a fresh variable \(x_i\). After extending \(\gamma \) by the binding \(x_i \mapsto J(t_i\gamma )\) it follows \(x_i\gamma = s_i\gamma = J(t_i\gamma )\). Notice this extension trivially preserves \(\gamma \) as a substitution with codomain |J|. In particular, hence, \(x_i\gamma \in |J|\). By quasi-Herbrand interpretation \(J(x_i\gamma ) = x_i\gamma \). Altogether then \(J(s_i\gamma ) = J(t_i\gamma )\). That is, in either case \(J(s_i\gamma ) = J(t_i\gamma )\) which completes the proof of the subgoal.

From \(J(s_i\gamma ) = J(t_i\gamma )\) it follows \(J(P(t_1,\ldots ,t_n)\gamma ) = J(P(s_1,\ldots ,s_n)\gamma )\) by semantics of first-order logic. From \(J \models P(t_1,\ldots ,t_n)\gamma \) above it follows \(J \models P(s_1,\ldots ,s_n)\gamma \). By construction every \(s_i\) is a variable. Because \(\gamma \) is a substitution with codomain |J| it follows trivially that \(s_i\gamma \in |J|\). By item (i) of this lemma then \(I \models P(s_1,\ldots ,s_n)\gamma \).

The next subgoal is to show \(t_i\gamma \in |J|\), for all \(i \in \{ 1,\ldots ,n \}\). The first case when \(t_i\) is a variable is trivial as \(\gamma \) is a substitution with codomain |J|.

In the second case \(t_i\) is not a variable and \(\text {rr}(M)\) contains the clause \(C_i = \mathsf{dom}(y_1) \wedge \cdots \wedge \mathsf{dom}(y_k) \wedge P(s_1,\ldots ,s_n) \rightarrow \mathsf{dom}(t_i)\) as mentioned above. Recall that \(\gamma \) is a substitution with codomain |J| and that |J| consists of the extension of the \(\mathsf{dom}\)-predicate in I (Definition 4(i)). It follows that \(y_m\gamma \in |J|\) and hence \(I \models \mathsf{dom}(y_m)\gamma \), for all \(m \in \{ 1,\ldots , k\}\).

Altogether, I satisfies \((\mathsf{dom}(y_1) \wedge \cdots \wedge \mathsf{dom}(y_k) \wedge P(s_1,\ldots ,s_n))\gamma \), the body of the clause instance \(C_i\gamma \). The lemma assumes that I is a Herbrand model of \(\text {rr}(M)\) as per Definition 4. It follows that I satisfies the instantiated head atom \(\mathsf{dom}(t_i\gamma )\) of \(C_i\gamma \). Again with Definition 4(i) conclude \(t_i\gamma \in |J|\). This completes the proof of the second subgoal. That is, \(t_i\gamma \in |J|\) for all \(i \in \{1,\ldots ,n\}\). With that result and \(J \models P(t_1,\ldots ,t_n)\gamma \) from above we can apply item (i) of this lemma and conclude \(I \models P(t_1,\ldots ,t_n)\gamma \), which remained to be shown. \(\square \)

Proposition 1

(Completeness of range-restriction) Let M be any clause set. If \(\text {rr}(M)\) is satisfiable then M is satisfiable. More specifically, if I is a Herbrand model of \(\text {rr}(M)\) then the quasi-Herbrand interpretation J induced by M and I is a model of M.

Proof

Let I be a Herbrand model of \(\text {rr}(M)\) and J be the quasi-Herbrand interpretation induced by M and I. We show \(J \models M\). Take any clause \(\mathcal{B}\rightarrow \mathcal{H}\in M\) and choose an arbitrary substitution \(\gamma \) with codomain |J|. It suffices to show \(J \models (\mathcal{B}\rightarrow \mathcal{H})\gamma \). The case of \(J \not \models \mathcal{B}\gamma \) being trivial, we assume \(J \models \mathcal{B}\gamma \) from now on and show \(J \models \mathcal{H}\gamma \).

From Lemma 1(ii) it follows that \(I \models \mathcal{B}\gamma \). Let \(\mathsf{dom}(x_1) \wedge \cdots \wedge \mathsf{dom}(x_k) \wedge \mathcal{B}\rightarrow \mathcal{H}\) be the range-restricted version of \(\mathcal{B}\rightarrow \mathcal{H}\) as introduced in Step (3) of \(\text {rr}(M)\). Because the codomain of \(\gamma \) is |J|, from the definition of |J| it follows that \(I \models \mathsf{dom}(x_i)\gamma \) for all \(i\in \{1,\ldots ,k\}\). Together, thus, \(I \models (\mathcal{B}\wedge \mathsf{dom}(x_1) \wedge \cdots \wedge \mathsf{dom}(x_k))\gamma \). As I is a Herbrand model of all clauses in \(\text {rr}(M)\) it follows that \(I \models \mathcal{H}\gamma \). That is, there is a head atom of the form \(P(t_1,\ldots ,t_n) \in \mathcal{H}\) such that \(I \models P(t_1,\ldots ,t_n)\gamma \). With the clauses added in Steps (4) and (5) we conclude \(I \models \mathsf{dom}(t_j\gamma )\), for all \(j\in \{1,\ldots ,n\}\). From the definition of |J| we immediately get \(t_j\gamma \in |J|\). From Definition 4(iii) \(J \models P(t_1,\ldots ,t_n)\gamma \) follows and hence \(J \models \mathcal{H}\gamma \), as desired. \(\square \)

## 5 Equality

The following result lays the groundwork the adaptation of the range-restriction transformation for equality.

Corollary 1

(Completeness of range-restriction wrt. E-interpretations) Let M be any clause set. If \(\text {rr}(M)\) is E-satisfiable then M is E-satisfiable. More specifically, assume there is an E-model I of \(\text {rr}(M)\). Then there is a Herbrand model J of \(\text {rr}(M)\) such that J interprets \(\approx \) as a congruence relation and \(J \models s \approx t\) whenever \(I(s) = I(t)\), for all \(s,t \in {{\,\mathrm{HU}\,}}_\Sigma \).

Proof

We prove the second claim only, as it entails the first. Let \(I_H\) be the Herbrand interpretation such that \(I_H \models \text {rr}(M) \cup E_I \cup \text {EAX}(\Sigma )\), where \(E_I = \{ s \approx t \mid s,t \in {{\,\mathrm{HU}\,}}_\Sigma \text { and } I(s) = I(t) \}\). With I given as an E-model of \(\text {rr}(M)\) it is clear that indeed \(I_H\) exists. Letbe the equality axioms with the reflexivity axiom replaced by all its ground instances. Of course, \(I_H \models \text {rr}(M) \cup E_I \cup \text {EAX}'(\Sigma )\) in the Herbrand semantics. One consequence of this move is that the \(\text {rr}\) transformation does not act on the clauses in \(E_I \cup \text {EAX}'(\Sigma )\) (this is straightforward to check).

$$\begin{aligned} \text {EAX}'(\Sigma ) = (\text {EAX}(\Sigma ) \setminus \{ x \approx x \}) \cup \{ t \approx t \mid t \in {{\,\mathrm{HU}\,}}_\Sigma \} \end{aligned}$$

^{6}Therefore \(\text {rr}(M) \cup E_I \cup \text {EAX}'(\Sigma )\) and \(\text {rr}(M \cup E_I \cup \text {EAX}'(\Sigma ))\) are the same set of clauses and it follows \(I_H \models \text {rr}(M \cup E_I \cup \text {EAX}'(\Sigma ))\).Let J be the quasi-Herbrand interpretation induced by \(M \cup E_I \cup \text {EAX}'(\Sigma )\) and \(I_H\). Proposition 1 implies that \(J \models M \cup E_I \cup \text {EAX}'(\Sigma )\). Regarding the domain \(|J| = \{ t \mid t\text { is a ground term and }I_H \models \mathsf{dom}(t)\}\), recall that \(\text {EAX}'(\Sigma )\) contains the clause \(t \approx t \), for every \(t \in {{\,\mathrm{HU}\,}}_\Sigma \). By the clauses added in Step (4) of the \(\text {rr}\) transformation (when P is the equality predicate \(\approx \)) and the fact that \(I_H \models \text {rr}(M)\), it follows that \(I_H \models \mathsf{dom}(t)\), for every \(t \in {{\,\mathrm{HU}\,}}_\Sigma \). In other words, \(|J| = {{\,\mathrm{HU}\,}}_\Sigma \) and so J is (also) a Herbrand interpretation.

As \(J \models M \cup E_I \cup \text {EAX}'(\Sigma )\) we have in particular that \(J \models E_I \cup \text {EAX}'(\Sigma )\). In other words, J interprets \(\approx \) as a congruence relation that includes the ground equations in \(E_I\), as claimed in the statement of the result. \(\square \)

We emphasize that we do not actually propose to use the equality axioms in conjunction with a theorem prover (though they can of course). They serve merely as a theoretical tool to prove completeness of the transformation.

In particular, the \(\text {rr}\) transformed clause set may compute a (finite) quasi-Herbrand interpretation even in the presence of equality. For example, for the clause set M consisting only of \(\mathsf{f}(\mathsf{b}) \approx \mathsf{f}(\mathsf{c}) \), its range-restricted version \(\text {rr}(M)\) has an E-model I. We get a quasi-Herbrand interpretation J induced from \(\text {rr}(M)\) and I with domain \(|J| = \{ \mathsf{b}, \mathsf{c}, \mathsf{f}(\mathsf{b}), \mathsf{f}(\mathsf{c}) \}\) such that \(f^J(f(\mathsf{b})) = c\) and \(f^J(f(\mathsf{c})) = c\). To get an E-interpretation \(J_\approx \) one first builds the congruence closure of \(\{ \mathsf{f}(\mathsf{f}(\mathsf{b})) \approx c, \mathsf{f}(\mathsf{f}(\mathsf{c})) \approx c\} \cup \{\mathsf{f}(\mathsf{b}) \approx \mathsf{f}(\mathsf{c})\}\). The first set stems from the definition of \(f^J\) and the second set from \(I \models \mathsf{f}(\mathsf{b}) \approx \mathsf{f}(\mathsf{c}) \). By means of the first set any ground term (e.g., \(\mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{b})))\)) is congruent to a term in |J|, but some terms in |J| may be congruent as well because of the second set.

In summary, this constructs a Herbrand interpretation where \(\approx \) is interpreted as a congruence relation that mimics the interpretation function \(f^I\) and satisfies the equations true in I. Moreover, every ground term is congruent to a term in |J| (which is the extension of the \(\mathsf{dom}\)-predicate in the model I). From that, we get an induced E-interpretation \(J_\approx \) on the domain \(|{{\,\mathrm{HU}\,}}_\Sigma |/{\approx }\) whose elements are represented by elements from |J|. In this example we obtainand \(J_\approx \) is an E-model of M.

$$\begin{aligned} |J_\approx |&= \big \{ \{\mathsf{b}\},\ \{\mathsf{c}, \mathsf{f}(\mathsf{f}(\mathsf{b})), \mathsf{f}(\mathsf{f}(\mathsf{c})), \mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{b})))), \mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{c})))),\ldots \}, \\&\quad \{\mathsf{f}(\mathsf{b}), \mathsf{f}(\mathsf{c}), \mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{b}))), \mathsf{f}(\mathsf{f}(\mathsf{f}(\mathsf{c}))), \ldots \}\big \} \end{aligned}$$

### 5.1 Modification: The \(\mathsf{myequal}\) Predicate

When using a BUMG theorem prover with built-in equality treatment, it is important to note that one particular type of clause in the \(\text {rr}\) transformation should not be treated as a normal clause. For the equality predicate the \(\text {rr}\) transformation produces in Step (2) the clauses Any superposition-based theorem prover simplifies these clauses to \(\mathsf{dom}(x)\): applying \({\textsf {EqRes}} \) to \(x \approx y \rightarrow \mathsf{dom}(x)\) derives \(\mathsf{dom}(x)\). Both clauses (\(\#\)) are now subsumed by the new clause \(\mathsf{dom}(x)\) and can be deleted.

×

As a consequence this can lead to all negative domain literals being resolved away and all clauses containing a positive domain literal to be subsumed. This means range-restriction is undone. This is what happens, for example, in SPASS.

Since the clauses added in Step (4) really only need to be added for positively occurring predicate symbols, an easy solution involves replacing any positive occurrence of the equality predicate by a predicate symbol \(\mathsf{myequal}\) (say), which is fresh in the signature, and adding the clausesrather than (\(\#\)). In addition, the clause set needs to be extended by this ‘definition’ of \(\mathsf{myequal}\).This solution has the intended effect of adding terms occurring in positive equality literals to the domain (by Step (4) of the \(\text {rr}\) transformation), and prevents other inferences or reductions on \(\mathsf{myequal}\).

$$\begin{aligned} \mathsf{myequal}(x,y) \rightarrow {}&\mathsf{dom}(x)&\mathsf{myequal}(x,y) \rightarrow {}&\mathsf{dom}(y) \end{aligned}$$

$$\begin{aligned} \mathsf{myequal}(x,y) \rightarrow {}&x \approx y \end{aligned}$$

In other words, instead of directly deriving a unit \(s \approx t \) the above solution first derives \(\mathsf{myequal}(s,t) \) as an intermediate conclusion, and then \(s \approx t \) from that. With \(\mathsf{myequal}(s,t) \) derived, \(\mathsf{dom}(s) \) and \(\mathsf{dom}(t) \) follow.

It is not difficult to prove that E-satisfiability is preserved in both directions:

Proposition 2

Let M be a clause set and suppose \(M'\) is obtained from M by replacing in every clause \(\mathcal{B}\rightarrow \mathcal{H}\in M\) every equation \(s \approx t \in \mathcal{H}\) by \(\mathsf{myequal}(s,t)\). Then M is E-satisfiable iff \(M' \cup \{ \mathsf{myequal}(x,y) \rightarrow x \approx y \}\) is E-satisfiable.

Proof

(\(\Leftarrow \)) Assume \(M' \cup \{ \mathsf{myequal}(x,y) \rightarrow x \approx y \}\) is satisfied by an E-interpretation I. Take arbitrarily any clause \(\mathcal{B}\rightarrow \mathcal{H}' \in M'\) and let \(\mathcal{B}\rightarrow \mathcal{H}\in M\) be the clause from which \(\mathcal{B}\rightarrow \mathcal{H}'\) was obtained. The clause \(\mathcal{B}\rightarrow \mathcal{H}\) is a logical consequence of \(\mathcal{B}\rightarrow \mathcal{H}'\) and \(\mathsf{myequal}(x,y) \rightarrow x \approx y\). From \(I \models M' \cup \{ \mathsf{myequal}(x,y) \rightarrow x \approx y \}\) it follows \(I \models \mathcal{B}\rightarrow \mathcal{H}\). As \(\mathcal{B}\rightarrow \mathcal{H}' \in M'\) was chosen arbitrarily it follows that \(I \models M\).

(\(\Rightarrow \)) Assume M is true in an E-interpretation I. Define the E-interpretation \(I'\) to be the same as I, but extended for the new predicate symbol \(\mathsf{myequal}\) by \(\mathsf{myequal}^{I'}(d_1,d_2) := d_1 \approx ^I d_2\), for every \(d_1,d_2 \in |I|\). Trivially, \(I' \models M\) (as \(\mathsf{myequal}\) is a fresh symbol) and \(I' \models \mathsf{myequal}(x,y) \rightarrow x \approx y\) (by definition of \(\mathsf{myequal}^{I'}\)). Take arbitrarily any clause \(\mathcal{B}\rightarrow \mathcal{H}\in M\) and let \(\mathcal{B}\rightarrow \mathcal{H}' \in M'\) be the clause after the described replacement. From \(I' \models \mathcal{B}\rightarrow \mathcal{H}\) it follows \(I' \models \mathcal{B}\rightarrow \mathcal{H}'\) by the semantics of first-order logic. As \(\mathcal{B}\rightarrow \mathcal{H}\in M\) was chosen arbitrarily it follows that \(I \models M' \cup \{ \mathsf{myequal}(x,y) \rightarrow x \approx y \}\). \(\square \)

Proposition 2 ensures that the \(\mathsf{myequal}\) predicate is semantically harmless. We should mention though an operational drawback, if superposition-style inference rules are used. Letting \(\succ \) denote a suitable term ordering, the ParaUnitPos inference rule then becomes the rule that from \(v[s] \approx u \) (left premise) and \(s \approx t \) (right premise) derive \(v[t] \approx u \), only if\(v[s] \succ u\)and\(s \succ t\). Now, the first restriction is lost if the left premise becomes \(\mathsf{myequal}(v[t], u)\) instead. This is a drawback. A better alternative to using the \(\mathsf{myequal}\) predicate is an inference rule that derives \(\mathsf{dom}(s) \) from \(s \approx t \).

### 5.2 Modification: Range-Restriction

We need one more modification to use the \(\text {rr}\) transformation with BUMG methods with built-in equality inference rules. The EqRes inference rule (see Sect. 3) can break the invariant that all positive clauses derivable from a \(\text {rr}\) transformed clause set are ground. Consider, for example, the clause \(x \approx \mathsf{f}(y) \rightarrow \mathsf{P}(x) \vee \mathsf{Q}(x)\). While range-restricted, applying EqRes to it yields \(\mathsf{P}(\mathsf{f}(y)) \vee \mathsf{Q}(\mathsf{f}(y))\), which is no longer range-restricted. Indeed, this clause cannot be split and processed further as intended.

The easiest fix of this problem is to replace line 17 in the \(\text {rr}\) transformation by the line

That is, all variables in the head must be subject to the \(\mathsf{dom}\)-predicate. As a result, the example clause is \(\text {rr}\) transformed into \(\mathsf{dom}(x) \wedge x \approx \mathsf{f}(y) \rightarrow \mathsf{P}(x) \vee \mathsf{Q}(x)\) and EqRes instead derives the range-restricted clause \(\mathsf{dom}(\mathsf{f}(y)) \rightarrow \mathsf{P}(\mathsf{f}(y)) \vee \mathsf{Q}(\mathsf{f}(y))\).let\(\{x_1,\ldots ,x_k\} = {{\,\mathrm{var}\,}}(\mathcal{H})\).

Semantically this change does not make any difference, as Corollary 1 applies without modification. Moreover, the change does not cause any operational penalties either. Consider that a (ground) unit clause H is paired with a body atom B in a clause \(\mathsf{dom}(x) \wedge B[x] \wedge \mathcal{B}\rightarrow \mathcal{H}\) in a HRes or ParaNeg inference (for simplicity we consider only one variable x to make the argument). By the clauses added in Steps (4) and (5) of the \(\text {rr}\) transformation, all subterms in H are in the domain as well, by derived \(\mathsf{dom}\)-facts. This entails that x, if assigned to a subterm of H, is in the domain. The additional body atom \(\mathsf{dom}(x)\), hence, is satisfied anyway.

Notice that the classical range-restriction transformation \(\text {crr}\) needs an analogous modification.

## 6 Shifting Transformation

The clauses introduced in Step (2) of the new \(\text {rr}\) transformation to range-restricted form use abstraction and insert (possibly a large number of) instantiations of terms occurring in the clause bodies into the domain. These are sometimes unnecessary and can lead to non-termination of BUMG procedures.

To explain by means of an example consider the singleton clause set comprised of

×

Step (2) in the \(\text {rr}\) transformation applied to the body atom \(\mathsf{Q}(x)\) does not lead to a new clause and is unproblematic. Step (2) applied to the other body atom \(\mathsf{P}(\mathsf{f}(x))\) adds to the set variable N a new clause of the form \(P(x_1) \rightarrow \mathsf{dom}(\mathsf{f}(x))\), whose range-restricted version is added in Step (3) to the final result.

×

This is already an improvement over the classical range-restriction \(\text {crr}\) which includes the less constraining clause \(\mathsf{dom}(x) \rightarrow \mathsf{dom}(\mathsf{f}(x))\). Together with the clause, say, \(\mathsf{dom}(\mathsf{b}) \) of Step (1) a derivation from that clause set contains \(\mathsf{dom}(\mathsf{f}(\mathsf{b})) \), \(\mathsf{dom}(\mathsf{f} (\mathsf{f}(\mathsf{b}))) \), etc, and does not terminate; the derivation from the \(\text {rr}\) transformed clause set does terminate quickly.

However, the advantage of the \(\text {rr}\) transformation is lost as soon as a clause P(t) , for any ground term t, is present or has been derived. For instance, if we extend the above example (\(*\)) with the clause \(\mathsf{P}(\mathsf{b})\) we obtain the clause setBUMG methods now also derive \(\mathsf{dom}(\mathsf{f}(\mathsf{b})) \), \(\mathsf{dom}(\mathsf{f} (\mathsf{f}(\mathsf{b}))) \), etc, from the \(\text {rr}\) transformation of this clause set by means of the clause (\(\spadesuit \)).

$$\begin{aligned}&\mathsf{P}(\mathsf{b})&\mathsf{Q}(x) \wedge \mathsf{P}(\mathsf{f}(x)) \rightarrow {}&\mathsf{R}(x) . \end{aligned}$$

The shifting transformation introduced in this section can address this problem (and so can the blocking transformation in Sect. 7 below). It consists of two sub-transformations, basic shifting and partial flattening.

### 6.1 Basic Shifting

If A is an atom \(P(t_1,\ldots ,t_n)\) then let \({\mathsf {not\_}A}\) denote the atom \({\mathsf {not\_}P}(t_1,\ldots ,t_n)\), where \({\mathsf {not\_}P}\) is a fresh predicate symbol which is uniquely associated with the predicate symbol P. If P is the equality symbol \(\approx \) we write \({\mathsf {not\_}P}\) as \(\not \approx \) and use infix notation.

Now, the basic shifting transformation of a clause set M is the clause set \(\text {bs}(M)\) obtained from M by the following algorithm.

×

In line 9 of Step (2) every atom \(B_i\) identified in a run of \(\text {bs}\), is called a shifted atom. Notice that we do not add clauses complementary to the ‘shifted atoms consistency’ clauses, that is, \(P(x_1,\ldots ,x_n) \vee {\mathsf {not\_}P}(x_1,\ldots ,x_n)\). They could be included but are superfluous.

As a side effect, the clause set \(\text {bs}(M)\) has implicitly assumed the signature \(\Sigma _P \cup \Sigma _P^+\) as needed for a subsequent \(\text {rr}\) transformation.

Let us consider as an example these clauses.

×

Basic shifting moves the negative occurrences of functional terms into heads for the last clause and we getApplying the \(\text {rr}\) transformation we obtain the following set.The clauses in column one are added in Steps (1) and (2), the clauses in columns one and two are present after Step (3), and the clauses in the third column are added by Step (4) and (5).

Even on this clause set termination of BUMG can be achieved. For instance, in a hyperresolution-like mode of operation and with splitting enabled, the SPASS prover [93, 94] splits the derived clause \(\mathsf{R}(\mathsf{b}) \vee {\mathsf {not\_}\mathsf{P}}(\mathsf{f}(\mathsf{b}))\), considers the case with the literal \(\mathsf{R}(\mathsf{b})\) first and terminates with a model. This is because a finite completion (model) is found without considering the case of the bigger literal \({\mathsf {not\_}\mathsf{P}}(\mathsf{f}(\mathsf{b}))\), which would have added the term \(\mathsf{f}(\mathsf{b})\) to the domain. The same behaviour can be achieved, for example, with the KRHyper BUMG prover, a hypertableaux theorem prover [95].

### 6.2 Partial Flattening

As can be seen in the previous example, the basic shifting transformation avoids the generation of new domain elements by removing literals from a clause body. Of course, a smaller clause body affects the search space, as then the clause can be used as a premise more often. To (partially) avoid this effect, we propose an additional transformation, called partial flattening, to be performed prior to the basic shifting transformation. It consists of replacing all compound terms in non-equational body atoms and replacing them by equations with these variables.

We need some preliminaries before defining partial flattening formally. Let \({\mathsf {unif}}\) be a fresh binary predicate symbol and assume \(\Sigma _P\) has already been extended by \({\mathsf {unif}}\). (Intuitively, \({\mathsf {unif}}\) stands for ‘unifiability’ and is defined by the clause \({\mathsf {unif}}(x,x) \).)

Definition 5

(Target term) Let \(P(t_1,\ldots ,t_n)\) be an atom. For all \(i \in \{1,\ldots ,n\}\), \(t_i\) is a target term in\(P(t_1,\ldots ,t_n)\) iff \(P \notin \{\approx ,\, {\mathsf {unif}}\}\) and \(t_i\) is a compound term. For any variable x, define \(P(t_1,\ldots ,t_n)[t_i \mapsto x] = P(t_1,\ldots ,t_{i-1},x,t_{i+1},\ldots ,t_n)\). \(\square \)

For a clause set M, the partial flattening transformation is the clause set \(\text {pf}(M)\) obtained from M by applying the following algorithm.

×

Applied to our running example consisting of the clauses \((**)\) we get as a result of the \(\text {pf}\) transformation the clausesAltogether, applying the transformations \(\text {pf}\), \(\text {bs}\) and \(\text {rr}\), in this order, yields the following clauses (among other clauses, which are omitted because they are not relevant to the current discussion).Observe that the clause \(\mathsf{Q}(x) \wedge \mathsf{P}(u) \rightarrow \mathsf{R}(x) \vee {\mathsf {not\_}{\mathsf {unif}}}(u, \mathsf{f}(x))\) is more restricted than the above clause \(\mathsf{Q}(x) \rightarrow \mathsf{R}(x) \vee {\mathsf {not\_}\mathsf{P}}(\mathsf{f}(x))\) because of the additional body literal \(\mathsf{P}(u)\).

$$\begin{aligned}&{\mathsf {unif}}(x,x)&\mathsf{P}(\mathsf{b})&\mathsf{Q}(x)&\mathsf{Q}(x) \wedge \mathsf{P}(u) \wedge {\mathsf {unif}}(u, \mathsf{f}(x)) \rightarrow {}&\mathsf{R}(x). \end{aligned}$$

^{7}The reason for not extracting constants during partial flattening is that adding them to the domain does not cause non-termination of BUMG methods. It is preferable to leave them in place in the body literals because they have a stronger constraining effect than the variables introduced otherwise.

An immediate improvement of the combined \(\text {pf}\), \(\text {bs}\) and \(\text {rr}\) transformations comes to mind. The \(\text {pf}\) transformation introduces in its Step (2) \({\mathsf {unif}}\)-body atoms that are, by design, all shifted into corresponding \({\mathsf {not\_}{\mathsf {unif}}}\)-head atoms. The only \({\mathsf {unif}}\)-atoms that remain in the end are those four clauses in the above example. We can hence eliminate the \({\mathsf {unif}}\) predicate from the set by applying resolution inferences exhaustively among them. After deleting tautologies we obtain a simplified, equi-satisfiable clause set.As for \(\text {rr}\), the \(\text {pf}\) and \(\text {bs}\) transformations can be used with respect to equality as well. However, extracting top-level terms from equations has no effect at all, and this is why they are excluded from \(\text {pf}\). Consider the unit clause \(\mathsf{f}(\mathsf{a}) \approx \mathsf{b} \rightarrow \bot \), and its (otherwise) partial flattening \(x \approx \mathsf{b} \wedge {\mathsf {unif}}(x,\mathsf{f}(\mathsf{a})) \rightarrow \bot \). Applying basic shifting yields \(x \approx \mathsf{b} \rightarrow {\mathsf {not\_}{\mathsf {unif}}}(x,\mathsf{f}(\mathsf{a}))\), and, hyperresolution with a derivable instance \(\mathsf{b}\approx \mathsf{b}\) (or the EqRes rule)) gives \({\mathsf {not\_}{\mathsf {unif}}}(\mathsf{b},\mathsf{f}(\mathsf{a}))\). In terms of derivable \(\mathsf{dom}\)-clauses this is equivalent to \(\mathsf{b} \not \approx \mathsf{f}(\mathsf{a})\) as obtained by the \(\text {bs}\) transformation applied to \(f(a) \approx b \rightarrow \bot \) without doing partial flattening first. This explains why top-level terms of equational literals are excluded from the definition. (One could consider using standard flattening, that is, recursively extracting terms, but this does not lead to any improvements over the defined transformations.)

^{8}In the example it looks as follows:$$\begin{aligned} \mathsf{Q}(x) \wedge \mathsf{P}(u) \rightarrow {}&\mathsf{R}(x) \vee {\mathsf {not\_}{\mathsf {unif}}}(\mathsf{f}(x), u)&{\mathsf {not\_}{\mathsf {unif}}}(x, y) \rightarrow {}&\mathsf{dom}(x) \\ {\mathsf {not\_}{\mathsf {unif}}}(x, x) \wedge {}&\mathsf{dom}(x) \rightarrow \bot&{\mathsf {not\_}{\mathsf {unif}}}(x, y) \rightarrow {}&\mathsf{dom}(y). \end{aligned}$$

Finally, we combine basic shifting and partial flattening to give the shifting transformation:

Definition 6

(Shifting) The shifting transformation is defined as \(\text {sh}:= \text {pf}\circ \text {bs}\), that is, \(\text {sh}(M) = \text {bs}(\text {pf}(M))\), for any clause set M.

Proposition 3

(Completeness of shifting) Let M be any clause set. Then M is satisfiable iff \(\text {sh}(M)\) is satisfiable.

Proof

Not difficult, since \(\text {bs}\) (basic shifting) can be seen to be a structural transformation and \(\text {pf}\) (partial flattening) is a form of term abstraction. The formal argumentation is similar to the proof of Proposition 2. In particular, the interpretation function for the predicates \({\mathsf {not\_}P}\) is the negation of P. \(\square \)

Corollary 2

(Completeness of shifting with respect to E-interpretations) Let M be any clause set. Then M is E-satisfiable iff \(\text {sh}(M)\) is E-satisfiable.

Proof

Using the same line of argument as in the proof of Corollary 1, proving preservation of E-satisfiability can be reduced to proving preservation of satisfiability by means of the equality axioms (observe that the shifting transformation does not modify the equality axioms). \(\square \)

## 7 Blocking

The final transformation introduced in this paper is called blocking and provides a mechanism for detecting recurrence in the derived models. The blocking transformation is designed to realize a ‘loop-check’ for the construction of a domain, by capitalizing on available, powerful equality reasoning technology and redundancy criteria from saturation-based theorem proving. To be suitable, a resolution-based prover, for instance, should support hyperresolution-style inference, strong equality inference (for example, superposition or rewriting), splitting, the possibility to search for split-off positive equations first (explained below), and standard redundancy elimination techniques.

The basic idea behind blocking is to add clauses that cause a case analysis of the form \(s \approx t\) versus \(s \not \approx t\), for (ground) terms s and t. Although such a case analysis apparently leads to a bigger search space, it provides a powerful technique to detect finite models with a BUMG prover and in practice actually leads to satisfiability being detected more quickly. This is because in the case that \(s \approx t\) is assumed, this new equation leads to rewriting and representing otherwise infinitely many terms as one single term. To make this possible, the prover must support the above features, including notably splitting. Among resolution theorem provers splitting has become standard. Splitting was first available in the saturation-based prover SPASS [93, 94], but is also part of VAMPIRE [73, 74] and E [83]. Splitting is an integral part of the hypertableau prover E-KRHyper [13, 72].

In the following we introduce four different, but closely related, blocking transformations, called subterm domain blocking, subterm predicate blocking, unrestricted domain blocking and unrestricted predicate blocking. Subterm domain blocking was introduced in the short version of this paper under the name blocking [16]. Subterm predicate blocking is inspired by and related to the blocking technique described in [50]. Unrestricted domain blocking is the first-order version of the unrestricted blocking rule introduced in [79] and used for developing terminating tableau calculi for logics with the effective finite model property [80, 81]. A logic L (or fragment \(\mathcal {F}\) of first-order logic) has the effective finite model property if there a computable model bounding function \(\mu :\mathbb {N}\rightarrow \mathbb {N}\) such that the following holds. For every formula \(\phi \), if \(\phi \) is satisfiable in an L-model (\(\mathcal {F}\)-model) then \(\phi \) has a finite L-model (\(\mathcal {F}\)-model) with domain size not exceeding \(\mu (n)\), where n is the length of \(\phi \).

### 7.1 Subterm Domain Blocking

Let \(\mathsf{sub}\) be a fresh binary predicate symbol not in \(\Sigma _P\). For a clause set M, the subterm domain blocking transformation is the clause set \(\text {sdb}(M)\) obtained from M by applying the following algorithm.

×

Regarding Step (3), recall from Sect. 6.1 that \(\not \approx \) is suggestive notation for the introduced predicate symbol \({\mathsf {not\_}{\approx }}\). The subterm domain blocking transformation allows to consider whether two domain elements that are in a subterm relationship should be identified and merged, or not.

This blocking transformation preserves range-restrictedness. In fact, because the \(\mathsf{dom}\) predicate symbol is mentioned in the definition, the blocking transformation can be applied meaningfully only in combination with a range-restricting transformation.

Reading \(\mathsf{sub}(s,t)\) as ‘s is a subterm of t’, Step (2) in the blocking transformation might seem overly involved, because an apparently simpler specification of the subterm relationship for the terms of the signature \(\Sigma _f\) can be given. Namely:for every n-ary function symbol \(f \in \Sigma _f\) and all \(i\in \{1,\ldots ,n\}\). Yet, this specification, even if range-restricted, is not suitable for our purposes. The problem is that the second clause introduces compound terms.

$$\begin{aligned} \mathsf{dom}(x) \rightarrow {}&\mathsf{sub}(x,x)&\mathsf{sub}(x,x_i) \rightarrow {}&\mathsf{sub}(x,f(x_1,x_2\ldots ,x_n)) \end{aligned}$$

For example, for a given constant \(\mathsf{a}\) and a unary function symbol \(\mathsf{f}\), when just \(\mathsf{dom}(\mathsf{a})\) alone has been derived, a BUMG procedure derives an infinite sequence of clauses:This does not happen with the specification in Step (2). It ensures that conclusions of BUMG inferences involving \(\mathsf{sub}\) are about terms currently in the domain, without unforcedly extending it.

$$\begin{aligned} \mathsf{sub}(\mathsf{a}, \mathsf{a}),\ \mathsf{sub}(\mathsf{a}, \mathsf{f}(\mathsf{a})),\ \mathsf{sub}(\mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{a}))),\ \ldots . \end{aligned}$$

To justify the clauses added in Step (3) we continue this example and suppose an interpretation that contains \(\mathsf{dom}(\mathsf{a})\) and \(\mathsf{dom}(\mathsf{f}(\mathsf{a}))\). These might have been derived earlier in the run of a BUMG prover. Then, from the clauses added by blocking, the (necessarily ground) disjunctionis derivable.

$$\begin{aligned} \mathsf{f}(\mathsf{a}) \approx \mathsf{a} \vee \mathsf{f}(\mathsf{a}) \not \approx \mathsf{a} \end{aligned}$$

Now, it is important to use a BUMG prover with support for splitting and to equip it with an appropriate search strategy. In particular, when deriving a disjunction such as the one derived above, the \(\approx \)-literal should be split off and the clause set obtained in this case should be searched first. The reason is that inference with a positive (ground) equational unit is simplifying. For example, should \(\mathsf{dom}(\mathsf{f}(\mathsf{f}(\mathsf{a})))\) be derivable now (in the current branch), then any prover featuring rewriting and a complete simplification ordering is able to prove it redundant from \(\mathsf{f}(\mathsf{a}) \approx \mathsf{a}\) and \(\mathsf{dom}(\mathsf{a})\). Consequently, the domain is not to be extended explicitly. The information that \(\mathsf{dom}(\mathsf{f}(\mathsf{f}(\mathsf{a})))\) is in the domain is nevertheless implicit via the theory of equality.

### 7.2 Subterm Predicate Blocking

Subterm domain blocking defined in the previous section applies blocking to domain terms where one is a proper subterm of the other. The idea of the subterm (unary) predicate blocking transformation is similar, but it merges only the (sub)terms in the extension of unary predicate symbols different from \(\mathsf{dom}\) in the current interpretation.

Subterm predicate blocking is defined as follows:

×

Observe that the only difference between this transformation and the subterm domain blocking transformation lies in Step (3). The clausesadded here are obviously more restricted than their counterpart \(\mathsf{sub}(x,y) \rightarrow x\approx y \vee x \not \approx y\) in the definition of the subterm domain blocking transformation sdb.

$$\begin{aligned} \mathsf{sub}(x,y) \wedge P(x) \wedge P(y) \rightarrow x\approx y \vee x \not \approx y \end{aligned}$$

That subterm predicate blocking is strictly more restricted can be seen from the following example, which also helps to explain the rationale behind this transformation.Any BUMG prover can be expected to terminate on the range-restricted version of the transformed clause set and returns the modelNotice that the subterm predicate blocking transformation includes the clausesbut all BUMG inferences between these clauses and the above model lead to redundant ground conclusions \( t\approx t \vee t \not \approx t\) (in this example where \(t \in \{ \mathsf{a}, \mathsf{f}(\mathsf{a})\}\)). The motivation behind these clauses is to equate the arguments of two \(\mathsf{P}\)-literals (say) only when there are two literals \(\mathsf{P}(s)\) and \(\mathsf{P}(t)\) where s is a subterm of t. Conversely, if no such recurrence comes up, as in the above example, there is no reason for blocking. By contrast, the subterm domain blocking transformation \(\text {sdb}\) with its clause \(\mathsf{sub}(x,y) \rightarrow x\approx y \vee x \not \approx y\) would be applicable even to distinct terms, leading to the (unnecessary) split into the cases \(\mathsf{a} \approx \mathsf{f}(\mathsf{a})\) and \(\mathsf{a} \not \approx \mathsf{f}(\mathsf{a})\).

$$\begin{aligned}&\mathsf{P}( \mathsf{a})&\mathsf{P}(x) \rightarrow {}&\mathsf{Q}( \mathsf{f}(x)) \end{aligned}$$

$$\begin{aligned} \{ \mathsf{dom}(\mathsf{a}),\ \mathsf{dom}(\mathsf{f}(\mathsf{a})),\ \mathsf{P}( \mathsf{a}),\ \mathsf{Q}( \mathsf{f}(\mathsf{a})),\ \mathsf{sub}(\mathsf{a}, \mathsf{a}),\ \mathsf{sub}(\mathsf{a},\mathsf{f}(\mathsf{a})) \}. \end{aligned}$$

$$\begin{aligned} \mathsf{sub}(x,y) \wedge \mathsf{P}(x) \wedge \mathsf{P}(y) \rightarrow {}&x\approx y \vee x \not \approx y \\ \mathsf{sub}(x,y) \wedge \mathsf{Q}(x) \wedge \mathsf{Q}(y) \rightarrow {}&x\approx y \vee x \not \approx y \end{aligned}$$

From a more general perspective, the \(\text {spb}\) transformation is motivated by the application to description logic knowledge bases [2, 50]. Often, such knowledge bases do not contain cyclic definitions, or only few definitions are cyclic. The subterm predicate transformation aims to apply blocking only to concepts (unary predicates) with cyclic definitions. In Sect. 7.5, we discuss a description logic example to highlight the differences between the various blocking transformations.

### 7.3 Unrestricted Domain Blocking

The two previous ‘subterm’ variants of the blocking transformation allow to speculatively identify terms and their subterms. The ‘unrestricted’ variants introduced next differ from both by allowing speculative identifications of any two terms.

For the ‘domain’ variant, called unrestricted domain blocking transformation, the definition is as follows.

×

There is a clear trade-off between this transformation and the subterm domain blocking transformation sdb. On the one hand, the unrestricted domain blocking transformation induces a larger search space, as the bodies of the clauses \(\mathsf{dom}(x) \wedge \mathsf{dom}(y) \rightarrow x\approx y \vee x \not \approx y\) are less constrained than their counterparts in the subterm domain blocking transformation. With this clause BUMG derives the clauses \(s\approx t \vee s \not \approx t\), for every s, t in the derivable domain. In contrast, the clause \(\mathsf{sub}(x, y) \rightarrow x\approx y \vee x \not \approx y\) from the \(\text {sdb}\) transformation derives \(s\approx t \vee s \not \approx t\) only if additionally s is a subterm of t. On the other hand, the unrestricted domain blocking transformation can enable finding models with smaller domains because of the additionally derivable equations. This means fewer congruence classes on the Herbrand terms are induced by the equality relation \(\approx \) and, as our evaluation shows, actually reduces effort and memory consumption as models are found quicker (if there are any), even for the \(\text {crr}\) transformation.

We claim that the termination proof in [82] for semantic ground tableau with unrestricted domain blocking for the description logic \(\mathcal {ALBO\smash [t]{{}^{\textsf {id}}}}\) with the expressive power similar to the two-variable fragment of first-order logic can be adapted to show BUMG with unrestricted domain blocking can return finite models, when they exist, i.e., decide finite satisfiability. For this it is essential that blocking is applied eagerly and a breadth-first or iterative deepening search strategy is used to avoid getting ‘lost’ in a branch of the derivation where only infinite models exist. Carrying over the results in [80] implies unrestricted domain blocking can be used in BUMG methods to return domain minimal models for logics with the effective finite model property.

### 7.4 Unrestricted Predicate Blocking

The definition of the last variant of blocking, the unrestricted (unary) predicate blocking transformation, is as follows.

×

This transformation allows to equate any two (distinct) terms in a P-relation, if there are any. The motivation is a combination of the above, to attempt to block recurrence on P-literals if they arise, and to compute models with small domains.

### 7.5 Comparison on an Example

It is instructive to compare the effects on the returned models of the four blocking transformations on an example from description logics. To this end, consider the description logic knowledge base (left) and its translation into clause logic (right) in Table 1.

Table 1

Sample description logic knowledge base and clausal form

Notice that the cycle in the dependency graph in the TBox (for \(P_1\) and \(P_2\)) means that some form of blocking is needed for decidability in tableau-based description logic systems. Likewise, blocking is needed to force BUMG methods to terminate on the translated clause form. Any of the four blocking transformations defined above suffice. Table 2 summarizes the behaviour of these transformations, in terms of interesting relations in the computed model.

Table 2

Partial truth assignments in models computed for the sample knowledge base

Blocking | \({\mathsf{dom}}\) | \({\approx }\) | \({\mathsf{P}_1}\) | \({\mathsf{P}_2}\) | \({\mathsf{Q}}\) |
---|---|---|---|---|---|

\(\text {sdb}\) | \(\mathsf{a}, \mathsf{b}\) | \(\begin{array}[t]{@{}l} \mathsf {f(a)}\approx \mathsf {a}, \mathsf {f(b)}\approx \mathsf {b}, \\ \mathsf {g(a)}\approx \mathsf {a}, \mathsf {g(b)}\approx \mathsf {b},\\ \mathsf {h(a)}\approx \mathsf {a}, \mathsf {h(b)}\approx \mathsf {b} \end{array}\) | \(\mathsf{a}, \mathsf{b}\) | \(\mathsf{a}, \mathsf{b}\) | \(\mathsf{a}, \mathsf{b}\) |

\(\text {spb}\) | \(\mathsf{a}, \mathsf{b}\) | \(\begin{array}[t]{@{}l} \mathsf {f(a)}\approx \mathsf {a}, \mathsf {f(b)}\approx \mathsf {b}, \\ \mathsf {g(a)}\approx \mathsf {a}, \mathsf {g(b)}\approx \mathsf {b},\\ \mathsf {h(a)}\approx \mathsf {a}, \mathsf {h(b)}\approx \mathsf {b} \end{array}\) | \(\mathsf{a}, \mathsf{b}\) | \(\mathsf{a}, \mathsf{b}\) | \(\mathsf{a}, \mathsf{b}\) |

\(\text {udb}\) | \(\mathsf{b}\) | \(\begin{array}[t]{@{}l} \mathsf {a}\approx \mathsf {b}, \mathsf {f(b)}\approx \mathsf {b},\\ \mathsf {g(b)}\approx \mathsf {b}, \mathsf {h(b)}\approx \mathsf {b} \end{array}\) | \(\mathsf{b}\) | \(\mathsf{b}\) | \(\mathsf{b}\) |

\(\text {upb}\) | \(\mathsf{b}, \mathsf {f(b)}, \mathsf {h(b)}\) | \(\begin{array}[t]{@{}l} \mathsf {a}\approx \mathsf {b},\\ \mathsf {g(h(b))}\approx \mathsf {b} \end{array}\) | \(\mathsf{b}\) | \(\mathsf {h(b)}\) | \(\mathsf {f(b)}\) |

When comparing in detail the blocking techniques developed for description logics it becomes clear thatwhen applied to a knowledge base with finite models, in conjunction with a suitable BUMG method (see above, at the end of Sect. 7.1), can be refined to simulate various forms of standard blocking techniques used in description logic systems, including subset ancestor blocking and equality ancestor blocking (cf. [50, 56, 82]). Because standard loop-checking mechanisms used in description logic systems do not require backtracking, appropriate search strategies and restrictions for performing inferences and applying blocking need to be used.

$$\begin{aligned} \text {the transformations }\text {rr}\circ \tau \text { and } \text {sh}\circ \text {rr}\circ \tau , \quad \text {for } \tau \in \{ \text {sdb}, \text {spb}, \text {udb}, \text {upb}\}, \end{aligned}$$

An advantage of our approach to blocking as opposed to loop-checking used in mainstream description logic systems [2] (i.e., blocking without equality reasoning) is that it applies to any first-order clause set, not only to clauses from the translation of description logic problems. This makes the approach very general and widely applicable.

For instance, our approach makes it possible to extend description logics with arbitrary (first-order expressible) ‘rule’ languages. ‘Rules’ provide a connection to (deductive) databases and are being used to represent information that is currently not expressible in the description logics associated with OWL DL [1]. The specification of many natural properties of binary relations and complex statements involving binary relations are outside the scope of most current description logic systems. An example is the statement: individuals who live and work at the same location are home workers. This can be expressed as a Horn rule (clause) \(\mathsf{work}(x, y) \wedge \mathsf{loc}(y) \wedge \mathsf{live}(x, z) \wedge \mathsf{loc}(z) \wedge y \approx z \rightarrow \mathsf{homeWorker}(x)\), but, with some exceptions [54, 94], is not expressible in current description logic systems.

## 8 Soundness and Completeness of the Transformations

Each of the blocking transformations is complete:

Proposition 4

(Completeness of blocking with respect to E-interpretations) Let M be any clause set. For all \(\tau \in \{ \text {sdb},\ \text {spb},\ \text {udb},\ \text {upb}\}\), if \(\tau (M)\) is E-satisfiable then M is E-satisfiable.

Proof

Not difficult, as \(M \subseteq \tau (M)\) by definition. \(\square \)

The converse, that is, soundness of the transformation, is easy to prove. One basically needs to observe that the clauses added in respectively steps of the blocking transformations, realize a case distinction over whether two terms are equal or not. Trivially, one of the two cases always holds.

Putting all the transformations and the corresponding results together we can state the main theoretical result of the paper.

Theorem 1

(Completeness of the combined transformations wrt. E-interpretations) Let M be a clause set and suppose \(\text {tr}\) is any of the transformations inThen:

$$\begin{aligned}&\{ \text {rr}, \text {sh}\circ \text {rr}\} \cup \{ \text {rr}\circ \tau , \text {sh}\circ \text {rr}\circ \tau \mid \tau \in \{ \text {sdb},\ \text {spb},\ \text {udb},\ \text {upb}\} \} \quad \text {or }\\&\{ \text {crr}, \text {sh}\circ \text {crr}\} \cup \{ \text {crr}\circ \tau , \text {sh}\circ \text {crr}\circ \tau \mid \tau \in \{ \text {sdb},\ \text {spb},\ \text {udb},\ \text {upb}\} \}. \end{aligned}$$

(i)

\(\text {tr}(M)\) is range-restricted.

(ii)

\(\text {tr}(M)\) can be computed in quadratic time.

(iii)

If \(\text {tr}(M)\) is E-satisfiable then M is E-satisfiable.

The reverse directions of (iii), that is, soundness of the respective transformations, hold as well. The proofs are either easy or completely standard. Also, item (iii) can be made more specific in the same way as Corollary 1.

By carefully modifying the definition of \(\text {rr}\) (by essentially structure sharing to optimize the size of the transformation and avoid unneeded duplication) it is possible to compute the reductions in linear time.

Proposition 5

Let M and \(\text {tr}\) be as in the previous result. Then:

(i)

The size of \(\text {tr}(M)\) is bounded by a linear function in the size of M.

(ii)

\(\text {tr}(M)\) can be computed in linear time.

## 9 Decidability of BS Classes

Formulae in the Bernays–Schönfinkel class are conjunctions of function-free formulae of the form \(\exists ^*\forall ^* \psi \), where \(\psi \) is free of quantifiers. A clause is a BS clause iff all functional terms occurring in it are constants.

From [60, §5.3] it is known that hyperresolution does not decide the Bernays–Schönfinkel class without equality directly, and neither do any of the standard refinements of resolution. However it has been shown that hyperresolution and any refinements can decide the class of range-restricted BS clauses without equality [77]. Here we assume that the language includes equality.

Theorem 2

The class of range-restricted BS clauses (with equality), is decidable by unordered hyperresolution, or unordered resolution with selection of at least one negative literal (and paramodulation). It is also decidable by all refinements using a complete simplification ordering (and superposition).

This means all refinements of hyperresolution (and some form of equality reasoning) combined with any translation into range-restricted clauses is a decision procedure for the BS class. Therefore:

Corollary 3

Let M be any set of BS clauses, and suppose \(\text {tr}\) is a transformation inThen:

$$\begin{aligned}&\big \{ \text {rr}, \text {sh}\circ \text {rr}\big \} \cup \big \{ \text {rr}\circ \tau , \text {sh}\circ \text {rr}\circ \tau \mid \tau \in \{ \text {sdb},\ \text {spb},\ \text {udb},\ \text {upb}\} \big \} \quad \text {and}\\&\big \{ \text {crr}, \text {sh}\circ \text {crr}\big \} \cup \big \{ \text {crr}\circ \tau , \text {sh}\circ \text {rr}\circ \tau \mid \tau \in \{ \text {sdb},\ \text {spb},\ \text {udb},\ \text {upb}\} \big \}. \end{aligned}$$

(i)

Hyperresolution (or resolution with selection of at least one negative literal) with paramodulation and all refinements using a superposition-based equality reasoning decide \(\text {tr}(M)\).

(ii)

All BUMG methods decide M.

Since there are linear transformations of first-order formulae into clausal form, and since all the \(\text {tr}\) transformations are effective reductions of first-order clauses into range-restricted clauses, we obtain the following result.

Theorem 3

(i)

There is a quadratic (linear), satisfiability equivalence preserving transformation of any formula in the Bernays–Schönfinkel class, and any set of BS clauses, into a set of range-restricted BS clauses.

(ii)

All procedures based on hyperresolution or BUMG using a suitable form of equality handling decide the class of BS formulae and the class of BS clauses.

In [77] a similar but different transformation is used to prove this result for hyperresolution and BS without equality. In fact, what is crucial for deciding the BS class is a grounding method. This can be achieved by any form of range-restriction and hyperresolution-like inferences. Theorem 3(ii) can therefore be strengthened to include also any instantiation-based method [22, 39, 40], in particular also methods using on-the-fly instantiation such as semantic Smullyan-type tableau systems, where instantiation of universally quantified variables is realized by the \(\gamma \)-rule.

## 10 Experimental Evaluation

We have implemented the transformations described in the previous sections and carried out experiments on problems from the TPTP library, Version 6.0.0 [89]. The implementation, in SWI-Prolog, is called Yarralumla (Yet another range-restriction avoiding loops under much less assumptions). Since the transformations introduced in this paper are defined for clausal problems we have selected for the experiments all the CNF problems from the TPTP suite.

In our initial research [16] we used Yarralumla with the MSPASS theorem prover, Version 2.0g.1.4 [52]. As the extra features of MSPASS have in the mean time been integrated into the SPASS theorem prover [94] and SPASS has significantly evolved since Version 2.0, for the present paper we combined Yarralumla with SPASS Version 3.8d.

For that purpose we modified the code of SPASS in a number of ways. We added one new flag to activate splitting on positive ground equality literals in positive non-Horn clauses. The main inference loop was adapted so that finding a splitting clause and applying splitting has highest priority (unchanged) followed immediately by picking a non-positive blocking clause, that is, a clause of the form \(B_1 \wedge \cdots \wedge B_k \rightarrow s\approx t \vee H_1 \vee \cdots \vee H_m\) for \(m \ge 0\) and \(k > 0\), and performing inferences with it. The selection of splitting clauses was adapted so that positive ground clauses of the form \(s\approx t \vee H_1 \vee \cdots \vee H_m\), where \(m \ge 1\), are always selected, when there are any. Moreover, the first equality literal is split upon. This adaptation ensures blocking is performed eagerly to keep the set of ground terms small. The tests with Yarralumla were performed using ordered resolution and superposition with selection of at least one negative literal, forward and backward rewriting, unlimited splitting (as described above),

^{9}matching replacement resolution,^{10}subsumption and various other simplification rules. This means the inferences are performed in an ordered hyperresolution-style with eager splitting and forward and backward ground rewriting. The derivations constructed are thus BUMG tree derivations, the proofs produced are BUMG refutation proofs, and the models returned are BUMG models.We also tested SPASS Version 3.8d in auto mode on the sample. In auto mode SPASS used ordered resolution with dynamic selection of negative literals (an option in SPASS). SPASS automatically turned off splitting for non-Horn clauses. Dynamic selection means typically literals were only selected if multiple maximal literals occur in a clause. This means the behaviour of SPASS in auto mode was very different to that of SPASS-Yarralumla, which always selected a literal in clauses with non-empty negative part. The changes to SPASS in SPASS-Yarralumla meant that splitting was performed eagerly and blocking clauses were targeted, which was not the case with SPASS in auto mode. We tested SPASS in auto mode only on the original files (translated from TPTP syntax to SPASS syntax).

The experiments were run on a cluster of 128 Dell PowerEdge M610 Blade Servers each with two Intel Xeon E5620 2.4 GHz processors and 48 GiB main memory each. The time limit was 10 min (CPU time).

SPASS-Yarralumla is available from http://www.cs.man.ac.uk/~schmidt/spass-yarralumla/.

### 10.1 Results

Tables 3 and 4 summarize the results for satisfiable clausal problems in the TPTP library, measuring the number of problems solved within the time limit. The columns with the heading ‘#’ give the number of problems in the TPTP categories and the different TPTP rating ranges. The subsequent columns give the number of problems solved within the time limit. The results are presented for the different BUMG methods that were used. For example, \(\text {sh}\circ \text {rr}\circ \text {sdb}\) refers to the method based on the transformation defined by the new range-restriction transformation, shifting and subterm domain blocking. To evaluate the effect of the different forms of blocking the results are grouped into groups of four: new range-restriction \(\text {rr}\), new range-restriction with shifting \(\text {sh}\circ \text {rr}\), classical range-restriction \(\text {crr}\) and classical range-restriction with shifting \(\text {sh}\circ \text {crr}\). (The groups are separated by larger gaps in the tables.) In each group the first column provides the baseline for that group. The last column with the heading ‘auto’ gives the results for runs of SPASS Version 3.8d in auto mode on the original input files. The runtimes for the problems solved spanned the whole range, from less than one second to all of the time allowed.

Table 3

Number of problems solved on satisfiable problems, by TPTP categories

Category | # | \(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) |
---|---|---|---|---|---|---|---|---|---|---|---|

ALG | 37 | – | – | – | – | – | – | – | – | – | – |

ANA | 2 | – | – | – | – | – | – | – | – | – | – |

BOO | 17 | – | 5 | 5 | – | – | – | 4 | 5 | – | – |

CAT | 10 | 2 | 6 | 4 | 4 | 2 | 2 | 4 | 4 | 4 | 2 |

COL | 6 | – | – | – | – | – | – | – | – | – | – |

GEO | 17 | – | – | – | – | – | – | – | 1 | – | – |

GRP | 85 | 49 | 55 | 60 | 49 | 50 | 50 | 53 | 57 | 50 | 51 |

HEN | 3 | – | – | – | – | – | – | – | – | – | – |

HWC | 2 | – | – | – | – | – | – | – | – | – | – |

HWV | 39 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |

KRS | 13 | 1 | 8 | 8 | 7 | 5 | 7 | 8 | 8 | 8 | 8 |

LAT | 62 | – | 10 | 4 | – | – | – | 6 | 5 | – | – |

LCL | 44 | 1 | 4 | 1 | 2 | 1 | 2 | 4 | 3 | 2 | 2 |

LDA | 26 | – | – | – | – | – | – | – | – | – | – |

MGT | 11 | 1 | 6 | 9 | 6 | 1 | 4 | 7 | 7 | 8 | 4 |

MSC | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

NLP | 236 | 52 | 130 | 137 | 128 | 63 | 99 | 191 | 195 | 188 | 110 |

NUM | 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

PLA | 6 | – | – | – | – | – | – | – | – | – | – |

PUZ | 27 | 7 | 8 | 8 | 7 | 7 | 7 | 8 | 8 | 7 | 7 |

REL | 1 | – | 1 | 1 | – | – | – | 1 | 1 | – | – |

RNG | 14 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

ROB | 5 | – | – | – | – | – | – | – | – | – | – |

SCT | 3 | 2 | 3 | 2 | 3 | 3 | 2 | 2 | 3 | 2 | 3 |

SET | 12 | – | – | – | – | – | – | – | – | – | – |

Category | # | \(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) |
---|---|---|---|---|---|---|---|---|---|---|---|

SWC | 1 | – | – | – | – | – | – | – | – | – | – |

SWV | 147 | 2 | 6 | 8 | 7 | 2 | 2 | 2 | 7 | 2 | 2 |

SWW | 39 | – | 7 | 8 | – | – | – | 2 | 1 | 1 | – |

SYN | 223 | 54 | 132 | 147 | 126 | 117 | 56 | 132 | 145 | 125 | 117 |

SYO | 9 | – | – | – | – | – | – | – | – | – | – |

TOP | 19 | – | 2 | 3 | 1 | – | – | 1 | 1 | – | – |

Total | 1126 | 176 | 388 | 410 | 345 | 256 | 236 | 430 | 456 | 402 | 311 |

Category | \(\text {crr}\) | \(\text {crr}\circ \text {sdb}\) | \(\text {crr}\circ \text {udb}\) | \(\text {crr}\circ \text {spb}\) | \(\text {crr}\circ \text {upb}\) | \(\text {sh}\circ \text {crr}\) | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | auto |
---|---|---|---|---|---|---|---|---|---|---|---|

ALG | – | 1 | 1 | – | – | – | – | – | – | – | 3 |

ANA | – | – | – | – | – | – | – | – | – | – | – |

BOO | – | 3 | 6 | – | – | – | 3 | 6 | – | – | 2 |

CAT | 2 | 3 | 4 | 2 | 2 | 2 | 3 | 4 | 3 | 2 | 2 |

COL | – | 1 | 3 | – | – | – | 1 | 3 | – | – | – |

GEO | – | 1 | – | – | – | – | – | – | – | – | – |

GRP | 49 | 55 | 61 | 49 | 50 | 49 | 52 | 56 | 49 | 50 | – |

HEN | – | – | – | – | – | – | – | – | – | – | – |

HWC | – | – | – | – | – | – | – | – | – | – | – |

HWV | 2 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | – |

KRS | – | 7 | 8 | 7 | 5 | – | 7 | 7 | 7 | 5 | 8 |

LAT | – | 25 | 32 | – | – | – | 24 | 30 | – | – | – |

LCL | – | 2 | 5 | – | – | 1 | 2 | 5 | 1 | 1 | – |

LDA | – | – | – | – | – | – | – | – | – | – | – |

MGT | – | 7 | 5 | 5 | – | – | 4 | 5 | 4 | – | 8 |

MSC | – | 1 | 1 | 1 | 1 | – | 1 | 1 | 1 | 1 | 1 |

NLP | 22 | 125 | 125 | 114 | 33 | 22 | 118 | 125 | 104 | 33 | 198 |

NUM | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

PLA | – | – | – | – | – | – | – | – | – | – | – |

PUZ | 7 | 8 | 8 | 7 | 7 | 7 | 8 | 8 | 7 | 7 | 7 |

REL | – | 1 | 1 | – | – | – | 1 | 1 | – | – | – |

RNG | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | – |

ROB | – | – | – | – | – | – | – | – | – | – | – |

SCT | 2 | 3 | 3 | 3 | 3 | 2 | 2 | 3 | 2 | 2 | 3 |

SET | – | – | – | – | – | – | – | – | – | – | – |

SWC | – | – | – | – | – | – | – | – | – | – | – |

SWV | – | 5 | 7 | 5 | – | – | 4 | 6 | 4 | – | 79 |

SWW | – | – | 4 | – | – | – | – | – | – | – | – |

SYN | 54 | 134 | 148 | 126 | 110 | 55 | 131 | 143 | 124 | 111 | 99 |

SYO | – | – | – | – | – | – | – | – | – | – | – |

TOP | – | 2 | 2 | 1 | – | – | 1 | 3 | – | – | 6 |

Total | 140 | 389 | 430 | 324 | 215 | 142 | 368 | 411 | 310 | 216 | 417 |

Table 4

Number of problems solved on satisfiable problems, by TPTP problem rating

Category | # | \(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) |
---|---|---|---|---|---|---|---|---|---|---|---|

0.00 | 371 | 112 | 268 | 286 | 256 | 184 | 154 | 279 | 293 | 267 | 223 |

(0.00, 0.10] | – | – | – | – | – | – | – | – | – | – | – |

(0.10, 0.20] | 32 | 1 | 20 | 21 | 20 | 2 | 6 | 26 | 27 | 27 | 7 |

(0.20, 0.30] | 132 | 57 | 66 | 65 | 59 | 59 | 59 | 88 | 88 | 81 | 59 |

(0.30, 0.40] | 52 | 4 | 6 | 6 | 4 | 4 | 13 | 21 | 22 | 20 | 13 |

(0.40, 0.50] | 125 | 2 | 21 | 23 | 6 | 7 | 3 | 14 | 22 | 5 | 8 |

(0.50, 0.60] | 27 | – | 6 | 7 | – | – | 1 | 1 | 2 | 2 | 1 |

(0.60, 0.70] | 76 | – | 1 | 2 | – | – | – | 1 | 2 | - | – |

(0.70, 0.80] | 83 | – | – | 1 | – | – | – | 1 | 1 | - | – |

(0.80, 0.90] | 78 | – | 1 | – | – | – | – | – | – | – | – |

(0.90, 1.00] | – | – | – | – | – | – | – | – | – | – | – |

1.00 | 1017\({}^\dagger \) | – | – | – | – | – | – | – | – | – | – |

Total | – | 176 | 388 | 410 | 345 | 256 | 236 | 430 | 456 | 402 | 311 |

\(\text {crr}\) | \(\text {crr}\circ \text {sdb}\) | \(\text {crr}\circ \text {udb}\) | \(\text {crr}\circ \text {spb}\) | \(\text {crr}\circ \text {upb}\) | \(\text {sh}\circ \text {crr}\) | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | auto | |
---|---|---|---|---|---|---|---|---|---|---|---|

0.00 | 80 | 264 | 276 | 241 | 150 | 80 | 250 | 270 | 227 | 150 | 231 |

(0.00, 0.10] | – | – | – | – | – | – | – | – | – | – | – |

(0.10, 0.20] | – | 21 | 19 | 20 | 1 | – | 18 | 19 | 18 | – | 29 |

(0.20, 0.30] | 54 | 64 | 66 | 56 | 54 | 55 | 65 | 66 | 57 | 56 | 39 |

(0.30, 0.40] | 4 | 5 | 6 | 4 | 4 | 4 | 4 | 6 | 4 | 4 | 20 |

(0.40, 0.50] | 2 | 26 | 42 | 3 | 6 | 3 | 23 | 40 | 4 | 6 | 18 |

(0.50, 0.60] | – | 3 | 7 | – | – | – | – | 1 | – | – | 6 |

(0.60, 0.70] | – | 5 | 8 | – | – | – | 6 | 6 | – | – | – |

(0.70, 0.80] | – | – | – | – | – | – | – | – | – | – | 59 |

(0.80, 0.90] | – | – | 4 | – | – | – | 1 | 2 | – | – | 15 |

(0.90, 1.00] | – | – | – | – | – | – | – | – | – | – | – |

1.00 | – | – | 1 | – | – | – | – | – | – | – | – |

Total | 140 | 389 | 430 | 324 | 215 | 142 | 368 | 411 | 310 | 216 | 417 |

The best results in each group in each row are highlighted in bold font. The underlined values are the best results for all methods including SPASS in auto mode. As expected the worst results in each group were obtained for the baseline transformations without blocking. This confirms the expectation that blocking is an essential technique for BUMG methods. Among the different blocking techniques the best results were obtained with unrestricted domain blocking in all four groups. Overall, the best result was obtained for the combination with \(\text {rr}\) and shifting, i.e., \(\text {sh}\circ \text {rr}\circ \text {udb}\), solving 6.0% more problems than the second best method, \(\text {crr}\circ \text {udb}\) using the classical range-restriction transformation without shifting, and nearly 11% more problems than the transformations \(\text {rr}\circ \text {udb}\) and \(\text {sh}\circ \text {crr}\circ \text {udb}\). This means shifting had a significant positive effect in combination with the new range-restriction transformation, but less so in combination with classical range-restriction. The positive effect of shifting could also be seen for the number of problems solved without blocking for \(\text {rr}\) and \(\text {sh}\circ \text {rr}\) (34% improvement).

^{11}The good results for \(\text {crr}\circ \text {udb}\) show the value of classical range-restriction. In the LAT category, \(\text {crr}\circ \text {udb}\) solved 32 problems, whereas \(\text {sh}\circ \text {rr}\circ \text {udb}\) solved only 5 problems. In this category the problems are from the domain of lattice theory and contain predominantly equational axioms (i.e., non-ground equational unit clauses) and one ground inequality (for the axiom to be refuted). On such problems it can be seen that \(\text {crr}\) produces instantiations of the axioms very quickly, whereas the \(\text {rr}\) favours producing clauses involving the \(\mathsf{myequal}\) predicate and the blocking splitting decisions are less systematic because domain elements are identified on a by-need basis during the inference process, whereas for \(\text {crr}\), all constants are in the domain from the outset, leading to blocking being performed with these immediately. On these problems \(\text {crr}\) has the advantage that blocking leads to a reduction of the size of the domain more quickly. This shows there is a trade-off between using the \(\text {crr}\) transformation and the \(\text {rr}\) transformation. The results however also showed the virtues of unrestricted domain blocking as a universal technique for BUMG. SPASS in auto mode fared very well in the SWV category, where 79 problems were solved compared to 7–8 problems for the best BUMG methods. Overall SPASS in auto mode solved 9% fewer problems than the best BUMG method \(\text {sh}\circ \text {rr}\circ \text {udb}\).

Looking at the top half of Table 4 (up to difficulty rating of 0.40), the BUMG method based on \(\text {sh}\circ \text {rr}\circ \text {udb}\) fared best, but for problems more difficult (up to a rating of 0.70) the performance deteriorated and the method \(\text {crr}\circ \text {udb}\) solved the highest number of problems. For problems with ratings higher than 0.70 SPASS in auto mode solved significantly more problems than the BUMG methods. One problem with rating 1.00 was solved by the \(\text {crr}\circ \text {udb}\) method (namely, GRP741-1 in 121.86 s). Problems in the TPTP library with rating 1.00 have not yet been solved by any other prover.

Table 5

Evaluation of blocking techniques

Baseline | \(\text {sdb}\) | \(\text {udb}\) | \(\text {spb}\) | \(\text {upb}\) |
---|---|---|---|---|

Satisfiable | ||||

\(\text {rr}\) | \(-\) 0 / \(+\) 212 | \(-\) 4 /\(+\) 238 | \(-\) 0 / \(+\) 169 | \(-\) 1 /\(+\) 81 |

\(\text {sh}\circ \text {rr}\) | \(-\) 1 / \(+\) 195 | \(-\) 5 / \(+\) 225 | \(-\) 2 / \(+\) 168 | \(-\) 2/\(+\) 77 |

\(\text {crr}\) | \(-\) 0 /\(+\) 249 | \(-\) 4 /\(+\) 294 | \(-\) 0 /\(+\) 184 | \(-\) 3 /\(+\) 78 |

\(\text {sh}\circ \text {crr}\) | \(-\) 0 /\(+\) 226 | \(-\) 5 /\(+\) 274 | \(-\) 0 /\(+\) 168 | \(-\) 3 /\(+\) 77 |

Unsatisfiable | ||||

\(\text {rr}\) | \(-\) 211 /\(+\) 78 | \(-\) 315 /\(+\) 83 | \(-\) 100 /\(+\) 61 | \(-\) 77 /\(+\) 37 |

\(\text {sh}\circ \text {rr}\) | \(-\) 188 /\(+\) 106 | \(-\) 225 /\(+\) 126 | \(-\)81 /\(+\)87 | \(-\)34 /\(+\)52 |

\(\text {crr}\) | \(-\) 65 /\(+\) 170 | \(-\) 105 /\(+\) 242 | \(-\) 32 /\(+\) 78 | \(-\) 26 /\(+\) 24 |

\(\text {sh}\circ \text {crr}\) | \(-\) 52 /\(+\) 163 | \(-\) 98 /\(+\) 190 | \(-\)30 /\(+\)57 | \(-\)16 /\(+\)15 |

Table 5 presents an evaluation of the different blocking techniques, listing the number of problems lost and the number of problems gained against the baseline methods in each group. The best results in each row of the first table (for satisfiable problems lost and gained) are highlighted in bold font. The results confirm the significant positive effect of unrestricted domain blocking for satisfiable problems. The results for unsatisfiable problems are discussed at the end of the section.

Analysis of the gain and loss of the method based on \(\text {sh}\circ \text {rr}\circ \text {udb}\) against the other methods gave these results: Against \(\text {rr}\circ \text {udb}\) 66 problems were gained and 20 problems lost; against \(\text {sh}\circ \text {crr}\circ \text {udb}\) the gain/loss was +90/-45 and against \(\text {crr}\circ \text {udb}\) it was +85/-59. This non-uniformity suggests each variation of range-restriction had the potential to solve some problems not solvable within the time limit by \(\text {sh}\circ \text {rr}\) with unrestricted blocking. The biggest variation was against SPASS in auto mode, where 169 problems were gained and 130 problems were lost.

Table 6

Uniquely solved problems

\(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | Total | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) | Total | \(\text {crr}\) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Satisfiable | |||||||||||||

All methods | – | 1 | – | – | – | – | – | 4 | – | – | – | ||

All BUMG methods | – | 1 | 2 | – | – | – | – | 6 | – | – | – | ||

All BUMG, by group | – | 12 | 28 | – | – | 40 | – | 4 | 25 | 1 | 2 | 32 | – |

Unsatisfiable | |||||||||||||

All methods | – | – | 1 | – | – | – | – | – | – | – | – | ||

All BUMG methods | 17 | 4 | 5 | 2 | 2 | 3 | 5 | 3 | – | 3 | – | ||

All BUMG, by group | 32 | 15 | 18 | 23 | 4 | 92 | 16 | 24 | 41 | 29 | 14 | 124 | 4 |

\(\text {crr}\circ \text {sdb}\) | \(\text {crr}\circ \text {udb}\) | \(\text {crr}\circ \text {spb}\) | \(\text {crr}\circ \text {upb}\) | Total | \(\text {sh}\circ \text {crr}\) | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | Total | auto | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

Satisfiable | |||||||||||||

All methods | – | 11 | – | – | – | 4 | 1 | – | – | 115 | 136 | ||

All BUMG methods | – | 11 | – | – | – | 4 | 1 | – | – | 25 | |||

All BUMG, by group | 10 | 57 | – | – | 67 | – | 12 | 60 | – | 1 | 73 | ||

Unsatisfiable | |||||||||||||

All methods | – | – | – | – | – | – | – | – | – | 1779 | 1780 | ||

All BUMG methods | 3 | 3 | 2 | – | – | 1 | 3 | – | – | 56 | |||

All BUMG, by group | 23 | 90 | 10 | 3 | 130 | 8 | 31 | 71 | 6 | 2 | 118 |

Table 6 displays how many problems were uniquely solved. We focus here only on the results for the satisfiable problems in the table and postpone the discussion for unsatisfiable problems to the end of the section. The first row lists how many problems were uniquely solved over all methods including SPASS in auto mode. Although two of the BUMG methods with unrestricted domain blocking fared better than SPASS in auto mode, the latter solved a significant number of problems that none of the BUMG could solve (namely, 115 problems, or 27.5% of the problems solved by SPASS in auto mode, or 10.2% of all satisfiable problems). This reflected the orthogonality of the underlying methods. Analogously, the relatively low number of problems uniquely solved by the BUMG methods (21 problems, i.e., 1.9% of all satisfiable problems), which is also apparent from the number of problems solved uniquely among the BUMG methods in the second row (25 or 2.2% of all satisfiable problems) can be attributed to the similarity of the underlying methods. An analysis of the number of uniquely solved problems per group of BUMG methods

^{12}in the third row of the table highlighted the importance of unrestricted domain blocking. While overall no problems were only solved with unary predicate blocking techniques, within the groups there were four problems solved only with unary predicate blocking.Table 7

Average increase in the size of input files

Method | Avg | Method | Avg | Method | Avg |
---|---|---|---|---|---|

\(\text {rr}\) | 1.8 | \(\text {crr}\) | 1.1 | auto | 0 |

\(\text {rr}\circ \text {sdb}\) | 2.4 | \(\text {crr}\circ \text {sdb}\) | 1.7 | ||

\(\text {rr}\circ \text {udb}\) | 1.9 | \(\text {crr}\circ \text {udb}\) | 1.2 | ||

\(\text {rr}\circ \text {spb}\) | 2.4 | \(\text {crr}\circ \text {spb}\) | 1.7 | ||

\(\text {rr}\circ \text {upb}\) | 1.9 | \(\text {crr}\circ \text {upb}\) | 1.2 | ||

\(\text {sh}\circ \text {rr}\) | 1.7 | \(\text {sh}\circ \text {crr}\) | 1.2 | ||

\(\text {sh}\circ \text {rr}\circ \text {sdb}\) | 2.3 | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | 1.9 | ||

\(\text {sh}\circ \text {rr}\circ \text {udb}\) | 1.8 | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | 1.3 | ||

\(\text {sh}\circ \text {rr}\circ \text {spb}\) | 2.3 | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | 1.8 | ||

\(\text {sh}\circ \text {rr}\circ \text {upb}\) | 1.7 | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | 1.3 |

Table 7 gives an impression of the increase in the size of the input files caused by the transformations. Although the file sizes were measured after all comments and white space were removed, variations in name lengths distort the values slightly (which can be seen in the values for shifting). The results therefore need to be interpreted cautiously. The average increase in file size does show a significant effect on the size of the problem for the new range-restriction transformations and also subterm blocking (both subterm domain blocking and subterm predicate blocking). The largest increase in size was observed for the problem SYO600-1 (13.7 fold increase), which contained 380 predicate symbols with arity up to 64, 2 constants and no non-constant function symbols. The main cause for this increase was the large number of clauses added in Step (4) of the \(\text {rr}\) transformation. For each of the 284 predicate symbols with arity 64 in the problem, 64 clauses were added in Step (4). This is a large number. In contrast for the \(\text {crr}\) transformations the increase in size was negligible, and also, generally, it was significantly lower. Despite its positive virtues this shows a downside of the \(\text {rr}\) transformation. For problems containing a large number of function symbols with high arity, Step (5) similarly adds many clauses, even though the transformation overall is still effective.

Analysis of the problems solved without any form of blocking revealed a large number belonged to the Bernays–Schönfinkel class: 131/176 (74%) for \(\text {rr}\), 132/236 (56%) for \(\text {sh}\circ \text {rr}\), 133/140 (95%) for \(\text {crr}\), and 134/142 (94%) for \(\text {sh}\circ \text {rr}\). Since without blocking no terms are (hypothetically) merged, these results confirm the expectation that more problems can be solved with the \(\text {rr}\) transformation than the \(\text {crr}\) transformation, due to the reduction of the number of terms added to the domain.

Although the main purpose of BUMG methods is disproving theorems and generating models for satisfiable problems, for completeness we report in Tables 8 and 9 the results for unsatisfiable clausal TPTP problems. Results of gains and losses for different blocking techniques and uniquely solved for unsatisfiable problems are included in Tables 5 and 6. Overall, the results were not as uniform as for satisfiable problems. However some general observations can be made. SPASS in auto mode fared best overall, and did so in all TPTP categories and each problem rating category. For unsatisfiable problems the drawback of BUMG methods is that clauses need to be exhaustively grounded and each branch in the derivation tree needs to be closed. The dominance of SPASS in auto mode is thus not surprising.

Table 8

Number of problems solved on unsatisfiable clausal TPTP problems

Category | # | \(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) |
---|---|---|---|---|---|---|---|---|---|---|---|

ALG | 133 | 13 | 10 | 11 | 10 | 12 | 7 | 6 | 7 | 6 | 6 |

ANA | 83 | 7 | 11 | 11 | 10 | 7 | 6 | 7 | 7 | 6 | 6 |

BOO | 74 | 38 | 32 | 28 | 43 | 38 | 38 | 31 | 26 | 42 | 37 |

CAT | 52 | 38 | 33 | 33 | 34 | 38 | 29 | 31 | 32 | 29 | 28 |

COL | 225 | 32 | 23 | 22 | 25 | 28 | 19 | 18 | 16 | 17 | 19 |

COM | 14 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 6 |

FLD | 175 | 24 | 39 | 44 | 38 | 48 | 19 | 38 | 45 | 38 | 43 |

GEO | 187 | 68 | 68 | 67 | 67 | 66 | 71 | 74 | 66 | 72 | 71 |

GRA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

GRP | 798 | 233 | 183 | 171 | 240 | 225 | 307 | 231 | 260 | 291 | 298 |

HEN | 64 | 41 | 41 | 34 | 40 | 41 | 34 | 37 | 32 | 34 | 34 |

HWC | 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

HWV | 116 | 44 | 39 | 34 | 40 | 35 | 21 | 21 | 21 | 23 | 20 |

KRS | 17 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |

LAT | 250 | 39 | 33 | 29 | 40 | 39 | 39 | 25 | 24 | 43 | 37 |

LCL | 625 | 44 | 40 | 35 | 39 | 45 | 19 | 19 | 17 | 19 | 19 |

LDA | 23 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 2 | 3 | 3 |

MGT | 67 | 41 | 38 | 35 | 37 | 42 | 52 | 40 | 29 | 40 | 46 |

MSC | 20 | 9 | 11 | 9 | 10 | 9 | 8 | 10 | 7 | 9 | 7 |

NLP | 22 | 17 | 18 | 18 | 18 | 17 | 20 | 19 | 18 | 18 | 20 |

NUM | 86 | 15 | 10 | 10 | 13 | 13 | 10 | 8 | 9 | 9 | 11 |

PLA | 51 | 4 | 5 | 4 | 4 | 5 | 3 | 4 | 3 | 3 | 4 |

PUZ | 72 | 46 | 45 | 37 | 45 | 45 | 38 | 37 | 36 | 37 | 38 |

REL | 107 | 4 | 3 | 2 | 4 | 4 | 3 | 2 | 2 | 4 | 3 |

RNG | 83 | 20 | 11 | 8 | 23 | 19 | 19 | 10 | 9 | 22 | 19 |

ROB | 31 | 4 | 3 | 2 | 4 | 4 | 5 | 3 | 1 | 5 | 5 |

SCT | 98 | 8 | 8 | 8 | 8 | 8 | 2 | 2 | 2 | 2 | 2 |

SET | 450 | 68 | 63 | 57 | 58 | 66 | 46 | 50 | 52 | 47 | 45 |

SWC | 383 | 120 | 107 | 76 | 100 | 81 | 75 | 74 | 75 | 74 | 89 |

SWV | 855 | 100 | 72 | 70 | 92 | 104 | 43 | 45 | 38 | 48 | 45 |

SWW | 33 | – | - | - | - | - | - | - | - | - | - |

SYN | 621 | 330 | 329 | 321 | 326 | 328 | 318 | 326 | 318 | 319 | 317 |

SYO | 17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

TOP | 5 | 3 | 3 | 3 | 3 | 3 | 1 | 2 | 3 | 1 | 1 |

Total | 5842 | 1431 | 1298 | 1199 | 1392 | 1391 | 1273 | 1191 | 1174 | 1279 | 1291 |

Category | \(\text {crr}\) | \(\text {crr}\circ \text {sdb}\) | \(\text {crr}\circ \text {udb}\) | \(\text {crr}\circ \text {spb}\) | \(\text {crr}\circ \text {upb}\) | \(\text {sh}\circ \text {crr}\) | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | auto |
---|---|---|---|---|---|---|---|---|---|---|---|

ALG | 18 | 18 | 20 | 19 | 19 | 5 | 5 | 4 | 6 | 4 | 62 |

ANA | 4 | 7 | 10 | 6 | 4 | 3 | 4 | 5 | 3 | 3 | 40 |

BOO | 33 | 36 | 38 | 33 | 33 | 33 | 36 | 34 | 33 | 33 | 62 |

CAT | 34 | 36 | 35 | 35 | 34 | 27 | 30 | 30 | 30 | 28 | 51 |

COL | 32 | 31 | 25 | 32 | 31 | 8 | 15 | 15 | 8 | 8 | 149 |

COM | 5 | 5 | 5 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 12 |

FLD | 23 | 32 | 36 | 35 | 19 | 17 | 29 | 37 | 29 | 17 | 99 |

GEO | 45 | 43 | 53 | 45 | 42 | 45 | 46 | 47 | 44 | 44 | 121 |

GRA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |

GRP | 319 | 378 | 424 | 337 | 322 | 358 | 385 | 406 | 353 | 357 | 673 |

HEN | 31 | 42 | 40 | 31 | 29 | 36 | 43 | 33 | 35 | 36 | 62 |

HWC | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 2 |

HWV | 18 | 18 | 20 | 18 | 19 | 15 | 14 | 10 | 13 | 12 | 72 |

KRS | 7 | 9 | 9 | 9 | 9 | 5 | 8 | 9 | 8 | 8 | 9 |

LAT | 56 | 56 | 64 | 57 | 56 | 70 | 65 | 56 | 71 | 70 | 91 |

LCL | 43 | 44 | 44 | 44 | 43 | 21 | 24 | 26 | 23 | 21 | 336 |

LDA | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 5 | 5 | 5 | 6 |

MGT | 18 | 30 | 25 | 21 | 20 | 12 | 23 | 17 | 13 | 12 | 61 |

MSC | 8 | 10 | 10 | 9 | 8 | 7 | 10 | 8 | 7 | 7 | 14 |

NLP | 9 | 18 | 14 | 18 | 9 | 9 | 18 | 14 | 18 | 9 | 22 |

NUM | 10 | 9 | 8 | 10 | 11 | 8 | 7 | 8 | 8 | 8 | 36 |

PLA | 3 | 4 | 2 | 4 | 3 | 2 | 3 | 1 | 3 | 3 | 31 |

PUZ | 44 | 43 | 35 | 43 | 41 | 34 | 34 | 30 | 33 | 33 | 55 |

REL | 5 | 7 | 6 | 5 | 5 | 5 | 6 | 6 | 5 | 5 | 37 |

RNG | 10 | 14 | 14 | 11 | 10 | 10 | 14 | 14 | 11 | 11 | 48 |

ROB | 3 | 5 | 3 | 3 | 3 | 3 | 6 | 4 | 5 | 3 | 14 |

SCT | 11 | 11 | 11 | 11 | 11 | 2 | 2 | 2 | 2 | 2 | 27 |

SET | 53 | 50 | 51 | 54 | 53 | 29 | 36 | 40 | 29 | 28 | 290 |

SWC | 111 | 106 | 100 | 105 | 110 | 99 | 100 | 89 | 99 | 97 | 281 |

SWV | 77 | 72 | 71 | 77 | 77 | 29 | 31 | 33 | 29 | 28 | 373 |

SWW | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | 12 |

SYN | 321 | 321 | 314 | 319 | 322 | 300 | 307 | 306 | 300 | 303 | 508 |

SYO | 1 | 1 | 1 | 1 | 1 | – | 1 | – | 1 | 1 | 3 |

TOP | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 5 |

Total | 1363 | 1468 | 1500 | 1409 | 1361 | 1204 | 1315 | 1296 | 1231 | 1203 | 3665 |

Table 9

Result summary wrt. problem rating on unsatisfiable clausal TPTP problems

Category | # | \(\text {rr}\) | \(\text {rr}\circ \text {sdb}\) | \(\text {rr}\circ \text {udb}\) | \(\text {rr}\circ \text {spb}\) | \(\text {rr}\circ \text {upb}\) | \(\text {sh}\circ \text {rr}\) | \(\text {sh}\circ \text {rr}\circ \text {sdb}\) | \(\text {sh}\circ \text {rr}\circ \text {udb}\) | \(\text {sh}\circ \text {rr}\circ \text {spb}\) | \(\text {sh}\circ \text {rr}\circ \text {upb}\) |
---|---|---|---|---|---|---|---|---|---|---|---|

0.00 | 1409 | 871 | 830 | 796 | 860 | 885 | 734 | 745 | 722 | 744 | 746 |

(0.00, 0.10] | 480 | 244 | 203 | 187 | 236 | 239 | 210 | 184 | 167 | 216 | 210 |

(0.10, 0.20] | 594 | 151 | 145 | 118 | 148 | 137 | 133 | 123 | 112 | 140 | 141 |

(0.20, 0.30] | 507 | 98 | 85 | 72 | 87 | 79 | 70 | 61 | 63 | 65 | 72 |

(0.30, 0.40] | 258 | 30 | 24 | 18 | 30 | 24 | 38 | 30 | 40 | 35 | 36 |

(0.40, 0.50] | 513 | 32 | 11 | 6 | 24 | 22 | 70 | 45 | 53 | 62 | 69 |

(0.50, 0.60] | 207 | 3 | – | 1 | 4 | 2 | 13 | 2 | 14 | 9 | 13 |

(0.60, 0.70] | 298 | 2 | – | 1 | 3 | 3 | 4 | 1 | 3 | 6 | 4 |

(0.70, 0.80] | 324 | – | – | – | – | – | 1 | – | – | 2 | – |

(0.80, 0.90] | 344 | – | – | – | – | – | – | – | – | – | – |

(0.90, 1.00] | 259 | – | – | – | – | – | – | – | – | – | – |

1.00 | 1516 | – | – | – | – | – | – | – | – | – | - |

Total | – | 1431 | 1298 | 1199 | 1392 | 1391 | 1273 | 1191 | 1174 | 1279 | 1291 |

Category | \(\text {crr}\) | \(\text {crr}\circ \text {sdb}\) | \(\text {crr}\circ \text {udb}\) | \(\text {crr}\circ \text {spb}\) | \(\text {crr}\circ \text {upb}\) | \(\text {sh}\circ \text {crr}\) | \(\text {sh}\circ \text {crr}\circ \text {sdb}\) | \(\text {sh}\circ \text {crr}\circ \text {udb}\) | \(\text {sh}\circ \text {crr}\circ \text {spb}\) | \(\text {sh}\circ \text {crr}\circ \text {upb}\) | auto |
---|---|---|---|---|---|---|---|---|---|---|---|

0.00 | 739 | 769 | 747 | 756 | 737 | 590 | 654 | 647 | 614 | 597 | 1360 |

(0.00, 0.10] | 236 | 239 | 245 | 237 | 234 | 204 | 213 | 212 | 202 | 201 | 477 |

(0.10, 0.20] | 151 | 165 | 162 | 164 | 150 | 145 | 157 | 144 | 157 | 144 | 529 |

(0.20, 0.30] | 85 | 89 | 94 | 85 | 83 | 67 | 70 | 71 | 66 | 64 | 382 |

(0.30, 0.40] | 42 | 55 | 61 | 51 | 44 | 51 | 53 | 46 | 49 | 51 | 226 |

(0.40, 0.50] | 75 | 116 | 139 | 86 | 76 | 89 | 112 | 125 | 89 | 88 | 344 |

(0.50, 0.60] | 22 | 23 | 30 | 20 | 23 | 27 | 30 | 30 | 26 | 27 | 122 |

(0.60, 0.70] | 12 | 11 | 13 | 9 | 12 | 19 | 14 | 11 | 16 | 19 | 110 |

(0.70, 0.80] | 1 | 1 | 6 | 1 | 2 | 4 | 4 | 4 | 4 | 4 | 85 |

(0.80, 0.90] | – | – | 1 | – | – | 1 | 1 | 1 | 1 | 1 | 25 |

(0.90, 1.00] | – | 2 | – | – | – | 7 | 7 | 5 | 7 | 7 | 5 |

1.00 | – | – | – | – | – | – | – | – | – | – | – |

Total | 1363 | 1468 | 1500 | 1409 | 1361 | 1204 | 1315 | 1296 | 1231 | 1203 | 3665 |

For the BUMG methods, a general deterioration in performance could be observed for shifting, when comparing the results for the groups with baselines \(\text {sh}\circ \text {rr}\) and \(\text {sh}\circ \text {crr}\) to the respective groups without shifting. This is plausible because shifting leads to fewer negative literals in clauses and more positive literals thus reducing the constraining effect and leading to more splitting. For problems with higher rating, shifting did seem to have a positive effect; for instance, in the (0.40, 0.50] range, \(\text {sh}\circ \text {rr}\) solved 70 problems whereas \(\text {rr}\) solved 32 problems.

Within the BUMG groups we expected best performance for the baseline transformations, because these do not involve blocking and performing many blocking steps lead to a significant overhead. However only for the first group the \(\text {rr}\) transformation fared best. In combination with classical range-restriction \(\text {crr}\), somewhat surprisingly, the best results were obtained with unrestricted domain blocking, the most expensive form of blocking, because it is applicable to any terms. Among the blocking techniques in each case the highest gain was obtained for unrestricted domain blocking (see Table 5). However also the greatest loss was observed for this blocking technique. The smallest loss and lowest gain was obtained for \(\text {upb}\) blocking. The high loss for \(\text {udb}\) could be a reflection of the high increase in splitting steps preventing quicker detection of contradictions. Analogously the small loss for \(\text {upb}\) could be attributable to the smallest number of additional splitting steps among the blocking techniques. The high gain for \(\text {udb}\) blocking suggests the inference process panned out significantly differently leading to solutions not found with the other techniques. This seems to be supported by the results in the third row of Table 6 according to which, with one exception, the largest number of uniquely solved problems in each group was obtained with \(\text {udb}\) blocking. The exception was the first group, where \(\text {rr}\) led to the largest number of uniquely solved problems. Among all the BUMG methods, \(\text {rr}\) solved the largest number of problems not solved by any of the other methods. However, these results pale against the number of uniquely solved problems by SPASS in auto mode. Only one problem was solved by a BUMG method which was not solved by SPASS in auto mode.

### 10.2 Findings of the Experimental Evaluation

Several findings can be drawn from the results. The results have confirmed our expectation that unrestricted domain blocking is a powerful technique, which helps discover finite models more often than with the other blocking techniques. The results suggest the technique is indispensable for bottom-up model generation. Both in combination with the new range-restricting transformation, and the classical range-restricting transformation, good results have been obtained. Overall, the method based on new range-restriction, shifting and unrestricted domain blocking performed best on the sample. On satisfiable problems with higher difficulty rating this method was however gradually edged out by the method based on classical range-restriction and unrestricted domain blocking. This suggests there is a trade-off between the \(\text {rr}\) transformation, which is based on a non-trivial transformation but does restrict the creation of terms, and the simpler \(\text {crr}\) transformation, which has to rely on blocking to restrict the creation of terms.

The results for subterm domain blocking were good and often not far behind unrestricted domain blocking for satisfiable problems. In contrast, predicate blocking seems not to be effective on many problems. We attribute this to the nature of the problems in the TPTP library.

An investigation with SPASS-Yarralumla on translations of modal logic problems has revealed a different picture [78]. There, the best performance was obtained with subterm domain blocking for both satisfiable and unsatisfiable problems. Better results than for unrestricted domain blocking were also obtained with subterm predicate blocking and unrestricted predicate blocking. Better performances for subterm and predicate blocking are also expected on problems stemming from (cyclic) description logic knowledge bases. Experiments with blocking restricted by excluding a finite subset of the domain have shown better results than for unrestricted domain blocking for consistency testing on a large corpus of ontologies [56]. The better performance for restricted forms of blocking on modal and description logic problems can be attributed to mainstream modal and description logics having the finite tree model property. This means every satisfiable formula holds in a model based on a finite tree, which is not a property of first-order formulae.

The results showed BUMG methods were good for disproving theorems and generating models for satisfiable problems. For unsatisfiable problems BUMG methods were significantly less efficient than SPASS in auto mode. For theorem proving purposes a limitation of BUMG methods is that they require full grounding. It can be seen already from very small unsatisfiable examples that a complete BUMG derivation tree can be very large, whereas resolution proofs are significantly shorter.

Compared to ordered resolution (and appropriate equality reasoning) without splitting, an advantage of BUMG methods for satisfiable problems is the division of the search space into branches which are individually constructed and individually processed. As a consequence, if the right decisions are made at branching points models can be found more quickly. When the branching point decisions are less optimal the performance can deteriorate dramatically, particularly if the search is trapped in a branch with only infinite models. This could be another explanation for the lower success rate of the BUMG methods observed for more difficult satisfiable problems. For problems where only infinite models exist, clearly other methods are better. Methods that can compute representations of infinite models include the method of [33] computing atom representations and the method of [27] using a hybrid approach based on careful enumeration and deduction on constrained clauses.

## 11 Discussion and Related Work

For modal, description or hybrid logics characterized by finite tree models, blocking is implemented as a loop-checking mechanism which involves the comparison of sets of formulae in order to detect periodicity in the (implicitly) constructed model. The idea is that (implicit) objects (or possible worlds) are reused when the set of formulae associated with the current (implicit) object is the same as (or a subset of) the formulae associated with another previously created (implicit) object, because the inference steps would be repetitions of inference steps previously performed for the other object [19, 25, 28, 35, 47, 49, 58, 64]. Variations of this form of loop-checking can be found in description logic tableaux systems, referred to as predecessor equality (or subset) blocking, ancestor equality (or subset) blocking and anywhere equality (or subset) blocking [2, 25, 47] as well as pattern-based blocking [55]. These loop-checking techniques are sound for modal and description logics with some form of tree model property, provided the inference rules and the inference strategy ensures that objects are maximally expanded at the point that the loop-check is performed (e.g., S4 and \(\mathcal{ALC}\) with TBox axioms belong to this class). For other logics, it must be possible to undo previous merges of objects when no models can be found, and to try other merges. Examples of this form of blocking are dynamic anywhere blocking and pair-wise blocking which have been introduced for logics with inverse and functional relations [2, 45, 46]. Often these tableau systems do not require explicit presence of objects (or worlds) in the formulae of the tableau system and tightly integrate loop-checking into the rules and how they can be used. As a consequence the definitions of these tableau systems and the soundness and completeness proofs are very particular. How far this style of system and loop-checking can be pushed toward the boundary of undecidable modal, description and hybrid logics is open. It is also unclear if the approach can be extended beyond logics with some form of tree-model property, see the discussion in [82].

The work of [50, 51, 53] presents the relationship between tableau-based description logic systems and first-order resolution. In [50] it was shown how tableau decision procedures for \(\mathcal{ALC}\) with TBox axioms can be step-wise simulated using a positive BUMG refinement of resolution (a negative selection refinement augmented with splitting and one-step rewriting). This requires a special encoding of the semantics of the logic and the blocking rule with the side-condition that s and t (representing two objects) are maximally expanded and indistinguishable with respect to the properties that hold for them (this exploits the Leibniz principle for all positive units with a unary predicate symbol, which suffices for \(\mathcal{ALC}\) and also S4). This achieves the same behaviour as loop-checking based on subset (or equality) blocking, which is sufficient to decide these logics. In [31, 53] the relationship is extended to a large class of modal logics (including the standard ones between multi-modal versions of modal logic K and S5), and used as a common setting for an empirical evaluation of ordered resolution refinements and BUMG-based refinements. The link is also used to define tableau systems for novel extended modal logics defined over Kripke frames closed under relational intersection, union and converse of relations.

$$\begin{aligned} { \begin{array}[t]{@{}c@{}}M\\ \hline M,\;s \approx t \end{array} } \end{aligned}$$

Blocking as introduced in this paper and previously in [16], is achieved by transforming the input clause set and extending it with blocking clauses. In semantic tableau systems unrestricted blocking can be formulated as an analytic cut rule expanded using a left-to-right strategy [79, 82].

$$\begin{aligned} { \begin{array}[t]{@{}c@{}}M\\ \hline M,\;s\approx t \quad \mid \quad M,\;s\not \approx t \end{array} } \end{aligned}$$

^{13}The rule asserts the equality \(s \approx t\) of two objects (labels) and if this does not lead to finding a model (e.g., all the branches from this point onwards can be closed), then \(s \not \approx t\) must hold, which is achieved by normal disjunctive backtracking. Adding this rule to any sound and complete inference system preserves soundness and completeness. The same is true for restricted versions of the rule [56]. It has been shown that if a logic admits finite filtration or has the (effective) finite model property any semantic tableau system extended with this mechanism provides a decision procedure for the logic [80, 82]. These results provide the theoretical foundation for a generic platform in which tableau decision procedures can be built in a systematic and uniform way and other loop-checking mechanisms can be simulated [56, 76, 81]. The unrestricted blocking mechanism has been implemented in MetTeL tableau prover generator [90, 91].We argue the key principles and ideas of this platform extend to most, if not all, methods in the BUMG paradigm. Using unrestricted blocking, either as a transformation into clausal form, or as an inference rules as given above, and small adjustments to the tableau generation methodologies [76, 80, 81], the effort to develop terminating BUMG methods for decidable, first-order representable logics should not be considerable.

Unrestricted blocking has been used to define ground semantic tableau systems for various concrete cases of deciding: the description logics \(\mathcal{ALCO}\), \(\mathcal{ALCO}\) with transitive roles, \(\mathcal{ALBO}\) [79, 80] and \(\mathcal {ALBO\smash [t]{{}^{\textsf {id}}}}\) [82], intuitionistic logic [81], interrogative epistemic logics [67], modal logics with counting quantifiers [97], the description logic \(\mathcal{SHOI}\) [56], a bi-intuitionistic logic [87] and a related multi-modal logic [78]. The evaluations in [56, 78, 87] of different variations of unrestricted blocking on description logic ontologies and modal logic problems complement those in the present paper.

Reuse of domain terms in the handling of existentially quantified formulae is common in automated model building systems for enumerating finite models. In tableau systems it can be found in the form of an adapted and extended \(\delta \)-rule (i.e., a \(\delta ^*\)-rule) [23, 44, 61, 70]. The \(\delta ^*\)-rule has the property that it avoids the use of non-constant Skolem functions and does not require the inclusion of equality in a BUMG system. The effect of \(\delta ^*\)-rule can be achieved with (equality-based) unrestricted blocking, but the reverse does not hold.

The nineties saw presentations of hyperresolution decision procedures (without splitting or backtracking) for various classes, which can be used for automating model building [26, 33]. This work developed ways of representing models in compact form as atom representations and sometimes allows infinite models to be presented finitely. Completeness for finite model generation is achieved through enhancing the approach with an enumeration technique where the idea is reminiscent of the \(\delta ^*\)-rule, but splitting is not necessarily performed [71]. A more powerful method with the functionality to compute finite representations of infinite models is described in [27].

Methods such as the extended PUHR tableau method [23], and the method for geometric logic [17] directly search for a finite model. They can be seen as BUMG methods presented as tableau calculi with a sophisticated inference rule for existential quantification. Whenever a witness term (technically, a constant) for an existentially quantified variable is sought, the idea is to attempt to re-use an existing one, to keep the domain small (and where possibly finite).

Other methods for model computation search for small finite models, by essentially searching the space of interpretations with increasing domain sizes \(1,2,\ldots \), in increasing order, until a model is found. The SEM-family [66, 85, 98] of procedures do that by essentially built-in exhaustive search over the interpretations for a fixed domain size. Model builders such as ANL-DP in the old style of MACE [65], including for example the method of [29], reduce model search to testing of propositional satisfiability. The approach of [10] is conceptually similar but translates into the Bernays–Schönfinkel fragment of first-order logic instead. The satisfiability test is delegated to an instance-based method (see e.g. [8, 57]), which generally decide satisfiability of that class.

Being based on a translation, the MACE-style approach is conceptually related, but different to our approach, as the translation to propositional logic is parameterized by a domain size. In our approach there is no a priori requirement for iterative deepening over the domain size, and the search for finite models works differently, by making terms equal as needed or as specified by the blocking strategy.

^{14}This way, we can address a problem often found with models computed by these methods: from a pragmatic perspective, they tend to identify too many terms. For instance, for the two unit clauses \(\mathsf{P}(\mathsf{a})\) and \(\mathsf{Q}(\mathsf{b})\) there is a model that identifies \(\mathsf{a}\) and \(\mathsf{b}\) with the same object. Such models can be counter-intuitive, for instance, in a description logic setting where unique names are not necessarily specified or formally assumed, but it would be unnatural to equate \(\mathsf{a}\) and \(\mathsf{b}\), if \(\mathsf{a}\) represents a person, Alice say, and \(\mathsf{b}\) represents an illness, bronchitis say.

^{15}Furthermore, logic programs are typically understood with respect to Herbrand semantics, and it is desirable to develop compatible model building techniques. Our transformations are more careful at identifying objects than the methods mentioned and thus work closer to a Herbrand semantics.Winker [96] is the first work that we are aware of, which uses hyperresolution in combination with the classical range-restriction transformation to generate ground instantiations of sets of equations (in this case). As an extra ‘trick’ the user is allowed to successively add equations \(s\approx t\) considered suitable to limit the number of instantiations and find small models by adding clauses of the formwhere \(x_1, \ldots , x_n\) are the variables occurring in \(s\approx t\), to the problem set. Using these tricks [96] shows that first-order theorem provers can be used to solve open problems of axiom independence of ternary Boolean algebras.

$$\begin{aligned} \mathsf{dom}(x_1) \wedge \cdots \wedge \mathsf{dom}(x_n) \rightarrow s \approx t, \end{aligned}$$

Tammet [34, Ch. 7] found a special condition and automated method of extending clause sets with equations between terms of the Herbrand universe (i.e., ground terms) to decide the union of the Ackermann class and the monadic class using resolution and narrowing without backtracking.

Blocking has the same goal as the speculative inference method described in [62] and [21] (with improvements). The idea of allowing the speculative addition of clauses (which are not consequences) is incorporated into a DPLL\((\Gamma + \mathcal {T})\) framework combining superposition and SMT solving [21]. Is is shown that two type systems relevant for software engineering (given by range-restricted theories) can be decided in this framework, when negative selection is used for resolution and superposition. Instances exemplified in [62] include replacing a clause by one that subsumes it, and adding equations for joining equivalence classes in the abstract congruence closure framework. A crucial difference in [21] to [62] is the ability to remove a speculatively added clause via native backtracking of DPLL, when this clause leads to the derivation of a contradiction. This variation means the approach of [21] is a sound blocking mechanism, just like our blocking mechanism.where \(\mathcal{B}\) is the appropriate constraint (cf. Sect. 7). That is, a new predicate symbol \(\mathsf{nonequal}\) is used instead of the predicate symbol \(\not \approx \) and no definitional clause is included. By not including a definition for \(\mathsf{nonequal}(x,y)\) essentially the same effect is achieved: an equation \(s\approx t\) is asserted and if this does not lead to finding a model the right branch is created containing the unit \(\mathsf{nonequal}(s,t)\).

^{16}A small difference is that our blocking mechanism replaces the speculated clause by its negation (in general, some caution is needed to avoid unneeded overhead, but if the case analysis is over ground clauses there is not an issue). In our setting, the kind of blocking behaviour as in [21] can be achieved with a small adjustment in the definition of the blocking transformations. In particular, only this clause is used in the respective ‘equality case analysis’ steps:$$\begin{aligned} \mathcal{B}\rightarrow {}&x\approx y \vee \mathsf{nonequal}(x,y), \end{aligned}$$

^{17}These clauses prevent blocking to s and t being applied again and they record the unsuccessful blocking attempts between terms undertaken during the derivation. The essence of what is exploited here is that the negated formula in an (analytic) cut rule does not need to be expanded if the cut rule is not essential for completeness.## 12 Concluding Remarks

We have presented and tested a number of enhancements for BUMG methods. An important aspect is that our enhancements exploit the strengths of readily available BUMG systems with only modest modifications. Our range-restriction technique is a refinement of existing transformations to range-restricted clauses in that terms are added to the domain of interpretation on a by-need basis. Moreover, we have presented methods that allow us to extend BUMG methods with blocking techniques related to loop-checking techniques with a long history in the more specialized setting of modal and description logics.

The experimental evaluation has shown blocking techniques are indispensable in BUMG methods for satisfiable problems. In particular, unrestricted domain blocking turned out to be the most powerful technique on problems from the TPTP library. Limiting the creation of terms during the inference process by using the new range-restricting transformation paid off, leading to better results. It showed best results together with the shifting transformation. The experimental results however also show that classical range restriction together with unrestricted blocking is a good complementary method. Because model generation methods are not just aimed at showing the existence of models but are built to construct and return models, when no models exist the entire search space must be traversed, which has led to inferior performance compared to saturation-based resolution.

Our bottom-up model generation approach is especially suitable for generating small models and we claim it is possible to show the approach using unrestricted domain blocking allows us to compute finite models when they exist (by adapting the proof of [81]). The models produced by subterm blocking and predicate blocking are not as small as those produced by unrestricted domain blocking. The generated models do not need to be Herbrand models. It follows from how the transformations work that the generated models are quasi-Herbrand models, in the following sense. Whenever \(\mathsf{dom}(s)\) and \(\mathsf{dom}(t)\) hold in the (Herbrand) model constructed by the BUMG method, then (as in Herbrand interpretations) the terms s and t are mapped to themselves in the associated (possibly non-Herbrand) model. Using subterm blocking or predicate blocking for the two unit clauses \(\mathsf{P}(\mathsf{a})\) and \(\mathsf{Q}(\mathsf{b})\), the associated model maps \(\mathsf{a}\) and \(\mathsf{b}\) to themselves (without merging them). More informative models are produced than those computed by unrestricted domain blocking and, for example, MACE- and SEM-style finite model searchers. From an applications perspective, this can be an advantage because larger models are more likely to be helpful to a user debugging mistakes in the formal specification of a program or protocol, or an ontology engineer trying to discover why an expected entailment does not follow from an ontology.

Research on developing resolution decision procedures has concentrated on developing ordering refinements of resolution for deciding solvable fragments of first-order logic. Fragments decidable with ordered resolution are complementary to the fragments that can be decided by refinements using the techniques presented in this paper. We have thus extended the set of techniques available for resolution methods to turn them into more effective and efficient (terminating) automated reasoning methods. In particular, we have shown that all procedures based on hyperresolution, or BUMG methods, can decide the Bernays–Schönfinkel class and the class of BS clauses with equality.

Studying how well the ideas and techniques discussed in this paper can be exploited and behave in dedicated BUMG provers, tableau-based provers and other provers (including resolution-based provers) is very important but is beyond the scope of the present paper. Our experience with another prover, Darwin [11], was encouraging. An in-depth comparison and analysis of BUMG approaches with our techniques and MACE-style or SEM-style model generation would also be of interest. Another source for future work is to combine the presented transformations with other BUMG techniques, such as magic sets transformations [43, 88], a typed version of range-restriction [15], and minimal model computation [23, 24, 69]. Having been designed to be generic, we believe that our transformations carry over to formalisms with default negation, which could provide a possible basis for enhancements to answer-set programming systems.

## Acknowledgements

The second author is very grateful to Christoph Weidenbach and Uwe Waldmann for hosting her during 2010 and 2013–2014. In this time the implementation of SPASS-Yarralumla was completed and the experimental evaluation undertaken on the cluster of the Max-Planck-Institut für Informatik, Saarbrücken. We thank Uli Furbach, Dmitry Tishkovsky, Uwe Waldmann and Christoph Weidenbach for useful discussions and comments on this research, and extend our thanks to the anonymous reviewers for useful comments on the submission. Financial support through research Grants EP/F068530/1 and EP/H043748/1 of the UK Engineering and Physical Sciences Research Council (EPSRC) is gratefully acknowledged.

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.