1 Introduction
One motivation for this work stems from Description Logic (DL) [1], where constant symbols (called individual names) are used within knowledge bases to denote objects or individuals in an application domain. If such objects are composed of other objects, it makes sense to represent them as (ground) terms rather than constants. For example, the couple consisting of individual a in the first component and individual b in the second component is more reasonably represented by the term f(a, b) (where f is a binary function symbol denoting the couple constructor) than by a third constant c that is unrelated to a and b. In fact, if we have two couples, one consisting of a and b and the other of \(a'\) and \(b'\), and we learn (by DL reasoning or from external sources) that a is equal to \(a'\) and b is equal to \(b'\), then this automatically implies that f(a, b) is equal to \(f(a',b')\), i.e., that this is one and the same couple, whereas we would not obtain such a consequence if we had introduced constants c and \(c'\) for the two couples.
If we use terms to represent objects, and can learn (e.g., by DL reasoning) that two terms are supposed to be equal, we need to be able to decide which other identities between terms can be derived from the given ones. Fortunately, this problem (usually called the word problem for ground identities) is decidable in polynomial time. The standard approach for deciding this word problem is congruence closure [2‐5]. Basically, congruence closure starts with the given set of ground identities E, and then extends it using closure under reflexivity, symmetry, transitivity, and congruence. The set \( CC (E)\) obtained this way is usually infinite, and the main observation that yields decidability in polynomial time is that one can restrict it to the subterms of E and the subterms of the terms for which one wants to decide the word problem. An alternative approach for deciding the word problem for ground identities is based on term rewriting. Basically, in this approach one generates an appropriate canonical term rewriting system from E, and then decides whether two terms are equal modulo the theory E by computing their canonical forms and checking whether they are syntactically equal. This was implicit in [6], and made explicit in [7] (see also [8, 9] for other rewritingbased approaches).
Anzeige
In the motivating example from DL, but also in other settings where congruence closure is employed (such as SMT [10, 11]), it sometimes makes sense to assume that certain function symbols satisfy additional semantic properties that are not expressible by ground identities. Such symbols are then often called interpreted symbols, in contrast to the uninterpreted ones, for which no such properties are required. For example, one may want to consider couples where the order of the components is irrelevant, which means that the couple constructor function is commutative. Another interesting property for (ordered) couples is extensionality: if two couples are equal then they must have the same first and second components, i.e., the couple constructor f must satisfy the extensionality axiom \(f(x,y) \mathrel {\approx }f(x',y') \Rightarrow x\mathrel {\approx }x' \wedge y\mathrel {\approx }y'\). While it is wellknown that adding commutativity does not increase the complexity (see, e.g., [3, 12]), extensionality has, to the best of our knowledge, not been considered in this context before. The problem with extensionality is that it allows us to derive “small” identities from larger ones. Consequently, it is conceivable that one first needs to generate such large identities using congruence and other rules, before one can get back to a smaller one through the application of extensionality. Thus, it is not obvious that also with extensionality one can restrict congruence closure to a finite set of terms determined by the input.
Here, we will tackle this problem using a rewritingbased approach. Our proofs imply that, also with extensional symbols, proofs of identities that detour through “large” terms can be replaced by proofs using only “small” terms, but it is not clear how this could be shown directly without the rewritingbased approach introduced in this paper.
The conference version of this paper [13] shows how the rewritingbased approach of [7] can be extended to also handle commutative symbols. In contrast to approaches that deal with associativecommutative (AC) symbols [14, 15] using rewriting modulo AC, commutativity is treated by introducing an additional ground rewrite system \(R(\varSigma _c)\) consisting of all appropriately ordered ground instances of commutativity. In Sect. 3, we extend this approach from commutativity to more general shallow identities [16] of the form \(h(x_1,\ldots ,x_n) \mathrel {\approx }h(y_1,\ldots ,y_n)\) where \(\{x_1,\ldots ,y_n\}=\{y_1\ldots ,y_n\}\) is a set of (not necessarily distinct) variables. Although it deals with a more general setting, we were able to design a simpler congruence closure algorithm. Basically, it applies the classical rewritingbased congruence closure for uninterpreted function symbols. Shallow identities of the above form, called strongly shallow in this paper, are dealt with by reducing the lefthand sides of the generated rules using an additional ground rewrite system induced by the strongly shallow identities. This ground system is infinite, but finitely represented by an ordered rewrite system [17]. No other interaction between the ground identities and the strongly shallow identities is needed.
In [13] it was shown how to extend the commutative congruence closure algorithm developed there to deal also with (noncommutative) extensional symbols. In Sect. 4, we not only extend this approach from commutativity to strongly shallow identities. Instead, we exhibit general properties that the canonical rewrite system computed by a semantic congruence closure algorithm must satisfy such that extensionality for otherwise uninterpreted function symbols can be handled by a simple iterative algorithm that derives new identities between constants due to extensionality. As already observed in [13], it does not make sense to consider symbols that are both commutative and extensional since this makes the induced equational theory trivial. Instead, we introduce, in [13] and in Sect. 5 of the present paper, the notion of cextensionality, which is more appropriate for commutative symbols, but increases the complexity of the word problem from polynomial to coNPcomplete. Section 6 gives a more detailed description of related work.
Anzeige
2 Preliminaries
We assume that the reader is familiar with basic notions and results regarding equational theories, universal algebra, and term rewriting. In this section, we first recall the relevant notions and results regarding equational theories and term rewriting, but refer the readers to [5] for details. We will keep as close as possible to the notation introduced in [5]. In particular, we use \(\mathrel {\approx }\) to denote identities between terms and \(=\) to denote syntactic equality. Then, we introduce congruence closure, both without and with interpreted function symbols. Finally, we define our notion of strongly shallow identities, and show how such identities give rise to an ordered rewrite system that is canonical on ground terms.
2.1 Equational Theories and Term Rewriting
Terms are built as usual from variables, constants, and function symbols. A term of the form \(f(t_1,\ldots ,t_n)\) for an nary function symbol f is said to have root symbol f. An identity is a pair of terms (s, t), which we usually write as \(s\mathrel {\approx }t\). A ground term is a term not containing variables and a ground identity is a pair of ground terms. Given a set of (not necessarily ground) identities F, the equational theory induced by F is defined semantically as \( {\mathrel {\approx }_F} := \{ s\mathrel {\approx }t \mid \text{ every } \text{ model } \text{ of } F \text { is a model of} s\mathrel {\approx }t\}. \) The notion of model used here is the usual one from firstorder logic, where the equality symbol is interpreted as the equality relation on the domain and we assume that identities are (implicitly) universally quantified. Since we consider signatures consisting only of constant and function symbols, we call firstorder interpretations algebras.
Birkhoff’s theorem (see Theorem 4.5.14 in [5]) provides us with an alternative syntactic characterization of \(\mathrel {\approx }_F\) that is based on rewriting. A given set of identities F induces a binary relation \(\rightarrow _F\) on terms. Basically, we have \(s \rightarrow _F t\) if there is an identity \(\ell \mathrel {\approx }r\) in F such that s contains a substitution instance \(\sigma (\ell )\) of \(\ell \) as subterm, and t is obtained from s by replacing this subterm with \(\sigma (r)\). Birkhoff’s theorem says that \(\mathrel {\approx }_F\) is identical to \(\mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F\), where \(\mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F\) denotes the reflexive, transitive, and symmetric closure of \(\rightarrow _F\).
The normal form of a term s is an irreducible term \({\widehat{s}}\) such that \(s \mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F {\widehat{s}}\), where \(\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F\) denotes the reflexive and transitive closure of \(\rightarrow _F\) and \({\widehat{s}}\) is irreducible if there is no \(s'\) with \({\widehat{s}}\rightarrow _F s'\). If \(\rightarrow _F\) is canonical (i.e., terminating and confluent), then every term has a unique normal form, which we then called its canonical form. In addition, \(s \mathrel {\displaystyle \mathop {\leftrightarrow }^{*}}_F t\) iff s and t have the same canonical forms. Termination ensures that the normal form exists and confluence that it is unique. The relation \(\rightarrow _F\) is confluent if \(s\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t_1\) and \(s\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t_2\) imply that the pair of terms \(\langle t_1,t_2\rangle \) is joinable, i.e., there is a term t such that \(t_1\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\) and \(t_2\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\). It is terminating if there is no infinite chain \(t_0\rightarrow _F t_1\rightarrow _F t_2\rightarrow _F\cdots \). Termination can be proved using a reduction order, which is a wellfounded order > on terms such that \(\ell > r\) for all \(\ell \mathrel {\approx }r\in F\) implies \(s > t\) for all terms s, t with \(s\rightarrow _F t\). Since > is wellfounded this then implies termination. If \(\rightarrow _F\) is terminating, then confluence can be tested by checking whether all critical pairs of F are joinable. Basically, critical pairs \(\langle t_1,t_2\rangle \) consider the most general “forks” of the form \(s\rightarrow _F t_1\) and \(s \rightarrow _F t_2\) that are due to overlapping lefthand sides of identities. They can be obtained by computing the most general unifiers of lefthand sides of rules with subterms of renamed variants of lefthand sides of other or the same rule. To be more precise, if \(\ell \mathrel {\approx }r\) and \(g\mathrel {\approx }d\) are identities (whose variables have been renamed apart) such that a nonvariable subterm \(\ell '\) of \(\ell \) unifies with g with most general unifier \(\sigma \), then this overlap induces the critical pair \(\langle t_1,t_2\rangle \) where \(t_1 = \sigma (r)\) and \(t_2\) is obtained from \(\sigma (\ell )\) by replacing the subterm \(\sigma (\ell ') = \sigma (g)\) with \(\sigma (d)\). Such a critical pair \(\langle t_1,t_2\rangle \) is joinable if there is a term t such that \(t_1\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\) and \(t_2\mathrel {\displaystyle \mathop {\rightarrow }^{*}}_F t\).
When considering the relation \(\rightarrow _F\), one calls F a term rewriting system rather than a set of identities, and writes its elements (called rewrite rules) as \(\ell \rightarrow r\) instead of \(\ell \mathrel {\approx }r\). From a formal point of view, however, both rewrite rules and identities are (ordered) pairs of terms. Given a set of such pairs, we can view it as a set of identities or a term rewriting system, and thus the notions introduced above apply to both.
2.2 Congruence Closure
Let \(\varSigma \) be a finite set of function symbols of arity \(\ge 1\) and \(C_0\) a finite set of constant symbols. We denote the set of ground terms built using symbols from \(\varSigma \) and \(C_0\) with \(G(\varSigma ,C_0)\). In the following, let E be a finite set of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\), and \(\mathrel {\approx }_E\) the equational theory induced by E on \(G(\varSigma ,C_0)\).
2.2.1 Classical Congruence Closure
It is well known (see, e.g., [5], Lemma 4.3.3) that \(\mathrel {\approx }_E\) (viewed as a subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\)) can be generated using congruence closure, i.e., by exhaustively applying reflexivity, transitivity, symmetry, and congruence to E. To be more precise, \( CC (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under the following rules:The set \( CC (E)\) is usually infinite. To obtain a decision procedure for the word problem for E (i.e., for the relation \(\mathrel {\approx }_E\)), one can show that it is sufficient to restrict the application of the above rules to a finite subset of \(G(\varSigma ,C_0)\), which consists of the subterms of terms occurring in E and of the subterms of the terms for which one wants to decide whether
holds or not (see, e.g., [5], Theorem 4.3.5).

if \(s\in G(\varSigma ,C_0)\), then \({s\mathrel {\approx }s}\in CC (E)\) (reflexivity);

if \({s_1\mathrel {\approx }s_2}, {s_2\mathrel {\approx }s_3}\in CC (E)\), then \({s_1\mathrel {\approx }s_3}\in CC (E)\) (transitivity);

if \({s_1\mathrel {\approx }s_2}\in CC (E)\), then \({s_2\mathrel {\approx }s_1}\in CC (E)\) (symmetry);

if f is an nary function symbol and \({s_1\mathrel {\approx }t_1},\ldots , {s_n\mathrel {\approx }t_n}\in CC (E)\), then \(f(s_1,\ldots ,s_n) \mathrel {\approx }f(t_1,\ldots ,t_n)\in CC (E)\) (congruence).
2.2.2 Semantic Congruence Closure
As mentioned in the introduction, it sometimes makes sense to assume that certain function symbols are interpreted in the sense that they satisfy additional semantic properties that are not expressible by ground identities. Thus, we now consider a setting where there is a subset \(\varSigma _F \) of \(\varSigma \) and a set of (nonground) identities F formulated using the function symbols in \(\varSigma _F \) and variables. Given a finite set E of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\), we assume that also the identities in F are satisfied. This means that we consider algebras \({\mathcal {A}}\) that satisfy not only the ground identities in E, but also the universal closure of the nonground identities in F, i.e., algebras \({\mathcal {A}}\) that are models of \(E\cup F\).
Given \(s, t\in G(\varSigma ,C_0)\), we say that \(s\mathrel {\approx }t\) follows from E w.r.t. F (written \(s \mathrel {\approx }_E^{F}t\)) if \(s^{\mathcal {A}} = t^{\mathcal {A}} \) holds in all models \({\mathcal {A}} \) of \(E\cup F\). Thus, \(\mathrel {\approx }_E^{F}\) is the restriction of \(\mathrel {\approx }_{E\cup F}\) to the ground terms in \(G(\varSigma ,C_0)\). By a slight abuse of notation, we will later use \(\mathrel {\approx }_E^{F}\) also to denote the restriction of \(\mathrel {\approx }_{E\cup F}\) to \(G(\varSigma ,C)\) for a set of constants C containing \(C_0\).
The relation \({\mathrel {\approx }_E^{F}}\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) can again be generated by extending congruence closure by an additional rule that deals with the identities in F, i.e., \( CC ^{F} (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under reflexivity, transitivity, symmetry, congruence, and the following rule:We call \( CC ^{F} (E)\) the congruence closure of E w.r.t. F. Birkhoff’s theorem again shows that \( CC ^{F} (E)\) coincides with \({\mathrel {\approx }_E^{F}}\).

if \({\ell \mathrel {\approx }r} \in F\) and \(\sigma \) is a substitution that maps the variables in \(\ell , r\) to elements of \(G(\varSigma ,C_0)\), then \({\sigma (\ell ) \mathrel {\approx }\sigma (r)} \in CC ^{F} (E)\) (Fidentities).
Without restrictions on the set of identities F, the word problem for E w.r.t. F (i.e., the relation \({\mathrel {\approx }_E^{F}}\)) need not be decidable. For example, if F axiomatizes associativity of a binary function symbol f, then this word problem corresponds to the word problem for finitely presented semigroups, which is known to be undecidable [18]. Thus, one needs to impose some restrictions on the set of identities F to guarantee decidability of the word problem.
2.3 Strongly Shallow Identities and Ordered Rewrite Systems
In [13], we considered the setting where \(\varSigma _F \) (denoted as \(\varSigma _c\) in [13]) contains only binary function symbols and F consists of the commutativity axioms \(f(x,y)\mathrel {\approx }f(y,x)\) for all \(f\in \varSigma _F \). In the present paper, we extend the results of [13] from commutativity to more general kinds of semantic properties of function symbols that can be expressed by strongly shallow identities. Note that the notion of strongly shallow identities introduced in this paper is more restrictive than what is often called “shallow” in the literature [16] (see Sect. 6 for more details).
Definition 1
Let \(\varSigma _S \subseteq \varSigma \) be a set of function symbols and \(h\in \varSigma _S \) a symbol in \(\varSigma _S \) of arity n. A strongly shallow hidentity is of the form \(h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n)\) where \(x_1,y_1,\ldots ,x_n,y_n\) are not necessarily distinct variables such that \(\{x_1,\ldots ,x_n\} = \{y_1,\ldots ,y_n\}\). The set of identities S is a set of strongly shallow identities for \(\varSigma _S \) if \(S = \bigcup _{h\in \varSigma _S}S_h\), where \(S_h\) is a nonempty set of strongly shallow hidentities for all \(h\in \varSigma _S \).
Example 1
Let f, g, h respectively be a binary, an nary, and a ternary function symbol (for \(n>2\)), and a a constant symbol.
1.
The identity \(f(x,y)\mathrel {\approx }f(y,x)\) stating that f is commutative is a strongly shallow fidentity.
2.
The identities \(g(x_1,\ldots ,x_i,x_{i+1},\ldots ,x_n)\mathrel {\approx }g(x_1,\ldots ,x_{i+1},x_{i},\ldots ,x_n)\) for distinct variables \(x_1,\ldots ,x_n\) and \(1\le i < n\) are strongly shallow gidentities. We call such an identity a gtransposition.
3.
The identity \(h(x,x,y)\mathrel {\approx }h(x,y,y)\) is a strongly shallow hidentity.
4.
The following identities are not strongly shallow according to our definition, though they are shallow according to the definition in [16]: \(f(x,x) \mathrel {\approx }x\), \(f(x,y)\mathrel {\approx }h(x,y,x)\), \(f(x,y) \mathrel {\approx }f(x,x)\), \(f(x,a)\mathrel {\approx }f(a,x)\).
The word problem for a finite set of strongly shallow identities (i.e., the relation \(\mathrel {\approx }_S\)) can easily be shown to be decidable using rewriting. However, the classical notion of a canonical rewrite system, as introduced above, cannot be used for this purpose. The reason is that strongly shallow identities, like commutativity or transposition identities, usually cannot be oriented into terminating rewrite rules. However, we can orient their ground instances into terminating rules using an appropriate reduction order.
Given a finite set of constant symbols \(C\supseteq C_0\), we denote the set of ground terms built using symbols from \(\varSigma \) and C with \(G(\varSigma ,C)\). Ground instances of terms are obtained by applying ground substitutions, which replace the variables in the given terms with elements of \(G(\varSigma ,C)\).
Definition 2
(Ordered rewrite system [17]) Given a finite set of (nonground) identities F and a reduction order >, we call the pair \((F,{>})\) an ordered rewrite system. The ground rewrite system \(R(F,{>})\) induced by \((F,{>})\) is defined aswhere \(F^{1} = \{t\mathrel {\approx }s\mid {s\mathrel {\approx }t}\in F\}\).
$$\begin{aligned} \{\theta (s)\rightarrow \theta (t) \mid s\mathrel {\approx }t\in F\cup F^{1},\ \theta \ \text{ is } \text{ a } \text{ ground } \text{ substitution, } \text{ and }\ \theta (s)>\theta (t)\}, \end{aligned}$$
Since > is assumed to be a reduction order, \(R(F,{>})\) is clearly terminating. The system \(R(F,{>})\) is usually infinite. However, if F is finite and > decidable, then one can effectively decide whether some rule in \(R(F,{>})\) applies to a given ground term, and compute a term resulting from the application of this rule. In particular, this implies that we can effectively compute a normal form of a given ground term w.r.t. the ground rewrite system \(R(F,{>})\). Note, however, that this normal form need not be unique since there is no guarantee that this system is confluent. We say that the ordered rewrite system \((F,{>})\) is canonical on ground terms if \(R(F,{>})\) is confluent.
As a first example, we consider commutativity, i.e., the setting where \(F_c\) consists of the commutativity axioms for the (binary) function symbols in \(\varSigma _c \subseteq \varSigma \). Instead of \(\varSigma _F \) and F we use the symbols \(\varSigma _c \) and \(F_c\) to indicate already in the notation that we now consider commutativity. Let \(\mathrel {>_{ lpo }}\) be the lexicographic path order^{1} induced by a linear order on \(\varSigma \cup C\). Note that \(\mathrel {>_{ lpo }}\) is then a linear order on \(G(\varSigma ,C)\) (see Exercise 5.20 in [5]). The ground rewrite system induced by \((F_c,{\mathrel {>_{ lpo }}})\) is of the formand it is easy to see that it is confluent (this is an instance of Lemma 4 in [13], and also follows from what we show for strongly shallow identities below). Thus, \((F_c,{\mathrel {>_{ lpo }}})\) is canonical on ground terms.
$$\begin{aligned} R(F_c,{\mathrel {>_{ lpo }}}) = \{ f(s,t)\rightarrow f(t,s) \mid f\in \varSigma _c,\ s, t\in G(\varSigma ,C),\ \text{ and }\ s \mathrel {>_{ lpo }}t\}, \end{aligned}$$
Now, assume \(\varSigma _S \subseteq \varSigma \) and let S be a set of strongly shallow identities for \(\varSigma _S \). In general, the ground rewrite system \(R(S,{\mathrel {>_{ lpo }}})\) is not confluent. For example, assume that \(h\in \varSigma _S \) is ternary and \(S_h = \{h(x,y,z)\mathrel {\approx }h(x,z,y), h(x,x,y)\mathrel {\approx }h(x,y,y)\}\). If the linear order > on \(\varSigma \cup C\) that induces \(\mathrel {>_{ lpo }}\) satisfies \(a>b\), then the ground term h(a, a, b) rewrites with \(R(S,{\mathrel {>_{ lpo }}})\) both to h(a, b, b) and h(a, b, a). Since these terms are irreducible w.r.t. \(R(S,{\mathrel {>_{ lpo }}})\), this shows that \(R(S,{\mathrel {>_{ lpo }}})\) is not confluent.
To obtain an ordered rewrite system that is canonical on ground terms, we extend S by all implied strongly shallow identities, i.e., we consider the setFiniteness of this set (up to renaming of variables) is an easy consequence of the fact that such identities can contain at most n different variables. Thus, we can fix an arbitrary set \(V_n\) of n different variables and assume without loss of generality that the variables \(x_1,y_1,\ldots ,x_n,y_n\) belong to \(V_n\). In addition, it is an easy consequence of the restricted form of strongly shallow identities that it is decidable whether an identity of this form follows from S: starting with \(h(x_1,\ldots ,x_n)\) one can simply enumerate the finitely many terms that can be obtained by iterated application of the identities in \(S_h\).
$$\begin{aligned} S^* = \{\{h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n) \mid h\in \varSigma _S, h(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} h(y_1,\ldots ,y_n)\}. \end{aligned}$$
Thus, \((S^*,{\mathrel {>_{ lpo }}})\) is a finite ordered rewrite system that can effectively be computed. Due to the finiteness of \(S^*\) and decidability of \(\mathrel {>_{ lpo }}\) (Proposition 5.4.16 in [5]), checking whether a rule of \(R(S^*,{\mathrel {>_{ lpo }}})\) is applicable and actually applying it can effectively be done. The following lemma shows that \((S^*,{\mathrel {>_{ lpo }}})\) is canonical on ground terms.
Lemma 1
The ground rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\) is canonical and equivalent to S on \(G(\varSigma ,C)\), i.e., \(s\mathrel {\approx }_S t\) iff \(s \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} t\) holds for all terms \(s,t\in G(\varSigma ,C)\).
Proof
Termination of \(R(S^*,{\mathrel {>_{ lpo }}})\) is an immediate consequence of the fact that \(\mathrel {>_{ lpo }}\) is a reduction order. Confluence is an easy consequence of the critical pair lemma for ordered rewrite systems in [17] (Lemma 2.1 and 2.2), but we prove it here explicitly for the sake of completeness.
Since \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, it is sufficient to show that all critical pairs of this ground rewrite system are joinable. There are two cases to consider: an overlap at the root or a rewrite within the substitution part. If there is an overlap at the root, then there are rules \(h(\sigma (x_1),\ldots ,\sigma (x_n))\rightarrow h(\sigma (y_1),\ldots ,\sigma (y_n))\) and \(h(\theta (x_1'),\ldots ,\theta (x_n'))\rightarrow h(\theta (y_1'),\ldots ,\theta (y_n'))\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) with identical lefthand sides, i.e., such that \(h(\sigma (x_1),\ldots , \sigma (x_n)) = h(\theta (x_1'),\ldots ,\theta (x_n'))\), which yields the critical pair:If we assume without loss of generality that the sets \(\{x_1,\ldots ,x_n\} =\{y_1,\ldots ,y_n\}\) and \(\{x_1',\ldots ,x_n'\} =\{y_1',\ldots ,y_n'\}\) are disjoint, then we can actually also assume that \(\sigma =\theta \). Consequently, this substitution is a unifier of the terms \(h(x_1,\ldots ,x_n)\) and \(h(x_1',\ldots ,x_n')\), and thus is an instance of the most general unifier (mgu) \(\delta \) of these two terms, i.e., there is a substitution \(\lambda \) such that \(\sigma = \lambda \delta = \theta \). Obviously, this mgu must map variables to variables, i.e., \(\delta (x_i) = \delta (x_i')\) is a variable for all \(i = 1,\ldots ,n\). But then \(h(x_1,\ldots ,x_n) \mathrel {\approx }_S h(y_1,\ldots ,y_n)\) and \(h(x_1',\ldots ,x_n') \mathrel {\approx }_S h(y_1',\ldots ,y_n')\) imply that \(h(\delta (y_1),\ldots ,\delta (y_n)) \mathrel {\approx }_S h(\delta (y_1'),\ldots ,\delta (y_n'))\). Since \(\mathrel {>_{ lpo }}\) is total on ground terms, this implies that one of the following two rules belongs to \(R(S^*,{\mathrel {>_{ lpo }}})\):
Since \(\sigma = \lambda \delta = \theta \), this shows that the critical pair (1) is joinable.
$$\begin{aligned} \langle h(\sigma (y_1),\ldots ,\sigma (y_n)), h(\theta (y_1'),\ldots ,\theta (y_n'))\rangle . \end{aligned}$$
(1)
A rewrite in the substitution part means that there is a rule of the form \(h(\sigma (x_1),\ldots ,\sigma (x_n))\rightarrow h(\sigma (y_1),\ldots ,\sigma (y_n))\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) and an \(i, 1\le i\le n\), such that \(\sigma (x_i)\) can be rewritten to a term \(r_i\) using a rule from \(R(S^*,{\mathrel {>_{ lpo }}})\). The critical pair obtained this way isIf we modify the substitution \(\sigma \) to \(\sigma '\) by setting \(\sigma '(x_i) = r_i\), then both the left and the right component of the critical pair can be rewritten to the common term \(h(\sigma '(y_1),\ldots ,\sigma '(y_n))\) by \(R(S^*,{\mathrel {>_{ lpo }}})\). This finishes the proof that \(R(S^*,{\mathrel {>_{ lpo }}})\) is canonical.
$$\begin{aligned} \langle h(\sigma (x_1),\ldots ,\sigma (x_{i1}),r_i,\sigma (x_{i+1}),\ldots ,\sigma (x_n)), h(\sigma (y_1),\ldots ,\sigma (y_n))\rangle . \end{aligned}$$
By the definition of \(R(S^*,{\mathrel {>_{ lpo }}})\), every rule \(\ell \rightarrow r\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) satisfies \(\ell \mathrel {\approx }_S r\), which shows that \(s \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} t\) implies \(s\mathrel {\approx }_S t\) for all terms \(s,t\in G(\varSigma ,C)\). The other direction is an easy consequence of the fact that \(\mathrel {>_{ lpo }}\) is a linear order since this implies that every nontrivial^{2} substitution instance in \(G(\varSigma ,C)\) of a nontrivial identity in S occurs (appropriately ordered) as a rule in \(R(S^*,{\mathrel {>_{ lpo }}})\). \(\square \)
Coming back to the example illustrating that \(R(S,{\mathrel {>_{ lpo }}})\) need not be confluent, we note that, in this example, the identity \(h(x,y,y)\mathrel {\approx }h(x,y,x)\) belongs to \(S^*\), and thus the term h(a, b, a) rewrites to h(a, b, b) in \(R(S^*,{\mathrel {>_{ lpo }}})\).
3 Semantic Congruence Closure Based on Rewriting
Let \(\varSigma \) be a finite set of function symbols of arity \(\ge 1\) and \(C_0\) a finite set of constant symbols. In the following, let E be a finite set of ground identities \(s \mathrel {\approx }t\) between terms \(s, t\in G(\varSigma ,C_0)\). In addition, assume that \(\varSigma _S \subseteq \varSigma \) and let S be a set of strongly shallow identities for \(\varSigma _S \). Below, we show how to construct a canonical rewrite system that can be used to decide the word problem for E w.r.t. S. This system will be the union of \(R(S^*,\mathrel {>_{ lpo }})\) with a finite ground rewrite system \({\widehat{R}}^{S} (E)\), whose construction is described below.
3.1 Construction of the Rewrite System \({\widehat{R}}^{S} (E)\)
Let \( Sub (E)\) denote the set of subterms of the terms occurring in E. In a first step, we introduce a new constant \(c_s\) for every term \(s\in Sub (E)\setminus C_0\). To simplify notation, for a constant \(a\in C_0\) we sometimes use \(c_a\) to denote a. Let \(C_1\) be the set of new constants introduced this way and \(C := C_0\cup C_1\). We fix an arbitrary linear order > on C, which will be used to orient identities between constants into rewrite rules. Note that this order does not take into account which terms the constants correspond to, and thus we may well have \(c_s > c_{f(s)}\).
The initial rewrite system R(E) induced by E consists of the following rules:Obviously, the cardinality of \(C_1\) is linear in the size of E, and R(E) can be constructed in time linear in the size of E. From the above construction, it follows that R(E) has two types of rules: constant rules of the form \(c \rightarrow d\) for \(c > d\) and function rules of the form \(f(c_1, \ldots , c_n) \rightarrow d\).

If \(s\in Sub (E)\setminus C_0\), then s is of the form \(f(s_1,\ldots ,s_n)\) for an nary function symbol f and terms \(s_1,\ldots ,s_n\) for some \(n\ge 1\). For every such s we add the rule \( f(c_{s_1},\ldots ,c_{s_n}) \rightarrow c_s \) to R(E).

For every identity \({s\mathrel {\approx }t}\in E\) we add \(c_s \rightarrow c_t\) to R(E) if \(c_s > c_t\), and \(c_t \rightarrow c_s\) if \(c_t > c_s\).
Example 2
Consider \(E = \{ f(g(a),a) \mathrel {\approx }c, g(b) \mathrel {\approx }h(a), a \mathrel {\approx }b\}\) with \(\varSigma _S = \{f\}\) and \(S_f = \{f(x,y)\mathrel {\approx }f(y,x)\}\). It is easy to see that \(f(b,h(a)) \mathrel {\approx }_E^{S}c\).
Using our construction, we first introduce the constants \(C_1 = \{c_{f(g(a),a)}, c_{g(a)}, c_{g(b)}, c_{h(a)}\}\). If we fix the linear order on C as \(c_{f(g(a),a)}> c_{g(a)}> c_{g(b)}> c_{h(a)}> a> b > c\), then we obtain the following rewrite system: \(R(E) = \{ f(c_{g(a)},a) \rightarrow c_{f(g(a),a)}, g(a) \rightarrow c_{g(a)}, g(b) \rightarrow c_{g(b)}, h(a) \rightarrow c_{h(a)}, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, a \rightarrow b \}\).
The following lemma is an easy consequence of the definition of R(E). It can be shown by a simple induction on the structure of the term s.
Lemma 2
For all terms \(s\in Sub (E)\) it holds that \(s \mathrel {\approx }_{R(E)} c_s\).
Using this lemma, we can show that the construction of R(E) is correct for consequence w.r.t. strongly shallow identities in the sense stated in the next lemma. Recall that E and \(\mathrel {\approx }_E^{S}\) are binary relations in \(G(\varSigma ,C_0)\), whereas R(E) and \(\mathrel {\approx }_{R(E)}^{S}\) are binary relations in \(G(\varSigma ,C)\) for \(C = C_0\cup C_1\).
Lemma 3
Viewed as a set of identities, w.r.t. S is a conservative extension of E w.r.t. S, i.e., iff
for all terms
.
Proof
To show the ifdirection, assume that
, i.e., there is an algebra \({\mathcal {A}}\) over the signature \(\varSigma \cup C_0\) satisfying all identities in \(E\cup S\) such that
. We expand \({\mathcal {A}} \) to the new constants in \(C_1\) by interpreting \(c_s\) as \(s^{\mathcal {A}} \) for all \(s\in Sub (E)\setminus C_0\), and call the expansion obtained this way \({\mathcal {B}} \). Since
, it is sufficient to show that \({\mathcal {B}}\) satisfies the identities in \(R(E)\cup S\). Since \({\mathcal {A}}\) satisfies S, its expansion obviously satisfies S as well. If \(f(c_{s_1},\ldots ,c_{s_n}) \rightarrow c_s\) belongs to R(E), then \(s = f(s_1,\ldots ,s_n)\). Consequently, \(f(c_{s_1},\ldots ,c_{s_n})^{\mathcal {B}} = f^{\mathcal {B}} (c_{s_1}^{\mathcal {B}},\ldots ,c_{s_n}^{\mathcal {B}}) = f^{\mathcal {A}} (s_1^{\mathcal {A}},\ldots ,s_n^{\mathcal {A}}) = s^{\mathcal {A}} = s^{\mathcal {B}} \), which shows that \({\mathcal {B}} \) satisfies the identity \(f(c_{s_1},\ldots ,c_{s_n}) \mathrel {\approx }c_s\). If \(c_s \rightarrow c_t\) belongs to R(E), then \({s\mathrel {\approx }t}\in E\). Consequently, \(c_s^{\mathcal {B}} = s^{\mathcal {A}} = t^{\mathcal {A}} = c_t^{\mathcal {B}} \), which shows that \({\mathcal {B}} \) satisfies the identity \(c_s \mathrel {\approx }c_t\).
To show the onlyifdirection, assume that
, i.e., there is an algebra \({\mathcal {B}}\) over the signature \(\varSigma \cup C\) satisfying all identities in \(R(E)\cup S\) such that
. Let \({\mathcal {A}}\) be the reduct of \({\mathcal {B}}\) to the signature \(\varSigma \cup C_0\), which is obtained from \({\mathcal {B}}\) by forgetting about the interpretation of the symbols in \(C_1\). Since
, it is sufficient to show that \({\mathcal {A}}\) satisfies the identities in \(E\cup S\). For the identities in S, this is again trivially true. Thus, consider an identity \({s\mathrel {\approx }t}\in E\). Then R(E) contains the rule \(c_s \rightarrow c_t\) or \(c_t \rightarrow c_s\). In addition, Lemma 2 yields \(s \mathrel {\approx }_{R(E)} c_s\) and \(t \mathrel {\approx }_{R(E)} c_t\). Consequently, \(s^{\mathcal {A}} = s^{\mathcal {B}} = c_s^{\mathcal {B}} = c_t^{\mathcal {B}} = t^{\mathcal {B}} = t^{\mathcal {A}} \) since \({\mathcal {B}}\) satisfies the identities in in R(E). \(\square \)
In Lemma 3, we consider the relation \(\mathrel {\approx }_{R(E)}^{S}\), where the elements of \(S_h\) for \(h\in \varSigma _S \) are used as additional identities. Our goal is, however, to deal both with the ground identities in E and with the strongly shallow identities in S by rewriting. This is where the ground rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\) comes into play. We assumed that \(\mathrel {>_{ lpo }}\) is induced by a linear order on \(\varSigma \cup C\) that extends > on C, makes each function symbol in \(\varSigma \) greater than each constant symbol in C, and linearly orders the function symbols in an arbitrary way. This implies that \(\ell \mathrel {>_{ lpo }}r\) for all rules \({\ell \rightarrow r}\in R(E)\). Consequently, termination of the rewrite system \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can also easily be shown using \(\mathrel {>_{ lpo }}\).
However, in general \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) need not be confluent due to possible interactions of rules in R(E) with each other and of rules in R(E) with rules in \(R(S^*,{\mathrel {>_{ lpo }}})\). We turn \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) into a confluent and terminating system by modifying R(E) appropriately.
Algorithm 1
We start with \(R^{S}_0(E) := R(E)\) and \(i:=0\):
(a)
Let \(R^{S}_i(E){}_ con \) consist of the constant rules in \(R^{S}_i(E)\). For every constant \(c\in C\), consider
and let e be the least element in \([c]_i\) w.r.t. the order >. We call e the representative of c w.r.t. \(R^{S}_i(E)\) and >. If \(c\ne e\), then add \(c\rightarrow e\) to \(R^{S}_{i+1}(E)\).
(b)
In all function rules in \(R^{S}_i(E)\), replace each constant by its representative w.r.t. \(R^{S}_i(E)\) and >, and call the resulting set of function rules \(F^{S}_i(E)\). If at least one constant rule has been added in step (b), then set \(i:= i+1\) and continue with step (a). Otherwise, terminate with output \({\widehat{R}}^{S} (E) := R^{S}_{i+1}(E)\).
(b1)
Let h be an nary function symbol in \(\varSigma _S\). For each function rule of the form \(h(c_1,\ldots ,c_n)\rightarrow d\) in \(F^{S}_i(E)\), replace this rule with the rule \(\ell \rightarrow d\), where \(\ell \) is the canonical form of \(h(c_1,\ldots ,c_n)\) w.r.t. the canonical rewrite system \(R(S^*,{\mathrel {>_{ lpo }}})\). Then continue with (b2).
(b2)
Let f be an nary function symbol in \(\varSigma \). For every term \(f(c_1,\ldots ,c_n)\) occurring as the lefthand side of a rule in \(F^{S}_i(E)\), consider all the rules \(f(c_1,\ldots ,c_n)\rightarrow d_1, \ldots , f(c_1,\ldots ,c_n)\rightarrow d_k\) in \(F^{S}_i(E)\) with this lefthand side. Let d be the least element w.r.t. > in \(\{d_1, \ldots , d_k\}\). Add \(f(c_1,\ldots ,c_n)\rightarrow d\) and \(d_j\rightarrow d\) for all j with \(d_j\ne d\) to \(R^{S}_{i+1}(E)\).
Let us illustrate the construction of \({\widehat{R}}^{S} (E)\) using Example 2. In step (a), the nontrivial equivalence classes are \([a]_0 =\{a,b\}\) with representative b, \([c_{f(g(a),a)}] = \{c_{f(g(a),a)},c\}\) with representative c, and \([c_{g(b)}] = \{c_{g(b)}, c_{h(a)}\}\) with representative \(c_{h(a)}\). Thus, \(a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}\) are the constant rules added to \(R^{S}_1(E)\). The function rules in \(F^{S}_0(E)\) are then \(f(c_{g(a)},b) \rightarrow c, g(b) \rightarrow c_{g(a)}, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}.\) Since \(c_{g(a)} > b\), the lefthand side of the rule \(f(c_{g(a)},b)\rightarrow c\) is reduced with \(R(S^*,{\mathrel {>_{ lpo }}})\) to \(f(b,c_{g(a)})\), and thus in step (b1) this rule is replaced with \(f(b,c_{g(a)})\rightarrow c\). In step (b2), there are two rules with lefthand side g(b). For these rules, we add \(c_{g(a)}\rightarrow c_{h(a)}\) and \(g(b) \rightarrow c_{h(a)}\) to \(R^{S}_1(E)\). The rules with lefthand sides different from g(b) are moved unchanged from \(F^{S}_0(E)\) to \(R^{S}_1(E)\) since their lefthand sides are unique. Thus, \(R^{S}_1(E) = \{ a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, c_{g(a)}\rightarrow c_{h(a)}, f(b,c_{g(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\}.\)
In the second iteration step, we now have the new nontrivial equivalence class \([c_{g(b)}]_1 = \{c_{g(b)}, c_{h(a)}, c_{g(a)}\}\) with representative \(c_{h(a)}\). The net effect of step (a) is, however, that the constant rules are moved unchanged from \(R^{S}_1(E)\) to \(R^{S}_2(E)\). The function rules in \(F^{S}_1(E)\) are then \(f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\). Consequently, no reduction is possible in step (b1) and no constant rules are added in step (b2). The construction thus terminates with output \({\widehat{R}}^{S} (E) = \{ a \rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(b)} \rightarrow c_{h(a)}, c_{g(a)}\rightarrow c_{h(a)}, f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\}.\)
3.2 Proof of Correctness and Tractability
Our goal is now to show that \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) provides us with a polynomialtime decision procedure for the word problem in E w.r.t. S. When looking at the complexity of the word problem, one needs to clarify what counts as input and what is assumed to be fixed. In our setting, where we want to know whether
holds or not, one could on the one hand assume that S and E are fixed, and only the terms
are part of the input. On the other hand, one could assume that the input consists of S, E, and
. In this paper, we make the latter assumption, which makes our polynomiality result considerably stronger than if we had used the former one.
However, to obtain this result, we make the additional assumption that the maximal arity of the function symbols in \(\varSigma _S\) is bounded by a constant. In this case we say that \(\varSigma _S\) is aritybounded. For the case of commutativity, this restriction is satisfied since then all the symbols in \(\varSigma _S\) are binary. If we consider gtranspositions, but do not bound the arity n of the symbol g (i.e., the arities of the symbols satisfying transpositions is determined by the input rather than fixed from the outset), then the arityboundedness condition is not satisfied. If \(S_g\) consists of all gtransitions, then the set of identities \(\{g(x_1,\ldots ,x_n)\mathrel {\approx }g(y_1,\ldots ,y_n) \mid g(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} g(y_1,\ldots ,y_n)\}\) contains (for distinct variables \(x_1,\ldots ,x_n\)) all identities of the form \(g(x_1,\ldots ,x_n)\mathrel {\approx }g(x_{\pi (1)},\ldots ,x_{\pi (n)})\), where \(\pi \) is a permutation of the set \(\{1,\ldots ,n\}\), and is thus exponential in n. If n is assumed to be a constant, then this set of identities is actually of constant size.
In the following, we assume for our complexity results that \(\varSigma _S\) is arity bounded. Note that, even if the maximal arity of the symbols in \(\varSigma _S\) is bounded in this way, how many symbols of a certain allowed arity are present in \(\varSigma _S\) and which strongly shallow identities they satisfy, can still be determined by the input. First, we prove that, under this assumption, the construction of \({\widehat{R}}^{S} (E)\) takes only polynomial time.
Lemma 4
If \(\varSigma _S\) is aritybounded, then the system \({\widehat{R}}^{S} (E)\) can be computed from R(E) in polynomial time.
Proof
First, note that step (a) can clearly be performed in polynomial time since deciding \(\mathrel {\approx }_{R^{S}_i(E){}_ con }\) amounts to performing reachability tests in an undirected graph.^{3} Producing the original rules in \(F^{S}_i(E)\) can then clearly also be done in polynomial time.
The canonical forms of lefthand sides of the form \(h(c_1,\ldots ,c_n)\) for \(h\in \varSigma _S \) required in step (b1) can also be computed in polynomial time. In fact, the terms obtained by a rewrite sequence in \(R(S^*,{\mathrel {>_{ lpo }}})\) that starts with \(h(c_1,\ldots ,c_n)\) are all of the form \(h(d_1,\ldots ,d_n)\) with \(\{d_1,\ldots ,d_n\} = \{c_1,\ldots ,c_n\}\). The number of such terms is bounded by \(n^n\) (which is a constant) and since \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, there cannot be a repetition in this sequence. Consequently, at most \(n^n\) (i.e., constantly many) rewrite steps are applied.
Grouping the rules in \(F^{S}_i(E)\) according to their lefthand sides in step (b2) is clearly also possible in polynomial time, as is adding the new rules to \(R^{S}_{i+1}(E)\). In case the procedure does not terminate after step (b), the number of different equivalence classes of constants decreases by at least one. Thus, the iteration must terminate after at most C steps. \(\square \)
Next, we show that the construction of \({\widehat{R}}^{S} (E)\) is correct in the following sense.
Lemma 5
Viewed as a set of identities,
is equivalent to R(E) w.r.t. S, i.e., \(s \mathrel {\approx }_{R(E)}^{S} t\) iff \(s \mathrel {\approx }_{{\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})} t\) for all terms \(s, t \in G(\varSigma ,C)\).
Proof
Note that (on \(G(\varSigma ,C)\)) we have \({\mathrel {\approx }_{R(E)}^{S}} = {\mathrel {\approx }_{R^{S}_0(E)}^{S}} = {\mathrel {\approx }_{R^{S}_0(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}}\) since \(R(E) = R^{S}_0(E)\) and \(R(S^*,{\mathrel {>_{ lpo }}})\) is equivalent to S on \(G(\varSigma ,C)\) by Lemma 1. It is thus sufficient to show that the modifications performed when going from \(R^{S}_i(E)\) to \(R^{S}_{i+1}(E)\) do not change the induced equational theory, i.e., \({\mathrel {\approx }_{R^{S}_i(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}} = {\mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})}}\) for all \(i\ge 0\) for which \(R^{S}_{i+1}(E)\) is defined.
To show the inclusion from left to right, first consider step (a). If \(R^{S}_i(E)\) contains the constant rule \(c_1\rightarrow c_2\), then \(c_1\) and \(c_2\) belong to the same equivalence class w.r.t. \(R^{S}_i(E){}_ con \) and \(c_1 > c_2\). In case \(c_2\) is the least element in this class, then \(R^{S}_{i+1}(E)\) still contains the rule \(c_1\rightarrow c_2\). Otherwise, since we know that \(c_1 > c_2\), the least element e in the class is different from these two constants, and thus \(R^{S}_{i+1}(E)\) contains the rules \(c_1\rightarrow e, c_2\rightarrow e\), which shows \(c_1 \mathrel {\approx }_{R^{S}_{i+1}(E)} c_2\), and thus \(c_1 \mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})} c_2\).
Regarding step (b), note that the replacements performed in the construction of \(F^{S}_i(E)\) replace constants c occurring in function rules by constants e that are equivalent to c both w.r.t. \(R^{S}_i(E)\) and w.r.t. \(R^{S}_{i+1}(E)\). Thus, these replacements do not change the overall equational theory. In step (b1), rules of the form \(h(c_1,\ldots ,c_n)\rightarrow d\) are replaced with rules \(\ell \rightarrow d\) such that \(h(c_1,\ldots ,c_n) \mathrel {\approx }_{R(S^*,{\mathrel {>_{ lpo }}})} \ell \), which thus also does not change the overall equational theory.
For step (b2), consider a function rule \(f(c_1,\ldots ,c_n)\rightarrow d_j\) in \(F^{S}_i(E)\). Either this rule also belongs to \(R^{S}_{i+1}(E)\), or \(R^{S}_{i+1}(E)\) contains rules \(f(c_1,\ldots ,c_n)\rightarrow d, d_j\rightarrow d\). In both cases, \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R^{S}_{i+1}(E)} d_j\), and thus \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R^{S}_{i+1}(E)\cup R(S^*,{\mathrel {>_{ lpo }}})} d_j\) is preserved.
The inclusion from right to left can be shown similarly to the one from left to right. \(\square \)
Lemma 6
Viewed as a term rewriting system, \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) is canonical, i.e., terminating and confluent.
Proof
Termination of the term rewriting system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be shown as for \(R(E)\cup R(S^*,{\mathrel {>_{ lpo }}})\), by using the reduction order \(\mathrel {>_{ lpo }}\).
Regarding confluence, we first claim that there are no nontrivial critical pairs (see Sect. 6.2 in [5]) between the rules in \({\widehat{R}}^{S} (E)\). To see this, note that two function rules from \({\widehat{R}}^{S} (E)\) cannot overlap due to the fact that in step (b2) no more constants are identified, and thus all lefthand sides of function rules in \({\widehat{R}}^{S} (E)\) are unique. In addition, any constant can occur as lefthand side of at most one rule due to step (a). A rule of the form \(f(c_1,\ldots ,c_n)\rightarrow d\) cannot overlap with a rule of the form \(e\rightarrow e'\) since the \(c_i\) are representatives, whereas e is not a representative.
Second, we have already shown in the proof of Lemma 1 that all critical pairs between rules in \(R(S^*,{\mathrel {>_{ lpo }}})\) are joinable.
Thus, consider an overlap of a rule \(h(s_1,\ldots ,s_n)\rightarrow r\) in \(R(S^*,{\mathrel {>_{ lpo }}})\) with a rule in \({\widehat{R}}^{S} (E)\). If this overlap occurs at the root position, then the rule in \({\widehat{R}}^{S} (E)\) is a function rule of the form \(h(c_1,\ldots ,c_n) \rightarrow d\) in \({\widehat{R}}^{S} (E)\). However, due to step (b1), the lefthand sides of such function rules are irreducible w.r.t. \(R(S^*,{\mathrel {>_{ lpo }}})\), which contradicts our assumption that \(h(s_1,\ldots ,s_n) = h(c_1,\ldots ,c_n)\) is the lefthand side of a rule in \(R(S^*,{\mathrel {>_{ lpo }}})\).
Finally, assume that the rule from \({\widehat{R}}^{S} (E)\) is applied below the root position in the lefthand side of the rule in \(R(S^*,{\mathrel {>_{ lpo }}})\). Joinability of the critical pair obtained this way can be shown as in the case of a rewrite in the substitution part in the proof of Lemma 1.
Summing up, we have shown that all nontrivial critical pairs of \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be joined, which proves confluence of \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\). \(\square \)
Since \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) is canonical, each term \(s\in G(\varSigma ,C)\) has a unique canonical form (i.e., irreducible term reachable from s) w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\). We can thus use the system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) to decide whether terms
are equivalent in E w.r.t. S, i.e., whether
, by computing the canonical forms of the terms \(s_0\) and \(t_0\).
Theorem 1
If
, then
iff \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\).
Proof
As an example, consider the rewrite system \({\widehat{R}}^{S} (E)\) that we have computed (above Lemma 4) from the set of ground identities E in Example 2, and recall that \(f(b,h(a)) \mathrel {\approx }_E^{S}c\). The canonical form of c is clearly c, and the canonical form of f(b, h(a)) can be computed by the following rewrite sequence: \(f(b,h(a)) \rightarrow _{{\widehat{R}}^{S} (E)} f(b,h(b)) \rightarrow _{{\widehat{R}}^{S} (E)} f(b,c_{h(a)}) \rightarrow _{{\widehat{R}}^{S} (E)} c\). In this example, no rule of \(R(S^*,{\mathrel {>_{ lpo }}})\) is used in the computation of the canonical forms, which is of course not the case in general. Also note that we have used \(R(S^*,{\mathrel {>_{ lpo }}})\) in the construction of \({\widehat{R}}^{S} (E)\).
The construction of \({\widehat{R}}^{S} (E)\) is actually independent of the terms \(s_0, t_0\) for which we want to decide the word problem in E w.r.t. S. This is in contrast to approaches that restrict the construction of the congruence closure to the subterms of E and the subterms of the terms \(s_0, t_0\) for which one wants to decide the word problem. This fact will turn out to be useful in the next section.
It remains to show that the decision procedure obtained by applying Theorem 1 requires only polynomial time if \(\varSigma _S \) is assumed to be aritybounded.
Corollary 1
If \(\varSigma _S\) is aritybounded, then the word problem for finite sets of ground identities w.r.t. S is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) and terms
, we can decide in polynomial time, whether
holds or not if we assume that the maximal arity of the symbols in \(\varSigma _S\) is bounded by a constant.
Proof
Since we already know that \({\widehat{R}}^{S} (E)\) can be constructed in time polynomial in the size of E, and thus also has polynomial size, it is sufficient to show that the canonical form of a term
w.r.t. the rewrite system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) can be computed in time polynomial in the size of \(s_0\) and \({\widehat{R}}^{S} (E)\). This is clearly the case if only a polynomial number of rewrite steps are needed to produce the canonical form of \(s_0\), and each application of a rewrite step also takes only polynomial time. The latter is an easy consequence of the facts that the relevant part of the set of identitiesis of linear size if n is assumed to be a constant and \(\mathrel {>_{ lpo }}\) is decidable in polynomial time (see Proposition 5.4.16 in [5]).
$$\begin{aligned} S^* = \{h(x_1,\ldots ,x_n)\mathrel {\approx }h(y_1,\ldots ,y_n) \mid h\in \varSigma _S, h(x_1,\ldots ,x_n)\mathrel {\approx }_{S_h} h(y_1,\ldots ,y_n)\} \end{aligned}$$
We show the former by induction on the structure of \(s_0\). If \(s_0\) is a constant, then there are at most \(C1\) rewrite steps starting with \(s_0\) possible. This yields the desired polynomial bound since the cardinality of C is linear in the size of E.
Now assume that
for a function symbol \(h\in \varSigma \). First, we compute the canonical forms \(t_1,\ldots ,t_n\) of \(s_1,\ldots ,s_n\). By induction, we can assume that the number of rewrite steps required for these computations is bounded by polynomials \(p_1,\ldots ,p_n\) in the sizes of E and \(s_1,\ldots ,s_n\), respectively. If \(h\in \varSigma _S \), then we rewrite \(h(t_1,\ldots ,t_n)\) with the rules in \(R(S^*,{\mathrel {>_{ lpo }}})\). Note that the terms obtained by this rewrite sequence are all of the form \(h(u_1,\ldots ,u_n)\) with \(\{t_1,\ldots ,t_n\} = \{u_1,\ldots ,u_n\}\). Since the number of such terms is bounded by \(n^n\) (which is a constant) and \(R(S^*,{\mathrel {>_{ lpo }}})\) is terminating, there cannot be a repetition in this sequence. Consequently, at most \(n^n\) rewrite steps are applied. If the term \(h(v_1,\ldots ,v_n)\) obtained by this rewrite sequence is the lefthand side of a rule in \({\widehat{R}}^{S} (E)\), then it rewrites to a constant d, from which at most \(C1\) rewrite steps are possible. Overall, we thus need at most \(p_1+\ldots + p_n + n^n + C\) rewrite steps to compute the canonical form of
, which is clearly a polynomial bound in the size of \(s_0\) and E.
In the case where \(h\in \varSigma \setminus \varSigma _S \), the bound is \(p_1+\ldots + p_n + C\) instead. \(\square \)
Commutative function symbols all have arity 2, and thus \(\varSigma _S = \varSigma _c\) is clearly aritybounded. Given \(\varSigma _c\), we denote the set of commutativity axioms for the elements of \(\varSigma _c\) as \(F_c\).
Corollary 2
The commutative word problem for finite sets of ground identities is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\), commutativity axioms \(F_c\) for the symbols in \(\varSigma _c \subseteq \varSigma \), and terms
, we can decide in polynomial time whether
holds or not.
Even if \(\varSigma _S \) is not aritybounded, we may still obtain polynomially decidable word problems. As an example, consider for all \(n>2\) function symbols \(g_n\) that are assumed to satisfy the transposition identitiesIn this case, it is not necessary to consider the (exponentially many) derived identities between strongly shallow \(g_n\)terms in \(S_{g_n}^*\) to construct the ordered rewrite system. Using \((S_{g_n},{\mathrel {>_{ lpo }}})\) instead is sufficient since this system is canonical on ground terms. In fact, using the rules in \(R(S_{g_n},{\mathrel {>_{ lpo }}})\) one can compute the canonical form of a ground term in polynomial time since this corresponds to applying bubblesort. This observation can be used to show that the word problem for finite sets of ground identities w.r.t. function symbols satisfying transposition identities is decidable in polynomial time even without assuming arityboundedness.
$$\begin{aligned} S_{g_n} = \{g_n(x_1,\ldots ,x_i,x_{i+1},\ldots ,x_n)\mathrel {\approx }g_n(x_1,\ldots ,x_{i+1},x_{i},\ldots ,x_n)\mid 1\le i\le n1\}. \end{aligned}$$
4 Semantic Congruence Closure with Extensionality
Here, we additionally assume that some of the function symbols are extensional, i.e., there is a set of function symbols \(\varSigma ^e \subseteq \varSigma \) whose elements we call extensional symbols. An extensional symbol \(f\in \varSigma ^e \) must satisfy the conditional identitiesIn the present section, we assume that extensional symbols do not satisfy any other semantic properties. The reason for this restriction will be explained in the next section.
$$\begin{aligned} f(x_1,\ldots ,x_n) \mathrel {\approx }f(y_1,\ldots ,y_n) \Rightarrow x_i \mathrel {\approx }y_i\ \ \text{ for } \text{ all } \, i, 1\le i\le n. \end{aligned}$$
(2)
We have shown in [13] how to integrate extensional symbols into the rewritingbased commutative congruence closure algorithm developed there. It is not hard to see that this approach can be adapted to the integration of extensionality into congruence closure w.r.t. strongly shallow identities as developed in the previous section. Below, we describe this integration in the more general setting of a generic rewritingbased semantic congruence closure. In addition to making our result more general, this also has the advantage that it clarifies which properties of our rewritingbased congruence closure algorithm are vital for integrating extensional symbols.
4.1 Generic Semantic Congruence Closure
Abstracting from the specific form of the semantic properties, we now consider a setting where there is a subset \(\varSigma _F \) of \(\varSigma \) and a set of identities F formulated using the function symbols in \(\varSigma _F \) and variables (see Sect. 2.2.2). We assume that one can construct a canonical ground rewrite system \({\widehat{R}}^{F} \) that satisfies properties similar to the system \({\widehat{R}}^{S} (E)\cup R(S^*,{\mathrel {>_{ lpo }}})\) considered in the previous section.
We call a ground rewrite system \(R\subseteq G(\varSigma ,C)\times G(\varSigma ,C)\) flat if it consists of constant rules of the form \(c \rightarrow d\) and function rules of the form \(f(c_1, \ldots , c_n) \rightarrow d\), where \(c,c_1,\ldots ,c_n,d \in C\) and \(f\in \varSigma \) is an nary function symbol.
Definition 3
We say that rewritebased congruence closure is effective for \(\varSigma _F \) and F if, for every finite flat ground rewrite system \(R\subseteq G(\varSigma ,C)\times G(\varSigma ,C)\), there is a
(possibly infinite) ground rewrite system \({\widehat{R}}^{F} \subseteq G(\varSigma ,C)\times G(\varSigma ,C)\) that satisfies the following properties:
1.
All rules containing a symbol in \(\varSigma _F \) have lefthand sides whose root symbols belong to \(\varSigma _F \),
2.
All rules containing a symbol \(f\in \varSigma \setminus \varSigma _F \) as root symbol of the lefthand side are flat function rules of the form \(f(c_1,\ldots ,c_n)\rightarrow d\), and the subset of these rules can effectively be computed,
3.
\({\widehat{R}}^{F} \) is equivalent to R w.r.t. F on terms in \(G(\varSigma ,C)\),
4.
\({\widehat{R}}^{F} \) is canonical and canonical forms of terms in \(G(\varSigma ,C)\) w.r.t. this term rewriting system can effectively be computed.
Note that we do not require that the canonical ground rewrite system \({\widehat{R}}^{F} \) itself is computable, but only that there is an algorithm for computing canonical forms w.r.t. this system. The reason is that \({\widehat{R}}^{F} \) may actually be infinite, and thus may not be computable in finite time. One may wonder how one can get a procedure for computing canonical forms w.r.t. \({\widehat{R}}^{F} \) if this system is infinite. One possibility is that one can compute a finite representation of this system that allows one to produce these normal forms. In the case of strongly shallow identities, this finite representation consists of the finite ground rewrite system \({\widehat{R}}^{S} (E)\) together with the ordered rewrite system \((S^*,\mathrel {>_{ lpo }})\). It is easy to see that the system \({\widehat{R}}^{S}:= {\widehat{R}}^{S} (E) \cup R(S^*,\mathrel {>_{ lpo }})\) induced by these two components satisfies the conditions required by Definition 3.
Under the assumptions formulated in this definition, given two terms
, we can decide whether
holds or not by computing their canonical forms w.r.t. \({\widehat{R}}^{F} \) and checking whether they are syntactically equal. In order to obtain a polynomialtime decision procedure, it is sufficient to assume that all the computations mentioned above can be performed in polynomial time. In this case we say that rewritebased congruence closure is polynomial for \(\varSigma _F \) and F. If S is a strongly shallow set of identities over an aritybounded signature \(\varSigma _S \), then \({\widehat{R}}^{S} = {\widehat{R}}^{S} (E) \cup R(S^*,\mathrel {>_{ lpo }})\) satisfies the conditions required for a polynomial rewritebased congruence closure. However, as mentioned at the end of Sect. 3, polynomiality may be achieved even if \(\varSigma _S \) is not aritybounded.
4.2 Generic Semantic Congruence Closure with Extensionality
Let \(\varSigma _F \) and F be as above, and assume that \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _F \). In addition to the identities in E and the identities in F for the symbols in \(\varSigma _F \), we now assume that also the conditional identities in (2) are satisfied for the symbols \(f\in \varSigma ^e \). From a semantic point of view, this means that we now consider algebras \({\mathcal {A}}\) that satisfy not only the identities in \(E\cup F\), but also extensionality for the symbols in \(\varSigma ^e\), i.e., for all \(f\in \varSigma ^e \), all \(i, 1\le i\le n\), and all elements \(a_1,\ldots ,a_n,b_1,\ldots ,b_n\) of \({\mathcal {A}}\) it holds that \(f^{\mathcal {A}} (a_1,\ldots ,a_n) = f^{\mathcal {A}} (b_1,\ldots ,b_n)\) implies \(a_i=b_i\) for all \(i, 1\le i\le n\). We say that \(s\mathrel {\approx }t\) follows from E w.r.t. F and the extensional symbols in \(\varSigma ^e \) (written \(s \mathrel {\approx }_E^{F,\varSigma ^e}t\)) if \(s^{\mathcal {A}} = t^{\mathcal {A}} \) holds in all algebras \({\mathcal {A}} \) that satisfy the identities in \(E\cup F\) and extensionality for the symbols in \(\varSigma ^e\).
The relation \({\mathrel {\approx }_E^{F,\varSigma ^e}}\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) can also be generated using the following extension of congruence closure by an extensionality rule.
To be more precise, \( CC ^{F,\varSigma ^e} (E)\) is the smallest subset of \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) that contains E and is closed under reflexivity, transitivity, symmetry, congruence, Fidentities, and the following extensionality rule:

if \(f\in \varSigma ^e \) is an nary function symbol, \(1\le i\le n\), and \(f(s_1,\ldots ,s_n) \mathrel {\approx }f(t_1,\ldots ,t_n)\in CC ^{F,\varSigma ^e} (E)\), then \({s_i\mathrel {\approx }t_i}\in CC ^{F,\varSigma ^e} (E)\) (extensionality).
Proposition 1
For all terms \(s,t\in G(\varSigma ,C_0)\) it holds that \(s \mathrel {\approx }_E^{F,\varSigma ^e}t\) iff \({s\mathrel {\approx }t}\in CC ^{F,\varSigma ^e} (E)\).
Proof
This proposition is an easy consequence of Theorem 54 in [19]. Adapted to our setting, this theorem says that \(\mathrel {\approx }_E^{F,\varSigma ^e}\) is the least congruence containing E that is invariant under applying Fidentities and extensionality. Clearly, this is exactly \( CC ^{F,\varSigma ^e} (E)\). \(\square \)
To obtain a decision procedure for \(\mathrel {\approx }_E^{F,\varSigma ^e}\), we assume that rewritebased congruence closure is effective for \(\varSigma _F \) and F. To apply this assumption, we first need to transform E into a flat rewrite system. Thus, let the flat term rewriting system R(E) be defined as in Sect. 3.
Example 3
Consider \(E' = \{ f(g(a),a) \mathrel {\approx }c, g(b) \mathrel {\approx }h(a), g(a) \mathrel {\approx }g(b)\}\) with \(\varSigma _F = \{f\}\) where \(F = \{f(x,y)\mathrel {\approx }f(y,x)\}\) and \(\varSigma ^e = \{g\}\). Then \(f(b,h(a)) \mathrel {\approx }_{E'}^{F,\varSigma ^e}c\).
Let the set \(C_1\) of new constants and the linear order on all constants be defined as in Example 2. Then we obtain the following flat system:
Lemma 7
The system R(E) is a conservative extension of E also w.r.t. F and the extensional symbols in \(\varSigma ^e \), i.e.,
iff
for all terms
.
Proof
The proof of this lemma is basically identical to the proof of Lemma 3. One only needs to note, additionally, that going to the expansion and the reduct constructed in this proof does not change satisfaction of the extensionality axioms. \(\square \)
The confluent and terminating rewrite system for deciding \(\mathrel {\approx }_E^{F,\varSigma ^e}\) is now constructed by first applying the generic semantic congruence closure to R(E), then applying extensionality to derive new identities between constants, and iterating these two steps until no new identities are derived in the second step.
Algorithm 2
The rewrite system \({\widehat{R}}^{F,\varSigma ^e} (E)\) is constructed by performing the following steps, starting with \(R_0 := R(E)\) and \(i:= 0\):
(g)
Apply the generic semantic congruence closure for \(\varSigma _F \) and F to \(R_i\) to obtain the canonical rewrite system \({\widehat{R}}^{F} _i\).
(e)
Let \(R_{i+1} := R_i\). For all \(f\in \varSigma ^e \), all pairs of distinct flat function rules \(f(c_1,\ldots ,c_n)\rightarrow d, f(c_1',\ldots ,c_n')\rightarrow d\) in \({\widehat{R}}^{F} _i\), and all \(j, 1\le j\le n\), such that \(c_j\ne c_j'\), add \(c_j\rightarrow c_j'\) to \(R_{i+1}\) if \(c_j > c_j'\) and otherwise add \(c_j'\rightarrow c_j\) to \(R_{i+1}\). If at least one constant rule has been added in this step, then set \(i:= i+1\) and continue with step (g). Otherwise, terminate with output \({\widehat{R}}^{F,\varSigma ^e} (E) := {\widehat{R}}^{F} _i\).
Lemma 8
The construction of \({\widehat{R}}^{F,\varSigma ^e} (E)\) terminates after a linear number of iterations and produces a canonical rewriting system.
Proof
Termination after linearly many iterations is an immediate consequence of the fact that, in each iteration of the construction (except the last one), the number of distinct equivalence classes of constants decreases, and the cardinality of the set of constants C is linear in the size of the input. The system \({\widehat{R}}^{F,\varSigma ^e} (E)\) is canonical since it is produced by an application of rewritebased congruence closure to a finite flat ground rewrite system \(R_i\). \(\square \)
We illustrate the construction using Example 3. In step (g), we apply Algorithm 1 from the previous section. In step (a) of that algorithm, the nontrivial equivalence classes of constants are \([c_{f(g(a),a)}] = \{c_{f(g(a),a)},c\}\) with representative c and \([c_{g(b)}] = \{c_{g(a)}, c_{g(b)}, c_{h(a)}\}\) with representative \(c_{h(a)}\). Thus, \(c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}\) are the constant rules added. The function rules are then \(f(c_{h(a)},a) \rightarrow c, g(a) \rightarrow c_{h(a)}, g(b) \rightarrow c_{h(a)}, h(a) \rightarrow c_{h(a)}.\) In step (b1), the first rule is rewritten to \(f(a,c_{h(a)}) \rightarrow c\). Since the function rules obtained this way have unique lefthand sides, no constant rule is added in step (b2). Thus, the construction in step (g) terminates and yields the system that is the union of the generated flat rules with the ones in \(R(F,{\mathrel {>_{ lpo }}})\) (with the strongly shallow identities F being commutativity of the symbols in \(\varSigma _F\)). Note that, for commutativity, \(F^* = F\).
Consequently, we now proceed with step (e). Since \(g\in \varSigma ^e \), the presence of the rules \(g(a) \rightarrow c_{h(a)}\) and \(g(b) \rightarrow c_{h(a)}\) triggers the addition of \(a\rightarrow b\) to \(R_1\).
In the second iteration step, we now have in step (a) of the construction applied in (g) the additional nontrivial equivalence class \([a] = \{a,b\}\) with representative b. The net effect of step (a) is then to generate the constant rules \(c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}\), and \(a\rightarrow b\). The obtained function rules after step (b1) are then \(f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}.\) No new constant rules are added in step (b2). In step (e), the construction terminates with the set of flat rules \(R_1 = \{ a\rightarrow b, c_{f(g(a),a)} \rightarrow c, c_{g(a)} \rightarrow c_{h(a)}, c_{g(b)} \rightarrow c_{h(a)}, f(b,c_{h(a)}) \rightarrow c, g(b) \rightarrow c_{h(a)}, h(b) \rightarrow c_{h(a)}\},\) which is identical to the system \({\widehat{R}}^{S} (E)\) computed for the set of identity E of Example 2. The system \({\widehat{R}}^{F,\varSigma ^e} (E)\) produced by this run of Algorithm 2 is \(R_1\cup R(F,{\mathrel {>_{ lpo }}})\).
Our goal is now to show, for the general case, that \({\widehat{R}}^{F,\varSigma ^e} (E)\) provides us with a decision procedure for the extensional word problem for \(E\cup F\), i.e., it allows us to decide the relation \(\mathrel {\approx }_E^{F,\varSigma ^e}\). This is an easy consequence of the following theorem.
Theorem 2
If
, then
iff \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\).
Proof
Assume that Algorithm 2 has terminated with output \({\widehat{R}}^{F,\varSigma ^e} (E) = {\widehat{R}}^{F} _n\).
To show the ifdirection, assume that \(s_0\) and \(t_0\) have the same canonical form w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Then,
, and thus
since \({\widehat{R}}^{F} _n\) is equivalent to \(R_n\) w.r.t. F. Consequently, it is sufficient to prove that \({\mathrel {\approx }^{F}_{R_{i+1}}} \subseteq {\mathrel {\approx }^{F,\varSigma ^e}_{R_{i}}}\) holds for all \(i, 0\le i<n\). In fact, then
implies
, which yields
by Lemma 7 since \(R_0 = R(E)\). To see the inclusion \({\mathrel {\approx }^{F}_{R_{i+1}}} \subseteq {\mathrel {\approx }^{F,\varSigma ^e}_{R_{i}}}\), note that the presence of the rules \(f(c_1,\ldots ,c_n)\rightarrow d, f(c_1',\ldots ,c_n')\rightarrow d\) in \({\widehat{R}}^{F} _i\) implies that \(f(c_1,\ldots ,c_n) \mathrel {\approx }_{R_i}^{F} f(c_1',\ldots ,c_n')\), and thus \(c_j \mathrel {\approx }^{F,\varSigma ^e}_{R_i} c_j'\). Since these are the only identities added when going from \(R_i\) to \(R_{i+1}\), this completes the proof of the ifdirection.
We show the onlyifdirection by contraposition. Thus, assume that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Let \({\mathcal {A}} \) be the initial algebra (i.e., the free algebra over the empty set of generators) for \({\widehat{R}}^{F,\varSigma ^e} (E)\) viewed as a set of identities over the signature
. Recall that this algebra has the equivalence classes of terms in \(G(\varSigma ,C)\) w.r.t. \(\mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)}\) as its elements, and any term \(s\in G(\varSigma ,C)\) is interpreted in \({\mathcal {A}} \) as the class of s. Since \({\widehat{R}}^{F,\varSigma ^e} (E)\) is canonical, we can represent such a class by the unique canonical form of its elements. Obviously, the fact that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\) implies
. Thus, if we can show that \({\mathcal {A}} \) satisfies E, the identities in F, and extensionality for every \(f\in \varSigma ^e \), then
implies that
, and thus
by Proposition 1.
If \({s\mathrel {\approx }t} \in E\), then \({s\mathrel {\approx }_{R(E)}^{F} t}\) (see the proof of Lemma 3). Since \(R(E) = R_0\subseteq \ldots \subseteq R_n\), this implies \(s \mathrel {\approx }_{{\widehat{R}}^{F} _n} t\), and thus \(s \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t\). Consequently, these two terms are evaluated to the same element of \({\mathcal {A}} \), which shows that \({\mathcal {A}} \) satisfies the identities in E.
Let \({u\mathrel {\approx }v} \in F\) and \(\theta (u), \theta (v)\) be ground instances of u, v. Then we know that \(\theta (u) \mathrel {\approx }_{R(E)}^{F} \theta (v)\). As above, this implies that the two terms \(\theta (u)\) and \(\theta (v)\) evaluated to the same element of \({\mathcal {A}} \), which shows that \({\mathcal {A}} \) satisfies the identities in F.
Let \(f\in \varSigma ^e \) be an nary function symbol and \(s_1,t_1,\ldots ,s_n,t_n\in G(\varSigma ,C)\) be terms such that the terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) evaluate to the same element of \({\mathcal {A}} \). Then \(f(s_1,\ldots ,s_n) \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} f(t_1,\ldots ,t_n)\), and thus these terms have the same canonical forms w.r.t. \({\widehat{R}}^{F,\varSigma ^e} (E)\). Let \(s_1',t_1',\ldots ,s_n',t_n'\) be the canonical forms of the terms \(s_1,t_1,\ldots ,s_n,t_n\), respectively. Then the canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are respectively identical to the canonical forms of \(f(s_1',\ldots ,s_n')\) and \(f(t_1',\ldots ,t_n')\). Since \(f\not \in \varSigma _F \), no rule with root symbol from \(\varSigma _F \) on the lefthand side is applicable to the latter terms. Thus, the only possible rules applicable to these terms are flat function rules for f. Consequently, there are two possible cases for how the (identical) canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) can look like: Summing up, we have thus shown that \({\mathcal {A}} \) also satisfies extensionality for the symbols in \(\varSigma ^e \), which completes the proof of the onlyifdirection. \(\square \)
1.
The terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) respectively have the canonical forms \(f(s_1',\ldots ,\) \(s_n')\) and \(f(t_1',\ldots ,t_n')\), and the corresponding arguments are syntactically equal, i.e., \(s_j' = t_j'\) for \(j = 1, \ldots , n\). In this case, \(s_j \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} s_j' = t_j' \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t_j\) for \(j = 1, \ldots , n\), and thus, for \(j = 1, \ldots , n\), the terms \(s_j\) and \(t_j\) evaluate to the same element of \({\mathcal {A}} \).
2.
The terms \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) reduce to the same constant d. Then \({\widehat{R}}^{F_c,\varSigma ^e} (E)\) must contain the rules \(f(s_1',\ldots ,s_n')\rightarrow d\) and \(f(t_1',\ldots ,t_n')\rightarrow d\). By the construction of \({\widehat{R}}^{F,\varSigma ^e} (E)\), we thus have \(s_j' = t_j'\) for all \(j =1, \ldots , n\) since otherwise new constant rules would have been added in step (e) and the construction would not yet have terminated. This yields \(s_j \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} s_j' = t_j' \mathrel {\approx }_{{\widehat{R}}^{F,\varSigma ^e} (E)} t_j\) for \(j = 1, \ldots , n\), and thus, for \(j = 1, \ldots , n\), the terms \(s_j\) and \(t_j\) evaluate to the same element of \({\mathcal {A}} \).
Recall that \(f(b,h(a)) \mathrel {\approx }_{E'}^{F,\varSigma ^e}c\) for the set of identities \(E'\) of Example 3. We have already seen that these two terms rewrite to the same canonical form w.r.t. \({\widehat{R}}^{S} (E)\cup R(F,{\mathrel {>_{ lpo }}}) = {\widehat{R}}^{F,\varSigma ^e} (E')\).
The following corollary is an easy consequence of the above theorem, the conditions satisfied if rewritebased congruence closure is effective (polynomial), and Lemma 8.
Corollary 3
Let \(\varSigma _F \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _F \), F be a set of identities formulated using the function symbols in \(\varSigma _F \) and variables, and assume that rewritebased congruence closure is effective (polynomial) for \(\varSigma _F \) and F. Then the word problem for finite sets E of ground identities w.r.t. F and the extensional symbols in \(\varSigma ^e \) is decidable (in polynomial time), i.e., given terms
, we can decide (in polynomial time) whether
holds or not.
In the case of a set of strongly shallow identities S for a signature \(\varSigma _S\) that is arity bounded, the results shown in the previous section imply that rewritebased congruence closure is polynomial for \(\varSigma _S \) and S. Thus, we obtain the following instance of Corollary 3.
Corollary 4
Let \(\varSigma _S \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \setminus \varSigma _S \), S be a set of strongly shallow identities for \(\varSigma _S\), and assumed that \(\varSigma _S\) is aritybounded. Then the word problem for finite sets of ground identities w.r.t. S and the extensional symbols in \(\varSigma ^e \) is decidable in polynomial time, i.e., given a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) and terms
, we can decide in polynomial time whether
holds or not if we assume that the maximal arity of the symbols in \(\varSigma _S\) is bounded by a constant.
We have mentioned in the introduction that it is unclear how this polynomiality result could be obtained by a simple adaptation of the usual approach that restricts congruence closure to a polynomially large set of subterms determined by the input (informally called “small” terms in the following). The main problem is that one might have to generate identities between “large” terms before one can get back to a desired identity between “small” terms using extensionality. The question is now where our proof actually deals with this problem. The answer is: in Case 1 of the case distinction in the proof of the onlyifdirection of Theorem 2. In fact, there we consider a derived identity \(f(s_1,\ldots ,s_n)\mathrel {\approx }f(t_1,\ldots ,t_n)\) such that the (syntactically identical) canonical forms of \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are not a constant from C, but of the form \(f(s_1',\ldots ,s_n') = f(t_1',\ldots ,t_n')\). Basically, this means that \(f(s_1,\ldots ,s_n)\) and \(f(t_1,\ldots ,t_n)\) are terms that are not equivalent modulo E to subterms of terms occurring in E, since the latter terms have a constant representing them. Thus, these terms are “large” terms that potentially could cause a problem: an identity between them has been derived, and now extensionality applied to this identity yields new identities \(s_i\mathrel {\approx }t_i\) between smaller terms. Our proof shows that these identities can nevertheless be derived from \({\widehat{R}}^{F,\varSigma ^e} (E)\), and thus do not cause a problem. For the proof of Theorem 2 to go through, it is important that the canonical rewrite system \({\widehat{R}}^{F,\varSigma ^e} (E)\) works for all ground terms in \(G(\varSigma ,C)\), and not just for a finite set of “small” terms.
5 Symbols that are Commutative and Extensional
In the previous section, we have made the assumptions that the sets \(\varSigma _F \) and \(\varSigma ^e \) are disjoint, i.e., we did not consider extensionality for symbols satisfying the identities in F. The reason is that, without this restriction, we might obtain more consequences than we bargained for. For example, it is easy to see that the presence of a commutative and extensional symbol would trivialize the equational theory. In fact, if f is assumed to be commutative and extensional, then commutativity yields \(f(s,t) \mathrel {\approx }f(t,s)\) for all terms \(s,t\in G(\varSigma ,C_0)\), and extensionality then yields \(s\mathrel {\approx }t\). This shows that, in this case, the commutative and extensional congruence closure would be \(G(\varSigma ,C_0)\times G(\varSigma ,C_0)\), independently of E, and thus even for \(E=\emptyset \).
To avoid this problem, one can, however, look at appropriate variants of extensionality. In this section, we restrict the attention to commutativity, and consider the following variant of extensionality for commutative function symbols f, which we call cextensionality:For example, if f is a commutative couple constructor, and two couples turn out to be equal, then we want to infer that they consist of the same two persons, independently of the order in which they were put into the constructor.
$$\begin{aligned} f(x_1,x_2) \mathrel {\approx }f(y_1,y_2) \Rightarrow (x_1 \mathrel {\approx }y_1 \wedge x_2 \mathrel {\approx }y_2) \vee (x_1 \mathrel {\approx }y_2 \wedge x_2 \mathrel {\approx }y_1). \end{aligned}$$
(3)
Unfortunately, adding such a rule makes the word problem coNPhard, which can be shown by a reduction from validity of propositional formulae.
Proposition 2
In the presence of at least one commutative and cextensional symbol, the word problem for finite sets of ground identities is coNPhard.
We prove this proposition by a reduction from validity of propositional formulae. Thus, consider a propositional formula \(\phi \), and let \(p_1,\ldots ,p_n\) be the propositional variables occurring in \(\phi \). We take the constants 0 and 1, and for every \(i, 1\le i\le n\), we view \(p_i\) as a constant symbol, and add a second constant symbol \({\overline{p}}_i\). In addition, we consider the function symbols \(f_\vee , f_\wedge , f_\lnot , f\), and assume that f is commutative and satisfies (3). We then consider ground identities that axiomatize the truth tables for \(\vee ,\wedge ,\lnot \), i.e.,
In addition, we consider, for every \(i, 1\le i\le n\), the identitiesLet \(E_\phi \) be the set of these ground identities, and let \(t_\phi \) be the term obtained from \(\phi \) by replacing the Boolean operations \(\vee ,\wedge \), and \(\lnot \) by the corresponding function symbols \(f_\vee , f_\wedge \), and \(f_\lnot \). Proposition 2 is now an immediate consequence of the following lemma.
(4)
$$\begin{aligned} f(p_i,{\overline{p}}_i) \mathrel {\approx }f(0,1). \end{aligned}$$
Lemma 9
The identity \(t_\phi \mathrel {\approx }1\) holds in every algebra satisfying \(E_\phi \) together with (3) and commutativity of f iff \(\phi \) is valid.
Proof
First assume that \(\phi \) is valid, and let \({\mathcal {A}} \) be an algebra satisfying \(E_\phi \) together with (3) and commutativity of f. Since \({\mathcal {A}} \) satisfies the identities \(f(p_i,{\overline{p}}_i) \mathrel {\approx }f(0,1)\) for \(i = 1,\ldots ,n\), as well as (3), we can deduce that \(p_i^{\mathcal {A}} \in \{0^{\mathcal {A}},1^{\mathcal {A}} \}\) for all \(i = 1,\ldots ,n\). Let v be the propositional valuation that satisfies \(v(p_i) = 1\) iff \(p_i^{\mathcal {A}} = 1^{\mathcal {A}} \). Since \(\phi \) is valid, we know that \(v(\phi ) = 1\). Using this fact and the identities axiomatizing the truth tables for \(\vee ,\wedge ,\lnot \), we obtain that \(t_\phi ^{\mathcal {A}} = 1^{\mathcal {A}} \). Thus, the identity \(t_\phi \mathrel {\approx }1\) holds in \({\mathcal {A}}\).
Conversely, assume that \(\phi \) is not valid, and let v be a propositional valuation such that \(v(\phi ) = 0\). Let \({\mathcal {B}} \) be the free commutative algebra with generators 0, 1 and the binary function symbol f as binary operation. Thus, the domain B of \({\mathcal {B}} \) consists of the equivalence classes of ground terms in \(G(\{f\},\{0,1\})\) modulo commutativity. We expand \({\mathcal {B}} \) to an algebra \({\mathcal {A}} \) that also interprets the symbols \(p_1,\ldots ,p_n, {\overline{p}}_1,\ldots ,{\overline{p}}_n, f_\vee , f_\wedge , f_\lnot \) as follows:It is easy to see that \({\mathcal {A}} \) satisfies the identities in \(E_\phi \). In addition, commutativity of f and (3) are satisfied since these properties hold in the free commutative algebra [20]. It is now easy to see that \(t_\phi ^{\mathcal {A}} = v(\phi )^{\mathcal {B}} = 0^{\mathcal {B}} = 0^{\mathcal {A}} \), and thus \({\mathcal {A}} \) does not satisfy the identity \(t_\phi \mathrel {\approx }1\) since in \({\mathcal {B}} \) the class of the generator 0 is different from the class of the generator 1. \(\square \)

the domain of \({\mathcal {A}} \) is B;

\(0^{\mathcal {A}} = 0^{\mathcal {B}} \) and \(1^{\mathcal {A}} = 1^{\mathcal {B}} \);

\(p_i^{\mathcal {A}} = v(p_i)^{\mathcal {B}} \) and \({\overline{p}}_i^{\mathcal {A}} = (1v(p_i))^{\mathcal {B}} \) for \(i = 1,\ldots ,n\);

on \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\), \(f_\vee ^{\mathcal {A}} \) behaves like disjunction, \(f_\wedge ^{\mathcal {A}} \) behaves like conjunction, and \(f_\lnot ^{\mathcal {A}} \) behaves like negation, and on tuples not in \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\times \{0^{\mathcal {B}},1^{\mathcal {B}} \}\) (for \(f_\vee \) and \(f_\wedge \)) or \(\{0^{\mathcal {B}},1^{\mathcal {B}} \}\) (for \(f_\lnot \)) they yield an arbitrary value, say \(0^{\mathcal {B}} \);

\(f^{\mathcal {A}} \) is \(f^{\mathcal {B}} \).
To prove a complexity upper bound that matches the lower bound stated in Proposition 2, we consider a finite signature \(\varSigma \), a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) as well as sets \(\varSigma _c \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \) of commutative and extensional symbols, respectively, and assume that the noncommutative extensional symbols in \(\varSigma ^e \setminus \varSigma _c \) satisfy extensionality (2), whereas the commutative extensional symbols in \(\varSigma ^e \cap \varSigma _c \) satisfy cextensionality (3). We want to show that, in this setting, the problem of deciding, for given terms
, whether
is not equivalent to \(t_0\) is in NP.
For this purpose, we employ a nondeterministic variant of Algorithm 2. In step (g) this procedure applies Algorithm 1 for the special case where the set \(F_c\) of strongly shallow identities axiomatize commutativity of the symbols in \(\varSigma _c\). We denote the canonical rewrite system produced by an application of this algorithm to the current flat system \(R_i\) by \({\widehat{R}}^{F_c} _i\), i.e., \({\widehat{R}}^{F_c} _i = {\widehat{R}}^{F_c} (R_i) \cup R(F_c,{\mathrel {>_{ lpo }}})\).
For extensional symbols \(f\in \varSigma ^e \setminus \varSigma _c \), step (e) is also performed as in Algorithm 2. For extensional symbols \(f\in \varSigma ^e \cap \varSigma _c \), step (e) is modified as follows: for all pairs of distinct flat function rules \(f(c_1,c_2) \rightarrow d, f(c_1',c_2') \rightarrow d\) in \({\widehat{R}}^{F_c} _i\), nondeterministically choose whetherand then add the corresponding constant rules to \(R_{i+1}\) unless the respective constants are already syntactically equal.

\(c_1\) and \(c_1'\) as well as \(c_2\) and \(c_2'\) are to be identified, or

\(c_1\) and \(c_2'\) as well as \(c_2\) and \(c_1'\) are to be identified,
This nondeterministic algorithm has different runs, depending on the choices made in the nondeterministic part of step (e). But each run r produces a rewrite system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\).
Example 4
We illustrate the nondeterministic construction using the identities \(E_\phi \) for \(\phi = p\vee \lnot p\) from our coNPhardness proof. Then \(E_\phi \) consists of the identities in (4) together with the identity \(f(p,{\overline{p}}) \mathrel {\approx }f(0,1)\). Assuming an appropriate order on the constants, the system \(R(E_\phi )\) contains, among others, the rules
In steps (a) and (b) of Algorithm 1, these rules are transformed into the form
Since no new constant rule is added, the construction proceeds with step (e). Due to the presence of the rules \(f(p,{\overline{p}}) \rightarrow c_{f(1,0)}\) and \(f(1,0) \rightarrow c_{f(1,0)}\) for \(f\in \varSigma _c \cap \varSigma ^e \), it now nondeterministically chooses between identifying p with 1 or with 0. In the first case, the constant rules \(p\rightarrow 1, {\overline{p}}\rightarrow 0\) are added, and in the second \(p\rightarrow 0, {\overline{p}}\rightarrow 1\) are added. In the next iteration, no new constant rules are added, and thus the construction terminates. It has two runs \(r_1\) and \(r_2\). The generated rewrite systems \({\widehat{R}}^{F_c,\varSigma ^e} _{r_1}(E)\) and \({\widehat{R}}^{F_c,\varSigma ^e} _{r_2}(E)\) share the rules in (5), but the first contains \(p\rightarrow 1\) whereas the second contains \(p\rightarrow 0\).
(5)
Lemma 10
Every run r of the construction terminates after a linear number of iterations and the system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) this run produces is canonical.
Proof
Using the canonical rewrite systems \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we can now characterize when an identity follows from E w.r.t. commutativity of the symbols in \(\varSigma _c \), extensionality of the symbols in \(\varSigma ^e \setminus \varSigma _c \), an cextensionality of the symbols in \(\varSigma ^e \cap \varSigma _c \) as follows.
Theorem 3
Let
. The identity
holds in every algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and cextensionality for every \(f\in \varSigma ^e \cap \varSigma _c \) iff \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) for every run r of the nondeterministic construction.
Proof
First, we show the ifdirection. Thus, assume that \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) for every run r. Let \({\mathcal {A}} \) be an algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and cextensionality for every \(f\in \varSigma ^e \cap \varSigma _c \). We must show that \(s_0^{\mathcal {A}} = t_0^{\mathcal {A}} \).
To this purpose, we expand \({\mathcal {A}} \) to the new constants in \(C_1\) by setting \(c_s^{\mathcal {A}}:= s^{\mathcal {A}} \), and call the expansion obtained this way still \({\mathcal {A}} \). Since \({\mathcal {A}} \) is a model of E, it is easy to see that it is also a model of R(E). In addition, it satisfies the rewrite rules (viewed as identities) added in steps (a) and (b) of Algorithm 1. The same is true for the rules added in step (e) of the modified version of Algorithm 2 when treating symbols in \(\varSigma ^e \setminus \varSigma _c \). For symbols \(f\in \varSigma ^e \cap \varSigma _c \), we let \({\mathcal {A}} \) decide which option to take. To be more precise, for all pairs of distinct rules \(f(c_1,c_2) \rightarrow d, f(c_1',c_2') \rightarrow d\) in \({\widehat{R}}^{F_c} _i\), we can assume by induction that \(f(c_1,c_2)^{\mathcal {A}} = f^{\mathcal {A}} (c_1^{\mathcal {A}},c_2^{\mathcal {A}}) = d^{\mathcal {A}} = f^{\mathcal {A}} (c_1'^{\mathcal {A}},c_2'^{\mathcal {A}}) = f(c_1',c_2')^{\mathcal {A}} \) holds. Since \({\mathcal {A}} \) satisfies cextensionality for f, this implies that \(c_1^{\mathcal {A}} = c_1'^{\mathcal {A}}, c_2^{\mathcal {A}} = c_2'^{\mathcal {A}} \) or \(c_1^{\mathcal {A}} = c_2'^{\mathcal {A}}, c_2^{\mathcal {A}} = c_1'^{\mathcal {A}} \). If the former is the case, then we take the first option in the nondeterministic choice, and otherwise we take the second option. Overall, this yields a run r of the nondeterministic construction such that \({\mathcal {A}} \) is a model of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Note that \({\mathcal {A}} \) is a model of the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) since it satisfies commutativity for every symbol in \(\varSigma _c \). Since \(s_0, t_0\) have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we know that
. This yields \(s_0^{\mathcal {A}} = t_0^{\mathcal {A}} \) since \({\mathcal {A}} \) is a model of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\).
Second, we show the onlyifdirection by contraposition. Thus, assume that there is a run r of the algorithm such that \(s_0, t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Since \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) is canonical, this implies that
. As in the proof of Theorem 2, let \({\mathcal {A}} \) be the initial algebra for \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) viewed as a set of identities over the signature
. Then
implies
. Thus, it is sufficient to show that \({\mathcal {A}} \) satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and cextensionality for every \(f\in \varSigma ^e \cap \varSigma _c \). Since the former three properties can be shown as in the proof of Theorem 2, we concentrate on the last one.
Thus, let \(f\in \varSigma ^e \cap \varSigma _c \) be a commutative and cextensional function symbol and \(s_1,t_1,s_2,t_2\in G(\varSigma ,C)\) be terms such that the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) evaluate to the same element of \({\mathcal {A}} \). Again, this implies that these terms have the same canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). Let \(s_1',t_1',s_2',t_2'\) be the canonical forms of the terms \(s_1,t_1,s_2,t_2\), respectively. As before, we distinguish several cases, but since f is commutative we now also need to take the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\), which are part of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), into account:Summing up, we have thus shown that \({\mathcal {A}} \) satisfies cextensionality for the symbols in \(\varSigma ^e \cap \varSigma _c \), which completes the proof of the theorem.\(\square \)

First, assume that the canonical forms of \(f(s_1,s_2)\) and \(f(t_1,t_2)\) still have root symbol f. We distinguish several subcases, depending on how the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) have been applied:

The terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_1',s_2')\) and \(f(t_1',t_2')\), respectively, and the corresponding arguments are syntactically equal, i.e., \(s_1'=t_1'\) and \(s_2'=t_2'\). As in the proof of Theorem 2, this implies that \(s_1\) and \(t_1\) as well as \(s_2\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \).

The case where the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_2',s_1')\) and \(f(t_2',t_1')\), respectively, can be handled in the same way since then again \(s_1'=t_1'\) and \(s_2'=t_2'\).

The terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_2',s_1')\) and \(f(t_1',t_2')\), respectively, and the corresponding arguments are syntactically equal, i.e., \(s_2'=t_1'\) and \(s_1'=t_2'\). In this case we can derive that \(s_2\) and \(t_1\) as well as \(s_1\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \).

The case where the terms \(f(s_1,s_2)\) and \(f(t_1,t_2)\) have the canonical forms \(f(s_1',s_2')\) and \(f(t_2',t_1')\), respectively, can be handled in the same way since then again \(s_2'=t_1'\) and \(s_1'=t_2'\).


Second, assume that the canonical forms of \(f(s_1,s_2)\) and \(f(t_1,t_2)\) are the same constant d. Again, we distinguish several subcases, depending on how the rules in \(R(F_c,{\mathrel {>_{ lpo }}})\) have been applied before the reduction to the constant d:

The term rewriting system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) contains the rules \(f(s_1',s_2')\rightarrow d\) and \(f(t_1',t_2')\rightarrow d\). By the construction of \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\), we thus have \(s_1' = t_1'\) and \(s_2' = t_2'\), or \(s_1' = t_2'\) and \(s_2' = t_1'\) since no new constant rules have been added. This implies that \(s_1\) and \(t_1\) as well as \(s_2\) and \(t_2\) respectively evaluate to the same elements of \({\mathcal {A}} \), or \(s_1\) and \(t_2\) as well as \(s_2\) and \(t_1\) respectively evaluate to the same elements of \({\mathcal {A}} \).

The cases where the function rules reducing to d contain the arguments of f in other permutations can be treated in the same way.

Coming back to Example 4, we note that \(\phi = p\vee \lnot p\) is valid, and thus (by Lemma 9), the identity \(f_\vee (p,f_\lnot (p)) \mathrel {\approx }1\) holds in all algebras that satisfy \(E_\phi \) and interpret f as a commutative and cextensional symbol. Using the rewrite system generated by the run \(r_1\), we obtain the following rewrite sequence:For the run \(r_2\), we obtain the sequenceThus, for both runs the terms \(f_\vee (p,f_\lnot (p))\) and 1 have the same canonical form 1.
$$\begin{aligned} f_\vee (p,f_\lnot (p)) \rightarrow f_\vee (1,f_\lnot (p)) \rightarrow f_\vee (1,f_\lnot (1)) \rightarrow f_\vee (1,0) \rightarrow 1. \end{aligned}$$
$$\begin{aligned} f_\vee (p,f_\lnot (p)) \rightarrow f_\vee (0,f_\lnot (p)) \rightarrow f_\vee (0,f_\lnot (0)) \rightarrow f_\vee (0,1) \rightarrow 1. \end{aligned}$$
Corollary 5
Consider a finite set of ground identities \(E\subseteq G(\varSigma ,C_0)\times G(\varSigma ,C_0)\) as well as sets \(\varSigma _c \subseteq \varSigma \) and \(\varSigma ^e \subseteq \varSigma \) of commutative and extensional symbols, respectively, and two terms \(s_0,t_0\in G(\varSigma ,C_0)\). The problem of deciding whether the identity \(s_0\mathrel {\approx }t_0\) holds in every algebra that satisfies E, commutativity for every \(f\in \varSigma _c \), extensionality for every \(f\in \varSigma ^e \setminus \varSigma _c \), and cextensionality for every \(f\in \varSigma ^e \cap \varSigma _c \) is coNPcomplete.
Proof
Since Proposition 2 yields coNPhardness, it is sufficient to show that the complement problem is in NP. This is an easy consequence of Theorem 3. In fact, to show that \(s_0\mathrel {\approx }t_0\) does not hold in all such algebras, it is sufficient to generate one run r of our nondeterministic construction, and then test whether \(s_0\) and \(t_0\) have different canonical forms w.r.t. \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\). The system \({\widehat{R}}^{F_c,\varSigma ^e} _r(E)\) can be generated in nondeterministic polynomial time, and the canonical forms can then be computed in polynomial time. \(\square \)
6 Related Work
As already mentioned in the introduction, several approaches for how to solve the word problem for finite sets of ground identities by constructing a canonical rewrite system have been developed in the literature [7‐9]. Restricted to the case of uninterpreted function symbols, the approach employed in the present paper is similar to the rewritingbased congruence closure algorithm developed in [7]. In the presence of interpreted function symbols, i.e., ones that are must satisfy certain nonground identities, our approach differs from previous ones [12, 14, 15] by the use of an ordered rewrite system. Basically, in our approach the ordered rewrite system deals with the semantic properties of the interpreted function symbols, and the only interaction with the construction of the rewrite system for the ground identities is through computing canonical forms of the lefthand sides of function rules using the ordered rewrite system. Otherwise, the construction is the same as for the case of uninterpreted function symbols. The papers [14, 15], which consider symbols that are associativity and commutativity (AC), deal with the problem that AC does not allow for a terminating nonground rewrite system not by employing ordered rewriting, but by using rewriting modulo AC. This makes the computation of critical pairs more complicated and requires the use of completion modulo AC. One can actually use ordered rewriting to obtain a canonical ground rewrite system for AC [17], but since associativity is not shallow, the simple approach of combining this system with a rewrite system constructed from ground identities used in the present paper would not work. In fact, since it is known that the word problem for commutative semigroups is ExpSpacecomplete [21], it is clear that some expensive additional computations are required. The approach in [12] treats, e.g., commutativity by dealing with it directly in the construction of the ground rewrite system, rather than by employing a separate ordered rewrite system. To the best of our knowledge, no other work than ours has dealt with extensionality in the context of ground identities.
Shallow identities have been investigated in term rewriting and unification theory as a class of identities with good computational properties before. Comon et al. [16] introduce a more general notion of shallow identities than the strongly shallow identities employed in the present paper. According to [16], an identity is shallow if every variable occurs at depth at most one in the left and righthand side of the identity. In particular, this notion of shallow also encompasses ground identities and identities like idempotency [i.e. \(f(x,x) = x\)]. It is shown in [16] that the word problem for shallow theories is decidable. The approach basically applies a completion procedure akin to unfailing completion [22] to obtain an ordered rewrite system whose ground instantiation is canonical. This yields decidability of the word problem as considered in our Corollary 1, but no complexity result is stated in [16]. Extensionality is not considered in that paper. Our restriction to sets of strongly shallow identities S allows us to obtain an ordered rewrite system \((S^*,\mathrel {>_{ lpo }})\) whose ground instantiation \(R(S^*,\mathrel {>_{ lpo }})\) is canonical in a very simple way. For more general shallow identities, this simple construction would not work (e.g., idempotency would destroy finiteness of \(S^*\)). In particular, our polynomiality results depend of the fact that \(S^*\) is of polynomial size if the signature \(\varSigma _S \) is aritybounded.
Lynch et al. [23, 24] introduce a method called schematic saturation for automatically generating a ground satisfiability procedure for a given theory T by combining the axioms of T with ground clauses using paramodulation. Similar to KnuthBendix completion, schematic saturation need not terminate. However, if it does terminate, then the word problem for ground identities w.r.t. the theory T is decidable in polynomial time by superposition because superposition only generates polynomially many clauses. Similar to our complexity result for strongly shallow identities stated in Corollary 1, this polynomiality result needs the assumption that the signature used by the theory T is aritybounded. If applied to commutativity and clauses specifying extensionality for noncommutative symbols, schematic saturation terminates,^{4} which means that superposition can be used to decides the word problem for ground identities w.r.t. that theory in polynomial time. Schematic saturation does not terminate in the presence of commutative symbols that are cextensional. But it can be extended by one more deletion rule, which then leads to termination.^{5} If schematic saturation with the deletion rule halts on the clauses representing some theory, then the word problem for ground identities w.r.t. that theory is again decidable by superposition, but now it may generate exponentially many clauses. This appears to yield an ExpTime decision procedure for the coNPcomplete word problem considered in Corollary 5. The problem with applying the results in [23, 24] is that there are no criteria known that are both easy to check and that guarantee that schematic saturation (with or without the additional deletion rule) terminates. One must run the procedure and see what happens. Unfortunately, there is currently no publicly accessible implementation of this procedure available that could be used for doing this.
The paper [25] develops a procedure that can combine decision procedures for the word problem for equational theories that share only constants. To be more precise, the following result is shown there.
Theorem 4
(Baader and Tinelli [25]) Let \(E := E_1 \cup E_2\), where, for \(i = 1, 2\), \(E_i\) is a nontrivial equational theory over \(\varSigma _i\). Furthermore, assume that \(\varSigma := \varSigma _1 \cap \varSigma _2\) is a finite set of constant symbols, and that \(E_1\) and \(E_2\) agree on \(\varSigma \), that is, \(c =_{E_1} d\) iff \(c =_{E_2} d\) for all \(c, d \in \varSigma \).
If the word problem is decidable for \(E_1\) and \(E_2\), then it is also decidable for E.
Though it is not explicitly stated in [25], the combination procedure preserves polynomiality for the word problem, i.e., if the word problem is decidable in polynomial time for \(E_1\) and \(E_2\), then it is also decidable in polynomial time for E (see the termination proof for the procedure in [26]).
This result can be applied in the setting of ground identities with interpreted function symbols considered in the present paper. Indeed, assume that F and G are equational theories over the disjoint signatures \(\varSigma _F \) and \(\varSigma _G \) such that the word problem for finite set of ground identities w.r.t. F is decidable, and the same is true for G. Then the word problem for finite sets of ground identities w.r.t. \(F\cup G\) is decidable as well. In fact, while the original ground identities may then contain symbols from both \(\varSigma _F\) and \(\varSigma _G\), flattening as in our construction of the rewrite system R(E) can be used to obtain a conservative extension of the input theory in which only constants are shared between the F and the G part. These theories may not yet agree on the shared constants, but this can be achieved by iteratedly adding identities between the shared constants derived from one theory to the other. This way, we can, for example, combine the decision procedure for the case of ground identities w.r.t. strongly shallow identities described in Sect. 3 with a decision procedure for ground identities w.r.t. AC symbols [14, 15, 27, 28]. This kind of combination also works in the presence of extensional symbols. In fact, while extensionality is axiomatized by conditional identities rather than identities, our results in Sect. 4 show that we can represent its effect using the equational theory axiomatized by the computed canonical rewrite system, which allows us to apply the above combination result.
7 Conclusion
We have shown, using a rewritingbased approach, that adding strongly shallow identities (like commutativity) and extensionality for certain function symbols to a finite set of ground identities leaves the complexity of the word problem in P. In contrast, adding cextensionality for commutative function symbols raises the complexity to coNP. For classical congruence closure, it is wellknown that it can actually be computed in \(O(n\log n)\) [4, 11]. Since this complexity upper bound can also be achieved using a rewritingbased approach [9, 12], we believe that the approach developed here can also be used to obtain an \(O(n\log n)\) upper bound for the word problem for ground identities in the presence of commutativity (and maybe other kinds of strongly shallow identities) and extensionality, as in Sect. 4, but this question was not in the focus of the present paper.
Our result on adding extensionality is actually not restricted to the case of interpreted function symbols whose semantic properties are defined by strongly shallow identities. Instead, we exhibited general properties that the canonical rewrite system computed by semantic congruence closure for the equational theory under consideration must satisfy such that extensionality can be treated in the simple way described in Sect. 4. An important condition is that the system is canonical for all ground terms, and not just a finite set of “small” terms determined by the input. It would be interesting to see whether rewritingbased approaches to congruence closure for other theories produced rewrite systems that satisfy the conditions stated in Sect. 4. For the rewritingbased approaches that deal with associativecommutative symbols proposed in [14, 15], this is not directly the case since they use rewriting modulo AC. However, our approach for adding extensional symbols in Sect. 4 can probably be extended to deal with rewriting modulo AC since the AC symbols are not extensional. We have mentioned in the previous section that extensionality in the presence of nonextensional AC symbols can be dealt with by the combination results in [25]. However, while this approach yields a decision procedure for the word problem, it does not provide us with a canonical ground rewrite system. Another, and more challenging question would be to come up with an appropriate notion of extensionality for AC symbols, and to show that adding it leaves the word problem decidable.
In our treatment of strongly shallow identities, we have used a rather bruteforce way to obtain an ordered rewrite system that is canonical on ground terms, which basically adds all implied identities between strongly shallow hterms (see the definition of the set \(S^*\) in Sect. 2.3). Instead, one could use unfailing completion [22] to add only those consequences that are needed to achieve ground confluence. The results in [16] show that this always terminates on shallow (and thus also strongly shallow) identities. The interesting question would then be under what conditions on the strongly shallow identities this actually terminates in polynomial time without requiring arityboundedness. As mentioned in the previous section, the approach used in [16] can actually deal with more general kinds of shallow identities than the strongly shallow ones considered here. It produces an ordered rewrite system that is canonical for ground terms, but it is not clear whether, with the more general notion of shallow identities employed in [16], this system satisfies the properties required in Sect. 4 for adding extensionality. For strongly shallow identities other than commutativity, it would also be interesting to come up with appropriate variants of cextensionality.
Regarding the application motivation from DL, it should be easy to extend tableaubased algorithms for DLs to deal with individuals named by ground terms and identities between these terms. Basically, the tableau algorithm then works with the canonical forms of such terms, and if it identifies two terms (e.g., when applying a tableaurule dealing with number restrictions), then the rewrite system and the canonical forms need to be updated. More challenging would be a setting where rules are added to the knowledge base that generate new terms if they find a certain constellation in the knowledge base (e.g., a married couple, for which the rule introduces a ground term denoting the couple and assertions that link the couple with its components). In the context of firstorder logic and modal logics, the combination of tableaubased reasoning and congruence closure has respectively been investigated in [29] and [30].
Acknowledgements
The authors would like to thank the reviewers (of the original IJCAR 2020 paper [13] and of this journal version) for their careful reading and their useful comments, which helped to improve the presentation of this article. We also thank Barbara Morawska and Christopher Lynch for helpful discussions. Franz Baader was partially supported by the AI competence center ScaDS.AI Dresden/Leipzig and the Deutsche Forschungsgemeinschaft (DFG), Grant 389792660, within TRR 248. Deepak Kapur was partially supported by the National Science Foundation (NSF) CCF 1908804.
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.