
Open Access 28-10-2021

On the structure of solution-sets to regular word equations

Authors: Joel D. Day, Florin Manea

Published in: Theory of Computing Systems


Abstract

For quadratic word equations, there exists an algorithm based on rewriting rules which generates a directed graph describing all solutions to the equation. For regular word equations – those for which each variable occurs at most once on each side of the equation – we investigate the properties of this graph, such as bounds on its diameter, size, and DAG-width, as well as providing some insights into symmetries in its structure. As a consequence, we obtain a combinatorial proof that the problem of deciding whether a regular word equation has a solution is in NP.

Notes

This article belongs to the Topical Collection: Special Issue on International Colloquium on Automata, Languages and Programming (ICALP 2020)

Guest Editors: Artur Czumaj and Anuj Dawar

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Joel D. Day1 and Florin Manea2
Theory of Computing Systems (2021)

DOI: 10.1007/s00224-021-10058-5

Accepted: 4 August 2021

Published: 28 October 2021


Keywords

Quadratic word equations · Regular word equations · String solving · NP

1 Introduction

A word equation is a tuple (α, β), which we shall usually write as \(\alpha \doteq \beta \), such that α and β are words comprised of letters from a terminal alphabet \({{\varSigma }} = \{{\mathtt {a}},{\mathtt {b}},{\ldots }\}\) and variables from a set X = {x, y, z,…}. Solutions are substitutions of the variables for words in Σ∗ making both sides identical. For example, one solution to the word equation \(x {\mathtt {a}} {\mathtt {b}} y \doteq y {\mathtt {b}} {\mathtt {a}} x\) is given by x → \({\mathtt {a}}{\mathtt {b}}{\mathtt {a}}\) and y → \({\mathtt {a}}\). A system of equations is a set of equations, and a solution to the system is a substitution for the variables which is a solution to all the equations in the system.
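To make the example concrete, the following minimal sketch (ours, not from the paper) checks a candidate solution by applying the substitution as a morphism to both sides; it assumes a hypothetical encoding of variables and terminals as single characters.

    def apply(h, word):
        # extend the substitution h to a morphism fixing terminal symbols
        return "".join(h.get(c, c) for c in word)

    alpha, beta = "xaby", "ybax"              # the equation x a b y = y b a x
    h = {"x": "aba", "y": "a"}                # the candidate solution above
    assert apply(h, alpha) == apply(h, beta)  # both sides become "abaaba"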

One of the most fundamental questions concerning word equations is the satisfiability problem: determining whether or not a word equation has a solution. The first general algorithm for the satisfiability problem was presented by Makanin [22] in 1977. Since then, several further algorithms have been presented. Most notable among these are the algorithm given by Plandowski [25] which demonstrated that the problem is included in the complexity class PSPACE, the algorithm based on Lempel-Ziv encodings by Plandowski and Rytter [26], and the method of recompression by Jeż, which has since been shown to require only non-deterministic linear space [15, 16]. On the other hand, it is easily seen that solving word equations is NP-hard due to the fact that the subcase when one side of the equation consists only of terminals is exactly the pattern matching problem which is NP-complete [3, 12]. It remains a long-standing open problem whether or not the satisfiability problem for word equations is contained in NP.

Recently, there has been elevated interest in solving more general versions of the satisfiability problem, originating from practical applications in e.g. software verification, where several string solving tools capable of solving word equations are being developed [1, 2, 4, 6, 18], and database theory [13, 14], where one asks whether a given (system of) word equation(s) has a solution which satisfies some additional constraints. Prominent examples include requiring that the substitution for a variable x belongs to some regular language \({\mathscr{L}}_{x}\) (regular constraints), or that the lengths of the substitutions of the variables satisfy a set of given linear Diophantine equations. Adding regular constraints makes the problem PSPACE-complete (see [10, 25, 27]), while it is another long-standing open problem whether the satisfiability problem with length constraints is decidable. There are also many other kinds of constraints, however many lead to undecidable variants of the satisfiability problem [7, 19]. The main difficulty in dealing with additional constraints is that the solution-sets to word equations are often infinite sets with complex structures. For example, they are not parametrisable [24], and the set of lengths of solutions is generally not definable in Presburger arithmetic [20]. Thus, a better understanding of the solution-sets and their structures is a key aspect of improving our ability to solve problems relating to word equations both in theory and practice.

Quadratic word equations (QWEs) are equations in which each variable occurs at most twice. For QWEs, a conceptually simple and easily implemented algorithm exists which produces a representation of the set of all solutions as a graph. Despite this, however, the satisfiability problem for quadratic equations remains NP-hard, even for severely restricted subclasses [8, 11], while inclusion in NP, and whether the satisfiability problem with length constraints is decidable, have remained open for a long time, just as for the general case.

The algorithm solving QWEs is based on iteratively rewriting the equation(s) according to some simple rules called Nielsen transformations. If there exists a sequence of transformations from the original equation to the trivial equation \(\varepsilon \doteq \varepsilon \), then the equation has a solution. Otherwise, there is no solution. Hence the satisfiability problem becomes a reachability problem for the underlying rewriting transformation relation, which we denote ⇒NT. It is natural to represent this relation as a directed graph \({\mathscr{G}}^{\Rightarrow _{NT}}\) in which the vertices are word equations and the edges are the rewriting transformations. This has the advantage that the set of all solutions to an equation E corresponds exactly to the set of walks in the graph starting at E and finishing at the trivial equation \(\varepsilon \doteq \varepsilon \). Consequently, the properties of the subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}\) containing all vertices reachable from E (denoted \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\)) are also informative about the set of solutions to the equation. For example, in [24] a connection is made between the non-parametrisability of the solution set of E and the occurrence of combinations of cycles in the graph. Since equations with a parametrisable solution set are much easier to work with when dealing with additional constraints, this also establishes a connection between the structure of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) and the potential (un)decidability of variants of the satisfiability problem. Moreover, new insights into the structure and symmetries of these graphs are necessary for better understanding and optimising the practical performance of the algorithm.

Our contribution

We consider a subclass of QWEs called regular equations (RWEs) introduced in [23]. A word equation is regular if each variable occurs at most once on each side of the equation. Thus, for example, \(x {\mathtt {a}}{\mathtt {b}} y \doteq y {\mathtt {b}} {\mathtt {a}} x\) is regular while \(x {\mathtt {a}} {\mathtt {b}} x \doteq y {\mathtt {b}} {\mathtt {a}} y\) is not. Understanding RWEs is a vital step towards understanding the quadratic case, not only because they constitute a significant and general subclass, but also because many non-regular quadratic equations can exhibit the same behaviour as regular ones (consider, e.g. \(zz \doteq x{\mathtt {a}}{\mathtt {b}} y y{\mathtt {b}}{\mathtt {a}} x\) for which all solutions must satisfy \(z = x{\mathtt {a}}{\mathtt {b}}y = y{\mathtt {b}}{\mathtt {a}}x\)). The satisfiability problem was shown in [8] to be NP-hard for RWEs, and shown to be in NP in [9] for some restricted subclasses including the classes of regular-reversed and regular-ordered equations.

For RWEs E, we investigate the structure of the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), and as a consequence, are able to describe some of their most important properties. We achieve this by first noting that \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) can be divided into strongly connected components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for which all the vertices are equations of the same length (⇒ shall be used to denote the restriction of ⇒NT to length preserving transformations only). The ‘full’ graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is comprised of these individual components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) arranged in a DAG-like structure of linear depth (see Section 3) and therefore many properties and parameters of the ‘full’ graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) are determined by the equivalent properties and parameters of the individual components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). We then focus on the structure of the subgraphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\), and as a result are able to give bounds on certain parameters such as diameter, size, and DAG-width.

Our structural results come in two stages, based on whether the equation belongs to the class of ‘jumbled’ equations introduced in Section 6. In the first stage, we consider equations which are not jumbled, and we show that for all such equations E, there exists a jumbled equation \(\hat {E}\) such that \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is comprised mainly of several well-connected near-copies of \({\mathscr{G}}^{\Rightarrow }_{[\hat {E}]}\). For jumbled equations \(\hat {E}\), we show in Section 7 that every vertex in \({\mathscr{G}}^{\Rightarrow }_{[\hat {E}]}\) is close to a vertex in a certain normal form. We show that the vertices in this normal form are determined to a large extent by a property invariant under ⇒ introduced in Section 5.

With regards to the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\), we give upper bounds which are polynomial in the length of the equation. It follows that the diameter of the full graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is also polynomial, and consequently, that the satisfiability problem for RWEs is NP-complete. This can be generalised to systems of equations satisfying a natural extension of the regularity property (see Section 11). We also give exact upper and lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for a subclass of RWEs called basic RWEs (see Section 4), as well as describing exactly for which equations these bounds are achieved. For RWEs which are not basic, we can infer similar bounds, at the cost of a small (linear in the length of the equation) degree of imprecision. Since in the worst case (e.g. for equations without a solution), running the algorithm will perform a full ‘search’ of the graph, the number of vertices is integral to the running time of the algorithm, and is potentially a better indicator of difficult instances than the complexity class alone. An example of this comes from comparing two subclasses of RWEs called regular-ordered and regular rotated equations. It follows from our results that while both classes have an NP-complete satisfiability problem, if \(E^{\prime }\) is regular-ordered, then \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) will contain at most n vertices, where n is the length of the equation, while if \(E^{\prime }\) is regular rotated, but not regular-ordered, then \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) will contain \(\frac {n!}{2}\) vertices, indicating a vast difference in the number of vertices the algorithm would have to visit.

Motivated by generalisations of the satisfiability problem permitting additional constraints, we also consider the connectivity of the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). To do this, we use DAG-width, a measure for directed graphs which is in several ways analogous to treewidth for undirected graphs. Intuitively, equations for which \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) has low DAG-width are likely to be more amenable when considering additional constraints such as length constraints (see Section 3.3). We give an example class of equations for which the DAG-width is unbounded, as well as a class for which the DAG-width is at most two. The latter includes the class of regular-ordered equations which is the most general subclass of QWEs for which it is known that the satisfiability problem with length constraints is decidable [20], and we expect that both cases will be interesting classes to consider in the context of this problem.

2 Preliminaries

For a set S, we denote the cardinality of S by Card(S). Let Σ be an alphabet. By Σ∗, we denote the set of all words over Σ, and by ε the empty word. By Σ+, we denote the free semigroup Σ∗∖{ε}. A word u is a prefix (resp. suffix) of a word w if there exists v such that w = uv (resp. w = vu). Similarly, u is a factor of w if there exist \(v,v^{\prime }\) such that \(w = v u v^{\prime }\). A prefix/suffix/factor is proper if it is neither the whole word w, nor ε. The length of a word w is denoted |w|, while for \({\mathtt {a}} \in {{\varSigma }}\), \(|w|_{{\mathtt {a}}}\) denotes the number of occurrences of \({\mathtt {a}}\) in w. For a word \(w = {\mathtt {a}}_{1}{\mathtt {a}}_{2}{\ldots } {\mathtt {a}}_{n}\) with \({\mathtt {a}}_{i} \in {{\varSigma }}\) for 1 ≤ i ≤ n, the notation w[i] refers to the letter \({\mathtt {a}}_{i}\) in the ith position. By wR, we denote the reversal \({\mathtt {a}}_{n}{\mathtt {a}}_{n-1}{\ldots } {\mathtt {a}}_{1}\) of the word w. Two words w1, w2 are conjugate (written \(w_{1} \sim w_{2}\)) if there exist u, v such that w1 = uv and w2 = vu.
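As an aside, conjugacy is straightforward to test computationally; the following sketch (a standard trick, not part of the paper) checks whether two words are conjugate.

    def conjugate(w1, w2):
        # w1 ~ w2 iff the words have equal length and w2 is a rotation of w1,
        # i.e. w2 occurs as a factor of w1 w1
        return len(w1) == len(w2) and w2 in w1 + w1

    assert conjugate("abc", "bca")
    assert not conjugate("abc", "acb")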

We shall generally distinguish between two types of alphabet: an infinite set X = {x1, x2,…} of variables, and a set \({{\varSigma }} = \{{\mathtt {a}},{\mathtt {b}},{\ldots }\}\) of terminal symbols. We shall assume that Card(Σ) ≥ 2, and that there exists an order on X leading to a lexicographic order on X∗. For a word α ∈ (X ∪ Σ)∗, we shall denote by var(α) the set {x ∈ X ∣ x is a factor of α}. We shall denote by qv(α) the set {x ∈ var(α) ∣ |α|x = 2}. A word equation is a tuple (α, β) ∈ (X ∪ Σ)∗× (X ∪ Σ)∗, usually written \(\alpha \doteq \beta \). Solutions are morphisms h : (X ∪ Σ)∗→Σ∗ with \(h({\mathtt {a}}) = {\mathtt {a}}\) for all \({\mathtt {a}} \in {{\varSigma }}\) such that h(α) = h(β). The satisfiability problem is the problem of deciding algorithmically whether a given word equation has a solution. For equations E given by \(\alpha \doteq \beta \), we shall often extend notations regarding words in (X ∪ Σ)∗ to E for convenience, so that, e.g. |E| = |αβ|, var(E) = var(αβ) and qv(E) = qv(αβ). An equation \(\alpha \doteq \beta \) is quadratic if |αβ|x ≤ 2 for all x ∈ X. It is regular if |α|x ≤ 1 and |β|x ≤ 1 hold for all x ∈ X. Thus all regular equations are quadratic, but not all quadratic equations are regular. We shall usually abbreviate regular (resp. quadratic) word equation to RWE (resp. QWE). For \(Y \subseteq X\), let \(\pi _{Y} : (X\cup {{\varSigma }})^{*} \to {Y}^{*}\) be the morphism such that πY(x) = x if x ∈ Y and πY(x) = ε otherwise; i.e. πY is a projection from (X ∪ Σ)∗ onto Y∗. A regular equation E given by \(\alpha \doteq \beta \) is regular-ordered if πqv(E)(α) = πqv(E)(β), it is regular rotated if \(\pi _{{qv}(E)}(\alpha ) \sim \pi _{{qv}(E)}(\beta )\) and it is regular reversed if πqv(E)(α) = πqv(E)(β)R.
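The following sketch (again under the hypothetical single-character encoding, with the variable set X passed explicitly) illustrates var, qv and the projection πY on a small example equation, and verifies that it is regular rotated.

    def var(w, X):
        return {c for c in w if c in X}

    def qv(w, X):
        return {c for c in var(w, X) if w.count(c) == 2}

    def proj(w, Y):
        # the projection pi_Y: erase every symbol not in Y
        return "".join(c for c in w if c in Y)

    X = {"x", "y", "z"}
    alpha, beta = "xayz", "yzax"    # E: x a y z = y z a x
    Q = qv(alpha + beta, X)         # {"x", "y", "z"}
    print(proj(alpha, Q), proj(beta, Q))
    # "xyz" and "yzx" are conjugate, so E is regular rotated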

Given a set S and binary relation \({\mathscr{R}} \subseteq S \times S\), we denote the reflexive-transitive closure of \({\mathscr{R}}\) as \({\mathscr{R}}^{*}\). For each sS, we denote by \([s]_{{\mathscr{R}}}\) the set \(\{s^{\prime } \mid (s,s^{\prime }) \in {\mathscr{R}}^{*} \}\). The relation \({\mathscr{R}}\) may be represented as a directed graph, which we denote \({\mathscr{G}}^{{\mathscr{R}}}\), with vertices from S and edges from \({\mathscr{R}}\). Usually, we will be interested in the subgraph of \({\mathscr{G}}^{{\mathscr{R}}}\) containing vertices belonging to \([s]_{{\mathscr{R}}}\) for some sS. Thus, for a subset T of S we shall denote by \({\mathscr{G}}^{{\mathscr{R}}}_{T}\) the subgraph of \({\mathscr{G}}^{{\mathscr{R}}}\) containing vertices from T. Given a (directed) graph \({\mathscr{G}}\), with vertices \(V({\mathscr{G}})\) and edges \(E({\mathscr{G}})\), a root vertex is some \(v\in V({\mathscr{G}})\) such that there does not exist \((u,v) \in E({\mathscr{G}})\). We denote by \({diam}({\mathscr{G}})\) the diameter of the graph \({\mathscr{G}}\), by which we mean the maximum length of a shortest (directed) path between two vertices. For our purposes, we are really interested in the maximum length of shortest paths only when they exist, meaning that we shall not adopt the convention that \({diam}({\mathscr{G}}) = \infty \) when \({\mathscr{G}}\) is a directed graph which is not strongly connected.

For \(W,V^{\prime } \subseteq V({\mathscr{G}})\), we say that W guards \(V^{\prime }\) if for all \((u,v) \in E({\mathscr{G}})\) with \(u \in V^{\prime }\), we have \(v \in V^{\prime } \cup W\). If \({\mathscr{G}}\) is acyclic, we write \(v_{1} \leq _{{\mathscr{G}}} v_{2}\) if there is a directed path from v1 to v2 in \({\mathscr{G}}\) or v1 = v2. Following [5], a DAG-decomposition of \({\mathscr{G}}\) is a pair (D, χ) such that D is a directed acyclic graph (DAG) with vertices V (D), and χ = {Xd ∣ d ∈ V (D)} is a family of subsets of \(V({\mathscr{G}})\) satisfying:
  (D1) \(V({\mathscr{G}}) = \bigcup \limits _{d\in V(D)}X_{d}\),
  (D2) if \(d,d^{\prime },d^{\prime \prime } \in V(D)\) such that \(d \leq _{D} d^{\prime } \leq _{D} d^{\prime \prime }\), then \(X_{d} \cap X_{d^{\prime \prime }} \subseteq X_{d^{\prime }}\),
  (D3) for all edges \((d,d^{\prime })\) of D, \(X_{d} \cap X_{d^{\prime }}\) guards \(X_{\geq d^{\prime }} \backslash X_{d}\), where \(X_{\geq d^{\prime }} = \bigcup \limits _{d^{\prime \prime }\geq _{D} d^{\prime }}X_{d^{\prime \prime }}\), and for all root vertices d, \(X_{d}\) is guarded by ∅.
The width of the DAG-decomposition is \(\max \limits \{{\text {Card}}(X_{d}) \mid d \in V(D)\}\). The DAG-width of \({\mathscr{G}}\) is the minimum width of any possible DAG-decomposition of \({\mathscr{G}}\) and is denoted \({dgw}({\mathscr{G}})\).
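As an illustration of these definitions, the following sketch (ours) implements the guards predicate and the width of a given family of bags, with directed graphs encoded as adjacency dictionaries.

    def guards(W, Vp, G):
        # W guards V': every edge leaving V' ends inside V' or in W
        return all(v in Vp or v in W
                   for u in Vp for v in G.get(u, ()))

    def width(chi):
        # chi maps each vertex d of the DAG D to its bag X_d
        return max(len(bag) for bag in chi.values())

    G = {1: {2}, 2: {3}, 3: {1}}                 # a directed 3-cycle
    print(guards({1}, {2, 3}, G))                # True: the only exiting edge is 3 -> 1
    print(width({"d0": {1, 2}, "d1": {2, 3}}))   # 2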

3 An Algorithm for Solving Regular Word Equations

In this section we present the algorithm for solving QWEs as a rewriting system defined by a relation ⇒NT. The rewriting relation is derived from morphisms called Nielsen transformations, and we shall abuse this terminology slightly and generally also refer to the rewriting transformations themselves as Nielsen transformations. The Nielsen transformations never introduce new variables or terminal symbols, and never increase the length of the equation. They also preserve the properties of being quadratic (resp. regular). Thus, given a quadratic (resp. regular) word equation E, the set \(\{E^{\prime } \mid E\Rightarrow _{NT}^{*} E^{\prime } \}\) of equations reachable via Nielsen transformations is finite. Moreover, given an equation which has a solution h, there is always a Nielsen transformation which produces an equation which has a solution, such that at least one of the new equation or the new solution is strictly shorter than the previous one. It follows that, given an equation which possesses a solution, it is possible to reach the equation \(\varepsilon \doteq \varepsilon \) after finitely many rewriting steps. For a more detailed description of the algorithm, we refer the reader to e.g. Chapter 12 of [21].

3.1 Nielsen Transformations

The Nielsen transformations (morphisms) are defined as follows: for x ∈ X ∪ Σ and y ∈ X, let ψx<y : (X ∪ Σ)∗→ (X ∪ Σ)∗ be the morphism given by ψx<y(y) = xy and ψx<y(z) = z whenever z ≠ y. We define the rewriting transformations via the relations ⇒L, ⇒R, ⇒> as follows. Suppose we have a QWE E of the form \(x\alpha \doteq y\beta \) where x, y ∈ X ∪ Σ and α, β ∈ (X ∪ Σ)∗. Then:
  1. if x ∈ qv(E) and x ≠ y, then \(x\alpha \doteq y\beta \Rightarrow _{L} x\psi _{y<x}(\alpha ) \doteq \psi _{y<x}(\beta )\),
  2. if y ∈ qv(E) and x ≠ y, then \(x\alpha \doteq y\beta \Rightarrow _{R} \psi _{x<y}(\alpha ) \doteq y\psi _{x<y}(\beta )\),
  3. if x ∈ X∖qv(E), then \(x\alpha \doteq y\beta \Rightarrow _> x\alpha \doteq \beta \),
  4. if y ∈ X∖qv(E), then \(x\alpha \doteq y\beta \Rightarrow _> \alpha \doteq y\beta \), and
  5. if x = y, then \(x\alpha \doteq y\beta \Rightarrow _{>} \alpha \doteq \beta \).
Moreover, for a QWE E of the form \(\alpha \doteq \beta \) with α, β ∈ (X ∪ Σ)∗, and for each \(Y \subseteq {var}(E)\), we have the additional transformations \(\alpha \doteq \beta \Rightarrow _> \pi _{X\backslash Y}(\alpha ) \doteq \pi _{X\backslash Y}(\beta )\).

Now, our full rewriting relation, ⇒NT, is given by ⇒L ∪⇒R ∪⇒>. For convenience, we shall define ⇒ to be ⇒L ∪⇒R. We shall call the rewriting transformations from ⇒ length-preserving, since they are exactly those for which the resulting equation has the same length as the original. The following observation follows directly from the definition of ⇒NT.
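The relation ⇒NT is easy to implement directly from the definitions. The sketch below (ours, not the authors' implementation) computes all ⇒NT-successors of a quadratic equation under the hypothetical encoding of an equation as a pair of strings over single-character variables and terminals, with the variable set X passed explicitly.

    from itertools import combinations

    def successors(alpha, beta, X):
        """All equations E' with (alpha = beta) =>_NT E'."""
        w = alpha + beta
        Q = {c for c in X if w.count(c) == 2}        # qv(E)
        out = set()
        if alpha and beta:
            x, y = alpha[0], beta[0]
            if x == y:                                # transformation 5
                out.add((alpha[1:], beta[1:]))
            else:
                if x in Q:                            # transformation 1 (=>_L)
                    p = lambda s: s.replace(x, y + x)
                    out.add((x + p(alpha[1:]), p(beta[1:])))
                if y in Q:                            # transformation 2 (=>_R)
                    p = lambda s: s.replace(y, x + y)
                    out.add((p(alpha[1:]), y + p(beta[1:])))
                if x in X and x not in Q:             # transformation 3
                    out.add((alpha, beta[1:]))
                if y in X and y not in Q:             # transformation 4
                    out.add((alpha[1:], beta))
        # additional transformations: erase each non-empty subset Y of the
        # occurring variables (terminal symbols are kept)
        V = {c for c in w if c in X}
        for r in range(1, len(V) + 1):
            for Y in combinations(sorted(V), r):
                erase = lambda s: "".join(c for c in s if c not in Y)
                out.add((erase(alpha), erase(beta)))
        return out

For instance, successors("xaby", "ybax", {"x", "y"}) contains the two length-preserving successors ("xaby", "bayx") and ("abxy", "ybax"), alongside the strictly shorter equations obtained by erasing variables.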

Remark 3.1

Let \(E, E^{\prime }\) be QWEs such that \(E \Rightarrow _{NT} E^{\prime }\). If E is regular, then \(E^{\prime }\) is regular. Moreover, if \(E \Rightarrow E^{\prime }\), then \({var}(E) = {var}(E^{\prime })\), \({qv}(E) = {qv}(E^{\prime })\), and \(|E| = |E^{\prime }|\). Similarly, if \(E \Rightarrow _> E^{\prime }\), then \({var}(E^{\prime }) \subseteq {var}(E)\), \({qv}(E^{\prime }) \subseteq {qv}(E)\), and \(|E^{\prime }| < |E|\). Hence the set \(\{E^{\prime \prime } \mid E \Rightarrow _{NT}^{*} E^{\prime \prime }\}\) is finite.

If E1, E2 are RWEs such that E1 ⇒L E2, then it follows from the definitions that there exist x, y ∈ X and \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\backslash \{x,y\})^{*}\) such that E1 is given by \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) and E2 is given by \(x \alpha _{1} y \alpha _{2} \doteq \beta _{1} y x \beta _{2}\). Extending this observation to multiple applications of ⇒L, we may conclude that the set \(\{ E_{2} \mid E_{1} \Rightarrow _{L}^{*} E_{2}\}\) is exactly the set \(\{x \alpha _{1} y \alpha _{2} \doteq \beta _{3} x \beta _{2} \mid \beta _{3} \sim y \beta _{1}\}\). A similar statement can be made for \(\Rightarrow _{R}^{*}\). Consequently, the reflexive-transitive closures \(\Rightarrow _{L}^{*}\) and \(\Rightarrow _{R}^{*}\) are symmetric. Hence, we may also observe the following.
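Under the same hypothetical encoding, this characterisation gives a direct way to enumerate the \(\Rightarrow _{L}^{*}\)-class without applying any transformations (a sketch, for RWEs whose left-hand side starts with a variable of qv(E) occurring on the right):

    def L_class(alpha, beta):
        # split the RHS as (y beta1) x beta2, where x is the LHS head,
        # and replace y beta1 by each of its rotations (conjugates)
        x = alpha[0]
        i = beta.index(x)
        rotations = {beta[j:i] + beta[:j] for j in range(i)}
        return {(alpha, r + beta[i:]) for r in rotations}

    print(L_class("xaby", "ybax"))
    # the three conjugates of "yb" placed before "ax":
    # {("xaby", "ybax"), ("xaby", "bayx"), ("xaby", "aybx")}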

Remark 3.2

Let E be a RWE and Z ∈{L, R}. Then \({\text {Card}}(\{ E^{\prime } \mid E \Rightarrow _{Z}^{*} E^{\prime }\}) < |E|\) and \(\Rightarrow _{Z}^{*}\) is an equivalence relation. It follows that \(\Rightarrow ^{*}\) is also an equivalence relation.

The following well-known result forms the basis for the algorithm for solving QWEs.

Theorem 3.3

[21] Let E be a QWE. Then E has a solution if and only if \(E \Rightarrow _{NT}^{*} \varepsilon \doteq \varepsilon \).

3.2 Representing the Set of Solutions as a Graph

Theorem 3.3 provides the basis for treating the satisfiability of QWEs as a reachability problem for the rewriting relation ⇒NT. Since any relation R is naturally represented as a (directed) graph \({\mathscr{G}}^R\), it is also natural to interpret the resulting algorithm as a search in the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\): it suffices to determine whether there exists a path in the graph from the original equation E to the trivial equation \(\varepsilon \doteq \varepsilon \). In fact, the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) can tell us significantly more than simply whether a solution to E exists: every walk from E to \(\varepsilon \doteq \varepsilon \) in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) corresponds to a solution to E and likewise, every solution to E is represented by a walk in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) from E to \(\varepsilon \doteq \varepsilon \). Thus the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) contain a full description of all solutions to E, and as such, their properties and structure are of inherent interest to the study of QWEs and their solutions. An immediate example of this is the diameter, which is strongly related to the complexity of the satisfiability problem, as demonstrated in the following proposition.
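Rendering this search explicitly, the following breadth-first sketch decides satisfiability; it reuses the successors function from the sketch in Section 3.1 and terminates because the set of reachable equations is finite (Remark 3.1).

    from collections import deque

    def satisfiable(alpha, beta, X):
        # search G^{=>_NT}_{[E]} for a path from E to the trivial equation
        start, trivial = (alpha, beta), ("", "")
        seen, queue = {start}, deque([start])
        while queue:
            a, b = queue.popleft()
            if (a, b) == trivial:
                return True
            for E2 in successors(a, b, X):
                if E2 not in seen:
                    seen.add(E2)
                    queue.append(E2)
        return False

    print(satisfiable("xaby", "ybax", {"x", "y"}))   # True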

Proposition 3.4

Let \({\mathscr{C}}\) be a class of QWEs. Suppose there exists a constant \(k \in \mathbb {N}\) such that for each \(E \in {\mathscr{C}}\), we have \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^k)\). Then the satisfiability problem for \({\mathscr{C}}\) is in NP.

Proof

Let \({\mathscr{C}}\) be a class of quadratic word equations and let \(k\in \mathbb {N}\) such that for each \(E \in {\mathscr{C}}\), \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^k)\). By Theorem 3.3, to check whether an equation \(E \in {\mathscr{C}}\) has a solution, we have to check whether there is a path from E to \(\varepsilon \doteq \varepsilon \) in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). If such a path exists, then due to our assumptions about the diameter, one exists of length at most \(O(|E|^{k})\). Moreover, for each edge \(E_{1} \Rightarrow _{NT} E_{2}\) in the path, we have that |E2| ≤ |E1| ≤ |E|, so verifying that \(E_{1} \Rightarrow _{NT} E_{2}\) can be achieved in linear time. Hence, subject to appropriate non-deterministic choices, we may find such a path whenever it exists in \(O(|E|^{k+1})\) time and the satisfiability problem for \({\mathscr{C}}\) is in NP. □

Many properties will be determined mostly (i.e. up to some small imprecision) by the subgraphs obtained by restricting our rewriting relation to length-preserving transformations only (i.e. to ⇒). Since the rewriting relation ⇒NT allows us to preserve or decrease the length, but never increase it again, any walk in the graph will visit a subgraph containing equations of each length only once, and in order of decreasing length. The following proposition confirms how we may infer a global property of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) from its ‘local’ values in the individual subgraphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) in the case of two properties we are particularly interested in: diameter and DAG-width.

Proposition 3.5

Let E be a QWE. Then
  1. \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq (|E|+1)(1+\max \limits \{{diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) \mid E \Rightarrow _{NT}^{*} E^{\prime }\})-1\), and
  2. \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) = \max \limits \{{dgw}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) \mid E \Rightarrow _{NT}^{*} E^{\prime }\}\).

Proof

The second statement is a direct consequence of Theorem 6 in [5]. We shall consider the first statement. Let E be a quadratic word equation. Let
$$m = \max \{ {diam}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \mid E \Rightarrow_{NT}^{*} E^{\prime} \}.$$
Let \(E_{1}, E_{2},\ldots ,E_{n}\) be a shortest path in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) between E1 and En. Then \(E_{i} \Rightarrow _{NT} E_{i+1}\) for 1 ≤ i < n. Consequently, for each i, 1 ≤ i < n, either |Ei| = |Ei+1| or |Ei| > |Ei+1|. Let j1, j2,…,jk be all the indices i for which the latter holds. Then, since the length of an equation cannot be negative, we necessarily have that k ≤ |E|. Moreover, we have that \(E_{1} \Rightarrow ^{*} E_{j_{1}}\), \(E_{j_{k}+1} \Rightarrow ^{*} E_{n}\), and for each i, 1 ≤ i < k, \(E_{j_{i}+1} \Rightarrow ^{*} E_{j_{i+1}}\). Since, for each Ei, \({\mathscr{G}}^{\Rightarrow }_{[E_{i}]}\) is a subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), and by our assumption that the path E1, E2,…,En is minimal in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), it follows that the path \(E_{1}, E_{2},\ldots , E_{j_{1}}\) is minimal in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\), and thus \(j_{1} - 1 \leq m\). By the same argument, the path \(E_{j_{k}+1}, E_{j_{k}+2},\ldots , E_{n}\) is minimal in \({\mathscr{G}}^{\Rightarrow }_{[E_{j_{k}+1}]}\), so we get that \(n - j_{k} - 1 \leq m\), and similarly, for each i, 1 ≤ i < k, we may conclude that \(j_{i+1} - j_{i} - 1 \leq m\). It follows that
$$n = (n - j_{k}) + (j_{k} - j_{k-1}) + {\ldots} + (j_{2} - j_{1}) + j_{1} \leq (k+1)(m+1)$$
meaning the length of the path E1, E2,…En is at most (|E| + 1)(m + 1). Since this holds for all choices of E1, En, we have that \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq (|E|+1)(m+1)-1\) as claimed. □
In what follows, we shall focus predominantly on the structure of the (sub)graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) corresponding to the length-preserving transformations belonging to ⇒ (see Fig. 1). This has the advantage of allowing us to apply further restrictions, in particular a reduction to the case of basic equations introduced in Section 4, without significantly altering the structure of the graph. It is worth pointing out that due to Remark 3.2, the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is strongly connected whenever E is a RWE. The same is generally not true in the case of arbitrary QWEs E, or for the full graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\).
Fig. 1

The graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is the equation \(x{\mathtt {a}} y {\mathtt {a}} z {\mathtt {b}} w \doteq w {\mathtt {b}} y x {\mathtt {a}} z {\mathtt {a}}\) with variables x, y, z, w and terminal symbols \({\mathtt {a}}, {\mathtt {b}}\). Generated in Python using the PyDot graph drawing package

3.3 Solving Equations Modulo Constraints

Often, it is important to determine whether a given equation has a solution which satisfies some additional constraints. For some types of constraints, it is possible to adapt the algorithm by finding, for each Nielsen transformation, an appropriate corresponding transformation of the constraints. For example, if x, y, z ∈ X and we have the length constraint |x| = |z|, when we apply the Nielsen transformation associated with ψy<x to our equation, we replace each occurrence of x with yx. Thus, the updated constraint would be |x| + |y| = |z|. Unfortunately, as is the case for length constraints, the resulting set of possible equation/constraint combinations can become infinite, meaning that the modified version of the algorithm is not guaranteed to terminate.
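Concretely, a linear length constraint can be stored as a map from variables to integer coefficients, and the update accompanying ψy<x is then a single coefficient transfer; a sketch (ours) of this bookkeeping:

    def update_lengths(coeffs, x, y):
        # under psi_{y<x} every occurrence of x becomes yx, so the old |x|
        # equals |y| + |x|; transfer x's coefficient onto y accordingly
        c = dict(coeffs)
        c[y] = c.get(y, 0) + c.get(x, 0)
        return c

    # |x| = |z|, encoded as |x| - |z| = 0
    print(update_lengths({"x": 1, "z": -1}, "x", "y"))
    # {'x': 1, 'z': -1, 'y': 1}, i.e. |x| + |y| = |z|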

A possible solution to this is to find finite descriptions of the potentially infinite sets of constraints which may occur alongside each equation. The task of finding such descriptions, and consequently the potential decidability of the corresponding extended satisfiability problems, is dependent on the structural properties of the graph, as can be seen e.g. in [20, 24].

One case in which computing finite descriptions is straightforward is when the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is acyclic (i.e. a DAG). Unfortunately, inspection of the definition of ⇒NT reveals that this is not true for the majority of RWEs (or QWEs). Hence, when considering the existence of algorithms for solving word equations with length constraints (or constraints of other types), it is natural to specifically consider classes of equations E where the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) have particularly DAG-like (or un-DAG-like) structures, which we can measure using parameters such as DAG-width.

3.4 Properties of the Graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) for Regular Equations E

In order to understand the full graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), we mostly need to understand the (strongly connected) components corresponding to the length-preserving transformations, as we can easily see that these components will be connected in a DAG-like structure whose depth is at most |E|. Hence, our main goal is to describe the structure of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for RWEs E. This is done in several steps, with each one accounting for a particular structural feature or aspect as follows.
  (1) In the first step (Section 4), we describe the effect of terminal symbols, single occurrence variables, and ‘decomposability’ on the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), essentially reducing the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for a ‘basic’ equation \(E^{\prime }\) which does not contain any of these features.
  (2) Building on an important technical tool developed in Section 5, the second step (Section 6) introduces the class of jumbled equations. For equations \(E^{\prime }\) which are not jumbled, but which have nevertheless been simplified as per the first step, there exists a specific repetitive structure allowing us to express \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) as a combination of (near) copies of some smaller graph \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) where \(E^{\prime \prime }\) is a jumbled equation obtained by deleting the appropriate variables from \(E^{\prime }\).
  (3) In the third step (Section 7), we show that for jumbled equations \(E^{\prime \prime }\), all vertices in \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) are ‘close’ to a vertex from a small subset conforming to a very particular structure called Lex Normal Form.
  (4) Finally, in Sections 8, 9 and 10, we exploit our structural results to investigate the diameter, number of vertices and connectivity (DAG-width) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) respectively. In Section 11 we note a generalisation of our results to systems of equations.

4 Basic Equations: A Convenient Abstraction

The current section is devoted to reducing the study of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to the case of basic equations. This has several advantages, including a significant reduction in the size of the graphs which is useful for working with examples, allowing for the simpler formulation of precise results, e.g. regarding the size of the graphs in Section 9, and avoiding unnecessary repetition in the formal statements and their proofs.

Definition 4.1 (Basic Equations)

Let E be a QWE given by \(\alpha \doteq \beta \). Then E is decomposable if there exist proper prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of α and β such that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\). Otherwise, E is indecomposable. E is basic if it is indecomposable and \(\alpha , \beta \in {qv}(E)^{*}\).

For a basic RWE, both sides of the equation are permutations of the same set of variables, for example \(x_{1} x_{2} x_3 \doteq x_3 x_{1} x_{2}\) and \(x y w z \doteq w z x y\) are both basic RWEs. On the other hand, \(x y z w \doteq y x z w\), \({\mathtt {a}} x {\mathtt {b}} y \doteq y {\mathtt {b}}{\mathtt {a}} x\) and \(x y \doteq y z\) are not – the first being decomposable and the latter two containing terminal symbols and variables occurring on one side only.
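Both conditions of Definition 4.1 are easy to test under the hypothetical single-character encoding used in the earlier sketches; for example:

    def qv(w, X):
        return {c for c in X if w.count(c) == 2}

    def decomposable(alpha, beta, X):
        Q = qv(alpha + beta, X)
        touched = lambda w: {c for c in w if c in Q}
        # proper prefixes are non-empty and not the whole side
        return any(touched(alpha[:i]) == touched(beta[:j])
                   for i in range(1, len(alpha))
                   for j in range(1, len(beta)))

    def basic(alpha, beta, X):
        # for RWEs: indecomposable, and both sides consist of qv(E) only
        Q = qv(alpha + beta, X)
        return (not decomposable(alpha, beta, X)
                and set(alpha) == set(beta) == Q)

    X = set("wxyz")
    print(basic("xywz", "wzxy", X))   # True
    print(basic("xyzw", "yxzw", X))   # False: the prefixes xy and yx decompose it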

We firstly consider decomposable equations E, showing that in this case the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for some shorter equation \(E^{\prime }\). The main step in this respect is the following observation.

Lemma 4.2

Let E be a RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1, β1 ≠ ε and var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Let \(E^{\prime }\) be a RWE. Then \(E \Rightarrow E^{\prime }\) if and only if there exist \(\alpha _3, \beta _3 \in (X\cup {{\varSigma }})^{*}\) such that \(E^{\prime }\) is given by \(\alpha _3 \alpha _{2} \doteq \beta _3\beta _{2}\) and \(\alpha _{1} \doteq \beta _{1} \Rightarrow \alpha _3 \doteq \beta _3\).

Proof

Suppose E is a RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\cup {{\varSigma }})^{*}\) with α1, β1 ≠ ε such that var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Let \(E^{\prime }\) be a RWE. Suppose firstly that there exist \(\alpha _3, \beta _3 \in (X\cup {{\varSigma }})^{*}\) such that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _L \alpha _3 \doteq \beta _3\) (the case that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _R \alpha _3 \doteq \beta _3\) is symmetric). Then it follows from the definition of ⇒L that α1 has a prefix \(y \in {qv}(E)\). Hence, there exist \(x \in X \cup {{\varSigma }}\) and \(\gamma , \delta _{1}, \delta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1 = yγ, β1 = xδ1yδ2, α3 = α1 and β3 = δ1xyδ2. By the definition of ⇒L, it follows that \(\alpha _{1} \alpha _{2} \doteq \beta _{1} \beta _{2} \Rightarrow _L \alpha _3 \alpha _{2} \doteq \beta _3\beta _{2}\) and thus \(E \Rightarrow E^{\prime }\).

Now suppose instead that \(E \Rightarrow _L E^{\prime }\) (again, the case that \(E \Rightarrow _R E^{\prime }\) is symmetric). Then by definition of ⇒L, there exists a variable \(y \in {qv}(E)\) in the leftmost position of α1 which also occurs in β1β2. Moreover, it follows from the definition of ⇒L and the fact that \(E \Rightarrow _L E^{\prime }\) that \(y \not = \beta _{1}[1]\). Furthermore, since var(α1) ∩ qv(E) = var(β1) ∩ qv(E), y must in fact occur somewhere in β1, so there exist \(x \in X \cup {{\varSigma }}\) and \(\gamma , \delta _{1}, \delta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1 = yγ and β1 = xδ1yδ2, and such that \(E^{\prime }\) is given by \(\alpha _3\alpha _{2} \doteq \beta _3 \beta _{2}\) where α3 = α1 and β3 = δ1xyδ2. It follows from the definition of ⇒L that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _L \alpha _3 \doteq \beta _3\) and thus the statement holds. □

It follows immediately from Lemma 4.2 that the relation ⇒ preserves the properties of being (in)decomposable and basic.

Corollary 4.3

Let E1, E2 be RWEs such that E1E2. Then E1 is indecomposable if and only if E2 is indecomposable. Consequently E1 is basic if and only if E2 is basic.

Moreover, a straightforward induction yields the following description of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is decomposable.

Corollary 4.4

Let E be a decomposable RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where α1, α2, β1, β2 ∈ (XΣ) such that α1, β1ε and var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Then \({\mathscr{G}}_{[E]}^{\Rightarrow }\) is isomorphic to \({\mathscr{G}}_{[\alpha _{1} \doteq \beta _{1}]}^{\Rightarrow }\) and can be obtained from \({\mathscr{G}}_{[\alpha _{1} \doteq \beta _{1}]}^{\Rightarrow }\) by replacing each vertex \(\alpha _3 \doteq \beta _3 \in [\alpha _{1} \doteq \beta _{1}]_{\Rightarrow }\) with \(\alpha _3 \alpha _{2} \doteq \beta _3 \beta _{2}\).

Corollary 4.4 accounts for decomposable equations. It remains to consider the case of equations containing terminal symbols and variables occurring on only one side (and therefore once overall). For this case, we need the following notion for relating the structure of two graphs.

Definition 4.5 (Isolated path compression)

Let G1, G2 be (directed) graphs. We say that G1 is an isolated path compression of order n of G2 if G2 may be obtained from G1 by replacing each edge \((e,e^{\prime })\) in G1 by a path \((e,e_{1}), (e_{1}, e_{2}), {\ldots } (e_{k-1},e_k), (e_k, e^{\prime })\) such that k ≤ n and e1, e2, e3,…,ek are new vertices unique to the edge \((e,e^{\prime })\).

Informally, an isolated path compression of a graph is obtained simply by replacing ‘isolated paths’ (paths whose internal vertices are not adjacent to any vertices outside the path) of a bounded length with single edges. Therefore, the overall structure is generally preserved, and most properties will be preserved, or change proportionally to the order n (Fig. 2).
Fig. 2

The graph G1 is an isolated path compression of order two of the graph G2
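The following sketch (ours; adjacency-list encoding, assuming every contracted path eventually reaches a surviving vertex) performs the contraction in the other direction, computing an isolated path compression of a given graph.

    def compress(G):
        # G maps each vertex to the list of its out-neighbours
        indeg = {v: 0 for v in G}
        for u in G:
            for v in G[u]:
                indeg[v] += 1
        # internal vertices of isolated paths have in- and out-degree one
        internal = {v for v in G if indeg[v] == 1 and len(G[v]) == 1}
        H = {u: [] for u in G if u not in internal}
        for u in H:
            for v in G[u]:
                while v in internal:    # slide along the isolated path
                    v = G[v][0]
                H[u].append(v)
        return H

    G = {1: [2], 2: [3], 3: [4], 4: [1, 5], 5: [1]}
    print(compress(G))   # {1: [4], 4: [1, 1]}: the paths through 2, 3 and 5 collapse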

Remark 4.6

Consider graphs G1, G2 such that G1 is an isolated path compression of order n of G2. If dgw(G1) = 1, then dgw(G2) ∈{1,2}.

If dgw(G1) ≥ 2, then dgw(G1) = dgw(G2). Moreover, diam(G2) ≤ (n + 1)diam(G1), and the number of vertices (resp. edges) in G2 is at most the number of vertices (resp. edges) in G1 plus n times the number of edges of G1.

Using isolated path compressions, it is possible to describe the structure of the graph \({\mathscr{G}}_{[E]}^{\Rightarrow }\) for any RWE E in terms of the graph \({\mathscr{G}}_{[E^{\prime }]}^{\Rightarrow }\) for the RWE \(E^{\prime }\) obtained from E by erasing all terminal symbols and single-occurrence variables from E (i.e. projecting onto qv(E)).

Lemma 4.7

Let E be an indecomposable RWE given by \(\alpha \doteq \beta \). Then the graph \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\).

Proof

Let E be an indecomposable RWE given by \(\alpha \doteq \beta \). Note that by Corollary 4.3, it follows that \(E^{\prime }\) is indecomposable for every \(E^{\prime } \in [E]_{\Rightarrow }\). We begin by considering the simple cases arising when Card(qv(E)) < 2. If Card(qv(E)) = 0, then \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a single vertex with no edges. Moreover, \(\pi _{{qv}(E)} (\alpha ) \doteq \pi _{{qv}(E)} (\beta )\) is the trivial equation \(\varepsilon \doteq \varepsilon \), so \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is also a single vertex with no edges. The two graphs are clearly isomorphic, so the lemma holds trivially.

Now suppose that Card(qv(E)) = 1. Then E has the form \(w_{1} x w_{2} \doteq w_3 x w_4\) where qv(E) = {x} and \(w_{1}, w_{2}, w_3, w_4 \in ((X\cup {{\varSigma }})\backslash \{x\})^{*}\). It necessarily follows that the equation \(\pi _{{qv}(E)} (\alpha ) \doteq \pi _{{qv}(E)} (\beta )\) has the form \(x \doteq x\), meaning that \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is again a single vertex with no edges. If w1, w3 ≠ ε, then E is decomposable, a contradiction. Otherwise, \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a cycle of length \(\max \limits \{|w_{1}|,|w_{3}|\} < |E|\), so again the statement of the lemma follows directly. Thus, for the remainder of the proof, we shall suppose that Card(qv(E)) ≥ 2.

Before proceeding, we remark that for any equation \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\), if \(\alpha ^{\prime }[1], \beta ^{\prime }[1] \notin {qv}(E^{\prime })\), then either \(E^{\prime }\) is decomposable, or \(|\alpha ^{\prime }|, |\beta ^{\prime }| \in \{0,1\}\). Both are contradictions to previous assumptions (the former to the fact that E is indecomposable, and hence \(E^{\prime }\) is indecomposable for all \(E^{\prime } \in [E]_{\Rightarrow }\), and the latter to the assumption that Card(qv(E)) ≥ 2 which is only possible if \(|\alpha ^{\prime }|,|\beta ^{\prime }| \geq 2\)). Consequently, we may partition \([E]_{\Rightarrow }\) into two sets S1 and S2 where S1 contains all equations \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) such that \(\alpha ^{\prime }[1]\) and \(\beta ^{\prime }[1] \) are both in \( {qv}(E^{\prime })\), and S2 contains all equations \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) such that exactly one of \(\alpha ^{\prime }[1],\beta ^{\prime }[1]\) is in \({qv}(E^{\prime })\). Intuitively, S1 will be the set of ‘surviving’ vertices in the isolated path compression while S2 consists of those vertices which belong only to the ‘isolated paths’ which are contracted/compressed. Supporting this, we show the following two claims regarding elements of S2.

Claim 4.7.1

Suppose that \(E^{\prime }\in S_{2}\). Then the in-degree and out-degree of \(E^{\prime }\) in \({\mathscr{G}}_{[E]}^{\Rightarrow }\) are both exactly one.

Proof

W.l.o.g. suppose that \(E^{\prime }\) is given by \(x \alpha ^{\prime }_{1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\) with \(y \in {qv}(E^{\prime })\) and \(x \notin {qv}(E^{\prime })\). It follows from the definitions of ⇒L and ⇒R that there is no \(E^{\prime \prime }\) such that \(E^{\prime } \Rightarrow _L E^{\prime \prime }\), and exactly one \(E^{\prime \prime }\) such that \(E^{\prime } \Rightarrow _R E^{\prime \prime }\). Thus the out-degree is one as claimed. Now consider the in-degree and let \(E^{\prime \prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime \prime } \Rightarrow E^{\prime }\). Note that by the definition of ⇒L, we cannot have that \(E^{\prime \prime } \Rightarrow _L E^{\prime }\), so we must have that \(E^{\prime \prime } \Rightarrow _R E^{\prime }\). It follows from the fact that the Nielsen transformation morphisms \(\psi _{x<y}\) are injective that there is exactly one such \(E^{\prime \prime }\), and thus we also have that the in-degree of \(E^{\prime }\) is one as claimed. □

Claim 4.7.2

Let \(E^{\prime } \in S_{2}\). Then there exist k ≤ |E|− 2 and \(E_{0}, E_{1},\ldots ,E_{k+1} \in [E]_{\Rightarrow }\) and Z ∈{L, R} such that all the following statements hold:
  1. \(E_{0}, E_{k+1} \in S_{1}\),
  2. \(E_{i} \in S_{2}\) for 1 ≤ i ≤ k,
  3. \(E_{i} \Rightarrow _{Z} E_{i+1}\) for 0 ≤ i ≤ k,
  4. there exists i, 1 ≤ i ≤ k, such that \(E^{\prime } = E_{i}\).

Proof

W.l.o.g. suppose that the RHS of \(E^{\prime }\) has a prefix contained in \({qv}(E^{\prime })\). Then since \({\text {Card}}({qv}(E^{\prime })) \geq 2\) and since \(E^{\prime }\) is regular, the LHS also contains at least one variable in \({qv}(E^{\prime })\) and we may either write \(E^{\prime }\) as
  (1) \(a_ia_{i+1} {\ldots } a_k x \alpha ^{\prime }_{1} x^{\prime } a_{1} a_{2}{\ldots } a_{i-1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\), or
  (2) \(a_ia_{i+1} {\ldots } a_k x a_{1} a_{2}{\ldots } a_{i-1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\)
where k ≤ |E|− 2, \(a_j \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) for 1 ≤ j ≤ k, \(x,x^{\prime }, y \in {qv}(E)\) with \(x,x^{\prime }\not =y\), and \(\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta ^{\prime } \in (X\cup {{\varSigma }})^{*}\). Consider the first case. Let E0 be the equation given by
$$x^{\prime} a_{1} a_{2} {\ldots} a_{k} x \alpha_{1}^{\prime} y \alpha_{2}^{\prime} \doteq y \beta^{\prime},$$
let Ek+ 1 be the equation given by
$$x \alpha_{1}^{\prime} x^{\prime} a_{1} a_{2} {\ldots} a_{k} y \alpha_{2}^{\prime} \doteq y \beta^{\prime},$$
and for 1 ≤ jk, let Ej be the equation given by
$$a_{j} a_{j+1} {\ldots} a_{k} x \alpha_{1}^{\prime} x^{\prime} a_{1} a_{2} {\ldots} a_{j-1} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}.$$
Then clearly, \(E_i = E^{\prime }\), \(E_{0}, E_{k+1} \in S_{1}\), \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, and \(E_{j} \Rightarrow _{R} E_{j+1}\) for 0 ≤ j ≤ k as claimed.
Now consider the second case. Let \(E_{0} = E_{k+1}\) be the equation given by
$$x a_{1} a_{2} {\ldots} a_{k} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}$$
and for 1 ≤ jk, let Ej be the equation given by
$$a_{j} a_{j+1} {\ldots} a_{k} x a_{1} a_{2} {\ldots} a_{j-1} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}.$$
Then clearly, \(E_i = E^{\prime }\), \(E_{0}, E_{k+1} \in S_{1}\), \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, and \(E_{j} \Rightarrow _{R} E_{j+1}\) for 0 ≤ j ≤ k as claimed. □

Claims 4.7.1 and 4.7.2 are sufficient to show that the equations/vertices in S1 are exactly those which survive in an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\). To state this more formally, we define a relation ◇ on the equations in S1 such that \(E^{\prime } \diamond E^{\prime \prime }\) if \(E^{\prime },E^{\prime \prime } \in S_{1}\) and either \(E^{\prime } \Rightarrow E^{\prime \prime }\), or there exist \(E_{1}, E_{2},\ldots ,E_{k} \in S_{2}\) and Z ∈{L, R} such that \(E^{\prime } \Rightarrow _Z E_{1}\Rightarrow _Z E_{2} \Rightarrow _Z {\ldots } \Rightarrow _Z E_k \Rightarrow _Z E^{\prime \prime }\). Then we get the following.

Claim 4.7.3

The graph \({\mathscr{G}}_{S_{1}}^{\diamond }\) is an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\).

Proof

Directly from Claims 4.7.1 and 4.7.2. □

It remains to show that \({\mathscr{G}}_{S_{1}}^{\diamond }\) is isomorphic to \({\mathscr{G}}_{[\hat {E}]}^{\Rightarrow }\) where \(\hat {E}\) is given by \(\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )\). In other words, we must show that there is an isomorphism \(f : S_{1} \to [\hat {E}]_{\Rightarrow }\) such that for any \(E_{1},E_{2} \in S_{1}\), \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\). Before we can define f, we must firstly show that there exists \(\tilde {E} \in S_{1}\) given by \(\tilde {\alpha } \doteq \tilde {\beta }\) such that \(\pi _{{qv}(\tilde {E})}(\tilde {\alpha }) = \pi _{{qv}(E)}(\alpha )\) and \(\pi _{{qv}(\tilde {E})}(\tilde {\beta }) = \pi _{{qv}(E)}(\beta )\). If \(E \in S_{1}\) then we may simply take \(\tilde {E} = E\). Otherwise, \(E \in S_{2}\), meaning exactly one of α[1],β[1] is in qv(E). W.l.o.g. suppose that α[1]∉qv(E). Then we may write α = γxα1yα2 and β = yβ1 where \(\gamma \in ((X\backslash {qv}(E)) \cup {{\varSigma }})^{+}\), \(x, y \in {qv}(E)\), and \(\alpha _{1}, \alpha _{2}, \beta _{1} \in (X\cup {{\varSigma }})^{*}\). Furthermore, we have \(E \Rightarrow _R^{*} \tilde {E}\) where \(\tilde {E} \in S_{1}\) is given by \(x \alpha _{1} \gamma y \alpha _{2} \doteq y \beta _{1}\), in which case we have that \(\pi _{{qv}(E)}(\alpha ) = \pi _{{qv}(\tilde {E})}(x \alpha _{1} \gamma y \alpha _{2})\) and \(\pi _{{qv}(E)}(\beta ) = \pi _{{qv}(\tilde {E})}(y\beta _{1})\) (note that we have that \({qv}(E) = {qv}(\tilde {E})\) since \(\tilde {E} \in [E]_{\Rightarrow }\)).

Since \(\tilde {E} \in S_{1}\), we may write \(\tilde {E}\) as
$$y_{1} \gamma_{1} y_{2} \gamma_{2} {\ldots} y_{n} \gamma_{n} \doteq y^{\prime}_{1} \delta_{1} y^{\prime}_{2} \delta_{2} {\ldots} y^{\prime}_{n} \delta_{n}$$
where \(y_i,y^{\prime }_i \in {qv}(\tilde {E})\) and \(\gamma _i, \delta _i \in ((X\backslash {qv}(\tilde {E})) \cup {{\varSigma }})^{*}\) for 1 ≤ i ≤ n. Consequently, by our assumptions about \(\tilde {E}\), it follows that \(\hat {E}\) may be written as \(y_{1}y_{2}{\ldots } y_n \doteq y^{\prime }_{1} y^{\prime }_{2}{\ldots } y^{\prime }_n\). With this information, we are now ready to define our isomorphism \(f : S_{1} \to [\hat {E}]_{\Rightarrow }\) via two morphisms σLHS and σRHS. In particular, let \(\sigma _{LHS} : {qv}(\tilde {E})^{*} \to (X\cup {{\varSigma }})^{*}\) be the morphism such that σLHS(yi) = yiγi for 1 ≤ i ≤ n and \(\sigma _{RHS} : {qv}(\tilde {E})^{*} \to (X\cup {{\varSigma }})^{*}\) be the morphism such that \(\sigma _{RHS}(y^{\prime }_j) = y^{\prime }_j \delta _j\) for 1 ≤ j ≤ n. Then we define f such that \(f(\alpha ^{\prime } \doteq \beta ^{\prime })\) is \(\sigma _{LHS}(\alpha ^{\prime }) \doteq \sigma _{RHS}(\beta ^{\prime })\) for all \(\alpha ^{\prime } \doteq \beta ^{\prime } \in S_{1}\). In order to show that f is indeed an isomorphism with the desired property that \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\), we need the following claim.

Claim 4.7.4

Let \(\hat {\alpha _{1}},\hat {\alpha _{2}},\hat {\beta _{1}},\hat {\beta _{2}} \in {qv}(E)^{*}\) such that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \in [\hat {E}]_{\Rightarrow }\), and \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \in [E]_{\Rightarrow }\). Then \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\) if and only if \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \diamond \sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\).

Proof

Suppose firstly that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\), and w.l.o.g. suppose that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow _L \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\). Then there exist \(z_{1}, z_{2},\ldots ,z_{n} \in {qv}(E)\), \(\mu \in {qv}(E)^{*}\) such that \(\hat {\alpha _{1}} = z_i \mu \) for some i, 1 ≤ i ≤ n, \(\hat {\beta _{1}} = z_{1}z_{2} {\ldots } z_n\), \(\hat {\alpha _{2}} = \hat {\alpha _{1}}\) and \(\hat {\beta _{2}} = z_{2}{\ldots } z_{i-1} z_{1} z_i {\ldots } z_n\). Let \(a_{1}, a_{2},\ldots ,a_{k} \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) such that \(\sigma _{RHS}(z_{1}) = z_{1}a_{1}a_{2}{\ldots } a_{k}\). Let E0 be given by \(\sigma _{LHS}(z_i \mu ) \doteq \sigma _{RHS}(z_{1}z_{2}{\ldots } z_n)\), and for 1 ≤ j ≤ k, let Ej be given by \(\sigma _{LHS}(z_i \mu ) \doteq a_j a_{j+1} {\ldots } a_k \sigma _{RHS}(z_{2} {\ldots } z_{i-1}) z_{1} a_{1} a_{2} {\ldots } a_{j-1} \sigma _{RHS}(z_i {\ldots } z_n)\), and let Ek+1 be given by \(\sigma _{LHS}(z_i \mu ) \doteq \sigma _{RHS}(z_{2} {\ldots } z_{i-1}) z_{1} a_{1} a_{2} {\ldots } a_k \sigma _{RHS}(z_i {\ldots } z_n)\). Then we have \(E_{0} \Rightarrow _{L} E_{1} \Rightarrow _{L} {\ldots } \Rightarrow _{L} E_{k+1}\). Moreover, we have that \(E_{0} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}})\), \(E_{k+1} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\), and \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, so \(E_{0} \diamond E_{k+1}\) as required.

Now suppose that \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \diamond \sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\). Then by the definition of ◇, there exist \(E_{0}, E_{1},\ldots ,E_{k+1} \in [E]_{\Rightarrow }\) such that \(E_{0} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}})\), \(E_{k+1} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\), \(E_{0} \Rightarrow _{Z} E_{1} \Rightarrow _{Z} {\ldots } \Rightarrow _{Z} E_{k+1}\) for some Z ∈{L, R}, and \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k.

W.l.o.g. suppose that Z = L. Then there exist \(z_{1}, z_{2},\ldots ,z_{n} \in {qv}(E)\), \(\mu \in {qv}(E)^{*}\), and \(a_{1}, a_{2},\ldots ,a_{\ell } \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) such that \(\hat {\alpha _{1}} = z_i \mu \) for some i, 1 ≤ i ≤ n, \(\hat {\beta _{1}} = z_{1} z_{2}{\ldots } z_n\), and \(\sigma _{RHS}(z_{1}) = z_{1}a_{1}a_{2}{\ldots } a_{\ell }\). Hence E0 can be written as
$$\sigma_{LHS}(z_{i} \mu) \doteq z_{1} a_{1} a_{2} {\ldots} a_{\ell} \sigma_{RHS}(z_{2} z_{3} {\ldots} z_{n}).$$
Moreover, we have that \(E_0 \Rightarrow _L E_{1}^{\prime } \Rightarrow _L E_{2}^{\prime } \Rightarrow _L {\ldots } \Rightarrow _L E_{\ell }^{\prime } \Rightarrow _L E_{\ell +1}^{\prime }\) where \(E_j^{\prime }\) is given by
$$\sigma_{LHS}(z_{i} \mu) \doteq a_{j} a_{j+1} {\ldots} a_{\ell} \sigma_{RHS}(z_{2} {\ldots} z_{i-1}) z_{1} a_{1} {\ldots} a_{j-1} \sigma_{RHS}(z_{i}z_{i+1} {\ldots} z_{n})$$
for \(1 \leq j \leq \ell \), and \(E_{\ell + 1}^{\prime }\) is given by
$$\sigma_{LHS}(z_{i} \mu) \doteq \sigma_{RHS}(z_{2} {\ldots} z_{i-1}) z_{1} a_{1} {\ldots} a_{\ell} \sigma_{RHS}(z_{i}z_{i+1} {\ldots} z_{n}).$$
Note that \(E_{\ell +1}^{\prime }\) may also be written
$$\sigma_{LHS}(z_{i}\mu) \doteq \sigma_{RHS}(z_{2}z_{3}{\ldots} z_{i-1} z_{1} z_{i} z_{i+1} {\ldots} z_{n}).$$
Now, since ⇒L is deterministic, and since \(E_{\ell +1}^{\prime }, E_{k+1} \in S_{1}\) while \(E_{j_{1}}^{\prime }, E_{j_{2}} \in S_{2}\) for each j1, 1 ≤ j1 ≤ ℓ, and j2, 1 ≤ j2 ≤ k, we must necessarily have that k = ℓ. Since σLHS and σRHS are injective, we must have \(\hat {\alpha _{2}} = \hat {\alpha _{1}}\) and \(\hat {\beta _{2}} = z_{2}z_{3}{\ldots } z_{i-1} z_{1} z_{i} z_{i+1} {\ldots } z_{n}\). It follows from the definitions that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow _L \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\). □

It follows from Claim 4.7.4 by a simple induction with \(\tilde {E}\) as the base case that \(S_{1} = \{ \sigma _{LHS}(\hat {\alpha ^{\prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime }}) \mid \hat {\alpha ^{\prime }} \doteq \hat {\beta ^{\prime }} \in [\hat {E}]_{\Rightarrow } \}\), or equivalently that \(f(S_{1}) = [\hat {E}]_{\Rightarrow }\). The claim also states explicitly that \(\sigma _{LHS}(\hat {\alpha ^{\prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime }}) \diamond \sigma _{LHS}(\hat {\alpha ^{\prime \prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime \prime }})\) if and only if \(\hat {\alpha ^{\prime }} \doteq \hat {\beta ^{\prime }}\Rightarrow \hat {\alpha ^{\prime \prime }} \doteq \hat {\beta ^{\prime \prime }}\) and thus f is an isomorphism such that \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\) for all \(E_{1}, E_{2} \in S_{1}\). We may therefore conclude that \({\mathscr{G}}_{S_{1}}^{\diamond }\) is indeed isomorphic to \({\mathscr{G}}_{[\hat {E}]}^{\Rightarrow }\) as required. □

Combining Corollary 4.4 and Lemma 4.7, it is now possible to formulate the main result of this section, describing the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for arbitrary RWEs E in terms of graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for basic RWEs \(E^{\prime }\). An example of the theorem is given in Fig. 3.
Fig. 3

An example of Theorem 4.8. On the left is the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is given by \(x {\mathtt {a}} y z {\mathtt {a}} {\mathtt {b}} w \doteq y {\mathtt {a}} z {\mathtt {a}} x w\) with variables x, y, z, w and terminal symbols \({\mathtt {a}}, {\mathtt {b}}\). On the right is \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for the corresponding basic equation \(E^{\prime }\), which in this case is given by \(xyz \doteq yzx\). The graph on the right is isomorphic to an isolated path compression of order 2 of the graph on the left. Vertices internal to the isolated paths (i.e. those which are removed by the compression) are shown in grey

Theorem 4.8

Let E be a RWE given by \(\alpha \doteq \beta \). Let \(\alpha ^{\prime },\beta ^{\prime }\) be the shortest non-empty prefixes of α, β respectively such that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\). Let \(E^{\prime }\) be the equation given by \(\pi _{{qv}(E)}(\alpha ^{\prime }) \doteq \pi _{{qv}(E)}(\beta ^{\prime })\). Then \(E^{\prime }\) is basic, and \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Proof

Let \(S = {qv}(\alpha ^{\prime } \doteq \beta ^{\prime })\). Firstly, we shall show that \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is indecomposable. Suppose for contradiction that \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is decomposable. Then there exist proper prefixes \(\alpha ^{\prime \prime }, \beta ^{\prime \prime }\) of \(\alpha ^{\prime }\) and \(\beta ^{\prime }\) respectively such that \({var}(\alpha ^{\prime \prime }) \cap S = {var}(\beta ^{\prime \prime }) \cap S\). Then \(\alpha ^{\prime \prime }\) and \(\beta ^{\prime \prime }\) are proper prefixes of α and β, and since they are shorter than \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), by our assumptions about \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), we cannot have that \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\). Consequently, either there exists \(x \in {var}(\alpha ^{\prime \prime }) \cap {qv}(E)\) such that \( x\notin {var}(\beta ^{\prime \prime }) \cap {qv}(E)\) or there exists \(x \in {var}(\beta ^{\prime \prime }) \cap {qv}(E)\) such that \(x \notin {var}(\alpha ^{\prime \prime }) \cap {qv}(E)\). W.l.o.g. suppose the former is true. Then \(x \notin {var}(\beta ^{\prime \prime })\), but since xqv(E), it follows from \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\) that \(x \in {var}(\beta ^{\prime })\). However, this implies that xS, and since \(x \in {var}(\alpha ^{\prime \prime })\) but \(x\notin {var}(\beta ^{\prime \prime })\), we arrive at a contradiction to our assumption that \({var}(\alpha ^{\prime \prime }) \cap S = {var}(\beta ^{\prime \prime }) \cap S\).

Now, let \(E^{\prime \prime }\) be the equation given by \(\pi _{S}(\alpha ^{\prime }) \doteq \pi _S(\beta ^{\prime })\). By the assumption that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\), there is no variable xqv(E)∖S occurring in \(\alpha ^{\prime }\) or \(\beta ^{\prime }\). Consequently, \(E^{\prime \prime } = E^{\prime }\), and by Lemma 4.7, we have that \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}^{\Rightarrow }_{[\alpha ^{\prime } \doteq \beta ^{\prime }]}\), which by Corollary 4.4 is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E]}\). □

5 A Useful Invariant

When reasoning about the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we need a way to help determine whether or not, for two equations E1, E2, we have \(E_{1} \Rightarrow ^{*} E_{2}\). Showing the positive case that \(E_{1} \Rightarrow ^{*} E_{2}\) can be achieved by simply finding an appropriate sequence of length-preserving Nielsen transformations from E1 to E2. However, showing that \(E_{1} \not \Rightarrow ^{*} E_{2}\) presents more of a challenge: the naive way would be to enumerate all vertices in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\) and show that E2 is not among them. However, this is not suitable for abstract reasoning, and, even in concrete cases, is inelegant and time-consuming.

The contribution of this section is a property of basic RWEs, defined as ΥE below, which is preserved under the relation ⇒ and thus provides a concise and more general means for showing that \(E_{1} \not \Rightarrow ^{*} E_{2}\). It is an indispensable component of the proofs of our main results.

Definition 5.1 (The invariant ΥE)

Let E be a basic RWE such that Card(var(E)) > 1. Let # be a new symbol not in X. Then we may write E as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) with x, y ∈ X and α1, α2, β1, β2 ∈ (X∖{x, y})∗. Let \(\mathcal {Z}_E = {var}(\alpha _{1}\alpha _{2}\beta _{1}\beta _{2}) \cup \{\#\}\). Let the function \(Q_E : \mathcal {Z}_E \to X^2\) be defined as follows: for each \(z \in \mathcal {Z}_E \backslash \{\#\}\), let QE(z) = (u, v) where uz is a factor of xα1yα2 and vz is a factor of yβ1xβ2. Let QE(#) = (u, v) where uy is a factor of xα1yα2 and vx is a factor of yβ1xβ2. Let \({\varUpsilon }_{\!E} = \{Q_E(z) \mid z \in \mathcal {Z}_E\}\). If Card(var(E)) ≤ 1, then ΥE = ∅.

Intuitively, given a basic RWE E of the form \(\alpha \doteq \beta \), we construct ΥE by taking, for each variable x ∈ var(E), the pair (u, v) of predecessors of x in E, i.e. such that ux is a factor of α and vx is a factor of β. It follows directly from the definition of basic RWEs that this pair is unique, and it exists whenever x is not the leftmost variable in either α or β. The special case that x is the leftmost variable of α or β is handled by the special symbol #. The following observations follow directly from the definitions, but are central to the use of ΥE in later proofs.

Remark 5.2

Let E be a basic regular word equation given by \(\alpha y \doteq \beta x\) with x, y ∈ X and α, β ∈ X∗. Then for each z ∈ var(α), there is exactly one element (u, v) ∈ΥE such that u = z. For each z ∉ var(α), there is no element (u, v) ∈ΥE such that u = z. Similarly, for each w ∈ var(β), there is exactly one element (u, v) ∈ΥE such that v = w and for each w ∉ var(β), there is no element (u, v) ∈ΥE such that v = w.

The usefulness of ΥE as a property of basic RWEs arises from the fact that it is invariant under the length-preserving Nielsen transformations. Consequently for a given basic RWE E, we can use the set \(\{E^{\prime } \mid {\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E}\}\) as an over-approximation of the set [E]⇒.

Theorem 5.3

Let E1, E2 be basic RWEs such that \(E_{1} \Rightarrow ^{*} E_{2}\). Then \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\).

Proof

It is sufficient to prove the same statement for the case that E1 ⇒ E2. W.l.o.g. we may assume that E1 ⇒L E2. The case that E1 ⇒R E2 is symmetric. Moreover, if E1 = E2, then the statement holds trivially, thus we may assume that E1 ≠ E2. The statement trivially holds for equations of the form \(xy \doteq yx\), since \([xy \doteq yx]_{\Rightarrow } = \{xy \doteq yx\}\). Otherwise, taking into account the fact that E1 and E2 are basic and therefore indecomposable, we have two cases: we may write E1 and E2 as either
  1. 1.

    \(x \alpha _{1} w \alpha _{2} y \alpha _3 \doteq y w \beta _{1} x \beta _{2}\) and \(x \alpha _{1} w \alpha _{2} y \alpha _3 \doteq w \beta _{1} y x \beta _{2}\), or

     
  2. 2.

    \(x \alpha _{1} y \alpha _{2} w \alpha _3 \doteq y w \beta _{1} x \beta _{2}\) and \(x \alpha _{1} y \alpha _{2} w \alpha _3 \doteq w \beta _{1} y x \beta _{2}\)

     
respectively, where w, x, y ∈ X with x ≠ y and α1, α2, α3, β1, β2 ∈ (X∖{x, y, w})∗ such that var(α1α2α3) = var(β1β2).

Suppose that we have the first case, then \(\mathcal {Z}_{E_{1}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#, w\}\) and \(\mathcal {Z}_{E_{2}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#,y\}\). Moreover, for each z ∈ var(α1α2α3), there exist u, v ∈ X such that uz (resp. vz) is a factor of the LHS (resp. RHS) of both E1 and E2, so \(Q_{E_{1}}(z) = Q_{E_{2}}(z)\). Now, let a, b, c be the rightmost variables in xα1, wα2 and wβ1 respectively (i.e. their length-1 suffixes). Then we have that \(Q_{E_{1}}(w) = (a,y)\), \(Q_{E_{1}}(\#) = (b,c)\), \(Q_{E_{2}}(y) = (b,c)\), and \(Q_{E_{2}}(\#) = (a,y)\). Thus \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\).

Now suppose instead that we have the second case. Similarly to the first case, we have that \(\mathcal {Z}_{E_{1}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#, w\}\), \(\mathcal {Z}_{E_{2}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#,y\}\) and for each z ∈ var(α1α2α3), \(Q_{E_{1}}(z) = Q_{E_{2}}(z)\). Now, let a, b, c be the rightmost variables in xα1, wβ1 and yα2 respectively. Then we have that \(Q_{E_{1}}(w) = (c,y)\), \(Q_{E_{1}}(\#) = (a,b)\), \(Q_{E_{2}}(y) = (a,b)\), and \(Q_{E_{2}}(\#) = (c,y)\). Thus \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) in both cases as required. □

As an example, let E1 be the basic RWE given by \(x u z w y \doteq y w u x z\). Then \(\mathcal {Z}_{E_{1}} = \{u,z,w,\#\}\) and \(Q_{E_{1}}\) is the function with \(Q_{E_{1}}(u) = (x,w)\), \(Q_{E_{1}}(z) = (u,x)\), \(Q_{E_{1}}(w) = (z,y)\) and \(Q_{E_{1}}(\#) = (w,u)\). Thus, \({\varUpsilon }_{\!E_{1}} = \{(w,u), (x,w), (u,x), (z,y)\}\). Similarly, if E2 is the basic RWE given by \(x u w z y \doteq y u x w z\), then \({\varUpsilon }_{\!E_{2}} = \{(x,y), (u,x), (w,w), (z,u)\}\). Consequently, we may conclude that \(E_{1} \not \Rightarrow ^{*} E_{2}\) (and symmetrically that \(E_{2} \not \Rightarrow ^{*} E_{1}\)).
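
Since Definition 5.1 only inspects, for each variable, its two immediate predecessors, ΥE can be computed mechanically in linear time. The following is an illustrative sketch (not part of the original development): it assumes each side of a basic RWE is given as a sequence of distinct single-character variables, and reproduces the two examples above.

```python
# Illustrative sketch: compute Upsilon_E of Definition 5.1 for a basic
# RWE whose sides are sequences of distinct variables. The symbol '#'
# of the definition needs no explicit representation here.

def upsilon(lhs, rhs):
    if len(set(lhs)) <= 1:                   # Card(var(E)) <= 1
        return set()
    pred_lhs = {lhs[i]: lhs[i - 1] for i in range(1, len(lhs))}
    pred_rhs = {rhs[i]: rhs[i - 1] for i in range(1, len(rhs))}
    pairs = set()
    for z in lhs:
        # z ranges over Z_E \ {#}: variables preceded on both sides
        if z in pred_lhs and z in pred_rhs:
            pairs.add((pred_lhs[z], pred_rhs[z]))
    # Q_E(#): the predecessor of rhs[0] on the LHS, paired with the
    # predecessor of lhs[0] on the RHS
    pairs.add((pred_lhs[rhs[0]], pred_rhs[lhs[0]]))
    return pairs

print(upsilon("xuzwy", "ywuxz"))  # {('w','u'), ('x','w'), ('u','x'), ('z','y')}
print(upsilon("xuwzy", "yuxwz"))  # {('x','y'), ('u','x'), ('w','w'), ('z','u')}
```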

Since the invariant ΥE provides a necessary condition on when two basic RWEs belong to the same equivalence class under ⇒, we might also ask whether it is sufficient, and hence characteristic. However, this is not the case. For instance, if E3 is given by \(x uvw y \doteq y wvu x\) and E4 is given by \(x wvu y \doteq y uvw x\), then \({\varUpsilon }_{\!E_3} = {\varUpsilon }_{\!E_4} = \{(x,v),(u,w),(v,y),(w,u)\}\) but it can be verified (e.g. by enumerating [E3]⇒ and [E4]⇒) that \(E_3 \not \Rightarrow ^{*} E_4\).

6 Jumbled Equations and a Special Case of Symmetry

The invariant property ΥE introduced in Section 5 consists of pairs of variables. The case that (x, x) ∈ΥE for some x ∈ var(E) is special in the sense that it leads to a particular repetitive structure in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\), described in the current section. We shall call basic RWEs E for which no pair of the form (x, x) occurs in ΥE jumbled.

Definition 6.1 (Jumbled Equations and Δ(E))

Let E be a basic RWE and let Δ(E) = {x ∈ var(E)∣(x, x) ∈ΥE}. If Card(Δ(E)) = 0, then E is jumbled.

For example, if we consider the equation E given by \(xyzw \doteq wyzx\), then ΥE = {(x, w),(y, y),(z, z)} so Δ(E) = {y, z} and E is not jumbled. On the other hand, for \(E^{\prime }\) given by \(x y z w \doteq w z y x\), we have \({\varUpsilon }_{\!E^{\prime }} = \{ (x,z), (y,w), (z,y) \}\), so \({{\varDelta }}(E^{\prime }) = \emptyset \) and \(E^{\prime }\) is jumbled.
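
Reusing the upsilon sketch above, Δ(E) and the jumbled property are immediate to check; the snippet below (again purely illustrative) reproduces the two examples just given.

```python
# Delta(E) collects the variables x with (x, x) in Upsilon_E
# (Definition 6.1); E is jumbled exactly when Delta(E) is empty.

def delta(lhs, rhs):
    return {u for (u, v) in upsilon(lhs, rhs) if u == v}

def is_jumbled(lhs, rhs):
    return not delta(lhs, rhs)

print(delta("xyzw", "wyzx"))        # {'y', 'z'}: E is not jumbled
print(is_jumbled("xyzw", "wzyx"))   # True: Delta(E') is empty
```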

Note that since ΥE is invariant under ⇒, so is the property of being jumbled. Furthermore, it follows from the definitions that (x, x) ∈ΥE for some basic RWE E and x ∈ X if and only if there exists y ∈ X such that one of the following holds:
  1. 1.

    xy occurs as a factor of both the LHS and RHS of E, or

     
  2. 2.

    there exists \(E^{\prime }\) with \(E \Rightarrow E^{\prime }\) such that xy occurs as a factor of both the LHS and RHS of \(E^{\prime }\).

     
The cardinality of Δ(E) can be interpreted as a measure of the similarity of the two sides of the equation. If Card(Δ(E)) is large in comparison to Card(var(E)), then the orders in which the variables occur on the LHS and RHS of E will be similar. On the other hand, when Δ(E) = ∅, there will be no common order in the variables on each side, and hence the equation is ‘jumbled’. In general, we may bound Card(Δ(E)) as follows.

Remark 6.2

Let E be a basic RWE. It follows directly from Definition 5.1 that if Card(var(E)) < 2, then Card(Δ(E)) = 0. Otherwise, E can be written as \(\alpha x \doteq \beta y\) for some x, y ∈ X, α ∈ (X∖{x})∗ and β ∈ (X∖{y})∗. Since E is basic, it is indecomposable, so we may additionally conclude that x ≠ y. By Remark 5.2, neither (x, x) nor (y, y) can be contained in ΥE, so we must have Card(Δ(E)) ≤ Card(var(E)) − 2.

The rest of this section is devoted to describing the structure of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the general case in terms of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) where \(E^{\prime }\) is jumbled. The first step is to notice that we can easily transform any basic RWE E into one which is jumbled by simply removing all variables x such that (x, x) ∈ ΥE.

Lemma 6.3

Let E be a basic RWE given by \(\alpha \doteq \beta \) and let Y = var(E)∖Δ(E). Then the equation EY given by \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\) is a jumbled basic RWE.

Proof

If Δ(E) = ∅, then the lemma holds trivially. Assume that Δ(E) ≠ ∅. We shall prove the following statement, from which the lemma follows by a simple induction.

Claim 6.3.1

Suppose that E is a basic RWE given by \(\alpha \doteq \beta \), and that x ∈ Δ(E). Let \(E^{\prime }\) be the equation \(\pi _{{var}(E)\backslash \{x\}}(\alpha ) \doteq \pi _{{var}(E)\backslash \{x\}}(\beta )\). Then \(E^{\prime }\) is a basic RWE and \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E} \backslash \{ (x,x) \}\).

Proof

Let \(Q_E, \mathcal {Z}_E\) be defined as per Definition 5.1. We shall consider two cases depending on whether QE(#) = (x, x). Suppose firstly that QE(#) ≠ (x, x). Then there exist α1, α2, β1, β2 such that α = α1xyα2, β = β1xyβ2, πvar(E)∖{x}(α) = α1yα2 and πvar(E)∖{x}(β) = β1yβ2. Suppose for contradiction that \(E^{\prime }\) is not basic. Clearly both sides of \(E^{\prime }\) belong to \({qv}(E^{\prime })^{*}\), so we may infer that \(E^{\prime }\) is decomposable, and thus that there exist proper prefixes \(\alpha ^{\prime }\) and \(\beta ^{\prime }\) of α1yα2 and β1yβ2 respectively such that \({var}(\alpha ^{\prime }) \cap {qv}(E^{\prime }) = {var}(\beta ^{\prime }) \cap {qv}(E^{\prime })\). Clearly, either y occurs in both \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), or in neither. Let \({\tau } : {var}(E^{\prime })^{*} \to {var}(E)^{*}\) be the morphism such that τ(y) = xy and τ(z) = z for \(z \in {var}(E^{\prime }) \backslash \{y\}\). Then \(\alpha ^{\prime \prime } = {\tau }(\alpha ^{\prime })\) and \(\beta ^{\prime \prime } = {\tau }(\beta ^{\prime })\) are proper prefixes of α and β respectively which satisfy \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\). Thus E is decomposable and therefore not basic, a contradiction.

To see that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\), suppose firstly that x is not a prefix of α or β, and thus that α1 ≠ ε and β1 ≠ ε. Then \( \mathcal {Z}_{E} = ({var}(E) \backslash \{ \alpha _{1}[1], \beta _{1}[1]\}) \cup \{\#\} \), and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{ \alpha _{1}[1], \beta _{1}[1], x\} ) \cup \{\#\}\). It follows from the definitions that \(Q_{E^{\prime }}(y) = Q_{E}(x) = (\alpha _{1}[|\alpha _{1}|], \beta _{1}[|\beta _{1}|])\). Since α1, β1 ≠ ε, we have α1[1]∉{x, y} and β1[1]∉{x, y}. Consequently there exist u#, v# ∈ var(E)∖{x} such that u#α1[1] is a factor of both α and πvar(E)∖{x}(α) and such that v#β1[1] is a factor of both β and πvar(E)∖{x}(β). It follows that \(Q_E(\#) = Q_{E^{\prime }}(\#) = (u_\#,v_\#)\). Likewise, for any z∉{x, y, α1[1],β1[1]}, there exist u, v ∈ var(E)∖{x} such that uz is a factor of both α and πvar(E)∖{x}(α) and such that vz is a factor of both β and πvar(E)∖{x}(β). It follows that \(Q_E(z) = Q_{E^{\prime }}(z) = (u,v)\). Thus we may conclude that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\).

Next, suppose that α1 = ε and β1 ≠ ε (the case that β1 = ε and α1 ≠ ε is symmetric). Then \(\mathcal {Z}_{E} = ({var}(E) \backslash \{x,\beta _{1}[1] \}) \cup \{\#\}\) and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{y,x,\beta _{1}[1] \}) \cup \{\#\}\). Then QE(#) = (u#, β1[|β1|]) where u#β1[1] is a factor of xyα2. Since E is regular, each variable occurs once per side, so we may infer that β1[1] ≠ y, and hence that u# ≠ x. It follows that u#β1[1] is also a factor of yα2, so we may further conclude that \(Q_{E^{\prime }}(\#) = (u_\#,\beta _{1}[|\beta _{1}|]) = Q_E(\#)\). Note that QE(y) = (x, x). Let z ∈ var(E)∖{x, y, β1[1]}. Then there exist u, v ∈ var(E)∖{x} such that uz is a factor of both xyα2 and yα2, and such that vz is a factor of both β1xyβ2 and β1yβ2. It follows that \(Q_E(z) = Q_{E^{\prime }}(z) = (u,v)\). Again we have \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E}\backslash \{(x,x)\}\). Finally, note that if α1 = β1 = ε, then E is decomposable, which is a contradiction to the assumption that E is basic.

It remains to consider the case that QE(#) = (x, x). This implies that there exist u, v ∈ var(E)∖{x} and α1, α2, β1, β2 ∈ var(E)∗ such that α = uα1xvα2 and β = vβ1xuβ2, meaning \(E^{\prime }\) is given by \(u \alpha _{1} v \alpha _{2} \doteq v \beta _{1} u \beta _{2}\). Suppose for contradiction that \(E^{\prime }\) is not basic. Then as in the previous case, it must be decomposable, and there exist proper prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of uα1vα2 and vβ1uβ2 respectively which satisfy \({var}(\alpha ^{\prime }) \cap {qv}(E^{\prime }) = {var}(\beta ^{\prime }) \cap {qv}(E^{\prime })\). Then we must have that \(\alpha ^{\prime } = u \alpha _{1} v \alpha _3\) and \(\beta ^{\prime } = v \beta _{1} u \beta _3\) for some α3, β3 ∈ X∗. However, it follows that \(\alpha ^{\prime \prime } = u \alpha _{1} x v \alpha _3\) and \(\beta ^{\prime \prime } = v \beta _{1} x u \beta _3\) are proper prefixes of α and β satisfying \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\), so E is decomposable which is a contradiction to the assumption that E is basic.

To see that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\), note that in this case \(\mathcal {Z}_E = ({var}(E) \backslash \{u,v\}) \cup \{\#\}\) and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{u,v,x\}) \cup \{\#\}\). It follows from the definitions that \(Q_{E^{\prime }}(\#) = Q_{E}(x) = (w_{1},w_{2})\), where w1 is the rightmost variable in uα1 and w2 is the rightmost variable in vβ1. Moreover, for any z ∈ var(E)∖{u, v, x}, there exist \(w_{1}^{\prime },w_{2}^{\prime } \in {var}(E)\backslash \{x\}\) such that \(w_{1}^{\prime }z\) is a factor of both uα1xvα2 and uα1vα2, and such that \(w_{2}^{\prime }z\) is a factor of both vβ1xuβ2 and vβ1uβ2, meaning that \(Q_{E^{\prime }}(z) = Q_{E}(z) = (w_{1}^{\prime },w_{2}^{\prime })\). It follows that \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E}\backslash \{(x,x)\}\) as required. □

We conclude the proof by noting that if Δ(E) = {x1, x2,…,xk}, then there exist equations Ei for 0 ≤ i ≤ k given by \(\alpha _i \doteq \beta _i\) such that
  1. 1.

    E0 = E and Ek = EY, and

     
  2. 2.

for 1 ≤ i ≤ k, \(\alpha _i = \pi _{{var}(E_{i-1}) \backslash \{x_i\}}(\alpha _{i-1})\) and \(\beta _i = \pi _{{var}(E_{i-1}) \backslash \{x_i\}}(\beta _{i-1})\).

     
Since E is basic, it follows by Claim 6.3.1 that Ei is basic for 1 ≤ i ≤ k, and moreover by the same claim that \({\varUpsilon }_{\!E_Y} = {\varUpsilon }_{\!E} \backslash \{(x_i,x_i) \mid 1 \leq i \leq k\}\) meaning that EY is both basic and jumbled. □
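
The transformation behind Lemma 6.3 is simply the projection πY applied to both sides. A small illustrative sketch, building on the functions above, is as follows; variable names are single characters, so in the example (the equation of Fig. 5 below) y1,…,y4 are encoded as '1',…,'4' and x as 'a'.

```python
# Sketch of Lemma 6.3: delete the variables of Delta(E) from both
# sides, i.e. apply the projection pi_Y with Y = var(E) \ Delta(E).

def jumbled_projection(lhs, rhs):
    d = delta(lhs, rhs)
    project = lambda side: "".join(z for z in side if z not in d)
    return project(lhs), project(rhs)

# y1 x y2 y3 y4 =. y4 y3 x y2 y1 has Delta(E) = {x}; deleting x
# yields E_Y given by y1 y2 y3 y4 =. y4 y3 y2 y1.
print(jumbled_projection("1a234", "43a21"))   # ('1234', '4321')
```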

There is a strong relation between the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for a basic RWE E and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) where EY is the jumbled basic RWE obtained from E by deleting the variables in Δ(E). The relation is described formally in Theorem 6.8. Before presenting the theorem, it is useful to first introduce some additional notions. Essentially, \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is made up of approximate copies of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\). Each copy is a subgraph \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) which is associated with a certain morphism \({\varphi } : Y^{*} \to {var}(E)^{*}\) from a set ΦE defined below. Intuitively, φ can be seen as a way of assigning variables in Δ(E) to variables in Y = var(E)∖Δ(E).

Definition 6.4 (The set ΦE)

Let E be a basic RWE. Let Y = var(E)∖Δ(E). Let ΦE be the set of morphisms \({\varphi } : Y^{*} \to {var}(E)^{*}\) satisfying φ(y) ∈ Δ(E)∗y for all y ∈ Y, and \(\sum \limits _{y \in Y} |{\varphi }(y)|_x = 1\) for all x ∈ Δ(E).

The subgraphs \({\mathscr{H}}^E_{{\varphi }}\) are obtained by restricting \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to subsets \(H^E_{{\varphi }}\) defined below. More precisely, \({\mathscr{H}}^E_{{\varphi }}\) consists of vertices \(H^E_{{\varphi }}\) and edges (E1, E2) whenever \(E_{1},E_{2} \in H^E_{\varphi }\) and E1 ⇒ E2 (i.e. whenever (E1, E2) is an edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\)). We shall say that \({\mathscr{H}}^E_{{\varphi }}\) is the subgraph of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) induced by \(H^E_{{\varphi }}\).

Definition 6.5 (\(V_{\varphi }^E, U_{\varphi }^E\) and \(H_{\varphi }^E\))

Let E be a basic RWE given by \(\alpha \doteq \beta \) and let Y = var(E)∖Δ(E). Let EY be the equation \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\). Let φ ∈ ΦE. Then we define the sets \(V_{\varphi }^E, U_{\varphi }^E\) and \(H_{\varphi }^E\) as follows:
  1. 1.

    \(V_{\varphi }^E = \{ {\varphi }(\hat {\alpha }) \doteq {\varphi }(\hat {\beta }) \mid \hat {\alpha } \doteq \hat {\beta } \in [E_Y]_{\Rightarrow }\}\),

     
  2. 2.

    \(H_{\varphi }^E = \{ E^{\prime } \mid \exists E^{\prime \prime } \in V_{\varphi }^E, Z \in \{L,R\}. E^{\prime \prime } \Rightarrow ^{*}_Z E^{\prime } \}\),

     
  3. 3.

    \(U_{\varphi }^E = H_{\varphi }^E \backslash V_{\varphi }^E\).

     

For each φ ∈ ΦE, the subgraph \({\mathscr{H}}_{\varphi }^E\) is an approximate copy of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) in the sense that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of \({\mathscr{H}}_{\varphi }^E\). The intuition behind the sets \(V_{\varphi }^E\) and \(U_{\varphi }^E\) is that they provide a decomposition of the set \(H_{\varphi }^E\) of vertices of \({\mathscr{H}}_{\varphi }^E\) into those which survive after the isolated path compression (\(V_{\varphi }^E\)) and those which are compressed/removed (\(U_{\varphi }^E\)). The underlying isomorphism is the function which maps equations \(\hat {\alpha } \doteq \hat {\beta } \in [E_Y]_{\Rightarrow }\) to \({\varphi }(\hat {\alpha }) \doteq {\varphi }(\hat {\beta })\).

The structure of each subgraph \({\mathscr{H}}^E_{\varphi }\) is therefore essentially the same as the structure of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\). In order to fully understand the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) however, we also need to know how the individual subgraphs are connected, or in other words, when two of the subgraphs \({\mathscr{H}}^E_{{\varphi }_{1}}, {\mathscr{H}}^E_{{\varphi }_{2}}\) share a common vertex. We shall later see (Lemma 6.14) that \({\mathscr{H}}^E_{{\varphi }_{1}}\) and \({\mathscr{H}}^E_{{\varphi }_{2}}\) share a vertex if and only if the corresponding morphisms φ1, φ2 satisfy a ‘closeness’ condition defined as follows. See Fig. 4 for a complete example of the resulting relation.
Fig. 4

A graph representing the closeness relation for morphisms in ΦE for a basic RWE E with var(E) = {x1, x2, x3, x4} and Δ(E) = {x3, x4}, meaning that Y = {x1, x2}. In this case, ΦE contains six morphisms, φi,1 ≤ i ≤ 6, which make up the vertices of the graph. Vertices connected by an edge are close in the sense of Definition 6.6

Definition 6.6 (Close morphisms φ1, φ2 ∈ ΦE)

Let E be a basic RWE and let Y = var(E)∖Δ(E). Let φ1, φ2 ∈ ΦE. Then φ1, φ2 are close if there exist y1, y2 ∈ Y with y1 ≠ y2 and γ1, γ2 ∈ Δ(E)∗ such that:
  1. 1.

    For all y ∈ Y ∖{y1, y2}, φ1(y) = φ2(y), and

     
  2. 2.

    φ1(y1) = γ1γ2y1, φ2(y1) = γ2y1, and φ2(y2) = γ1φ1(y2).

     

Informally, two morphisms φ1, φ2 ∈ ΦE are close if we can obtain one from the other by removing some prefix of the image of a variable y1 and appending it to the left of the image of another variable y2. For example, suppose that var(E) = {x1, x2, x3, x4, x5, x6} and Δ(E) = {x3, x4, x5, x6}, and consider the two morphisms \({\varphi }_{1},{\varphi }_{2} : \{x_{1},x_{2}\}^{*} \to \{x_{1},x_{2},x_3,x_4,x_5,x_6\}^{*}\) given by φ1(x1) = x4x3x5x1, φ1(x2) = x6x2, φ2(x1) = x5x1 and φ2(x2) = x4x3x6x2. Then φ1, φ2 both belong to ΦE and are close, since we can get one from the other simply by moving the prefix x4x3 from the image of x1 to the image of x2.
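
The closeness condition is a purely syntactic check on the images of the two morphisms. The following illustrative sketch represents a morphism φ ∈ ΦE as a dictionary mapping each y ∈ Y to the string φ(y) (a word over Δ(E) ending in y) and tests Definition 6.6 directly; note that with γ1 = ε the condition degenerates to φ1 = φ2.

```python
# Illustrative check of Definition 6.6. Variables are single characters;
# the example below encodes x1,...,x6 as '1',...,'6'.

def are_close(phi1, phi2):
    for y1 in phi1:
        for y2 in phi1:
            if y1 == y2:
                continue
            if any(phi1[y] != phi2[y] for y in phi1 if y not in (y1, y2)):
                continue
            # phi1(y1) = g1 g2 y1, phi2(y1) = g2 y1, phi2(y2) = g1 phi1(y2)
            if phi1[y1].endswith(phi2[y1]):
                g1 = phi1[y1][: len(phi1[y1]) - len(phi2[y1])]
                if phi2[y2] == g1 + phi1[y2]:
                    return True
    return False

phi1 = {"1": "4351", "2": "62"}   # phi1(x1) = x4 x3 x5 x1, phi1(x2) = x6 x2
phi2 = {"1": "51", "2": "4362"}   # phi2(x1) = x5 x1, phi2(x2) = x4 x3 x6 x2
print(are_close(phi1, phi2))      # True: move the prefix x4 x3 from x1 to x2
```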

The following lemma shows that even when φ1 and φ2 are not close, we can find a sequence of intermediate morphisms in ΦE starting with φ1 and ending with φ2, such that each morphism in the sequence and its successor are close, and such that this sequence is ‘short’. This will form the basis of our claim that the subgraphs \({\mathscr{H}}_{\varphi }^E\) which make up the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are well-connected, and in particular means that there is a (short) path in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) between any two of the subgraphs.

Lemma 6.7

Let E be a basic RWE and suppose that \({\varphi }^{\prime }, {\varphi }^{\prime \prime } \in {{\varPhi }}_E\) with \({\varphi }^{\prime } \not = {\varphi }^{\prime \prime }\). Then there exist k ≤ 4Card(Δ(E)) + 1 and φ1, φ2, φ3,…,φk ∈ ΦE such that \({\varphi }^{\prime } = {\varphi }_{1}\), \({\varphi }^{\prime \prime } = {\varphi }_k\), and φi, φi+1 are close for all i, 1 ≤ i < k.

Proof

Let Y = var(E)∖Δ(E). If Δ(E) = ∅, then ΦE contains only the identity morphism. Thus we may assume that Δ(E) ≠ ∅ and consequently by Remark 6.2 that Card(Y ) ≥ 2. Note the following claim.

Claim 6.7.1

Let φ1, φ2 ∈ ΦE, y1, y2 ∈ Y, z ∈ Δ(E) and γ1, γ2 ∈ Δ(E)∗ such that y1 ≠ y2 and
  1. 1.

    φ1(y1) = γ1zγ2y1, φ2(y1) = γ1γ2y1 and φ2(y2) = zφ1(y2), and

     
  2. 2.

    φ1(y) = φ2(y) for all y ∈ Y ∖{y1, y2}.

     
Then there exists φ3 ∈ ΦE such that φ1, φ3 are close, and φ3, φ2 are close.

Proof

Let φ3 be the morphism such that φ3(y1) = γ2y1, φ3(y2) = γ1zφ1(y2), and φ3(y) = φ1(y) for all y ∈ Y ∖{y1, y2}. Then it follows directly from the definitions that φ1, φ3 are close. Moreover, since φ2(y) = φ1(y) for all y ∈ Y ∖{y1, y2}, it also follows from the definitions that φ2, φ3 are close. □

Claim 6.7.1 shows us that with two successors in a sequence, we can ‘move’ any variable z ∈ Δ(E) from φ(y1) to the prefix of φ(y2) where y1, y2 ∈ Y with y1 ≠ y2 (leaving the rest of the morphism unchanged). Given any \({\varphi }^{\prime } \in {{\varPhi }}_E\) we can reach any other morphism \({\varphi }^{\prime \prime } \in {{\varPhi }}_E\) by moving each variable z ∈ Δ(E) twice in this manner according to the following strategy: firstly, we move each variable z ∈ Δ(E) to the prefix of the image of a variable y ∈ Y such that \(z \notin {var}({\varphi }^{\prime \prime }(y))\). Note that this is possible due to the assumption that Card(Y ) ≥ 2 and requires moving each variable in Δ(E) at most once. Then, we move the variables z ∈ Δ(E) back to the images of the ‘correct’ y ∈ Y in the appropriate order. For example, if \({\varphi }^{\prime \prime }(y) = z_{1}z_{2} {\ldots } z_n y\), then we would first move zn to the prefix of the image of y, then zn−1, and so on. Again this requires moving each variable at most once, and once we have done this for all variables, we will be left with exactly the morphism \({\varphi }^{\prime \prime }\). Overall we have moved each variable at most twice. Since each move requires two successors in the underlying sequence, we need at most 4Card(Δ(E)) successors in total and the statement of the lemma follows. □
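
The two-phase strategy in the proof above is effective and easy to simulate. The sketch below (illustrative only, with morphisms represented as in the previous snippet) produces the sequence of intermediate morphisms obtained by ‘moving’ one variable of Δ(E) at a time; by Claim 6.7.1, each such move can be realised by two consecutive close steps.

```python
# Illustrative simulation of the proof of Lemma 6.7. Each 'move' takes
# a variable z of Delta(E) out of phi(y_from) and prepends it to
# phi(y_to), as in Claim 6.7.1.

def move(phi, z, y_from, y_to):
    phi = dict(phi)
    phi[y_from] = phi[y_from].replace(z, "", 1)
    phi[y_to] = z + phi[y_to]
    return phi

def move_sequence(phi_src, phi_dst):
    seq, phi = [dict(phi_src)], dict(phi_src)
    ys = list(phi_src)
    # Phase 1: park every z currently sitting on its final home y on
    # some other y' with z not in phi_dst(y') (possible as Card(Y) >= 2).
    for y in ys:
        for z in phi[y][:-1]:
            if z in phi_dst[y]:
                park = next(u for u in ys if u != y and z not in phi_dst[u])
                phi = move(phi, z, y, park)
                seq.append(phi)
    # Phase 2: rebuild each phi_dst(y) by prepending its prefix
    # variables right-to-left, fetching each from wherever it sits.
    for y in ys:
        for z in reversed(phi_dst[y][:-1]):
            src = next(u for u in ys if z in phi[u][:-1])
            phi = move(phi, z, src, y)
            seq.append(phi)
    return seq

steps = move_sequence({"1": "4351", "2": "62"}, {"1": "51", "2": "4362"})
print(len(steps) - 1)   # 6 moves here, i.e. at most 2 * Card(Delta(E))
```

Each variable of Δ(E) is moved at most twice, so at most 2Card(Δ(E)) moves are used, matching the bound of at most 4Card(Δ(E)) close steps in the proof.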

We are now ready to give the full statement relating \({\mathscr{G}}^{\Rightarrow }_{[E]}\) and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) formally as follows. An example demonstrating the theorem is given in Fig. 5.
Fig. 5

Example illustrating Theorem 6.8. On the left is \({\mathscr{G}}_{[E]}^{\Rightarrow }\) for the equation E given by \(y_{1}xy_{2}y_3y_4 \doteq y_4y_3xy_{2}y_{1}\). Note that Δ(E) = {x}, so Y = {y1, y2, y3, y4} and EY is given by \(y_{1}y_{2}y_3y_4 \doteq y_4y_3y_{2}y_{1}\). The graph \({\mathscr{G}}_{[E_Y]}^{\Rightarrow }\) is shown on the top-right, where the equations in [EY]⇒ have been labelled A, B, C, D, E, F, G. The set ΦE contains four morphisms φi, 1 ≤ i ≤ 4, such that φi(yi) = xyi and φi(yj) = yj for j ≠ i. In this case, all morphisms in ΦE are close to each other so the closeness relation (depicted as the graph \({\mathscr{G}}^{\text {Close}}_{{{\varPhi }}_E}\) on the bottom-right) is a complete graph. The graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is comprised of four subgraphs \({\mathscr{H}}^E_{{\varphi }_i}\), 1 ≤ i ≤ 4. Each subgraph and morphism from ΦE is depicted with a distinct colour in the figure. For each Z ∈{A, B, C, D, E, F, G} given by \(\alpha _Z \doteq \beta _Z\), Zi denotes the equation \({\varphi }_i(\alpha _Z) \doteq {\varphi }_i(\beta _Z)\). Thus the set of vertices unique to the subgraph \({\mathscr{H}}^E_{{\varphi }_i}\) is given by \(V^E_{{\varphi }_i} = \{A_i,B_i,C_i,D_i,E_i,F_i,G_i\}\). The vertices shared between two subgraphs (i.e. those belonging to \(U^E_{{\varphi }_i}\)) are labelled u1, u2,…,u6. Since any two morphisms from ΦE are close, each pair of subgraphs have at least one vertex in common. Each subgraph can be made isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) by contracting the paths (dashed) passing through the shared vertices u1, u2,…,u6. For example, the subgraph \({\mathscr{H}}^E_{{\varphi }_{1}}\) containing the vertices A1, B1, C1, D1, E1, F1, G1, u1, u4, u5 can be made isomorphic to \({\mathscr{G}}_{[E_Y]}^{\Rightarrow }\) by contracting the paths (A1, u4, E1), (B1, u5, D1), and (C1, u1, C1) into single edges (A1, E1), (B1, D1) and (C1, C1).

Theorem 6.8

Let E be a basic RWE given by \(\alpha \doteq \beta \). Let Y = var(E)∖Δ(E). Let EY be the equation \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\). Let \(d = \max \limits \{1, {diam}({\mathscr{G}}^{\Rightarrow }_{[E_Y]})\}\). Then:
  1. 1.

    for each φ ∈ ΦE, \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of order Card(Δ(E)) of the subgraph \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) induced by \(H_{\varphi }^E\).

     
  2. 2.

    \({\mathscr{G}}^{\Rightarrow }_{[E]} = \bigcup \limits _{{\varphi } \in {{\varPhi }}_E}{\mathscr{H}}_{\varphi }^E\).

     
  3. 3.

    \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(d|E|^2)\).

     

Before we proceed with proving Theorem 6.8, it deserves a few further comments. Firstly, we note that since each morphism φ ∈ ΦE is clearly injective, the subsets \(V_{\varphi }^E\) of vertices of each subgraph \({\mathscr{H}}_{\varphi }^E\) are pairwise disjoint. Consequently, while the subgraphs \({\mathscr{H}}_{\varphi }^E\) do overlap (and it is precisely these overlaps which mean they are all connected), each one contains a unique copy of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\).

Secondly, note that the number of morphisms in the set ΦE will grow exponentially with respect to Card(Δ(E)). More precisely, we may assume some order Y = {y1, y2,…,yn} on the variables in Y and represent each morphism φ ∈ ΦE as a word φ(y1)φ(y2)…φ(yn). This representation is clearly unique to φ. Furthermore, a word over var(E) is a representation of this form for some φ ∈ ΦE if and only if each variable occurs exactly once, the variables yi occur in order from left to right, and yn occurs as a suffix. Thus, the number of morphisms in total is given by
$${\text{Card}}({{\varPhi}}_{E}) = \frac{({\text{Card}}({var}(E))-1)!}{({\text{Card}}({var}(E))-{\text{Card}}({{\varDelta}}(E))-1)!}.$$

Since each subgraph contains a subset of vertices not shared with any other, it follows that the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) will also be (at least) exponential in Card(Δ(E)). We shall see later in Section 9 that this is essentially the worst case for the size of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for RWEs E, with the largest graphs corresponding exactly to the case that Card(Δ(E)) is maximal. Nevertheless, it is worth pointing out that in the same case, the graph \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) will consist of a single vertex and two self-loops and thus \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) will be (at most) quadratic in |E|. This is significantly better than our upper bound in the general case.
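
The word representation just described also yields a direct way to enumerate ΦE and to confirm the counting formula on small instances. The sketch below is illustrative only and feasible for small variable sets (it filters all permutations); it recovers the six morphisms of Fig. 4.

```python
from itertools import permutations
from math import factorial

# Illustrative enumeration of Phi_E via the representation
# phi(y1) phi(y2) ... phi(yn): keep exactly the words in which the
# y's occur in order with y_n as a suffix, then split before each y.

def enumerate_phi(Y, D):
    found = []
    for word in permutations(list(Y) + list(D)):
        if [z for z in word if z in Y] == list(Y) and word[-1] == Y[-1]:
            phi, start = {}, 0
            for i, z in enumerate(word):
                if z in Y:                # each y closes the factor phi(y)
                    phi[z] = "".join(word[start : i + 1])
                    start = i + 1
            found.append(phi)
    return found

Y, D = "12", "34"                 # Fig. 4: Y = {x1, x2}, Delta(E) = {x3, x4}
phis = enumerate_phi(Y, D)
n, d = len(Y) + len(D), len(D)
assert len(phis) == factorial(n - 1) // factorial(n - d - 1) == 6
```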

Proof of Theorem 6.8

The rest of the section focuses on the proof of Theorem 6.8. The main technical content is presented in the following series of lemmas. Statement 1 is given by Lemmas 6.15 and 6.16, while Statements 2 and 3 are given by Lemmas 6.17 and 6.18 respectively. Throughout the remainder of this section, for a basic RWE E given by \(\alpha \doteq \beta \) and a morphism φ, we shall use the notation φ(E) as shorthand for \({\varphi }(\alpha ) \doteq {\varphi }(\beta )\). We begin by noting some properties of equations belonging to the sets \(H_{\varphi }^E\). The first deals with equations belonging to \(V_{\varphi }^E\) and follows directly from the definitions.

Fact 6.9

Let E be a basic RWE. Let Y = var(E)∖Δ(E), n = Card(Y ) and let EY = πY(E). Suppose that φ ∈ ΦE. Then \(E^{\prime } \in V_{\varphi }^E\) if and only if there exists a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and such that \(E^{\prime }\) can be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(n)}). $$

With a little additional reasoning, we can give a similar characterisation of equations contained in \(U_{\varphi }^E\).

Lemma 6.10

Let E be a basic RWE. Let Y = var(E)∖Δ(E), n = Card(Y ) and let EY = πY(E). Suppose that φ ∈ ΦE. Then \(E^{\prime } \in U_{\varphi }^E\) if and only if there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that one of the following holds:
  1. 1.
    \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and \(E^{\prime }\) may be written as:
    $$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
     
  2. 2.
    \( y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \doteq y_{1}y_{2}{\ldots } y_n \in [E_Y]_{\Rightarrow }\) and \(E^{\prime }\) may be written as:
    $$\delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}) \doteq {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n})$$
     
where σ(ι) = 1, δ1δ2 = φ(yσ(1)), and δ1, δ2 ≠ ε.

Proof

Suppose that \(E^{\prime }\) satisfies the conditions of the lemma. We shall consider the case that Statement 1 holds. The case that Statement 2 holds is symmetric. Then \(E^{\prime \prime } \Rightarrow _L^{*} E^{\prime }\) where \(E^{\prime \prime }\) is the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \overbrace{\delta_{1} \delta_{2} }^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Consequently, \(E^{\prime \prime } = {\varphi }(\hat {E})\) for some \(\hat {E} \in [E_Y]_{\Rightarrow }\), so \(E^{\prime \prime } \in V_{\varphi }^E\) and thus \(E^{\prime } \in H_{\varphi }^E\). Note however, that since \(E^{\prime }\) is a basic RWE, each variable occurs exactly once on each side of the equation. We may therefore conclude that δ1δ2 = φ(yσ(1)) is not a factor of the RHS of \(E^{\prime }\), and consequently, by Fact 6.9, \(E^{\prime } \notin V_{\varphi }^E\). Thus \(E^{\prime } \in U_{\varphi }^E\).

Now suppose instead that \(E^{\prime } \in U_{\varphi }^E\). Then there exists some \(E^{\prime \prime } \in V_{\varphi }^E\), \(k \in \mathbb {N}\) and Z ∈{L, R} such that \(E^{\prime \prime } \Rightarrow _Z^k E^{\prime }\). Suppose we choose \(E^{\prime \prime },Z\) and k such that k is minimal. Suppose additionally that Z = L. We shall show that Statement 1 of the lemma is satisfied. The case that Z = R is symmetric and results in Statement 2 being satisfied.

Since we have \(E^{\prime \prime } \in V_{\varphi }^E\), it follows from Fact 6.9 that there exists a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and such that \(E^{\prime \prime }\) can be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(n)}). $$
Let ℓ = |φ(yσ(1))| and let \(E^{\prime \prime \prime }\) be the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(\iota)}) \ldots {\varphi}(y_{\sigma(n)}) $$
where σ(ι) = 1. Then \(E^{\prime \prime } \Rightarrow _Z^{\ell } E^{\prime \prime \prime }\). However, since \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and
$$y_{1}y_{2}{\ldots} y_{n} \!\doteq\! y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)} \!\Rightarrow\! y_{1}y_{2}{\ldots} y_{n} \!\doteq\! y_{\sigma(2)} {\ldots} y_{\sigma(\iota-1)} y_{\sigma(1)} y_{\sigma(\iota)} {\ldots} y_{\sigma(n)}, $$
we may conclude that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\). Thus, by Fact 6.9, \(E^{\prime \prime \prime } \in V_{\varphi }^E\). Consequently, since \(V_{\varphi }^E\) and \(U_{\varphi }^E\) are by definition disjoint, we must have that k ∉ {0, ℓ}. Moreover, by our assumption that k is minimal, we must have that k < ℓ (otherwise we could choose \(E^{\prime \prime \prime }\) in place of \(E^{\prime \prime }\) and get a smaller value of k). This directly implies that there exist δ1, δ2 ≠ ε with δ1δ2 = φ(yσ(1)) such that \(E^{\prime }\) may be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
and thus Statement 1 of the lemma is satisfied. The case that Z = R is symmetrical, leading instead to the satisfaction of Statement 2. □

We shall now focus on the claim that \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) for each φ ∈ ΦE. The first step is to show that for at least one φ ∈ ΦE, the equation φ(EY) is contained in [E]⇒.

Lemma 6.11

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Then there exists φ ∈ ΦE such that φ(EY) ∈ [E]⇒.

Proof

Note that if Δ(E) = ∅, then EY = E and ΦE contains only the identity morphism, so the lemma holds trivially. Suppose that Δ(E) ≠ ∅. By Remark 6.2, we may therefore assume that E is a basic RWE with at least two variables, so may write it as \( x \alpha _{1} u_{1} u_{2} {\ldots } u_n y \alpha _{2} \doteq y \beta _{1} u_{1} u_{2} {\ldots } u_n x \beta _{2}\) where x, y, u1, u2,…,un ∈ X are pairwise distinct variables and α1, α2, β1, β2 ∈ (var(E)∖{x, y, u1, u2,…,un})∗, and such that α1 and β1 do not share a common non-empty suffix. Then \(E\Rightarrow ^{*}_R E^{\prime }\) where \(E^{\prime }\) is given by \(u_{1}u_{2} {\ldots } u_n x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} u_{1} u_{2} {\ldots } u_n x \beta _{2}\).

Now, consider the function \(Q_{E^{\prime }}\) as defined in Definition 5.1. Note in particular that \(Q_{E^{\prime }}(\#) = (v,w)\) where v, w ∈ X are the length-1 suffixes of xα1 and yβ1, and hence v ≠ w. By Theorem 5.3, \({\varUpsilon }_{\!E} = {\varUpsilon }_{\!E^{\prime }}\) (and hence \({{\varDelta }}(E) = {{\varDelta }}(E^{\prime })\)). Thus, for every z ∈ Δ(E), there exists \(z^{\prime } \in {var}(E)\) such that \(Q_{E^{\prime }}(z^{\prime }) = (z,z)\), meaning that z occurs directly to the left of \(z^{\prime }\) on both the LHS and RHS of \(E^{\prime }\). It follows that each z ∈ Δ(E) has a unique ‘successor’ variable \(z^{\prime }\) occurring to the right of z on both sides of the equation, and therefore that there exists some morphism φ ∈ ΦE such that \(E^{\prime } = {\varphi }(\pi _Y(E^{\prime }))\). Finally, notice that \(u_i \in {{\varDelta }}(E^{\prime }) = {{\varDelta }}(E)\) for 1 ≤ i ≤ n, and consequently, \(\pi _Y(E^{\prime })= \pi _Y(E) = E_Y\). □

The following lemma shows a correspondence between edges in \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) and paths in the subgraphs \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) which start and end with vertices from \(V_{\varphi }^E\) and whose internal vertices (if there are any) belong to \(U_{\varphi }^E\).

Lemma 6.12

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let Z ∈{L, R} and suppose that \(E^{\prime },E^{\prime \prime } \in [E_Y]_{\Rightarrow }\) such that \(E^{\prime } \Rightarrow _Z E^{\prime \prime }\). Let φ ∈ ΦE. Then there exist k ≤ Card(Δ(E)) and E0, E1, E2,…,Ek+1 such that
  1. 1.

    \({\varphi }(E^{\prime }) = E_0\) and \({\varphi }(E^{\prime \prime }) = E_{k+1}\), and

     
  2. 2.

    \(E_i \in U^E_{\varphi }\) for 1 ≤ i ≤ k, and

     
  3. 3.

    E0 ⇒Z E1 ⇒Z E2 ⇒Z … ⇒Z Ek ⇒Z Ek+1.

     

Proof

Note that if Card(Y ) < 2, then [EY]⇒ is a singleton and the lemma holds trivially. We may therefore assume that Card(Y ) ≥ 2. Suppose that Z = R. The case that Z = L is symmetric. Then there exist x, y ∈ Y and α1, α2, β1, β2 ∈ Y∗ such that \(E^{\prime }\) may be written as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) and \(E^{\prime \prime }\) may be written as \(\alpha _{1} xy \alpha _{2} \doteq y \beta _{1} x \beta _{2}\). Let \(E_0 = {\varphi }(E^{\prime })\) and \(E_{k+1} = {\varphi }(E^{\prime \prime })\). If φ(x) = x, then E0 ⇒R Ek+1 so the lemma holds for k = 0. Suppose that φ(x) ≠ x.

Then there exists k, 1 ≤ k ≤ Card(Δ(E)) and z1, z2,…,zk ∈ Δ(E) such that φ(x) = z1z2…zkx. For each i, 1 ≤ i ≤ k, let Ei be the equation given by:
$$z_{i+1} {\ldots} z_{k} x {\varphi}(\alpha_{1}) z_{1} z_{2} {\ldots} z_{i} {\varphi}(y) {\varphi}(\alpha_{2}) \doteq {\varphi}(y) {\varphi}(\beta_{1} ) {\varphi}(x) {\varphi}(\beta_{2}).$$
Then it follows directly from Lemma 6.10 that \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k. Moreover,
$$ E_{0} \Rightarrow_{R} E_{1} \Rightarrow_{R} E_{2} \Rightarrow_{R} {\ldots} \Rightarrow_{R} E_{k} \Rightarrow_{R} E_{k+1} $$
as required. □

A straightforward induction on Lemma 6.12 allows us to conclude that if, for some φ ∈ ΦE, φ(EY) ∈ [E]⇒, then \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\). We have already shown (Lemma 6.11) that this is true for at least one choice of φ. The next step is to show that φ(EY) ∈ [E]⇒ for all φ ∈ ΦE, which we obtain as a consequence of Lemmas 6.7 and 6.14 below. Before proving Lemma 6.14, we need the following result, which we shall reuse later and is therefore stated separately.

Lemma 6.13

Let E be a basic RWE. Then there exist n1, n2 < |E|2 and \(\hat {E}\) such that \(E \Rightarrow ^{n_{1}} \hat {E}\) and \(\hat {E} \Rightarrow ^{n_{2}} E\) where \(\hat {E}\) can be written as \(x \alpha y \doteq y \beta x\) where x, y ∈ var(E) and α, β ∈ (var(E)∖{x, y})∗.

Proof

We shall prove the case that \(E \Rightarrow ^{n_{1}} \hat {E}\). The case that \(\hat {E} \Rightarrow ^{n_{2}} E\) is easily adapted. Recall that we may write any basic RWE as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) where x, y ∈ X and α1, α2, β1, β2 ∈ (X∖{x, y})∗. We have the following claim:

Claim 6.13.1

For every basic, regular equation E given by \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\), either α2 = β2 = ε, or there exists n < |E| and \(E^{\prime }\) such that \(E\Rightarrow ^n E^{\prime }\) and \(E^{\prime }\) may be written as \(x^{\prime } \alpha _{1}^{\prime } y^{\prime } \alpha _{2}^{\prime } \doteq y^{\prime } \beta _{1}^{\prime } x^{\prime } \beta _{2}^{\prime }\) where \(x^{\prime },y^{\prime } \in X\), \(\alpha _{1}^{\prime },\alpha _{2}^{\prime },\beta _{1}^{\prime },\beta _{2}^{\prime } \in (X \backslash \{x^{\prime },y^{\prime }\})^{*}\), and such that \(|\alpha _{1}^{\prime }|+ |\beta _{1}^{\prime }| > |\alpha _{1}| + |\beta _{1}|\).

Proof

Let E be given by \(x \alpha _{1} y \alpha _{2} \!\doteq \! y \beta _{1} x \beta _{2}\) where x, y ∈ var(E) and α1, α2, β1, β2 ∈ (var(E)∖{x, y})∗. We have two cases: either var(α1) = var(β1), in which case, due to the fact that E is basic and therefore indecomposable, we must have that α2 = β2 = ε, so the claim holds. Otherwise, there exists z ∈ (var(α1)∖var(β1)) ∪ (var(β1)∖var(α1)). W.l.o.g. suppose that z ∈ var(α1)∖var(β1). Then since E is regular, z ∈ var(β2) and we can write E as \(x \gamma _{1} z \gamma _{2} y \alpha _{2} \doteq y \beta _{1} x \delta _{1} z \delta _{2}\) where γ1, γ2, δ1, δ2 ∈ (var(E)∖{x, y, z})∗. Consequently, we have that \(E \Rightarrow _R^{*} E^{\prime }\) where \(E^{\prime }\) is given by \(z \gamma _{2} x \gamma _{1} y \alpha _{2} \doteq y \beta _{1} x \delta _{1} z \delta _{2}\). By Remark 3.2, we have that \(E\Rightarrow ^n E^{\prime }\) where n < |E|. Moreover, \(E^{\prime }\) clearly has the form described in the claim as witnessed by \(x^{\prime } = z\), \(y^{\prime } = y\), \(\alpha _{1}^{\prime } = \gamma _{2} x \gamma _{1}\), \(\alpha _{2}^{\prime } = \alpha _{2}\), \(\beta _{1}^{\prime } = \beta _{1} x \delta _{1}\) and \(\beta _{2}^{\prime } = \delta _{2}\). □

Since for any equation E of the form \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\), we must clearly have that |α1| + |β1| < |E|, it follows from a simple induction on Claim 6.13.1 that \(E \Rightarrow ^n \hat {E}\) for some n < |E|2 and \(\hat {E}\) of the form \(x^{\prime } \alpha y^{\prime } \doteq y^{\prime } \beta x^{\prime }\) as claimed. □

Lemma 6.14

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let φ1, φ2 ∈ ΦE. Then \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \) if and only if φ1, φ2 are close.

Proof

If Card(Y ) < 2, then ΦE consists of the identity morphism only so the statement holds trivially. Suppose that Card(Y ) ≥ 2. Suppose firstly that φ1, φ2 are close. Then there exist y1, y2 ∈ Y with y1 ≠ y2 and γ1, γ2 ∈ Δ(E)∗ such that φ1(y1) = γ1γ2y1, φ2(y1) = γ2y1, φ2(y2) = γ1φ1(y2), and for y ∈ Y ∖{y1, y2}, φ1(y) = φ2(y). In order to show that \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \), we need the following claim.

Claim 6.14.1

There exist \(\hat {E} \in [E_Y]_{\Rightarrow }\) and \(\hat {\alpha }_{1},\hat {\alpha }_{2},\hat {\beta }_{1},\hat {\beta }_{2} \in (Y \backslash \{y_{1},y_{2}\})^{*}\) such that \(\hat {E}\) can be written either as:
  1. 1.

    \(y_{1} \hat {\alpha }_{1} y_{2} \hat {\alpha _{2}} \doteq y_{2} \hat {\beta }_{1} y_{1} \hat {\beta _{2}}\), or

     
  2. 2.

    \(y_{2} \hat {\alpha }_{1} y_{1} \hat {\alpha _{2}} \doteq y_{1} \hat {\beta }_{1} y_{2} \hat {\beta _{2}}\)

     

Proof

By Lemma 6.13, there exists \(\hat {E}^{\prime } \in [E_Y]_{\Rightarrow }\) such that \(\hat {E}^{\prime }\) may be written as \(x \hat {\alpha } z \doteq z \hat {\beta } x\) where x, z ∈ Y, x ≠ z and \(\hat {\alpha },\hat {\beta } \in (Y \backslash \{x,z\})^{*}\). By Lemma 6.3, EY is basic, meaning that each variable in Y = var(EY) occurs exactly once on each side of EY. It follows by properties of ⇒ that each variable in Y also occurs exactly once in each of \(x \hat {\alpha } z\) and \(z \hat {\beta } x\). Hence there exist \(\hat {\alpha }^{\prime }, \hat {\alpha }^{\prime \prime } \in (Y \backslash \{y_{2}\})^{*}\) such that \(x \hat {\alpha } z = \hat {\alpha }^{\prime } y_{2} \hat {\alpha }^{\prime \prime }\) (and such that y1 occurs in either \(\hat {\alpha }^{\prime }\) or \(\hat {\alpha }^{\prime \prime })\).

Suppose w.l.o.g. that y1 occurs to the left of y2 in the RHS. We shall show that Statement 1 of the claim is satisfied. The case that y1 occurs to the right of y2 is symmetric and leads to Statement 2 being satisfied. Then there exist \(\hat {\beta }^{\prime },\hat {\beta }^{\prime \prime }, \hat {\beta }^{\prime \prime \prime } \in (Y\backslash \{y_{1},y_{2}\})^{*}\) such that \(z \hat {\beta } x = \hat {\beta }^{\prime } y_{1} \hat {\beta }^{\prime \prime } y_{2} \hat {\beta }^{\prime \prime \prime }\). Then we may write \(\hat {E}^{\prime }\) as
$$ \hat{\alpha}^{\prime} y_{2} \hat{\alpha}^{\prime\prime} \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}.$$
Note that z is a suffix of \(y_{2} \hat {\alpha }^{\prime \prime }\) and a prefix of \(\hat {\beta }^{\prime }y_{1}\). Since y2 does not occur in \(\hat {\beta }^{\prime } y_{1}\), we have y2 ≠ z. Consequently, we may write \(\hat {\alpha ^{\prime \prime }} = \hat {\alpha }^{\prime \prime \prime } z\) for some \(\hat {\alpha }^{\prime \prime \prime }\). Then
$$ \begin{array}{@{}rcl@{}} &&\overbrace{\hat{\alpha}^{\prime} y_{2} \hat{\alpha}^{\prime\prime\prime}z \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}}^{\hat{E}^{\prime}}\\ \Rightarrow_{R}^{*} && y_{2} \hat{\alpha}^{\prime\prime\prime} \hat{\alpha}^{\prime} z \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}\\ \Rightarrow_{L}^{*} && y_{2} \underbrace{\hat{\alpha}^{\prime\prime\prime} \hat{\alpha}^{\prime} z }_{\hat{\alpha}_{1} y_{1} \hat{\alpha}_{2}} \doteq y_{1} \underbrace{\hat{\beta}^{\prime\prime} \hat{\beta}^{\prime} }_{\hat{\beta}_{1}} y_{2} \underbrace{\hat{\beta}^{\prime\prime\prime}}_{\hat{\beta}_{2}} \end{array} $$

so \(y_{2} \hat {\alpha }^{\prime \prime \prime } \hat {\alpha }^{\prime } z \doteq y_{1} \hat {\beta }^{\prime \prime } \hat {\beta }^{\prime } y_{2} \hat {\beta }^{\prime \prime \prime } \in [E_Y]_{\Rightarrow }\). Since y1 occurs either in \(\hat {\alpha }^{\prime }\) or in \(\hat {\alpha }^{\prime \prime } = \hat {\alpha }^{\prime \prime \prime }z\), we may write \(\hat {\alpha }^{\prime \prime \prime } \hat {\alpha }^{\prime } z\) as \(\hat {\alpha }_{1} y_{1} \hat {\alpha }_{2}\) for some \(\hat {\alpha }_{1},\hat {\alpha }_{2} \in (Y\backslash \{y_{1},y_{2}\})^{*}\). Thus the first statement of the claim holds with \(\hat {\beta }_{1} = \hat {\beta }^{\prime \prime } \hat {\beta }^{\prime }\) and \(\hat {\beta }_{2} = \hat {\beta }^{\prime \prime \prime }\). □

Assume that the first statement of Claim 6.14.1 holds. The case that the second statement holds is symmetric. Then there exists \(\hat {E} \in [E_Y]_{\Rightarrow }\) such that \(\hat {E}\) has the form \(y_{1} \hat {\alpha }_{1} y_{2} \hat {\alpha _{2}} \doteq y_{2} \hat {\beta }_{1} y_{1} \hat {\beta _{2}}\), for some \( \hat {\alpha }_{1}, \hat {\alpha }_{2},\hat {\beta }_{1},\hat {\beta }_{2} \in (Y\backslash \{y_{1},y_{2}\})^{*}\). Let EINT be the equation given by
$$\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})$$
and notice that
$$ \begin{array}{@{}rcl@{}} &&\overbrace{{\varphi}_{1}(y_{1}){\varphi}_{1}(\hat{\alpha}_{1}) {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2})\doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}){\varphi}_{1}(y_{1}){\varphi}_{1}(\hat{\beta}_{2})}^{{\varphi}_{1}(\hat{E})} \\ \Rightarrow_{R}^{*} && \underbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}_{E_{INT}}. \end{array} $$
Moreover, recall that φ2(y1) = γ2y1, φ2(y2) = γ1φ1(y2). Since \(\hat {\alpha }_{1},\hat {\alpha }_{2} \in (Y \backslash \{y_{1},y_{2}\})^{*}\), we also have \({\varphi }_{2}(\hat {\alpha }_{1}) = {\varphi }_{1}(\hat {\alpha }_{1})\) and \({\varphi }_{2}(\hat {\alpha }_{2}) = {\varphi }_{1}(\hat {\alpha }_{2})\). Consequently
$$ \begin{array}{@{}rcl@{}} & &\overbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq \gamma_{1} {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}^{{\varphi}_{2}(\hat{E})}\\ \Rightarrow_{L}^{*} & &\underbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}_{E_{INT}}. \end{array} $$

Since \(\hat {E} \in [E_Y]_{\Rightarrow }\), by definition \({\varphi }_{1}(\hat {E}) \in V^E_{{\varphi }_{1}}\) and \({\varphi }_{2}(\hat {E}) \in V^E_{{\varphi }_{2}}\). Thus it follows that \(E_{INT} \in U^E_{{\varphi }_{1}} \cap U^E_{{\varphi }_{2}}\) and consequently \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \).

Now suppose instead that \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \). Let \(E_{INT} \in H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}}\). If φ1 = φ2 then the statement holds trivially. Thus we assume that φ1φ2. Before we proceed, we need the following claim.

Claim 6.14.2

Let \({\varphi }^{\prime },{\varphi }^{\prime \prime } \in {{\varPhi }}_E\) and \(\mu ^{\prime }, \mu ^{\prime \prime } \in Y^{*}\) such that \(|\mu ^{\prime }|_y = |\mu ^{\prime \prime }|_y = 1\) for all y ∈ Y. If \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\), then \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\) and \(\mu ^{\prime } = \mu ^{\prime \prime }\).

Proof

Suppose that \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\). It follows from the definition of ΦE that for any φ ∈ ΦE, the morphism πY ∘ φ is the identity over Y∗. Thus \(\mu ^{\prime } = \pi _Y({\varphi }^{\prime }(\mu ^{\prime })) = \pi _Y({\varphi }^{\prime \prime }(\mu ^{\prime \prime })) = \mu ^{\prime \prime }\). Furthermore, for each y ∈ Y, we may uniquely reconstruct \({\varphi }^{\prime }(y)\) and \({\varphi }^{\prime \prime }(y)\) as the longest factors of the form Δ(E)∗y in \({\varphi }^{\prime }(\mu ^{\prime })\) and \({\varphi }^{\prime \prime }(\mu ^{\prime \prime })\) respectively. It follows from the definition of ΦE and the fact that \(|\mu ^{\prime }|_y, |\mu ^{\prime \prime }|_y = 1\) that these factors will exist and be unique. Thus, under the assumption that \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\), it follows that \({\varphi }^{\prime }(y) = {\varphi }^{\prime \prime }(y)\) for all y ∈ Y and hence \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\). □

It follows from Fact 6.9 and Lemma 6.10 that for each i ∈{1,2}, there exists μi ∈ Y∗ with |μi|y = 1 for all y ∈ Y such that at least one of the LHS or RHS of EINT has the form φi(μi). By Claim 6.14.2, and since φ1 ≠ φ2, a single side of EINT cannot have the form φi(μi) for both i = 1 and i = 2. By Fact 6.9, this means that \(E_{INT} \notin V^E_{{\varphi }_{1}} \cup V^E_{{\varphi }_{2}}\) and consequently that \(E_{INT} \in U^E_{{\varphi }_{1}} \cap U^E_{{\varphi }_{2}}\). Thus, either Statement 1 or Statement 2 of Lemma 6.10 holds with φ = φ1 and \(E^{\prime } = E_{INT}\). W.l.o.g. suppose that the LHS of EINT has the form φ1(μ1) and the RHS of EINT has the form φ2(μ2). This corresponds to the case that Statement 1 of Lemma 6.10 holds, so there exist y1, y2,…,yn with Y = {y1, y2,…,yn} and a permutation σ : {1,2,…,n}→{1,2,…,n} such that EINT may be written as
$${\varphi}_{1}(y_{1}) {\varphi}_{1}(y_{2}) {\ldots} {\varphi}_{1}(y_{n}) \doteq \delta_{2} {\varphi}_{1}(y_{\sigma(2)}) {\ldots} {\varphi}_{1}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{1}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{1}(y_{\sigma(n)})$$
where δ1δ2 = φ1(yσ(1)) with δ1, δ2 ≠ ε and σ(ι) = 1. Note that by the definition of ΦE, the fact that δ2 ≠ ε implies that δ1 ∈ Δ(E)∗ and δ2 = δ3yσ(1) for some δ3 ∈ Δ(E)∗.

Recalling that the RHS of EINT has the form φ2(μ2), we may directly infer that μ2 = yσ(1)yσ(2)…yσ(n) and subsequently φ2(yσ(1)) = δ2, φ2(yσ(ι)) = δ1φ1(yσ(ι)), and φ2(y) = φ1(y) for all y ∉ {yσ(1), yσ(ι)}. Thus φ1 and φ2 are close as required. □

We are now able to prove that each set \(H_{\varphi }^E\) is in fact a subset of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and thus that the subgraphs \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are well-defined.

Lemma 6.15

Let E be a basic RWE. Then \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) for each φ ∈ ΦE.

Proof

Let Y = var(E)∖Δ(E) and let EY = πY(E). By Lemma 6.11, there exists φ ∈ ΦE such that φ(EY) ∈ [E]⇒. Let \(\tilde {E} \in H^E_{{\varphi }^{\prime }}\) for some arbitrary \({\varphi }^{\prime } \in {{\varPhi }}_E\). By Lemma 6.7, there exist k ≤ 4Card(Δ(E)) + 1 and φ1, φ2,…,φk ∈ ΦE such that \({\varphi } = {\varphi }_{1}, {\varphi }^{\prime } = {\varphi }_k\), and for 1 ≤ i < k, φi and φi+1 are close. Thus, by Lemma 6.14, there exist E1, E2,…,Ek−1 such that \(E_i \in H^E_{{\varphi }_i} \cap H^E_{{\varphi }_{i+1}} \) for 1 ≤ i < k.

It follows from Lemma 6.12 that if \(E^{\prime },E^{\prime \prime } \in H^E_{{\varphi }_i}\) for some i, 1 ≤ i ≤ k, then \(E^{\prime } \Rightarrow ^{*} E^{\prime \prime }\). Thus, φ(EY) ⇒∗ E1, Ek−1 ⇒∗ \(\tilde {E}\), and for 1 ≤ i < k − 1, Ei ⇒∗ Ei+1. Consequently, \(\tilde {E} \in [E]_{\Rightarrow }\). Since this holds for all \(\tilde {E} \in H^E_{{\varphi }^{\prime }}\) for all \({\varphi }^{\prime } \in {{\varPhi }}_E\), the lemma follows. □

The following lemma completes the proof of Statement 1 of Theorem 6.8.

Lemma 6.16

Let E be a basic RWE. Let Y = var(E)∖Δ(E), let EY = πY(E), and let φ ∈ ΦE. Then \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\).

Proof

For k ≥ 0, we shall say that a sequence of equations E0, E1,…,Ek+ 1 is a U-path if \(E_0, E_{k+1} \in V^E_{\varphi }\), \(E_i \in U^E_{\varphi }\) for 1 ≤ i ≤ k, and there exists Z ∈{L, R} such that \(E_0 \Rightarrow _Z E_{1} \Rightarrow _Z E_{2} \Rightarrow _Z {\ldots } \Rightarrow _Z E_k \Rightarrow _Z E_{k+1}\). Let ◇ be the relation on \(V_{\varphi }^E\) such that \(E^{\prime } \diamond E^{\prime \prime }\) if and only if \(E^{\prime },E^{\prime \prime } \in V_{\varphi }^E\) and there exists a U-path starting with \(E^{\prime }\) and ending with \(E^{\prime \prime }\). We shall show firstly that the graph \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\), and secondly that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\).

Clearly, every U-path is a path in \({\mathscr{H}}^E_{\varphi }\). Moreover, it follows from the definition of \(H^E_{\varphi }\), along with the fact that \(\Rightarrow _Z^{*}\) is an equivalence relation for Z ∈{L, R}, that for every vertex \(E^{\prime } \in U^E_{\varphi }\), there exist \(E^{\prime \prime },E^{\prime \prime \prime } \in V^E_{\varphi }\) and Z ∈{L, R} such that \(E^{\prime \prime } \Rightarrow _Z^{*} E^{\prime }\) and \(E^{\prime } \Rightarrow _Z^{*} E^{\prime \prime \prime }\). Consequently, every vertex in \({\mathscr{H}}^E_{\varphi }\) either belongs to \(V^E_{\varphi }\) or is the internal vertex of some U-path. It follows as a direct consequence of the following claim that the U-path containing a given vertex in \(U_{\varphi }^E\) is unique, and therefore that no two distinct U-paths share an internal vertex. Thus \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order k of \({\mathscr{H}}_{\varphi }^E\) where k is the number of internal vertices in the longest U-path in \({\mathscr{H}}_{\varphi }^E\).

Claim 6.16.1

Let \(E^{\prime } \in U_{\varphi }^E\). Then the in- and out-degrees of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are exactly one.

Proof

Since \(E^{\prime } \in U_{\varphi }^E\), there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds. The case that Statement 2 holds is symmetric. Then we may write \(E^{\prime }\) as follows:
$$ {\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where σ(ι) = 1, δ1δ2 = φ(yσ(1)) and δ1, δ2 ≠ ε. Moreover, \(\hat {E} \in [E_Y]_{\Rightarrow }\) where \(\hat {E}\) is given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)}\). Note that \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }\).
Let \(E^{\prime }_{\text {pre}_L}, E^{\prime }_{\text {suc}_L}\) be the equations such that \(E^{\prime }_{\text {pre}_L} \Rightarrow _L E^{\prime }\) and \(E^{\prime } \Rightarrow _L E^{\prime }_{\text {suc}_L}\). It follows from the definitions that \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }_{\text {pre}_L}\) and \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }_{\text {suc}_L}\), so both belong to \(H_{\varphi }^E\) and the in- and out-degree of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are both at least one. To see that they are exactly one, we must show that for the equations \(E^{\prime }_{\text {pre}_R}\) and \(E^{\prime }_{\text {suc}_R}\) such that \(E^{\prime }_{\text {pre}_R} \Rightarrow _R E^{\prime }\) and \(E^{\prime } \Rightarrow _R E^{\prime }_{\text {suc}_R}\), neither \(E^{\prime }_{\text {pre}_R}\) nor \(E^{\prime }_{\text {suc}_R}\) is contained in the set \(H_{\varphi }^E\). We may write \(E^{\prime }_{\text {pre}_R}\) as
$$ z {\varphi}(y_{1}) {\varphi}(y_{2}) {\ldots} \delta_{3} \delta_{2} {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where z ∈ X and δ3 ∈ X∗ such that δ3z = δ1, and we may write \(E^{\prime }_{\text {suc}_R}\) as
$$ \gamma {\varphi}(y_{2}) {\ldots} \delta_{1} z^{\prime} \delta_{2} {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where \(z^{\prime } \in X\) and γ ∈ X∗ such that \(z^{\prime } \gamma = {\varphi }(y_{1})\). It follows by Fact 6.9 and Lemma 6.10 that any equation in \(V_{\varphi }^E \cup U_{\varphi }^E = H_{\varphi }^E\) must have φ(yσ(1)) = δ1δ2 occurring as a factor of at least one side. However, since each variable occurs exactly once on each side of the equations \(E^{\prime }_{\text {pre}_R}\), \(E^{\prime }_{\text {suc}_R}\), we may immediately observe that φ(yσ(1)) does not occur as a factor of the LHS or of the RHS of either equation. Thus \(E^{\prime }_{\text {pre}_R}\), \(E^{\prime }_{\text {suc}_R} \notin H_{\varphi }^E\), and the in- and out-degrees of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are exactly one as claimed. □

The following claim asserts that each vertex \(E^{\prime } \in U^E_{\varphi }\) occurs on a U-path with at most Card(Δ(E)) internal vertices. Since we have already shown that \(E^{\prime }\) occurs on exactly one U-path, it follows that all U-paths have at most Card(Δ(E)) internal vertices and thus that the order of the isolated path contraction is at most Card(Δ(E)).

Claim 6.16.2

Let \(E^{\prime } \in U_{\varphi }^E\). Then there exist k ≤Card(Δ(E)), E0, E1,…,Ek+ 1 and Z ∈{L, R} such that:
  1. \(E_0, E_{k+1} \in V_{\varphi }^E\), and
  2. \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k, and
  3. \(E_i \Rightarrow _Z E_{i+1}\) for 0 ≤ i ≤ k, and
  4. there exists i, 1 ≤ i ≤ k, such that \(E^{\prime } = E_i\).

Proof

Since \(E^{\prime } \in U_{\varphi }^E\), there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds. The case that Statement 2 holds is symmetric. Then the equation \(\hat {E}\) given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)}\) is contained in [EY]⇒ and we may write \(E^{\prime }\) as follows
$$ {\varphi}(y_{1}){\varphi}(y_{2}) \!\ldots\! {\varphi}(y_{n}) \!\doteq\! z_{j+1} {\ldots} z_{k} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) z_{1} \!\ldots\! z_{j} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where σ(ι) = 1, z1z2zk = φ(yσ(1)) and 1 ≤ j < k ≤Card(Δ(E)) + 1.
Now, let E0 be the equation given by
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \overbrace{z_{1} z_{2} {\ldots} z_{k}}^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}),$$
let Ek+ 1 be the equation
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq{\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \overbrace{z_{1} z_{2} {\ldots} z_{k}}^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}),$$
and for 1 ≤ i < k, let Ei be the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq z_{i+1} {\ldots} z_{k-1} y_{\sigma(1)} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) z_{1} {\ldots} z_{i} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Then clearly we have \(E_0 = {\varphi }(\hat {E}) \in V_{\varphi }^E\). Let \(\hat {E}^{\prime }\) be the equation given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)}\). Then \(\hat {E} \Rightarrow \hat {E}^{\prime }\) so \(\hat {E}^{\prime } \in [E_Y]_{\Rightarrow }\), and moreover \(E_{k+1} = {\varphi }(\hat {E}^{\prime })\) so \(E_{k+1} \in V_{\varphi }^E\). Thus Statement 1 is satisfied. Note also that \(E_i \Rightarrow _L E_{i+1}\) for 0 ≤ i ≤ k, so Statement 3 is satisfied, and furthermore we have that \(E_i \in H_{\varphi }^E\) for 1 ≤ i ≤ k. For each i, 1 ≤ i ≤ k, since each variable y ∈ Y occurs exactly once on each side of Ei, we may conclude that φ(yσ(1)) = z1z2…zk is not a factor of the RHS of Ei. Thus, by Fact 6.9, \(E_i \notin V_{\varphi }^E\) so \(E_i \in U_{\varphi }^E\) and Statement 2 is satisfied. Finally note that \(E^{\prime } = E_j\), so Statement 4 is also satisfied. □

It remains to show that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V^E_{\varphi }}\). Recall that by definition \(V_{\varphi }^E = \{ {\varphi }(E^{\prime }) \mid E^{\prime } \in [E_Y]_{\Rightarrow }\}\) and note that the function mapping equations \(\hat {E} \in [E_Y]_{\Rightarrow }\) to their counterparts \({\varphi }(\hat {E}) \in V^E_{\varphi }\) is a bijection. Consequently, the fact that \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) follows directly from the following claim.

Claim 6.16.3

Let \(\hat {E}_{1},\hat {E}_{2} \in [E_Y]_{\Rightarrow }\). Then \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\) if and only if \({\varphi }(\hat {E}_{1}) {\diamond } {\varphi }(\hat {E}_{2})\).

Proof

Suppose that \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\). Then it follows from Lemma 6.12 that \({\varphi }(\hat {E}_{1}) \diamond {\varphi }(\hat {E}_{2})\). Suppose instead that \({\varphi }(\hat {E}_{1}) \diamond {\varphi }(\hat {E}_{2})\). Since \(\hat {E}_{1} \in [E_Y]_{\Rightarrow }\), it may be written as
$$y_{1}y_{2} {\ldots} y_{n} \doteq y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)}$$
where Y = {y1, y2,…,yn} and σ : {1,2,…,n}→{1,2,…,n} is a permutation.
By definition of ◇, there exists Z ∈{L, R} and \(\ell \in \mathbb {N}\) such that \({\varphi }(\hat {E}_{1}) \Rightarrow _Z^{\ell } {\varphi }(\hat {E}_{2})\). Suppose that Z = L. The case that Z = R is symmetric. For i ≥ 1, let Ei be the equation such that \({\varphi }(\hat {E}_{1}) \Rightarrow _L^i E_i\). Let k = |φ(yσ(1))|− 1. Then we may write Ek+ 1 as
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Let \(\hat {E}_3\) be the equation given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)}\). Then \(\hat {E}_{1} \Rightarrow \hat {E}_3\) so \(\hat {E}_3 \in [E_Y]_{\Rightarrow }\) and it follows from Fact 6.9 that \(E_{k+1} \in V_{\varphi }^E\). Hence we must have ℓ ≤ k + 1. Moreover, for 1 ≤ i ≤ k, there exist δ1, δ2 such that δ1δ2 = φ(yσ(1)) and δ1, δ2 ≠ ε and such that we may write Ei as
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Consequently, by Lemma 6.10, \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k. By definition, \({\varphi }(\hat {E}_{2}) \in V_{\varphi }^E\), so it follows that ℓ > k and thus ℓ = k + 1, meaning that in fact \(\hat {E}_{2} = \hat {E}_3\) and hence that \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\) as required. □

Claims 6.16.1 and 6.16.2 show that the graph \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\). Claim 6.16.3 shows that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\), so the statement of the lemma holds. □

The following lemma deals with the second statement of Theorem 6.8. It asserts that the subgraphs \({\mathscr{H}}^E_{\varphi }\) completely cover the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\): each edge and each vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) also belong to at least one subgraph \({\mathscr{H}}^E_{\varphi }\).

Lemma 6.17

Let E be a basic RWE. Then \({\mathscr{G}}^{\Rightarrow }_{[E]} = \bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\).

Proof

We have already shown in Lemma 6.15 that each vertex of \(\bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\) is a vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Moreover, it follows directly from the definition of \({\mathscr{H}}^E_{\varphi }\) that each edge in \(\bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\) is also an edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\). It remains to show that each vertex/edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a vertex/edge of \({\mathscr{H}}^E_{\varphi }\) for some φ ∈ ΦE. The main step is Claim 6.17.1 as follows.

Claim 6.17.1

For every \(E^{\prime } \in [E]_{\Rightarrow }\) and Z ∈{L, R}, there exist φ ∈ ΦE and \(E^{\prime \prime } \in V_{\varphi }^E\) such that \(E^{\prime \prime } \Rightarrow _Z^{*} E^{\prime }\).

Proof

Note that by Lemma 6.11, there exist E0 ∈ [E]⇒ and φ0 ∈ ΦE such that \(E_0 \in V_{{\varphi }_0}^E\), and thus the claim holds for \(E^{\prime } = E_0\). Note also that for every \(E^{\prime } \in [E]_{\Rightarrow }\), since ⇒∗ is an equivalence relation, we have \(E_0 \Rightarrow ^{*} E^{\prime }\). Thus it is sufficient to show that if the claim holds for Ei and \(E_i \Rightarrow E_{i+1}\), then it also holds for Ei+ 1.

Suppose that the claim holds for Ei ∈ [E]⇒ and that \(E_i \Rightarrow _{Z_i} E_{i+1}\). Then there exist φi ∈ ΦE and \(E^{\prime \prime }_i \in V^E_{{\varphi }_i}\) such that \(E^{\prime \prime }_i \Rightarrow _{Z_i}^{*} E_i\) and thus \(E^{\prime \prime }_i \Rightarrow _{Z_i}^{*} E_{i+1}\). Thus \(E_{i+1} \in H^E_{{\varphi }_i}\). If \(E_{i+1} \in V^E_{{\varphi }_i}\), then the claim holds trivially. Suppose instead that \(E_{i+1} \in U_{{\varphi }_i}^E\).

Let Y = var(E)∖Δ(E) and let EY = πY(E). Recall that there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds (the case that Statement 2 holds is symmetric). Then we may write Ei+ 1 as
$$ {\varphi}_{i}(y_{1}){\varphi}_{i}(y_{2}) {\ldots} {\varphi}_{i}(y_{n}) \doteq \delta_{2} {\varphi}_{i}(y_{\sigma(2)}) {\ldots} {\varphi}_{i}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{i}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{i}(y_{\sigma(n)})$$
where σ(ι) = 1, δ1δ2 = φi(yσ(1)) and δ1, δ2 ≠ ε. Furthermore, we have \(\hat {E} \in [E_Y]_{\Rightarrow }\) where \(\hat {E}\) is the equation given by
$$ y_{1}y_{2}{\ldots} y_{n} \doteq y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)}.$$
It is straightforward to see that for Z = L, \({\varphi }_i(\hat {E}) \Rightarrow _Z^{*} E_{i+1}\) and since \({\varphi }_i(\hat {E}) \in V^E_{{\varphi }_i}\) by definition, the claim holds in this case.
It remains to consider the case that Z = R. By Lemma 6.3, EY is basic and therefore indecomposable. Thus y1 ≠ yσ(1). Let φi+ 1 : Y → X∗ be the morphism such that φi+ 1(yσ(1)) = δ2, φi+ 1(y1) = δ1φi(y1), and φi+ 1(yj) = φi(yj) for 1 ≤ j ≤ n with j ∉{1,σ(1)}. Note that φi+ 1 ∈ ΦE since δ1 ∈ Δ(E)∗ and δ2 ∈ Δ(E)∗yσ(1). Let \(E^{\prime \prime }_{i+1}\) be the equation given by \({\varphi }_{i+1}(\hat {E})\), so that \(E^{\prime \prime }_{i+1} \in V^E_{{\varphi }_{i+1}}\). Then we may write \(E^{\prime \prime }_{i+1}\) as:
$$ \begin{array}{@{}rcl@{}} && \delta_{1} {\varphi}_{i}(y_{1}) {\varphi}_{i}(y_{2}) {\ldots} {\varphi}_{i}(y_{\sigma(1)-1}) \delta_{2} {\varphi}_{i}(y_{\sigma(1)+1}) {\ldots} {\varphi}_{i}(y_{n}) \\ \doteq && \delta_{2} {\varphi}_{i}(y_{\sigma(2)}) {\ldots} {\varphi}_{i}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{i}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{i}(y_{\sigma(n)}).\end{array} $$

Consequently \(E^{\prime \prime }_{i+1} \Rightarrow _R^{*} E_{i+1}\), so the claim holds for Ei+ 1 and by induction, it holds for all \(E^{\prime } \in [E]_{\Rightarrow }\). □

It follows directly from Claim 6.17.1 that every vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) belongs to \(H^E_{\varphi }\) for some φ ∈ ΦE and is consequently also a vertex of some subgraph \({\mathscr{H}}^E_{\varphi }\). To see why the same holds for edges, note firstly that for every edge (E1, E2) in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), there exists Z ∈{L, R} such that \(E_{1} \Rightarrow _Z E_{2}\). By Claim 6.17.1 and since E1 ∈ [E]⇒, there exist φ ∈ ΦE and \(E^{\prime } \in V^E_{\varphi }\) such that \(E^{\prime } \Rightarrow _Z^{*} E_{1}\). It follows that \(E^{\prime } \Rightarrow _Z^{*} E_{2}\), meaning that \(E_{1},E_{2} \in H^E_{\varphi }\) (so they are both vertices of \({\mathscr{H}}^E_{\varphi }\)). It follows by definition that (E1, E2) is an edge of \({\mathscr{H}}^E_{\varphi }\). □

The proof of Theorem 6.8 is completed by the following lemma which addresses the third statement of the theorem.

Lemma 6.18

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let \(d= \max \limits \{1,{diam}({\mathscr{G}}^{\Rightarrow }_{[E_Y]})\}\). Then \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(d|E|^2)\).

Proof

Let \(E^{\prime }, E^{\prime \prime } \in [E]_{\Rightarrow }\). Then by Lemma 6.17, there exist \({\varphi }^{\prime },{\varphi }^{\prime \prime } \in {{\varPhi }}_E\) such that \(E^{\prime } \in H^E_{{\varphi }^{\prime }}\) and \(E^{\prime \prime } \in H^E_{{\varphi }^{\prime \prime }}\). For each φ ∈ ΦE, note that by Lemma 6.16 and Remark 4.6, there is a path of length O(d Card(Δ(E))) between any two vertices in \(H^E_{\varphi }\). Thus if \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\), then there is a path of length O(d Card(Δ(E))) from \(E^{\prime }\) to \(E^{\prime \prime }\).

Suppose otherwise that \({\varphi }^{\prime } \not = {\varphi }^{\prime \prime }\). Then it follows from Lemma 6.7 that there exist k ∈ O(Card(Δ(E))) and φ1, φ2,…,φk ∈ ΦE such that \({\varphi }^{\prime } = {\varphi }_{1}\), \({\varphi }^{\prime \prime } = {\varphi }_k\) and φi, φi+ 1 are close for 1 ≤ i < k. By Lemma 6.14, there exist E1, E2,…,Ek− 1 such that \(E_i \in H^E_{{\varphi }_i} \cap H^E_{{\varphi }_{i+1}}\) for 1 ≤ i < k.

It follows that there exist paths of length O(d Card(Δ(E))) from \(E^{\prime }\) to E1, from Ek− 1 to \(E^{\prime \prime }\), and from Ei to Ei+ 1 for 1 ≤ i < k − 1. Thus there is a path from \(E^{\prime }\) to \(E^{\prime \prime }\) of length O(kd Card(Δ(E))) = O(d Card(Δ(E))2) = O(d|E|2). Since this is true for all \(E^{\prime },E^{\prime \prime }\), the statement of the lemma follows. □

7 Normal Forms and Block Decompositions

Having described the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for equations E which are not jumbled in the previous section, the current section focuses on the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is jumbled. Our main result in this direction is the existence of specific normal forms, from which every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is polynomial distance away. We present two normal forms, with the second being a restriction on the first. Both are constructed based on reversed structures in such a way that they allow for taking full advantage of the invariant ΥE from Section 5. A major advantage of this is that we are able to show later in Section 8 that the number of equations occurring as vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the second normal form is bounded by a polynomial in |E|, allowing us to prove that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also polynomial.

Since the results in this section mainly concern positive reachability statements, the technical content relies heavily on describing sequences of applications of ⇒. Certain sequences will occur repeatedly, so it is convenient to define some shorthand notations given in terms of the following ‘shortcut’ relations.

Definition 7.1 (\(\xrightarrow {u,v}\) and ⊙)

For each u, v ∈ X, we define the relation \(\xrightarrow {u,v} \) over basic regular equations as \(E_{1}\xrightarrow {u,v} E_{2}\) if there exist x, y ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{u, v, x, y})∗ such that E1 may be written as \(x \alpha _{1} u \alpha _{2} v \alpha _3 y \doteq y \beta _{1} u \beta _{2} v \beta _3 x\) and E2 may be written as \(x \alpha _{1} v \alpha _3 u \alpha _{2} y \doteq y \beta _{1} v \beta _3 u \beta _{2} x\). Additionally, we define \(\odot = \bigcup \limits _{u,v\in X} \xrightarrow {u,v} \).

Note that there exist u, v ∈ X such that \(E_{1}\xrightarrow {u,v} E_{2}\) if and only if E1 ⊙ E2. The following lemma verifies that if E1 ⊙ E2, then we can reach E2 from E1 by a short sequence of applications of the rewriting transformation ⇒, or equivalently, that there is a short path from E1 to E2 in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\).
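To make Definition 7.1 concrete, the following Python sketch (an illustration of ours, not part of the formal development) implements a single application of \(\xrightarrow {u,v}\) on equations represented as pairs of tuples of variable names; this representation and the name shortcut are assumptions of the sketch.

def shortcut(eq, u, v):
    # One application of the relation of Definition 7.1: on each side, with
    # the interior factored as a1 u a2 v a3, produce a1 v a3 u a2. Since the
    # equations are regular, index() below is unambiguous. Returns None if
    # the relation does not apply (u must occur strictly before v on both sides).
    def rewrite(side):
        first, mid, last = side[0], list(side[1:-1]), side[-1]
        if u not in mid or v not in mid:
            return None
        i, j = mid.index(u), mid.index(v)
        if not i < j:
            return None
        a1, a2, a3 = mid[:i], mid[i + 1:j], mid[j + 1:]
        return tuple([first] + a1 + [v] + a3 + [u] + a2 + [last])
    lhs, rhs = rewrite(eq[0]), rewrite(eq[1])
    return (lhs, rhs) if lhs and rhs else None

# E1 = x p u q v r y ≐ y s u t v w x (all six factors chosen of length one):
E1 = (("x", "p", "u", "q", "v", "r", "y"), ("y", "s", "u", "t", "v", "w", "x"))
assert shortcut(E1, "u", "v") == (("x", "p", "v", "r", "u", "q", "y"),
                                  ("y", "s", "v", "w", "u", "t", "x"))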

Lemma 7.2

Let x, y, u, v ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{x, y, u, v})∗. Let E1 be the basic RWE given by \(x \alpha _{1} u \alpha _{2} v \alpha _3 y \doteq y \beta _{1} u \beta _{2} v \beta _3 x\) and let E2 be the basic RWE given by \(x \alpha _{1} v \alpha _3 u \alpha _{2} y \doteq y \beta _{1} v \beta _3 u \beta _{2} x\). Then there exist n1, n2 < 4|E1| such that \(E_{1} \Rightarrow ^{n_{1}} E_{2}\) and \(E_{2} \Rightarrow ^{n_{2}} E_{1}\).

Proof

Let E3, E4, E5 be the equations given as follows:
$$ \begin{array}{@{}rcl@{}} E_{3}: &&v \alpha_{3} x \alpha_{1} u \alpha_{2} y \doteq y \beta_{1} u \beta_{2} v \beta_{3} x \\ E_{4}: &&x \alpha_{1} v \alpha_{3} u \alpha_{2} y \doteq u \beta_{2} y \beta_{1} v \beta_{3} x \\ E_{5}: &&v \alpha_{3} x \alpha_{1} u \alpha_{2} y \doteq u \beta_{2} y \beta_{1} v \beta_{3} x. \end{array} $$

Then it follows directly from the definitions that \(E_{1} \Rightarrow _R^{*} E_3 \Rightarrow _L^{*} E_5 \Rightarrow _R^{*} E_4 \Rightarrow _L^{*} E_{2}\). Thus, by Remark 3.2, there exists n1 < 4|E1| such that \(E_{1} \Rightarrow ^{n_{1}} E_{2}\). By the same remark, we know that \(\Rightarrow _L^{*},\Rightarrow _R^{*}\) are symmetric, and thus we may similarly conclude that \(E_{2} \Rightarrow _L^{*} E_4 \Rightarrow _R^{*} E_5 \Rightarrow _L^{*} E_3 \Rightarrow _R^{*} E_{1}\) so there exists n2 < 4|E1| such that \(E_{2} \Rightarrow ^{n_{2}} E_{1}\). □

Corollary 7.3

Let E1, E2 be basic RWEs. If \(E_{1} \odot ^{m} E_{2}\) for some \(m\in \mathbb {N}\), then \(E_{1} \Rightarrow ^{n} E_{2}\) for some n ∈ O(|E1|m).

The first of our two normal forms is defined as follows. Theorem 7.5 confirms the desired property that any jumbled basic RWE E can be transformed into an equation \(\overline {E}\) which is in normal form in a small (i.e. polynomial in |E|) number of rewriting steps.

Definition 7.4 (Normal Form)

Let E be a basic RWE. Then E is in normal form if it can be written as \(x \alpha _{1} \alpha _{2} {\ldots } \alpha _n y \doteq y \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _n^R x\) where x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ n, and |αi|≤ 3 for 1 ≤ i < n.
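Because a basic RWE has each variable exactly once on each side, the factor boundaries in Definition 7.4 are forced: a position is a boundary exactly when the LHS and RHS have seen the same set of variables up to that position. The following sketch (ours, in the representation used in the earlier shortcut sketch) exploits this to test the normal form.

def nf_factorization(E):
    # Factor the interior of a basic RWE x w y ≐ y w' x into the pieces
    # a_1, ..., a_n of Definition 7.4 (w = a_1...a_n, w' = a_1^R...a_n^R),
    # returning the list of pieces, or None if no such factorization exists.
    lhs, rhs = E
    x, y = lhs[0], lhs[-1]
    if (rhs[0], rhs[-1]) != (y, x) or len(lhs) != len(rhs):
        return None
    w, wr = lhs[1:-1], rhs[1:-1]
    if set(w) != set(wr):
        return None
    pieces, seen_l, seen_r, start = [], set(), set(), 0
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:                 # a boundary is forced here
            piece = w[start:p + 1]
            if tuple(reversed(piece)) != wr[start:p + 1]:
                return None                  # the piece is not reversed on the RHS
            pieces.append(piece)
            start = p + 1
    return pieces

def is_normal_form(E):
    pieces = nf_factorization(E)
    return pieces is not None and all(len(a) <= 3 for a in pieces[:-1])

assert is_normal_form((("x", "z1", "z2", "z3", "y"), ("y", "z2", "z1", "z3", "x")))
assert not is_normal_form((("x", "z1", "z2", "z3", "z4", "z5", "y"),
                           ("y", "z4", "z3", "z2", "z1", "z5", "x")))  # |a_1| = 4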

Theorem 7.5

Let E be a jumbled basic RWE. Then there exists \(\overline {E}\) which is in normal form and such that \(E \Rightarrow ^{n_{1}} \overline {E}\) and \(\overline {E} \Rightarrow ^{n_{2}} E\) for some n1, n2O(|E|3).

The main step in the proof of Theorem 7.5 is the following lemma, which we shall make use of again later and is therefore stated independently.

Lemma 7.6

Let E be a jumbled basic RWE of the form \(x \gamma _{1} \beta _{1} y \doteq y \gamma _{2} \beta _{2} x\) where x, y ∈ X, γ1, γ2, β1, β2 ∈ (X∖{x, y})∗ and var(γ1) = var(γ2). Then at least one of the following two statements holds:
  1. \(\beta _{1} = \beta _{2}^R\), or
  2. there exists α ∈ var(β1)∗ with 1 ≤|α|≤ 3, η1, η2 ∈ var(β1)∗ and n ∈ O(|E|) such that \(E \odot ^n x\gamma _{1} \alpha \eta _{1} y \doteq y \gamma _{2} \alpha ^R \eta _{2} x\).

Proof

Throughout this proof, we shall use the fact that \(E_{1} \xrightarrow {u,v} E_{2}\) implies E1 ⊙ E2 and shall use the two notations interchangeably where convenient. Let E be a jumbled basic RWE of the form \(x \gamma _{1} \beta _{1} y \doteq y \gamma _{2} \beta _{2} x\) where x, y ∈ X, γ1, γ2, β1, β2 ∈ (X∖{x, y})∗ and var(γ1) = var(γ2). Suppose that \(\beta _{1} \not = \beta _{2}^R\). Note that since E is basic and regular, var(β1) = var(β2), and moreover we have that β1, β2 ≠ ε. Hence we may write E in the form:
$$ x\gamma_{1} u \delta_{1} y \doteq y \gamma_{2} \delta_{2} u \delta_{3} x $$
(1)
where u ∈ X and δ1, δ2, δ3 ∈ X∗ such that uδ1 = β1 and δ2uδ3 = β2. If δ2 = ε then we can set α = u and we are done. Otherwise, the next step is to show that we can get to an equation of the form
$$ x\gamma_{1} u \delta_{1}^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \delta_{3}^{\prime} x $$
(2)
where z1, z2,…,zk ∈ X and \(\delta _{1}^{\prime },\delta _{2}^{\prime }, \delta _{3}^{\prime } \in X^{*}\). Suppose that our equation of the form (1) is not already of the form (2). Suppose firstly that there exist v1, v2 ∈ X such that v1 and v2 occur in the same order in δ1 and δ2. In other words, suppose there exist δ1,1, δ1,2, δ1,3, δ2,1, δ2,2, δ2,3 ∈ X∗ such that we can write δ1 = δ1,1v1δ1,2v2δ1,3 and δ2 = δ2,1v1δ2,2v2δ2,3. Then we have that \(E \xrightarrow {v_{1},v_{2}} E_{1,2}\) where E1,2 is given by \(x\gamma _{1} u \hat {\delta _{1}} y \doteq y \gamma _{2} \hat {\delta _{2}} u \hat {\delta _3} x\) such that \(|\hat {\delta _{2}}| <|\delta _{2}|\), with \(\hat {\delta _{1}} = \delta _{1,1} v_{2} \delta _{1,3} v_{1} \delta _{1,2} \), \(\hat {\delta _{2}} = \delta _{2,1} v_{2} \delta _{2,3}\) and \(\hat {\delta _3} = \delta _3 v_{1} \delta _{2,2}\).
Iterating this, we may thus conclude that there exists n1 ≤|δ2| and a sequence \(E = E_{1,1} \odot E_{1,2} \odot {\ldots } \odot E_{1,n_{1}}\) such that \(E_{1, n_{1}}\) has the form
$$x\gamma_{1} u \hat{\delta}_{1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \hat{\delta}_{3}^{\prime} x$$
where z1, z2,…,zk ∈ X and \(\hat {\delta }_{1}^{\prime } \in X^{*} z_{1} X^{*} z_{2} X^{*} {\ldots } X^{*} z_k X^{*}\). If all the internal X∗ factors are the empty word (i.e. if \(\hat {\delta }_{1}^{\prime } \in X^{*}z_{1}z_{2} {\ldots } z_k X^{*}\)), then \(E_{1,n_{1}}\) already has the desired form described by (2). Otherwise, there exists w ∈ X∖{z1, z2,…,zk} such that w occurs between z1 and zk in \(\hat {\delta }_{1}^{\prime }\). More precisely, we can write \(E_{1,n_{1}}\) as:
$$x \gamma_{1} u \hat{\delta}_{1,1} z_{1} \hat{\delta}_{1,2} w \hat{\delta}_{1,3} z_{k} \hat{\delta}_{1,4} y \doteq y \gamma_{2} z_{k} \hat{\delta}_{2,1} z_{1} \hat{\delta}_{3,1} w \hat{\delta}_{3,2} x$$
where \(\hat {\delta }_{1,1}, \hat {\delta }_{1,2}, \hat {\delta }_{1,3}, \hat {\delta }_{1,4}, \hat {\delta }_{2,1}, \hat {\delta }_{3,1}, \hat {\delta }_{3,2} \in X^{*}\) such that \(\hat {\delta }_{1}^{\prime } = \hat {\delta }_{1,1}z_{1} \hat {\delta }_{1,2} w \hat {\delta }_{1,3}z_k \hat {\delta }_{1,4}\), and \(z_{k-1} z_{k-2} {\ldots } z_{2} = \hat {\delta }_{2,1}\), and \(\hat {\delta }_{3}^{\prime } = \hat {\delta }_{3,1} w \hat {\delta }_{3,2}\). In this case we have \(E_{1,n_{1}} \xrightarrow {z_{1},w} E_{2,1} \xrightarrow {z_k,z_{1}} E_{2,2}\) where E2,1 is given by
$$ x\gamma_{1} u \hat{\delta}_{1,1}w\hat{\delta}_{1,3}z_{k}\hat{\delta}_{1,4} z_{1}\hat{\delta}_{1,2} y \doteq y \gamma_{2} z_{k} \hat{\delta}_{2,1}w \hat{\delta}_{3,2} z_{1} u \hat{\delta}_{3,1} x $$
and E2,2 is given by
$$ x\gamma_{1} u \hat{\delta}_{1,1}w\hat{\delta}_{1,3} z_{1}\hat{\delta}_{1,2} z_{k} \hat{\delta}_{1,4}y \doteq y \gamma_{2} z_{1} u \hat{\delta}_{3,1} z_{k} \hat{\delta}_{2,1}w \hat{\delta}_{3,2} x $$
which is again of the desired form described by (2), in this case with k = 1 (the single zi being z1). In all cases, there exists n2 ≤|E| such that \(E \odot ^{n_{2}} E_{2,2}\) for some equation E2,2 of the desired form (2).
Now suppose that E2,2 has the form (2), and define \(\delta _{1}^{\prime },\delta _{2}^{\prime },\delta _{3}^{\prime }\) accordingly. Next, we note that there exists n3 ∈{0,1} such that \(E_{2,2} \odot ^{n_3} E_{3}\) where E3 has the form
$$ x\gamma_{1} u^{\prime} z_{1} z_{2} {\ldots} z_{k} \delta_{1}^{\prime\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \delta_{2}^{\prime\prime} x $$
(3)
where \(u^{\prime } \in X\) and \(\delta _{1}^{\prime \prime }, \delta _{2}^{\prime \prime } \in X^{*}\). Indeed, if \(\delta _{1}^{\prime } = \varepsilon \), then this is trivial, simply taking E3 = E2,2. Otherwise, there exist \(u^{\prime } \in X\) and \(\delta _{1,1}^{\prime }, \delta _{3,1}^{\prime }, \delta _{3,2}^{\prime } \in X^{*}\) such that \(\delta _{1}^{\prime } = \delta _{1,1}^{\prime }u^{\prime }\) and \(\delta _{3}^{\prime } = \delta _{3,1}^{\prime } u^{\prime } \delta _{3,2}^{\prime }\). Then E2,2 may be written as:
$$ x\gamma_{1} u \delta_{1,1}^{\prime} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \delta_{3,1}^{\prime} u^{\prime} \delta_{3,2}^{\prime} x $$
and \(E_{2,2} \xrightarrow {u, u^{\prime }} E_3\) where E3 is given by
$$ x\gamma_{1} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} u \delta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \delta_{3,2}^{\prime} u \delta_{3,1}^{\prime} x $$
which is of the form (3) as required. Now, if k ≤ 2, we may take \(\alpha = u^{\prime }z_{1}z_{2} {\ldots } z_k\), \(\eta _{1} =\delta _{2}^{\prime } u \delta _{1,1}^{\prime }\) and \(\eta _{2} = \delta _{3,2}^{\prime } u \delta _{3,1}^{\prime }\) and we are done. Suppose otherwise that k ≥ 3. Next, we observe that if \(u\delta _{1,1}^{\prime }\) and \(u\delta _{3,1}^{\prime }\) share a non-empty suffix, then we have an equation of the form \(x {\ldots } s y \doteq y {\ldots } s x\). However, this implies that \((s,s) \in {\varUpsilon }_{\!E_3}\), and by Theorem 5.3, \({\varUpsilon }_{\!E_3} = {\varUpsilon }_{\!E}\), meaning that E is not jumbled: a contradiction. Consequently, there must exist s, t ∈ X with s ≠ t and \(\beta _{1,1}^{\prime },\beta _{1,2}^{\prime },\beta _{2,1}^{\prime },\beta _{2,2}^{\prime } \in X^{*}\) such that E3 has the form
$$ x\gamma_{1} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} s \beta_{1,2}^{\prime} t y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \beta_{2,1}^{\prime} t \beta_{2,2}^{\prime} s x. $$
Then we have \(E_3 \xrightarrow {z_{2},s} E_{4,1} \xrightarrow {z_{1},t} E_{4,2} \xrightarrow {z_k,z_{1}} E_{4,3}\xrightarrow {u^{\prime },z_{k-1}} E_{4,4}\) where E4,1, E4,2, E4,3, E4,4 are given as follows:
$$ \begin{array}{@{}rcl@{}} E_{4,1}: && x\gamma_{1} u^{\prime} z_{1} s \beta_{1,2}^{\prime} t z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} z_{1} u^{\prime} \beta_{2,1}^{\prime} t \beta_{2,2}^{\prime} x \\ E_{4,2}: && x\gamma_{1} u^{\prime} t z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} z_{1} s \beta_{1,2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} z_{1} u^{\prime} \beta_{2,1}^{\prime}x \\ E_{4,3}: && x\gamma_{1} u^{\prime} t z_{2}{\ldots} z_{k-1} z_{1} s \beta_{1,2}^{\prime} z_{k} \beta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{1} u^{\prime} \beta_{2,1}^{\prime} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} x \\ E_{4,4}: && x\gamma_{1} z_{k-1} z_{1} s \beta_{1,2}^{\prime} z_{k} \beta_{1,1}^{\prime} u^{\prime} t z_{2} {\ldots} z_{k-2} y \doteq y \gamma_{2} z_{1} z_{k-1} z_{k-2} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} u^{\prime} \beta_{2,1}^{\prime} z_{k} x. \end{array} $$
Now, E4,4 has the required form with α = zk− 1z1, \(\eta _{1} = s \beta _{1,2}^{\prime } z_k \beta _{1,1}^{\prime } u^{\prime } t z_{2} {\ldots } z_{k-2}\) and \(\eta _{2} = z_{k-2} {\ldots } z_3 s z_{2} t \beta _{2,2}^{\prime } u^{\prime } \beta _{2,1}^{\prime } z_k\). Moreover, we have that \(E \odot ^{n} E_{4,4}\) with n ≤ n2 + n3 + 4 ≤|E| + 5 ∈ O(|E|) as claimed. □

We can now prove Theorem 7.5 with a simple induction based on Lemma 7.6.

Proof of Theorem 7.5

By Lemma 6.13, we have that \(E \Rightarrow ^{n_{1}} E^{\prime }\) and \(E^{\prime } \Rightarrow ^{n_{1}^{\prime }} E\) where \(E^{\prime }\) is a basic regular equation of the form \(x \beta _{1} y \doteq y \beta _{2} x\) such that x, y ∈ X and β1, β2 ∈ (X∖{x, y})∗ with \(n_{1},n_{1}^{\prime } \in O(|E|^2)\). By Theorem 5.3, since E is jumbled, \(E^{\prime }\) is also jumbled. By a simple induction using Lemma 7.6 (starting with the case that γ1 = γ2 = ε) we can therefore infer that \(E^{\prime } \odot ^{n_{2}} \overline {E}\) for some \(\overline {E}\) in normal form and n2 ∈ O(|E|2). It follows directly from the definitions that ⊙ is symmetric, so we also have that \(\overline {E} \odot ^{n_{2}} E^{\prime }\). Thus, by Corollary 7.3 we have that \(E^{\prime } \Rightarrow ^{n_3} \overline {E}\) and \(\overline {E} \Rightarrow ^{n_{3}^{\prime }} E^{\prime }\) for some \(n_3,n_{3}^{\prime } \in O(|E|^3)\), and therefore also that \(E \Rightarrow ^n \overline {E}\) and \(\overline {E} \Rightarrow ^{n^{\prime }} E\) for some \(n,n^{\prime } \in O(|E|^3)\) as claimed. □

The idea behind the first normal form is to divide the RWE into pairs \((\alpha _i, \alpha _i^R)\) which are regular-reversed word equations (although solutions to the full equation E are not necessarily solutions to these smaller equations), and for which all but one belong to a finite number of cases (i.e. three cases depending on the length of αi). Forcing the sub-equations to be regular-reversed gives us the most control when working with the invariant ΥE. Some intuition behind this fact can be derived from the observation that if we know that a (complete) basic RWE E is regular-reversed, we can uniquely reconstruct it from the leftmost two variables on the LHS and ΥE. Indeed, any regular-reversed basic RWE E can be written in the form \(x_{1}x_{2} {\ldots } x_n \doteq x_n x_{n-1} {\ldots } x_{1}\), meaning that ΥE = {(xi− 1, xi+ 1)∣2 ≤ i < n}∪{(xn− 1, x2)}, and if we know x1, then we may infer from ΥE all the odd-index variables (x3, x5,…) and if we know x2 then we may infer all the even-index variables (x4, x6,…).
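The following sketch (ours) makes this reconstruction explicit. It hard-codes only the pair set displayed above, not the general definition of ΥE from Section 5, and assumes the list xs holds x1,…,xn in order.

def upsilon_regular_reversed(xs):
    # The pair set displayed above for x_1 ... x_n ≐ x_n ... x_1 (xs[0] = x_1):
    n = len(xs)
    pairs = {(xs[i - 2], xs[i]) for i in range(2, n)}   # (x_{i-1}, x_{i+1})
    pairs.add((xs[n - 2], xs[1]))                       # (x_{n-1}, x_2)
    return pairs

def reconstruct(x1, x2, pairs):
    # Each pair (p, q) says that q occurs two positions to the right of p, so
    # the odd- and even-indexed variables form two chains starting at x1, x2.
    step = dict(pairs)
    seq = [x1, x2]
    while len(seq) < len(pairs) + 1:   # an equation of length n yields n - 1 pairs
        seq.append(step[seq[-2]])
    return seq

xs = ["x1", "x2", "x3", "x4", "x5", "x6"]
assert reconstruct("x1", "x2", upsilon_regular_reversed(xs)) == xs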

Rather than looking at the pairs \((\alpha _i, \alpha _i^R)\) in isolation, in order to take full advantage of the invariant ΥE, we actually need to consider pairs of the form
$$(\alpha_{i} \alpha_{i+1} {\ldots} \alpha_{j}, {\alpha_{i}^{R}} \alpha_{i+1}^{R} {\ldots} {\alpha_{j}^{R}})$$
for well-chosen values i and j. We shall call such pairs blocks, which we define formally below.

Definition 7.7 (Blocks)

We define 3 variations of blocks which may each have up to two types.
  1. A standard block is a pair \((\alpha _{1} \alpha _{2} {\ldots } \alpha _j, \alpha _{1}^R\alpha _{2}^R{\ldots } \alpha _j^R)\) such that j ≥ 1, αi ∈ X+ for 1 ≤ i ≤ j, |α1|∈{1,3}, and for each i, 1 < i ≤ j, |αi| = 2. It is Type A if |α1| = 1 and Type B if |α1| = 3.
  2. An initial block is a pair \((x \alpha _{1} {\ldots } \alpha _j, y \alpha _{1}^R {\ldots } \alpha _j^R)\) with j ≥ 0, x, y ∈ X with x ≠ y, and αi ∈ (X∖{x, y})∗ where |αi| = 2 for 1 ≤ i ≤ j. All initial blocks are Type A.
  3. A final block is a pair (γ1δy, γ2δRx) where x, y ∈ X with x ≠ y, and γ1, γ2, δ ∈ X∗ with |δ|≥ 1 such that (γ1, γ2) is a block (initial or standard). It is Type A if (γ1, γ2) is Type A, and Type B otherwise.

Given an equation which is in normal form, we may decompose it uniquely into blocks in the following manner. The intuition behind this decomposition is that if we fix the invariant property ΥE, then each block (with the exception of the final block) is determined entirely by the block preceding it along with its first (leftmost in the first element) variable. This gives us a crucial degree of control when considering which equations in normal form may appear in \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Definition 7.8 (Block Decomposition)

Let E be a basic RWE in normal form. Then E may be written as \(x \alpha _{1}\alpha _{2}{\ldots } \alpha _n y \doteq y \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _n^R x\) where x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ n, and |αi|≤ 3 for 1 ≤ i < n. Let I = {i1, i2,…,ik} = {i∣1 ≤ i < n and |αi|≠ 2} with 1 ≤ i1 < i2 < … < ik < n. If I = ∅, let \(\mathfrak {B} = (E)\). Otherwise, let \(\mathfrak {B} = (B_0,B_{1}, \ldots , B_k)\) where for 0 ≤ j ≤ k, the Bj are blocks such that:
  1. \(B_0 = (x \alpha _{1} {\ldots } \alpha _{i_{1}-1}, y \alpha _{1}^R {\ldots } \alpha _{i_{1}-1}^R)\),
  2. \(B_k = (\alpha _{i_k} {\ldots } \alpha _n y, \alpha _{i_k}^R {\ldots } \alpha _n^R x)\), and
  3. for 1 ≤ j < k, \(B_j = (\alpha _{i_j} {\ldots } \alpha _{i_{j+1}-1}, \alpha _{i_j}^R {\ldots } \alpha _{i_{j+1}-1}^R)\).
Then \(\mathfrak {B}\) is the block decomposition of E.
As an example, consider the basic RWE E given as follows:
$$ x \overbrace{z_{1} z_{2} }^{\alpha_{1}} \overbrace{z_{3}}^{\alpha_{2}} \overbrace{z_{4} z_{5} z_{6}}^{\alpha_{3}} \overbrace{z_{7} z_{8}}^{\alpha_{4}} \overbrace{z_{9}}^{\alpha_{5}} \overbrace{z_{10} z_{11} z_{12} z_{13} }^{\alpha_{6}} y \doteq y \overbrace{z_{2} z_{1} }^{{\alpha_{1}^{R}}} \overbrace{z_{3}}^{{\alpha_{2}^{R}}} \overbrace{z_{6} z_{5} z_{4}}^{{\alpha_{3}^{R}}} \overbrace{z_{8} z_{7}}^{{\alpha_{4}^{R}}} \overbrace{z_{9}}^{{\alpha_{5}^{R}}} \overbrace{z_{13} z_{12} z_{11} z_{10} }^{{\alpha_{6}^{R}}} x $$
Note that E is in normal form. Then I = {2,3,5} and the block decomposition of E is (B0, B1, B2, B3) where:
$$ \begin{array}{@{}rcl@{}} B_{0} &=& (xz_{1}z_{2}, yz_{2}z_{1})\\ B_{1} &=& (z_{3}, z_{3})\\ B_{2} &=& (z_{4}z_{5}z_{6}z_{7}z_{8}, z_{6}z_{5}z_{4}z_{8}z_{7})\\ B_{3} &=& (z_{9}z_{10}z_{11}z_{12}y, z_{9}z_{12}z_{11}z_{10}x). \end{array} $$
Another example illustrating the block decomposition of an equation in normal form is given in Fig. 6. The next fact follows directly from the definitions.
Fig. 6

A depiction of the equation E given by \(x z_{1} z_{2} z_3 z_4 z_5 z_6 z_7 z_8 z_9 z_{10} z_{11} z_{12} z_{13} z_{14} z_{15} y \doteq y z_{2} z_{1} z_5 z_4 z_3 z_7 z_6 z_{8} z_{10} z_9 z_{11} z_{15} z_{14} z_{13} z_{12} x\) where x, y and zi for 1 ≤ i ≤ 15 are variables. The LHS and RHS of the equation are aligned vertically. The block decomposition \(\mathfrak {B} = (B_0,B_{1},B_{2},B_3)\) of E is shown with solid rectangles and with the variety and type of the block written beneath. The additional divisions into the factors \(\alpha _i, \alpha _i^R\) required by the definition of normal form are indicated by dashed lines (so that α1 = z1z2, α2 = z3z4z5, α3 = z6z7, α4 = z8, α5 = z9z10, α6 = z11 and α7 = z12z13z14z15). In order for the equation to satisfy the definition of Lex Normal Form, the variables highlighted in bold must be lexicographically minimal with respect to the appropriate sets \({{\varGamma }}^E_i\). For i = 1, we have that \({{\varGamma }}^E_{1} = \{z_i \mid 3 \leq i \leq 15\} \backslash \{z_4\}\). In particular, \({{\varGamma }}^E_{1}\) consists of the first variable in the block B1 (namely z3) along with (nearly) all variables on the LHS of the equation occurring to the right of z3, excluding the rightmost variable (y), and since B1 is Type B, also excluding the second variable in the block B1 (namely z4). On the other hand, since B2 is Type A, for i = 2, we do not need to exclude the second variable in the block B2, so \({{\varGamma }}^E_{2} = \{ z_i \mid 8 \leq i \leq 15\}\). Assuming an underlying lexicographic order for which zi+ 1 is greater than zi, we can conclude that E is in Lex Normal Form

Fact 7.9

For every basic RWE in normal form, there exists a unique block decomposition (B0, B1,…,Bk) where k ≤Card(var(E)), Bk is a final block, and if k > 0, then B0 is an initial block.
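The decomposition can be computed in the same way as the normal-form factorization in the earlier sketch: recover the factors αi via the greedy cut-point scan, collect the index set I, and group the factors. The following sketch (ours, in the same representation as before) reproduces the worked example following Definition 7.8.

def block_decomposition(E):
    lhs, rhs = E
    x, y = lhs[0], lhs[-1]
    w, wr = lhs[1:-1], rhs[1:-1]
    # Recover the factors a_1, ..., a_n exactly as in the normal-form sketch:
    cuts, seen_l, seen_r = [0], set(), set()
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:
            cuts.append(p + 1)
    alpha = [w[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
    assert all(tuple(reversed(a)) == wr[c:c + len(a)]
               for a, c in zip(alpha, cuts)), "E is not in normal form"
    n = len(alpha)
    I = [i for i in range(n - 1) if len(alpha[i]) != 2]   # 0-indexed version of I
    if not I:
        return [E]
    bounds = [0] + I + [n]   # factor ranges of the blocks B_0, ..., B_k
    blocks = []
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        blocks.append((sum(alpha[lo:hi], ()),
                       sum((tuple(reversed(a)) for a in alpha[lo:hi]), ())))
    blocks[0] = ((x,) + blocks[0][0], (y,) + blocks[0][1])     # initial block
    blocks[-1] = (blocks[-1][0] + (y,), blocks[-1][1] + (x,))  # final block
    return blocks

lhs = tuple(["x"] + [f"z{i}" for i in range(1, 14)] + ["y"])
rhs = ("y", "z2", "z1", "z3", "z6", "z5", "z4", "z8", "z7",
       "z9", "z13", "z12", "z11", "z10", "x")
B = block_decomposition((lhs, rhs))
assert B[0] == (("x", "z1", "z2"), ("y", "z2", "z1"))
assert B[1] == (("z3",), ("z3",))
assert B[2] == (("z4", "z5", "z6", "z7", "z8"), ("z6", "z5", "z4", "z8", "z7"))
assert B[3] == (("z9", "z10", "z11", "z12", "z13", "y"),
                ("z9", "z13", "z12", "z11", "z10", "x"))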

Since the blocks are fixed by their first variable, it is natural to ask for which variables we can find an equation in our graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) such that the block begins with that variable. In particular, can we find an equation in normal form in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for which the first variable of each block is lexicographically minimal when reading from left to right? The answer to the question is “nearly”. In other words, if we relax the notion slightly to account for some specific exceptions, then we can always guarantee the existence of such an equation. This leads to the notion of Lex Normal Form defined below.

Definition 7.10 (Lex Normal Form)

Let E be a basic RWE in normal form. Then there exist x, y ∈ X and α, β ∈ (X∖{x, y})∗ such that E has the form \(x \alpha y \doteq y \beta x\). Let (B0, B1,…,Bk) be the block decomposition of E. For each i, 0 ≤ i ≤ k, let \(\gamma _i, \gamma _i^{\prime } \in X^{*}\) such that \(B_i = (\gamma _i,\gamma _i^{\prime })\), let Si = {γi[2],y} whenever Bi is Type B and Si = {y} otherwise, and let \({{\varGamma }}^E_i = \left (\bigcup \limits _{i \leq j \leq k} {var}(\gamma _j)\right )\backslash S_i \). A block Bi is lex-minimal if γi[1] is lexicographically minimal in \({{\varGamma }}^E_i\). The equation E is in Lex Normal Form (LNF) if, for each i, 0 < i < k, Bi is lex-minimal.

Lex Normal Form (see also Fig. 6 for an example) describes the class of equations for which the first variable of each block is lexicographically minimal whenever possible. We can, in general, guarantee the existence of an equation \(E^{\prime }\) in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) such that the first variable of each block is lexicographically minimal with the following exceptions. Firstly, we must exclude the first and last blocks (the first block is fixed completely by ΥE). Secondly, we must only compare the first variable to other variables occurring further right in the LHS of the equation, excluding the rightmost variable on the LHS of the equation (y in the definition above) and, for blocks of Type B, the second variable in the block. The sets \({{\varGamma }}^E_i\) in the definition account for these exclusions.
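Putting Definitions 7.8 and 7.10 together, the sketch below (ours; it inlines the factorization logic of the previous sketches so that it runs on its own) tests Lex Normal Form. The lexicographic order on variable names is passed as a key function, since it is a parameter of the definition; the assertion checks the equation of Fig. 6 under an order for which zi+ 1 is greater than zi.

def is_lex_normal_form(E, key=str):
    lhs, rhs = E
    y = lhs[-1]
    w, wr = lhs[1:-1], rhs[1:-1]
    cuts, seen_l, seen_r = [0], set(), set()
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:
            cuts.append(p + 1)
    alpha = [w[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
    n = len(alpha)
    I = [i for i in range(n - 1) if len(alpha[i]) != 2]
    if not I:
        return True                  # a single (final) block: nothing to check
    bounds = [0] + I + [n]
    k = len(bounds) - 2              # the blocks are B_0, ..., B_k
    for j in range(1, k):            # lex-minimality is required for 0 < i < k only
        gamma = sum(alpha[bounds[j]:bounds[j + 1]], ())
        tail = sum(alpha[bounds[j]:], ()) + (y,)       # var(gamma_j ... gamma_k)
        type_b = len(alpha[bounds[j]]) == 3            # leading factor of length 3
        S = {gamma[1], y} if type_b else {y}
        if min(set(tail) - S, key=key) != gamma[0]:
            return False
    return True

L = tuple(["x"] + [f"z{i}" for i in range(1, 16)] + ["y"])
R = ("y", "z2", "z1", "z5", "z4", "z3", "z7", "z6", "z8",
     "z10", "z9", "z11", "z15", "z14", "z13", "z12", "x")
assert is_lex_normal_form((L, R), key=lambda v: (len(v), v))   # the Fig. 6 equation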

The main result of this section is that every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is never more than a polynomial distance away from a vertex corresponding to an equation in LNF.

Theorem 7.11

Let E be a jumbled basic RWE. Then there exists \(E^{\prime }\) such that \(E^{\prime }\) is in Lex Normal Form, and such that \(E \Rightarrow ^{n_{1}} E^{\prime }\) and \(E^{\prime } \Rightarrow ^{n_{2}} E\) for some n1, n2O(|E|4).

Although Theorem 7.11 does not provide as detailed a description of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the jumbled case as Theorem 6.8 does in the non-jumbled case, it does allow us to study them as the polynomial-distance neighbourhoods of the highly restricted set of vertices corresponding to equations in Lex Normal Form. Section 8 gives a strong example of the benefits of this approach, allowing us to show firstly that the cardinality of the set of vertices in Lex Normal Form is bounded by a polynomial in |E| (in contrast to the fact that the total number of vertices will typically be exponential, as shown in Section 9), and consequently, that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also bounded by a polynomial in |E|.

Proof of Theorem 7.11

The rest of this section is devoted to proving Theorem 7.11. To do so, we essentially provide a strategy for rewriting any jumbled basic regular word equation E into an equation in Lex Normal Form. The overall structure is similar to that of Theorem 7.5 in the sense that we transform the equation in steps from left to right so that after each step, the prefixes of the LHS and RHS having the desired form are longer. Since each side of the equation stays the same length under the transformations, we eventually reach a state where the entire equation is in the correct form.

The first step in this strategy is to ensure that E is in normal form (which we can do due to Theorem 7.5). We can then decompose E into blocks according to Definition 7.8 (see also Fig. 6). In each subsequent step, we apply transformations which increase the number of blocks satisfying the requirements for Lex Normal Form. In particular, if the first j blocks satisfy the requirements for Lex Normal Form, then we apply a sequence of transformations which either preserve the first j − 1 blocks and turn the jth block into a final block, or preserve the first j blocks and result in an equation which is also in normal form and for which the j + 1th block also satisfies the requirements for Lex Normal Form. Note that Lex Normal Form does not impose any additional constraints on the initial or final blocks, so we can start with j = 1 and we are done whenever we produce a final block.

There are two cases depending on whether the j + 1th block is Type A or Type B. The case that it is Type A is substantially the easier of the two and is considered directly in the proof of Lemma 7.17. Lemmas 7.12-7.16 focus on the case that the block is Type B. In this case, there exist x, y, a, b, c ∈ X and \(\mu _{1},\mu _{1}^{\prime },\mu _{2},\mu _{2}^{\prime } \in X^{*}\) such that our equation may be written as
$$ x \mu_{1} abc \mu_{2} y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} x $$
where \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\) and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), the prefixes xμ1 and \(y \mu _{1}^{\prime }\) constitute the first j blocks (the ones satisfying the requirements for LNF), and such that the j + 1th block, which does not satisfy the requirements for LNF, has the form \((abc\gamma , cba\gamma ^{\prime })\) for prefixes \(\gamma ,\gamma ^{\prime }\) of \(\mu _{2},\mu _{2}^{\prime }\) respectively. Our aim is to transform the equation above into an equation either of the form:
$$ x\mu_{1} \beta y \doteq y \mu_{1}^{\prime} \beta^{R} x$$
in which case the jth block becomes final (and all other blocks are preserved), or of the form:
$$ x \mu_{1} zbw \eta y \doteq y \mu_{1}^{\prime} wbz \eta^{\prime} x $$
where w, z ∈ X and \(\eta ,\eta ^{\prime } \in X^{*}\), such that either \(\eta ^{\prime } = \eta ^R\) (meaning \((zbw \eta y, wbz \eta ^{\prime } x)\) is a final block), or z is lexicographically minimal in \({{\varGamma }}^E_{j+1} = {var}(\mu _{2}) \cup \{a,c\}\).

In the case that \(\eta ^{\prime } = \eta ^R\), then the new equation is in normal form and will have a block decomposition with j + 1 blocks, such that the first j blocks are the same as before, and thus satisfy the requirements for LNF. The j + 1th block is final, and trivially satisfies the requirements for LNF, so the whole equation is in LNF. In the second case, we can apply Lemma 7.6 to further transform our equation into one in normal form without changing the prefixes xμ1zbw and \(y\mu _{1}^{\prime } wbz\). In the resulting block decomposition, the first j blocks will remain unchanged, while the j + 1th block will have the form \((zbw\gamma ,wbz \gamma ^{\prime })\) for some \(\gamma ,\gamma ^{\prime } \in {{{\varGamma }}^E_{j+1}}^{*}\). Since \({{\varGamma }}^E_{j+1}\) will also remain unchanged, z is lexicographically minimal in \({{\varGamma }}^{E^{\prime }}_{j+1}\) for our new equation \(E^{\prime }\), so the j + 1th block also satisfies the requirements for LNF as intended.

The following lemma shows us how, under the rewriting transformation ⊙, we can replace the factors abc and cba with factors dbe and ebd, providing that d, e ∈ X occur in the appropriate positions (namely directly left of y and x on the LHS and RHS respectively).

Lemma 7.12

Let \(E,E^{\prime }\) be basic RWEs given by
$$ \begin{array}{@{}rcl@{}} &&E:\quad x \mu_{1} abc \mu_{2} d \mu_{3} e y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} e \mu_{3}^{\prime} d x\\ &&E^{\prime}\!\!: \quad x \mu_{1} ebd \mu_{3} c \mu_{2} a y \doteq y \mu_{1}^{\prime} dbe \mu_{3}^{\prime} a \mu_{2}^{\prime} c x \end{array} $$

where x, y, a, b, c, d, e ∈ X and \(\mu _{1},\mu _{2},\mu _3, \mu _{1}^{\prime },\mu _{2}^{\prime },\mu _{3}^{\prime } \in X^{*}\). Then \(E \odot ^3 E^{\prime }\).

Proof

It follows from the definitions that:
$$ \begin{array}{@{}rcl@{}} &&\overbrace{x \mu_{1} abc \mu_{2} d \mu_{3} e y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} e \mu_{3}^{\prime} d x}^{E}\\ \xrightarrow{b,e} &&x \mu_{1} a e bc \mu_{2} d \mu_{3} y \doteq y \mu_{1}^{\prime} ce \mu_{3}^{\prime} d ba \mu_{2}^{\prime} x\\ \xrightarrow{c,d} &&x \mu_{1} a e bd \mu_{3} c \mu_{2} y \doteq y \mu_{1}^{\prime} d ba \mu_{2}^{\prime} ce \mu_{3}^{\prime} x\\ \xrightarrow{a,e} &&\underbrace{x \mu_{1} e bd \mu_{3} c \mu_{2} a y \doteq y \mu_{1}^{\prime} d be \mu_{3}^{\prime} a \mu_{2}^{\prime} c x.}_{E^{\prime}} \end{array} $$

Thus the statement follows by Lemma 7.2. □
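The three applications of ⊙ in this proof can be checked mechanically with the shortcut sketch given after Definition 7.1 (the snippet below assumes that sketch is available; instantiating each factor μi, μi′ by a single fresh variable is our choice, made only for illustration):

# E and E' from Lemma 7.12 with mu_i, mu_i' instantiated by m1, m2, m3, n1, n2, n3:
E = (("x", "m1", "a", "b", "c", "m2", "d", "m3", "e", "y"),
     ("y", "n1", "c", "b", "a", "n2", "e", "n3", "d", "x"))
E_prime = (("x", "m1", "e", "b", "d", "m3", "c", "m2", "a", "y"),
           ("y", "n1", "d", "b", "e", "n3", "a", "n2", "c", "x"))
cur = E
for u, v in [("b", "e"), ("c", "d"), ("a", "e")]:
    cur = shortcut(cur, u, v)
assert cur == E_prime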

Of course, the variable e occurring to the left of y on the LHS will in general not be the lexicographically minimal element z of \({{\varGamma }}^E_{j+1}\). In order to take advantage of Lemma 7.12, we also need to find a sequence of transformations which, for any z ∈{c}∪ var(μ2), results in an equation of the form \(x \mu _{1} a^{\prime }bc^{\prime } \eta z y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \eta ^{\prime } x\) with \(a^{\prime },c^{\prime } \in X\) and \(\eta ,\eta ^{\prime } \in X^{*}\). To achieve this, we need Lemmas 7.13 and 7.14 as follows.

Lemma 7.13

Let E be a basic RWE given by \(x \mu _{1} \alpha \mu _{2} y \doteq y \mu _{1}^{\prime } \alpha ^R \mu _{2}^{\prime } x\) with α, μ1, μ2, \(\mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\), 2 ≤|α|≤ 3, |μ2|≥ 1 and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Let v = α[|α|− 1]. Then for each z ∈ var(αμ2)∖{v}, there exist n ≤ 3 and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E \odot ^n x \mu _{1} \eta z y \doteq y \mu _{1}^{\prime } \eta ^{\prime } x\).

Proof

Let z ∈ var(αμ2)∖{v}. If z is a suffix of μ2 then the statement holds trivially. Suppose that z is not a suffix of μ2. We shall consider two cases separately. Firstly, suppose that z ∈ var(μ2) ∪{α[|α|]}. Then there exists w ∈ X such that zw is a factor of αμ2. Moreover, \(w \in {var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), so there exist \(\nu _{1}, \nu _{2},\nu _{1}^{\prime },\nu _{2}^{\prime } \in X^{*}\) such that μ2 = ν1wν2 and \(\mu _{2}^{\prime } = \nu _{1}^{\prime } w \nu _{2}^{\prime }\) where ν1 = ε if z = α[|α|], and ν1[|ν1|] = z otherwise. Furthermore, there exist u ∈ X and \(\alpha ^{\prime } \in X^{*}\) such that \(\alpha = u \alpha ^{\prime }\) and \(\alpha ^{R} = \alpha ^{\prime {R}} u\). Thus we may write E as \(x \mu _{1} u \alpha ^{\prime } \nu _{1} w \nu _{2} y \doteq y \mu _{1}^{\prime } \alpha ^{\prime R} u \nu _{1}^{\prime } w \nu _{2}^{\prime } x\), and thus \(E \xrightarrow {u,w} x \mu _{1} w \nu _{2} u \alpha ^{\prime } \nu _{1} y \doteq y \mu _{1}^{\prime } \alpha ^{\prime R} w \nu _{2}^{\prime } u \nu _{1}^{\prime } x\). Since z is a suffix of \(\alpha ^{\prime }\nu _{1}\), the statement of the lemma follows.

Now suppose that z ∉ var(μ2) ∪{α[|α|]}. Then the only possibility is that |α| = 3 and z = α[1]. In this case, due to the fact that ⊙ is symmetric, the statement follows directly from Lemma 7.12. □

Lemma 7.14

Let E be a basic RWE given by \(x \mu _{1} v \mu _{2} y \doteq y \mu _{1}^{\prime } v \mu _{2}^{\prime } x\) with v ∈ X and \(\mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Then for every z ∈ var(vμ2), there exist \(v^{\prime } \in X\), \(\eta ,\eta ^{\prime } \in X^{*}\) and n ≤ 1 such that \(E \odot ^n x \mu _{1} v^{\prime } \eta z y \doteq y \mu _{1}^{\prime } v^{\prime } \eta ^{\prime } x\).

Proof

Let z ∈ var(vμ2). If z is a suffix of μ2, then the statement holds trivially. Otherwise, there exists w ∈ X such that zw is a factor of vμ2. Moreover, since w ≠ v, \(w \in {var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), so there exist \(\nu _{1},\nu _{2},\nu _{1}^{\prime },\nu _{2}^{\prime } \in X^{*}\) such that vμ2 = ν1wν2 and \(v\mu _{2}^{\prime }\) = \(\nu _{1}^{\prime } w \nu _{2}^{\prime }\). Thus we may write E as \(x \mu _{1}\nu _{1} w \nu _{2} y \doteq y \mu _{1}^{\prime } \nu _{1}^{\prime } w \nu _{2}^{\prime } x\) such that v is a prefix of ν1 and \(\nu _{1}^{\prime }\), and such that z is a suffix of ν1. Thus, \(E\xrightarrow {v,w} x \mu _{1} w \nu _{2} \nu _{1}y \doteq y \mu _{1}^{\prime } w \nu _{2}^{\prime } \nu _{1}^{\prime } x\), and since z is a suffix of ν1, the statement of the lemma follows. □
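On a minimal instance (μ1 = μ1′ = ε; the concrete variable names are ours), the single application in this proof can again be traced with the shortcut sketch from Definition 7.1:

# E = x v p z q y ≐ y v z q p x; w = q is the variable following z in v mu_2:
E = (("x", "v", "p", "z", "q", "y"), ("y", "v", "z", "q", "p", "x"))
E2 = shortcut(E, "v", "q")
assert E2 == (("x", "q", "v", "p", "z", "y"), ("y", "q", "p", "v", "z", "x"))
assert E2[0][-2] == "z"   # z now sits directly to the left of y on the LHS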

Recall that our strategy for transforming an equation of the form \(x \mu _{1} abc \mu _{2} y \doteq y \mu _{1}^{\prime } cba \mu _{2}^{\prime } x\) into one of the form \(x \mu _{1} zbw \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\) is first to 'move' the lexicographically minimal variable z from \({{\varGamma }}^E_{j+1}\) into the correct position (to the left of y on the LHS) and then to apply Lemma 7.12. We can consider three cases for z separately. The first, that z = a, is trivial, and we do not need to change our original equation at all. The case that z = c is the most involved and is considered in the proof of Lemma 7.16. All other choices of z (namely when z ∈ var(μ2)) are addressed in Lemma 7.15 below.

Note that in the statement of Lemma 7.15, the factors \(\mu _{2},\mu _{2}^{\prime }\) are replaced by μ2δ and \(\mu _{2}^{\prime }\delta ^R\) respectively. We may make this change w.l.o.g. since our equation is in normal form, and since the case that \(\mu _{2} = \mu _{2}^{\prime } = \varepsilon \) is trivial (the jth block will be final in this case). Moreover, if |δ| = 1, then (δ, δ) ∈ΥE, so it follows from the definitions that the equation is not jumbled. Since, in this section, we are only interested in jumbled equations, we may therefore also assume that |δ|≥ 2, which is necessary for the proof of the lemma.

Lemma 7.15

Let E be a basic RWE in normal form given by
$$x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x$$
with a, b, c ∈ X and \(\delta , \mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that |δ|≥ 2, and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Then at least one of the following two statements is true.
  1. There exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\), and β ∈ X+ such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \beta y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \beta ^R x\), or
  2. for every z ∈ var(μ2δ), there exist \(a^{\prime },c^{\prime } \in X\), \(\eta ,\eta ^{\prime } \in X^{*}\), and n ∈ O(|E|2) such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \eta z y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \eta ^{\prime } x\).

Proof

Suppose that the first statement does not hold and notice that this implies |μ2|≥ 1. We shall now prove that the second statement holds. We divide our reasoning into three cases based on the prefixes of μ2 and \(\mu _{2}^{\prime }\). In particular, since E is in normal form, there exists a prefix αi of μ2 such that \(\alpha _i^R\) is a prefix of \(\mu _{2}^{\prime }\) and such that 1 ≤|αi|≤ 3. Firstly suppose that |αi| = 1, or in other words that μ2 and \(\mu _{2}^{\prime }\) have a common prefix v ∈ X. Then the statement follows directly from Lemma 7.14.

It remains to consider the cases that |αi| = 2 and |αi| = 3. Before we consider these cases explicitly, it is convenient to define the following equation \(E^{\prime }\) such that \(E \odot ^{n^{\prime }} E^{\prime }\) for some \(n^{\prime } \in O(|E|)\). In particular, note that there exist u, v ∈ X and \(\delta ^{\prime } \in X^{*}\) such that \(\delta = u \delta ^{\prime } v\). It follows by Lemma 7.12 that there exist \(\nu _{1}, \nu _{1}^{\prime } \in X^{+}\) with \({var}(\nu _{1}) = {var}(\nu _{1}^{\prime })\) such that \(E \odot ^3 x \mu _{1} vbu \nu _{1} y \doteq y \mu _{1}^{\prime } ubv \nu _{1}^{\prime } x\). Moreover, by Lemma 7.6, there exist \(\nu _{2},\nu _{2}^{\prime } \in X^{*}\), β ∈ X+ and \(n^{\prime } \in O(|E|)\) such that \(E \odot ^{n^{\prime }} E^{\prime }\) where \(E^{\prime }\) is given by
$$E^{\prime} : \quad x \mu_{1} vbu \beta \nu_{2} y \doteq y \mu_{1}^{\prime} ubv \beta^{R} \nu_{2}^{\prime} x$$
where 1 ≤|β|≤ 3 (recall, by our assumption that the first statement of the lemma does not hold, that ν2 ≠ ε). Note that since \(E \odot ^{*} E^{\prime }\), we have \(E \Rightarrow ^{*} E^{\prime }\) and thus by Theorem 5.3, \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} = {\varUpsilon }\).

We are now ready to consider the second case, that |αi| = 2. In this case, there exist d, e ∈ X such that αi = de, so de is a prefix of μ2 and ed is a prefix of \(\mu _{2}^{\prime }\). If z ∈ var(μ2δ)∖{d}, then the second statement of the lemma follows directly from Lemma 7.13. Suppose instead that z = d. In this case, we shall show that the (second statement of the) lemma holds for \(E^{\prime }\). Since \(E \odot ^{n^{\prime }} E^{\prime }\), it follows that the lemma also holds for E.

If |β| = 1, then the second statement of the lemma follows from Lemma 7.14 along with the fact that \(E \odot ^{n^{\prime }} E^{\prime }\). Similarly, if |β| ∈ {2,3} and z ≠ β[|β| − 1], the statement follows from Lemma 7.13. Finally, we must consider the case that |β| ∈ {2,3} and z = β[|β| − 1]. If |β| = 2, then there exists \(z^{\prime } \in X\) such that \(\beta = zz^{\prime }\). It follows that \(zz^{\prime }\) is a factor of the LHS of \(E^{\prime }\) and \(vz^{\prime }\) is a factor of the RHS of \(E^{\prime }\), so (z, v) ∈ Υ. Furthermore, by our assumption that z = d, ze = de is a factor of the LHS of E and ae is a factor of the RHS of E, so (z, a) ∈ Υ. However, since a ≠ v, this contradicts Remark 5.2. We can proceed similarly when |β| = 3. In particular, if |β| = 3, then there exist \(z^{\prime },z^{\prime \prime } \in X\) such that \(\beta = z^{\prime }zz^{\prime \prime }\). It follows that (z, v),(u, z) ∈ Υ. Furthermore, since z = d, we also have that (z, a) ∈ Υ. However, since v ≠ a we again get a contradiction to Remark 5.2. Thus d ≠ β[|β| − 1] and we are done with the case that |αi| = 2.

Suppose now that |αi| = 3, meaning there exist d, e, f ∈ X such that αi = def is a prefix of μ2 and fed is a prefix of \(\mu _{2}^{\prime }\). As before, if z ∈ var(μ2δ)∖{e}, the second statement of the lemma follows from Lemma 7.13 (applied to E). Suppose instead that z = e. We shall again proceed by showing that the second statement of the lemma holds for \(E^{\prime }\). If |β| = 1, it follows directly from Lemma 7.14. Similarly, if |β| ∈ {2,3} and z ≠ β[|β| − 1], the statement again follows from Lemma 7.13. Finally, suppose for contradiction that |β| ∈ {2,3} and z = β[|β| − 1]. We again have to consider two cases based on |β|. If |β| = 2, then there exists \(z^{\prime } \in X\) such that \(\beta = zz^{\prime }\). It follows that (z, v) ∈ Υ. Furthermore, since z = e, we also have that (z, a) ∈ Υ, a contradiction to Remark 5.2. Similarly, if |β| = 3, then there exist \(z^{\prime },z^{\prime \prime } \in X\) such that \(\beta = z^{\prime }zz^{\prime \prime }\). It follows that (z, v),(u, z) ∈ Υ. Furthermore, since z = e, we also have that (z, a),(c, z) ∈ Υ. However, since u ≠ c and v ≠ a, we again get a contradiction to Remark 5.2. Thus e ≠ β[|β| − 1] and the statement holds as required. □

We are now ready to prove the following lemma, which is the main technical step in the proof of Theorem 7.11, showing that we can replace the factors abc and cba at the start of the (j + 1)th block (which occur whenever the block is Type B) with factors zbw and wbz where z is any variable from \({{\varGamma }}^E_{j+1}\), and hence that we can do the same for the lexicographically minimal choice of z. This, combined with Lemma 7.6, allows us to transform the equation into one with the (j + 1)th block satisfying the requirements for Lex Normal Form.

It is also worth noting that the variable b, and whether the block is Type A or Type B, remain unchanged (see Section 8 for more information on why we cannot change them). Aside from these parameters, we can essentially produce all other possibilities for the variable in the first position in the block. In other words, we do not use anything about the lexicographic order other than that it permits us to make some well-defined choice at each stage which is consistent across all equations. Consequently, there is a high degree of symmetry in the set of equations in normal form occurring in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Lemma 7.16

Let E be a basic RWE in normal form given by
$$x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x$$
with a, b, c ∈ X and \(\delta , \mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that |δ| ≥ 2 and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Let Γ = var(μ2δ) ∪ {a, c}. Then at least one of the following two statements is true.
  1. There exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\), and β ∈ X+ such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \beta y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \beta ^R x\), or

  2. for each z ∈ Γ, there exist w ∈ X, \(\eta ,\eta ^{\prime } \in X^{*}\), and n ∈ O(|E|2) such that \(E \odot ^n x \mu _{1} zbw \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\).

Proof

Assume that the first statement does not hold and notice that this implies |μ2| ≥ 1. We shall now prove that the second statement holds. The case that z = a is trivial. Next, consider the case that z∉{a, c}. Then z ∈ var(μ2δ). By Lemma 7.15, and by our assumption that Statement 1 of the lemma does not hold, we get that there exist \(a^{\prime },c^{\prime } \in X\), \(\nu , \nu ^{\prime } \in X^{*}\), and \(n^{\prime } \in O(|E|^2)\) such that
$$ E \odot^{n^{\prime}} x \mu_{1} a^{\prime}bc^{\prime} \nu z y \doteq y \mu_{1}^{\prime} c^{\prime}ba^{\prime} \nu^{\prime} x.$$
Since E is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may conclude that \({var}(\nu ^{\prime }) = {var}(\nu z)\). Thus, by Lemma 7.12, there exist \(\eta , \eta ^{\prime } \in X^{*}\) such that
$$ x \mu_{1} a^{\prime}bc^{\prime} \nu z y \doteq y \mu_{1}^{\prime} c^{\prime}ba^{\prime} \nu^{\prime} x \odot^{3} x \mu_{1} zb w \eta y \doteq y \mu_{1}^{\prime} wbz \eta^{\prime} x$$
where \(w = \nu ^{\prime }[|\nu ^{\prime }|]\). Consequently, we have that \(E \odot ^n x \mu _{1} zb w \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\) for some n ∈ O(|E|2) and the second statement holds as claimed.

It remains to consider the case that z = c. Then since |δ| ≥ 2, there exist u, v ∈ X∖{a, b, c} such that \(\delta = u \delta ^{\prime } v\) for some \(\delta ^{\prime } \in X^{*}\). Thus, by Lemma 7.12, there exist \(\nu _{1},\nu _{1}^{\prime } \in X^{+}\) such that \(E \odot ^3 x \mu _{1} vbu \nu _{1} y \doteq y \mu _{1}^{\prime } ubv \nu _{1}^{\prime } x\). Moreover, since E is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may conclude that \({var}(\nu _{1}) = {var}(\nu _{1}^{\prime })\). Thus, by Lemma 7.6, there exist \(\nu _{2},\nu _{2}^{\prime } \in X^{*}\), β ∈ X+ and n1 ∈ O(|E|) such that \(E \odot ^{n_{1}} E^{\prime }\) where \(E^{\prime }\) is given by \(x \mu _{1} vbu \beta \nu _{2} y \doteq y \mu _{1}^{\prime } ubv \beta ^R \nu _{2}^{\prime } x\) and such that 1 ≤ |β| ≤ 3 whenever ν2 ≠ ε. By our assumption that the first statement of the lemma is not true, we must in fact have that ν2 ≠ ε.

Additionally, note that \({var}(\nu _{2}) = {var}(\nu _{2}^{\prime })\) and c ∈ var(βν2). Thus, by Lemma 7.15, along with our assumption that the first statement of the lemma does not hold, it follows that there exist n2 ∈ O(|E|2), \(a^{\prime },c^{\prime }, d\in X\) and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E^{\prime } \odot ^{n_{2}} E^{\prime \prime }\) where \(E^{\prime \prime }\) is given by \(x \mu _{1} a^{\prime }bc^{\prime } \eta c y \doteq y \mu _{1}^{\prime } c^{\prime } b a^{\prime } \eta ^{\prime } dx\). As before, since E (and therefore also \(E^{\prime \prime }\)) is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may further conclude that \({var}(\eta c) = {var}(\eta ^{\prime } d)\). Similarly, since E is jumbled and \(E\odot ^{*} E^{\prime \prime }\) (meaning also that \(E \Rightarrow ^{*}E^{\prime \prime }\)), it follows that \(E^{\prime \prime }\) is also jumbled and consequently that d ≠ c. Hence we may write \(E^{\prime \prime }\) as \(x \mu _{1} a^{\prime }bc^{\prime } \eta _{1} d \eta _{2} c y \doteq y \mu _{1}^{\prime } c^{\prime } b a^{\prime } \eta _{1}^{\prime } c \eta _{2}^{\prime } dx\) where \(\eta _{1},\eta _{1}^{\prime },\eta _{2},\eta _{2}^{\prime } \in X^{*}\), and the second statement of the lemma follows from Lemma 7.12. □

Having described the main technical elements of the proof of Theorem 7.11, we are now ready to give the main intuitive statement as to why it holds. This statement also constitutes the main induction step, forming the backbone of the proof.

Lemma 7.17

Let E be a jumbled basic RWE in normal form with block decomposition (B0, B1,…,Bk). Let \(\iota \in \mathbb {N}\) with 0 < ι < k. Then at least one of the following two statements is true.
  1. There exists a (final) block Cι, \(\hat {E} \in [E]_{\Rightarrow }\) and n ∈ O(|E|) such that \(E \odot ^n \hat {E}\) and such that \(\hat {E}\) has a block decomposition (B0, B1,…,Bι−1, Cι), or

  2. there exist blocks Cι, Cι+1,…,Cℓ, \(\hat {E} \in [E]_{\Rightarrow }\) and n ∈ O(|E|2) such that \(E \odot ^n \hat {E}\) and such that \(\hat {E}\) has a block decomposition (B0, B1,…,Bι−1, Cι, Cι+1,…,Cℓ) and such that Cι is lex-minimal.

Proof

Let E be given by
$$x \alpha_{1}\alpha_{2}{\ldots} \alpha_{m} y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} {\alpha_{m}^{R}} x$$
such that x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ m, and |αi| ≤ 3 for 1 ≤ i < m. Let IE = {i1, i2,…,ik} = {i ∣ 1 ≤ i < m and |αi| ≠ 2} with 1 ≤ i1 < i2 < … < ik < m. If IE = ∅, then the statement holds trivially. Thus we may assume that IE ≠ ∅. Note that the block decomposition \(\mathfrak {B}\) of E is given by (B0, B1,…,Bk) where
$$ \begin{array}{@{}rcl@{}} B_{0} &=& (x \alpha_{1}\alpha_{2} {\ldots} \alpha_{i_{1}-1}, y {\alpha_{1}^{R}}{\alpha_{2}^{R}} {\ldots} \alpha_{i_{1}-1}^{R} )\\ B_{j} &=& (\alpha_{i_{j}} \alpha_{i_{j}+1} \ldots \alpha_{i_{j+1}-1}, \alpha_{i_{j}}^{R} \alpha_{i_{j}+1}^{R} {\ldots} \alpha_{i_{j+1}-1}^{R})\\ B_{k} &=& (\alpha_{i_{k}} \alpha_{i_{k}+1} {\ldots} \alpha_{m} y, \alpha_{i_{k}}^{R} \alpha_{i_{k}+1}^{R} {\ldots} {\alpha_{m}^{R}} x) \end{array} $$

for 0 < j < k.
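As an aside, the decomposition just described is easy to compute mechanically. The following is a minimal illustrative sketch of ours (not code from the paper), under the assumption that an equation in normal form is represented simply by x, the factor list α1,…,αm and y; the RHS is then determined as \(y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{m}^{R}} x\), so only the LHS components of each block are produced.

```python
# Illustrative sketch (hypothetical representation): compute I_E and the LHS
# components of the block decomposition (B_0, ..., B_k) from the factor list.
# Factors are lists of variable names; Python indices are 0-based, so the
# condition "1 <= i < m and |alpha_i| != 2" becomes "i < m - 1 and len != 2".

def block_decomposition(x, alphas, y):
    m = len(alphas)
    I_E = [i for i in range(m - 1) if len(alphas[i]) != 2]
    cuts = [0] + I_E + [m]                    # a new block starts at each index in I_E
    blocks = [alphas[a:b] for a, b in zip(cuts, cuts[1:])]
    blocks[0] = [[x]] + blocks[0]             # B_0 carries the leading variable x
    blocks[-1] = blocks[-1] + [[y]]           # B_k carries the trailing variable y
    return blocks

# Factors of lengths 2, 2, 1, 2, 3, 2: cuts fall at the length-1 and length-3
# factors, giving B_0 = x a_1 a_2, B_1 = a_3 a_4 and B_2 = a_5 a_6 y.
alphas = [list("ab"), list("cd"), list("e"), list("fg"), list("hij"), list("kl")]
for block in block_decomposition("x", alphas, "y"):
    print(block)
```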

Now, let \( \iota \in \mathbb {N}\) with 0 < ι < k. If Bι is lex-minimal, the second statement holds trivially for ℓ = k and Cj = Bj for ι ≤ j ≤ k. Suppose instead that Bι is not lex-minimal. We shall consider the cases that Bι is Type A and Type B separately. Suppose firstly that Bι is Type A. Then \(|\alpha _{i_{\iota }}| = 1\). Thus we can write E as
$$x \mu_{1} v \mu_{2} y \doteq y \mu_{1}^{\prime} v \mu_{2}^{\prime} x$$
where \(v = \alpha _{i_{\iota }} \in X\), \(\mu _{1} = \alpha _{1} \alpha _{2} {\ldots } \alpha _{i_{\iota }-1}\), \(\mu _{1}^{\prime } = \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _{i_{\iota }-1}^R\), \(\mu _{2} = \alpha _{i_{\iota }+1}\alpha _{i_{\iota }+2} {\ldots } \alpha _m\) and \(\mu _{2}^{\prime } = \alpha _{i_{\iota } + 1}^R \alpha _{i_{\iota }+2}^R {\ldots } \alpha _m^R\). Moreover, \({{\varGamma }}^E_{\iota } = {var}(v \mu _{2})\). Let z be the lexicographically minimal element of \({{\varGamma }}^E_{\iota }\). Then by our assumption that Bι is not lex-minimal, we have that z ≠ v. Thus there exist \(\nu _{1},\nu _{2}, \nu _{1}^{\prime }, \nu _{2}^{\prime } \in X^{*}\) such that μ2 = ν1zν2 and \(\mu _{2}^{\prime } = \nu _{1}^{\prime } z \nu _{2}^{\prime }\). Consequently, \(E \xrightarrow {v,z} x \mu _{1} z \nu _{2} v \nu _{1} y \doteq y \mu _{1}^{\prime } z \nu _{2}^{\prime } v \nu _{1}^{\prime } x\) and since \({var}(\mu _{1} z) = {var}(\mu _{1}^{\prime }z)\), by Lemma 7.6, we have that \(E \odot ^n E^{\prime }\) where \(E^{\prime }\) is given by:
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} z \alpha_{i_{\iota}+1}^{\prime} \alpha_{i_{\iota}+2}^{\prime} {\ldots} \alpha_{m^{\prime}}^{\prime} y \doteq y {\alpha_{1}^{R}}{\alpha_{2}^{R}}{\ldots} \alpha_{i_{\iota}-1}^{R} z\alpha_{i_{\iota}+1}^{\prime R} \alpha_{i_{\iota}+2}^{\prime R} {\ldots} \alpha_{m^{\prime}}^{\prime R} x$$
for some n ∈ O(|E|2) and \( \alpha _{i_{\iota }+1}^{\prime },\alpha _{i_{\iota }+2}^{\prime },\ldots , \alpha _{m^{\prime }}^{\prime } \in X^+\) with \(1\leq |\alpha _j^{\prime }| \leq 3\) for \(i_{\iota }+1 \leq j < m^{\prime }\). Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \text { and } |\alpha _i| \not = 2\} \cup \{i_{\iota } \} \cup \{i \mid i_{\iota } < i < m^{\prime } \text { and } |\alpha _i^{\prime }| \not = 2\}\) with \(1\leq i_{1}^{\prime } < i_{2}^{\prime } < {\ldots } < i_{\ell }^{\prime } < m^{\prime }\).

Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_E \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_j = B_j^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since z is minimal in \({{\varGamma }}^E_{\iota } = {{\varGamma }}^{E^{\prime }}_{\iota }\), \(B_{\iota }^{\prime }\) is lex-minimal and the second statement holds.

Now suppose that Bι is Type B. Then \(|\alpha _{i_{\iota }}| = 3\), so there exist a, b, c ∈ X such that \(\alpha _{i_{\iota }} = abc\). Thus we can write E as
$$ x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x $$
where \(\mu _{1} = \alpha _{1} \alpha _{2} {\ldots } \alpha _{i_{\iota }-1}\), \(\mu _{1}^{\prime } = \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _{i_{\iota }-1}^R\), \(\mu _{2} = \alpha _{i_{\iota }+1}\alpha _{i_{\iota }+2} {\ldots } \alpha _{m-1}\), \(\mu _{2}^{\prime } = \alpha _{i_{\iota } + 1}^{R} \alpha _{i_{\iota }+2}^{R} {\ldots } \alpha _{m-1}^{R}\) and δ = αm. Moreover, \({{\varGamma }}_{\iota }^{E} = {var}(\mu _{2}\delta )\cup \{a,c\}\). Let z be the lexicographically minimal element of \({{\varGamma }}_{\iota }^{E}\). Then by our assumption that Bι is not lex-minimal, z ≠ a. Moreover, since E is jumbled, we may conclude that |δ| ≠ 1 (otherwise we would have (δ, δ) ∈ ΥE, a contradiction).
By Lemma 7.16, we have two cases. The first is that there exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\) and β ∈ X+ such that \(E \odot ^{n} E^{\prime }\) where \(E^{\prime }\) is given by
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} a^{\prime}bc^{\prime} \beta y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} \alpha_{i_{\iota}-1}^{R} c^{\prime}ba^{\prime} \beta^{R} x. $$
Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \text { and } |\alpha _{i}| \not = 2\} \cup \{i_{\iota } \}\). Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_{E} \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_{j} = B_{j}^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since \(I_{E^{\prime }}\) does not contain any elements greater than iι, \(B_{\iota }^{\prime }\) is the final block, so the first statement holds for \(C_{\iota } = B_{\iota }^{\prime }\).
The second case is that there exist \(n^{\prime }\in O(|E|^{2})\), w ∈ X and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E \odot ^{n^{\prime }} x \mu _{1} z b w \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\). By Lemma 7.6, there exist \(n^{\prime \prime }\in O(|E|^{2})\) and \(\alpha _{i_{\iota }+1}^{\prime },\alpha _{i_{\iota }+2}^{\prime }\), \(\ldots , \alpha _{m^{\prime }}^{\prime } \in X^{+}\) with \(|\alpha _{j}^{\prime }|\leq 3\) for \(i_{\iota } < j < m^{\prime }\) such that \(E \odot ^{n^{\prime \prime }} E^{\prime }\) where \(E^{\prime }\) is given by
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} zbw \alpha_{i_{\iota}+1}^{\prime} \alpha_{i_{\iota}+2}^{\prime} {\ldots} \alpha_{m^{\prime}}^{\prime} y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} \alpha_{i_{\iota}-1}^{R} wbz\alpha_{i_{\iota}+1}^{\prime R} \alpha_{i_{\iota}+2}^{\prime R} {\ldots} \alpha_{m^{\prime}}^{\prime R} x.$$
Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \) and \( |\alpha _{i}| \not = 2\} \cup \{i_{\iota } \} \cup \{i \mid i_{\iota }+1 \leq i < m^{\prime } \text { and } |\alpha _{i}^{\prime }| \not = 2\}\) with \(1\leq i_{1}^{\prime } < i_{2}^{\prime } < {\ldots } < i_{\ell }^{\prime } < m^{\prime }\). Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_{E} \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_{j} = B_{j}^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since z is minimal in \({{\varGamma }}_{\iota }^{E} = {{\varGamma }}_{\iota }^{E^{\prime }}\), \(B_{\iota }^{\prime }\) is lex-minimal and the second statement of the lemma holds for \(C_{j} = B_{j}^{\prime }\) for ι ≤ j ≤ ℓ. □

Finally, for the sake of completeness, we provide a formal summary of the proof of Theorem 7.11 based on Lemma 7.17, using the arguments which have so far been described informally.

Proof (of Theorem 7.11)

Let E be a jumbled basic RWE. By Theorem 7.5, we may assume that E is in normal form. Let \(\mathfrak {B} = (B_{0},B_{1},\ldots ,B_{k})\) be its block decomposition. If Bi is lex-minimal for 0 < i < k, then E is in LNF and we are done (this also covers the case that k ≤ 1). Otherwise, suppose that k > 1 and let \(\iota = \min \limits _{0<j<k}\{ j\mid B_{j} \text { is not lex-minimal}\}\). Then by Lemma 7.17, we have two possibilities. Either:
  1. there exists a block Cι, n ∈ O(|E|) and \(\hat {E}\) such that \(E \odot ^{n} \hat {E}\) and \(\hat {E}\) has the block decomposition (B0, B1,…,Bι−1, Cι), or

  2. there exist blocks Cι, Cι+1,…,Cℓ, n ∈ O(|E|2) and \(\hat {E}\) such that \(E \odot ^{n} \hat {E}\) and such that \(\hat {E}\) has the block decomposition (B0, B1,…,Bι−1, Cι, Cι+1,…,Cℓ) and such that Cι is lex-minimal.
In the first case, by definition of ι, Bj is lex-minimal for 0 < j < ι, meaning \(\hat {E}\) is in LNF and we are done. In the second case, we have an equation \(\hat {E}\) such that \(E \odot ^{n^{\prime }} \hat {E}\) where \(n^{\prime } \in O(|E|^{2})\) and such that the block decomposition of \(\hat {E}\) has a longer initial sequence of lex-minimal blocks than the block decomposition of E.

Furthermore, it follows from the definitions that any block decomposition cannot have more blocks than the number of variables occurring in the equation. Recall that the set of variables occurring in an equation is invariant under ⇒ (and therefore also ⊙). Thus with at most O(|E|) applications of Lemma 7.17, we may conclude that \(E\odot ^{n^{\prime \prime }} E^{\prime }\) for an equation \(E^{\prime }\) with block decomposition \((B_{0}^{\prime },B_{1}^{\prime },B_{2}^{\prime }, \ldots , B^{\prime }_{k^{\prime }})\) such that \(B_{j}^{\prime }\) is lex-minimal for \(0 < j < k^{\prime }\) (meaning \(E^{\prime }\) is in LNF) and such that \(n^{\prime \prime } \in O(|E|^{3})\). It follows directly from the definitions that ⊙ is symmetric, and therefore we also have \(E^{\prime } \odot ^{n^{\prime \prime }} E\). By Corollary 7.3, we may therefore conclude that \(E^{\prime } \Rightarrow ^{n_{1}} E\) and \(E \Rightarrow ^{n_{2}} E^{\prime }\) for some n1, n2 ∈ O(|E|4). □

8 Diameter

It was mentioned in the previous section that the choices for the blocks in a block decomposition of an equation in normal form are restricted by the invariant ΥE. We shall now make full use of that fact to show that the number of equations in Lex Normal Form in a single graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is bounded by a polynomial in |E| (Theorem 8.10), and as a consequence that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also bounded by a polynomial in |E| (Theorem 8.11). By combining this result with Theorems 6.8 and 4.8, we can extend it from jumbled basic regular word equations to all regular word equations. Consequently, we can conclude that satisfiability of regular word equations is NP-complete (Theorem 8.12).

Since each equation in Lex Normal Form has a unique block decomposition, it is sufficient to count the possible block decompositions satisfying the conditions for Lex Normal Form for a given value of ΥE. We shall focus on conditions which force two blocks to be the same. We shall consider the cases of initial, standard and final blocks separately, but first we need the following lemmas, which take advantage of the invariant ΥE in order to limit the equations in normal form occurring in a single equivalence class \([E]_{\Rightarrow }\).

The first of these lemmas and the resulting corollary provide some intuition behind the definition of the block decomposition, and as to why the blocks are often fixed by the invariant ΥE (along with the leftmost variable which, aside from exceptional cases, is fixed by Lex Normal Form). Essentially, they show that the length-two factors αi (and thus \({\alpha _{i}^{R}}\)) occurring as per the definition of normal form are fixed exactly by the variables preceding them along with the invariant ΥE.
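Before stating the lemma, the following minimal sketch (an illustration of ours, not the paper's formal definition) computes the pairs that drive the arguments below: following the reading used in the proof of Lemma 8.1 and stated explicitly near the end of the proof of Lemma 9.2, (u, v) is collected whenever some variable z has uz as a factor of the LHS and vz as a factor of the RHS. The full definition of ΥE in Section 5 may contribute further pairs (for instance, the boundary pairs used in the proof of Lemma 8.8), which this sketch omits.

```python
# Illustrative sketch only: collect the 'common successor' pairs (u, v) such
# that uz is a factor of the LHS and vz is a factor of the RHS for some z.
# Sides are lists of variable names; in a regular equation each variable
# occurs at most once per side, so the predecessor maps are well defined.

def successor_pairs(lhs, rhs):
    pred_l = {lhs[i + 1]: lhs[i] for i in range(len(lhs) - 1)}
    pred_r = {rhs[i + 1]: rhs[i] for i in range(len(rhs) - 1)}
    return {(pred_l[z], pred_r[z]) for z in pred_l.keys() & pred_r.keys()}

# E.g. for x u a b y = y v b a x (the shape of E_1 in Lemma 8.1), the common
# successors are a and b, yielding exactly the pairs (u, b) and (a, v)
# noted in the proof of Lemma 8.1.
print(successor_pairs(list("xuaby"), list("yvbax")))  # {('u', 'b'), ('a', 'v')} (order may vary)
```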

Lemma 8.1

Let u, v, a, b ∈ X and let \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta _{1}^{\prime }, \beta _{2}^{\prime },\gamma \in X^{*}\) such that 1 ≤ |γ| ≤ 3. Let E1 and E2 be jumbled basic RWEs given by
$$ \begin{array}{@{}rcl@{}} && E_{1}: \quad \alpha_{1} u a b \alpha_{2} \doteq \beta_{1} v b a \beta_{2}\\ && E_{2}: \quad \alpha_{1}^{\prime} u \gamma \alpha_{2}^{\prime} \doteq \beta_{1}^{\prime} v \gamma^{R} \beta_{2}^{\prime}. \end{array} $$

If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) then γ = ab.

Proof

Let γ = c1c2…cn with ci ∈ X, 1 ≤ i ≤ n. Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that (a, v),(u, b) ∈ Υ. If |γ| = 1, then (u, v) ∈ Υ, which by Remark 5.2 implies a = u, a contradiction to the assumption that E1 is regular. Similarly, if |γ| = 3, then (c2, v),(u, c2) ∈ Υ, which by Remark 5.2 implies c2 = a = b, again a contradiction to the assumption that E1 is regular. Thus, it follows that |γ| = 2. In this case, we have that (c1, v),(u, c2) ∈ Υ. By Remark 5.2, it follows that c1 = a and c2 = b, so γ = ab as required. □

Corollary 8.2

Let \(k\in \mathbb {N}\). For 1 ≤ i ≤ 4 and 1 ≤ j ≤ k, let \(\mu _{i},\mu _{i}^{\prime }, \alpha _{j}, \beta _{j} \in X^{*}\) such that |αj| = |βj| = 2. Let E1 and E2 be the jumbled basic RWEs given by
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad \mu_{1} u \alpha_{1} \alpha_{2} {\ldots} \alpha_{k} \mu_{2} \doteq \mu_{3} v {\alpha_{1}^{R}} {\alpha_{2}^{R}}{\ldots} {\alpha_{k}^{R}} \mu_{4}\\ &&E_{2}: \quad \mu_{1}^{\prime} u \beta_{1} \beta_{2} {\ldots} \beta_{k} \mu_{2}^{\prime} \doteq \mu_{3}^{\prime} v {\beta_{1}^{R}} {\beta_{2}^{R}}{\ldots} {\beta_{k}^{R}} \mu_{4}^{\prime}. \end{array} $$

Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\). Then αj = βj for 1 ≤ j ≤ k.

Any initial block has the form \((x\alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, y{\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) where x, y ∈ X and αj ∈ X+ with |αj| = 2 for 1 ≤ j ≤ i. Since x, y are fixed by ΥE, it follows from Corollary 8.2 that all the αj factors, for 1 ≤ j ≤ i, are fixed exactly by the invariant ΥE. With a little additional effort, we can conclude the slightly more general statement that initial blocks occurring in the block decomposition of some equation E in normal form are fixed exactly by ΥE. Recall from the definitions that in a block decomposition (B0, B1,…,Bk) of an equation in normal form, B0 will be an initial block provided k ≥ 1 (if k = 0 then B0 = Bk will be a final block).

Lemma 8.3

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be the block decompositions of E1 and E2 respectively. Suppose that k, ℓ ≥ 1. Then B0 = C0.

Proof

Since E1 is in normal form, we may write it as \(x \alpha _{1} \alpha _{2}{\ldots } \alpha _{n} y \doteq y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{n}^{R}} x\) with x, y ∈ X and αi ∈ X+ for 1 ≤ i ≤ n such that |αi| ≤ 3 for 1 ≤ i < n. Similarly, we may write E2 as \(x^{\prime } \alpha _{1}^{\prime } \alpha _{2}^{\prime }{\ldots } \alpha _{m}^{\prime } y^{\prime } \doteq y^{\prime } \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{m}^{\prime R} x^{\prime }\) with \(x^{\prime },y^{\prime } \in X\) and \(\alpha _{i}^{\prime } \in X^{+}\) for 1 ≤ i ≤ m such that \(|\alpha _{i}^{\prime }| \leq 3\) for 1 ≤ i < m. Suppose that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\) and note that this implies var(E1) = var(E2). Similarly, it is easily verified (either from the definition of ⇒, or from Remark 5.2) that \(x = x^{\prime }\) and \(y = y^{\prime }\).

Since k, ℓ ≥ 1, there must exist \(p = \min \limits \{ i \mid 1\leq i < n \text { and } |\alpha _{i}| \not = 2\}\) and \(q = \min \limits \{ i \mid 1\leq i < m \text { and } |\alpha _{i}^{\prime }| \not = 2 \}\). It follows that \(B_{0} = (x\alpha _{1}\alpha _{2}{\ldots } \alpha _{p-1}, y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } \alpha _{p-1}^{R})\) and \(C_{0} = (x\alpha _{1}^{\prime }\alpha _{2}^{\prime }{\ldots } \alpha _{q-1}^{\prime }, y \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{q-1}^{\prime R})\). By Corollary 8.2, it follows that \(\alpha _{i} = \alpha ^{\prime }_{i}\) for \(1\leq i <\min \limits \{p,q\}\).

Suppose for contradiction that p ≠ q. W.l.o.g. suppose that p > q. Then we may write E1 and E2 as \(\mu _{1} u ab \mu _{2} \doteq \mu _{3} v ba \mu _{4}\) and \(\mu _{1}^{\prime } u \gamma \mu _{2}^{\prime } \doteq \mu _{3}^{\prime } v \gamma ^{R} \mu _{4}^{\prime }\) respectively, where μ1, μ2, μ3, μ4, \(\mu _{1}^{\prime }\), \(\mu _{2}^{\prime }\), \(\mu _{3}^{\prime }\), \(\mu _{4}^{\prime },\gamma \in X^{*}\), u, v, a, b ∈ X, and |γ| ∈ {1,3} (in particular, this is true for ab = αq and \(\gamma = \alpha ^{\prime }_{q}\)). However in this case, it follows from Lemma 8.1 that \({\varUpsilon }_{\!E_{1}} \not = {\varUpsilon }_{\!E_{2}}\), a contradiction. Thus we must have that p = q, and the fact that B0 = C0 follows immediately. □

Similarly to initial blocks, we can use Corollary 8.2 to restrict standard blocks which are Type A. These blocks will have the form \((z \alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, z {\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) where z ∈ X and αj ∈ X+ with |αj| = 2 for 1 ≤ j ≤ i. Hence the factors αj, 1 ≤ j ≤ i, are fixed completely by ΥE and z. For Type B blocks, which instead have the form \((abc \alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, cba {\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) with a, b, c ∈ X, we need the following additional observation.

Lemma 8.4

Let u, v, a, b, c ∈ X and let \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta _{1}^{\prime }, \beta _{2}^{\prime }, \gamma \in X^{*}\) such that 1 ≤ |γ| ≤ 3. Let E1 and E2 be the basic regular word equations given by
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad \alpha_{1} u a b c \alpha_{2} \doteq \beta_{1} v c b a \beta_{2}\\ && E_{2}: \quad \alpha_{1}^{\prime} u \gamma \alpha_{2}^{\prime} \doteq \beta_{1}^{\prime} v \gamma^{R} \beta_{2}^{\prime}. \end{array} $$

If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) then there exist \(a^{\prime },c^{\prime } \in X\) such that \(\gamma = a^{\prime }bc^{\prime }\). Moreover, if \(a^{\prime } = a\), then \(c^{\prime } = c\).

Proof

Let γ = e1e2…en with ei ∈ X, 1 ≤ i ≤ n. Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that (u, b),(a, c),(b, v) ∈ Υ. If |γ| = 1, then (u, v) ∈ Υ, and by Remark 5.2 we have that u = b, a contradiction to the assumption that E1 is regular. Thus we assume n ≥ 2. Then (u, e2),(en−1, v) ∈ Υ. Hence, we have e2 = en−1 = b, and since E2 is regular, this implies that n = 3, so the statement holds with \(a^{\prime } = e_{1}\), \(c^{\prime } = e_{3}\). Finally, we note that since \((a^{\prime },c^{\prime }) \in {\varUpsilon }\), by Remark 5.2, if \(a= a^{\prime }\) then \(c= c^{\prime }\) as claimed. □

In what follows we shall show that for two jumbled basic regular equations E1, E2 in Lex Normal Form with \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) and block decompositions of the same length, all blocks except the final blocks must be identical (Corollary 8.7). We have already shown in Lemma 8.3 that this is true for the initial blocks. The next step is to show that if the previous blocks in both block decompositions are identical, then the next blocks will have the same type.

Lemma 8.5

Let E1, E2 be jumbled basic regular word equations in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be block decompositions of E1 and E2 respectively. Suppose that \(i,j \in \mathbb {N}_{0}\) with i < k and j < ℓ such that Bi = Cj. Then Bi+1 and Cj+1 have the same type.

Proof

Since there are two types, it is sufficient to prove that Bi+1 is Type B if and only if Cj+1 is Type B. Suppose that Bi+1 is Type B and suppose for contradiction that Cj+1 is Type A. Then there exist γ1, γ2, γ3, γ4 ∈ X*, and a, b, c, d ∈ X such that Bi+1 = (abcγ1, cbaγ2) and Cj+1 = (dγ3, dγ4). Note that there exist u, v ∈ X such that Bi = Cj = (δ1u, δ2v) where δ1, δ2 ∈ X*. Hence there exist \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime },\beta _{1}^{\prime },\beta _{2}^{\prime } \in X^{*}\) such that E1 may be written as \(\alpha _{1}u abc \alpha _{2} \doteq \beta _{1} v cba \beta _{2}\) and E2 may be written as \(\alpha _{1}^{\prime } u d \alpha _{2}^{\prime } \doteq \beta _{1}^{\prime } v d \beta _{2}^{\prime }\). However, by Lemma 8.4, this implies \({\varUpsilon }_{E_{1}} \not = {\varUpsilon }_{E_{2}}\), a contradiction. Consequently, Cj+1 is Type B if Bi+1 is Type B. The proof that Bi+1 is Type B if Cj+1 is Type B is symmetric and can be obtained by simply swapping E1 and E2. □

We are now ready to show that standard blocks in a block decomposition are fixed entirely by the preceding block, the invariant ΥE, and the leftmost letter of the block. This is the primary motivation for the definition of Lex Normal Form, which restricts the choice for the leftmost letter of the block where possible, and thus restricts the possibilities for the standard blocks. In particular, it follows directly by a straightforward induction that for two jumbled basic RWEs in Lex Normal Form with the same invariant ΥE, if their block decompositions have the same length, then all but the final blocks will be identical.

Lemma 8.6

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions and let k, ℓ > 0. Suppose that Bi = Cj, for some i < k − 1, j < ℓ − 1. Let Bi+1 = (γ1, γ2) and Cj+1 = (δ1, δ2) with γ1, γ2, δ1, δ2 ∈ X+. If γ1[1] = δ1[1], then Bi+1 = Cj+1.

Proof

Note that since 0 < i + 1 < k and 0 < j + 1 < ℓ, the blocks Bi+1 and Cj+1 are both standard blocks. Note also that by Lemma 8.5, Bi+1 and Cj+1 have the same type. Hence, by definition, there exist α1, α2,…,αn, β1, β2,…,βm ∈ X+ such that \(B_{i+1} = (\alpha _{1}\alpha _{2}\ldots \alpha _{n}, {\alpha _{1}^{R}}{\alpha _{2}^{R}}{\ldots } {\alpha _{n}^{R}})\) and \(C_{j+1} = (\beta _{1}\beta _{2}\ldots \beta _{m}, {\beta _{1}^{R}}{\beta _{2}^{R}}\ldots {\beta _{m}^{R}})\), where |α1| = |β1| ∈ {1,3} and |αp|,|βq| = 2 for 2 ≤ p ≤ n and 2 ≤ q ≤ m. Since Bi = Cj, there exist u, v ∈ X and \(\mu _{1},\mu _{2},\nu _{1},\nu _{2},\mu _{1}^{\prime },\mu _{2}^{\prime },\nu _{1}^{\prime },\nu _{2}^{\prime }, \eta ,\eta ^{\prime } \in X^{*}\) with \(|\eta |, |\eta ^{\prime }| \in \{1,3\}\) and such that E1 is given by \(\mu _{1} u \alpha _{1} \alpha _{2} {\ldots } \alpha _{n} \eta \mu _{2} \doteq \nu _{1} v {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{n}^{R}} \eta ^{R} \nu _{2}\) and E2 is given by \(\mu _{1}^{\prime } u \beta _{1} \beta _{2} {\ldots } \beta _{m} \eta ^{\prime } \mu _{2}^{\prime } \doteq \nu _{1}^{\prime } v {\beta _{1}^{R}} {\beta _{2}^{R}} {\ldots } {\beta _{m}^{R}} \eta ^{\prime R} \nu _{2}^{\prime }\).

By the assumption that γ1[1] = δ1[1], we have that α1[1] = β1[1], meaning that if |α1| = |β1| = 1 then α1 = β1 holds trivially. Similarly, if |α1| = |β1| = 3, then it follows from Lemma 8.4 that α1 = β1. In both cases, it follows from Corollary 8.2 that, additionally, αp = βp for \(2\leq p \leq \min \limits \{n,m\}\). It follows from Lemma 8.1 that n = m. Hence we have Bi+1 = Cj+1 as required. □

Note that if the first i blocks are identical in the block decompositions of two jumbled basic RWEs in Lex Normal Form with the same invariant set ΥE, it follows that the set \({{\varGamma }}^{E}_{i+1}\) is also the same in both cases. Consequently, by definition of Lex Normal Form, if the (i + 1)th blocks are not final blocks, the leftmost variable will be the same in each case (namely the lexicographically minimal element of \({{\varGamma }}^{E}_{i+1}\)). Consequently, by Lemma 8.6, the (i + 1)th blocks will also be identical. By a simple induction, we can thus conclude the following.

Corollary 8.7

Let E1, E2 be jumbled basic RWEs in Lex Normal Form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions and suppose that k, ℓ > 0. Then Bi = Ci for \(0 \leq i < \min \limits (k,\ell )\).

Consequently, two equations in Lex Normal Form in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) with block decompositions containing the same number of blocks may differ only in the final block. Clearly, the number of blocks in a block decomposition is at most Card(var(E)). Thus, in order to bound the number of equations in Lex Normal Form in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), it suffices to count the possibilities for the final block.

Recall from the definition of normal form that the last (rightmost) αi factor is the only one which may have length greater than 3. Consequently, we need a counterpart to Lemmas 8.1 and 8.4 for this case, given by the following.

Lemma 8.8

Let \(u,v,x,y,x^{\prime },y^{\prime } \in X\) and let \(\alpha ,\beta ,\alpha ^{\prime }, \beta ^{\prime }, \gamma ,\gamma ^{\prime } \in X^{*}\) such that |γ|,|γ′| ≥ 1. Let E1 and E2 be the basic regular word equations given by \( x \alpha u \gamma y \doteq y \beta v \gamma ^{R} x\) and \( x^{\prime } \alpha ^{\prime } u \gamma ^{\prime } y^{\prime } \doteq y^{\prime } \beta ^{\prime } v \gamma ^{\prime R} x^{\prime }\) respectively. If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) and \(\gamma [1] = \gamma ^{\prime }[1]\) then \(\gamma = \gamma ^{\prime }\).

Proof

Let z1, z2,…,zn, w1, w2,…,wm ∈ X be variables such that γ = z1z2…zn and \(\gamma ^{\prime } = w_{1}w_{2}{\ldots } w_{m}\) and suppose that z1 = w1. Suppose also that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that for \(1\leq i \leq \min \limits \{n,m\}-2\), we have (zi, zi+2),(wi, wi+2) ∈ Υ. Moreover, if n, m ≥ 2, we also have that (u, z2),(u, w2) ∈ Υ. Consequently, by Remark 5.2, we have that wi = zi for \(1\leq i \leq \min \limits \{n,m\}\). If n = m we are done. Otherwise, suppose that n ≠ m, and note in particular that since E1, E2 are regular, this implies zn ≠ wm. However, (zn, z1),(wm, w1) ∈ Υ, and since w1 = z1, by Remark 5.2 we have that zn = wm, a contradiction. Thus we must have n = m and \(\gamma = \gamma ^{\prime }\) as claimed. □

The following lemma establishes conditions under which two final blocks must be identical, forming the basis for our bound on the number of possible final blocks in a block decomposition of an equation in Lex Normal Form, and consequently, a bound on the number of equations in Lex Normal Form itself.

Lemma 8.9

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions. Suppose that k, ℓ > 0 and that Bk−1 = Cℓ−1. Let \(B_{k} = (\alpha _{1}\alpha _{2}\ldots \alpha _{n} y, {\alpha _{1}^{R}}{\alpha _{2}^{R}}\ldots {\alpha _{n}^{R}} x)\) and \(C_{\ell } = (\beta _{1}\beta _{2}\ldots \beta _{m} y, {\beta _{1}^{R}}{\beta _{2}^{R}}\ldots {\beta _{m}^{R}}x)\), where x, y ∈ X, α1, α2,…,αn, β1, β2,…,βm ∈ X+, |α1| = |β1| ∈ {1,3} and |αi|,|βj| = 2 for 2 ≤ i < n and 2 ≤ j < m. Then if α1[1] = β1[1], n = m, and αn[1] = βm[1], we have Bk = Cℓ.

Proof

Suppose that all the conditions of the lemma are met. Note that Bk and Cℓ are both end blocks. Note also that by Lemma 8.5, Bk and Cℓ have the same type.

Since Bk−1 = Cℓ−1, there exist u, v ∈ X and \(\mu _{1},\mu _{2},\mu _{1}^{\prime },\mu _{2}^{\prime } \in X^{*}\) such that E1 and E2 are given by:
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad x \mu_{1} u \alpha_{1} \alpha_{2} {\ldots} \alpha_{n} y \doteq y \mu_{2} v {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} {\alpha_{n}^{R}} x\\ &&E_{2}: \quad x \mu_{1}^{\prime} u \beta_{1} \beta_{2} {\ldots} \beta_{n} y \doteq y \mu_{2}^{\prime}v {\beta_{1}^{R}} {\beta_{2}^{R}} {\ldots} {\beta_{n}^{R}} x. \end{array} $$

By the assumption that α1[1] = β1[1], we have that if |α1| = |β1| = 1 then trivially α1 = β1, and if |α1| = |β1| = 3, then α1 = β1 by Lemma 8.4. In both cases, it follows from Corollary 8.2 that αi = βi for \(1\leq i < \min \limits \{n,m\}\). It follows from Lemma 8.1 that n = m, and from Lemma 8.8 that αn = βm. Consequently, we have Bk = Cℓ as claimed. □

Lemma 8.9 reveals that the options for the last block depend only on the choices of three parameters: α1[1], αn[1], and n. Since each of these can take at most |E| possible values, there are |E|3 possibilities altogether. Thus for each possible number of blocks, there are at most |E|3 possible block decompositions, and therefore only |E|4 possible block decompositions respecting the invariant ΥE in total. Since every equation in Lex Normal Form permits a unique block decomposition, this gives us our desired polynomial bound, as the schematic count below summarises.
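Schematically, with each parameter bounded crudely by |E|:
$$ \underbrace{|E|}_{\text{number of blocks}} \cdot \underbrace{|E|}_{\text{choices for } \alpha_{1}[1]} \cdot \underbrace{|E|}_{\text{choices for } \alpha_{n}[1]} \cdot \underbrace{|E|}_{\text{choices for } n} = |E|^{4}. $$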

Theorem 8.10

Let E be a jumbled basic RWE. Let S be the set of basic regular equations \(E^{\prime }\) in Lex Normal Form for which \({\varUpsilon }_{E} = {\varUpsilon }_{E^{\prime }}\). Then Card(S) ≤|E|4.

Proof

We shall count possible block decompositions of equations \(E^{\prime }\) for which \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{E} = {\varUpsilon }\). Since the block decomposition uniquely determines the equation, this count is an upper bound on the number of equations in S. Note that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{E}\) implies \({var}(E^{\prime }) = {var}(E)\).

It is straightforward from the definitions that any block decomposition of an equation \(E^{\prime }\) can have at most \({\text {Card}}({var}(E^{\prime })) = {\text {Card}}({var}(E)) < |E|\) blocks, so it is sufficient to count how many block decompositions with exactly N blocks are possible for each N ≤Card(var(E)).

We start with the case that the block decomposition consists of exactly one block (N = 1). Suppose we have two basic regular word equations E1, E2 in Lex Normal Form, such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\) (and so additionally var(E1) = var(E2) = var(E)). Suppose that (B0) and (C0) are the block decompositions of E1 and E2 respectively. By definition B0 = E1 and C0 = E2. It follows that \(B_{0} = (x\alpha _{1} \alpha _{2} {\ldots } \alpha _{o} y, y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{o}^{R}} x)\) and \(C_{0} = (x^{\prime } \alpha _{1}^{\prime } \alpha _{2}^{\prime } {\ldots } \alpha _{m}^{\prime } y^{\prime }, y^{\prime } \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{m}^{\prime R} x^{\prime } )\) where \(x,x^{\prime },y,y^{\prime } \in X\) and \(\alpha _{i},\alpha _{j}^{\prime } \in X^{+}\) for 1 ≤ i ≤ o, 1 ≤ j ≤ m and such that \(|\alpha _{i}|,|\alpha _{j}^{\prime }| = 2\) for 1 ≤ i < o and 1 ≤ j < m. It is easily verified (either from the definition of ⇒, or from Remark 5.2) that \(x = x^{\prime }\) and \(y = y^{\prime }\). Moreover, we clearly must have o, m < Card(var(E)). Now suppose that o = m. Then by Corollary 8.2, we may conclude that \(\alpha _{i} = \alpha _{i}^{\prime }\) for 1 ≤ i < o. Similarly, it follows from Lemma 8.9 that \(\alpha _{o} = \alpha ^{\prime }_{o}\), and thus B0 = C0. Hence, for each possible value of o, there is at most one possible block decomposition, meaning there are fewer than Card(var(E)) < |E| possible block decompositions containing only one block.

Now consider the cases that there is more than one block in the block decomposition (1 < N ≤ Card(var(E))). Suppose we have two basic regular word equations E1, E2 in Lex Normal Form, such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\). Suppose that (B0, B1, B2,…,Bn) and (C0, C1,…,Cn) are the block decompositions of E1 and E2 respectively, and that they have the same number of blocks 1 < n ≤ Card(var(E)). By Corollary 8.7, we have that Bi = Ci for 0 ≤ i ≤ n − 1. By Lemma 8.9, there are at most |E|3 possibilities for the end block Cn. Hence there are at most |E|3 block decompositions overall with exactly n blocks for 1 < n ≤ |E|, and therefore at most |E|4 possible block decompositions in total, and the statement of the theorem follows. □

For a jumbled basic RWE E, since every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is at a small (i.e. bounded by a polynomial in |E|) distance from a vertex in Lex Normal Form, and since there are only a small number of such vertices, it is straightforward to show that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) must also be small: indeed if we have a sufficiently long path between two vertices, then we must have a long path between two vertices which are close to the same vertex in Lex Normal Form. Since they are close to the same vertex, we can find a shortcut between them, and the initial long path is not minimal. Knowing that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is bounded by a polynomial in |E| when E is jumbled and basic, it follows from Theorems 6.8 and 4.8 (see also Remark 4.6) and Proposition 3.5 that the diameter of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is bounded by a polynomial in |E| whenever E is regular.
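Schematically (with constant factors suppressed): writing d for the maximal distance from a vertex to its nearest equation in Lex Normal Form, a shortest path longer than 2d ⋅ Card(S) would, by the pigeonhole principle, pass through two vertices associated to the same Lex Normal Form equation yet lying more than 2d apart, and routing through that equation would shortcut the path. Hence, using d ∈ O(|E|4) (Theorem 7.11) and Card(S) ≤ |E|4 (Theorem 8.10),
$$ {diam}({\mathscr{G}}^{\Rightarrow}_{[E]}) \lesssim 2d \cdot {\text{Card}}(S) \in O(|E|^{4} \cdot |E|^{4}) = O(|E|^{8}), $$
which is exactly the bound derived formally in the proof of Theorem 8.11 below.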

Theorem 8.11

Let E be a basic RWE. Then \({diam}({\mathscr{G}}_{[E]}^{\Rightarrow }) \in O(|E|^{10})\). Consequently, for any RWE E, \({diam}({\mathscr{G}}_{[E]}^{\Rightarrow _{NT}}) \in O(|E|^{12})\).

Proof

We shall first consider the case of \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) when E is jumbled, basic and regular. Let \(S = \{ E^{\prime } \in [E]_{\Rightarrow } \mid E^{\prime } \text { is in Lex Normal Form}\}\). By Theorem 5.3, \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) for all E1, E2 ∈ \([E]_{\Rightarrow }\). Thus, by Theorem 8.10, we have that Card(S) ≤ |E|4. Moreover, by Theorem 7.11, for every \(E^{\prime } \in [E]_{\Rightarrow }\), there exists some \(\hat {E^{\prime }} \in S\) such that \(E^{\prime }\) is at distance at most O(|E|4) from \(\hat {E^{\prime }}\), and \(\hat {E^{\prime }}\) is at distance at most O(|E|4) from \(E^{\prime }\) in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\). From this, we may conclude that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E|^{8})\) as follows: suppose for contradiction that, for an appropriate constant c, there exist \(\overline {E}_{1},\overline {E}_{2} \in [E]_{\Rightarrow }\) such that the minimal path between them in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) has length at least 2c|E|8 + 1. Let that path be E1, E2,…,En where \(E_{1} = \overline {E}_{1}\), \(E_{n} = \overline {E}_{2}\), and \(E_{i} \Rightarrow E_{i+1}\) for 1 ≤ i < n, and such that n > 2c|E|8 + 1. Now, to each Ei, 1 ≤ i ≤ n, we may associate some \(\hat {E_{i}} \in S\) such that the distance from Ei to \(\hat {E_{i}}\) is at most c|E|4. Since Card(S) ≤ |E|4 and n > 2c|E|8 + 1, we must have that there exists \(\hat {E} \in S\) such that \(\hat {E} = \hat {E_{i}}\) for at least 2c|E|4 + 1 different values of i. This implies in particular that there exist i1, i2 with i2 − i1 > 2c|E|4 such that \(\hat {E_{i_{1}}} = \hat {E_{i_{2}}}\). It follows that the length of the path \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) is at least 2c|E|4 + 1, and moreover, since E1, E2,…,En is the shortest path between \(\overline {E}_{1}\) and \(\overline {E}_{2}\), \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) must also be the shortest path between \(E_{i_{1}}\) and \(E_{i_{2}}\). However, we have that \(E_{i_{1}}\) is at distance at most c|E|4 from \(\hat {E}\), and that \(\hat {E}\) is at distance at most c|E|4 from \(E_{i_{2}}\). Consequently, \(E_{i_{1}}\) is at distance at most 2c|E|4 from \(E_{i_{2}}\), a contradiction to the fact that \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) is the shortest possible path. Consequently, if E is jumbled, basic and regular, then \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E|^{8})\).

Now we shall consider \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) in the case that E is basic and regular, but not necessarily jumbled. Suppose that E is given by \(\alpha \doteq \beta \). Let Y = var(E)∖Δ(E) and let \(E^{\prime }\) be the equation \(\pi _{Y}(\alpha ) \doteq \pi _{Y}(\beta )\). Clearly, \(E^{\prime }\) is basic, regular and \(|E^{\prime }| \leq |E|\). By Theorem 6.8, we have that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]})|E|^{2})\). Moreover, by Lemma 6.3, \(E^{\prime }\) is jumbled. Thus by our previous claim, it follows that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E^{\prime }|^{8}|E|^{2}) = O(|E|^{10})\).

Finally, we consider the case of \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \) for arbitrary regular equations E. Let E be any regular word equation. Then by Proposition 3.5, \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq 1+(|E|+1)m\) where
$$ m= \max\{{diam}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \mid E \Rightarrow_{NT}^{*} E^{\prime}\}.$$
Now fix \(E^{\prime }\) such that \(E \Rightarrow _{NT}^{*} E^{\prime }\) and \({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) = m\). Then since \(E \Rightarrow _{NT}^{*} E^{\prime }\), \(E^{\prime }\) is also regular and \(|E^{\prime }| \leq |E|\). Moreover by Theorem 4.8, there exists a basic regular equation \(E^{\prime \prime }\) such that \(|E^{\prime \prime }| \leq |E|\) and such that \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) is isomorphic to an isolated path compression of order \(|E^{\prime }|\) of \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). Thus (cf. Remark 4.6), we have \(m \leq |E^{\prime }| {diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]})\). Since \(E^{\prime \prime }\) is basic and regular, we have that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}) \in O(|E^{\prime \prime }|^{10})\). Since \(|E^{\prime \prime }|, |E^{\prime }| \leq |E|\), we therefore have m ∈ O(|E|11) and \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^{12})\). □

Due to Proposition 3.4, we may infer directly from Theorem 8.11 that the satisfiability problem for regular word equations is in NP. It was already shown in [8] that this problem is NP-hard, and thus we obtain matching upper and lower bounds for its complexity.

Theorem 8.12

The satisfiability problem for RWEs is NP-complete.

Proof

Membership in NP follows directly from Theorem 8.11 and Proposition 3.4, while NP-hardness was shown in [8]. □

9 Size

While the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is one important parameter, being directly related to the complexity of the satisfiability problem, it is by no means the only interesting one. The overall size of the graphs will also play a central role in the practical performance of the algorithm described in Section 3.

For basic RWEs, we are able to give tight upper and lower bounds on the number of vertices in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), as well as identifying the cases in which these bounds are reached. Recalling Theorem 4.8, we are also able to translate these bounds into the case of general (i.e. not basic) RWEs. In particular, when moving to a general RWE from the corresponding basic one, the effect on the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is that ‘isolated paths’ of length linear in |E| are collapsed. In fact, an inspection of the proofs (in particular of Lemma 4.7) yields a tighter bound, namely that collapsed paths will have at most \(\max \limits (T_{1},T_{2})\) internal vertices where T1 and T2 are the number of occurrences of terminal symbols and single-occurrence variables in the LHS and RHS respectively.

Corollary 9.1

Let E be an RWE given by \(\alpha \doteq \beta \). Let Ebasic be the corresponding basic equation as per Theorem 4.8. Let n = Card(qv(E)) and let \(M = \max \limits \{ |\alpha |-n, |\beta |-n \}\). Then
$$ {\text{Card}}([E_{basic}]_{\Rightarrow}) \leq {\text{Card}}([E]_{\Rightarrow}) \leq M{\text{Card}}([E_{basic}]_{\Rightarrow}).$$

We begin with the upper bounds, which are attained in the case of basic regular-rotated word equations.

Lemma 9.2

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V\leq \frac {n!}{2}\). Moreover, \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

Proof

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). We shall begin with the claim that \(V \leq \frac {n!}{2}\). To do this, we recall that from Theorem 5.3, the set \(S_{{\varUpsilon }} = \{E^{\prime } \mid E^{\prime } \text { is a basic regular equation such that } {\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \}\) is a (not necessarily strict) superset of [E]⇒. We shall show that the cardinality of SΥ is at most \(\frac {n!}{2}\). Let Υ = ΥE and let \(E^{\prime }\) be a regular basic equation such that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }\). Now, it follows from the definition of Υ that \({var}(E^{\prime }) = {var}(E)\) and that the rightmost variables of the LHS (resp. RHS) of E and \(E^{\prime }\) are the same. More precisely, there exist x, y ∈ var(E) and \(\alpha ,\alpha ^{\prime },\beta ,\beta ^{\prime } \in X^{*}\) such that E may be written \(\alpha x \doteq \beta y\) and \(E^{\prime }\) may be written as \(\alpha ^{\prime } x \doteq \beta ^{\prime } y\). Clearly, there are at most (n − 1)! possibilities for \(\alpha ^{\prime }\). Moreover, since \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }\) is fixed, we can, given \(\alpha ^{\prime }\), for each \(u \in {var}(\beta ^{\prime }) \backslash \{\alpha ^{\prime }[1],\beta ^{\prime }[1]\}\), determine uniquely the predecessor of u in \(\beta ^{\prime } y\). More precisely, there exist factors vu and \(v^{\prime }u\) of \(\alpha ^{\prime }x\) and \(\beta ^{\prime }y\) respectively where \(v,v^{\prime }\in {var}(E)\). Thus \((v,v^{\prime }) \in {\varUpsilon }\), so if v is fixed (i.e. by \(\alpha ^{\prime }\)) then \(v^{\prime }\) is also fixed by Υ. It follows directly that for each choice of \(\alpha ^{\prime }\), there exists a unique suffix γ of \(\beta ^{\prime }y\) having \(\alpha ^{\prime }[1]\) as a prefix. Moreover, once the variable occurring immediately to the left of γ (i.e. the predecessor of γ[1] in \(\beta ^{\prime }y\)) is fixed, then \(\beta ^{\prime }y\) is fixed entirely, meaning that there are n − |γ| possible choices for \(\beta ^{\prime }y\) once \(\alpha ^{\prime }\) is fixed.

Next, we shall show that for each k, 1 ≤ k ≤ n − 1, there are exactly (n − 2)! choices of \(\alpha ^{\prime }\) such that the corresponding γ has length exactly k. For other values of k, there are no possible choices of \(\alpha ^{\prime }\) due to the fact that every equation in SΥ is basic and regular (note in particular that the case k = n would result in an equation which is decomposable and therefore not basic). It follows from this that the cardinality of SΥ is at most \(\frac {n!}{2}\):
$$ {\text{Card}}(S_{{\varUpsilon}}) \leq \sum\limits_{k = 1}^{n-1}k(n-2)! = (n-2)!\sum\limits_{k = 1}^{n-1} k = (n-2)! \frac{n(n-1)}{2} = \frac{n!}{2}.$$
To see why there are exactly (n − 2)! choices of \(\alpha ^{\prime }\) such that the corresponding γ has length k, we shall take a slightly different approach to constructing/selecting \(\alpha ^{\prime }\) and \(\beta ^{\prime }\). In particular, we shall first choose γ and then see how many choices there are for \(\alpha ^{\prime }\). Let \(k \in \mathbb {N}\) such that 1 ≤ k < n.

By definition of ΥE, we must have that if γ = v1v2…vk−1y, then there exist u1, u2,…,uk−1 ∈ var(E) such that \(\alpha ^{\prime }[1] = v_{1}\) and (ui, vi) ∈ Υ for 1 ≤ i ≤ k − 2, (uk−1, vk−1) ∈ Υ, and such that uk−1y is a factor of \(\alpha ^{\prime }x\) and uivi+1 are factors of \(\alpha ^{\prime }x\) for 1 ≤ i ≤ k − 2. Since \(E^{\prime }\) is regular, it follows that vi ≠ x for 1 ≤ i ≤ k − 1. Consequently, there are \({{n-2}\choose {k-1}} (k-1)! = \frac {(n-2)!}{(n-k-1)!}\) possible ways of choosing γ. Once γ is fixed, then, since uk−1y is a factor of \(\alpha ^{\prime }x\) and uivi+1 are factors of \(\alpha ^{\prime }x\) for 1 ≤ i ≤ k − 2, we may infer that \(\alpha ^{\prime }\) is uniquely determined by the relative order of the variables in var(E)∖{x, y, v1, v2,…,vk−1}, and thus there are (n − k − 1)! possible choices for \(\alpha ^{\prime }\) for each choice of γ. Altogether we have \((n-k-1)! \frac {(n-2)!}{(n-k-1)!} = (n-2)!\) possible choices for \(\alpha ^{\prime }\) as claimed, and it follows that \(V \leq \frac {n!}{2}\).
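As a quick numeric sanity check of the closed form above (an aside, not part of the proof), the identity \({\sum }_{k=1}^{n-1} k(n-2)! = n!/2\) can be verified mechanically for small n:

```python
# Aside: verify sum_{k=1}^{n-1} k*(n-2)! == n!/2 for small n.
from math import factorial

for n in range(2, 12):
    total = sum(k * factorial(n - 2) for k in range(1, n))
    assert total == factorial(n) // 2, n
print("identity verified for n = 2, ..., 11")
```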

It remains to consider the claim that \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated. Note that since n > 1, and since \(E^{\prime }\) is basic (and therefore indecomposable) for all \(E^{\prime } \in [E]_{\Rightarrow }\), \(E^{\prime }\) is not regular ordered for all \(E^{\prime } \in [E]_{\Rightarrow }\).

We shall begin with the ‘if’ direction. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then we may assume w.l.o.g. that E is regular rotated and thus we can write E as \(y_{1}y_{2}{\ldots } y_{k} x_{1} y_{k+1} y_{k+2} {\ldots } y_{\ell } x_{2} \doteq y_{k+1} y_{k+2} {\ldots } y_{\ell } x_{2} y_{1}y_{2}{\ldots } y_{k} x_{1}\) where x1, x2, y1, y2,…,yℓ ∈ X, ℓ = n − 2 and k ≤ ℓ. Then Δ(E) = {y1, y2,…,yℓ}. Consequently, by Theorem 6.8, the set of equations
$$ S= \{\alpha x_{1} \beta x_{2} \doteq \beta x_{2} \alpha x_{1} \mid |\alpha\beta|_{y} = 1 \text{ if } y \in {{\varDelta}}(E) \text{ and } |\alpha\beta|_{y} = 0 \text{ otherwise}\}$$
is a subset of \([E]_{\Rightarrow }\). Now, for each i, 0 ≤ i ≤ ℓ = Card(Δ(E)), let the set \(S_{i} \subseteq S\) be the set
$$ S_{i}= \{\alpha x_{1} \beta x_{2} \doteq \beta x_{2} \alpha x_{1} \mid |\alpha| = i \text{ and } |\alpha\beta|_{y} = 1 \text{ if } y \in {{\varDelta}}(E) \text{ and } |\alpha\beta|_{y} = 0 \text{ otherwise} \}.$$
Clearly, we have \(S = \bigcup \limits _{0 \leq i \leq \ell } S_{i}\). Moreover, we have that Card(Si) = ℓ! = (n − 2)! for each i, 0 ≤ i ≤ ℓ. Finally, note that for each i, 0 ≤ i ≤ ℓ, if \(E^{\prime } \in S_{i}\), then for \(T_{E^{\prime }} = \{E^{\prime \prime } \mid E^{\prime } \Rightarrow _{R}^{*} E^{\prime \prime } \}\), we have that \({\text {Card}}(T_{E^{\prime }}) = i+1\). It is straightforward from the definitions that for E1, E2 ∈ S, if E1 ≠ E2, then \(T_{E_{1}} \cap T_{E_{2}} = \emptyset \). Consequently, we may conclude that
$$V \geq \sum\limits_{E^{\prime} \in S} {\text{Card}}(T_{E^{\prime}}) = \sum\limits_{0 \leq i \leq \ell} (i+1){\text{Card}}(S_{i}) = \frac{(\ell+1)(\ell+2)}{2}(n-2)! = \frac{n!}{2}.$$
We have already shown that \(V \leq \frac {n!}{2}\), so \(V= \frac {n!}{2}\) as required.
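The closed form used for this lower bound can be checked in the same spirit (again an aside, not part of the proof): with ℓ = n − 2, summing (i + 1) ⋅ ℓ! over 0 ≤ i ≤ ℓ gives (ℓ + 1)(ℓ + 2)/2 ⋅ ℓ! = n!/2.

```python
# Aside: verify sum_{i=0}^{l} (i+1)*l! == n!/2 with l = n-2, for small n.
from math import factorial

for n in range(2, 12):
    l = n - 2
    assert sum((i + 1) * factorial(l) for i in range(l + 1)) == factorial(n) // 2
print("lower-bound count verified for n = 2, ..., 11")
```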

Suppose now that \(E^{\prime }\) is not regular rotated for all \(E^{\prime } \in [E]_{\Rightarrow }\). To see that \(V< \frac {n!}{2}\), it suffices to notice that we can decrease the bound on Card(SΥ) if not all the previously considered possibilities for the left-hand-sides \(\alpha ^{\prime }x\) are actually possible.

Recall from Theorem 5.3 that \({{\varDelta }}(E) = {{\varDelta }}(E^{\prime })\) for all \(E^{\prime } \in [E]_{\Rightarrow }\). Moreover, it follows from the definitions that the rightmost variables on each side of the equation are not contained in Δ(E), and thus Card(Δ(E)) ≤ n − 2. Next, suppose (for contradiction) that Card(Δ(E)) = n − 2. Then there exist z1, z2,…,zn ∈ X and i, 1 ≤ i < n such that zn is a suffix of the LHS of E and zi is a suffix of the RHS of E, meaning that Δ(E) = {zj ∣ 1 ≤ j < n, j ≠ i}. Consequently, there exists j, i < j ≤ n such that E may be written \(z_{1} z_{2} {\ldots } z_{n} \doteq z_{j+1} {\ldots } z_{n-1} z_{n} z_{i+1} {\ldots } z_{j-1} z_{j} z_{1}{\ldots } z_{i-2} z_{i-1} z_{i}\). Thus \(E \Rightarrow _{L}^{*} E^{\prime }\) where \(E^{\prime }\) is given by \( z_{1} z_{2} {\ldots } z_{n} \doteq z_{i+1} {\ldots } z_{j-1} z_{j} z_{j+1} {\ldots } z_{n-1} z_{n} z_{1}{\ldots } z_{i-2} z_{i-1} z_{i}\). However, \(E^{\prime }\) is regular-rotated, a contradiction.

Hence, we may assume that Card(Δ(E)) < n − 2, and consequently, there exist pairwise distinct variables u, v, x, y ∈ var(E) such that (u, v),(x, y) ∈ ΥE. However, if this is the case, then the LHS of any equation in \([E]_{\Rightarrow }\) cannot contain both the factors uv and xy. Suppose for contradiction that both factors were present in the LHS. Then, by definition of ΥE, there must exist z ∈ X such that either uz is a factor of the LHS and vz is a factor of the RHS, or xz is a factor of the LHS and yz is a factor of the RHS. W.l.o.g. we may assume the first case, that uz is a factor of the LHS and vz is a factor of the RHS. However, by the assumption that uv is also a factor of the LHS, we have z = v, and consequently vv is a factor of the RHS, a contradiction to the fact that E is regular. It follows in this case that \({\text {Card}}(S_{{\varUpsilon }}) < \frac {n!}{2}\), and thus that \(V < \frac {n!}{2}\). □

We can use Corollary 9.1 to adapt Lemma 9.2 to general RWEs as follows. Let E be a RWE given by \(\alpha \doteq \beta \), let n = Card(qv(E)), and let \(T = \max \limits \{|\alpha |-n, |\beta |-n\}\). Let Ebasic be the corresponding basic RWE as per Theorem 4.8. Clearly, for Card([E]⇒) to be maximal, E should be indecomposable. Now, by Corollary 9.1, we have that \({\text {Card}}([E]_{\Rightarrow }) \leq T {\text {Card}}([E_{basic}]_{\Rightarrow }) \leq T\frac {n!}{2} \leq \frac {(n+T)!}{2} = \frac {(\max \limits \{|\alpha |,|\beta |\})!}{2} \).

Note also that if E is not regular-rotated, then either Ebasic is not regular-rotated, or E is decomposable and Ebasic is regular-rotated but with fewer variables. In either case it follows that the second inequality becomes strict. Similarly, if T≠ 0, then the third inequality becomes strict. Hence we get the following.

Corollary 9.3

Let E be a RWE given by \(\alpha \doteq \beta \). Let \(M = \max \limits \{|\alpha |,|\beta |\}\). Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V\leq \frac {M!}{2}\). Moreover, \(V = \frac {M!}{2}\) if and only if E is basic and there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

For lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we consider the class of regular-reversed equations. We shall eventually prove a statement similar to that of Lemma 9.2, but first we need some additional definitions and lemmas. Our reasoning in this case revolves primarily around a particular binary-tree-like structure arising locally in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\). The binary trees do not occur directly as subgraphs of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), but rather can be obtained by treating certain short paths as edges. The relation defining the ‘edges’ of the tree is given by ⊳, introduced formally below. By showing that these binary trees always occur in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and by verifying that they are balanced and have height proportional to the number of edges, we are able to produce the lower bound on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) given in Lemma 9.11.

Definition 9.4 (→R,→L,⊳,W(E))

Let E be a basic RWE such that Card(var(E)) ≥ 2. Then we may write E in the form
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta$$
with x, y, z1, z2,…,zk, w1, w2,…,wk ∈ X such that {z1, z2,…,zk} = {w1, w2,…,wk}, and α, β, γ0, γ1,…,γk, δ0, δ1,…,δk ∈ (X∖{x, y, z1, z2,…,zk})∗ such that for each i, j, 0 ≤ i, j ≤ k, we have var(γi) ∩ var(δj) = ∅. Note that this decomposition is unique. We define W(E) = {x, y, z1, z2,…,zk}. Moreover, there exist i, j such that wi = zk and zj = wk. We define the relations →L and →R such that
$$ \begin{array}{@{}rcl@{}} && x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta\\ \to_{L} && {}x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} \!\ldots\! z_{k} \gamma_{k} y \alpha\! \doteq\! w_{i} \delta_{i} w_{i+1} \delta_{i+1} \!\ldots\! w_{k} \delta_{k} y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{i-1} \delta_{i-1} x \beta \end{array} $$
and
$$ \begin{array}{@{}rcl@{}} && x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta\\ \to_{R} && z_{j} \gamma_{j} z_{j+1} \gamma_{j+1} \!\ldots\! z_{k} \gamma_{k} x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} \!\ldots\! z_{j-1} \gamma_{j-1} y \alpha \!\doteq\! y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta \end{array} $$

Additionally, for convenience, we define ⊳ =→L∪→R.

The tree-structure we are interested in is the set \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) for a given basic RWE E with at least two variables (the one-variable case being trivial). An example is given by Fig. 7. The following fact can be verified directly from the definition, and confirms that the set S is indeed contained in the vertex set \([E]_{\Rightarrow }\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\).
Fig. 7

The set \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) occurring as a subset of the vertices of the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is given by \(x_{1}x_{2}x_{3}x_{4} \doteq x_{4}x_{2}x_{3}x_{1}\). In order to conserve space, for each vertex, the equation is arranged vertically with the LHS above and the RHS below. The vertices belonging to S are highlighted in bold, and E is shaded (blue). The tree structure induced by the relation ⊳ is given by the bold solid edges, while the edges of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are dashed. Note that the edges due to ⊳ do not necessarily coincide with edges due to ⇒, but for every ⊳-edge, there is a corresponding path using ⇒-edges, guaranteeing that \(S \subseteq [E]_{\Rightarrow }\). In this case we have that W(E) = var(E) = {x1, x2, x3, x4}, so S forms a tree of height \(2^{4-2}-1 = 3\), and contains exactly \(2^{4-1}-1 = 7\) equations
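To make Definition 9.4 concrete, the following minimal sketch (our own illustration, not part of the original paper; the representation of basic RWEs as pairs of tuples of variable names, and all function names, are ours) computes the decomposition from Definition 9.4, applies →L and →R, and enumerates \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) for the equation of Fig. 7, reproducing the count given by Lemma 9.9 below.

def decompose(lhs, rhs):
    """Split a basic RWE per Definition 9.4: LHS = x N y alpha, RHS = y M x beta,
    where N = gamma_0 z_1 gamma_1 ... z_k gamma_k and
    M = delta_0 w_1 delta_1 ... w_k delta_k are the 'middles'."""
    x, y = lhs[0], rhs[0]
    N, alpha = lhs[1:lhs.index(y)], lhs[lhs.index(y) + 1:]
    M, beta = rhs[1:rhs.index(x)], rhs[rhs.index(x) + 1:]
    zs = [v for v in N if v in set(M)]   # z_1, ..., z_k in LHS order
    ws = [v for v in M if v in set(N)]   # w_1, ..., w_k in RHS order
    return x, y, N, alpha, M, beta, zs, ws

def step_L(E):
    """Apply ->_L: rotate the RHS so that it starts with the w_i equal to z_k."""
    lhs, rhs = E
    x, y, N, alpha, M, beta, zs, ws = decompose(lhs, rhs)
    if not zs:                            # Card(W(E)) = 2: a leaf w.r.t. the relation
        return None
    i = M.index(zs[-1])
    return (lhs, M[i:] + (y,) + M[:i] + (x,) + beta)

def step_R(E):
    """Apply ->_R: rotate the LHS so that it starts with the z_j equal to w_k."""
    lhs, rhs = E
    x, y, N, alpha, M, beta, zs, ws = decompose(lhs, rhs)
    if not ws:
        return None
    j = N.index(ws[-1])
    return (N[j:] + (x,) + N[:j] + (y,) + alpha, rhs)

def tree(E):
    """The set S = {E' reachable from E under ->_L and ->_R}."""
    S, todo = {E}, [E]
    while todo:
        cur = todo.pop()
        for child in (step_L(cur), step_R(cur)):
            if child is not None and child not in S:
                S.add(child)
                todo.append(child)
    return S

# The equation of Fig. 7: x1 x2 x3 x4 = x4 x2 x3 x1, with Card(W(E)) = 4.
E = (("x1", "x2", "x3", "x4"), ("x4", "x2", "x3", "x1"))
assert len(tree(E)) == 2 ** (4 - 1) - 1   # 7 equations, as in Lemma 9.9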

Fact 9.5

Let E1, E2 be basic RWEs with Card(var(E1)),Card(var(E2)) ≥ 2. Let Z ∈{L, R}. If E1 →Z E2, then \(E_{1} \Rightarrow _{Z}^{*} E_{2}\). Conversely, if E1 ⇒Z E2, then either \(E_{1} \to _{Z}^{*} E_{2}\) or \(E_{2} \to _{Z}^{*} E_{1}\).

In what follows, in order to understand the number of equations occurring in \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\), we shall show that S, when combined with the relation ⊳, becomes a balanced binary tree of height Card(W(E)) − 1. We proceed by noting two more facts following directly from the definition. Fact 9.6 provides the first step towards understanding why ⊳ induces a binary-tree-like structure on S: the leaf nodes are equations for which Card(W(E)) = 2, while all other equations have exactly two children w.r.t. ⊳.6

Fact 9.6

Let E be a basic RWE with Card(var(E)) ≥ 2. Then the following statements are equivalent.
  1. Card(W(E)) > 2,
  2. there exists \(E^{\prime }\) such that \(E \to _{L} E^{\prime }\),
  3. there exists \(E^{\prime }\) such that \(E \to _{R} E^{\prime }\).

Fact 9.7 allows us to infer exactly the height of the tree by establishing a natural ordering (namely the cardinality of W(E)) on equations. Note that by Fact 9.6, whenever we move from an equation to one of its children w.r.t. ⊳, we decrease Card(W(E)) by exactly one.

Fact 9.7

Let E1, E2 be basic RWEs with Card(var(E1)),Card(var(E2)) ≥ 2. Let Z ∈{L, R} and suppose that E1 →Z E2. Suppose that x, y ∈ X and let α1, α2, β1, β2 ∈ (X∖{x, y})∗ such that E1 may be written \(x\alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\). If Z = L, then W(E2) = W(E1)∖{y} and if Z = R, then W(E2) = W(E1)∖{x}.

Facts 9.7 and 9.6 are sufficient to observe that the set \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) combined with ⊳ forms a DAG of bounded height. However, this is not sufficient for our purposes of providing a lower bound on the number of equations contained in \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). The following lemma shows that this DAG is in fact a tree by confirming that for each equation (which is not a leaf node), the two ‘subtrees’ rooted at the two children of that equation do not share any vertices.

Lemma 9.8

Let E, E1, E2 be basic regular word equations such that E →L E1 and E →R E2. Let \(S_{1} = \{E_{1}^{\prime } \mid E_{1} \triangleright ^{*} E_{1}^{\prime }\}\) and let \(S_{2} = \{E_{2}^{\prime } \mid E_{2} \triangleright ^{*} E_{2}^{\prime }\}\). Then S1 ∩ S2 = ∅ and E ∉ S1 ∪ S2.

Proof

The fact that E ∉ S1 ∪ S2 follows from the fact that, by Fact 9.7, for all \(E^{\prime } \in S_{1} \cup S_{2}\), we have \({\text {Card}}(W(E^{\prime })) \leq {\text {Card}}(W(E_{1})) = {\text {Card}}(W(E_{2})) < {\text {Card}}(W(E))\). We shall next consider the claim that S1 ∩ S2 = ∅. Notice that it follows from the definitions of →R and →L that if \(E^{\prime } \triangleright E^{\prime \prime }\) and \(w \in {var}(E^{\prime }) \backslash W(E^{\prime })\), then firstly \(w \in {var}(E^{\prime \prime }) \backslash W(E^{\prime \prime })\), and secondly \(Q_{E^{\prime }}(w) = Q_{E^{\prime \prime }}(w)\) where \(Q_{E^{\prime }}, Q_{E^{\prime \prime }}\) are the functions defined in accordance with Definition 5.1. Now, if Card(W(E)) ≤ 2, then the statement follows trivially. Otherwise let x, y, z1, z2,…,zk, w1, w2,…,wk ∈ X such that {z1, z2,…,zk} = {w1, w2,…,wk}, and α, β, γ0, γ1,…,γk, δ0, δ1,…,δk ∈ (X∖{x, y, z1, z2,…,zk})∗ such that var(γi) ∩ var(δj) = ∅ for 0 ≤ i, j ≤ k and such that E may be written as:
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta. $$
From Fact 9.7, it follows that y ∉ W(E1), so we may conclude that \(Q_{E^{\prime }}(y) = Q_{E_{1}}(y)\) for all \(E^{\prime } \in S_{1}\). Similarly, it follows from Fact 9.7 that x ∉ W(E2), and we may hence conclude that \(Q_{E^{\prime }}(x) = Q_{E_{2}}(x)\) for all \(E^{\prime } \in S_{2}\). Now, let u, v be the rightmost variables in zkγk and wkδk respectively. Then \(Q_{E_{1}}(y) = Q_{E_{2}}(x) = (u,v)\). However, since \(E^{\prime }\) is regular, x ≠ y, so by properties of the functions \(Q_{E^{\prime }}\) (namely that by Remark 5.2 they are injective), we cannot have that \(Q_{E^{\prime }}(x) = (u,v)\) for any \(E^{\prime } \in S_{1}\) and likewise we cannot have \(Q_{E^{\prime }}(y) = (u,v)\) for any \(E^{\prime } \in S_{2}\). Consequently, S1 ∩ S2 = ∅. □

Lemma 9.8, along with Facts 9.6 and 9.7, are sufficient to confirm our claim that the set \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) forms a balanced binary tree of height Card(W(E)) − 2. Thus we are now in a position to state the cardinality of \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) precisely as follows.

Lemma 9.9

Let E be a basic regular word equation such that Card(W(E)) ≥ 2. Let \(S = \{ E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). Then \({\text{Card}}(S) = 2^{{\text{Card}}(W(E))-1} - 1\).

Proof

We shall prove the claim by induction on Card(W(E)). If Card(W(E)) = 2 then S = {E} and the statement is immediate. Now suppose that the claim holds for all basic regular word equations E such that Card(W(E)) ≤ n for some n ≥ 2. Let E be a basic regular word equation such that Card(W(E)) = n + 1. Then Card(W(E)) > 2, so by Fact 9.6, there exist E1, E2 ∈ [E]⇒ such that E →L E1 and E →R E2. From the definitions, we have that S = {E}∪ S1 ∪ S2 where \(S_{1} = \{E_{1}^{\prime } \mid E_{1} \triangleright ^{*} E_{1}^{\prime }\}\) and \(S_{2} = \{E_{2}^{\prime } \mid E_{2} \triangleright ^{*} E_{2}^{\prime }\}\). By Lemma 9.8, it follows that Card(S) = 1 + Card(S1) + Card(S2). Moreover, since Card(W(E1)) = Card(W(E2)) = n, we have from our induction hypothesis that Card(S1) = Card(S2) = \(2^{n-1} - 1\). Thus we have Card(S) = \(2(2^{n-1} - 1) + 1 = 2^{(n+1)-1} - 1\) as required. □

Lemma 9.9 together with Fact 9.5 are sufficient to provide lower bounds on the number of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and we are nearly ready to provide the counterpart to Lemma 9.2. The final step before we do so is the following lemma which characterises the basic RWEs E for which the set of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is exactly \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). Since by Fact 9.5, S is always a subset of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), this naturally leads us to the extremal case in which the lower bound is obtained.

Lemma 9.10

Let E be a basic regular word equation. Let \(S = \{E^{\prime } \mid E\triangleright ^{*} E^{\prime }\}\). Then S = [E]⇒ if and only if E is regular reversed.

Proof

Let E be a basic regular word equation. If Card(var(E)) = 1 then E can be written as \(x \doteq x\), for some x ∈ X, meaning that E is regular reversed, and moreover, that S = [E]⇒ = {E}, so the statement holds trivially. Suppose henceforth that Card(var(E)) ≥ 2.

Consider first the case that \(E^{\prime }\) is not regular reversed for all \(E^{\prime } \in [E]_{\Rightarrow }\). Then by Lemma 6.13, there exists E1 ∈ [E]⇒ such that E1 has the form \(x \alpha y \doteq y \beta x\) where x, y ∈ X and α, β ∈ (X∖{x, y})∗. By our assumption, E1 is not regular reversed. Hence we may write E1 as:
$$x \alpha_{1} u \alpha_{2} v \alpha_{3} y \doteq y \beta_{1} u \beta_{2} v \beta_{3} x$$
where x, y, u, v ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{x, y, u, v})∗. Thus, by Lemma 7.2 we have that E2 ∈ [E]⇒ where E2 is given by \(x \alpha _{1} v \alpha _{3} u \alpha _{2} y \doteq y \beta _{1} v \beta _{3} u \beta _{2} x\). Moreover, Card(W(E1)) = Card(W(E2)) = Card(var(E)). Since by Fact 9.7, \(E^{\prime } \triangleright E^{\prime \prime }\) implies \({\text {Card}}(W(E^{\prime \prime })) < {\text {Card}}(W(E^{\prime }))\), and hence \({\text {Card}}(W(E^{\prime })) < {\text {Card}}(W(E))\) for all \(E^{\prime } \in S \backslash \{E\}\), we may immediately conclude that at least one of E1, E2 is not in S, and hence S≠[E]⇒.

Now suppose that E is regular reversed. We have the following claim:

Claim 9.10.1

Let \(E^{\prime } \in S\) be given by \(\alpha \doteq \beta \). Then the equation \(\pi _{W(E^{\prime })}(\alpha ) \doteq \pi _{W(E^{\prime })}(\beta ) \) is regular reversed.

Proof

We shall prove the claim by induction on \({\text {Card}}(W(E^{\prime }))\). In particular note that if \({\text {Card}}(W(E^{\prime })) = {\text {Card}}(W(E))\), then by Fact 9.7, we have \(E^{\prime } = E\) and the statement holds trivially. Now suppose for some n that the claim holds for all \(E^{\prime } \in S\) with \({\text {Card}}(W(E^{\prime })) \geq n\). Let \(E^{\prime } \in S\) such that \({\text {Card}}(W(E^{\prime })) = n-1\). By definition, since \(E^{\prime } \not = E\), there exists \(E^{\prime \prime } \in S\) such that \(E^{\prime \prime } \triangleright E^{\prime }\). By Fact 9.7, we have also that \({\text {Card}}(W(E^{\prime \prime })) = n\). Assume w.l.o.g. that \(E^{\prime \prime } \to _{R} E^{\prime }\). Then by the induction hypothesis, there exist x, y, z1, z2,…,zk ∈ X with k = n − 2, and α, β, γ0, γ1, γ2,…,γk, δ0, δ1, δ2,…,δk ∈ (X∖{x, y, z1, z2, …,zk})∗ such that var(γi) ∩ var(δj) = ∅ for 0 ≤ i, j ≤ k and such that \(E^{\prime \prime }\) is given by
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} z_{k} \delta_{1} z_{k-1} \delta_{2} {\ldots} z_{1} \delta_{k} x \beta$$
and \(E^{\prime }\) is given by
$$ z_{1} \gamma_{1} z_{2} {\ldots} z_{k} \gamma_{k} x \gamma_{0} y \alpha \doteq y \delta_{0} z_{k} \delta_{1} z_{k-1} \delta_{2} {\ldots} z_{1} \delta_{k} x \beta.$$
Note that \(W(E^{\prime }) = W(E^{\prime \prime }) \backslash \{x\} = \{y,z_{1},z_{2},\ldots ,z_{k}\}\). Erasing all the variables not in \(W(E^{\prime })\) from \(E^{\prime }\) yields
$$ z_{1}z_{2} {\ldots} z_{k} y \doteq y z_{k} z_{k-1} {\ldots} z_{1}$$
which is regular reversed so the statement of the claim holds for \(E^{\prime }\). By induction, it holds for all \(E^{\prime } \in S\) as required. □
Now suppose for contradiction that [E]⇒ ≠ S. This implies that there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime } \notin S\). Now, by Fact 9.5, this implies that there exists a sequence E1, E2,…,En such that E1 = E, En ∉ S and such that either Ei ⊳ Ei+ 1 or Ei+ 1 ⊳ Ei for each i,1 ≤ i < n. Let us take the shortest such sequence. Note that this implies that Ei ∈ S for all i,1 ≤ i < n, and consequently, that Ei ⊳ Ei+ 1 for all i,1 ≤ i < n − 1, and that En− 1 ⋫ En, meaning that En ⊳ En− 1 instead. It follows from the fact that Card(W(E)) = Card(var(E)), and by Fact 9.7, that there does not exist \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime } \triangleright E\). Hence we may additionally conclude that n > 2. Moreover, since En− 2 ∈ S and En ∉ S, we have that En− 2 ≠ En. Thus we must necessarily have that either En− 2 →L En− 1 and En →R En− 1, or symmetrically En− 2 →R En− 1 and En →L En− 1. W.l.o.g. we may assume the first case holds. Then it follows from the definitions that there exist x1, x2, y1, y2, z1, z2, α1, α2, α3, β1, β2, β3, γ1, γ2, γ3, δ1, δ2, δ3 such that \({var}(\alpha _{1}\alpha _{2}) \subseteq {var}(\beta _{3})\), \({var}(\beta _{1}\beta _{2}) \subseteq {var}(\alpha _{3})\), \({var}(\gamma _{1}\gamma _{2}) \subseteq {var}(\delta _{3})\) and \({var}(\delta _{1}\delta _{2}) \subseteq {var}(\gamma _{3})\), and such that En− 2 is given by \(x_{1} \alpha _{1} z_{1} \alpha _{2} y_{1} \alpha _{3} \doteq y_{1} \beta _{1} z_{1} \beta _{2} x_{1} \beta _{3}\), En is given by \(x_{2} \gamma _{1} z_{2} \gamma _{2} y_{2} \gamma _{3} \doteq y_{2} \delta _{1} z_{2} \delta _{2} x_{2} \delta _{3}\), and therefore that En− 1 can be written both as
$$ z_{1} \alpha_{2} x_{1} \alpha_{1} y_{1} \alpha_{3} \doteq y_{1} \beta_{1} z_{1} \beta_{2} x_{1} \beta_{3} \text{ and as } x_{2} \gamma_{1} z_{2} \gamma_{2} y_{2} \gamma_{3} \doteq z_{2} \delta_{2} y_{2} \delta_{1} x_{2} \delta_{3}.$$
It follows that x2 = z1, z2 = y1, and thus that γ1 = α2x1α1, α3 = γ2y2γ3, β1 = δ2y2δ1, and δ3 = β2x1β3. Consequently, we may write En− 2 as:
$$ x_{1} \alpha_{1} z_{1} \alpha_{2} y_{1} \gamma_{2} y_{2} \gamma_{3} \doteq y_{1} \delta_{2} y_{2} \delta_{1} z_{1} \beta_{2} x_{1} \beta_{3}. $$
Now, let \(E_{n-1}^{\prime }\) be the equation
$$ x_{1} \alpha_{1} z_{1} \alpha_{2} y_{1} \gamma_{2} y_{2} \gamma_{3} \doteq y_{2} \delta_{1} z_{1} \beta_{2} y_{1} \delta_{2} x_{1} \beta_{3}. $$
Since \({var}(\delta _{2}) \subseteq {var}(\gamma _{3}) \subseteq {var}(\alpha _{3})\), we have that var(δ2) ∩ var(α1α2γ2) = ∅, and consequently, \(E_{n-1}^{\prime } \to _{L} E_{n-2}\). However, since \(z_{1},y_{1}\in W(E_{n-1}^{\prime })\), we can infer from Claim 9.10.1 that \(E_{n-1}^{\prime } \notin S\). However, this contradicts our earlier assumption that the sequence E1, E2,…,En is minimal, since \(E_{1}, E_{2}, {\ldots } E_{n-2}, E_{n-1}^{\prime }\) also satisfies that E1 = E, \(E_{n-1}^{\prime } \notin S\) and Ei ⊳ Ei+ 1 or Ei+ 1 ⊳ Ei for 1 ≤ i < n − 2 and \(E_{n-1}^{\prime } \triangleright E_{n-2}\). Thus, we must have that [E]⇒ = S as required. □

We are now ready to give the tight lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and to characterise those equations for which the lower bounds are achieved. The final step is to move from the bounds depending on Card(W(E)) given by Lemma 9.9 to bounds depending on Card(var(E)) by noting that, by Lemma 6.13, there is always an equation \(E^{\prime } \in [E]_{\Rightarrow }\) for which Card(var(E′)) = Card(W(E′)).

Lemma 9.11

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V \geq 2^{n-1} - 1\). Moreover, \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed.

Proof

Let E be a basic regular word equation and let n = Card(var(E)) ≥ 2. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). W.l.o.g. by Lemma 6.13, we may assume that E has the form \(x \alpha y \doteq y \beta x\) for some x, y ∈ X and α, β ∈ (X∖{x, y})∗. Thus Card(W(E)) = n. Let \(S = \{E^{\prime } \mid E\triangleright ^{*} E^{\prime }\}\). Then by Fact 9.5, \(S \subseteq [E]_{\Rightarrow }\). By Lemma 9.9, \({\text{Card}}(S) = 2^{n-1} - 1\). Hence we have that \(V \geq 2^{n-1} - 1\). Moreover, by Lemma 9.10, S = [E]⇒ if and only if E is regular reversed. Hence \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed. □

It is worth noting that the lower bound given by Lemma 9.11 is already exponential in the number of variables, which, since we consider basic RWEs, is proportional to the length of the equation. In order to interpret these bounds in the more general (i.e. not basic) case we recall from Section 4 that for any RWE \(\alpha \doteq \beta \), there exist prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of α and β respectively such that \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is indecomposable, and such that \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). In this case, the lower bound on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) becomes \(2^{m-1} - 1\) where \(m = {\text {Card}}({qv}(E^{\prime }))\).

We conclude this section with the following theorem summarising the bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Theorem 9.12

Let E be a basic RWE and let n = Card(var(E)). Suppose that n > 1. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then:
  1. \(2^{n-1}-1 \leq V \leq \frac {n!}{2}\),
  2. \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed,
  3. \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

Proof

Directly from Lemmata 9.2 and 9.11. □

10 DAG-Width

In addition to the size we are also able to give some insights about the connectedness of the graphs, which, as discussed in Section 3.3, are of interest when solving RWEs modulo additional constraints. We show firstly that there exist classes of equations E for which \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]})\) may be arbitrarily large.

Theorem 10.1

Let x, y, z0, z1, z2,…,zn ∈ X. Let E be the equation given by
$$x z_{0} z_{1} z_{2} {\ldots} z_{n} y \doteq y z_{0} z_{n} z_{n-1} {\ldots} z_{1} x.$$
Then \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) > n\).

To prove Theorem 10.1, we make use of the k-cops and robber games for directed graphs as introduced by [5]. The following definition is taken directly from [5].

Definition 10.2 (Cops and robber game [5])

Given a directed graph G = (V, E), the k-cops and robber game on G is played between two players, the cop and the robber player. Positions of this game are pairs (X, r) where \(X \in V^{\leq k}\) are the vertices occupied by the cops and r ∈ V is the vertex occupied by the robber. The game is played as follows:
  • At the beginning, the cop player chooses \(X_{0} \in V^{\leq k}\), and the robber player chooses a vertex r0 ∈ V, giving position (X0, r0).

  • From position (Xi, ri), if ri ∉ Xi, then the cop player chooses \(X_{i+1} \in V^{\leq k}\), and the robber player chooses a vertex ri+ 1 ∈ V such that there is a directed path from ri to ri+ 1 in the graph G∖(Xi ∩ Xi+ 1).

  • A play in the game is a maximal (finite or infinite) sequence π = (X0, r0),(X1, r1), (X2, r2),… of positions given by the rules above.

  • A play π is winning for the cop player if and only if it is finite. (Note that, by the rules above, this implies that rm ∈ Xm for the last position (Xm, rm) of this play.) A play π is winning for the robber player if and only if it is infinite.

  • A (k-cop) strategy for the cop player is a function f from \(V^{\leq k} \times V\) to \(V^{\leq k}\). A play (X0, r0),(X1, r1),… is consistent with a strategy f if Xi+ 1 = f(Xi, ri) for all i. The strategy f is called a winning strategy if every play consistent with the strategy is winning for the cop player.

  • The cop number of a directed graph G is the least k such that the cop player has a strategy to win the k-cops and robber game on G.
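To illustrate the game, the following brute-force sketch (our own, not taken from [5]; it enumerates all positions and is exponential in k and |V|, so it is only suitable for toy instances) decides whether the cop player has a winning strategy by computing, as a least fixed point, the set of positions from which the cop player can force a finite play:

from itertools import chain, combinations

def reachable(V, edges, blocked, r):
    """Vertices reachable from r by a directed path avoiding 'blocked'
    (the empty path counts, so r itself is included unless blocked)."""
    if r in blocked:
        return set()
    seen, todo = {r}, [r]
    while todo:
        u = todo.pop()
        for a, b in edges:
            if a == u and b not in blocked and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

def cops_win(V, edges, k):
    """True iff the cop player wins the k-cops and robber game on (V, edges)."""
    cop_sets = [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(V), i) for i in range(k + 1))]
    # Terminal cop wins: positions (X, r) with r in X.
    win = {(X, r): r in X for X in cop_sets for r in V}
    changed = True
    while changed:                   # attractor (least fixed point) iteration
        changed = False
        for X in cop_sets:
            for r in V:
                if win[(X, r)]:
                    continue
                # The cops win from (X, r) if some move X2 leaves the robber
                # only cop-winning escape vertices in G minus (X intersect X2).
                if any(all(win[(X2, r2)]
                           for r2 in reachable(V, edges, X.intersection(X2), r))
                       for X2 in cop_sets):
                    win[(X, r)] = True
                    changed = True
    # Initial move: the cops choose X0 first, then the robber picks r0.
    return any(all(win[(X0, r0)] for r0 in V - X0) for X0 in cop_sets)

# On a directed 3-cycle, one cop never suffices, but two cops win.
V, edges = frozenset({0, 1, 2}), {(0, 1), (1, 2), (2, 0)}
assert not cops_win(V, edges, 1) and cops_win(V, edges, 2)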

It is shown in [5] (Theorem 16) that for any directed graph G, there is a DAG-decomposition of G of width at most k only if the cop player has a winning strategy in the k-cops and robber game on G. Thus, to show that a graph G has DAG-width greater than n, it is sufficient to show that there is no n-cop winning strategy in the n-cops and robber game on G. This equivalently amounts to providing a winning strategy for the robber. We shall use this fact to prove Theorem 10.1 as follows. Figure 8 provides an example and depicts how the winning strategy for the robber works.
Fig. 8

A depiction of the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that \(E = x z_{0} z_{1} z_{2} y \doteq y z_{0} z_{2} z_{1} x\). Thus this is an example of Theorem 10.1 for the case n = 2. The graph is divided into sections corresponding to the (disjoint) sets \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\) for 0 ≤ i ≤ 2. The vertices Ei are highlighted in bold while vertices from \(S_{i}^{in}\) are coloured blue and vertices from \(S_{i}^{out}\) are coloured red. In order to conserve space, vertices belonging to one of these sets are displayed with the LHS and RHS of the equation arranged vertically while for other vertices the equations are omitted. Since there are three values for i, if there are two cops, there will always be at least one i such that no vertex in \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\) has a cop on it. The strategy of the robber is to always be on Ei for such a choice of i. This is due to the fact that for each i and j, there is a path from Ei to Ej visiting only vertices from \(S_{i}^{out}\) and \(S_{j}^{in}\) which can be used as an escape-route (an example for i = 1 and j = 3 is highlighted in bold in the figure). Thus, if at any given stage in the game, a cop moves to a vertex in \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\), the robber can use the escape route to safely move to some Ej for which no vertex in \(\{E_{j}\} \cup S^{in}_{j} \cup S^{out}_{j}\) has a cop on it. The edges making up the escape-route paths needed for this strategy are given by solid arrows, while the other edges which are not used by the robber are dashed

Proof of Theorem 10.1.

Note that it is sufficient to show that the DAG-width of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is greater than n, since \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). For 0 ≤ i ≤ n, let Ei be the (basic regular) equation given by:
$$x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} z_{2} {\ldots} z_{i-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{1} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x$$
where x, y, z0, z1,…,zn ∈ X. Note that E = E0. Let V = [E]⇒. Before describing a winning strategy for the robber in the n-cops and robber game on \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we define some useful subsets of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) as follows. For each i, 0 ≤ i ≤ n and each j, 0 ≤ j ≤ n with j > i, let:
$$ \begin{array}{@{}rcl@{}} {T_{i}^{j}} &=& \{ z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ && z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ &&\qquad \qquad \qquad \qquad {\vdots} \\ && z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \} \\ &&\cup \{ z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq\\ &&\qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\} \\ &&\cup \{ z_{j+2} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x, \\ & &\qquad \qquad\qquad \qquad {\vdots} \\ & &z_{i-1}x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\}. \end{array} $$
Similarly, for each i, 0 ≤ i ≤ n and each j, 0 ≤ j ≤ n with j < i, let:
$$ \begin{array}{@{}rcl@{}} {T_{i}^{j}} = \{&& z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ & &z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ && \qquad \qquad \qquad \qquad {\vdots} \\ & &z_{j} z_{j+1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{j-1} y\! \doteq\! y z_{i} z_{i-1} \!\ldots\! z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \} \\ \cup \{& & z_{j} z_{j+1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1}{\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\} \\ \cup \{ && z_{j+2} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1}{\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{j+1} y z_{j} z_{j-1}{\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x, \\ && \qquad \qquad \qquad \qquad {\vdots} \\ && z_{i-1} x z_{j} z_{j+1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{0} z_{n} z_{n-1}{\ldots} z_{i+1} x\}. \end{array} $$

For each i, 0 ≤ i ≤ n, let \(S^{out}_{i} = \bigcup \limits _{0 \leq j \leq n, i\not = j} {T_{i}^{j}}\) and let

$$ \begin{array}{@{}rcl@{}} S^{in}_{i} = \{ && x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} y \doteq z_{j} z_{j-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} yz_{i} z_{i-1} {\ldots} z_{j+1} x \mid j \leq i \}\\ \cup \{ && x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} y \doteq z_{j} z_{j-1} {\ldots} z_{i+1} yz_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} x \mid j > i \}. \end{array} $$
Note that \(S^{in}_{i} = \{E^{\prime } \mid E^{\prime } \Rightarrow _{L}^{*} E_{i} \} \backslash \{E_{i}\}\). Moreover, we shall now show that for each Ei, Ej with i ≠ j, there exist \(F_{1}, F_{2},\ldots , F_{k} \in S^{out}_{i}\) and \(G_{1}, G_{2},{\ldots } G_{\ell } \in S^{in}_{j}\) such that
$$ E_{i} \Rightarrow F_{1} \Rightarrow F_{2} \Rightarrow {\ldots} F_{k} \Rightarrow G_{1} \Rightarrow G_{2} \Rightarrow {\ldots} \Rightarrow G_{\ell} \Rightarrow E_{j}. $$
(4)
Indeed, observe that
$$ \begin{array}{@{}rcl@{}} E_{i}& & \Rightarrow z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ && \Rightarrow z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ && \qquad \qquad \qquad \qquad {\vdots} \\ && \Rightarrow z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ &&\Rightarrow z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ &&\Rightarrow z_{j+2} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ && \qquad \qquad \qquad \qquad {\vdots} \\ & &\Rightarrow z_{i-1}x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ &&\Rightarrow x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \in S_{j}^{in}. \end{array} $$

Thus, there exist \(F_{1}, F_{2},\ldots , F_{k} \in S^{out}_{i}\) and \(G_{1} \in S_{j}^{in}\) such that EiF1F2 ⇒… ⇒ FkG1. By definition, \(S^{in}_{j} = \{E^{\prime } \mid E^{\prime } \Rightarrow _{L}^{*} E_{j}\}\backslash \{E_{j}\}\), so it follows directly that there exist \(G_{2},\ldots ,G_{\ell } \in S^{in}_{j}\) such that G1G2 ⇒… ⇒ Ej as claimed.

Consequently, we may conclude that \(S^{in}_{i} \cup S^{out}_{i} \cup \{E_{i}\} \subset [E]_{\Rightarrow }\) for all i, 0 ≤ i ≤ n. Clearly, each Ei, 0 ≤ i ≤ n, is not contained in any \({S_{j}^{Z}}\) for 0 ≤ j ≤ n and Z ∈{in, out}. Furthermore, since the RHS of every equation in \(S_{i}^{out}\) has either yzi or zi as a prefix, \(S_{i}^{out} \cap S_{j}^{out} = \emptyset \) whenever i ≠ j. Similarly, since the LHS of every equation in \(S^{in}_{i}\) has xzi as a prefix, \(S_{i}^{in} \cap S_{j}^{in} = \emptyset \) whenever i ≠ j. Since the LHS of all equations in \(S^{in}_{i}\) has x as a prefix, and since the LHS of all equations in \(S^{out}_{j}\) does not have x as a prefix, we may conclude further that \({S^{Z}_{i}} \cap S^{Z^{\prime }}_{j} = \emptyset \) for all i ≠ j and \(Z,Z^{\prime } \in \{in,out\}\).

We are now ready to give the strategy for the robber in the n-cops and robber game on \({\mathscr{G}}^{\Rightarrow }_{[E]}\). We shall say that Ei is a ‘safe’ vertex if \(S_{i}^{in} \cup S_{i}^{out} \cup \{E_{i}\}\) contains no vertex with a cop on it. Since there are only n cops, it follows from the fact that the sets \(S_{i}^{in} \cup S_{i}^{out} \cup \{E_{i}\}\) are pairwise disjoint that, at any given time, there must be at least one i, 0 ≤ i ≤ n, such that Ei is safe. By definition, if the robber is on a safe vertex, then there is no cop also on that vertex, so the play continues.

Clearly, if the cop player chooses an initial placement \(X_{0} \in [E]_{\Rightarrow }^{\leq n}\), then the robber may be placed on a safe vertex \(r_{0} = E_{i_{1}}\) for some i1, 0 ≤ i1 ≤ n. Now, suppose after k steps in the game the position is (Xk, rk) where rk is a safe vertex. Then we shall show that, whatever the cop player chooses for Xk+ 1, the robber may choose rk+ 1 such that rk+ 1 is safe. Indeed, if \(r_{k} = E_{i_{k}}\) for some ik, 0 ≤ ik ≤ n, is safe, then \((S^{out}_{i_{k}} \cup \{E_{i_{k}}\}) \cap X_{k} = \emptyset \). Moreover, since there are only n cops, whatever the choice of Xk+ 1, there exists \(r_{k+1} = E_{i_{k+1}}\) for some ik+ 1, 0 ≤ ik+ 1 ≤ n, such that \(E_{i_{k+1}}\) is safe, meaning that \(X_{k+1} \cap (S^{in}_{i_{k+1}}\cup \{E_{i_{k+1}}\}) = \emptyset \). It follows that \(S_{i_{k}}^{out} \cup S_{i_{k+1}}^{in} \cup \{E_{i_{k}}, E_{i_{k+1}}\} \subset [E]_{\Rightarrow } \backslash (X_{k+1}\cap X_{k})\). We have already shown (Equation 4) that there is a directed path in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) using only vertices from \(S_{i_{k}}^{out} \cup S_{i_{k+1}}^{in} \cup \{E_{i_{k}}, E_{i_{k+1}}\}\) from \(r_{k} (= E_{i_{k}})\) to \(r_{k+1} (= E_{i_{k+1}})\), and hence (Xk+ 1, rk+ 1) is a valid next position satisfying the rules of the game. Since rk+ 1 is also safe, this proves our claim, and by a simple induction, it follows that for any n-cop strategy, there is an infinite play (i.e. the robber wins). It follows that there is no winning n-cop strategy, so the DAG-width of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is greater than n as required. □

Since high connectivity can be seen as an obstacle to deciding the satisfiability problem with additional constraints, it is also worth noting classes for which the DAG-width is bounded by a small constant. If all variables occur at most once in an equation E, then it is not difficult to see that the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) will be a DAG. However, when variables may occur more than once, the graphs of even very simple equations such as \(x {\mathtt {a}}{\mathtt {b}} \doteq {\mathtt {b}} {\mathtt {a}} x\) will contain cycles, and will therefore have DAG-width at least two. The following theorem describes an infinite class of equations for which the DAG-width of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is at most two. It is worth pointing out that the NP-hardness result for the satisfiability problem for regular word equations from [8] applies to this class, and so, by Theorem 8.12, this class also has an NP-complete satisfiability problem.

Theorem 10.3

Let α1, α2,…,αn, β1, β2,…,βn ∈ X+ such that
  1. |αi| = |βi| ∈ {1,2,3} for 1 ≤ i ≤ n, and
  2. var(αi) = var(βi) for 1 ≤ i ≤ n, and
  3. var(αi) ∩ var(αj) = ∅ for 1 ≤ i, j ≤ n with i ≠ j.
Let E be the RWE \(\alpha _{1}\alpha _{2}{\ldots } \alpha _{n} \doteq \beta _{1}\beta _{2} {\ldots } \beta _{n}\). Then \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq 2\).

Proof

Let E be of the form described in the theorem. By Proposition 3.5,
$${dgw}(\mathscr{G}^{\Rightarrow_{NT}}_{[E]}) = \max \{ m \mid E \Rightarrow_{NT}^{*} E^{\prime} \text{ and } m = {dgw}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \}.$$
Let \({\mathscr{C}}\) be the subclass of RWEs of the form \( \alpha _{1}\alpha _{2}{\ldots } \alpha _{k} \doteq \beta _{1}\beta _{2}{\ldots } \beta _{k}\) where \(k \in \mathbb {N}_{0}\) such that:
  1. αi, βi ∈ X+ with |αi| = |βi| ∈ {1,2,3} for 1 ≤ i ≤ k, and
  2. var(αi) = var(βi) for 1 ≤ i ≤ k, and
  3. var(αi) ∩ var(αj) = ∅ for all i ≠ j, 1 ≤ i, j ≤ k.
Clearly, we have \(E \in {\mathscr{C}}\). Since k is not restricted, we may also assume w.l.o.g. that for any word equation in \({\mathscr{C}}\), the ‘sub-equations’ \(\alpha _{i} \doteq \beta _{i}\) are indecomposable. Moreover, for any equation in \({\mathscr{C}}\) other than \(\varepsilon \doteq \varepsilon \), we may also assume that |α1|≥ 1. Under these assumptions, it follows from Corollary 4.4 that for any \(E^{\prime } \in {\mathscr{C}}\), the graph \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to the graph \({\mathscr{G}}^{\Rightarrow }_{[\alpha _{1} \doteq \beta _{1}]}\). There are five possibilities for \(\alpha _{1} \doteq \beta _{1}\) (up to a renaming of the variables, which does not alter the structure of the graph \({\mathscr{G}}^{\Rightarrow }_{[\alpha _{1} \doteq \beta _{1}]}\)), namely \(x \doteq x\), \(xy \doteq yx\), \(xyz \doteq zyx\), \(xyz \doteq yzx\) and \(xyz \doteq zxy\). It is easily verified by hand that in all cases the DAG-width is at most two (it is exactly two in the cases where |α1| = |β1| = 3). Moreover, it follows from the definitions that if \(E_{1} \in {\mathscr{C}}\) and E1 ⇒NT E2 for some E2, then \(E_{2} \in {\mathscr{C}}\). Consequently, we have that
$${dgw}(\mathscr{G}^{\Rightarrow_{NT}}_{[E]}) = \max \{ m \mid E \Rightarrow_{NT}^{*} E^{\prime} \text{ and } m= {dgw}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \} \leq 2.$$

11 Extension to Systems of Equations

So far, we have considered individual equations. However, it is often the case that there is not just one equation to be solved, but a system of several equations which should be satisfied concurrently. While constructions exist which transform a system of equations into a single equation (see e.g. [17]), the resulting equation will generally not be quadratic/regular. We extend the definition of regular equations to regular systems as follows.

Definition 11.1 (Regular systems)

Let \({{\varTheta }} = \{\alpha _{1} \doteq \beta _{1},\alpha _{2} \doteq \beta _{2}, \ldots , \alpha _{n} \doteq \beta _{n}\}\) be a system of word equations. An orientation of Θ is any element of \(\{\alpha _{1} \doteq \beta _{1}, \beta _{1} \doteq \alpha _{1}\} \times \{\alpha _{2} \doteq \beta _{2}, \beta _{2} \doteq \alpha _{2}\} \times {\ldots } \times \{\alpha _{n} \doteq \beta _{n}, \beta _{n} \doteq \alpha _{n}\}\). We say that Θ is regular if it has an orientation for which each variable occurs at most once across all LHSs and at most once across all RHSs.
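For example (an illustration of ours, not from the original text): the system \(\{xy \doteq {\mathtt{a}}z,\ z{\mathtt{b}} \doteq y\}\) is regular, since in the orientation \((xy \doteq {\mathtt{a}}z,\ z{\mathtt{b}} \doteq y)\) each of x, y, z occurs at most once across the two LHSs and at most once across the two RHSs. By contrast, \(\{xy \doteq z,\ xz \doteq y\}\) is not regular, even though each individual equation is regular: in every one of the four orientations, some variable occurs twice across the LHSs or twice across the RHSs.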

We can easily adapt the algorithm from Section 3 to work more generally for systems of word equations, and with careful application, still make use of Theorem 8.11 in order to obtain (non-deterministic) polynomial running time. To do this, we need to extend the rewriting transformations (Nielsen transformations) underpinning the relation ⇒NT which we have thus far defined for single equations only. Note that each possible rewriting of a single equation can be achieved by first applying a morphism to both sides of the equation and then, if applicable, cancelling the longest identical prefixes of the new LHS and RHS. For example, the rewriting \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}} \Rightarrow _{NT} {\mathtt {a}} xy z {\mathtt {b}} {\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) consists of applying the morphism ψy>x (cf. Section 3) to both sides of the first equation in order to get \(x {\mathtt {a}} xy z {\mathtt {b}} {\mathtt {a}} \doteq x y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) and then cancelling the resulting leftmost occurrences of x.

The generalisation of the Nielsen Transformations to systems of equations is straightforward: we select one of the word equations E from the system, and apply any of the possible transformations to it as before. Then we simply need to apply the associated morphism to both sides of all the other equations in the system, followed by any further resulting cancellations. We shall say that such a transformation is rooted on the chosen equation E, and we shall write \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\) if \({{\varTheta }}, {{\varTheta }}^{\prime }\) are systems of word equations such that \({{\varTheta }}^{\prime }\) is the result of applying a transformation rooted on E to Θ. So if, for example, we have the system \(\{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} x\}\), then one possible transformation of the first equation is \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}} \Rightarrow _{NT} x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) obtained by applying the morphism ψx>y and cancelling the resulting leftmost occurrences of y. To extend this transformation to the whole system, we just need to apply ψx>y to the other equation (no further cancellation is required in this case) so we have \(\{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} x\} \Rightarrow _{NT}^{E} \{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} yx\}\) where E is the equation \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\).

Taking the length |Θ| of a system Θ of word equations to be the sum of the lengths of all the individual word equations, it is easily seen that the important properties of this rewriting carry over to the case of systems. Specifically, it is easily verified that for any regular system Θ of word equations each of the following holds:
  1. If E ∈ Θ and \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\), then \({{\varTheta }}^{\prime }\) is also regular,
  2. If E ∈ Θ and \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\), then \(|{{\varTheta }}^{\prime }| \leq |{{\varTheta }}|\),
  3. for any solution h to Θ, and for any E ∈ Θ with |E| > 0 there exists a system \({{\varTheta }}^{\prime }\) with a solution \(h^{\prime }\) such that \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\) and either \(h^{\prime }\) is smaller than h or \(|{{\varTheta }}^{\prime }| < |{{\varTheta }}|\).

With this in mind, we are now able to extend our main result that solving regular word equations is in NP to include regular systems of equations.

Theorem 11.2

The satisfiability problem for regular systems of equations is NP-complete. Moreover, whether a system of word equations is regular can be decided in polynomial time.

Proof

Since the satisfiability problem is NP-hard for regular word equations, it is also NP-hard for regular systems of word equations. Next we shall show inclusion in NP. Let Θ = {E1, E2,…,En} be a regular system of equations. From Observations 1-3 above, there is a solution to Θ if and only if there exists a finite sequence of transformations
$${{\varTheta}}_{0} \Rightarrow_{NT}^{\hat{E}_{1}} {{\varTheta}}_{1} \Rightarrow_{NT}^{\hat{E}_{2}} {\ldots} \Rightarrow_{NT}^{\hat{E}_{m}}{{\varTheta}}_{m}$$
satisfying Θ = Θ0, \({{\varTheta }}_{m} = \{\varepsilon \doteq \varepsilon \}\) and \(\hat {E}_{i} \in {{\varTheta }}_{i-1}\) for 1 ≤ i ≤ m. In fact, by Observation 3, we may freely choose each \(\hat {E_{i}}\) to be any equation from Θi− 1, and such a finite sequence must still exist whenever there is a solution. Consequently, we may decide whether or not a solution exists with the following procedure (Algorithm 1) which searches for such a sequence by applying firstly transformations rooted on the first equation, followed by transformations rooted on the second equation, then the third, etc. For convenience, we shall represent Θ as an ordered list [E1, E2,…,En] rather than a set.
Algorithm 1

We begin by non-deterministically applying a sequence of Nielsen transformations (generalised for systems of word equations) rooted on the first equation in the list until we reach a system of the form \([\varepsilon \doteq \varepsilon ,E_{2}^{\prime },\ldots ,E_{n}^{\prime }]\). If we are not able to transform E1 into \(\varepsilon \doteq \varepsilon \), then no solution to E1 exists and the system has no solution.

Otherwise, once we have transformed E1 into \(\varepsilon \doteq \varepsilon \), we repeat the process of applying the generalised Nielsen transformations to the (new) second equation \(E_{2}^{\prime }\) until it has also been transformed into \(\varepsilon \doteq \varepsilon \) (note that none of the transformations will change the trivial equation \(\varepsilon \doteq \varepsilon \)). Continue to repeat this process for each equation, in increasing order, until either an equation is reached which cannot be transformed into \(\varepsilon \doteq \varepsilon \), or until we have transformed all equations into this form. In the former case, there is no solution, while in the latter case, a solution exists.
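The following sketch illustrates the idea (our own illustrative reconstruction, not the paper's Algorithm 1: it replaces the non-deterministic guessing by an exhaustive search over all reachable systems, and so runs in exponential rather than non-deterministic polynomial time; upper-case letters play the role of variables, lower-case letters of terminals):

def cancel(system):
    """Cancel longest identical prefixes and drop solved equations."""
    out = []
    for u, v in system:
        while u and v and u[0] == v[0]:
            u, v = u[1:], v[1:]
        if (u, v) != ("", ""):
            out.append((u, v))
    return tuple(out)

def successors(system):
    """All systems reachable by one transformation rooted on the first equation."""
    (u, v), rest = system[0], list(system[1:])
    if u and v and u[0] == v[0]:          # identical first symbols: cancel
        yield cancel([(u[1:], v[1:])] + rest)
        return
    for a, b in ((u, v), (v, u)):         # the two symmetric cases
        if a and a[0].isupper():          # side a starts with a variable x
            x = a[0]
            # Guess x -> empty word (a length-reducing transformation) ...
            yield cancel([(p.replace(x, ""), q.replace(x, ""))
                          for p, q in [(u, v)] + rest])
            # ... or guess that x begins with the first symbol of the other side.
            if b:
                r = b[0] + x
                yield cancel([(p.replace(x, r), q.replace(x, r))
                              for p, q in [(u, v)] + rest])

def satisfiable(system):
    """Exhaustively search the space of reachable systems."""
    start = cancel(system)
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        if s == ():                       # every equation reduced to eps = eps
            return True
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return False

assert satisfiable([("XabY", "YbaX")])    # e.g. x -> empty word, y -> a
assert not satisfiable([("Xa", "bX")])

For regular systems, properties 1–3 above guarantee that the total length never increases under these transformations, so only finitely many systems are reachable and the search terminates.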

It remains to show that we can implement the procedure just described such that it runs in non-deterministic polynomial time. For this, we need a few further observations. The first is that when applying transformations rooted on the ith equation, we are essentially traversing the same graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[\tilde {E_{i}}]}\) as if we were to consider in isolation the equation \(\tilde {E_{i}}\) obtained after transforming the first i − 1 equations into \(\varepsilon \doteq \varepsilon \). The only difference is that we are potentially changing the other equations as we go. The second important observation is that any transformation rooted on the ith equation which changes any of the other (non-root) equations must necessarily decrease the length of the ith equation. Finally, the equation on which a transformation is rooted never increases in length as a result of that transformation. Thus, by applying the transformations in the order specified, we never increase the length of the ith equation once it becomes the current root.

Consequently, when applying transformations which preserve the length of the ith equation, we may, without affecting the outcome, take the shortest path through the graph. Moreover, since we can only decrease the length of an equation a linear number of times, the maximum number of transformations rooted on the ith equation needed in order to find a solution when one exists is bounded above by
$$ C_{i} = |\tilde{E_{i}}| \max\{ {diam}(\mathscr{G}_{[E]}^{\Rightarrow}) \mid \tilde{E_{i}} \Rightarrow_{NT}^{*} E\}.$$
By Theorem 8.11, we can easily compute an upper bound \(C_{{{\varTheta }}} \geq \max \limits \{C_{i} \mid 1 \leq i \leq n\}\) on the number of transformations needed which allows us to restrict the above procedure such that it works in non-deterministic polynomial time without affecting the correctness.

Finally, we describe the following procedure (Algorithm 2) for determining if a system Θ = {E1, E2,…,En} is regular. First we check that each individual equation is regular and that no variable occurs more than twice across the whole system. We then initialise two sets L and R to the empty set. The sets L and R will keep track of variables occurring across the LHS’s and RHS’s of an orientation of Θ. We remove equations \(\alpha \doteq \beta \) from Θ one-by-one, deciding each time whether \(\alpha \doteq \beta \) or \(\beta \doteq \alpha \) should be included in the orientation and updating L and R accordingly.

While there are still equations left in the system, there are two cases to consider. The first is that there exists an equation \(\alpha \doteq \beta \in {{\varTheta }}\) which contains at least one variable x which is already in L or R. In this case, we can rule out at least one choice of \(\alpha \doteq \beta \) or \(\beta \doteq \alpha \) when constructing an orientation satisfying the definition for regular systems. In particular, if xL, then whichever of α, β contains x should be the RHS in the orientation (so, if x occurs in α, we include \(\beta \doteq \alpha \) in the orientation instead of \(\alpha \doteq \beta \)). Likewise if xR then whichever of α, β contains x should be the LHS. Once we have decided which of \(\alpha \doteq \beta \) and \(\beta \doteq \alpha \) is a bad choice (in that it would lead to two occurrences of x in either the LHS’s or RHS’s), we need to check that the remaining “oriented” equation does not lead to a similar conflict (possibly for one of the other variables). To do this, we simply need to check that the LHS does not share any variables with L and likewise that the RHS does not share any variables with R. If this test is failed then our system is not regular and we can stop and return “No”. Otherwise we add all the variables from the LHS of the oriented equation to L and all the variables from the RHS to R. Then we remove the equation \(\alpha \doteq \beta \) from the system Θ and continue.
Algorithm 2

The second case is when none of the variables occurring in the remaining equations are contained in either L or R. In this case, how we construct the rest of the orientation does not depend on the previous choices. Moreover, for any orientation satisfying the definition, we can find another by simply swapping the LHS’s and RHS’s of all equations. Thus by symmetry, we may include any single one of the remaining equations in the orientation without exchanging the LHS and RHS, and without affecting the possibility of constructing a valid orientation in the end. Thus, we then pick any of the remaining equations \(\alpha \doteq \beta \) at random and add the variables from α to L and all the variables from β to R, before removing \(\alpha \doteq \beta \) from Θ and continuing. If we are able to iterate through and discard all equations in the system like this without returning “No”, then the system is regular and we may return “Yes”. The correctness, along with the fact that the procedure runs in polynomial time, is easily verified. □
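A compact version of this procedure might look as follows (again our own sketch, not the paper's Algorithm 2; upper-case letters are variables):

from collections import Counter

def variables(word):
    return {c for c in word if c.isupper()}

def is_regular(system):
    """Decide whether the system admits an orientation as in Definition 11.1."""
    # Each equation must itself be regular: no variable repeats on a side.
    for eq in system:
        for side in eq:
            occ = [c for c in side if c.isupper()]
            if len(occ) != len(set(occ)):
                return False
    # No variable may occur more than twice across the whole system.
    count = Counter(c for eq in system for side in eq for c in side if c.isupper())
    if any(n > 2 for n in count.values()):
        return False
    # Greedily build an orientation, recording variables used on LHSs (L)
    # and on RHSs (R); an equation sharing a variable with L or R is forced.
    L, R, todo = set(), set(), list(system)
    while todo:
        i = next((i for i, (a, b) in enumerate(todo)
                  if (variables(a) | variables(b)) & (L | R)), 0)
        a, b = todo.pop(i)
        for lhs, rhs in ((a, b), (b, a)):   # try both orientations
            if not (variables(lhs) & L) and not (variables(rhs) & R):
                L |= variables(lhs)
                R |= variables(rhs)
                break
        else:
            return False                    # neither orientation works
    return True

assert is_regular([("XY", "aZ"), ("Zb", "Y")])      # orientation as written
assert not is_regular([("XY", "Z"), ("XZ", "Y")])   # no valid orientation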

12 Conclusions

A famous algorithm for solving quadratic word equations can be used to produce a (directed) graph containing all solutions to the equation. In the case of regular equations, we have described some underlying structures of these graphs with the intention of better understanding their solution sets. We give bounds on their diameter and number of vertices, as well as provide classes with bounded (resp. unbounded) DAG-width. Probably the most significant result arising from our analysis is that the satisfiability problem for regular word equations is in NP (and thus NP-complete), which we also extend to regular systems of equations.

We leave open many interesting problems, the most obvious of which is to generalise our results to the (full) quadratic case. We also believe that our analysis and techniques open up the possibility to investigate in far more detail the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), both in the case of regular equations and more generally. For example, in light of our results, it seems reasonable to suggest that determining whether \(E_{1} \Rightarrow _{NT}^{*} E_{2}\) holds for two regular equations E1 and E2 may be done in polynomial time. A particularly nice characterisation of the pairs E1, E2 such that \(E_{1} \Rightarrow _{NT}^{*} E_{2}\) might yield a much quicker algorithm than the one resulting from our bound on the diameter of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) by significantly reducing the degree of the polynomial. We also expect that a detailed analysis of the length-reducing transformations and the symmetries which may be found in these graphs would be particularly helpful in understanding further the structure of solution sets and the performance of algorithms solving regular equations in practice.

Finally, we mention the task of investigating the decidability of the satisfiability problem for regular equations with additional constraints, in particular length constraints, with the hope that having identified cases where the DAG-width is particularly high/low, along with improved means to describe precisely the structure of the solution-graphs, might provide some useful hints as to how to proceed in this direction.

Acknowledgements

We thank the anonymous referees for their detailed and thoughtful comments.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Footnotes
1

Each choice of edge in a walk can be seen as a decision about the corresponding solution. It is not necessarily true that different walks will result in different solutions. However, all possible decisions are accounted for, so it is guaranteed that for every solution there is a walk from E to \(\varepsilon \doteq \varepsilon \) which corresponds to that solution.

 
2

We consider the number of vertices, rather than edges, because it is the number of vertices which is relevant to the performance of the algorithm, and by definition of ⇒NT, the out-degree of the graph is bounded by a constant so the number of edges is linear in the number of vertices.

 
3

There are several possible variations on the definition of the length-reducing rewriting transformations ⇒> for which the algorithm remains correct and is guaranteed to terminate. However, for our results, the exact choice is not important as we concentrate our investigations on the length preserving part ⇒ of the rewriting relation for reasons described in Section 3.2.

 
4

The case that dgw(G1) = 1 and dgw(G2) = 2 is a special case arising from the possibility of ‘isolated cycles’ being compressed into singleton self-loops.

 
5

The first case corresponds to the possibility that QE(y) = (x, x) for some variable y. The second case corresponds to the possibility that QE(#) = (x, x), meaning that E has the form \(y\alpha _{1} xz \alpha _{2} \doteq z \beta _{1} xy \beta _{2} \), with x, y, z ∈ X and α1, α2, β1, β2 ∈ X∗, in which case \(E \Rightarrow \alpha _{1} xyz \alpha _{2} \doteq z \beta _{1} xy \beta _{2} \).

 
6

It is worth noting that since basic RWEs are indecomposable, Card(W(E)) ≥ 2 whenever Card(var(E)) ≥ 2.

 