
Open Access 28-10-2021

On the structure of solution-sets to regular word equations

Authors: Joel D. Day, Florin Manea

Published in: Theory of Computing Systems


Abstract

For quadratic word equations, there exists an algorithm based on rewriting rules which generates a directed graph describing all solutions to the equation. For regular word equations – those for which each variable occurs at most once on each side of the equation – we investigate the properties of this graph, such as bounds on its diameter, size, and DAG-width, as well as providing some insights into symmetries in its structure. As a consequence, we obtain a combinatorial proof that the problem of deciding whether a regular word equation has a solution is in NP.

Notes

This article belongs to the Topical Collection: Special Issue on International Colloquium on Automata, Languages and Programming (ICALP 2020)

Guest Editors: Artur Czumaj and Anuj Dawar

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Joel D. Day1 and Florin Manea2
Theory of Computing Systems (2021)

DOI: 10.1007/s00224-021-10058-5

Accepted: 4 August 2021

Published: 28 October 2021


Keywords

Quadratic word equations · Regular word equations · String solving · NP

1 Introduction

A word equation is a tuple (α, β), which we shall usually write as \(\alpha \doteq \beta \), such that α and β are words comprised of letters from a terminal alphabet \({{\varSigma }} = \{{\mathtt {a}},{\mathtt {b}},{\ldots }\}\) and variables from a set X = {x, y, z,…}. Solutions are substitutions of the variables for words in Σ∗ making both sides identical. For example, one solution to the word equation \(x {\mathtt {a}} {\mathtt {b}} y \doteq y {\mathtt {b}} {\mathtt {a}} x\) is given by x → \({\mathtt {a}}{\mathtt {b}}{\mathtt {a}}\) and y → \({\mathtt {a}}\). A system of equations is a set of equations, and a solution to the system is a substitution for the variables which is a solution to all the equations in the system.
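To make the example concrete, the following minimal sketch (ours, not from the paper) checks a candidate solution by applying the substitution as a morphism to both sides; it assumes a hypothetical encoding of variables and terminals as single characters.

    def apply(h, word):
        # extend the substitution h to a morphism fixing terminal symbols
        return "".join(h.get(c, c) for c in word)

    alpha, beta = "xaby", "ybax"              # the equation x a b y = y b a x
    h = {"x": "aba", "y": "a"}                # the candidate solution above
    assert apply(h, alpha) == apply(h, beta)  # both sides become "abaaba"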

One of the most fundamental questions concerning word equations is the satisfiability problem: determining whether or not a word equation has a solution. The first general algorithm for the satisfiability problem was presented by Makanin [22] in 1977. Since then, several further algorithms have been presented. Most notable among these are the algorithm given by Plandowski [25] which demonstrated that the problem is included in the complexity class PSPACE, the algorithm based on Lempel-Ziv encodings by Plandowski and Rytter [26], and the method of recompression by Jeż, which has since been shown to require only non-deterministic linear space [15, 16]. On the other hand, it is easily seen that solving word equations is NP-hard due to the fact that the subcase when one side of the equation consists only of terminals is exactly the pattern matching problem which is NP-complete [3, 12]. It remains a long-standing open problem whether or not the satisfiability problem for word equations is contained in NP.

Recently, there has been elevated interest in solving more general versions of the satisfiability problem, originating from practical applications in e.g. software verification, where several string solving tools capable of solving word equations are being developed [1, 2, 4, 6, 18], and database theory [13, 14], where one asks whether a given (system of) word equation(s) has a solution which satisfies some additional constraints. Prominent examples include requiring that the substitution for a variable x belongs to some regular language \({\mathscr{L}}_{x}\) (regular constraints), or that the lengths of the substitutions of the variables satisfy a set of given linear Diophantine equations. Adding regular constraints makes the problem PSPACE-complete (see [10, 25, 27]), while it is another long-standing open problem whether the satisfiability problem with length constraints is decidable. There are also many other kinds of constraints, however many lead to undecidable variants of the satisfiability problem [7, 19]. The main difficulty in dealing with additional constraints is that the solution-sets to word equations are often infinite sets with complex structures. For example, they are not parametrisable [24], and the set of lengths of solutions is generally not definable in Presburger arithmetic [20]. Thus, a better understanding of the solution-sets and their structures is a key aspect of improving our ability to solve problems relating to word equations both in theory and practice.

Quadratic word equations (QWEs) are equations in which each variable occurs at most twice. For QWEs, a conceptually simple and easily implemented algorithm exists which produces a representation of the set of all solutions as a graph. Despite this, however, the satisfiability problem for quadratic equations remains NP-hard, even for severely restricted subclasses [8, 11], while inclusion in NP, and whether the satisfiability problem with length constraints is decidable, have remained open for a long time, just as for the general case.

The algorithm solving QWEs is based on iteratively rewriting the equation(s) according to some simple rules called Nielsen transformations. If there exists a sequence of transformations from the original equation to the trivial equation \(\varepsilon \doteq \varepsilon \), then the equation has a solution. Otherwise, there is no solution. Hence the satisfiability problem becomes a reachability problem for the underlying rewriting transformation relation, which we denote ⇒NT. It is natural to represent this relation as a directed graph \({\mathscr{G}}^{\Rightarrow _{NT}}\) in which the vertices are word equations and the edges are the rewriting transformations. This has the advantage that the set of all solutions to an equation E corresponds exactly to the set of walks in the graph starting at E and finishing at the trivial equation \(\varepsilon \doteq \varepsilon \). Consequently, the properties of the subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}\) containing all vertices reachable from E (denoted \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\)) are also informative about the set of solutions to the equation. For example, in [24] a connection is made between the non-parametrisability of the solution set of E and the occurrence of combinations of cycles in the graph. Since equations with a parametrisable solution set are much easier to work with when dealing with additional constraints, this also establishes a connection between the structure of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) and the potential (un)decidability of variants of the satisfiability problem. Moreover, new insights into the structure and symmetries of these graphs are necessary for better understanding and optimising the practical performance of the algorithm.

Our contribution

We consider a subclass of QWEs called regular equations (RWEs) introduced in [23]. A word equation is regular if each variable occurs at most once on each side of the equation. Thus, for example, \(x {\mathtt {a}}{\mathtt {b}} y \doteq y {\mathtt {b}} {\mathtt {a}} x\) is regular while \(x {\mathtt {a}} {\mathtt {b}} x \doteq y {\mathtt {b}} {\mathtt {a}} y\) is not. Understanding RWEs is a vital step towards understanding the quadratic case, not only because they constitute a significant and general subclass, but also because many non-regular quadratic equations can exhibit the same behaviour as regular ones (consider, e.g. \(zz \doteq x{\mathtt {a}}{\mathtt {b}} y y{\mathtt {b}}{\mathtt {a}} x\) for which all solutions must satisfy \(z = x{\mathtt {a}}{\mathtt {b}}y = y{\mathtt {b}}{\mathtt {a}}x\)). The satisfiability problem was shown in [8] to be NP-hard for RWEs, and shown to be in NP in [9] for some restricted subclasses including the classes of regular-reversed and regular-ordered equations.

For RWEs E, we investigate the structure of the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), and as a consequence, are able to describe some of their most important properties. We achieve this by first noting that \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) can be divided into strongly connected components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for which all the vertices are equations of the same length (⇒ shall be used to denote the restriction of ⇒NT to length preserving transformations only). The ‘full’ graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is comprised of these individual components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) arranged in a DAG-like structure of linear depth (see Section 3) and therefore many properties and parameters of the ‘full’ graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) are determined by the equivalent properties and parameters of the individual components \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). We then focus on the structure of the subgraphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\), and as a result are able to give bounds on certain parameters such as diameter, size, and DAG-width.

Our structural results come in two stages, based on whether the equation belongs to the class of ‘jumbled’ equations introduced in Section 6. In the first stage, we consider equations which are not jumbled, and we show that for all such equations E, there exists a jumbled equation \(\hat {E}\) such that \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is comprised mainly of several well-connected near-copies of \({\mathscr{G}}^{\Rightarrow }_{[\hat {E}]}\). For jumbled equations \(\hat {E}\), we show in Section 7 that every vertex in \({\mathscr{G}}^{\Rightarrow }_{[\hat {E}]}\) is close to a vertex in a certain normal form. We show that the vertices in this normal form are determined to a large extent by a property invariant under ⇒ introduced in Section 5.

With regards to the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\), we give upper bounds which are polynomial in the length of the equation. It follows that the diameter of the full graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is also polynomial, and consequently, that the satisfiability problem for RWEs is NP-complete. This can be generalised to systems of equations satisfying a natural extension of the regularity property (see Section 11). We also give exact upper and lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for a subclass of RWEs called basic RWEs (see Section 4), as well as describing exactly for which equations these bounds are achieved. For RWEs which are not basic, we can infer similar bounds, at the cost of a small (linear in the length of the equation) degree of imprecision. Since in the worst case (e.g. for equations without a solution), running the algorithm will perform a full ‘search’ of the graph, the number of vertices is integral to the running time of the algorithm, and is potentially a better indicator of difficult instances than the complexity class alone. An example of this comes from comparing two subclasses of RWEs called regular-ordered and regular rotated equations. It follows from our results that while both classes have an NP-complete satisfiability problem, if \(E^{\prime }\) is regular-ordered, then \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) will contain at most n vertices, where n is the length of the equation, while if \(E^{\prime }\) is regular rotated, but not regular-ordered, then \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) will contain \(\frac {n!}{2}\) vertices, indicating a vast difference in the number of vertices the algorithm would have to visit.

Motivated by generalisations of the satisfiability problem permitting additional constraints, we also consider the connectivity of the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). To do this, we use DAG-width, a measure for directed graphs which is in several ways analogous to treewidth for undirected graphs. Intuitively, equations for which \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) has low DAG-width are likely to be more amenable when considering additional constraints such as length constraints (see Section 3.3). We give an example class of equations for which the DAG-width is unbounded, as well as a class for which the DAG-width is at most two. The latter includes the class of regular-ordered equations which is the most general subclass of QWEs for which it is known that the satisfiability problem with length constraints is decidable [20], and we expect that both cases will be interesting classes to consider in the context of this problem.

2 Preliminaries

For a set S, we denote the cardinality of S by Card(S). Let Σ be an alphabet. By Σ∗, we denote the set of all words over Σ, and by ε the empty word. By Σ+, we denote the free semigroup Σ∗∖{ε}. A word u is a prefix (resp. suffix) of a word w if there exists v such that w = uv (resp. w = vu). Similarly, u is a factor of w if there exist \(v,v^{\prime }\) such that \(w = v u v^{\prime }\). A prefix/suffix/factor is proper if it is neither the whole word w, nor ε. The length of a word w is denoted |w|, while for \({\mathtt {a}} \in {{\varSigma }}\), \(|w|_{{\mathtt {a}}}\) denotes the number of occurrences of \({\mathtt {a}}\) in w. For a word \(w = {\mathtt {a}}_{1}{\mathtt {a}}_{2}{\ldots } {\mathtt {a}}_{n}\) with \({\mathtt {a}}_{i} \in {{\varSigma }}\) for 1 ≤ i ≤ n, the notation w[i] refers to the letter \({\mathtt {a}}_{i}\) in the ith position. By wR, we denote the reversal \({\mathtt {a}}_{n}{\mathtt {a}}_{n-1}{\ldots } {\mathtt {a}}_{1}\) of the word w. Two words w1, w2 are conjugate (written \(w_{1} \sim w_{2}\)) if there exist u, v such that w1 = uv and w2 = vu.
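As an aside, conjugacy is straightforward to test computationally; the following sketch (a standard trick, not part of the paper) checks whether two words are conjugate.

    def conjugate(w1, w2):
        # w1 ~ w2 iff the words have equal length and w2 is a rotation of w1,
        # i.e. w2 occurs as a factor of w1 w1
        return len(w1) == len(w2) and w2 in w1 + w1

    assert conjugate("abc", "bca")
    assert not conjugate("abc", "acb")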

We shall generally distinguish between two types of alphabet: an infinite set X = {x1, x2,…} of variables, and a set \({{\varSigma }} = \{{\mathtt {a}},{\mathtt {b}},{\ldots }\}\) of terminal symbols. We shall assume that Card(Σ) ≥ 2, and that there exists an order on X leading to a lexicographic order on X∗. For a word α ∈ (X ∪ Σ)∗, we shall denote by var(α) the set {x ∈ X ∣ x is a factor of α}. We shall denote by qv(α) the set {x ∈ var(α) ∣ |α|x = 2}. A word equation is a tuple (α, β) ∈ (X ∪ Σ)∗× (X ∪ Σ)∗, usually written \(\alpha \doteq \beta \). Solutions are morphisms h : (X ∪ Σ)∗→Σ∗ with \(h({\mathtt {a}}) = {\mathtt {a}}\) for all \({\mathtt {a}} \in {{\varSigma }}\) such that h(α) = h(β). The satisfiability problem is the problem of deciding algorithmically whether a given word equation has a solution. For equations E given by \(\alpha \doteq \beta \), we shall often extend notations regarding words in (X ∪ Σ)∗ to E for convenience, so that, e.g. |E| = |αβ|, var(E) = var(αβ) and qv(E) = qv(αβ). An equation \(\alpha \doteq \beta \) is quadratic if |αβ|x ≤ 2 for all x ∈ X. It is regular if |α|x ≤ 1 and |β|x ≤ 1 hold for all x ∈ X. Thus all regular equations are quadratic, but not all quadratic equations are regular. We shall usually abbreviate regular (resp. quadratic) word equation to RWE (resp. QWE). For \(Y \subseteq X\), let \(\pi _{Y} : (X\cup {{\varSigma }})^{*} \to {Y}^{*}\) be the morphism such that πY(x) = x if x ∈ Y and πY(x) = ε otherwise; i.e. πY is a projection from (X ∪ Σ)∗ onto Y∗. A regular equation E given by \(\alpha \doteq \beta \) is regular-ordered if πqv(E)(α) = πqv(E)(β), it is regular rotated if \(\pi _{{qv}(E)}(\alpha ) \sim \pi _{{qv}(E)}(\beta )\) and it is regular reversed if πqv(E)(α) = πqv(E)(β)R.
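The following sketch (again under the hypothetical single-character encoding, with the variable set X passed explicitly) illustrates var, qv and the projection πY on a small example equation, and verifies that it is regular rotated.

    def var(w, X):
        return {c for c in w if c in X}

    def qv(w, X):
        return {c for c in var(w, X) if w.count(c) == 2}

    def proj(w, Y):
        # the projection pi_Y: erase every symbol not in Y
        return "".join(c for c in w if c in Y)

    X = {"x", "y", "z"}
    alpha, beta = "xayz", "yzax"    # E: x a y z = y z a x
    Q = qv(alpha + beta, X)         # {"x", "y", "z"}
    print(proj(alpha, Q), proj(beta, Q))
    # "xyz" and "yzx" are conjugate, so E is regular rotated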

Given a set S and binary relation \({\mathscr{R}} \subseteq S \times S\), we denote the reflexive-transitive closure of \({\mathscr{R}}\) as \({\mathscr{R}}^{*}\). For each sS, we denote by \([s]_{{\mathscr{R}}}\) the set \(\{s^{\prime } \mid (s,s^{\prime }) \in {\mathscr{R}}^{*} \}\). The relation \({\mathscr{R}}\) may be represented as a directed graph, which we denote \({\mathscr{G}}^{{\mathscr{R}}}\), with vertices from S and edges from \({\mathscr{R}}\). Usually, we will be interested in the subgraph of \({\mathscr{G}}^{{\mathscr{R}}}\) containing vertices belonging to \([s]_{{\mathscr{R}}}\) for some sS. Thus, for a subset T of S we shall denote by \({\mathscr{G}}^{{\mathscr{R}}}_{T}\) the subgraph of \({\mathscr{G}}^{{\mathscr{R}}}\) containing vertices from T. Given a (directed) graph \({\mathscr{G}}\), with vertices \(V({\mathscr{G}})\) and edges \(E({\mathscr{G}})\), a root vertex is some \(v\in V({\mathscr{G}})\) such that there does not exist \((u,v) \in E({\mathscr{G}})\). We denote by \({diam}({\mathscr{G}})\) the diameter of the graph \({\mathscr{G}}\), by which we mean the maximum length of a shortest (directed) path between two vertices. For our purposes, we are really interested in the maximum length of shortest paths only when they exist, meaning that we shall not adopt the convention that \({diam}({\mathscr{G}}) = \infty \) when \({\mathscr{G}}\) is a directed graph which is not strongly connected.

For \(W,V^{\prime } \subseteq V({\mathscr{G}})\), we say that W guards \(V^{\prime }\) if for all \((u,v) \in E({\mathscr{G}})\) with \(u \in V^{\prime }\), we have \(v \in V^{\prime } \cup W\). If \({\mathscr{G}}\) is acyclic, we write \(v_{1} \leq _{{\mathscr{G}}} v_{2}\) if there is a directed path from v1 to v2 in \({\mathscr{G}}\) or v1 = v2. Following [5], a DAG-decomposition of \({\mathscr{G}}\) is a pair (D, χ) such that D is a directed acyclic graph (DAG) with vertices V (D), and χ = {Xd ∣ d ∈ V (D)} is a family of subsets of \(V({\mathscr{G}})\) satisfying:
  (D1) \(V({\mathscr{G}}) = \bigcup \limits _{d\in V(D)}X_{d}\),
  (D2) if \(d,d^{\prime },d^{\prime \prime } \in V(D)\) such that \(d \leq _{D} d^{\prime } \leq _{D} d^{\prime \prime }\), then \(X_{d} \cap X_{d^{\prime \prime }} \subseteq X_{d^{\prime }}\),
  (D3) for all edges \((d,d^{\prime })\) of D, \(X_{d} \cap X_{d^{\prime }}\) guards \(X_{\geq d^{\prime }} \backslash X_{d}\), where \(X_{\geq d^{\prime }} = \bigcup \limits _{d^{\prime \prime }\geq _{D} d^{\prime }}X_{d^{\prime \prime }}\), and for all root vertices d, \(X_{d}\) is guarded by ∅.
The width of the DAG-decomposition is \(\max \limits \{{\text {Card}}(X_{d}) \mid d \in V(D)\}\). The DAG-width of \({\mathscr{G}}\) is the minimum width of any possible DAG-decomposition of \({\mathscr{G}}\) and is denoted \({dgw}({\mathscr{G}})\).
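As an illustration of these definitions, the following sketch (ours) implements the guards predicate and the width of a given family of bags, with directed graphs encoded as adjacency dictionaries.

    def guards(W, Vp, G):
        # W guards V': every edge leaving V' ends inside V' or in W
        return all(v in Vp or v in W
                   for u in Vp for v in G.get(u, ()))

    def width(chi):
        # chi maps each vertex d of the DAG D to its bag X_d
        return max(len(bag) for bag in chi.values())

    G = {1: {2}, 2: {3}, 3: {1}}                 # a directed 3-cycle
    print(guards({1}, {2, 3}, G))                # True: the only exiting edge is 3 -> 1
    print(width({"d0": {1, 2}, "d1": {2, 3}}))   # 2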

3 An Algorithm for Solving Regular Word Equations

In this section we present the algorithm for solving QWEs as a rewriting system defined by a relation ⇒NT. The rewriting relation is derived from morphisms called Nielsen transformations, and we shall abuse this terminology slightly and generally also refer to the rewriting transformations themselves as Nielsen transformations. The Nielsen transformations never introduce new variables or terminal symbols, and never increase the length of the equation. They also preserve the properties of being quadratic (resp. regular). Thus, given a quadratic (resp. regular) word equation E, the set \(\{E^{\prime } \mid E\Rightarrow _{NT}^{*} E^{\prime } \}\) of equations reachable via Nielsen transformations is finite. Moreover, given an equation which has a solution h, there is always a Nielsen transformation which produces an equation which has a solution, such that at least one of the new equation or the new solution is strictly shorter than the previous one. It follows that, given an equation which possesses a solution, it is possible to reach the equation \(\varepsilon \doteq \varepsilon \) after finitely many rewriting steps. For a more detailed description of the algorithm, we refer the reader to e.g. Chapter 12 of [21].

3.1 Nielsen Transformations

The Nielsen transformations (morphisms) are defined as follows: for x ∈ X ∪ Σ and y ∈ X, let ψx<y : (X ∪ Σ)∗→ (X ∪ Σ)∗ be the morphism given by ψx<y(y) = xy and ψx<y(z) = z whenever z ≠ y. We define the rewriting transformations via the relations ⇒L, ⇒R, ⇒> as follows. Suppose we have a QWE E of the form \(x\alpha \doteq y\beta \) where x, y ∈ X ∪ Σ and α, β ∈ (X ∪ Σ)∗. Then:
  1. if x ∈ qv(E) and x ≠ y, then \(x\alpha \doteq y\beta \Rightarrow _{L} x\psi _{y<x}(\alpha ) \doteq \psi _{y<x}(\beta )\),
  2. if y ∈ qv(E) and x ≠ y, then \(x\alpha \doteq y\beta \Rightarrow _{R} \psi _{x<y}(\alpha ) \doteq y\psi _{x<y}(\beta )\),
  3. if x ∈ X∖qv(E), then \(x\alpha \doteq y\beta \Rightarrow _> x\alpha \doteq \beta \),
  4. if y ∈ X∖qv(E), then \(x\alpha \doteq y\beta \Rightarrow _> \alpha \doteq y\beta \), and
  5. if x = y, then \(x\alpha \doteq y\beta \Rightarrow _{>} \alpha \doteq \beta \).
Moreover, for a QWE E of the form \(\alpha \doteq \beta \) with α, β ∈ (X ∪ Σ)∗, and for each \(Y \subseteq {var}(E)\), we have the additional transformations \(\alpha \doteq \beta \Rightarrow _> \pi _{X\backslash Y}(\alpha ) \doteq \pi _{X\backslash Y}(\beta )\).

Now, our full rewriting relation, ⇒NT, is given by ⇒L ∪⇒R ∪⇒>. For convenience, we shall define ⇒ to be ⇒L ∪⇒R. We shall call the rewriting transformations from ⇒ length-preserving, since they are exactly those for which the resulting equation has the same length as the original. The following observation follows directly from the definition of ⇒NT.
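The relation ⇒NT is easy to implement directly from the definitions. The sketch below (ours, not the authors' implementation) computes all ⇒NT-successors of a quadratic equation under the hypothetical encoding of an equation as a pair of strings over single-character variables and terminals, with the variable set X passed explicitly.

    from itertools import combinations

    def successors(alpha, beta, X):
        """All equations E' with (alpha = beta) =>_NT E'."""
        w = alpha + beta
        Q = {c for c in X if w.count(c) == 2}        # qv(E)
        out = set()
        if alpha and beta:
            x, y = alpha[0], beta[0]
            if x == y:                                # transformation 5
                out.add((alpha[1:], beta[1:]))
            else:
                if x in Q:                            # transformation 1 (=>_L)
                    p = lambda s: s.replace(x, y + x)
                    out.add((x + p(alpha[1:]), p(beta[1:])))
                if y in Q:                            # transformation 2 (=>_R)
                    p = lambda s: s.replace(y, x + y)
                    out.add((p(alpha[1:]), y + p(beta[1:])))
                if x in X and x not in Q:             # transformation 3
                    out.add((alpha, beta[1:]))
                if y in X and y not in Q:             # transformation 4
                    out.add((alpha[1:], beta))
        # additional transformations: erase each non-empty subset Y of the
        # occurring variables (terminal symbols are kept)
        V = {c for c in w if c in X}
        for r in range(1, len(V) + 1):
            for Y in combinations(sorted(V), r):
                erase = lambda s: "".join(c for c in s if c not in Y)
                out.add((erase(alpha), erase(beta)))
        return out

For instance, successors("xaby", "ybax", {"x", "y"}) contains the two length-preserving successors ("xaby", "bayx") and ("abxy", "ybax"), alongside the strictly shorter equations obtained by erasing variables.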

Remark 3.1

Let \(E, E^{\prime }\) be QWEs such that \(E \Rightarrow _{NT} E^{\prime }\). If E is regular, then \(E^{\prime }\) is regular. Moreover, if \(E \Rightarrow E^{\prime }\), then \({var}(E) = {var}(E^{\prime })\), \({qv}(E) = {qv}(E^{\prime })\), and \(|E| = |E^{\prime }|\). Similarly, if \(E \Rightarrow _> E^{\prime }\), then \({var}(E^{\prime }) \subseteq {var}(E)\), \({qv}(E^{\prime }) \subseteq {qv}(E)\), and \(|E^{\prime }| < |E|\). Hence the set \(\{E^{\prime \prime } \mid E \Rightarrow _{NT}^{*} E^{\prime \prime }\}\) is finite.

If E1, E2 are RWEs such that E1 ⇒L E2, then it follows from the definitions that there exist x, y ∈ X and \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\backslash \{x,y\})^{*}\) such that E1 is given by \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) and E2 is given by \(x \alpha _{1} y \alpha _{2} \doteq \beta _{1} y x \beta _{2}\). Extending this observation to multiple applications of ⇒L, we may conclude that the set \(\{ E_{2} \mid E_{1} \Rightarrow _{L}^{*} E_{2}\}\) is exactly the set \(\{x \alpha _{1} y \alpha _{2} \doteq \beta _{3} x \beta _{2} \mid \beta _{3} \sim y \beta _{1}\}\). A similar statement can be made for \(\Rightarrow _{R}^{*}\). Consequently, the reflexive-transitive closures \(\Rightarrow _{L}^{*}\) and \(\Rightarrow _{R}^{*}\) are symmetric. Hence, we may also observe the following.
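Under the same hypothetical encoding, this characterisation gives a direct way to enumerate the \(\Rightarrow _{L}^{*}\)-class without applying any transformations (a sketch, for RWEs whose left-hand side starts with a variable of qv(E) occurring on the right):

    def L_class(alpha, beta):
        # split the RHS as (y beta1) x beta2, where x is the LHS head,
        # and replace y beta1 by each of its rotations (conjugates)
        x = alpha[0]
        i = beta.index(x)
        rotations = {beta[j:i] + beta[:j] for j in range(i)}
        return {(alpha, r + beta[i:]) for r in rotations}

    print(L_class("xaby", "ybax"))
    # the three conjugates of "yb" placed before "ax":
    # {("xaby", "ybax"), ("xaby", "bayx"), ("xaby", "aybx")}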

Remark 3.2

Let E be a RWE and Z ∈{L, R}. Then \({\text {Card}}(\{ E^{\prime } \mid E \Rightarrow _{Z}^{*} E^{\prime }\}) < |E|\) and \(\Rightarrow _{Z}^{*}\) is an equivalence relation. It follows that \(\Rightarrow ^{*}\) is also an equivalence relation.

The following well-known result forms the basis for the algorithm for solving QWEs.

Theorem 3.3

[21] Let E be a QWE. Then E has a solution if and only if \(E \Rightarrow _{NT}^{*} \varepsilon \doteq \varepsilon \).

3.2 Representing the Set of Solutions as a Graph

Theorem 3.3 provides the basis for treating the satisfiability of QWEs as a reachability problem for the rewriting relation ⇒NT. Since any relation R is naturally represented as a (directed) graph \({\mathscr{G}}^R\), it is also natural to interpret the resulting algorithm as a search in the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\): it suffices to determine whether there exists a path in the graph from the original equation E to the trivial equation \(\varepsilon \doteq \varepsilon \). In fact, the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) can tell us significantly more than simply whether a solution to E exists: every walk from E to \(\varepsilon \doteq \varepsilon \) in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) corresponds to a solution to E and likewise, every solution to E is represented by a walk in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) from E to \(\varepsilon \doteq \varepsilon \). Thus the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) contain a full description of all solutions to E, and as such, their properties and structure are of inherent interest to the study of QWEs and their solutions. An immediate example of this is the diameter, which is strongly related to the complexity of the satisfiability problem, as demonstrated in the following proposition.
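Rendering this search explicitly, the following breadth-first sketch decides satisfiability; it reuses the successors function from the sketch in Section 3.1 and terminates because the set of reachable equations is finite (Remark 3.1).

    from collections import deque

    def satisfiable(alpha, beta, X):
        # search G^{=>_NT}_{[E]} for a path from E to the trivial equation
        start, trivial = (alpha, beta), ("", "")
        seen, queue = {start}, deque([start])
        while queue:
            a, b = queue.popleft()
            if (a, b) == trivial:
                return True
            for E2 in successors(a, b, X):
                if E2 not in seen:
                    seen.add(E2)
                    queue.append(E2)
        return False

    print(satisfiable("xaby", "ybax", {"x", "y"}))   # True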

Proposition 3.4

Let \({\mathscr{C}}\) be a class of QWEs. Suppose there exists a constant \(k \in \mathbb {N}\) such that for each \(E \in {\mathscr{C}}\), we have \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^k)\). Then the satisfiability problem for \({\mathscr{C}}\) is in NP.

Proof

Let \({\mathscr{C}}\) be a class of quadratic word equations and let \(k\in \mathbb {N}\) such that for each \(E \in {\mathscr{C}}\), \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^k)\). By Theorem 3.3, to check whether an equation \(E \in {\mathscr{C}}\) has a solution, we have to check whether there is a path from E to \(\varepsilon \doteq \varepsilon \) in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). If such a path exists, then due to our assumptions about the diameter, one exists of length at most \(O(|E|^{k})\). Moreover, for each edge \(E_{1} \Rightarrow _{NT} E_{2}\) in the path, we have that |E2| ≤ |E1| ≤ |E|, so verifying that \(E_{1} \Rightarrow _{NT} E_{2}\) can be achieved in linear time. Hence, subject to appropriate non-deterministic choices, we may find such a path whenever it exists in \(O(|E|^{k+1})\) time and the satisfiability problem for \({\mathscr{C}}\) is in NP. □

Many properties will be determined mostly (i.e. up to some small imprecision) by the subgraphs obtained by restricting our rewriting relation to length-preserving transformations only (i.e. to ⇒). Since the rewriting relation ⇒NT allows us to preserve or decrease the length, but never increase it again, any walk in the graph will visit a subgraph containing equations of each length only once, and in order of decreasing length. The following proposition confirms how we may infer a global property of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) from its ‘local’ values in the individual subgraphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) in the case of two properties we are particularly interested in: diameter and DAG-width.

Proposition 3.5

Let E be a QWE. Then
  1. \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq (|E|+1)(1+\max \limits \{{diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) \mid E \Rightarrow _{NT}^{*} E^{\prime }\})-1\), and
  2. \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) = \max \limits \{{dgw}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) \mid E \Rightarrow _{NT}^{*} E^{\prime }\}\).

Proof

The second statement is a direct consequence of Theorem 6 in [5]. We shall consider the first statement. Let E be a quadratic word equation. Let
$$m = \max \{ {diam}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \mid E \Rightarrow_{NT}^{*} E^{\prime} \}.$$
Let \(E_{1}, E_{2},\ldots ,E_{n}\) be a shortest path in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) between E1 and En. Then \(E_{i} \Rightarrow _{NT} E_{i+1}\) for 1 ≤ i < n. Consequently, for each i, 1 ≤ i < n, either |Ei| = |Ei+1| or |Ei| > |Ei+1|. Let j1, j2,…,jk be all the indices i for which the latter holds. Then, since the length of an equation cannot be negative, we necessarily have that k ≤ |E|. Moreover, we have that \(E_{1} \Rightarrow ^{*} E_{j_{1}}\), \(E_{j_{k}+1} \Rightarrow ^{*} E_{n}\), and for each i, 1 ≤ i < k, \(E_{j_{i}+1} \Rightarrow ^{*} E_{j_{i+1}}\). Since, for each Ei, \({\mathscr{G}}^{\Rightarrow }_{[E_{i}]}\) is a subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), and by our assumption that the path E1, E2,…,En is minimal in \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), it follows that the path \(E_{1}, E_{2},\ldots , E_{j_{1}}\) is minimal in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\), and thus \(j_{1} - 1 \leq m\). By the same argument, the path \(E_{j_{k}+1}, E_{j_{k}+2},\ldots , E_{n}\) is minimal in \({\mathscr{G}}^{\Rightarrow }_{[E_{j_{k}+1}]}\), so we get that \(n - j_{k} - 1 \leq m\), and similarly, for each i, 1 ≤ i < k, we may conclude that \(j_{i+1} - j_{i} - 1 \leq m\). It follows that
$$n = (n - j_{k}) + (j_{k} - j_{k-1}) + {\ldots} + (j_{2} - j_{1}) + j_{1} \leq (k+1)(m+1)$$
meaning the length of the path E1, E2,…En is at most (|E| + 1)(m + 1). Since this holds for all choices of E1, En, we have that \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq (|E|+1)(m+1)-1\) as claimed. □
In what follows, we shall focus predominantly on the structure of the (sub)graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) corresponding to the length-preserving transformations belonging to ⇒ (see Fig. 1). This has the advantage of allowing us to apply further restrictions, in particular a reduction to the case of basic equations introduced in Section 4, without significantly altering the structure of the graph. It is worth pointing out that due to Remark 3.2, the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is strongly connected whenever E is a RWE. The same is generally not true in the case of arbitrary QWEs E, or for the full graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\).
Fig. 1

The graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is the equation \(x{\mathtt {a}} y {\mathtt {a}} z {\mathtt {b}} w \doteq w {\mathtt {b}} y x {\mathtt {a}} z {\mathtt {a}}\) with variables x, y, z, w and terminal symbols \({\mathtt {a}}, {\mathtt {b}}\). Generated in Python using the PyDot graph drawing package

3.3 Solving Equations Modulo Constraints

Often, it is important to determine whether a given equation has a solution which satisfies some additional constraints. For some types of constraints, it is possible to adapt the algorithm by finding, for each Nielsen transformation, an appropriate corresponding transformation of the constraints. For example, if x, y, z ∈ X and we have the length constraint |x| = |z|, when we apply the Nielsen transformation associated with ψy<x to our equation, we replace each occurrence of x with yx. Thus, the updated constraint would be |x| + |y| = |z|. Unfortunately, as is the case for length constraints, the resulting set of possible equation/constraint combinations can become infinite, meaning that the modified version of the algorithm is not guaranteed to terminate.
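Concretely, a linear length constraint can be stored as a map from variables to integer coefficients, and the update accompanying ψy<x is then a single coefficient transfer; a sketch (ours) of this bookkeeping:

    def update_lengths(coeffs, x, y):
        # under psi_{y<x} every occurrence of x becomes yx, so the old |x|
        # equals |y| + |x|; transfer x's coefficient onto y accordingly
        c = dict(coeffs)
        c[y] = c.get(y, 0) + c.get(x, 0)
        return c

    # |x| = |z|, encoded as |x| - |z| = 0
    print(update_lengths({"x": 1, "z": -1}, "x", "y"))
    # {'x': 1, 'z': -1, 'y': 1}, i.e. |x| + |y| = |z|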

A possible solution to this is to find finite descriptions of the potentially infinite sets of constraints which may occur alongside each equation. The task of finding such descriptions, and consequently the potential decidability of the corresponding extended satisfiability problems, is dependent on the structural properties of the graph, as can be seen e.g. in [20, 24].

One case in which computing finite descriptions is straightforward is when the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is acyclic (i.e. a DAG). Unfortunately, inspection of the definition of ⇒NT reveals that this is not true for the majority of RWEs (or QWEs). Hence, when considering the existence of algorithms for solving word equations with length constraints (or constraints of other types), it is natural to specifically consider classes of equations E where the graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) have particularly DAG-like (or un-DAG-like) structures, which we can measure using parameters such as DAG-width.

3.4 Properties of the Graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) for Regular Equations E

In order to understand the full graphs \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\), we mostly need to understand the (strongly connected) components corresponding to the length-preserving transformations, as we can easily see that these components will be connected in a DAG-like structure whose depth is at most |E|. Hence, our main goal is to describe the structure of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for RWEs E. This is done in several steps, with each one accounting for a particular structural feature or aspect as follows.
  (1) In the first step (Section 4), we describe the effect of terminal symbols, single occurrence variables, and ‘decomposability’ on the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), essentially reducing the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for a ‘basic’ equation \(E^{\prime }\) which does not contain any of these features.
  (2) Building on an important technical tool developed in Section 5, the second step (Section 6) introduces the class of jumbled equations. For equations \(E^{\prime }\) which are not jumbled, but which have nevertheless been simplified as per the first step, there exists a specific repetitive structure allowing us to express \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) as a combination of (near) copies of some smaller graph \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) where \(E^{\prime \prime }\) is a jumbled equation obtained by deleting the appropriate variables from \(E^{\prime }\).
  (3) In the third step (Section 7), we show that for jumbled equations \(E^{\prime \prime }\), all vertices in \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) are ‘close’ to a vertex from a small subset conforming to a very particular structure called Lex Normal Form.
  (4) Finally, in Sections 8, 9 and 10, we exploit our structural results to investigate the diameter, number of vertices and connectivity (DAG-width) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) respectively. In Section 11 we note a generalisation of our results to systems of equations.

4 Basic Equations: A Convenient Abstraction

The current section is devoted to reducing the study of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to the case of basic equations. This has several advantages, including a significant reduction in the size of the graphs which is useful for working with examples, allowing for the simpler formulation of precise results, e.g. regarding the size of the graphs in Section 9, and avoiding unnecessary repetition in the formal statements and their proofs.

Definition 4.1 (Basic Equations)

Let E be a QWE given by \(\alpha \doteq \beta \). Then E is decomposable if there exist proper prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of α and β such that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\). Otherwise, E is indecomposable. E is basic if it is indecomposable and \(\alpha , \beta \in {qv}(E)^{*}\).

For a basic RWE, both sides of the equation are permutations of the same set of variables, for example \(x_{1} x_{2} x_3 \doteq x_3 x_{1} x_{2}\) and \(x y w z \doteq w z x y\) are both basic RWEs. On the other hand, \(x y z w \doteq y x z w\), \({\mathtt {a}} x {\mathtt {b}} y \doteq y {\mathtt {b}}{\mathtt {a}} x\) and \(x y \doteq y z\) are not – the first being decomposable and the latter two containing terminal symbols and variables occurring on one side only.
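Both conditions of Definition 4.1 are easy to test under the hypothetical single-character encoding used in the earlier sketches; for example:

    def qv(w, X):
        return {c for c in X if w.count(c) == 2}

    def decomposable(alpha, beta, X):
        Q = qv(alpha + beta, X)
        touched = lambda w: {c for c in w if c in Q}
        # proper prefixes are non-empty and not the whole side
        return any(touched(alpha[:i]) == touched(beta[:j])
                   for i in range(1, len(alpha))
                   for j in range(1, len(beta)))

    def basic(alpha, beta, X):
        # for RWEs: indecomposable, and both sides consist of qv(E) only
        Q = qv(alpha + beta, X)
        return (not decomposable(alpha, beta, X)
                and set(alpha) == set(beta) == Q)

    X = set("wxyz")
    print(basic("xywz", "wzxy", X))   # True
    print(basic("xyzw", "yxzw", X))   # False: the prefixes xy and yx decompose it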

We firstly consider decomposable equations E, showing that in this case the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for some shorter equation \(E^{\prime }\). The main step in this respect is the following observation.

Lemma 4.2

Let E be a RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1, β1 ≠ ε and var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Let \(E^{\prime }\) be a RWE. Then \(E \Rightarrow E^{\prime }\) if and only if there exist \(\alpha _3, \beta _3 \in (X\cup {{\varSigma }})^{*}\) such that \(E^{\prime }\) is given by \(\alpha _3 \alpha _{2} \doteq \beta _3\beta _{2}\) and \(\alpha _{1} \doteq \beta _{1} \Rightarrow \alpha _3 \doteq \beta _3\).

Proof

Suppose E is a RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where \(\alpha _{1}, \alpha _{2}, \beta _{1}, \beta _{2} \in (X\cup {{\varSigma }})^{*}\) with α1, β1 ≠ ε such that var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Let \(E^{\prime }\) be a RWE. Suppose firstly that there exist \(\alpha _3, \beta _3 \in (X\cup {{\varSigma }})^{*}\) such that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _L \alpha _3 \doteq \beta _3\) (the case that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _R \alpha _3 \doteq \beta _3\) is symmetric). Then it follows from the definition of ⇒L that α1 has a prefix \(y \in {qv}(E)\). Hence, there exist \(x \in X \cup {{\varSigma }}\) and \(\gamma , \delta _{1}, \delta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1 = yγ, β1 = xδ1yδ2, α3 = α1 and β3 = δ1xyδ2. By the definition of ⇒L, it follows that \(\alpha _{1} \alpha _{2} \doteq \beta _{1} \beta _{2} \Rightarrow _L \alpha _3 \alpha _{2} \doteq \beta _3\beta _{2}\) and thus \(E \Rightarrow E^{\prime }\).

Now suppose instead that \(E \Rightarrow _L E^{\prime }\) (again, the case that \(E \Rightarrow _R E^{\prime }\) is symmetric). Then by definition of ⇒L, there exists a variable \(y \in {qv}(E)\) in the leftmost position of α1 which also occurs in β1β2. Moreover, it follows from the definition of ⇒L and the fact that \(E \Rightarrow _L E^{\prime }\) that \(y \not = \beta _{1}[1]\). Furthermore, since var(α1) ∩ qv(E) = var(β1) ∩ qv(E), y must in fact occur somewhere in β1, so there exist \(x \in X \cup {{\varSigma }}\) and \(\gamma , \delta _{1}, \delta _{2} \in (X\cup {{\varSigma }})^{*}\) such that α1 = yγ and β1 = xδ1yδ2, and such that \(E^{\prime }\) is given by \(\alpha _3\alpha _{2} \doteq \beta _3 \beta _{2}\) where α3 = α1 and β3 = δ1xyδ2. It follows from the definition of ⇒L that \(\alpha _{1} \doteq \beta _{1} \Rightarrow _L \alpha _3 \doteq \beta _3\) and thus the statement holds. □

It follows immediately from Lemma 4.2 that the relation ⇒ preserves the properties of being (in)decomposable and basic.

Corollary 4.3

Let E1, E2 be RWEs such that E1E2. Then E1 is indecomposable if and only if E2 is indecomposable. Consequently E1 is basic if and only if E2 is basic.

Moreover, a straightforward induction yields the following description of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is decomposable.

Corollary 4.4

Let E be a decomposable RWE given by \(\alpha _{1}\alpha _{2} \doteq \beta _{1}\beta _{2}\) where α1, α2, β1, β2 ∈ (XΣ) such that α1, β1ε and var(α1) ∩ qv(E) = var(β1) ∩ qv(E). Then \({\mathscr{G}}_{[E]}^{\Rightarrow }\) is isomorphic to \({\mathscr{G}}_{[\alpha _{1} \doteq \beta _{1}]}^{\Rightarrow }\) and can be obtained from \({\mathscr{G}}_{[\alpha _{1} \doteq \beta _{1}]}^{\Rightarrow }\) by replacing each vertex \(\alpha _3 \doteq \beta _3 \in [\alpha _{1} \doteq \beta _{1}]_{\Rightarrow }\) with \(\alpha _3 \alpha _{2} \doteq \beta _3 \beta _{2}\).

Corollary 4.4 accounts for decomposable equations. It remains to consider the case of equations containing terminal symbols and variables occurring on only one side (and therefore once overall). For this case, we need the following notion for relating the structure of two graphs.

Definition 4.5 (Isolated path compression)

Let G1, G2 be (directed) graphs. We say that G1 is an isolated path compression of order n of G2 if G2 may be obtained from G1 by replacing each edge \((e,e^{\prime })\) in G1 by a path \((e,e_{1}), (e_{1}, e_{2}), {\ldots } (e_{k-1},e_k), (e_k, e^{\prime })\) such that k ≤ n and e1, e2, e3,…,ek are new vertices unique to the edge \((e,e^{\prime })\).

Informally, an isolated path compression of a graph is obtained simply by replacing ‘isolated paths’ (paths whose internal vertices are not adjacent to any vertices outside the path) of a bounded length with single edges. Therefore, the overall structure is generally preserved, and most properties will be preserved, or change proportionally to the order n (Fig. 2).
Fig. 2

The graph G1 is an isolated path compression of order two of the graph G2
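The following sketch (ours; adjacency-list encoding, assuming every contracted path eventually reaches a surviving vertex) performs the contraction in the other direction, computing an isolated path compression of a given graph.

    def compress(G):
        # G maps each vertex to the list of its out-neighbours
        indeg = {v: 0 for v in G}
        for u in G:
            for v in G[u]:
                indeg[v] += 1
        # internal vertices of isolated paths have in- and out-degree one
        internal = {v for v in G if indeg[v] == 1 and len(G[v]) == 1}
        H = {u: [] for u in G if u not in internal}
        for u in H:
            for v in G[u]:
                while v in internal:    # slide along the isolated path
                    v = G[v][0]
                H[u].append(v)
        return H

    G = {1: [2], 2: [3], 3: [4], 4: [1, 5], 5: [1]}
    print(compress(G))   # {1: [4], 4: [1, 1]}: the paths through 2, 3 and 5 collapse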

Remark 4.6

Consider graphs G1, G2 such that G1 is an isolated path compression of order n of G2. If dgw(G1) = 1, then dgw(G2) ∈{1,2}.

If dgw(G1) ≥ 2, then dgw(G1) = dgw(G2). Moreover, diam(G2) ≤ (n + 1)diam(G1), and the number of vertices (resp. edges) in G2 is at most the number of vertices (resp. edges) in G1 plus n times the number of edges of G1.

Using isolated path compressions, it is possible to describe the structure of the graph \({\mathscr{G}}_{[E]}^{\Rightarrow }\) for any RWE E in terms of the graph \({\mathscr{G}}_{[E^{\prime }]}^{\Rightarrow }\) for the RWE \(E^{\prime }\) obtained from E by erasing all terminal symbols and single-occurrence variables from E (i.e. projecting onto qv(E)).

Lemma 4.7

Let E be an indecomposable RWE given by \(\alpha \doteq \beta \). Then the graph \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\).

Proof

Let E be an indecomposable RWE given by \(\alpha \doteq \beta \). Note that by Corollary 4.3, it follows that \(E^{\prime }\) is indecomposable for every \(E^{\prime } \in [E]_{\Rightarrow }\). We begin by considering the simple cases arising when Card(qv(E)) < 2. If Card(qv(E)) = 0, then \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a single vertex with no edges. Moreover, \(\pi _{{qv}(E)} (\alpha ) \doteq \pi _{{qv}(E)} (\beta )\) is the trivial equation \(\varepsilon \doteq \varepsilon \), so \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is also a single vertex with no edges. The two graphs are clearly isomorphic, so the lemma holds trivially.

Now suppose that Card(qv(E)) = 1. Then E has the form \(w_{1} x w_{2} \doteq w_3 x w_4\) where qv(E) = {x} and \(w_{1}, w_{2}, w_3, w_4 \in ((X\cup {{\varSigma }})\backslash \{x\})^{*}\). It necessarily follows that the equation \(\pi _{{qv}(E)} (\alpha ) \doteq \pi _{{qv}(E)} (\beta )\) has the form \(x \doteq x\), meaning that \({\mathscr{G}}_{[\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )]}^{\Rightarrow }\) is again a single vertex with no edges. If w1, w3 ≠ ε, then E is decomposable, a contradiction. Otherwise, \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a cycle of length \(\max \limits \{|w_{1}|,|w_{3}|\} < |E|\), so again the statement of the lemma follows directly. Thus, for the remainder of the proof, we shall suppose that Card(qv(E)) ≥ 2.

Before proceeding, we remark that for any equation \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\), if \(\alpha ^{\prime }[1], \beta ^{\prime }[1] \notin {qv}(E^{\prime })\), then either \(E^{\prime }\) is decomposable, or \(|\alpha ^{\prime }|, |\beta ^{\prime }| \in \{0,1\}\). Both are contradictions to previous assumptions (the former to the fact that E is indecomposable, and hence \(E^{\prime }\) is indecomposable for all \(E^{\prime } \in [E]_{\Rightarrow }\), and the latter to the assumption that Card(qv(E)) ≥ 2 which is only possible if \(|\alpha ^{\prime }|,|\beta ^{\prime }| \geq 2\)). Consequently, we may partition \([E]_{\Rightarrow }\) into two sets S1 and S2 where S1 contains all equations \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) such that \(\alpha ^{\prime }[1]\) and \(\beta ^{\prime }[1] \) are both in \( {qv}(E^{\prime })\), and S2 contains all equations \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) such that exactly one of \(\alpha ^{\prime }[1],\beta ^{\prime }[1]\) is in \({qv}(E^{\prime })\). Intuitively, S1 will be the set of ‘surviving’ vertices in the isolated path compression while S2 consists of those vertices which belong only to the ‘isolated paths’ which are contracted/compressed. Supporting this, we show the following two claims regarding elements of S2.

Claim 4.7.1

Suppose that \(E^{\prime }\in S_{2}\). Then the in-degree and out-degree of \(E^{\prime }\) in \({\mathscr{G}}_{[E]}^{\Rightarrow }\) are both exactly one.

Proof

W.l.o.g. suppose that \(E^{\prime }\) is given by \(x \alpha ^{\prime }_{1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\) with \(y \in {qv}(E^{\prime })\) and \(x \notin {qv}(E^{\prime })\). It follows from the definitions of ⇒L and ⇒R that there is no \(E^{\prime \prime }\) such that \(E^{\prime } \Rightarrow _L E^{\prime \prime }\), and exactly one \(E^{\prime \prime }\) such that \(E^{\prime } \Rightarrow _R E^{\prime \prime }\). Thus the out-degree is one as claimed. Now consider the in-degree and let \(E^{\prime \prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime \prime } \Rightarrow E^{\prime }\). Note that by the definition of ⇒L, we cannot have that \(E^{\prime \prime } \Rightarrow _L E^{\prime }\), so we must have that \(E^{\prime \prime } \Rightarrow _R E^{\prime }\). It follows from the fact that the Nielsen transformation morphisms \(\psi _{x<y}\) are injective that there is exactly one such \(E^{\prime \prime }\), and thus we also have that the in-degree of \(E^{\prime }\) is one as claimed. □

Claim 4.7.2

Let \(E^{\prime } \in S_{2}\). Then there exist k ≤ |E|− 2 and \(E_{0}, E_{1},\ldots ,E_{k+1} \in [E]_{\Rightarrow }\) and Z ∈{L, R} such that all the following statements hold:
  1. \(E_{0}, E_{k+1} \in S_{1}\),
  2. \(E_{i} \in S_{2}\) for 1 ≤ i ≤ k,
  3. \(E_{i} \Rightarrow _{Z} E_{i+1}\) for 0 ≤ i ≤ k,
  4. there exists i, 1 ≤ i ≤ k, such that \(E^{\prime } = E_{i}\).

Proof

W.l.o.g. suppose that the RHS of \(E^{\prime }\) has a prefix contained in \({qv}(E^{\prime })\). Then since \({\text {Card}}({qv}(E^{\prime })) \geq 2\) and since \(E^{\prime }\) is regular, the LHS also contains at least one variable in \({qv}(E^{\prime })\) and we may either write \(E^{\prime }\) as
  (1) \(a_ia_{i+1} {\ldots } a_k x \alpha ^{\prime }_{1} x^{\prime } a_{1} a_{2}{\ldots } a_{i-1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\), or
  (2) \(a_ia_{i+1} {\ldots } a_k x a_{1} a_{2}{\ldots } a_{i-1} y \alpha _{2}^{\prime } \doteq y \beta ^{\prime }\)
where k ≤ |E|− 2, \(a_j \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) for 1 ≤ j ≤ k, \(x,x^{\prime }, y \in {qv}(E)\) with \(x,x^{\prime }\not =y\), and \(\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta ^{\prime } \in (X\cup {{\varSigma }})^{*}\). Consider the first case. Let E0 be the equation given by
$$x^{\prime} a_{1} a_{2} {\ldots} a_{k} x \alpha_{1}^{\prime} y \alpha_{2}^{\prime} \doteq y \beta^{\prime},$$
let Ek+ 1 be the equation given by
$$x \alpha_{1}^{\prime} x^{\prime} a_{1} a_{2} {\ldots} a_{k} y \alpha_{2}^{\prime} \doteq y \beta^{\prime},$$
and for 1 ≤ jk, let Ej be the equation given by
$$a_{j} a_{j+1} {\ldots} a_{k} x \alpha_{1}^{\prime} x^{\prime} a_{1} a_{2} {\ldots} a_{j-1} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}.$$
Then clearly, \(E_i = E^{\prime }\), \(E_{0}, E_{k+1} \in S_{1}\), \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, and \(E_{j} \Rightarrow _{R} E_{j+1}\) for 0 ≤ j ≤ k as claimed.
Now consider the second case. Let \(E_{0} = E_{k+1}\) be the equation given by
$$x a_{1} a_{2} {\ldots} a_{k} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}$$
and for 1 ≤ jk, let Ej be the equation given by
$$a_{j} a_{j+1} {\ldots} a_{k} x a_{1} a_{2} {\ldots} a_{j-1} y \alpha_{2}^{\prime} \doteq y \beta^{\prime}.$$
Then clearly, \(E_i = E^{\prime }\), \(E_{0}, E_{k+1} \in S_{1}\), \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, and \(E_{j} \Rightarrow _{R} E_{j+1}\) for 0 ≤ j ≤ k as claimed. □

Claims 4.7.1 and 4.7.2 are sufficient to show that the equations/vertices in S1 are exactly those which survive in an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\). To state this more formally, we define a relation ◇ on the equations in S1 such that \(E^{\prime } \diamond E^{\prime \prime }\) if \(E^{\prime },E^{\prime \prime } \in S_{1}\) and either \(E^{\prime } \Rightarrow E^{\prime \prime }\), or there exist \(E_{1}, E_{2},\ldots ,E_{k} \in S_{2}\) and Z ∈{L, R} such that \(E^{\prime } \Rightarrow _Z E_{1}\Rightarrow _Z E_{2} \Rightarrow _Z {\ldots } \Rightarrow _Z E_k \Rightarrow _Z E^{\prime \prime }\). Then we get the following.

Claim 4.7.3

The graph \({\mathscr{G}}_{S_{1}}^{\diamond }\) is an isolated path compression of order |E| of \({\mathscr{G}}_{[E]}^{\Rightarrow }\).

Proof

Directly from Claims 4.7.1 and 4.7.2. □

It remains to show that \({\mathscr{G}}_{S_{1}}^{\diamond }\) is isomorphic to \({\mathscr{G}}_{[\hat {E}]}^{\Rightarrow }\) where \(\hat {E}\) is given by \(\pi _{{qv}(E)}(\alpha ) \doteq \pi _{{qv}(E)}(\beta )\). In other words, we must show that there is an isomorphism \(f : S_{1} \to [\hat {E}]_{\Rightarrow }\) such that for any \(E_{1},E_{2} \in S_{1}\), \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\). Before we can define f, we must firstly show that there exists \(\tilde {E} \in S_{1}\) given by \(\tilde {\alpha } \doteq \tilde {\beta }\) such that \(\pi _{{qv}(\tilde {E})}(\tilde {\alpha }) = \pi _{{qv}(E)}(\alpha )\) and \(\pi _{{qv}(\tilde {E})}(\tilde {\beta }) = \pi _{{qv}(E)}(\beta )\). If \(E \in S_{1}\) then we may simply take \(\tilde {E} = E\). Otherwise, \(E \in S_{2}\), meaning exactly one of α[1],β[1] is in qv(E). W.l.o.g. suppose that α[1]∉qv(E). Then we may write α = γxα1yα2 and β = yβ1 where \(\gamma \in ((X\backslash {qv}(E)) \cup {{\varSigma }})^{+}\), \(x, y \in {qv}(E)\), and \(\alpha _{1}, \alpha _{2}, \beta _{1} \in (X\cup {{\varSigma }})^{*}\). Furthermore, we have \(E \Rightarrow _R^{*} \tilde {E}\) where \(\tilde {E} \in S_{1}\) is given by \(x \alpha _{1} \gamma y \alpha _{2} \doteq y \beta _{1}\), in which case we have that \(\pi _{{qv}(E)}(\alpha ) = \pi _{{qv}(\tilde {E})}(x \alpha _{1} \gamma y \alpha _{2})\) and \(\pi _{{qv}(E)}(\beta ) = \pi _{{qv}(\tilde {E})}(y\beta _{1})\) (note that we have that \({qv}(E) = {qv}(\tilde {E})\) since \(\tilde {E} \in [E]_{\Rightarrow }\)).

Since \(\tilde {E} \in S_{1}\), we may write \(\tilde {E}\) as
$$y_{1} \gamma_{1} y_{2} \gamma_{2} {\ldots} y_{n} \gamma_{n} \doteq y^{\prime}_{1} \delta_{1} y^{\prime}_{2} \delta_{2} {\ldots} y^{\prime}_{n} \delta_{n}$$
where \(y_i,y^{\prime }_i \in {qv}(\tilde {E})\) and \(\gamma _i, \delta _i \in ((X\backslash {qv}(\tilde {E})) \cup {{\varSigma }})^{*}\) for 1 ≤ i ≤ n. Consequently, by our assumptions about \(\tilde {E}\), it follows that \(\hat {E}\) may be written as \(y_{1}y_{2}{\ldots } y_n \doteq y^{\prime }_{1} y^{\prime }_{2}{\ldots } y^{\prime }_n\). With this information, we are now ready to define our isomorphism \(f : S_{1} \to [\hat {E}]_{\Rightarrow }\) via two morphisms σLHS and σRHS. In particular, let \(\sigma _{LHS} : {qv}(\tilde {E})^{*} \to (X\cup {{\varSigma }})^{*}\) be the morphism such that σLHS(yi) = yiγi for 1 ≤ i ≤ n and \(\sigma _{RHS} : {qv}(\tilde {E})^{*} \to (X\cup {{\varSigma }})^{*}\) be the morphism such that \(\sigma _{RHS}(y^{\prime }_j) = y^{\prime }_j \delta _j\) for 1 ≤ j ≤ n. Then we define f such that \(f(\alpha ^{\prime } \doteq \beta ^{\prime })\) is \(\sigma _{LHS}(\alpha ^{\prime }) \doteq \sigma _{RHS}(\beta ^{\prime })\) for all \(\alpha ^{\prime } \doteq \beta ^{\prime } \in S_{1}\). In order to show that f is indeed an isomorphism with the desired property that \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\), we need the following claim.

Claim 4.7.4

Let \(\hat {\alpha _{1}},\hat {\alpha _{2}},\hat {\beta _{1}},\hat {\beta _{2}} \in {qv}(E)^{*}\) such that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \in [\hat {E}]_{\Rightarrow }\), and \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \in [E]_{\Rightarrow }\). Then \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\) if and only if \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \diamond \sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\).

Proof

Suppose firstly that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\), and w.l.o.g. suppose that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow _L \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\). Then there exist \(z_{1}, z_{2},\ldots ,z_{n} \in {qv}(E)\), \(\mu \in {qv}(E)^{*}\) such that \(\hat {\alpha _{1}} = z_i \mu \) for some i, 1 ≤ i ≤ n, \(\hat {\beta _{1}} = z_{1}z_{2} {\ldots } z_n\), \(\hat {\alpha _{2}} = \hat {\alpha _{1}}\) and \(\hat {\beta _{2}} = z_{2}{\ldots } z_{i-1} z_{1} z_i {\ldots } z_n\). Let \(a_{1}, a_{2},\ldots ,a_{k} \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) such that \(\sigma _{RHS}(z_{1}) = z_{1}a_{1}a_{2}{\ldots } a_{k}\). Let E0 be given by \(\sigma _{LHS}(z_i \mu ) \doteq \sigma _{RHS}(z_{1}z_{2}{\ldots } z_n)\), and for 1 ≤ j ≤ k, let Ej be given by \(\sigma _{LHS}(z_i \mu ) \doteq a_j a_{j+1} {\ldots } a_k \sigma _{RHS}(z_{2} {\ldots } z_{i-1}) z_{1} a_{1} a_{2} {\ldots } a_{j-1} \sigma _{RHS}(z_i {\ldots } z_n)\), and let Ek+1 be given by \(\sigma _{LHS}(z_i \mu ) \doteq \sigma _{RHS}(z_{2} {\ldots } z_{i-1}) z_{1} a_{1} a_{2} {\ldots } a_k \sigma _{RHS}(z_i {\ldots } z_n)\). Then we have \(E_{0} \Rightarrow _{L} E_{1} \Rightarrow _{L} {\ldots } \Rightarrow _{L} E_{k+1}\). Moreover, we have that \(E_{0} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}})\), \(E_{k+1} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\), and \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k, so \(E_{0} \diamond E_{k+1}\) as required.

Now suppose that \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}}) \diamond \sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\). Then by the definition of ◇, there exist \(E_{0}, E_{1},\ldots ,E_{k+1} \in [E]_{\Rightarrow }\) such that \(E_{0} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{1}}) \doteq \sigma _{RHS}(\hat {\beta _{1}})\), \(E_{k+1} \in S_{1}\) is given by \(\sigma _{LHS}(\hat {\alpha _{2}}) \doteq \sigma _{RHS}(\hat {\beta _{2}})\), \(E_{0} \Rightarrow _{Z} E_{1} \Rightarrow _{Z} {\ldots } \Rightarrow _{Z} E_{k+1}\) for some Z ∈{L, R}, and \(E_{j} \in S_{2}\) for 1 ≤ j ≤ k.

W.l.o.g. suppose that Z = L. Then there exist \(z_{1}, z_{2},\ldots ,z_{n} \in {qv}(E)\), \(\mu \in {qv}(E)^{*}\), and \(a_{1}, a_{2},\ldots ,a_{\ell } \in (X\backslash {qv}(E)) \cup {{\varSigma }}\) such that \(\hat {\alpha _{1}} = z_i \mu \) for some i, 1 ≤ i ≤ n, \(\hat {\beta _{1}} = z_{1} z_{2}{\ldots } z_n\), and \(\sigma _{RHS}(z_{1}) = z_{1}a_{1}a_{2}{\ldots } a_{\ell }\). Hence E0 can be written as
$$\sigma_{LHS}(z_{i} \mu) \doteq z_{1} a_{1} a_{2} {\ldots} a_{\ell} \sigma_{RHS}(z_{2} z_{3} {\ldots} z_{n}).$$
Moreover, we have that \(E_0 \Rightarrow _L E_{1}^{\prime } \Rightarrow _L E_{2}^{\prime } \Rightarrow _L {\ldots } \Rightarrow _L E_{\ell }^{\prime } \Rightarrow _L E_{\ell +1}^{\prime }\) where \(E_j^{\prime }\) is given by
$$\sigma_{LHS}(z_{i} \mu) \doteq a_{j} a_{j+1} {\ldots} a_{\ell} \sigma_{RHS}(z_{2} {\ldots} z_{i-1}) z_{1} a_{1} {\ldots} a_{j-1} \sigma_{RHS}(z_{i}z_{i+1} {\ldots} z_{n})$$
for \(1 \leq j \leq \ell \), and \(E_{\ell + 1}^{\prime }\) is given by
$$\sigma_{LHS}(z_{i} \mu) \doteq \sigma_{RHS}(z_{2} {\ldots} z_{i-1}) z_{1} a_{1} {\ldots} a_{\ell} \sigma_{RHS}(z_{i}z_{i+1} {\ldots} z_{n}).$$
Note that \(E_{\ell +1}^{\prime }\) may also be written
$$\sigma_{LHS}(z_{i}\mu) \doteq \sigma_{RHS}(z_{2}z_{3}{\ldots} z_{i-1} z_{1} z_{i} z_{i+1} {\ldots} z_{n}).$$
Now, since ⇒L is deterministic, and since \(E_{\ell +1}^{\prime }, E_{k+1} \in S_{1}\) while \(E_{j_{1}}^{\prime }, E_{j_{2}} \in S_{2}\) for each j1, 1 ≤ j1 ≤ ℓ, and j2, 1 ≤ j2 ≤ k, we must necessarily have that k = ℓ. Since σLHS and σRHS are injective, we must have \(\hat {\alpha _{2}} = \hat {\alpha _{1}}\) and \(\hat {\beta _{2}} = z_{2}z_{3}{\ldots } z_{i-1} z_{1} z_{i} z_{i+1} {\ldots } z_{n}\). It follows from the definitions that \(\hat {\alpha _{1}} \doteq \hat {\beta _{1}} \Rightarrow _L \hat {\alpha _{2}} \doteq \hat {\beta _{2}}\). □

It follows from Claim 4.7.4 by a simple induction with \(\tilde {E}\) as the base case that \(S_{1} = \{ \sigma _{LHS}(\hat {\alpha ^{\prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime }}) \mid \hat {\alpha ^{\prime }} \doteq \hat {\beta ^{\prime }} \in [\hat {E}]_{\Rightarrow } \}\), or equivalently that \(f(S_{1}) = [\hat {E}]_{\Rightarrow }\). The claim also states explicitly that \(\sigma _{LHS}(\hat {\alpha ^{\prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime }}) \diamond \sigma _{LHS}(\hat {\alpha ^{\prime \prime }}) \doteq \sigma _{RHS}(\hat {\beta ^{\prime \prime }})\) if and only if \(\hat {\alpha ^{\prime }} \doteq \hat {\beta ^{\prime }}\Rightarrow \hat {\alpha ^{\prime \prime }} \doteq \hat {\beta ^{\prime \prime }}\) and thus f is an isomorphism such that \(f(E_{1}) \Rightarrow f(E_{2})\) if and only if \(E_{1} \diamond E_{2}\) for all \(E_{1}, E_{2} \in S_{1}\). We may therefore conclude that \({\mathscr{G}}_{S_{1}}^{\diamond }\) is indeed isomorphic to \({\mathscr{G}}_{[\hat {E}]}^{\Rightarrow }\) as required. □

Combining Corollary 4.4 and Lemma 4.7, it is now possible to formulate the main result of this section, describing the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for arbitrary RWEs E in terms of graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for basic RWEs \(E^{\prime }\). An example of the theorem is given in Fig. 3.
Fig. 3

An example of Theorem 4.8. On the left is the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is given by \(x {\mathtt {a}} y z {\mathtt {a}} {\mathtt {b}} w \doteq y {\mathtt {a}} z {\mathtt {a}} x w\) with variables x, y, z, w and terminal symbols \({\mathtt {a}}, {\mathtt {b}}\). On the right is \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) for the corresponding basic equation \(E^{\prime }\), which in this case is given by \(xyz \doteq yzx\). The graph on the right is isomorphic to an isolated path compression of order 2 of the graph on the left. Vertices internal to the isolated paths (i.e. those which are removed by the compression) are shown in grey

Theorem 4.8

Let E be a RWE given by \(\alpha \doteq \beta \). Let \(\alpha ^{\prime },\beta ^{\prime }\) be the shortest non-empty prefixes of α, β respectively such that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\). Let \(E^{\prime }\) be the equation given by \(\pi _{{qv}(E)}(\alpha ^{\prime }) \doteq \pi _{{qv}(E)}(\beta ^{\prime })\). Then \(E^{\prime }\) is basic, and \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Proof

Let \(S = {qv}(\alpha ^{\prime } \doteq \beta ^{\prime })\). Firstly, we shall show that \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is indecomposable. Suppose for contradiction that \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is decomposable. Then there exist proper prefixes \(\alpha ^{\prime \prime }, \beta ^{\prime \prime }\) of \(\alpha ^{\prime }\) and \(\beta ^{\prime }\) respectively such that \({var}(\alpha ^{\prime \prime }) \cap S = {var}(\beta ^{\prime \prime }) \cap S\). Then \(\alpha ^{\prime \prime }\) and \(\beta ^{\prime \prime }\) are proper prefixes of α and β, and since they are shorter than \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), by our assumptions about \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), we cannot have that \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\). Consequently, either there exists \(x \in {var}(\alpha ^{\prime \prime }) \cap {qv}(E)\) such that \( x\notin {var}(\beta ^{\prime \prime }) \cap {qv}(E)\) or there exists \(x \in {var}(\beta ^{\prime \prime }) \cap {qv}(E)\) such that \(x \notin {var}(\alpha ^{\prime \prime }) \cap {qv}(E)\). W.l.o.g. suppose the former is true. Then \(x \notin {var}(\beta ^{\prime \prime })\), but since xqv(E), it follows from \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\) that \(x \in {var}(\beta ^{\prime })\). However, this implies that xS, and since \(x \in {var}(\alpha ^{\prime \prime })\) but \(x\notin {var}(\beta ^{\prime \prime })\), we arrive at a contradiction to our assumption that \({var}(\alpha ^{\prime \prime }) \cap S = {var}(\beta ^{\prime \prime }) \cap S\).

Now, let \(E^{\prime \prime }\) be the equation given by \(\pi _{S}(\alpha ^{\prime }) \doteq \pi _S(\beta ^{\prime })\). By the assumption that \({var}(\alpha ^{\prime }) \cap {qv}(E) = {var}(\beta ^{\prime }) \cap {qv}(E)\), there is no variable xqv(E)∖S occurring in \(\alpha ^{\prime }\) or \(\beta ^{\prime }\). Consequently, \(E^{\prime \prime } = E^{\prime }\), and by Lemma 4.7, we have that \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to an isolated path compression of order |E| of \({\mathscr{G}}^{\Rightarrow }_{[\alpha ^{\prime } \doteq \beta ^{\prime }]}\), which by Corollary 4.4 is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E]}\). □

5 A Useful Invariant

When reasoning about the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we need a way to help determine whether or not, for two equations E1, E2, we have \(E_{1} \Rightarrow ^{*} E_{2}\). Showing the positive case that \(E_{1} \Rightarrow ^{*} E_{2}\) can be achieved by simply finding an appropriate sequence of length-preserving Nielsen transformations from E1 to E2. However, showing that \(E_{1} \not \Rightarrow ^{*} E_{2}\) presents more of a challenge: the naive way would be to enumerate all vertices in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\) and show that E2 is not among them. However, this is not suitable for abstract reasoning, and, even in concrete cases, is inelegant and time-consuming.

The contribution of this section is a property of basic RWEs, defined as ΥE below, which is preserved under the relation ⇒ and thus provides a concise and more general means for showing that \(E_{1} \not \Rightarrow ^{*} E_{2}\). It is an indispensable component of the proofs of our main results.

Definition 5.1 (The invariant ΥE)

Let E be a basic RWE such that Card(var(E)) > 1. Let # be a new symbol not in X. Then we may write E as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) with x, y ∈ X and α1, α2, β1, β2 ∈ (X∖{x, y})∗. Let \(\mathcal {Z}_E = {var}(\alpha _{1}\alpha _{2}\beta _{1}\beta _{2}) \cup \{\#\}\). Let the function \(Q_E : \mathcal {Z}_E \to X^2\) be defined as follows: for each \(z \in \mathcal {Z}_E \backslash \{\#\}\), let QE(z) = (u, v) where uz is a factor of xα1yα2 and vz is a factor of yβ1xβ2. Let QE(#) = (u, v) where uy is a factor of xα1yα2 and vx is a factor of yβ1xβ2. Let \({\varUpsilon }_{\!E} = \{Q_E(z) \mid z \in \mathcal {Z}_E\}\). If Card(var(E)) ≤ 1, then ΥE = ∅.

Intuitively, given a basic RWE E of the form \(\alpha \doteq \beta \), we construct ΥE by taking, for each variable x ∈ var(E), the pair (u, v) of predecessors of x in E, i.e. such that ux is a factor of α and vx is a factor of β. It follows directly from the definition of basic RWEs that this pair is unique, and it exists whenever x is not the leftmost variable in either α or β. The special case that x is the leftmost variable of α or β is handled by the special symbol #. The following observations follow directly from the definitions, but are central to the use of ΥE in later proofs.

Remark 5.2

Let E be a basic regular word equation given by \(\alpha y \doteq \beta x\) with x, y ∈ X and α, β ∈ X∗. Then for each z ∈ var(α), there is exactly one element (u, v) ∈ΥE such that u = z. For each z ∉ var(α), there is no element (u, v) ∈ΥE such that u = z. Similarly, for each w ∈ var(β), there is exactly one element (u, v) ∈ΥE such that v = w and for each w ∉ var(β), there is no element (u, v) ∈ΥE such that v = w.

The usefulness of ΥE as a property of basic RWEs arises from the fact that it is invariant under the length-preserving Nielsen transformations. Consequently for a given basic RWE E, we can use the set \(\{E^{\prime } \mid {\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E}\}\) as an over-approximation of the set [E]⇒.

Theorem 5.3

Let E1, E2 be basic RWEs such that \(E_{1} \Rightarrow ^{*} E_{2}\). Then \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\).

Proof

It is sufficient to prove the same statement for the case that E1 ⇒ E2. W.l.o.g. we may assume that E1 ⇒L E2. The case that E1 ⇒R E2 is symmetric. Moreover, if E1 = E2, then the statement holds trivially, thus we may assume that E1 ≠ E2. The statement trivially holds for equations of the form \(xy \doteq yx\), since \([xy \doteq yx]_{\Rightarrow } = \{xy \doteq yx\}\). Otherwise, taking into account the fact that E1 and E2 are basic and therefore indecomposable, we have two cases: we may write E1 and E2 as either
  1. 1.

    \(x \alpha _{1} w \alpha _{2} y \alpha _3 \doteq y w \beta _{1} x \beta _{2}\) and \(x \alpha _{1} w \alpha _{2} y \alpha _3 \doteq w \beta _{1} y x \beta _{2}\), or

     
  2. 2.

    \(x \alpha _{1} y \alpha _{2} w \alpha _3 \doteq y w \beta _{1} x \beta _{2}\) and \(x \alpha _{1} y \alpha _{2} w \alpha _3 \doteq w \beta _{1} y x \beta _{2}\)

     
respectively, where w, x, y ∈ X with x ≠ y and α1, α2, α3, β1, β2 ∈ (X∖{x, y, w})∗ such that var(α1α2α3) = var(β1β2).

Suppose that we have the first case, then \(\mathcal {Z}_{E_{1}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#, w\}\) and \(\mathcal {Z}_{E_{2}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#,y\}\). Moreover, for each z ∈ var(α1α2α3), there exist u, v ∈ X such that uz (resp. vz) is a factor of the LHS (resp. RHS) of both E1 and E2, so \(Q_{E_{1}}(z) = Q_{E_{2}}(z)\). Now, let a, b, c be the rightmost variables in xα1, wα2 and wβ1 respectively (i.e. their length-1 suffixes). Then we have that \(Q_{E_{1}}(w) = (a,y)\), \(Q_{E_{1}}(\#) = (b,c)\), \(Q_{E_{2}}(y) = (b,c)\), and \(Q_{E_{2}}(\#) = (a,y)\). Thus \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\).

Now suppose instead that we have the second case. Similarly to the first case, we have that \(\mathcal {Z}_{E_{1}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#, w\}\), \(\mathcal {Z}_{E_{2}} = {var}(\alpha _{1}\alpha _{2}\alpha _3) \cup \{\#,y\}\) and for each z ∈ var(α1α2α3), \(Q_{E_{1}}(z) = Q_{E_{2}}(z)\). Now, let a, b, c be the rightmost variables in xα1, wβ1 and yα2 respectively. Then we have that \(Q_{E_{1}}(w) = (c,y)\), \(Q_{E_{1}}(\#) = (a,b)\), \(Q_{E_{2}}(y) = (a,b)\), and \(Q_{E_{2}}(\#) = (c,y)\). Thus \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) in both cases as required. □

As an example, let E1 be the basic RWE given by \(x u z w y \doteq y w u x z\). Then \(\mathcal {Z}_{E_{1}} = \{u,z,w,\#\}\) and \(Q_{E_{1}}\) is the function with \(Q_{E_{1}}(u) = (x,w)\), \(Q_{E_{1}}(z) = (u,x)\), \(Q_{E_{1}}(w) = (z,y)\) and \(Q_{E_{1}}(\#) = (w,u)\). Thus, \({\varUpsilon }_{\!E_{1}} = \{(w,u), (x,w), (u,x), (z,y)\}\). Similarly, if E2 is the basic RWE given by \(x u w z y \doteq y u x w z\), then \({\varUpsilon }_{\!E_{2}} = \{(x,y), (u,x), (w,w), (z,u)\}\). Consequently, we may conclude that \(E_{1} \not \Rightarrow ^{*} E_{2}\) (and symmetrically that \(E_{2} \not \Rightarrow ^{*} E_{1}\)).
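
Since Definition 5.1 only inspects, for each variable, its two immediate predecessors, ΥE can be computed mechanically in linear time. The following is an illustrative sketch (not part of the original development): it assumes each side of a basic RWE is given as a sequence of distinct single-character variables, and reproduces the two examples above.

```python
# Illustrative sketch: compute Upsilon_E of Definition 5.1 for a basic
# RWE whose sides are sequences of distinct variables. The symbol '#'
# of the definition needs no explicit representation here.

def upsilon(lhs, rhs):
    if len(set(lhs)) <= 1:                   # Card(var(E)) <= 1
        return set()
    pred_lhs = {lhs[i]: lhs[i - 1] for i in range(1, len(lhs))}
    pred_rhs = {rhs[i]: rhs[i - 1] for i in range(1, len(rhs))}
    pairs = set()
    for z in lhs:
        # z ranges over Z_E \ {#}: variables preceded on both sides
        if z in pred_lhs and z in pred_rhs:
            pairs.add((pred_lhs[z], pred_rhs[z]))
    # Q_E(#): the predecessor of rhs[0] on the LHS, paired with the
    # predecessor of lhs[0] on the RHS
    pairs.add((pred_lhs[rhs[0]], pred_rhs[lhs[0]]))
    return pairs

print(upsilon("xuzwy", "ywuxz"))  # {('w','u'), ('x','w'), ('u','x'), ('z','y')}
print(upsilon("xuwzy", "yuxwz"))  # {('x','y'), ('u','x'), ('w','w'), ('z','u')}
```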

Since the invariant ΥE provides a necessary condition on when two basic RWEs belong to the same equivalence class under ⇒, we might also ask whether it is sufficient, and hence characteristic. However, this is not the case. For instance, if E3 is given by \(x uvw y \doteq y wvu x\) and E4 is given by \(x wvu y \doteq y uvw x\), then \({\varUpsilon }_{\!E_3} = {\varUpsilon }_{\!E_4} = \{(x,v),(u,w),(v,y),(w,u)\}\) but it can be verified (e.g. by enumerating [E3]⇒ and [E4]⇒) that \(E_3 \not \Rightarrow ^{*} E_4\).

6 Jumbled Equations and a Special Case of Symmetry

The invariant property ΥE introduced in Section 5 consists of pairs of variables. The case that (x, x) ∈ΥE for some x ∈ var(E) is special in the sense that it leads to a particular repetitive structure in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\), described in the current section. We shall call basic RWEs E for which no pair of the form (x, x) occurs in ΥE jumbled.

Definition 6.1 (Jumbled Equations and Δ(E))

Let E be a basic RWE and let Δ(E) = {x ∈ var(E)∣(x, x) ∈ΥE}. If Card(Δ(E)) = 0, then E is jumbled.

For example, if we consider the equation E given by \(xyzw \doteq wyzx\), then ΥE = {(x, w),(y, y),(z, z)} so Δ(E) = {y, z} and E is not jumbled. On the other hand, for \(E^{\prime }\) given by \(x y z w \doteq w z y x\), we have \({\varUpsilon }_{\!E^{\prime }} = \{ (x,z), (y,w), (z,y) \}\), so \({{\varDelta }}(E^{\prime }) = \emptyset \) and \(E^{\prime }\) is jumbled.
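
Reusing the upsilon sketch above, Δ(E) and the jumbled property are immediate to check; the snippet below (again purely illustrative) reproduces the two examples just given.

```python
# Delta(E) collects the variables x with (x, x) in Upsilon_E
# (Definition 6.1); E is jumbled exactly when Delta(E) is empty.

def delta(lhs, rhs):
    return {u for (u, v) in upsilon(lhs, rhs) if u == v}

def is_jumbled(lhs, rhs):
    return not delta(lhs, rhs)

print(delta("xyzw", "wyzx"))        # {'y', 'z'}: E is not jumbled
print(is_jumbled("xyzw", "wzyx"))   # True: Delta(E') is empty
```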

Note that since ΥE is invariant under ⇒, so is the property of being jumbled. Furthermore, it follows from the definitions that (x, x) ∈ΥE for some basic RWE E and x ∈ X if and only if there exists y ∈ X such that one of the following holds:
  1. 1.

    xy occurs as a factor of both the LHS and RHS of E, or

     
  2. 2.

    there exists \(E^{\prime }\) with \(E \Rightarrow E^{\prime }\) such that xy occurs as a factor of both the LHS and RHS of \(E^{\prime }\).

     
The cardinality of Δ(E) can be interpreted as a measure of the similarity of the two sides of the equation. If Card(Δ(E)) is large in comparison to Card(var(E)), then the orders in which the variables occur on the LHS and RHS of E will be similar. On the other hand, when Δ(E) = ∅, there will be no common order in the variables on each side, and hence the equation is ‘jumbled’. In general, we may bound Card(Δ(E)) as follows.

Remark 6.2

Let E be a basic RWE. It follows directly from Definition 5.1 that if Card(var(E)) < 2, then Card(Δ(E)) = 0. Otherwise, E can be written as \(\alpha x \doteq \beta y\) for some x, y ∈ X, α ∈ (X∖{x})∗ and β ∈ (X∖{y})∗. Since E is basic, it is indecomposable, so we may additionally conclude that x ≠ y. By Remark 5.2, neither (x, x) nor (y, y) can be contained in ΥE, so we must have Card(Δ(E)) ≤ Card(var(E)) − 2.

The rest of this section is devoted to describing the structure of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the general case in terms of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) where \(E^{\prime }\) is jumbled. The first step is to notice that we can easily transform any basic RWE E into one which is jumbled by simply removing all variables x such that (x, x) ∈ ΥE.

Lemma 6.3

Let E be a basic RWE given by \(\alpha \doteq \beta \) and let Y = var(E)∖Δ(E). Then the equation EY given by \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\) is a jumbled basic RWE.

Proof

If Δ(E) = ∅, then the lemma holds trivially. Assume that Δ(E) ≠ ∅. We shall prove the following statement, from which the lemma follows by a simple induction.

Claim 6.3.1

Suppose that E is a basic RWE given by \(\alpha \doteq \beta \), and that x ∈ Δ(E). Let \(E^{\prime }\) be the equation \(\pi _{{var}(E)\backslash \{x\}}(\alpha ) \doteq \pi _{{var}(E)\backslash \{x\}}(\beta )\). Then \(E^{\prime }\) is a basic RWE and \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E} \backslash \{ (x,x) \}\).

Proof

Let \(Q_E, \mathcal {Z}_E\) be defined as per Definition 5.1. We shall consider two cases depending on whether QE(#) = (x, x). Suppose firstly that QE(#) ≠ (x, x). Then there exist α1, α2, β1, β2 such that α = α1xyα2, β = β1xyβ2, πvar(E)∖{x}(α) = α1yα2 and πvar(E)∖{x}(β) = β1yβ2. Suppose for contradiction that \(E^{\prime }\) is not basic. Clearly both sides of \(E^{\prime }\) belong to \({qv}(E^{\prime })^{*}\), so we may infer that \(E^{\prime }\) is decomposable, and thus that there exist proper prefixes \(\alpha ^{\prime }\) and \(\beta ^{\prime }\) of α1yα2 and β1yβ2 respectively such that \({var}(\alpha ^{\prime }) \cap {qv}(E^{\prime }) = {var}(\beta ^{\prime }) \cap {qv}(E^{\prime })\). Clearly, either y occurs in both \(\alpha ^{\prime }\) and \(\beta ^{\prime }\), or in neither. Let \({\tau } : {var}(E^{\prime })^{*} \to {var}(E)^{*}\) be the morphism such that τ(y) = xy and τ(z) = z for \(z \in {var}(E^{\prime }) \backslash \{y\}\). Then \(\alpha ^{\prime \prime } = {\tau }(\alpha ^{\prime })\) and \(\beta ^{\prime \prime } = {\tau }(\beta ^{\prime })\) are proper prefixes of α and β respectively which satisfy \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\). Thus E is decomposable and therefore not basic, a contradiction.

To see that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\), suppose firstly that x is not a prefix of α or β, and thus that α1 ≠ ε and β1 ≠ ε. Then \( \mathcal {Z}_{E} = ({var}(E) \backslash \{ \alpha _{1}[1], \beta _{1}[1]\}) \cup \{\#\} \), and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{ \alpha _{1}[1], \beta _{1}[1], x\} ) \cup \{\#\}\). It follows from the definitions that \(Q_{E^{\prime }}(y) = Q_{E}(x) = (\alpha _{1}[|\alpha _{1}|], \beta _{1}[|\beta _{1}|])\). Since α1, β1 ≠ ε, we have α1[1]∉{x, y} and β1[1]∉{x, y}. Consequently there exist u#, v# ∈ var(E)∖{x} such that u#α1[1] is a factor of both α and πvar(E)∖{x}(α) and such that v#β1[1] is a factor of both β and πvar(E)∖{x}(β). It follows that \(Q_E(\#) = Q_{E^{\prime }}(\#) = (u_\#,v_\#)\). Likewise, for any z∉{x, y, α1[1],β1[1]}, there exist u, v ∈ var(E)∖{x} such that uz is a factor of both α and πvar(E)∖{x}(α) and such that vz is a factor of both β and πvar(E)∖{x}(β). It follows that \(Q_E(z) = Q_{E^{\prime }}(z) = (u,v)\). Thus we may conclude that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\).

Next, suppose that α1 = ε and β1 ≠ ε (the case that β1 = ε and α1 ≠ ε is symmetric). Then \(\mathcal {Z}_{E} = ({var}(E) \backslash \{x,\beta _{1}[1] \}) \cup \{\#\}\) and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{y,x,\beta _{1}[1] \}) \cup \{\#\}\). Then QE(#) = (u#, β1[|β1|]) where u#β1[1] is a factor of xyα2. Since E is regular, each variable occurs once per side, so we may infer that β1[1] ≠ y, and hence that u# ≠ x. It follows that u#β1[1] is also a factor of yα2, so we may further conclude that \(Q_{E^{\prime }}(\#) = (u_\#,\beta _{1}[|\beta _{1}|]) = Q_E(\#)\). Note that QE(y) = (x, x). Let z ∈ var(E)∖{x, y, β1[1]}. Then there exist u, v ∈ var(E)∖{x} such that uz is a factor of both xyα2 and yα2, and such that vz is a factor of both β1xyβ2 and β1yβ2. It follows that \(Q_E(z) = Q_{E^{\prime }}(z) = (u,v)\). Again we have \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E}\backslash \{(x,x)\}\). Finally, note that if α1 = β1 = ε, then E is decomposable, which is a contradiction to the assumption that E is basic.

It remains to consider the case that QE(#) = (x, x). This implies that there exist u, v ∈ var(E)∖{x} and α1, α2, β1, β2 ∈ var(E)∗ such that α = uα1xvα2 and β = vβ1xuβ2, meaning \(E^{\prime }\) is given by \(u \alpha _{1} v \alpha _{2} \doteq v \beta _{1} u \beta _{2}\). Suppose for contradiction that \(E^{\prime }\) is not basic. Then as in the previous case, it must be decomposable, and there exist proper prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of uα1vα2 and vβ1uβ2 respectively which satisfy \({var}(\alpha ^{\prime }) \cap {qv}(E^{\prime }) = {var}(\beta ^{\prime }) \cap {qv}(E^{\prime })\). Then we must have that \(\alpha ^{\prime } = u \alpha _{1} v \alpha _3\) and \(\beta ^{\prime } = v \beta _{1} u \beta _3\) for some α3, β3 ∈ X∗. However, it follows that \(\alpha ^{\prime \prime } = u \alpha _{1} x v \alpha _3\) and \(\beta ^{\prime \prime } = v \beta _{1} x u \beta _3\) are proper prefixes of α and β satisfying \({var}(\alpha ^{\prime \prime }) \cap {qv}(E) = {var}(\beta ^{\prime \prime }) \cap {qv}(E)\), so E is decomposable which is a contradiction to the assumption that E is basic.

To see that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \backslash \{(x,x)\}\), note that in this case \(\mathcal {Z}_E = ({var}(E) \backslash \{u,v\}) \cup \{\#\}\) and \(\mathcal {Z}_{E^{\prime }} = ({var}(E) \backslash \{u,v,x\}) \cup \{\#\}\). It follows from the definitions that \(Q_{E^{\prime }}(\#) = Q_{E}(x) = (w_{1},w_{2})\), where w1 is the rightmost variable in uα1 and w2 is the rightmost variable in vβ1. Moreover, for any z ∈ var(E)∖{u, v, x}, there exist \(w_{1}^{\prime },w_{2}^{\prime } \in {var}(E)\backslash \{x\}\) such that \(w_{1}^{\prime }z\) is a factor of both uα1xvα2 and uα1vα2, and such that \(w_{2}^{\prime }z\) is a factor of both vβ1xuβ2 and vβ1uβ2, meaning that \(Q_{E^{\prime }}(z) = Q_{E}(z) = (w_{1}^{\prime },w_{2}^{\prime })\). It follows that \({\varUpsilon }_{E^{\prime }} = {\varUpsilon }_{E}\backslash \{(x,x)\}\) as required. □

We conclude the proof by noting that if Δ(E) = {x1, x2,…,xk}, then there exist equations Ei for 0 ≤ i ≤ k given by \(\alpha _i \doteq \beta _i\) such that
  1. 1.

    E0 = E and Ek = EY, and

     
  2. 2.

for 1 ≤ i ≤ k, \(\alpha _i = \pi _{{var}(E_{i-1}) \backslash \{x_i\}}(\alpha _{i-1})\) and \(\beta _i = \pi _{{var}(E_{i-1}) \backslash \{x_i\}}(\beta _{i-1})\).

     
Since E is basic, it follows by Claim 6.3.1 that Ei is basic for 1 ≤ i ≤ k, and moreover by the same claim that \({\varUpsilon }_{\!E_Y} = {\varUpsilon }_{\!E} \backslash \{(x_i,x_i) \mid 1 \leq i \leq k\}\) meaning that EY is both basic and jumbled. □
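
The transformation behind Lemma 6.3 is simply the projection πY applied to both sides. A small illustrative sketch, building on the functions above, is as follows; variable names are single characters, so in the example (the equation of Fig. 5 below) y1,…,y4 are encoded as '1',…,'4' and x as 'a'.

```python
# Sketch of Lemma 6.3: delete the variables of Delta(E) from both
# sides, i.e. apply the projection pi_Y with Y = var(E) \ Delta(E).

def jumbled_projection(lhs, rhs):
    d = delta(lhs, rhs)
    project = lambda side: "".join(z for z in side if z not in d)
    return project(lhs), project(rhs)

# y1 x y2 y3 y4 =. y4 y3 x y2 y1 has Delta(E) = {x}; deleting x
# yields E_Y given by y1 y2 y3 y4 =. y4 y3 y2 y1.
print(jumbled_projection("1a234", "43a21"))   # ('1234', '4321')
```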

There is a strong relation between the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for a basic RWE E and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) where EY is the jumbled basic RWE obtained from E by deleting the variables in Δ(E). The relation is described formally in Theorem 6.8. Before presenting the theorem, it is useful to first introduce some additional notions. Essentially, \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is made up of approximate copies of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\). Each copy is a subgraph \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) which is associated with a certain morphism \({\varphi } : Y^{*} \to {var}(E)^{*}\) from a set ΦE defined below. Intuitively, φ can be seen as a way of assigning variables in Δ(E) to variables in Y = var(E)∖Δ(E).

Definition 6.4 (The set ΦE)

Let E be a basic RWE. Let Y = var(E)∖Δ(E). Let ΦE be the set of morphisms \({\varphi } : Y^{*} \to {var}(E)^{*}\) satisfying φ(y) ∈ Δ(E)∗y for all y ∈ Y, and \(\sum \limits _{y \in Y} |{\varphi }(y)|_x = 1\) for all x ∈ Δ(E).

The subgraphs \({\mathscr{H}}^E_{{\varphi }}\) are obtained by restricting \({\mathscr{G}}^{\Rightarrow }_{[E]}\) to subsets \(H^E_{{\varphi }}\) defined below. More precisely, \({\mathscr{H}}^E_{{\varphi }}\) consists of vertices \(H^E_{{\varphi }}\) and edges (E1, E2) whenever \(E_{1},E_{2} \in H^E_{\varphi }\) and E1 ⇒ E2 (i.e. whenever (E1, E2) is an edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\)). We shall say that \({\mathscr{H}}^E_{{\varphi }}\) is the subgraph of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) induced by \(H^E_{{\varphi }}\).

Definition 6.5 (\(V_{\varphi }^E, U_{\varphi }^E\) and \(H_{\varphi }^E\))

Let E be a basic RWE given by \(\alpha \doteq \beta \) and let Y = var(E)∖Δ(E). Let EY be the equation \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\). Let φ ∈ ΦE. Then we define the sets \(V_{\varphi }^E, U_{\varphi }^E\) and \(H_{\varphi }^E\) as follows:
  1. 1.

    \(V_{\varphi }^E = \{ {\varphi }(\hat {\alpha }) \doteq {\varphi }(\hat {\beta }) \mid \hat {\alpha } \doteq \hat {\beta } \in [E_Y]_{\Rightarrow }\}\),

     
  2. 2.

    \(H_{\varphi }^E = \{ E^{\prime } \mid \exists E^{\prime \prime } \in V_{\varphi }^E, Z \in \{L,R\}. E^{\prime \prime } \Rightarrow ^{*}_Z E^{\prime } \}\),

     
  3. 3.

    \(U_{\varphi }^E = H_{\varphi }^E \backslash V_{\varphi }^E\).

     

For each φ ∈ ΦE, the subgraph \({\mathscr{H}}_{\varphi }^E\) is an approximate copy of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) in the sense that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of \({\mathscr{H}}_{\varphi }^E\). The intuition behind the sets \(V_{\varphi }^E\) and \(U_{\varphi }^E\) is that they provide a decomposition of the set \(H_{\varphi }^E\) of vertices of \({\mathscr{H}}_{\varphi }^E\) into those which survive after the isolated path compression (\(V_{\varphi }^E\)) and those which are compressed/removed (\(U_{\varphi }^E\)). The underlying isomorphism is the function which maps equations \(\hat {\alpha } \doteq \hat {\beta } \in [E_Y]_{\Rightarrow }\) to \({\varphi }(\hat {\alpha }) \doteq {\varphi }(\hat {\beta })\).

The structure of each subgraph \({\mathscr{H}}^E_{\varphi }\) is therefore essentially the same as the structure of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\). In order to fully understand the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) however, we also need to know how the individual subgraphs are connected, or in other words, when two of the subgraphs \({\mathscr{H}}^E_{{\varphi }_{1}}, {\mathscr{H}}^E_{{\varphi }_{2}}\) share a common vertex. We shall later see (Lemma 6.14) that \({\mathscr{H}}^E_{{\varphi }_{1}}\) and \({\mathscr{H}}^E_{{\varphi }_{2}}\) share a vertex if and only if the corresponding morphisms φ1, φ2 satisfy a ‘closeness’ condition defined as follows. See Fig. 4 for a complete example of the resulting relation.
Fig. 4

A graph representing the closeness relation for morphisms in ΦE for a basic RWE E with var(E) = {x1, x2, x3, x4} and Δ(E) = {x3, x4}, meaning that Y = {x1, x2}. In this case, ΦE contains six morphisms, φi,1 ≤ i ≤ 6, which make up the vertices of the graph. Vertices connected by an edge are close in the sense of Definition 6.6

Definition 6.6 (Close morphisms φ1, φ2 ∈ ΦE)

Let E be a basic RWE and let Y = var(E)∖Δ(E). Let φ1, φ2 ∈ ΦE. Then φ1, φ2 are close if there exist y1, y2 ∈ Y with y1 ≠ y2 and γ1, γ2 ∈ Δ(E)∗ such that:
  1. 1.

    For all y ∈ Y ∖{y1, y2}, φ1(y) = φ2(y), and

     
  2. 2.

    φ1(y1) = γ1γ2y1, φ2(y1) = γ2y1, and φ2(y2) = γ1φ1(y2).

     

Informally, two morphisms φ1, φ2 ∈ ΦE are close if we can obtain one from the other by removing some prefix of the image of a variable y1 and appending it to the left of the image of another variable y2. For example, suppose that var(E) = {x1, x2, x3, x4, x5, x6} and Δ(E) = {x3, x4, x5, x6}, and consider the two morphisms \({\varphi }_{1},{\varphi }_{2} : \{x_{1},x_{2}\}^{*} \to \{x_{1},x_{2},x_3,x_4,x_5,x_6\}^{*}\) given by φ1(x1) = x4x3x5x1, φ1(x2) = x6x2, φ2(x1) = x5x1 and φ2(x2) = x4x3x6x2. Then φ1, φ2 both belong to ΦE and are close, since we can get one from the other simply by moving the prefix x4x3 from the image of x1 to the image of x2.
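
The closeness condition is a purely syntactic check on the images of the two morphisms. The following illustrative sketch represents a morphism φ ∈ ΦE as a dictionary mapping each y ∈ Y to the string φ(y) (a word over Δ(E) ending in y) and tests Definition 6.6 directly; note that with γ1 = ε the condition degenerates to φ1 = φ2.

```python
# Illustrative check of Definition 6.6. Variables are single characters;
# the example below encodes x1,...,x6 as '1',...,'6'.

def are_close(phi1, phi2):
    for y1 in phi1:
        for y2 in phi1:
            if y1 == y2:
                continue
            if any(phi1[y] != phi2[y] for y in phi1 if y not in (y1, y2)):
                continue
            # phi1(y1) = g1 g2 y1, phi2(y1) = g2 y1, phi2(y2) = g1 phi1(y2)
            if phi1[y1].endswith(phi2[y1]):
                g1 = phi1[y1][: len(phi1[y1]) - len(phi2[y1])]
                if phi2[y2] == g1 + phi1[y2]:
                    return True
    return False

phi1 = {"1": "4351", "2": "62"}   # phi1(x1) = x4 x3 x5 x1, phi1(x2) = x6 x2
phi2 = {"1": "51", "2": "4362"}   # phi2(x1) = x5 x1, phi2(x2) = x4 x3 x6 x2
print(are_close(phi1, phi2))      # True: move the prefix x4 x3 from x1 to x2
```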

The following lemma shows that even when φ1 and φ2 are not close, we can find a sequence of intermediate morphisms in ΦE starting with φ1 and ending with φ2, such that each morphism in the sequence and its successor are close, and such that this sequence is ‘short’. This will form the basis of our claim that the subgraphs \({\mathscr{H}}_{\varphi }^E\) which make up the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are well-connected, and in particular means that there is a (short) path in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) between any two of the subgraphs.

Lemma 6.7

Let E be a basic RWE and suppose that \({\varphi }^{\prime }, {\varphi }^{\prime \prime } \in {{\varPhi }}_E\) with \({\varphi }^{\prime } \not = {\varphi }^{\prime \prime }\). Then there exist k ≤ 4Card(Δ(E)) + 1 and φ1, φ2, φ3,…,φk ∈ ΦE such that \({\varphi }^{\prime } = {\varphi }_{1}\), \({\varphi }^{\prime \prime } = {\varphi }_k\), and φi, φi+1 are close for all i, 1 ≤ i < k.

Proof

Let Y = var(E)∖Δ(E). If Δ(E) = ∅, then ΦE contains only the identity morphism. Thus we may assume that Δ(E) ≠ ∅ and consequently by Remark 6.2 that Card(Y ) ≥ 2. Note the following claim.

Claim 6.7.1

Let φ1, φ2 ∈ ΦE, y1, y2 ∈ Y, z ∈ Δ(E) and γ1, γ2 ∈ Δ(E)∗ such that y1 ≠ y2 and
  1. 1.

    φ1(y1) = γ1zγ2y1, φ2(y1) = γ1γ2y1 and φ2(y2) = zφ1(y2), and

     
  2. 2.

    φ1(y) = φ2(y) for all y ∈ Y ∖{y1, y2}.

     
Then there exists φ3 ∈ ΦE such that φ1, φ3 are close, and φ3, φ2 are close.

Proof

Let φ3 be the morphism such that φ3(y1) = γ2y1, φ3(y2) = γ1zφ1(y2), and φ3(y) = φ1(y) for all y ∈ Y ∖{y1, y2}. Then it follows directly from the definitions that φ1, φ3 are close. Moreover, since φ2(y) = φ1(y) for all y ∈ Y ∖{y1, y2}, it also follows from the definitions that φ2, φ3 are close. □

Claim 6.7.1 shows us that with two successors in a sequence, we can ‘move’ any variable z ∈ Δ(E) from φ(y1) to the prefix of φ(y2) where y1, y2 ∈ Y with y1 ≠ y2 (leaving the rest of the morphism unchanged). Given any \({\varphi }^{\prime } \in {{\varPhi }}_E\) we can reach any other morphism \({\varphi }^{\prime \prime } \in {{\varPhi }}_E\) by moving each variable z ∈ Δ(E) twice in this manner according to the following strategy: firstly, we move each variable z ∈ Δ(E) to the prefix of the image of a variable y ∈ Y such that \(z \notin {var}({\varphi }^{\prime \prime }(y))\). Note that this is possible due to the assumption that Card(Y ) ≥ 2 and requires moving each variable in Δ(E) at most once. Then, we move the variables z ∈ Δ(E) back to the images of the ‘correct’ y ∈ Y in the appropriate order. For example, if \({\varphi }^{\prime \prime }(y) = z_{1}z_{2} {\ldots } z_n y\), then we would first move zn to the prefix of the image of y, then zn−1, and so on. Again this requires moving each variable at most once, and once we have done this for all variables, we will be left with exactly the morphism \({\varphi }^{\prime \prime }\). Overall we have moved each variable at most twice. Since each move requires two successors in the underlying sequence, we need at most 4Card(Δ(E)) successors in total and the statement of the lemma follows. □
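
The two-phase strategy in the proof above is effective and easy to simulate. The sketch below (illustrative only, with morphisms represented as in the previous snippet) produces the sequence of intermediate morphisms obtained by ‘moving’ one variable of Δ(E) at a time; by Claim 6.7.1, each such move can be realised by two consecutive close steps.

```python
# Illustrative simulation of the proof of Lemma 6.7. Each 'move' takes
# a variable z of Delta(E) out of phi(y_from) and prepends it to
# phi(y_to), as in Claim 6.7.1.

def move(phi, z, y_from, y_to):
    phi = dict(phi)
    phi[y_from] = phi[y_from].replace(z, "", 1)
    phi[y_to] = z + phi[y_to]
    return phi

def move_sequence(phi_src, phi_dst):
    seq, phi = [dict(phi_src)], dict(phi_src)
    ys = list(phi_src)
    # Phase 1: park every z currently sitting on its final home y on
    # some other y' with z not in phi_dst(y') (possible as Card(Y) >= 2).
    for y in ys:
        for z in phi[y][:-1]:
            if z in phi_dst[y]:
                park = next(u for u in ys if u != y and z not in phi_dst[u])
                phi = move(phi, z, y, park)
                seq.append(phi)
    # Phase 2: rebuild each phi_dst(y) by prepending its prefix
    # variables right-to-left, fetching each from wherever it sits.
    for y in ys:
        for z in reversed(phi_dst[y][:-1]):
            src = next(u for u in ys if z in phi[u][:-1])
            phi = move(phi, z, src, y)
            seq.append(phi)
    return seq

steps = move_sequence({"1": "4351", "2": "62"}, {"1": "51", "2": "4362"})
print(len(steps) - 1)   # 6 moves here, i.e. at most 2 * Card(Delta(E))
```

Each variable of Δ(E) is moved at most twice, so at most 2Card(Δ(E)) moves are used, matching the bound of at most 4Card(Δ(E)) close steps in the proof.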

We are now ready to give the full statement relating \({\mathscr{G}}^{\Rightarrow }_{[E]}\) and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) formally as follows. An example demonstrating the theorem is given in Fig. 5.
Fig. 5

Example illustrating Theorem 6.8. On the left is \({\mathscr{G}}_{[E]}^{\Rightarrow }\) for the equation E given by \(y_{1}xy_{2}y_3y_4 \doteq y_4y_3xy_{2}y_{1}\). Note that Δ(E) = {x}, so Y = {y1, y2, y3, y4} and EY is given by \(y_{1}y_{2}y_3y_4 \doteq y_4y_3y_{2}y_{1}\). The graph \({\mathscr{G}}_{[E_Y]}^{\Rightarrow }\) is shown on the top-right, where the equations in [EY]⇒ have been labelled A, B, C, D, E, F, G. The set ΦE contains four morphisms φi, 1 ≤ i ≤ 4, such that φi(yi) = xyi and φi(yj) = yj for j ≠ i. In this case, all morphisms in ΦE are close to each other so the closeness relation (depicted as the graph \({\mathscr{G}}^{\text {Close}}_{{{\varPhi }}_E}\) on the bottom-right) is a complete graph. The graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is comprised of four subgraphs \({\mathscr{H}}^E_{{\varphi }_i}\), 1 ≤ i ≤ 4. Each subgraph and morphism from ΦE is depicted with a distinct colour in the figure. For each Z ∈{A, B, C, D, E, F, G} given by \(\alpha _Z \doteq \beta _Z\), Zi denotes the equation \({\varphi }_i(\alpha _Z) \doteq {\varphi }_i(\beta _Z)\). Thus the set of vertices unique to the subgraph \({\mathscr{H}}^E_{{\varphi }_i}\) is given by \(V^E_{{\varphi }_i} = \{A_i,B_i,C_i,D_i,E_i,F_i,G_i\}\). The vertices shared between two subgraphs (i.e. those belonging to \(U^E_{{\varphi }_i}\)) are labelled u1, u2,…,u6. Since any two morphisms from ΦE are close, each pair of subgraphs have at least one vertex in common. Each subgraph can be made isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) by contracting the paths (dashed) passing through the shared vertices u1, u2,…,u6. For example, the subgraph \({\mathscr{H}}^E_{{\varphi }_{1}}\) containing the vertices A1, B1, C1, D1, E1, F1, G1, u1, u4, u5 can be made isomorphic to \({\mathscr{G}}_{[E_Y]}^{\Rightarrow }\) by contracting the paths (A1, u4, E1), (B1, u5, D1), and (C1, u1, C1) into single edges (A1, E1), (B1, D1) and (C1, C1).

Theorem 6.8

Let E be a basic RWE given by \(\alpha \doteq \beta \). Let Y = var(E)∖Δ(E). Let EY be the equation \(\pi _Y(\alpha ) \doteq \pi _Y(\beta )\). Let \(d = \max \limits \{1, {diam}({\mathscr{G}}^{\Rightarrow }_{[E_Y]})\}\). Then:
  1. 1.

    for each φ ∈ ΦE, \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) and \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of order Card(Δ(E)) of the subgraph \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) induced by \(H_{\varphi }^E\).

     
  2. 2.

    \({\mathscr{G}}^{\Rightarrow }_{[E]} = \bigcup \limits _{{\varphi } \in {{\varPhi }}_E}{\mathscr{H}}_{\varphi }^E\).

     
  3. 3.

    \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(d|E|^2)\).

     

Before we proceed with proving Theorem 6.8, it deserves a few further comments. Firstly, we note that since each morphism φ ∈ ΦE is clearly injective, the subsets \(V_{\varphi }^E\) of vertices of each subgraph \({\mathscr{H}}_{\varphi }^E\) are pairwise disjoint. Consequently, while the subgraphs \({\mathscr{H}}_{\varphi }^E\) do overlap (and it is precisely these overlaps which mean they are all connected), each one contains a unique copy of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\).

Secondly, note that the number of morphisms in the set ΦE will grow exponentially with respect to Card(Δ(E)). More precisely, we may assume some order Y = {y1, y2,…,yn} on the variables in Y and represent each morphism φ ∈ ΦE as a word φ(y1)φ(y2)…φ(yn). This representation is clearly unique to φ. Furthermore, a word over var(E) is a representation of this form for some φ ∈ ΦE if and only if each variable occurs exactly once, the variables yi occur in order from left to right, and yn occurs as a suffix. Thus, the number of morphisms in total is given by
$${\text{Card}}({{\varPhi}}_{E}) = \frac{({\text{Card}}({var}(E))-1)!}{({\text{Card}}({var}(E))-{\text{Card}}({{\varDelta}}(E))-1)!}.$$

Since each subgraph contains a subset of vertices not shared with any other, it follows that the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) will also be (at least) exponential in Card(Δ(E)). We shall see later in Section 9 that this is essentially the worst case for the size of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for RWEs E, with the largest graphs corresponding exactly to the case that Card(Δ(E)) is maximal. Nevertheless, it is worth pointing out that in the same case, the graph \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) will consist of a single vertex and two self-loops and thus \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) will be (at most) quadratic in |E|. This is significantly better than our upper bound in the general case.
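
The word representation just described also yields a direct way to enumerate ΦE and to confirm the counting formula on small instances. The sketch below is illustrative only and feasible for small variable sets (it filters all permutations); it recovers the six morphisms of Fig. 4.

```python
from itertools import permutations
from math import factorial

# Illustrative enumeration of Phi_E via the representation
# phi(y1) phi(y2) ... phi(yn): keep exactly the words in which the
# y's occur in order with y_n as a suffix, then split before each y.

def enumerate_phi(Y, D):
    found = []
    for word in permutations(list(Y) + list(D)):
        if [z for z in word if z in Y] == list(Y) and word[-1] == Y[-1]:
            phi, start = {}, 0
            for i, z in enumerate(word):
                if z in Y:                # each y closes the factor phi(y)
                    phi[z] = "".join(word[start : i + 1])
                    start = i + 1
            found.append(phi)
    return found

Y, D = "12", "34"                 # Fig. 4: Y = {x1, x2}, Delta(E) = {x3, x4}
phis = enumerate_phi(Y, D)
n, d = len(Y) + len(D), len(D)
assert len(phis) == factorial(n - 1) // factorial(n - d - 1) == 6
```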

Proof of Theorem 6.8

The rest of the section focuses on the proof of Theorem 6.8. The main technical content is presented in the following series of lemmas. Statement 1 is given by Lemmas 6.15 and 6.16, while Statements 2 and 3 are given by Lemmas 6.17 and 6.18 respectively. Throughout the remainder of this section, for a basic RWE E given by \(\alpha \doteq \beta \) and a morphism φ, we shall use the notation φ(E) as shorthand for \({\varphi }(\alpha ) \doteq {\varphi }(\beta )\). We begin by noting some properties of equations belonging to the sets \(H_{\varphi }^E\). The first deals with equations belonging to \(V_{\varphi }^E\) and follows directly from the definitions.

Fact 6.9

Let E be a basic RWE. Let Y = var(E)∖Δ(E), n = Card(Y ) and let EY = πY(E). Suppose that φ ∈ ΦE. Then \(E^{\prime } \in V_{\varphi }^E\) if and only if there exists a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and such that \(E^{\prime }\) can be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(n)}). $$

With a little additional reasoning, we can give a similar characterisation of equations contained in \(U_{\varphi }^E\).

Lemma 6.10

Let E be a basic RWE. Let Y = var(E)∖Δ(E), n = Card(Y ) and let EY = πY(E). Suppose that φ ∈ ΦE. Then \(E^{\prime } \in U_{\varphi }^E\) if and only if there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that one of the following holds:
  1. 1.
    \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and \(E^{\prime }\) may be written as:
    $$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
     
  2. 2.
    \( y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \doteq y_{1}y_{2}{\ldots } y_n \in [E_Y]_{\Rightarrow }\) and \(E^{\prime }\) may be written as:
    $$\delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}) \doteq {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n})$$
     
where σ(ι) = 1, δ1δ2 = φ(yσ(1)), and δ1, δ2 ≠ ε.

Proof

Suppose that \(E^{\prime }\) satisfies the conditions of the lemma. We shall consider the case that Statement 1 holds. The case that Statement 2 holds is symmetric. Then \(E^{\prime \prime } \Rightarrow _L^{*} E^{\prime }\) where \(E^{\prime \prime }\) is the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \overbrace{\delta_{1} \delta_{2} }^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Consequently, \(E^{\prime \prime } = {\varphi }(\hat {E})\) for some \(\hat {E} \in [E_Y]_{\Rightarrow }\), so \(E^{\prime \prime } \in V_{\varphi }^E\) and thus \(E^{\prime } \in H_{\varphi }^E\). Note however, that since \(E^{\prime }\) is a basic RWE, each variable occurs exactly once on each side of the equation. We may therefore conclude that δ1δ2 = φ(yσ(1)) is not a factor of the RHS of \(E^{\prime }\), and consequently, by Fact 6.9, \(E^{\prime } \notin V_{\varphi }^E\). Thus \(E^{\prime } \in U_{\varphi }^E\).

Now suppose instead that \(E^{\prime } \in U_{\varphi }^E\). Then there exists some \(E^{\prime \prime } \in V_{\varphi }^E\), \(k \in \mathbb {N}\) and Z ∈{L, R} such that \(E^{\prime \prime } \Rightarrow _Z^k E^{\prime }\). Suppose we choose \(E^{\prime \prime },Z\) and k such that k is minimal. Suppose additionally that Z = L. We shall show that Statement 1 of the lemma is satisfied. The case that Z = R is symmetric and results in Statement 2 being satisfied.

Since we have \(E^{\prime \prime } \in V_{\varphi }^E\), it follows from Fact 6.9 that there exists a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and such that \(E^{\prime \prime }\) can be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(n)}). $$
Let ℓ = |φ(yσ(1))| and let \(E^{\prime \prime \prime }\) be the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(\iota)}) \ldots {\varphi}(y_{\sigma(n)}) $$
where σ(ι) = 1. Then \(E^{\prime \prime } \Rightarrow _Z^{\ell } E^{\prime \prime \prime }\). However, since \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\) and
$$y_{1}y_{2}{\ldots} y_{n} \!\doteq\! y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)} \!\Rightarrow\! y_{1}y_{2}{\ldots} y_{n} \!\doteq\! y_{\sigma(2)} {\ldots} y_{\sigma(\iota-1)} y_{\sigma(1)} y_{\sigma(\iota)} {\ldots} y_{\sigma(n)}, $$
we may conclude that \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)} \in [E_Y]_{\Rightarrow }\). Thus, by Fact 6.9, \(E^{\prime \prime \prime } \in V_{\varphi }^E\). Consequently, since \(V_{\varphi }^E\) and \(U_{\varphi }^E\) are by definition disjoint, we must have that k ∉ {0, ℓ}. Moreover, by our assumption that k is minimal, we must have that k < ℓ (otherwise we could choose \(E^{\prime \prime \prime }\) in place of \(E^{\prime \prime }\) and get a smaller value of k). This directly implies that there exist δ1, δ2 ≠ ε with δ1δ2 = φ(yσ(1)) such that \(E^{\prime }\) may be written as
$$ {\varphi}(y_{1}){\varphi}(y_{2}){\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
and thus Statement 1 of the lemma is satisfied. The case that Z = R is symmetrical, leading instead to the satisfaction of Statement 2. □

We shall now focus on the claim that \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) for each φ ∈ ΦE. The first step is to show that for at least one φ ∈ ΦE, the equation φ(EY) is contained in [E]⇒.

Lemma 6.11

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Then there exists φ ∈ ΦE such that φ(EY) ∈ [E]⇒.

Proof

Note that if Δ(E) = ∅, then EY = E and ΦE contains only the identity morphism, so the lemma holds trivially. Suppose that Δ(E) ≠ ∅. By Remark 6.2, we may therefore assume that E is a basic RWE with at least two variables, so may write it as \( x \alpha _{1} u_{1} u_{2} {\ldots } u_n y \alpha _{2} \doteq y \beta _{1} u_{1} u_{2} {\ldots } u_n x \beta _{2}\) where x, y, u1, u2,…,un ∈ X are pairwise distinct variables and α1, α2, β1, β2 ∈ (var(E)∖{x, y, u1, u2,…,un})∗, and such that α1 and β1 do not share a common non-empty suffix. Then \(E\Rightarrow ^{*}_R E^{\prime }\) where \(E^{\prime }\) is given by \(u_{1}u_{2} {\ldots } u_n x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} u_{1} u_{2} {\ldots } u_n x \beta _{2}\).

Now, consider the function \(Q_{E^{\prime }}\) as defined in Definition 5.1. Note in particular that \(Q_{E^{\prime }}(\#) = (v,w)\) where v, w ∈ X are the length-1 suffixes of xα1 and yβ1, and hence v ≠ w. By Theorem 5.3, \({\varUpsilon }_{\!E} = {\varUpsilon }_{\!E^{\prime }}\) (and hence \({{\varDelta }}(E) = {{\varDelta }}(E^{\prime })\)). Thus, for every z ∈ Δ(E), there exists \(z^{\prime } \in {var}(E)\) such that \(Q_{E^{\prime }}(z^{\prime }) = (z,z)\), meaning that z occurs directly to the left of \(z^{\prime }\) on both the LHS and RHS of \(E^{\prime }\). It follows that each z ∈ Δ(E) has a unique ‘successor’ variable \(z^{\prime }\) occurring to the right of z on both sides of the equation, and therefore that there exists some morphism φ ∈ ΦE such that \(E^{\prime } = {\varphi }(\pi _Y(E^{\prime }))\). Finally, notice that \(u_i \in {{\varDelta }}(E^{\prime }) = {{\varDelta }}(E)\) for 1 ≤ i ≤ n, and consequently, \(\pi _Y(E^{\prime })= \pi _Y(E) = E_Y\). □

The following lemma shows a correspondence between edges in \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) and paths in the subgraphs \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) which start and end with vertices from \(V_{\varphi }^E\) and whose internal vertices (if there are any) belong to \(U_{\varphi }^E\).

Lemma 6.12

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let Z ∈{L, R} and suppose that \(E^{\prime },E^{\prime \prime } \in [E_Y]_{\Rightarrow }\) such that \(E^{\prime } \Rightarrow _Z E^{\prime \prime }\). Let φ ∈ ΦE. Then there exist k ≤ Card(Δ(E)) and E0, E1, E2,…,Ek+1 such that
  1. 1.

    \({\varphi }(E^{\prime }) = E_0\) and \({\varphi }(E^{\prime \prime }) = E_{k+1}\), and

     
  2. 2.

    \(E_i \in U^E_{\varphi }\) for 1 ≤ i ≤ k, and

     
  3. 3.

    E0 ⇒Z E1 ⇒Z E2 ⇒Z … ⇒Z Ek ⇒Z Ek+1.

     

Proof

Note that if Card(Y ) < 2, then [EY]⇒ is a singleton and the lemma holds trivially. We may therefore assume that Card(Y ) ≥ 2. Suppose that Z = R. The case that Z = L is symmetric. Then there exist x, y ∈ Y and α1, α2, β1, β2 ∈ Y∗ such that \(E^{\prime }\) may be written as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) and \(E^{\prime \prime }\) may be written as \(\alpha _{1} xy \alpha _{2} \doteq y \beta _{1} x \beta _{2}\). Let \(E_0 = {\varphi }(E^{\prime })\) and \(E_{k+1} = {\varphi }(E^{\prime \prime })\). If φ(x) = x, then E0 ⇒R Ek+1 so the lemma holds for k = 0. Suppose that φ(x) ≠ x.

Then there exists k, 1 ≤ k ≤ Card(Δ(E)) and z1, z2,…,zk ∈ Δ(E) such that φ(x) = z1z2…zkx. For each i, 1 ≤ i ≤ k, let Ei be the equation given by:
$$z_{i+1} {\ldots} z_{k} x {\varphi}(\alpha_{1}) z_{1} z_{2} {\ldots} z_{i} {\varphi}(y) {\varphi}(\alpha_{2}) \doteq {\varphi}(y) {\varphi}(\beta_{1} ) {\varphi}(x) {\varphi}(\beta_{2}).$$
Then it follows directly from Lemma 6.10 that \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k. Moreover,
$$ E_{0} \Rightarrow_{R} E_{1} \Rightarrow_{R} E_{2} \Rightarrow_{R} {\ldots} \Rightarrow_{R} E_{k} \Rightarrow_{R} E_{k+1} $$
as required. □

A straightforward induction on Lemma 6.12 allows us to conclude that if, for some φ ∈ ΦE, φ(EY) ∈ [E]⇒, then \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\). We have already shown (Lemma 6.11) that this is true for at least one choice of φ. The next step is to show that φ(EY) ∈ [E]⇒ for all φ ∈ ΦE, which we obtain as a consequence of Lemmas 6.7 and 6.14 below. Before proving Lemma 6.14, we need the following result, which we shall reuse later and is therefore stated separately.

Lemma 6.13

Let E be a basic RWE. Then there exist n1, n2 < |E|2 and \(\hat {E}\) such that \(E \Rightarrow ^{n_{1}} \hat {E}\) and \(\hat {E} \Rightarrow ^{n_{2}} E\) where \(\hat {E}\) can be written as \(x \alpha y \doteq y \beta x\) where x, y ∈ var(E) and α, β ∈ (var(E)∖{x, y})∗.

Proof

We shall prove the case that \(E \Rightarrow ^{n_{1}} \hat {E}\). The case that \(\hat {E} \Rightarrow ^{n_{2}} E\) is easily adapted. Recall that we may write any basic RWE as \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\) where x, y ∈ X and α1, α2, β1, β2 ∈ (X∖{x, y})∗. We have the following claim:

Claim 6.13.1

For every basic, regular equation E given by \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\), either α2 = β2 = ε, or there exists n < |E| and \(E^{\prime }\) such that \(E\Rightarrow ^n E^{\prime }\) and \(E^{\prime }\) may be written as \(x^{\prime } \alpha _{1}^{\prime } y^{\prime } \alpha _{2}^{\prime } \doteq y^{\prime } \beta _{1}^{\prime } x^{\prime } \beta _{2}^{\prime }\) where \(x^{\prime },y^{\prime } \in X\), \(\alpha _{1}^{\prime },\alpha _{2}^{\prime },\beta _{1}^{\prime },\beta _{2}^{\prime } \in (X \backslash \{x^{\prime },y^{\prime }\})^{*}\), and such that \(|\alpha _{1}^{\prime }|+ |\beta _{1}^{\prime }| > |\alpha _{1}| + |\beta _{1}|\).

Proof

Let E be given by \(x \alpha _{1} y \alpha _{2} \!\doteq \! y \beta _{1} x \beta _{2}\) where x, y ∈ var(E) and α1, α2, β1, β2 ∈ (var(E)∖{x, y})∗. We have two cases: either var(α1) = var(β1), in which case, due to the fact that E is basic and therefore indecomposable, we must have that α2 = β2 = ε, so the claim holds. Otherwise, there exists z ∈ (var(α1)∖var(β1)) ∪ (var(β1)∖var(α1)). W.l.o.g. suppose that z ∈ var(α1)∖var(β1). Then since E is regular, z ∈ var(β2) and we can write E as \(x \gamma _{1} z \gamma _{2} y \alpha _{2} \doteq y \beta _{1} x \delta _{1} z \delta _{2}\) where γ1, γ2, δ1, δ2 ∈ (var(E)∖{x, y, z})∗. Consequently, we have that \(E \Rightarrow _R^{*} E^{\prime }\) where \(E^{\prime }\) is given by \(z \gamma _{2} x \gamma _{1} y \alpha _{2} \doteq y \beta _{1} x \delta _{1} z \delta _{2}\). By Remark 3.2, we have that \(E\Rightarrow ^n E^{\prime }\) where n < |E|. Moreover, \(E^{\prime }\) clearly has the form described in the claim as witnessed by \(x^{\prime } = z\), \(y^{\prime } = y\), \(\alpha _{1}^{\prime } = \gamma _{2} x \gamma _{1}\), \(\alpha _{2}^{\prime } = \alpha _{2}\), \(\beta _{1}^{\prime } = \beta _{1} x \delta _{1}\) and \(\beta _{2}^{\prime } = \delta _{2}\). □

Since for any equation E of the form \(x \alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\), we must clearly have that |α1| + |β1| < |E|, it follows from a simple induction on Claim 6.13.1 that \(E \Rightarrow ^n \hat {E}\) for some n < |E|2 and \(\hat {E}\) of the form \(x^{\prime } \alpha y^{\prime } \doteq y^{\prime } \beta x^{\prime }\) as claimed. □

Lemma 6.14

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let φ1, φ2 ∈ ΦE. Then \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \) if and only if φ1, φ2 are close.

Proof

If Card(Y ) < 2, then ΦE consists of the identity morphism only so the statement holds trivially. Suppose that Card(Y ) ≥ 2. Suppose firstly that φ1, φ2 are close. Then there exist y1, y2 ∈ Y with y1 ≠ y2 and γ1, γ2 ∈ Δ(E)∗ such that φ1(y1) = γ1γ2y1, φ2(y1) = γ2y1, φ2(y2) = γ1φ1(y2), and for y ∈ Y ∖{y1, y2}, φ1(y) = φ2(y). In order to show that \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \), we need the following claim.

Claim 6.14.1

There exist \(\hat {E} \in [E_Y]_{\Rightarrow }\) and \(\hat {\alpha }_{1},\hat {\alpha }_{2},\hat {\beta }_{1},\hat {\beta }_{2} \in (Y \backslash \{y_{1},y_{2}\})^{*}\) such that \(\hat {E}\) can be written either as:
  1. 1.

    \(y_{1} \hat {\alpha }_{1} y_{2} \hat {\alpha _{2}} \doteq y_{2} \hat {\beta }_{1} y_{1} \hat {\beta _{2}}\), or

     
  2. 2.

    \(y_{2} \hat {\alpha }_{1} y_{1} \hat {\alpha _{2}} \doteq y_{1} \hat {\beta }_{1} y_{2} \hat {\beta _{2}}\)

     

Proof

By Lemma 6.13, there exists \(\hat {E}^{\prime } \in [E_Y]_{\Rightarrow }\) such that \(\hat {E}^{\prime }\) may be written as \(x \hat {\alpha } z \doteq z \hat {\beta } x\) where x, z ∈ Y, x ≠ z and \(\hat {\alpha },\hat {\beta } \in (Y \backslash \{x,z\})^{*}\). By Lemma 6.3, EY is basic, meaning that each variable in Y = var(EY) occurs exactly once on each side of EY. It follows by properties of ⇒ that each variable in Y also occurs exactly once in each of \(x \hat {\alpha } z\) and \(z \hat {\beta } x\). Hence there exist \(\hat {\alpha }^{\prime }, \hat {\alpha }^{\prime \prime } \in (Y \backslash \{y_{2}\})^{*}\) such that \(x \hat {\alpha } z = \hat {\alpha }^{\prime } y_{2} \hat {\alpha }^{\prime \prime }\) (and such that y1 occurs in either \(\hat {\alpha }^{\prime }\) or \(\hat {\alpha }^{\prime \prime })\).

Suppose w.l.o.g. that y1 occurs to the left of y2 in the RHS. We shall show that Statement 1 of the claim is satisfied. The case that y1 occurs to the right of y2 is symmetric and leads to Statement 2 being satisfied. Then there exist \(\hat {\beta }^{\prime },\hat {\beta }^{\prime \prime }, \hat {\beta }^{\prime \prime \prime } \in (Y\backslash \{y_{1},y_{2}\})^{*}\) such that \(z \hat {\beta } x = \hat {\beta }^{\prime } y_{1} \hat {\beta }^{\prime \prime } y_{2} \hat {\beta }^{\prime \prime \prime }\). Then we may write \(\hat {E}^{\prime }\) as
$$ \hat{\alpha}^{\prime} y_{2} \hat{\alpha}^{\prime\prime} \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}.$$
Note that z is a suffix of \(y_{2} \hat {\alpha }^{\prime \prime }\) and a prefix of \(\hat {\beta }^{\prime }y_{1}\). Since y2 does not occur in \(\hat {\beta }^{\prime } y_{1}\), we have y2 ≠ z. Consequently, we may write \(\hat {\alpha ^{\prime \prime }} = \hat {\alpha }^{\prime \prime \prime } z\) for some \(\hat {\alpha }^{\prime \prime \prime }\). Then
$$ \begin{array}{@{}rcl@{}} &&\overbrace{\hat{\alpha}^{\prime} y_{2} \hat{\alpha}^{\prime\prime\prime}z \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}}^{\hat{E}^{\prime}}\\ \Rightarrow_{R}^{*} && y_{2} \hat{\alpha}^{\prime\prime\prime} \hat{\alpha}^{\prime} z \doteq \hat{\beta}^{\prime} y_{1} \hat{\beta}^{\prime\prime} y_{2} \hat{\beta}^{\prime\prime\prime}\\ \Rightarrow_{L}^{*} && y_{2} \underbrace{\hat{\alpha}^{\prime\prime\prime} \hat{\alpha}^{\prime} z }_{\hat{\alpha}_{1} y_{1} \hat{\alpha}_{2}} \doteq y_{1} \underbrace{\hat{\beta}^{\prime\prime} \hat{\beta}^{\prime} }_{\hat{\beta}_{1}} y_{2} \underbrace{\hat{\beta}^{\prime\prime\prime}}_{\hat{\beta}_{2}} \end{array} $$

so \(y_{2} \hat {\alpha }^{\prime \prime \prime } \hat {\alpha }^{\prime } z \doteq y_{1} \hat {\beta }^{\prime \prime } \hat {\beta }^{\prime } y_{2} \hat {\beta }^{\prime \prime \prime } \in [E_Y]_{\Rightarrow }\). Since y1 occurs either in \(\hat {\alpha }^{\prime }\) or in \(\hat {\alpha }^{\prime \prime } = \hat {\alpha }^{\prime \prime \prime }z\), we may write \(\hat {\alpha }^{\prime \prime \prime } \hat {\alpha }^{\prime } z\) as \(\hat {\alpha }_{1} y_{1} \hat {\alpha }_{2}\) for some \(\hat {\alpha }_{1},\hat {\alpha }_{2} \in (Y\backslash \{y_{1},y_{2}\})^{*}\). Thus the first statement of the claim holds with \(\hat {\beta }_{1} = \hat {\beta }^{\prime \prime } \hat {\beta }^{\prime }\) and \(\hat {\beta }_{2} = \hat {\beta }^{\prime \prime \prime }\). □

Assume that the first statement of Claim 6.14.1 holds. The case that the second statement holds is symmetric. Then there exists \(\hat {E} \in [E_Y]_{\Rightarrow }\) such that \(\hat {E}\) has the form \(y_{1} \hat {\alpha }_{1} y_{2} \hat {\alpha _{2}} \doteq y_{2} \hat {\beta }_{1} y_{1} \hat {\beta _{2}}\), for some \( \hat {\alpha }_{1}, \hat {\alpha }_{2},\hat {\beta }_{1},\hat {\beta }_{2} \in (Y\backslash \{y_{1},y_{2}\})^{*}\). Let EINT be the equation given by
$$\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})$$
and notice that
$$ \begin{array}{@{}rcl@{}} &&\overbrace{{\varphi}_{1}(y_{1}){\varphi}_{1}(\hat{\alpha}_{1}) {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2})\doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}){\varphi}_{1}(y_{1}){\varphi}_{1}(\hat{\beta}_{2})}^{{\varphi}_{1}(\hat{E})} \\ \Rightarrow_{R}^{*} && \underbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}_{E_{INT}}. \end{array} $$
Moreover, recall that φ2(y1) = γ2y1, φ2(y2) = γ1φ1(y2). Since \(\hat {\alpha }_{1},\hat {\alpha }_{2} \in (Y \backslash \{y_{1},y_{2}\})^{*}\), we also have \({\varphi }_{2}(\hat {\alpha }_{1}) = {\varphi }_{1}(\hat {\alpha }_{1})\) and \({\varphi }_{2}(\hat {\alpha }_{2}) = {\varphi }_{1}(\hat {\alpha }_{2})\). Consequently
$$ \begin{array}{@{}rcl@{}} & &\overbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq \gamma_{1} {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}^{{\varphi}_{2}(\hat{E})}\\ \Rightarrow_{L}^{*} & &\underbrace{\gamma_{2} y_{1} {\varphi}_{1}(\hat{\alpha}_{1}) \gamma_{1} {\varphi}_{1}(y_{2}) {\varphi}_{1}(\hat{\alpha}_{2}) \doteq {\varphi}_{1}(y_{2}){\varphi}_{1}(\hat{\beta}_{1}) \gamma_{1} \gamma_{2} y_{1} {\varphi}_{1}(\hat{\beta}_{2})}_{E_{INT}}. \end{array} $$

Since \(\hat {E} \in [E_Y]_{\Rightarrow }\), by definition \({\varphi }_{1}(\hat {E}) \in V^E_{{\varphi }_{1}}\) and \({\varphi }_{2}(\hat {E}) \in V^E_{{\varphi }_{2}}\). Thus it follows that \(E_{INT} \in U^E_{{\varphi }_{1}} \cap U^E_{{\varphi }_{2}}\) and consequently \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \).

Now suppose instead that \(H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}} \not = \emptyset \). Let \(E_{INT} \in H^E_{{\varphi }_{1}} \cap H^E_{{\varphi }_{2}}\). If φ1 = φ2 then the statement holds trivially. Thus we assume that φ1φ2. Before we proceed, we need the following claim.

Claim 6.14.2

Let \({\varphi }^{\prime },{\varphi }^{\prime \prime } \in {{\varPhi }}_E\) and \(\mu ^{\prime }, \mu ^{\prime \prime } \in Y^{*}\) such that \(|\mu ^{\prime }|_y = |\mu ^{\prime \prime }|_y = 1\) for all y ∈ Y. If \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\), then \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\) and \(\mu ^{\prime } = \mu ^{\prime \prime }\).

Proof

Suppose that \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\). It follows from the definition of ΦE that for any φ ∈ ΦE, the morphism πY ∘ φ is the identity over Y∗. Thus \(\mu ^{\prime } = \pi _Y({\varphi }^{\prime }(\mu ^{\prime })) = \pi _Y({\varphi }^{\prime \prime }(\mu ^{\prime \prime })) = \mu ^{\prime \prime }\). Furthermore, for each y ∈ Y, we may uniquely reconstruct \({\varphi }^{\prime }(y)\) and \({\varphi }^{\prime \prime }(y)\) as the longest factors of the form Δ(E)∗y in \({\varphi }^{\prime }(\mu ^{\prime })\) and \({\varphi }^{\prime \prime }(\mu ^{\prime \prime })\) respectively. It follows from the definition of ΦE and the fact that \(|\mu ^{\prime }|_y, |\mu ^{\prime \prime }|_y = 1\) that these factors will exist and be unique. Thus, under the assumption that \({\varphi }^{\prime }(\mu ^{\prime }) = {\varphi }^{\prime \prime }(\mu ^{\prime \prime })\), it follows that \({\varphi }^{\prime }(y) = {\varphi }^{\prime \prime }(y)\) for all y ∈ Y and hence \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\). □

It follows from Fact 6.9 and Lemma 6.10 that for each i ∈{1,2}, there exists μi ∈ Y∗ with |μi|y = 1 for all y ∈ Y such that at least one of the LHS or RHS of EINT has the form φi(μi). By Claim 6.14.2, and since φ1 ≠ φ2, a single side of EINT cannot have the form φi(μi) for both i = 1 and i = 2. By Fact 6.9, this means that \(E_{INT} \notin V^E_{{\varphi }_{1}} \cup V^E_{{\varphi }_{2}}\) and consequently that \(E_{INT} \in U^E_{{\varphi }_{1}} \cap U^E_{{\varphi }_{2}}\). Thus, either Statement 1 or Statement 2 of Lemma 6.10 holds with φ = φ1 and \(E^{\prime } = E_{INT}\). W.l.o.g. suppose that the LHS of EINT has the form φ1(μ1) and the RHS of EINT has the form φ2(μ2). This corresponds to the case that Statement 1 of Lemma 6.10 holds, so there exist y1, y2,…,yn with Y = {y1, y2,…,yn} and a permutation σ : {1,2,…,n}→{1,2,…,n} such that EINT may be written as
$${\varphi}_{1}(y_{1}) {\varphi}_{1}(y_{2}) {\ldots} {\varphi}_{1}(y_{n}) \doteq \delta_{2} {\varphi}_{1}(y_{\sigma(2)}) {\ldots} {\varphi}_{1}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{1}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{1}(y_{\sigma(n)})$$
where δ1δ2 = φ1(yσ(1)) with δ1, δ2 ≠ ε and σ(ι) = 1. Note that by the definition of ΦE, the fact that δ2 ≠ ε implies that δ1 ∈ Δ(E)∗ and δ2 = δ3yσ(1) for some δ3 ∈ Δ(E)∗.

Recalling that the RHS of EINT has the form φ2(μ2), we may directly infer that μ2 = yσ(1)yσ(2)…yσ(n) and subsequently φ2(yσ(1)) = δ2, φ2(yσ(ι)) = δ1φ1(yσ(ι)), and φ2(y) = φ1(y) for all y ∉ {yσ(1), yσ(ι)}. Thus φ1 and φ2 are close as required. □

We are now able to prove that each set \(H_{\varphi }^E\) is in fact a subset of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and thus that the subgraphs \({\mathscr{H}}_{\varphi }^E\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are well-defined.

Lemma 6.15

Let E be a basic RWE. Then \(H_{\varphi }^E \subseteq [E]_{\Rightarrow }\) for each φ ∈ ΦE.

Proof

Let Y = var(E)∖Δ(E) and let EY = πY(E). By Lemma 6.11, there exists φ ∈ ΦE such that φ(EY) ∈ [E]⇒. Let \(\tilde {E} \in H^E_{{\varphi }^{\prime }}\) for some arbitrary \({\varphi }^{\prime } \in {{\varPhi }}_E\). By Lemma 6.7, there exist k ≤ 4Card(Δ(E)) + 1 and φ1, φ2,…,φk ∈ ΦE such that \({\varphi } = {\varphi }_{1}, {\varphi }^{\prime } = {\varphi }_k\), and for 1 ≤ i < k, φi and φi+1 are close. Thus, by Lemma 6.14, there exist E1, E2,…,Ek−1 such that \(E_i \in H^E_{{\varphi }_i} \cap H^E_{{\varphi }_{i+1}} \) for 1 ≤ i < k.

It follows from Lemma 6.12 that if \(E^{\prime },E^{\prime \prime } \in H^E_{{\varphi }_i}\) for some i, 1 ≤ i ≤ k, then \(E^{\prime } \Rightarrow ^{*} E^{\prime \prime }\). Thus, φ(EY) ⇒∗ E1, Ek−1 ⇒∗ \(\tilde {E}\), and for 1 ≤ i < k − 1, Ei ⇒∗ Ei+1. Consequently, \(\tilde {E} \in [E]_{\Rightarrow }\). Since this holds for all \(\tilde {E} \in H^E_{{\varphi }^{\prime }}\) for all \({\varphi }^{\prime } \in {{\varPhi }}_E\), the lemma follows. □

The following lemma completes the proof of Statement 1 of Theorem 6.8.

Lemma 6.16

Let E be a basic RWE. Let Y = var(E)∖Δ(E), let EY = πY(E), and let φ ∈ ΦE. Then \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\).

Proof

For k ≥ 0, we shall say that a sequence of equations E0, E1,…,Ek+ 1 is a U-path if \(E_0, E_{k+1} \in V^E_{\varphi }\), \(E_i \in U^E_{\varphi }\) for 1 ≤ i ≤ k, and there exists Z ∈{L, R} such that \(E_0 \Rightarrow _Z E_{1} \Rightarrow _Z E_{2} \Rightarrow _Z {\ldots } \Rightarrow _Z E_k \Rightarrow _Z E_{k+1}\). Let ◇ be the relation on \(V_{\varphi }^E\) such that \(E^{\prime } \diamond E^{\prime \prime }\) if and only if \(E^{\prime },E^{\prime \prime } \in V_{\varphi }^E\) and there exists a U-path starting with \(E^{\prime }\) and ending with \(E^{\prime \prime }\). We shall show firstly that the graph \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\), and secondly that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\).

Clearly, every U-path is a path in \({\mathscr{H}}^E_{\varphi }\). Moreover, it follows from the definition of \(H^E_{\varphi }\), along with the fact that \(\Rightarrow _Z^{*}\) is an equivalence relation for Z ∈{L, R}, that for every vertex \(E^{\prime } \in U^E_{\varphi }\), there exist \(E^{\prime \prime },E^{\prime \prime \prime } \in V^E_{\varphi }\) and Z ∈{L, R} such that \(E^{\prime \prime } \Rightarrow _Z^{*} E^{\prime }\) and \(E^{\prime } \Rightarrow _Z^{*} E^{\prime \prime \prime }\). Consequently, every vertex in \({\mathscr{H}}^E_{\varphi }\) either belongs to \(V^E_{\varphi }\) or is the internal vertex of some U-path. It follows as a direct consequence of the following claim that the U-path containing a given vertex in \(U_{\varphi }^E\) is unique, and therefore that no two distinct U-paths share an internal vertex. Thus \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order k of \({\mathscr{H}}_{\varphi }^E\) where k is the number of internal vertices in the longest U-path in \({\mathscr{H}}_{\varphi }^E\).

Claim 6.16.1

Let \(E^{\prime } \in U_{\varphi }^E\). Then the in- and out-degrees of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are exactly one.

Proof

Since \(E^{\prime } \in U_{\varphi }^E\), there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds. The case that Statement 2 holds is symmetric. Then we may write \(E^{\prime }\) as follows:
$$ {\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where σ(ι) = 1, δ1δ2 = φ(yσ(1)) and δ1, δ2 ≠ ε. Moreover, \(\hat {E} \in [E_Y]_{\Rightarrow }\) where \(\hat {E}\) is given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)}\). Note that \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }\).
Let \(E^{\prime }_{\text {pre}_L}, E^{\prime }_{\text {suc}_L}\) be the equations such that \(E^{\prime }_{\text {pre}_L} \Rightarrow _L E^{\prime }\) and \(E^{\prime } \Rightarrow _L E^{\prime }_{\text {suc}_L}\). It follows from the definitions that \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }_{\text {pre}_L}\) and \({\varphi }(\hat {E}) \Rightarrow _L^{*} E^{\prime }_{\text {suc}_L}\), so both belong to \(H_{\varphi }^E\) and the in- and out-degree of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are both at least one. To see that they are exactly one, we must show that for the equations \(E^{\prime }_{\text {pre}_R}\) and \(E^{\prime }_{\text {suc}_R}\) such that \(E^{\prime }_{\text {pre}_R} \Rightarrow _R E^{\prime }\) and \(E^{\prime } \Rightarrow _R E^{\prime }_{\text {suc}_R}\), neither \(E^{\prime }_{\text {pre}_R}\) nor \(E^{\prime }_{\text {suc}_R}\) is contained in the set \(H_{\varphi }^E\). We may write \(E^{\prime }_{\text {pre}_R}\) as
$$ z {\varphi}(y_{1}) {\varphi}(y_{2}) {\ldots} \delta_{3} \delta_{2} {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where z ∈ X and δ3 ∈ X∗ such that δ3z = δ1, and we may write \(E^{\prime }_{\text {suc}_R}\) as
$$ \gamma {\varphi}(y_{2}) {\ldots} \delta_{1} z^{\prime} \delta_{2} {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where \(z^{\prime } \in X\) and γ ∈ X∗ such that \(z^{\prime } \gamma = {\varphi }(y_{1})\). It follows by Fact 6.9 and Lemma 6.10 that any equation in \(V_{\varphi }^E \cup U_{\varphi }^E = H_{\varphi }^E\) must have φ(yσ(1)) = δ1δ2 occurring as a factor of at least one side. However, since each variable occurs exactly once on each side of the equations \(E^{\prime }_{\text {pre}_R}\), \(E^{\prime }_{\text {suc}_R}\), we may immediately observe that φ(yσ(1)) does not occur as a factor of the LHS or of the RHS of either equation. Thus \(E^{\prime }_{\text {pre}_R}\), \(E^{\prime }_{\text {suc}_R} \notin H_{\varphi }^E\), and the in- and out-degrees of \(E^{\prime }\) in \({\mathscr{H}}_{\varphi }^E\) are exactly one as claimed. □

The following claim asserts that each vertex \(E^{\prime } \in U^E_{\varphi }\) occurs on a U-path with at most Card(Δ(E)) internal vertices. Since we have already shown that \(E^{\prime }\) occurs on exactly one U-path, it follows that all U-paths have at most Card(Δ(E)) internal vertices and thus that the order of the isolated path contraction is at most Card(Δ(E)).

Claim 6.16.2

Let \(E^{\prime } \in U_{\varphi }^E\). Then there exist k ≤Card(Δ(E)), E0, E1,…,Ek+ 1 and Z ∈{L, R} such that:
  1. \(E_0, E_{k+1} \in V_{\varphi }^E\), and
  2. \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k, and
  3. \(E_i \Rightarrow _Z E_{i+1}\) for 0 ≤ i ≤ k, and
  4. there exists i, 1 ≤ i ≤ k, such that \(E^{\prime } = E_i\).

Proof

Since \(E^{\prime } \in U_{\varphi }^E\), there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds. The case that Statement 2 holds is symmetric. Then the equation \(\hat {E}\) given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (1)} y_{\sigma (2)} {\ldots } y_{\sigma (n)}\) is contained in [EY]⇒ and we may write \(E^{\prime }\) as follows
$$ {\varphi}(y_{1}){\varphi}(y_{2}) \!\ldots\! {\varphi}(y_{n}) \!\doteq\! z_{j+1} {\ldots} z_{k} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) z_{1} \!\ldots\! z_{j} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)})$$
where σ(ι) = 1, z1z2zk = φ(yσ(1)) and 1 ≤ j < k ≤Card(Δ(E)) + 1.
Now, let E0 be the equation given by
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \overbrace{z_{1} z_{2} {\ldots} z_{k}}^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}),$$
let Ek+ 1 be the equation
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq{\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \overbrace{z_{1} z_{2} {\ldots} z_{k}}^{{\varphi}(y_{\sigma(1)})} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}),$$
and for 1 ≤ i < k, let Ei be the equation given by
$$ {\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq z_{i+1} {\ldots} z_{k-1} y_{\sigma(1)} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) z_{1} {\ldots} z_{i} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Then clearly we have \(E_0 = {\varphi }(\hat {E}) \in V_{\varphi }^E\). Let \(\hat {E}^{\prime }\) be the equation given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)}\). Then \(\hat {E} \Rightarrow \hat {E}^{\prime }\) so \(\hat {E}^{\prime } \in [E_Y]_{\Rightarrow }\), and moreover \(E_{k+1} = {\varphi }(\hat {E}^{\prime })\) so \(E_{k+1} \in V_{\varphi }^E\). Thus Statement 1 is satisfied. Note also that \(E_i \Rightarrow _L E_{i+1}\) for 0 ≤ i ≤ k, so Statement 3 is satisfied, and furthermore we have that \(E_i \in H_{\varphi }^E\) for 1 ≤ i ≤ k. For each i, 1 ≤ i ≤ k, since each variable y ∈ Y occurs exactly once on each side of Ei, we may conclude that φ(yσ(1)) = z1z2…zk is not a factor of the RHS of Ei. Thus, by Fact 6.9, \(E_i \notin V_{\varphi }^E\) so \(E_i \in U_{\varphi }^E\) and Statement 2 is satisfied. Finally note that \(E^{\prime } = E_j\), so Statement 4 is also satisfied. □

It remains to show that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V^E_{\varphi }}\). Recall that by definition \(V_{\varphi }^E = \{ {\varphi }(E^{\prime }) \mid E^{\prime } \in [E_Y]_{\Rightarrow }\}\) and note that the function mapping equations \(\hat {E} \in [E_Y]_{\Rightarrow }\) to their counterparts \({\varphi }(\hat {E}) \in V^E_{\varphi }\) is a bijection. Consequently, the fact that \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) follows directly from the following claim.

Claim 6.16.3

Let \(\hat {E}_{1},\hat {E}_{2} \in [E_Y]_{\Rightarrow }\). Then \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\) if and only if \({\varphi }(\hat {E}_{1}) {\diamond } {\varphi }(\hat {E}_{2})\).

Proof

Suppose that \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\). Then it follows from Lemma 6.12 that \({\varphi }(\hat {E}_{1}) \diamond {\varphi }(\hat {E}_{2})\). Suppose instead that \({\varphi }(\hat {E}_{1}) \diamond {\varphi }(\hat {E}_{2})\). Since \(\hat {E}_{1} \in [E_Y]_{\Rightarrow }\), it may be written as
$$y_{1}y_{2} {\ldots} y_{n} \doteq y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)}$$
where Y = {y1, y2,…,yn} and σ : {1,2,…,n}→{1,2,…,n} is a permutation.
By definition of ◇, there exists Z ∈{L, R} and \(\ell \in \mathbb {N}\) such that \({\varphi }(\hat {E}_{1}) \Rightarrow _Z^{\ell } {\varphi }(\hat {E}_{2})\). Suppose that Z = L. The case that Z = R is symmetric. For i ≥ 1, let Ei be the equation such that \({\varphi }(\hat {E}_{1}) \Rightarrow _L^i E_i\). Let k = |φ(yσ(1))|− 1. Then we may write Ek+ 1 as
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) {\varphi}(y_{\sigma(1)}) {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Let \(\hat {E}_3\) be the equation given by \(y_{1}y_{2}{\ldots } y_n \doteq y_{\sigma (2)} {\ldots } y_{\sigma (\iota -1)} y_{\sigma (1)} y_{\sigma (\iota )} {\ldots } y_{\sigma (n)}\). Then \(\hat {E}_{1} \Rightarrow \hat {E}_3\) so \(\hat {E}_3 \in [E_Y]_{\Rightarrow }\) and it follows from Fact 6.9 that \(E_{k+1} \in V_{\varphi }^E\). Hence we must have ℓ ≤ k + 1. Moreover, for 1 ≤ i ≤ k, there exist δ1, δ2 such that δ1δ2 = φ(yσ(1)) and δ1, δ2 ≠ ε and such that we may write Ei as
$${\varphi}(y_{1}){\varphi}(y_{2}) {\ldots} {\varphi}(y_{n}) \doteq \delta_{2} {\varphi}(y_{\sigma(2)}) {\ldots} {\varphi}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}(y_{\sigma(\iota)}) {\ldots} {\varphi}(y_{\sigma(n)}).$$
Consequently, by Lemma 6.10, \(E_i \in U_{\varphi }^E\) for 1 ≤ i ≤ k. By definition, \({\varphi }(\hat {E}_{2}) \in V_{\varphi }^E\), so it follows that ℓ > k and thus ℓ = k + 1, meaning that in fact \(\hat {E}_{2} = \hat {E}_3\) and hence that \(\hat {E}_{1} \Rightarrow \hat {E}_{2}\) as required. □

Claims 6.16.1 and 6.16.2 show that the graph \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\) is an isolated path contraction of order Card(Δ(E)) of \({\mathscr{H}}_{\varphi }^E\). Claim 6.16.3 shows that \({\mathscr{G}}^{\Rightarrow }_{[E_Y]}\) is isomorphic to \({\mathscr{G}}^{\diamond }_{V_{\varphi }^E}\), so the statement of the lemma holds. □

The following lemma deals with the second statement of Theorem 6.8. It asserts that the subgraphs \({\mathscr{H}}^E_{\varphi }\) completely cover the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\): each edge and each vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) also belong to at least one subgraph \({\mathscr{H}}^E_{\varphi }\).

Lemma 6.17

Let E be a basic RWE. Then \({\mathscr{G}}^{\Rightarrow }_{[E]} = \bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\).

Proof

We have already shown in Lemma 6.15 that each vertex of \(\bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\) is a vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Moreover, it follows directly from the definition of \({\mathscr{H}}^E_{\varphi }\) that each edge in \(\bigcup \limits _{{\varphi } \in {{\varPhi }}_E} {\mathscr{H}}^E_{\varphi }\) is also an edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\). It remains to show that each vertex/edge of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a vertex/edge of \({\mathscr{H}}^E_{\varphi }\) for some φ ∈ ΦE. The main step is Claim 6.17.1 as follows.

Claim 6.17.1

For every \(E^{\prime } \in [E]_{\Rightarrow }\) and Z ∈{L, R}, there exist φ ∈ ΦE and \(E^{\prime \prime } \in V_{\varphi }^E\) such that \(E^{\prime \prime } \Rightarrow _Z^{*} E^{\prime }\).

Proof

Note that by Lemma 6.11, there exist E0 ∈ [E]⇒ and φ0 ∈ ΦE such that \(E_0 \in V_{{\varphi }_0}^E\), and thus the claim holds for \(E^{\prime } = E_0\). Note also that for every \(E^{\prime } \in [E]_{\Rightarrow }\), since ⇒∗ is an equivalence relation, we have \(E_0 \Rightarrow ^{*} E^{\prime }\). Thus it is sufficient to show that if the claim holds for Ei and \(E_i \Rightarrow E_{i+1}\), then it also holds for Ei+ 1.

Suppose that the claim holds for Ei ∈ [E]⇒ and that \(E_i \Rightarrow _{Z_i} E_{i+1}\). Then there exist φi ∈ ΦE and \(E^{\prime \prime }_i \in V^E_{{\varphi }_i}\) such that \(E^{\prime \prime }_i \Rightarrow _{Z_i}^{*} E_i\) and thus \(E^{\prime \prime }_i \Rightarrow _{Z_i}^{*} E_{i+1}\). Thus \(E_{i+1} \in H^E_{{\varphi }_i}\). If \(E_{i+1} \in V^E_{{\varphi }_i}\), then the claim holds trivially. Suppose instead that \(E_{i+1} \in U_{{\varphi }_i}^E\).

Let Y = var(E)∖Δ(E) and let EY = πY(E). Recall that there exist a permutation σ : {1,2,…,n}→{1,2,…,n} and y1, y2,…,yn with Y = {y1, y2,…,yn} such that either Statement 1 or Statement 2 of Lemma 6.10 holds. Suppose that Statement 1 holds (the case that Statement 2 holds is symmetric). Then we may write Ei+ 1 as
$$ {\varphi}_{i}(y_{1}){\varphi}_{i}(y_{2}) {\ldots} {\varphi}_{i}(y_{n}) \doteq \delta_{2} {\varphi}_{i}(y_{\sigma(2)}) {\ldots} {\varphi}_{i}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{i}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{i}(y_{\sigma(n)})$$
where σ(ι) = 1, δ1δ2 = φi(yσ(1)) and δ1, δ2 ≠ ε. Furthermore, we have \(\hat {E} \in [E_Y]_{\Rightarrow }\) where \(\hat {E}\) is the equation given by
$$ y_{1}y_{2}{\ldots} y_{n} \doteq y_{\sigma(1)} y_{\sigma(2)} {\ldots} y_{\sigma(n)}.$$
It is straightforward to see that for Z = L, \({\varphi }_i(\hat {E}) \Rightarrow _Z^{*} E_{i+1}\) and since \({\varphi }_i(\hat {E}) \in V^E_{{\varphi }_i}\) by definition, the claim holds in this case.
It remains to consider the case that Z = R. By Lemma 6.3, EY is basic and therefore indecomposable. Thus y1 ≠ yσ(1). Let φi+ 1 : Y → X∗ be the morphism such that φi+ 1(yσ(1)) = δ2, φi+ 1(y1) = δ1φi(y1), and φi+ 1(yj) = φi(yj) for 1 ≤ j ≤ n with j ∉{1,σ(1)}. Note that φi+ 1 ∈ ΦE since δ1 ∈ Δ(E)∗ and δ2 ∈ Δ(E)∗yσ(1). Let \(E^{\prime \prime }_{i+1}\) be the equation given by \({\varphi }_{i+1}(\hat {E})\), so that \(E^{\prime \prime }_{i+1} \in V^E_{{\varphi }_{i+1}}\). Then we may write \(E^{\prime \prime }_{i+1}\) as:
$$ \begin{array}{@{}rcl@{}} && \delta_{1} {\varphi}_{i}(y_{1}) {\varphi}_{i}(y_{2}) {\ldots} {\varphi}_{i}(y_{\sigma(1)-1}) \delta_{2} {\varphi}_{i}(y_{\sigma(1)+1}) {\ldots} {\varphi}_{i}(y_{n}) \\ \doteq && \delta_{2} {\varphi}_{i}(y_{\sigma(2)}) {\ldots} {\varphi}_{i}(y_{\sigma(\iota-1)}) \delta_{1} {\varphi}_{i}(y_{\sigma(\iota)}) {\ldots} {\varphi}_{i}(y_{\sigma(n)}).\end{array} $$

Consequently \(E^{\prime \prime }_{i+1} \Rightarrow _R^{*} E_{i+1}\), so the claim holds for Ei+ 1 and by induction, it holds for all \(E^{\prime } \in [E]_{\Rightarrow }\). □

It follows directly from Claim 6.17.1 that every vertex of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) belongs to \(H^E_{\varphi }\) for some φ ∈ ΦE and is consequently also a vertex of some subgraph \({\mathscr{H}}^E_{\varphi }\). To see why the same holds for edges, note firstly that for every edge (E1, E2) in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), there exists Z ∈{L, R} such that \(E_{1} \Rightarrow _Z E_{2}\). By Claim 6.17.1 and since E1 ∈ [E]⇒, there exist φ ∈ ΦE and \(E^{\prime } \in V^E_{\varphi }\) such that \(E^{\prime } \Rightarrow _Z^{*} E_{1}\). It follows that \(E^{\prime } \Rightarrow _Z^{*} E_{2}\), meaning that \(E_{1},E_{2} \in H^E_{\varphi }\) (so they are both vertices of \({\mathscr{H}}^E_{\varphi }\)). It follows by definition that (E1, E2) is an edge of \({\mathscr{H}}^E_{\varphi }\). □

The proof of Theorem 6.8 is completed by the following lemma which addresses the third statement of the theorem.

Lemma 6.18

Let E be a basic RWE. Let Y = var(E)∖Δ(E) and let EY = πY(E). Let \(d= \max \limits \{1,{diam}({\mathscr{G}}^{\Rightarrow }_{[E_Y]})\}\). Then \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(d|E|^2)\).

Proof

Let \(E^{\prime }, E^{\prime \prime } \in [E]_{\Rightarrow }\). Then by Lemma 6.17, there exist \({\varphi }^{\prime },{\varphi }^{\prime \prime } \in {{\varPhi }}_E\) such that \(E^{\prime } \in H^E_{{\varphi }^{\prime }}\) and \(E^{\prime \prime } \in H^E_{{\varphi }^{\prime \prime }}\). For each φ ∈ ΦE, note that by Lemma 6.16 and Remark 4.6, there is a path of length O(d Card(Δ(E))) between any two vertices in \(H^E_{\varphi }\). Thus if \({\varphi }^{\prime } = {\varphi }^{\prime \prime }\), then there is a path of length O(d Card(Δ(E))) from \(E^{\prime }\) to \(E^{\prime \prime }\).

Suppose otherwise that \({\varphi }^{\prime } \not = {\varphi }^{\prime \prime }\). Then it follows from Lemma 6.7 that there exist k ∈ O(Card(Δ(E))) and φ1, φ2,…,φk ∈ ΦE such that \({\varphi }^{\prime } = {\varphi }_{1}\), \({\varphi }^{\prime \prime } = {\varphi }_k\) and φi, φi+ 1 are close for 1 ≤ i < k. By Lemma 6.14, there exist E1, E2,…,Ek− 1 such that \(E_i \in H^E_{{\varphi }_i} \cap H^E_{{\varphi }_{i+1}}\) for 1 ≤ i < k.

It follows that there exist paths of length O(d Card(Δ(E))) from \(E^{\prime }\) to E1, from Ek− 1 to \(E^{\prime \prime }\), and from Ei to Ei+ 1 for 1 ≤ i < k − 1. Thus there is a path from \(E^{\prime }\) to \(E^{\prime \prime }\) of length O(kd Card(Δ(E))) = O(d Card(Δ(E))2) = O(d|E|2). Since this is true for all \(E^{\prime },E^{\prime \prime }\), the statement of the lemma follows. □

7 Normal Forms and Block Decompositions

Having described the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for equations E which are not jumbled in the previous section, the current section focuses on the structure of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is jumbled. Our main result in this direction is the existence of specific normal forms, from which every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is polynomial distance away. We present two normal forms, with the second being a restriction on the first. Both are constructed based on reversed structures in such a way that they allow for taking full advantage of the invariant ΥE from Section 5. A major advantage of this is that we are able to show later in Section 8 that the number of equations occurring as vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the second normal form is bounded by a polynomial in |E|, allowing us to prove that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also polynomial.

Since the results in this section mainly concern positive reachability statements, the technical content relies heavily on describing sequences of applications of ⇒. Certain sequences will occur repeatedly, so it is convenient to define some shorthand notations given in terms of the following ‘shortcut’ relations.

Definition 7.1 (\(\xrightarrow {u,v}\) and ⊙)

For each u, v ∈ X, we define the relation \(\xrightarrow {u,v} \) over basic regular equations as \(E_{1}\xrightarrow {u,v} E_{2}\) if there exist x, y ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{u, v, x, y})∗ such that E1 may be written as \(x \alpha _{1} u \alpha _{2} v \alpha _3 y \doteq y \beta _{1} u \beta _{2} v \beta _3 x\) and E2 may be written as \(x \alpha _{1} v \alpha _3 u \alpha _{2} y \doteq y \beta _{1} v \beta _3 u \beta _{2} x\). Additionally, we define \(\odot = \bigcup \limits _{u,v\in X} \xrightarrow {u,v} \).

Note that there exist u, v ∈ X such that \(E_{1}\xrightarrow {u,v} E_{2}\) if and only if E1 ⊙ E2. The following lemma verifies that if E1 ⊙ E2, then we can reach E2 from E1 by a short sequence of applications of the rewriting transformation ⇒, or equivalently, that there is a short path from E1 to E2 in \({\mathscr{G}}^{\Rightarrow }_{[E_{1}]}\).
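To make Definition 7.1 concrete, the following Python sketch (an illustration of ours, not part of the formal development) implements a single application of \(\xrightarrow {u,v}\) on equations represented as pairs of tuples of variable names; this representation and the name shortcut are assumptions of the sketch.

def shortcut(eq, u, v):
    # One application of the relation of Definition 7.1: on each side, with
    # the interior factored as a1 u a2 v a3, produce a1 v a3 u a2. Since the
    # equations are regular, index() below is unambiguous. Returns None if
    # the relation does not apply (u must occur strictly before v on both sides).
    def rewrite(side):
        first, mid, last = side[0], list(side[1:-1]), side[-1]
        if u not in mid or v not in mid:
            return None
        i, j = mid.index(u), mid.index(v)
        if not i < j:
            return None
        a1, a2, a3 = mid[:i], mid[i + 1:j], mid[j + 1:]
        return tuple([first] + a1 + [v] + a3 + [u] + a2 + [last])
    lhs, rhs = rewrite(eq[0]), rewrite(eq[1])
    return (lhs, rhs) if lhs and rhs else None

# E1 = x p u q v r y ≐ y s u t v w x (all six factors chosen of length one):
E1 = (("x", "p", "u", "q", "v", "r", "y"), ("y", "s", "u", "t", "v", "w", "x"))
assert shortcut(E1, "u", "v") == (("x", "p", "v", "r", "u", "q", "y"),
                                  ("y", "s", "v", "w", "u", "t", "x"))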

Lemma 7.2

Let x, y, u, v ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{x, y, u, v})∗. Let E1 be the basic RWE given by \(x \alpha _{1} u \alpha _{2} v \alpha _3 y \doteq y \beta _{1} u \beta _{2} v \beta _3 x\) and let E2 be the basic RWE given by \(x \alpha _{1} v \alpha _3 u \alpha _{2} y \doteq y \beta _{1} v \beta _3 u \beta _{2} x\). Then there exist n1, n2 < 4|E1| such that \(E_{1} \Rightarrow ^{n_{1}} E_{2}\) and \(E_{2} \Rightarrow ^{n_{2}} E_{1}\).

Proof

Let E3, E4, E5 be the equations given as follows:
$$ \begin{array}{@{}rcl@{}} E_{3}: &&v \alpha_{3} x \alpha_{1} u \alpha_{2} y \doteq y \beta_{1} u \beta_{2} v \beta_{3} x \\ E_{4}: &&x \alpha_{1} v \alpha_{3} u \alpha_{2} y \doteq u \beta_{2} y \beta_{1} v \beta_{3} x \\ E_{5}: &&v \alpha_{3} x \alpha_{1} u \alpha_{2} y \doteq u \beta_{2} y \beta_{1} v \beta_{3} x. \end{array} $$

Then it follows directly from the definitions that \(E_{1} \Rightarrow _R^{*} E_3 \Rightarrow _L^{*} E_5 \Rightarrow _R^{*} E_4 \Rightarrow _L^{*} E_{2}\). Thus, by Remark 3.2, there exists n1 < 4|E1| such that \(E_{1} \Rightarrow ^{n_{1}} E_{2}\). By the same remark, we know that \(\Rightarrow _L^{*},\Rightarrow _R^{*}\) are symmetric, and thus we may similarly conclude that \(E_{2} \Rightarrow _L^{*} E_4 \Rightarrow _R^{*} E_5 \Rightarrow _L^{*} E_3 \Rightarrow _R^{*} E_{1}\) so there exists n2 < 4|E1| such that \(E_{2} \Rightarrow ^{n_{2}} E_{1}\). □

Corollary 7.3

Let E1, E2 be basic RWEs. If \(E_{1} \odot ^{m} E_{2}\) for some \(m\in \mathbb {N}\), then \(E_{1} \Rightarrow ^{n} E_{2}\) for some n ∈ O(|E1|m).

The first of our two normal forms is defined as follows. Theorem 7.5 confirms the desired property that any jumbled basic RWE E can be transformed into an equation \(\overline {E}\) which is in normal form in a small (i.e. polynomial in |E|) number of rewriting steps.

Definition 7.4 (Normal Form)

Let E be a basic RWE. Then E is in normal form if it can be written as \(x \alpha _{1} \alpha _{2} {\ldots } \alpha _n y \doteq y \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _n^R x\) where x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ n, and |αi|≤ 3 for 1 ≤ i < n.
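Because a basic RWE has each variable exactly once on each side, the factor boundaries in Definition 7.4 are forced: a position is a boundary exactly when the LHS and RHS have seen the same set of variables up to that position. The following sketch (ours, in the representation used in the earlier shortcut sketch) exploits this to test the normal form.

def nf_factorization(E):
    # Factor the interior of a basic RWE x w y ≐ y w' x into the pieces
    # a_1, ..., a_n of Definition 7.4 (w = a_1...a_n, w' = a_1^R...a_n^R),
    # returning the list of pieces, or None if no such factorization exists.
    lhs, rhs = E
    x, y = lhs[0], lhs[-1]
    if (rhs[0], rhs[-1]) != (y, x) or len(lhs) != len(rhs):
        return None
    w, wr = lhs[1:-1], rhs[1:-1]
    if set(w) != set(wr):
        return None
    pieces, seen_l, seen_r, start = [], set(), set(), 0
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:                 # a boundary is forced here
            piece = w[start:p + 1]
            if tuple(reversed(piece)) != wr[start:p + 1]:
                return None                  # the piece is not reversed on the RHS
            pieces.append(piece)
            start = p + 1
    return pieces

def is_normal_form(E):
    pieces = nf_factorization(E)
    return pieces is not None and all(len(a) <= 3 for a in pieces[:-1])

assert is_normal_form((("x", "z1", "z2", "z3", "y"), ("y", "z2", "z1", "z3", "x")))
assert not is_normal_form((("x", "z1", "z2", "z3", "z4", "z5", "y"),
                           ("y", "z4", "z3", "z2", "z1", "z5", "x")))  # |a_1| = 4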

Theorem 7.5

Let E be a jumbled basic RWE. Then there exists \(\overline {E}\) which is in normal form and such that \(E \Rightarrow ^{n_{1}} \overline {E}\) and \(\overline {E} \Rightarrow ^{n_{2}} E\) for some n1, n2O(|E|3).

The main step in the proof of Theorem 7.5 is the following lemma, which we shall make use of again later and is therefore stated independently.

Lemma 7.6

Let E be a jumbled basic RWE of the form \(x \gamma _{1} \beta _{1} y \doteq y \gamma _{2} \beta _{2} x\) where x, y ∈ X, γ1, γ2, β1, β2 ∈ (X∖{x, y})∗ and var(γ1) = var(γ2). Then at least one of the following two statements holds:
  1. \(\beta _{1} = \beta _{2}^R\), or
  2. there exists α ∈ var(β1)∗ with 1 ≤|α|≤ 3, η1, η2 ∈ var(β1)∗ and n ∈ O(|E|) such that \(E \odot ^n x\gamma _{1} \alpha \eta _{1} y \doteq y \gamma _{2} \alpha ^R \eta _{2} x\).

Proof

Throughout this proof, we shall use the fact that \(E_{1} \xrightarrow {u,v} E_{2}\) implies E1 ⊙ E2 and shall use the two notations interchangeably where convenient. Let E be a jumbled basic RWE of the form \(x \gamma _{1} \beta _{1} y \doteq y \gamma _{2} \beta _{2} x\) where x, y ∈ X, γ1, γ2, β1, β2 ∈ (X∖{x, y})∗ and var(γ1) = var(γ2). Suppose that \(\beta _{1} \not = \beta _{2}^R\). Note that since E is basic and regular, var(β1) = var(β2), and moreover we have that β1, β2 ≠ ε. Hence we may write E in the form:
$$ x\gamma_{1} u \delta_{1} y \doteq y \gamma_{2} \delta_{2} u \delta_{3} x $$
(1)
where u ∈ X and δ1, δ2, δ3 ∈ X∗ such that uδ1 = β1 and δ2uδ3 = β2. If δ2 = ε then we can set α = u and we are done. Otherwise, the next step is to show that we can get to an equation of the form
$$ x\gamma_{1} u \delta_{1}^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \delta_{3}^{\prime} x $$
(2)
where z1, z2,…,zk ∈ X and \(\delta _{1}^{\prime },\delta _{2}^{\prime }, \delta _{3}^{\prime } \in X^{*}\). Suppose that our equation of the form (1) is not already of the form (2). Suppose firstly that there exist v1, v2 ∈ X such that v1 and v2 occur in the same order in δ1 and δ2. In other words, suppose there exist δ1,1, δ1,2, δ1,3, δ2,1, δ2,2, δ2,3 ∈ X∗ such that we can write δ1 = δ1,1v1δ1,2v2δ1,3 and δ2 = δ2,1v1δ2,2v2δ2,3. Then we have that \(E \xrightarrow {v_{1},v_{2}} E_{1,2}\) where E1,2 is given by \(x\gamma _{1} u \hat {\delta _{1}} y \doteq y \gamma _{2} \hat {\delta _{2}} u \hat {\delta _3} x\) such that \(|\hat {\delta _{2}}| <|\delta _{2}|\), with \(\hat {\delta _{1}} = \delta _{1,1} v_{2} \delta _{1,3} v_{1} \delta _{1,2} \), \(\hat {\delta _{2}} = \delta _{2,1} v_{2} \delta _{2,3}\) and \(\hat {\delta _3} = \delta _3 v_{1} \delta _{2,2}\).
Iterating this, we may thus conclude that there exists n1 ≤|δ2| and a sequence \(E = E_{1,1} \odot E_{1,2} \odot {\ldots } \odot E_{1,n_{1}}\) such that \(E_{1, n_{1}}\) has the form
$$x\gamma_{1} u \hat{\delta}_{1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \hat{\delta}_{3}^{\prime} x$$
where z1, z2,…,zk ∈ X and \(\hat {\delta }_{1}^{\prime } \in X^{*} z_{1} X^{*} z_{2} X^{*} {\ldots } X^{*} z_k X^{*}\). If all the internal X∗ factors are the empty word (i.e. if \(\hat {\delta }_{1}^{\prime } \in X^{*}z_{1}z_{2} {\ldots } z_k X^{*}\)), then \(E_{1,n_{1}}\) already has the desired form described by (2). Otherwise, there exists w ∈ X∖{z1, z2,…,zk} such that w occurs between z1 and zk in \(\hat {\delta }_{1}^{\prime }\). More precisely, we can write \(E_{1,n_{1}}\) as:
$$x \gamma_{1} u \hat{\delta}_{1,1} z_{1} \hat{\delta}_{1,2} w \hat{\delta}_{1,3} z_{k} \hat{\delta}_{1,4} y \doteq y \gamma_{2} z_{k} \hat{\delta}_{2,1} z_{1} \hat{\delta}_{3,1} w \hat{\delta}_{3,2} x$$
where \(\hat {\delta }_{1,1}, \hat {\delta }_{1,2}, \hat {\delta }_{1,3}, \hat {\delta }_{1,4}, \hat {\delta }_{2,1}, \hat {\delta }_{3,1}, \hat {\delta }_{3,2} \in X^{*}\) such that \(\hat {\delta }_{1}^{\prime } = \hat {\delta }_{1,1}z_{1} \hat {\delta }_{1,2} w \hat {\delta }_{1,3}z_k \hat {\delta }_{1,4}\), and \(z_{k-1} z_{k-2} {\ldots } z_{2} = \hat {\delta }_{2,1}\), and \(\hat {\delta }_{3}^{\prime } = \hat {\delta }_{3,1} w \hat {\delta }_{3,2}\). In this case we have \(E_{1,n_{1}} \xrightarrow {z_{1},w} E_{2,1} \xrightarrow {z_k,z_{1}} E_{2,2}\) where E2,1 is given by
$$ x\gamma_{1} u \hat{\delta}_{1,1}w\hat{\delta}_{1,3}z_{k}\hat{\delta}_{1,4} z_{1}\hat{\delta}_{1,2} y \doteq y \gamma_{2} z_{k} \hat{\delta}_{2,1}w \hat{\delta}_{3,2} z_{1} u \hat{\delta}_{3,1} x $$
and E2,2 is given by
$$ x\gamma_{1} u \hat{\delta}_{1,1}w\hat{\delta}_{1,3} z_{1}\hat{\delta}_{1,2} z_{k} \hat{\delta}_{1,4}y \doteq y \gamma_{2} z_{1} u \hat{\delta}_{3,1} z_{k} \hat{\delta}_{2,1}w \hat{\delta}_{3,2} x $$
which is again of the desired form described by (2), in this case with k = 1 (the single zi being z1). In all cases, there exists n2 ≤|E| such that \(E \odot ^{n_{2}} E_{2,2}\) for some equation E2,2 of the desired form (2).
Now suppose that E2,2 has the form (2), and define \(\delta _{1}^{\prime },\delta _{2}^{\prime },\delta _{3}^{\prime }\) accordingly. Next, we note that there exists n3 ∈{0,1} such that \(E_{2,2} \odot ^{n_3} E_{3}\) where E3 has the form
$$ x\gamma_{1} u^{\prime} z_{1} z_{2} {\ldots} z_{k} \delta_{1}^{\prime\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \delta_{2}^{\prime\prime} x $$
(3)
where \(u^{\prime } \in X\) and \(\delta _{1}^{\prime \prime }, \delta _{2}^{\prime \prime } \in X^{*}\). Indeed, if \(\delta _{1}^{\prime } = \varepsilon \), then this is trivial, simply taking E3 = E2,2. Otherwise, there exist \(u^{\prime } \in X\) and \(\delta _{1,1}^{\prime }, \delta _{3,1}^{\prime }, \delta _{3,2}^{\prime } \in X^{*}\) such that \(\delta _{1}^{\prime } = \delta _{1,1}^{\prime }u^{\prime }\) and \(\delta _{3}^{\prime } = \delta _{3,1}^{\prime } u^{\prime } \delta _{3,2}^{\prime }\). Then E2,2 may be written as:
$$ x\gamma_{1} u \delta_{1,1}^{\prime} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u \delta_{3,1}^{\prime} u^{\prime} \delta_{3,2}^{\prime} x $$
and \(E_{2,2} \xrightarrow {u, u^{\prime }} E_3\) where E3 is given by
$$ x\gamma_{1} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \delta_{2}^{\prime} u \delta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \delta_{3,2}^{\prime} u \delta_{3,1}^{\prime} x $$
which is of the form (3) as required. Now, if k ≤ 2, we may take \(\alpha = u^{\prime }z_{1}z_{2} {\ldots } z_k\), \(\eta _{1} =\delta _{2}^{\prime } u \delta _{1,1}^{\prime }\) and \(\eta _{2} = \delta _{3,2}^{\prime } u \delta _{3,1}^{\prime }\) and we are done. Suppose otherwise that k ≥ 3. Next, we observe that if \(u\delta _{1,1}^{\prime }\) and \(u\delta _{3,1}^{\prime }\) share a non-empty suffix, then we have an equation of the form \(x {\ldots } s y \doteq y {\ldots } s x\). However, this implies that \((s,s) \in {\varUpsilon }_{\!E_3}\), and by Theorem 5.3, \({\varUpsilon }_{\!E_3} = {\varUpsilon }_{\!E}\), meaning that E is not jumbled: a contradiction. Consequently, there must exist s, t ∈ X with s ≠ t and \(\beta _{1,1}^{\prime },\beta _{1,2}^{\prime },\beta _{2,1}^{\prime },\beta _{2,2}^{\prime } \in X^{*}\) such that E3 has the form
$$ x\gamma_{1} u^{\prime} z_{1}z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} s \beta_{1,2}^{\prime} t y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{1} u^{\prime} \beta_{2,1}^{\prime} t \beta_{2,2}^{\prime} s x. $$
Then we have \(E_3 \xrightarrow {z_{2},s} E_{4,1} \xrightarrow {z_{1},t} E_{4,2} \xrightarrow {z_k,z_{1}} E_{4,3}\xrightarrow {u^{\prime },z_{k-1}} E_{4,4}\) where E4,1, E4,2, E4,3, E4,4 are given as follows:
$$ \begin{array}{@{}rcl@{}} E_{4,1}: && x\gamma_{1} u^{\prime} z_{1} s \beta_{1,2}^{\prime} t z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} z_{1} u^{\prime} \beta_{2,1}^{\prime} t \beta_{2,2}^{\prime} x \\ E_{4,2}: && x\gamma_{1} u^{\prime} t z_{2}{\ldots} z_{k} \beta_{1,1}^{\prime} z_{1} s \beta_{1,2}^{\prime} y \doteq y \gamma_{2} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} z_{1} u^{\prime} \beta_{2,1}^{\prime}x \\ E_{4,3}: && x\gamma_{1} u^{\prime} t z_{2}{\ldots} z_{k-1} z_{1} s \beta_{1,2}^{\prime} z_{k} \beta_{1,1}^{\prime} y \doteq y \gamma_{2} z_{1} u^{\prime} \beta_{2,1}^{\prime} z_{k} z_{k-1} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} x \\ E_{4,4}: && x\gamma_{1} z_{k-1} z_{1} s \beta_{1,2}^{\prime} z_{k} \beta_{1,1}^{\prime} u^{\prime} t z_{2} {\ldots} z_{k-2} y \doteq y \gamma_{2} z_{1} z_{k-1} z_{k-2} {\ldots} z_{3} s z_{2} t \beta_{2,2}^{\prime} u^{\prime} \beta_{2,1}^{\prime} z_{k} x. \end{array} $$
Now, E4,4 has the required form with α = zk− 1z1, \(\eta _{1} = s \beta _{1,2}^{\prime } z_k \beta _{1,1}^{\prime } u^{\prime } t z_{2} {\ldots } z_{k-2}\) and \(\eta _{2} = z_{k-2} {\ldots } z_3 s z_{2} t \beta _{2,2}^{\prime } u^{\prime } \beta _{2,1}^{\prime } z_k\). Moreover, we have that \(E \odot ^{n} E_{4,4}\) with n ≤ n2 + n3 + 4 ≤|E| + 5 ∈ O(|E|) as claimed. □

We can now prove Theorem 7.5 with a simple induction based on Lemma 7.6.

Proof of Theorem 7.5

By Lemma 6.13, we have that \(E \Rightarrow ^{n_{1}} E^{\prime }\) and \(E^{\prime } \Rightarrow ^{n_{1}^{\prime }} E\) where \(E^{\prime }\) is a basic regular equation of the form \(x \beta _{1} y \doteq y \beta _{2} x\) such that x, y ∈ X and β1, β2 ∈ (X∖{x, y})∗ with \(n_{1},n_{1}^{\prime } \in O(|E|^2)\). By Theorem 5.3, since E is jumbled, \(E^{\prime }\) is also jumbled. By a simple induction using Lemma 7.6 (starting with the case that γ1 = γ2 = ε) we can therefore infer that \(E^{\prime } \odot ^{n_{2}} \overline {E}\) for some \(\overline {E}\) in normal form and n2 ∈ O(|E|2). It follows directly from the definitions that ⊙ is symmetric, so we also have that \(\overline {E} \odot ^{n_{2}} E^{\prime }\). Thus, by Corollary 7.3 we have that \(E^{\prime } \Rightarrow ^{n_3} \overline {E}\) and \(\overline {E} \Rightarrow ^{n_{3}^{\prime }} E^{\prime }\) for some \(n_3,n_{3}^{\prime } \in O(|E|^3)\), and therefore also that \(E \Rightarrow ^n \overline {E}\) and \(\overline {E} \Rightarrow ^{n^{\prime }} E\) for some \(n,n^{\prime } \in O(|E|^3)\) as claimed. □

The idea behind the first normal form is to divide the RWE into pairs \((\alpha _i, \alpha _i^R)\) which are regular-reversed word equations (although solutions to the full equation E are not necessarily solutions to these smaller equations), and for which all but one belong to a finite number of cases (i.e. three cases depending on the length of αi). Forcing the sub-equations to be regular-reversed gives us the most control when working with the invariant ΥE. Some intuition behind this fact can be derived from the observation that if we know that a (complete) basic RWE E is regular-reversed, we can uniquely reconstruct it from the leftmost two variables on the LHS and ΥE. Indeed, any regular-reversed basic RWE E can be written in the form \(x_{1}x_{2} {\ldots } x_n \doteq x_n x_{n-1} {\ldots } x_{1}\), meaning that ΥE = {(xi− 1, xi+ 1)∣2 ≤ i < n}∪{(xn− 1, x2)}, and if we know x1, then we may infer from ΥE all the odd-index variables (x3, x5,…) and if we know x2 then we may infer all the even-index variables (x4, x6,…).
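The following sketch (ours) makes this reconstruction explicit. It hard-codes only the pair set displayed above, not the general definition of ΥE from Section 5, and assumes the list xs holds x1,…,xn in order.

def upsilon_regular_reversed(xs):
    # The pair set displayed above for x_1 ... x_n ≐ x_n ... x_1 (xs[0] = x_1):
    n = len(xs)
    pairs = {(xs[i - 2], xs[i]) for i in range(2, n)}   # (x_{i-1}, x_{i+1})
    pairs.add((xs[n - 2], xs[1]))                       # (x_{n-1}, x_2)
    return pairs

def reconstruct(x1, x2, pairs):
    # Each pair (p, q) says that q occurs two positions to the right of p, so
    # the odd- and even-indexed variables form two chains starting at x1, x2.
    step = dict(pairs)
    seq = [x1, x2]
    while len(seq) < len(pairs) + 1:   # an equation of length n yields n - 1 pairs
        seq.append(step[seq[-2]])
    return seq

xs = ["x1", "x2", "x3", "x4", "x5", "x6"]
assert reconstruct("x1", "x2", upsilon_regular_reversed(xs)) == xs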

Rather than looking at the pairs \((\alpha _i, \alpha _i^R)\) in isolation, in order to take full advantage of the invariant ΥE, we actually need to consider pairs of the form
$$(\alpha_{i} \alpha_{i+1} {\ldots} \alpha_{j}, {\alpha_{i}^{R}} \alpha_{i+1}^{R} {\ldots} {\alpha_{j}^{R}})$$
for well-chosen values i and j. We shall call such pairs blocks, which we define formally below.

Definition 7.7 (Blocks)

We define 3 variations of blocks which may each have up to two types.
  1. A standard block is a pair \((\alpha _{1} \alpha _{2} {\ldots } \alpha _j, \alpha _{1}^R\alpha _{2}^R{\ldots } \alpha _j^R)\) such that j ≥ 1, αi ∈ X+ for 1 ≤ i ≤ j, |α1|∈{1,3}, and for each i, 1 < i ≤ j, |αi| = 2. It is Type A if |α1| = 1 and Type B if |α1| = 3.
  2. An initial block is a pair \((x \alpha _{1} {\ldots } \alpha _j, y \alpha _{1}^R {\ldots } \alpha _j^R)\) with j ≥ 0, x, y ∈ X with x ≠ y, and αi ∈ (X∖{x, y})∗ where |αi| = 2 for 1 ≤ i ≤ j. All initial blocks are Type A.
  3. A final block is a pair (γ1δy, γ2δRx) where x, y ∈ X with x ≠ y, and γ1, γ2, δ ∈ X∗ with |δ|≥ 1 such that (γ1, γ2) is a block (initial or standard). It is Type A if (γ1, γ2) is Type A, and Type B otherwise.

Given an equation which is in normal form, we may decompose it uniquely into blocks in the following manner. The intuition behind this decomposition is that if we fix the invariant property ΥE, then each block (with the exception of the final block) is determined entirely by the block preceding it along with its first (leftmost in the first element) variable. This gives us a crucial degree of control when considering which equations in normal form may appear in \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Definition 7.8 (Block Decomposition)

Let E be a basic RWE in normal form. Then E may be written as \(x \alpha _{1}\alpha _{2}{\ldots } \alpha _n y \doteq y \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _n^R x\) where x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ n, and |αi|≤ 3 for 1 ≤ i < n. Let I = {i1, i2,…,ik} = {i∣1 ≤ i < n and |αi|≠ 2} with 1 ≤ i1 < i2 < … < ik < n. If I = ∅, let \(\mathfrak {B} = (E)\). Otherwise, let \(\mathfrak {B} = (B_0,B_{1}, \ldots , B_k)\) where for 0 ≤ j ≤ k, the Bj are blocks such that:
  1. \(B_0 = (x \alpha _{1} {\ldots } \alpha _{i_{1}-1}, y \alpha _{1}^R {\ldots } \alpha _{i_{1}-1}^R)\),
  2. \(B_k = (\alpha _{i_k} {\ldots } \alpha _n y, \alpha _{i_k}^R {\ldots } \alpha _n^R x)\), and
  3. for 1 ≤ j < k, \(B_j = (\alpha _{i_j} {\ldots } \alpha _{i_{j+1}-1}, \alpha _{i_j}^R {\ldots } \alpha _{i_{j+1}-1}^R)\).
Then \(\mathfrak {B}\) is the block decomposition of E.
As an example, consider the basic RWE E given as follows:
$$ x \overbrace{z_{1} z_{2} }^{\alpha_{1}} \overbrace{z_{3}}^{\alpha_{2}} \overbrace{z_{4} z_{5} z_{6}}^{\alpha_{3}} \overbrace{z_{7} z_{8}}^{\alpha_{4}} \overbrace{z_{9}}^{\alpha_{5}} \overbrace{z_{10} z_{11} z_{12} z_{13} }^{\alpha_{6}} y \doteq y \overbrace{z_{2} z_{1} }^{{\alpha_{1}^{R}}} \overbrace{z_{3}}^{{\alpha_{2}^{R}}} \overbrace{z_{6} z_{5} z_{4}}^{{\alpha_{3}^{R}}} \overbrace{z_{8} z_{7}}^{{\alpha_{4}^{R}}} \overbrace{z_{9}}^{{\alpha_{5}^{R}}} \overbrace{z_{13} z_{12} z_{11} z_{10} }^{{\alpha_{6}^{R}}} x $$
Note that E is in normal form. Then I = {2,3,5} and the block decomposition of E is (B0, B1, B2, B3) where:
$$ \begin{array}{@{}rcl@{}} B_{0} &=& (xz_{1}z_{2}, yz_{2}z_{1})\\ B_{1} &=& (z_{3}, z_{3})\\ B_{2} &=& (z_{4}z_{5}z_{6}z_{7}z_{8}, z_{6}z_{5}z_{4}z_{8}z_{7})\\ B_{3} &=& (z_{9}z_{10}z_{11}z_{12}y, z_{9}z_{12}z_{11}z_{10}x). \end{array} $$
Another example illustrating the block decomposition of an equation in normal form is given in Fig. 6. The next fact follows directly from the definitions.
Fig. 6

A depiction of the equation E given by \(x z_{1} z_{2} z_3 z_4 z_5 z_6 z_7 z_8 z_9 z_{10} z_{11} z_{12} z_{13} z_{14} z_{15} y \doteq y z_{2} z_{1} z_5 z_4 z_3 z_7 z_6 z_{8} z_{10} z_9 z_{11} z_{15} z_{14} z_{13} z_{12} x\) where x, y and zi for 1 ≤ i ≤ 15 are variables. The LHS and RHS of the equation are aligned vertically. The block decomposition \(\mathfrak {B} = (B_0,B_{1},B_{2},B_3)\) of E is shown with solid rectangles and with the variety and type of the block written beneath. The additional divisions into the factors \(\alpha _i, \alpha _i^R\) required by the definition of normal form are indicated by dashed lines (so that α1 = z1z2, α2 = z3z4z5, α3 = z6z7, α4 = z8, α5 = z9z10, α6 = z11 and α7 = z12z13z14z15). In order for the equation to satisfy the definition of Lex Normal Form, the variables highlighted in bold must be lexicographically minimal with respect to the appropriate sets \({{\varGamma }}^E_i\). For i = 1, we have that \({{\varGamma }}^E_{1} = \{z_i \mid 3 \leq i \leq 15\} \backslash \{z_4\}\). In particular, \({{\varGamma }}^E_{1}\) consists of the first variable in the block B1 (namely z3) along with (nearly) all variables on the LHS of the equation occurring to the right of z3, excluding the rightmost variable (y), and since B1 is Type B, also excluding the second variable in the block B1 (namely z4). On the other hand, since B2 is Type A, for i = 2, we do not need to exclude the second variable in the block B2, so \({{\varGamma }}^E_{2} = \{ z_i \mid 8 \leq i \leq 15\}\). Assuming an underlying lexicographic order for which zi+ 1 is greater than zi, we can conclude that E is in Lex Normal Form

Fact 7.9

For every basic RWE in normal form, there exists a unique block decomposition (B0, B1,…,Bk) where k ≤Card(var(E)), Bk is a final block, and if k > 0, then B0 is an initial block.
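The decomposition can be computed in the same way as the normal-form factorization in the earlier sketch: recover the factors αi via the greedy cut-point scan, collect the index set I, and group the factors. The following sketch (ours, in the same representation as before) reproduces the worked example following Definition 7.8.

def block_decomposition(E):
    lhs, rhs = E
    x, y = lhs[0], lhs[-1]
    w, wr = lhs[1:-1], rhs[1:-1]
    # Recover the factors a_1, ..., a_n exactly as in the normal-form sketch:
    cuts, seen_l, seen_r = [0], set(), set()
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:
            cuts.append(p + 1)
    alpha = [w[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
    assert all(tuple(reversed(a)) == wr[c:c + len(a)]
               for a, c in zip(alpha, cuts)), "E is not in normal form"
    n = len(alpha)
    I = [i for i in range(n - 1) if len(alpha[i]) != 2]   # 0-indexed version of I
    if not I:
        return [E]
    bounds = [0] + I + [n]   # factor ranges of the blocks B_0, ..., B_k
    blocks = []
    for j in range(len(bounds) - 1):
        lo, hi = bounds[j], bounds[j + 1]
        blocks.append((sum(alpha[lo:hi], ()),
                       sum((tuple(reversed(a)) for a in alpha[lo:hi]), ())))
    blocks[0] = ((x,) + blocks[0][0], (y,) + blocks[0][1])     # initial block
    blocks[-1] = (blocks[-1][0] + (y,), blocks[-1][1] + (x,))  # final block
    return blocks

lhs = tuple(["x"] + [f"z{i}" for i in range(1, 14)] + ["y"])
rhs = ("y", "z2", "z1", "z3", "z6", "z5", "z4", "z8", "z7",
       "z9", "z13", "z12", "z11", "z10", "x")
B = block_decomposition((lhs, rhs))
assert B[0] == (("x", "z1", "z2"), ("y", "z2", "z1"))
assert B[1] == (("z3",), ("z3",))
assert B[2] == (("z4", "z5", "z6", "z7", "z8"), ("z6", "z5", "z4", "z8", "z7"))
assert B[3] == (("z9", "z10", "z11", "z12", "z13", "y"),
                ("z9", "z13", "z12", "z11", "z10", "x"))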

Since the blocks are fixed by their first variable, it is natural to ask for which variables we can find an equation in our graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) such that the block begins with that variable. In particular, can we find an equation in normal form in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) for which the first variable of each block is lexicographically minimal when reading from left to right? The answer to the question is “nearly”. In other words, if we relax the notion slightly to account for some specific exceptions, then we can always guarantee the existence of such an equation. This leads to the notion of Lex Normal Form defined below.

Definition 7.10 (Lex Normal Form)

Let E be a basic RWE in normal form. Then there exist x, y ∈ X and α, β ∈ (X∖{x, y})∗ such that E has the form \(x \alpha y \doteq y \beta x\). Let (B0, B1,…,Bk) be the block decomposition of E. For each i, 0 ≤ i ≤ k, let \(\gamma _i, \gamma _i^{\prime } \in X^{*}\) such that \(B_i = (\gamma _i,\gamma _i^{\prime })\), let Si = {γi[2],y} whenever Bi is Type B and Si = {y} otherwise, and let \({{\varGamma }}^E_i = \left (\bigcup \limits _{i \leq j \leq k} {var}(\gamma _j)\right )\backslash S_i \). A block Bi is lex-minimal if γi[1] is lexicographically minimal in \({{\varGamma }}^E_i\). The equation E is in Lex Normal Form (LNF) if, for each i, 0 < i < k, Bi is lex-minimal.

Lex Normal Form (see also Fig. 6 for an example) describes the class of equations for which the first variable of each block is lexicographically minimal whenever possible. We can, in general, guarantee the existence of an equation \(E^{\prime }\) in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) such that the first variable of each block is lexicographically minimal with the following exceptions. Firstly, we must exclude the first and last blocks (the first block is fixed completely by ΥE). Secondly, we must only compare the first variable to other variables occurring further right in the LHS of the equation, excluding the rightmost variable on the LHS of the equation (y in the definition above) and, for blocks of Type B, the second variable in the block. The sets \({{\varGamma }}^E_i\) in the definition account for these exclusions.
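Putting Definitions 7.8 and 7.10 together, the sketch below (ours; it inlines the factorization logic of the previous sketches so that it runs on its own) tests Lex Normal Form. The lexicographic order on variable names is passed as a key function, since it is a parameter of the definition; the assertion checks the equation of Fig. 6 under an order for which zi+ 1 is greater than zi.

def is_lex_normal_form(E, key=str):
    lhs, rhs = E
    y = lhs[-1]
    w, wr = lhs[1:-1], rhs[1:-1]
    cuts, seen_l, seen_r = [0], set(), set()
    for p in range(len(w)):
        seen_l.add(w[p]); seen_r.add(wr[p])
        if seen_l == seen_r:
            cuts.append(p + 1)
    alpha = [w[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
    n = len(alpha)
    I = [i for i in range(n - 1) if len(alpha[i]) != 2]
    if not I:
        return True                  # a single (final) block: nothing to check
    bounds = [0] + I + [n]
    k = len(bounds) - 2              # the blocks are B_0, ..., B_k
    for j in range(1, k):            # lex-minimality is required for 0 < i < k only
        gamma = sum(alpha[bounds[j]:bounds[j + 1]], ())
        tail = sum(alpha[bounds[j]:], ()) + (y,)       # var(gamma_j ... gamma_k)
        type_b = len(alpha[bounds[j]]) == 3            # leading factor of length 3
        S = {gamma[1], y} if type_b else {y}
        if min(set(tail) - S, key=key) != gamma[0]:
            return False
    return True

L = tuple(["x"] + [f"z{i}" for i in range(1, 16)] + ["y"])
R = ("y", "z2", "z1", "z5", "z4", "z3", "z7", "z6", "z8",
     "z10", "z9", "z11", "z15", "z14", "z13", "z12", "x")
assert is_lex_normal_form((L, R), key=lambda v: (len(v), v))   # the Fig. 6 equation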

The main result of this section is that every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is never more than a polynomial distance away from a vertex corresponding to an equation in LNF.

Theorem 7.11

Let E be a jumbled basic RWE. Then there exists \(E^{\prime }\) such that \(E^{\prime }\) is in Lex Normal Form, and such that \(E \Rightarrow ^{n_{1}} E^{\prime }\) and \(E^{\prime } \Rightarrow ^{n_{2}} E\) for some n1, n2O(|E|4).

Although Theorem 7.11 does not provide as detailed a description of the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the jumbled case as Theorem 6.8 does in the non-jumbled case, it does allow us to study them as the polynomial-distance neighbourhoods of the highly restricted set of vertices corresponding to equations in Lex Normal Form. Section 8 gives a strong example of the benefits of this approach, allowing us to show firstly that the cardinality of the set of vertices in Lex Normal Form is bounded by a polynomial in |E| (in contrast to the fact that the total number of vertices will typically be exponential, as shown in Section 9), and consequently, that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also bounded by a polynomial in |E|.

Proof of Theorem 7.11

The rest of this section is devoted to proving Theorem 7.11. To do so, we essentially provide a strategy for rewriting any jumbled basic regular word equation E into an equation in Lex Normal Form. The overall structure is similar to that of Theorem 7.5 in the sense that we transform the equation in steps from left to right so that after each step, the prefixes of the LHS and RHS having the desired form are longer. Since each side of the equation stays the same length under the transformations, we eventually reach a state where the entire equation is in the correct form.

The first step in this strategy is to ensure that E is in normal form (which we can do due to Theorem 7.5). We can then decompose E into blocks according to Definition 7.8 (see also Fig. 6). In each subsequent step, we apply transformations which increase the number of blocks satisfying the requirements for Lex Normal Form. In particular, if the first j blocks satisfy the requirements for Lex Normal Form, then we apply a sequence of transformations which either preserve the first j − 1 blocks and turn the jth block into a final block, or preserve the first j blocks and result in an equation which is also in normal form and for which the j + 1th block also satisfies the requirements for Lex Normal Form. Note that Lex Normal Form does not impose any additional constraints on the initial or final blocks, so we can start with j = 1 and we are done whenever we produce a final block.

There are two cases depending on whether the j + 1th block is Type A or Type B. The case that it is Type A is substantially the easier of the two and is considered directly in the proof of Lemma 7.17. Lemmas 7.12-7.16 focus on the case that the block is Type B. In this case, there exist x, y, a, b, c ∈ X and \(\mu _{1},\mu _{1}^{\prime },\mu _{2},\mu _{2}^{\prime } \in X^{*}\) such that our equation may be written as
$$ x \mu_{1} abc \mu_{2} y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} x $$
where \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\) and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), the prefixes xμ1 and \(y \mu _{1}^{\prime }\) constitute the first j blocks (the ones satisfying the requirements for LNF), and such that the j + 1th block, which does not satisfy the requirements for LNF, has the form \((abc\gamma , cba\gamma ^{\prime })\) for prefixes \(\gamma ,\gamma ^{\prime }\) of \(\mu _{2},\mu _{2}^{\prime }\) respectively. Our aim is to transform the equation above into an equation either of the form:
$$ x\mu_{1} \beta y \doteq y \mu_{1}^{\prime} \beta^{R} x$$
in which case the jth block becomes final (and all other blocks are preserved), or of the form:
$$ x \mu_{1} zbw \eta y \doteq y \mu_{1}^{\prime} wbz \eta^{\prime} x $$
where w, z ∈ X and \(\eta ,\eta ^{\prime } \in X^{*}\), such that either \(\eta ^{\prime } = \eta ^R\) (meaning \((zbw \eta y, wbz \eta ^{\prime } x)\) is a final block), or z is lexicographically minimal in \({{\varGamma }}^E_{j+1} = {var}(\mu _{2}) \cup \{a,c\}\).

In the case that \(\eta ^{\prime } = \eta ^R\), then the new equation is in normal form and will have a block decomposition with j + 1 blocks, such that the first j blocks are the same as before, and thus satisfy the requirements for LNF. The j + 1th block is final, and trivially satisfies the requirements for LNF, so the whole equation is in LNF. In the second case, we can apply Lemma 7.6 to further transform our equation into one in normal form without changing the prefixes xμ1zbw and \(y\mu _{1}^{\prime } wbz\). In the resulting block decomposition, the first j blocks will remain unchanged, while the j + 1th block will have the form \((zbw\gamma ,wbz \gamma ^{\prime })\) for some \(\gamma ,\gamma ^{\prime } \in {{{\varGamma }}^E_{j+1}}^{*}\). Since \({{\varGamma }}^E_{j+1}\) will also remain unchanged, z is lexicographically minimal in \({{\varGamma }}^{E^{\prime }}_{j+1}\) for our new equation \(E^{\prime }\), so the j + 1th block also satisfies the requirements for LNF as intended.

The following lemma shows us how, under the rewriting transformation ⊙, we can replace the factors abc and cba with factors dbe and ebd, providing that d, e ∈ X occur in the appropriate positions (namely directly left of y and x on the LHS and RHS respectively).

Lemma 7.12

Let \(E,E^{\prime }\) be basic RWEs given by
$$ \begin{array}{@{}rcl@{}} &&E:\quad x \mu_{1} abc \mu_{2} d \mu_{3} e y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} e \mu_{3}^{\prime} d x\\ &&E^{\prime}\!\!: \quad x \mu_{1} ebd \mu_{3} c \mu_{2} a y \doteq y \mu_{1}^{\prime} dbe \mu_{3}^{\prime} a \mu_{2}^{\prime} c x \end{array} $$

where x, y, a, b, c, d, e ∈ X and \(\mu _{1},\mu _{2},\mu _3, \mu _{1}^{\prime },\mu _{2}^{\prime },\mu _{3}^{\prime } \in X^{*}\). Then \(E \odot ^3 E^{\prime }\).

Proof

It follows from the definitions that:
$$ \begin{array}{@{}rcl@{}} &&\overbrace{x \mu_{1} abc \mu_{2} d \mu_{3} e y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} e \mu_{3}^{\prime} d x}^{E}\\ \xrightarrow{b,e} &&x \mu_{1} a e bc \mu_{2} d \mu_{3} y \doteq y \mu_{1}^{\prime} ce \mu_{3}^{\prime} d ba \mu_{2}^{\prime} x\\ \xrightarrow{c,d} &&x \mu_{1} a e bd \mu_{3} c \mu_{2} y \doteq y \mu_{1}^{\prime} d ba \mu_{2}^{\prime} ce \mu_{3}^{\prime} x\\ \xrightarrow{a,e} &&\underbrace{x \mu_{1} e bd \mu_{3} c \mu_{2} a y \doteq y \mu_{1}^{\prime} d be \mu_{3}^{\prime} a \mu_{2}^{\prime} c x.}_{E^{\prime}} \end{array} $$

Thus the statement follows by Lemma 7.2. □
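The three applications of ⊙ in this proof can be checked mechanically with the shortcut sketch given after Definition 7.1 (the snippet below assumes that sketch is available; instantiating each factor μi, μi′ by a single fresh variable is our choice, made only for illustration):

# E and E' from Lemma 7.12 with mu_i, mu_i' instantiated by m1, m2, m3, n1, n2, n3:
E = (("x", "m1", "a", "b", "c", "m2", "d", "m3", "e", "y"),
     ("y", "n1", "c", "b", "a", "n2", "e", "n3", "d", "x"))
E_prime = (("x", "m1", "e", "b", "d", "m3", "c", "m2", "a", "y"),
           ("y", "n1", "d", "b", "e", "n3", "a", "n2", "c", "x"))
cur = E
for u, v in [("b", "e"), ("c", "d"), ("a", "e")]:
    cur = shortcut(cur, u, v)
assert cur == E_prime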

Of course, the variable e occurring to the left of y on the LHS will in general not be the lexicographically minimal element z of \({{\varGamma }}^E_{j+1}\). In order to take advantage of Lemma 7.12, we also need to find a sequence of transformations which, for any z ∈{c}∪ var(μ2), results in an equation of the form \(x \mu _{1} a^{\prime }bc^{\prime } \eta z y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \eta ^{\prime } x\) with \(a^{\prime },c^{\prime } \in X\) and \(\eta ,\eta ^{\prime } \in X^{*}\). To achieve this, we need Lemmas 7.13 and 7.14 as follows.

Lemma 7.13

Let E be a basic RWE given by \(x \mu _{1} \alpha \mu _{2} y \doteq y \mu _{1}^{\prime } \alpha ^R \mu _{2}^{\prime } x\) with α, μ1, μ2, \(\mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\), 2 ≤|α|≤ 3, |μ2|≥ 1 and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Let v = α[|α|− 1]. Then for each z ∈ var(αμ2)∖{v}, there exist n ≤ 3 and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E \odot ^n x \mu _{1} \eta z y \doteq y \mu _{1}^{\prime } \eta ^{\prime } x\).

Proof

Let z ∈ var(αμ2)∖{v}. If z is a suffix of μ2 then the statement holds trivially. Suppose that z is not a suffix of μ2. We shall consider two cases separately. Firstly, suppose that z ∈ var(μ2) ∪{α[|α|]}. Then there exists w ∈ X such that zw is a factor of αμ2. Moreover, \(w \in {var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), so there exist \(\nu _{1}, \nu _{2},\nu _{1}^{\prime },\nu _{2}^{\prime } \in X^{*}\) such that μ2 = ν1wν2 and \(\mu _{2}^{\prime } = \nu _{1}^{\prime } w \nu _{2}^{\prime }\) where ν1 = ε if z = α[|α|], and ν1[|ν1|] = z otherwise. Furthermore, there exist u ∈ X and \(\alpha ^{\prime } \in X^{*}\) such that \(\alpha = u \alpha ^{\prime }\) and \(\alpha ^{R} = \alpha ^{\prime {R}} u\). Thus we may write E as \(x \mu _{1} u \alpha ^{\prime } \nu _{1} w \nu _{2} y \doteq y \mu _{1}^{\prime } \alpha ^{\prime R} u \nu _{1}^{\prime } w \nu _{2}^{\prime } x\), and thus \(E \xrightarrow {u,w} x \mu _{1} w \nu _{2} u \alpha ^{\prime } \nu _{1} y \doteq y \mu _{1}^{\prime } \alpha ^{\prime R} w \nu _{2}^{\prime } u \nu _{1}^{\prime } x\). Since z is a suffix of \(\alpha ^{\prime }\nu _{1}\), the statement of the lemma follows.

Now suppose that z ∉ var(μ2) ∪{α[|α|]}. Then the only possibility is that |α| = 3 and z = α[1]. In this case, due to the fact that ⊙ is symmetric, the statement follows directly from Lemma 7.12. □

Lemma 7.14

Let E be a basic RWE given by \(x \mu _{1} v \mu _{2} y \doteq y \mu _{1}^{\prime } v \mu _{2}^{\prime } x\) with v ∈ X and \(\mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Then for every z ∈ var(vμ2), there exist \(v^{\prime } \in X\), \(\eta ,\eta ^{\prime } \in X^{*}\) and n ≤ 1 such that \(E \odot ^n x \mu _{1} v^{\prime } \eta z y \doteq y \mu _{1}^{\prime } v^{\prime } \eta ^{\prime } x\).

Proof

Let z ∈ var(vμ2). If z is a suffix of μ2, then the statement holds trivially. Otherwise, there exists w ∈ X such that zw is a factor of vμ2. Moreover, since w ≠ v, \(w \in {var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\), so there exist \(\nu _{1},\nu _{2},\nu _{1}^{\prime },\nu _{2}^{\prime } \in X^{*}\) such that vμ2 = ν1wν2 and \(v\mu _{2}^{\prime }\) = \(\nu _{1}^{\prime } w \nu _{2}^{\prime }\). Thus we may write E as \(x \mu _{1}\nu _{1} w \nu _{2} y \doteq y \mu _{1}^{\prime } \nu _{1}^{\prime } w \nu _{2}^{\prime } x\) such that v is a prefix of ν1 and \(\nu _{1}^{\prime }\), and such that z is a suffix of ν1. Thus, \(E\xrightarrow {v,w} x \mu _{1} w \nu _{2} \nu _{1}y \doteq y \mu _{1}^{\prime } w \nu _{2}^{\prime } \nu _{1}^{\prime } x\), and since z is a suffix of ν1, the statement of the lemma follows. □
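On a minimal instance (μ1 = μ1′ = ε; the concrete variable names are ours), the single application in this proof can again be traced with the shortcut sketch from Definition 7.1:

# E = x v p z q y ≐ y v z q p x; w = q is the variable following z in v mu_2:
E = (("x", "v", "p", "z", "q", "y"), ("y", "v", "z", "q", "p", "x"))
E2 = shortcut(E, "v", "q")
assert E2 == (("x", "q", "v", "p", "z", "y"), ("y", "q", "p", "v", "z", "x"))
assert E2[0][-2] == "z"   # z now sits directly to the left of y on the LHS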

Recall that our strategy for transforming an equation of the form \(x \mu _{1} abc \mu _{2} y \doteq y \mu _{1}^{\prime } cba \mu _{2}^{\prime } x\) into one of the form \(x \mu _{1} zbw \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\) is first to 'move' the lexicographically minimal variable z from \({{\varGamma }}^E_{j+1}\) into the correct position (to the left of y on the LHS) and then to apply Lemma 7.12. We can consider three cases for z separately. The first, that z = a, is trivial, and we do not need to change our original equation at all. The case that z = c is the most involved and is considered in the proof of Lemma 7.16. All other choices of z (namely when z ∈ var(μ2)) are addressed in Lemma 7.15 below.

Note that in the statement of Lemma 7.15, the factors \(\mu _{2},\mu _{2}^{\prime }\) are replaced by μ2δ and \(\mu _{2}^{\prime }\delta ^R\) respectively. We may make this change w.l.o.g. since our equation is in normal form, and since the case that \(\mu _{2} = \mu _{2}^{\prime } = \varepsilon \) is trivial (the jth block will be final in this case). Moreover, if |δ| = 1, then (δ, δ) ∈ΥE, so it follows from the definitions that the equation is not jumbled. Since, in this section, we are only interested in jumbled equations, we may therefore also assume that |δ|≥ 2, which is necessary for the proof of the lemma.

Lemma 7.15

Let E be a basic RWE in normal form given by
$$x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x$$
with a, b, c ∈ X and \(\delta , \mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that |δ|≥ 2, and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Then at least one of the following two statements is true.
  1. There exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\), and β ∈ X+ such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \beta y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \beta ^R x\), or
  2. for every z ∈ var(μ2δ), there exist \(a^{\prime },c^{\prime } \in X\), \(\eta ,\eta ^{\prime } \in X^{*}\), and n ∈ O(|E|2) such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \eta z y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \eta ^{\prime } x\).

Proof

Suppose that the first statement does not hold and notice that this implies |μ2|≥ 1. We shall now prove that the second statement holds. We divide our reasoning into three cases based on the prefixes of μ2 and \(\mu _{2}^{\prime }\). In particular, since E is in normal form, there exists a prefix αi of μ2 such that \(\alpha _i^R\) is a prefix of \(\mu _{2}^{\prime }\) and such that 1 ≤|αi|≤ 3. Firstly suppose that |αi| = 1, or in other words that μ2 and \(\mu _{2}^{\prime }\) have a common prefix v ∈ X. Then the statement follows directly from Lemma 7.14.

It remains to consider the cases that |αi| = 2 and |αi| = 3. Before we consider these cases explicitly, it is convenient to define the following equation \(E^{\prime }\) such that \(E \odot ^{n^{\prime }} E^{\prime }\) for some \(n^{\prime } \in O(|E|)\). In particular, note that there exist u, v ∈ X and \(\delta ^{\prime } \in X^{*}\) such that \(\delta = u \delta ^{\prime } v\). It follows by Lemma 7.12 that there exist \(\nu _{1}, \nu _{1}^{\prime } \in X^{+}\) with \({var}(\nu _{1}) = {var}(\nu _{1}^{\prime })\) such that \(E \odot ^3 x \mu _{1} vbu \nu _{1} y \doteq y \mu _{1}^{\prime } ubv \nu _{1}^{\prime } x\). Moreover, by Lemma 7.6, there exist \(\nu _{2},\nu _{2}^{\prime } \in X^{*}\), β ∈ X+ and \(n^{\prime } \in O(|E|)\) such that \(E \odot ^{n^{\prime }} E^{\prime }\) where \(E^{\prime }\) is given by
$$E^{\prime} : \quad x \mu_{1} vbu \beta \nu_{2} y \doteq y \mu_{1}^{\prime} ubv \beta^{R} \nu_{2}^{\prime} x$$
where 1 ≤|β|≤ 3 (recall, by our assumption that the first statement of the lemma does not hold, that ν2 ≠ ε). Note that since \(E \odot ^{*} E^{\prime }\), we have \(E \Rightarrow ^{*} E^{\prime }\) and thus by Theorem 5.3, \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} = {\varUpsilon }\).

We are now ready to consider the second case, that |αi| = 2. In this case, there exist d, e ∈ X such that αi = de, so de is a prefix of μ2 and ed is a prefix of \(\mu _{2}^{\prime }\). If z ∈ var(μ2δ)∖{d}, then the second statement of the lemma follows directly from Lemma 7.13. Suppose instead that z = d. In this case, we shall show that the (second statement of the) lemma holds for \(E^{\prime }\). Since \(E \odot ^{n^{\prime }} E^{\prime }\), it follows that the lemma also holds for E.

If |β| = 1, then the second statement of the lemma follows from Lemma 7.14 along with the fact that \(E \odot ^{n^{\prime }} E^{\prime }\). Similarly, if |β| ∈ {2,3} and z ≠ β[|β| − 1], the statement follows from Lemma 7.13. Finally, we must consider the case that |β| ∈ {2,3} and z = β[|β| − 1]. If |β| = 2, then there exists \(z^{\prime } \in X\) such that \(\beta = zz^{\prime }\). It follows that \(zz^{\prime }\) is a factor of the LHS of \(E^{\prime }\) and \(vz^{\prime }\) is a factor of the RHS of \(E^{\prime }\), so (z, v) ∈ Υ. Furthermore, by our assumption that z = d, ze = de is a factor of the LHS of E and ae is a factor of the RHS of E, so (z, a) ∈ Υ. However, since a ≠ v, this contradicts Remark 5.2. We can proceed similarly when |β| = 3. In particular, if |β| = 3, then there exist \(z^{\prime },z^{\prime \prime } \in X\) such that \(\beta = z^{\prime }zz^{\prime \prime }\). It follows that (z, v),(u, z) ∈ Υ. Furthermore, since z = d, we also have that (z, a) ∈ Υ. However, since v ≠ a we again get a contradiction to Remark 5.2. Thus d ≠ β[|β| − 1] and we are done with the case that |αi| = 2.

Suppose now that |αi| = 3, meaning there exist d, e, f ∈ X such that αi = def is a prefix of μ2 and fed is a prefix of \(\mu _{2}^{\prime }\). As before, if z ∈ var(μ2δ)∖{e}, the second statement of the lemma follows from Lemma 7.13 (applied to E). Suppose instead that z = e. We shall again proceed by showing that the second statement of the lemma holds for \(E^{\prime }\). If |β| = 1, it follows directly from Lemma 7.14. Similarly, if |β| ∈ {2,3} and z ≠ β[|β| − 1], the statement again follows from Lemma 7.13. Finally, suppose for contradiction that |β| ∈ {2,3} and z = β[|β| − 1]. We again have to consider two cases based on |β|. If |β| = 2, then there exists \(z^{\prime } \in X\) such that \(\beta = zz^{\prime }\). It follows that (z, v) ∈ Υ. Furthermore, since z = e, we also have that (z, a) ∈ Υ, a contradiction to Remark 5.2. Similarly, if |β| = 3, then there exist \(z^{\prime },z^{\prime \prime } \in X\) such that \(\beta = z^{\prime }zz^{\prime \prime }\). It follows that (z, v),(u, z) ∈ Υ. Furthermore, since z = e, we also have that (z, a),(c, z) ∈ Υ. However, since u ≠ c and v ≠ a, we again get a contradiction to Remark 5.2. Thus e ≠ β[|β| − 1] and the statement holds as required. □

We are now ready to prove the following lemma, which is the main technical step in the proof of Theorem 7.11, showing that we can replace the factors abc and cba at the start of the (j + 1)th block (which occur whenever the block is Type B) with factors zbw and wbz where z is any variable from \({{\varGamma }}^E_{j+1}\), and hence that we can do the same for the lexicographically minimal choice of z. This, combined with Lemma 7.6, allows us to transform the equation into one with the (j + 1)th block satisfying the requirements for Lex Normal Form.

It is also worth noting that the variable b, and whether the block is Type A or Type B, remain unchanged (see Section 8 for more information on why we cannot change them). Aside from these parameters, we can essentially produce all other possibilities for the variable in the first position in the block. In other words, we do not use anything about the lexicographic order other than that it permits us to make some well-defined choice at each stage which is consistent across all equations. Consequently, there is a high degree of symmetry in the set of equations in normal form occurring in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Lemma 7.16

Let E be a basic RWE in normal form given by
$$x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x$$
with a, b, c ∈ X and \(\delta , \mu _{1},\mu _{2}, \mu _{1}^{\prime },\mu _{2}^{\prime }\in X^{*}\) such that |δ| ≥ 2 and \({var}(\mu _{2}) = {var}(\mu _{2}^{\prime })\). Let Γ = var(μ2δ) ∪ {a, c}. Then at least one of the following two statements is true.
  1. There exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\), and β ∈ X+ such that \(E \odot ^n x \mu _{1} a^{\prime }bc^{\prime } \beta y \doteq y \mu _{1}^{\prime } c^{\prime }ba^{\prime } \beta ^R x\), or

  2. for each z ∈ Γ, there exist w ∈ X, \(\eta ,\eta ^{\prime } \in X^{*}\), and n ∈ O(|E|2) such that \(E \odot ^n x \mu _{1} zbw \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\).

Proof

Assume that the first statement does not hold and notice that this implies |μ2| ≥ 1. We shall now prove that the second statement holds. The case that z = a is trivial. Next, consider the case that z∉{a, c}. Then z ∈ var(μ2δ). By Lemma 7.15, and by our assumption that Statement 1 of the lemma does not hold, we get that there exist \(a^{\prime },c^{\prime } \in X\), \(\nu , \nu ^{\prime } \in X^{*}\), and \(n^{\prime } \in O(|E|^2)\) such that
$$ E \odot^{n^{\prime}} x \mu_{1} a^{\prime}bc^{\prime} \nu z y \doteq y \mu_{1}^{\prime} c^{\prime}ba^{\prime} \nu^{\prime} x.$$
Since E is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may conclude that \({var}(\nu ^{\prime }) = {var}(\nu z)\). Thus, by Lemma 7.12, there exist \(\eta , \eta ^{\prime } \in X^{*}\) such that
$$ x \mu_{1} a^{\prime}bc^{\prime} \nu z y \doteq y \mu_{1}^{\prime} c^{\prime}ba^{\prime} \nu^{\prime} x \odot^{3} x \mu_{1} zb w \eta y \doteq y \mu_{1}^{\prime} wbz \eta^{\prime} x$$
where \(w = \nu ^{\prime }[|\nu ^{\prime }|]\). Consequently, we have that \(E \odot ^n x \mu _{1} zb w \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\) for some n ∈ O(|E|2) and the second statement holds as claimed.

It remains to consider the case that z = c. Then since |δ| ≥ 2, there exist u, v ∈ X∖{a, b, c} such that \(\delta = u \delta ^{\prime } v\) for some \(\delta ^{\prime } \in X^{*}\). Thus, by Lemma 7.12, there exist \(\nu _{1},\nu _{1}^{\prime } \in X^{+}\) such that \(E \odot ^3 x \mu _{1} vbu \nu _{1} y \doteq y \mu _{1}^{\prime } ubv \nu _{1}^{\prime } x\). Moreover, since E is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may conclude that \({var}(\nu _{1}) = {var}(\nu _{1}^{\prime })\). Thus, by Lemma 7.6, there exist \(\nu _{2},\nu _{2}^{\prime } \in X^{*}\), β ∈ X+ and n1 ∈ O(|E|) such that \(E \odot ^{n_{1}} E^{\prime }\) where \(E^{\prime }\) is given by \(x \mu _{1} vbu \beta \nu _{2} y \doteq y \mu _{1}^{\prime } ubv \beta ^R \nu _{2}^{\prime } x\) and such that 1 ≤ |β| ≤ 3 whenever ν2 ≠ ε. By our assumption that the first statement of the lemma is not true, we must in fact have that ν2 ≠ ε.

Additionally, note that \({var}(\nu _{2}) = {var}(\nu _{2}^{\prime })\) and c ∈ var(βν2). Thus, by Lemma 7.15, along with our assumption that the first statement of the lemma does not hold, it follows that there exist n2 ∈ O(|E|2), \(a^{\prime },c^{\prime }, d\in X\) and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E^{\prime } \odot ^{n_{2}} E^{\prime \prime }\) where \(E^{\prime \prime }\) is given by \(x \mu _{1} a^{\prime }bc^{\prime } \eta c y \doteq y \mu _{1}^{\prime } c^{\prime } b a^{\prime } \eta ^{\prime } dx\). As before, since E (and therefore also \(E^{\prime \prime }\)) is basic and regular, and since \({var}(\mu _{1}) = {var}(\mu _{1}^{\prime })\), we may further conclude that \({var}(\eta c) = {var}(\eta ^{\prime } d)\). Similarly, since E is jumbled and \(E\odot ^{*} E^{\prime \prime }\) (meaning also that \(E \Rightarrow ^{*}E^{\prime \prime }\)), it follows that \(E^{\prime \prime }\) is also jumbled and consequently that d ≠ c. Hence we may write \(E^{\prime \prime }\) as \(x \mu _{1} a^{\prime }bc^{\prime } \eta _{1} d \eta _{2} c y \doteq y \mu _{1}^{\prime } c^{\prime } b a^{\prime } \eta _{1}^{\prime } c \eta _{2}^{\prime } dx\) where \(\eta _{1},\eta _{1}^{\prime },\eta _{2},\eta _{2}^{\prime } \in X^{*}\), and the second statement of the lemma follows from Lemma 7.12. □

Having described the main technical elements of the proof of Theorem 7.11, we are now ready to give the main intuitive statement as to why it holds. This statement also constitutes the main induction step, forming the backbone of the proof.

Lemma 7.17

Let E be a jumbled basic RWE in normal form with block decomposition (B0, B1,…,Bk). Let \(\iota \in \mathbb {N}\) with 0 < ι < k. Then at least one of the following two statements is true.
  1. There exists a (final) block Cι, \(\hat {E} \in [E]_{\Rightarrow }\) and n ∈ O(|E|) such that \(E \odot ^n \hat {E}\) and such that \(\hat {E}\) has a block decomposition (B0, B1,…,Bι−1, Cι), or

  2. there exist blocks Cι, Cι+1,…,Cℓ, \(\hat {E} \in [E]_{\Rightarrow }\) and n ∈ O(|E|2) such that \(E \odot ^n \hat {E}\) and such that \(\hat {E}\) has a block decomposition (B0, B1,…,Bι−1, Cι, Cι+1,…,Cℓ) and such that Cι is lex-minimal.

Proof

Let E be given by
$$x \alpha_{1}\alpha_{2}{\ldots} \alpha_{m} y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} {\alpha_{m}^{R}} x$$
such that x, y ∈ X, αi ∈ X+ for 1 ≤ i ≤ m, and |αi| ≤ 3 for 1 ≤ i < m. Let IE = {i1, i2,…,ik} = {i ∣ 1 ≤ i < m and |αi| ≠ 2} with 1 ≤ i1 < i2 < … < ik < m. If IE = ∅, then the statement holds trivially. Thus we may assume that IE ≠ ∅. Note that the block decomposition \(\mathfrak {B}\) of E is given by (B0, B1,…,Bk) where
$$ \begin{array}{@{}rcl@{}} B_{0} &=& (x \alpha_{1}\alpha_{2} {\ldots} \alpha_{i_{1}-1}, y {\alpha_{1}^{R}}{\alpha_{2}^{R}} {\ldots} \alpha_{i_{1}-1}^{R} )\\ B_{j} &=& (\alpha_{i_{j}} \alpha_{i_{j}+1} \ldots \alpha_{i_{j+1}-1}, \alpha_{i_{j}}^{R} \alpha_{i_{j}+1}^{R} {\ldots} \alpha_{i_{j+1}-1}^{R})\\ B_{k} &=& (\alpha_{i_{k}} \alpha_{i_{k}+1} {\ldots} \alpha_{m} y, \alpha_{i_{k}}^{R} \alpha_{i_{k}+1}^{R} {\ldots} {\alpha_{m}^{R}} x) \end{array} $$

for 0 < j < k.
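As an aside, the decomposition just described is easy to compute mechanically. The following is a minimal illustrative sketch of ours (not code from the paper), under the assumption that an equation in normal form is represented simply by x, the factor list α1,…,αm and y; the RHS is then determined as \(y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{m}^{R}} x\), so only the LHS components of each block are produced.

```python
# Illustrative sketch (hypothetical representation): compute I_E and the LHS
# components of the block decomposition (B_0, ..., B_k) from the factor list.
# Factors are lists of variable names; Python indices are 0-based, so the
# condition "1 <= i < m and |alpha_i| != 2" becomes "i < m - 1 and len != 2".

def block_decomposition(x, alphas, y):
    m = len(alphas)
    I_E = [i for i in range(m - 1) if len(alphas[i]) != 2]
    cuts = [0] + I_E + [m]                    # a new block starts at each index in I_E
    blocks = [alphas[a:b] for a, b in zip(cuts, cuts[1:])]
    blocks[0] = [[x]] + blocks[0]             # B_0 carries the leading variable x
    blocks[-1] = blocks[-1] + [[y]]           # B_k carries the trailing variable y
    return blocks

# Factors of lengths 2, 2, 1, 2, 3, 2: cuts fall at the length-1 and length-3
# factors, giving B_0 = x a_1 a_2, B_1 = a_3 a_4 and B_2 = a_5 a_6 y.
alphas = [list("ab"), list("cd"), list("e"), list("fg"), list("hij"), list("kl")]
for block in block_decomposition("x", alphas, "y"):
    print(block)
```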

Now, let \( \iota \in \mathbb {N}\) with 0 < ι < k. If Bι is lex-minimal, the second statement holds trivially for ℓ = k and Cj = Bj for ι ≤ j ≤ k. Suppose instead that Bι is not lex-minimal. We shall consider the cases that Bι is Type A and Type B separately. Suppose firstly that Bι is Type A. Then \(|\alpha _{i_{\iota }}| = 1\). Thus we can write E as
$$x \mu_{1} v \mu_{2} y \doteq y \mu_{1}^{\prime} v \mu_{2}^{\prime} x$$
where \(v = \alpha _{i_{\iota }} \in X\), \(\mu _{1} = \alpha _{1} \alpha _{2} {\ldots } \alpha _{i_{\iota }-1}\), \(\mu _{1}^{\prime } = \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _{i_{\iota }-1}^R\), \(\mu _{2} = \alpha _{i_{\iota }+1}\alpha _{i_{\iota }+2} {\ldots } \alpha _m\) and \(\mu _{2}^{\prime } = \alpha _{i_{\iota } + 1}^R \alpha _{i_{\iota }+2}^R {\ldots } \alpha _m^R\). Moreover, \({{\varGamma }}^E_{\iota } = {var}(v \mu _{2})\). Let z be the lexicographically minimal element of \({{\varGamma }}^E_{\iota }\). Then by our assumption that Bι is not lex-minimal, we have that z ≠ v. Thus there exist \(\nu _{1},\nu _{2}, \nu _{1}^{\prime }, \nu _{2}^{\prime } \in X^{*}\) such that μ2 = ν1zν2 and \(\mu _{2}^{\prime } = \nu _{1}^{\prime } z \nu _{2}^{\prime }\). Consequently, \(E \xrightarrow {v,z} x \mu _{1} z \nu _{2} v \nu _{1} y \doteq y \mu _{1}^{\prime } z \nu _{2}^{\prime } v \nu _{1}^{\prime } x\) and since \({var}(\mu _{1} z) = {var}(\mu _{1}^{\prime }z)\), by Lemma 7.6, we have that \(E \odot ^n E^{\prime }\) where \(E^{\prime }\) is given by:
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} z \alpha_{i_{\iota}+1}^{\prime} \alpha_{i_{\iota}+2}^{\prime} {\ldots} \alpha_{m^{\prime}}^{\prime} y \doteq y {\alpha_{1}^{R}}{\alpha_{2}^{R}}{\ldots} \alpha_{i_{\iota}-1}^{R} z\alpha_{i_{\iota}+1}^{\prime R} \alpha_{i_{\iota}+2}^{\prime R} {\ldots} \alpha_{m^{\prime}}^{\prime R} x$$
for some n ∈ O(|E|2) and \( \alpha _{i_{\iota }+1}^{\prime },\alpha _{i_{\iota }+2}^{\prime },\ldots , \alpha _{m^{\prime }}^{\prime } \in X^+\) with \(1\leq |\alpha _j^{\prime }| \leq 3\) for \(i_{\iota }+1 \leq j < m^{\prime }\). Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \text { and } |\alpha _i| \not = 2\} \cup \{i_{\iota } \} \cup \{i \mid i_{\iota } < i < m^{\prime } \text { and } |\alpha _i^{\prime }| \not = 2\}\) with \(1\leq i_{1}^{\prime } < i_{2}^{\prime } < {\ldots } < i_{\ell }^{\prime } < m^{\prime }\).

Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_E \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_j = B_j^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since z is minimal in \({{\varGamma }}^E_{\iota } = {{\varGamma }}^{E^{\prime }}_{\iota }\), \(B_{\iota }^{\prime }\) is lex-minimal and the second statement holds.

Now suppose that Bι is Type B. Then \(|\alpha _{i_{\iota }}| = 3\), so there exist a, b, c ∈ X such that \(\alpha _{i_{\iota }} = abc\). Thus we can write E as
$$ x \mu_{1} abc \mu_{2} \delta y \doteq y \mu_{1}^{\prime} cba \mu_{2}^{\prime} \delta^{R} x $$
where \(\mu _{1} = \alpha _{1} \alpha _{2} {\ldots } \alpha _{i_{\iota }-1}\), \(\mu _{1}^{\prime } = \alpha _{1}^R \alpha _{2}^R {\ldots } \alpha _{i_{\iota }-1}^R\), \(\mu _{2} = \alpha _{i_{\iota }+1}\alpha _{i_{\iota }+2} {\ldots } \alpha _{m-1}\), \(\mu _{2}^{\prime } = \alpha _{i_{\iota } + 1}^{R} \alpha _{i_{\iota }+2}^{R} {\ldots } \alpha _{m-1}^{R}\) and δ = αm. Moreover, \({{\varGamma }}_{\iota }^{E} = {var}(\mu _{2}\delta )\cup \{a,c\}\). Let z be the lexicographically minimal element of \({{\varGamma }}_{\iota }^{E}\). Then by our assumption that Bι is not lex-minimal, z ≠ a. Moreover, since E is jumbled, we may conclude that |δ| ≠ 1 (otherwise we would have (δ, δ) ∈ ΥE, a contradiction).
By Lemma 7.16, we have two cases. The first is that there exist n ∈ O(|E|), \(a^{\prime },c^{\prime } \in X\) and β ∈ X+ such that \(E \odot ^{n} E^{\prime }\) where \(E^{\prime }\) is given by
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} a^{\prime}bc^{\prime} \beta y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} \alpha_{i_{\iota}-1}^{R} c^{\prime}ba^{\prime} \beta^{R} x. $$
Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \text { and } |\alpha _{i}| \not = 2\} \cup \{i_{\iota } \}\). Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_{E} \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_{j} = B_{j}^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since \(I_{E^{\prime }}\) does not contain any elements greater than iι, \(B_{\iota }^{\prime }\) is the final block, so the first statement holds for \(C_{\iota } = B_{\iota }^{\prime }\).
The second case is that there exist \(n^{\prime }\in O(|E|^{2})\), w ∈ X and \(\eta ,\eta ^{\prime } \in X^{*}\) such that \(E \odot ^{n^{\prime }} x \mu _{1} z b w \eta y \doteq y \mu _{1}^{\prime } wbz \eta ^{\prime } x\). By Lemma 7.6, there exist \(n^{\prime \prime }\in O(|E|^{2})\) and \(\alpha _{i_{\iota }+1}^{\prime },\alpha _{i_{\iota }+2}^{\prime }\), \(\ldots , \alpha _{m^{\prime }}^{\prime } \in X^{+}\) with \(|\alpha _{j}^{\prime }|\leq 3\) for \(i_{\iota } < j < m^{\prime }\) such that \(E \odot ^{n^{\prime \prime }} E^{\prime }\) where \(E^{\prime }\) is given by
$$ x \alpha_{1}\alpha_{2}{\ldots} \alpha_{i_{\iota}-1} zbw \alpha_{i_{\iota}+1}^{\prime} \alpha_{i_{\iota}+2}^{\prime} {\ldots} \alpha_{m^{\prime}}^{\prime} y \doteq y {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} \alpha_{i_{\iota}-1}^{R} wbz\alpha_{i_{\iota}+1}^{\prime R} \alpha_{i_{\iota}+2}^{\prime R} {\ldots} \alpha_{m^{\prime}}^{\prime R} x.$$
Let \(I_{E^{\prime }} = \{i_{1}^{\prime },i_{2}^{\prime },\ldots ,i_{\ell }^{\prime } \} = \{i \mid 1\leq i < i_{\iota } \) and \( |\alpha _{i}| \not = 2\} \cup \{i_{\iota } \} \cup \{i \mid i_{\iota }+1 \leq i < m^{\prime } \text { and } |\alpha _{i}^{\prime }| \not = 2\}\) with \(1\leq i_{1}^{\prime } < i_{2}^{\prime } < {\ldots } < i_{\ell }^{\prime } < m^{\prime }\). Let \(\mathfrak {B}^{\prime } = (B_{0}^{\prime },B_{1}^{\prime },\ldots , B_{\ell }^{\prime })\) be the block decomposition of \(E^{\prime }\). Then since \(I_{E} \cap \{1,2,\ldots ,i_{\iota }\} = I_{E^{\prime }} \cap \{1,2,\ldots , i_{\iota }\}\), we have \(B_{j} = B_{j}^{\prime }\) for 0 ≤ j ≤ ι − 1. Moreover, since z is minimal in \({{\varGamma }}_{\iota }^{E} = {{\varGamma }}_{\iota }^{E^{\prime }}\), \(B_{\iota }^{\prime }\) is lex-minimal and the second statement of the lemma holds for \(C_{j} = B_{j}^{\prime }\) for ι ≤ j ≤ ℓ. □

Finally, for the sake of completeness, we provide a formal summary of the proof of Theorem 7.11 based on Lemma 7.17, using the arguments which have so far been described informally.

Proof (of Theorem 7.11)

Let E be a jumbled basic RWE. By Theorem 7.5, we may assume that E is in normal form. Let \(\mathfrak {B} = (B_{0},B_{1},\ldots ,B_{k})\) be its block decomposition. If Bi is lex-minimal for 0 < i < k, then E is in LNF and we are done (this also covers the case that k ≤ 1). Otherwise, suppose that k > 1 and let \(\iota = \min \limits _{0<j<k}\{ j\mid B_{j} \text { is not lex-minimal}\}\). Then by Lemma 7.17, we have two possibilities. Either:
  1. there exists a block Cι, n ∈ O(|E|) and \(\hat {E}\) such that \(E \odot ^{n} \hat {E}\) and \(\hat {E}\) has the block decomposition (B0, B1,…,Bι−1, Cι), or

  2. there exist blocks Cι, Cι+1,…,Cℓ, n ∈ O(|E|2) and \(\hat {E}\) such that \(E \odot ^{n} \hat {E}\) and such that \(\hat {E}\) has the block decomposition (B0, B1,…,Bι−1, Cι, Cι+1,…,Cℓ) and such that Cι is lex-minimal.
In the first case, by definition of ι, Bj is lex-minimal for 0 < j < ι, meaning \(\hat {E}\) is in LNF and we are done. In the second case, we have an equation \(\hat {E}\) such that \(E \odot ^{n^{\prime }} \hat {E}\) where \(n^{\prime } \in O(|E|^{2})\) and such that the block decomposition of \(\hat {E}\) has a longer initial sequence of lex-minimal blocks than the block decomposition of E.

Furthermore, it follows from the definitions that any block decomposition cannot have more blocks than the number of variables occurring in the equation. Recall that the set of variables occurring in an equation is invariant under ⇒ (and therefore also ⊙). Thus with at most O(|E|) applications of Lemma 7.17, we may conclude that \(E\odot ^{n^{\prime \prime }} E^{\prime }\) for an equation \(E^{\prime }\) with block decomposition \((B_{0}^{\prime },B_{1}^{\prime },B_{2}^{\prime }, \ldots , B^{\prime }_{k^{\prime }})\) such that \(B_{j}^{\prime }\) is lex-minimal for \(0 < j < k^{\prime }\) (meaning \(E^{\prime }\) is in LNF) and such that \(n^{\prime \prime } \in O(|E|^{3})\). It follows directly from the definitions that ⊙ is symmetric, and therefore we also have \(E^{\prime } \odot ^{n^{\prime \prime }} E\). By Corollary 7.3, we may therefore conclude that \(E^{\prime } \Rightarrow ^{n_{1}} E\) and \(E \Rightarrow ^{n_{2}} E^{\prime }\) for some n1, n2 ∈ O(|E|4). □

8 Diameter

It was mentioned in the previous section that the choices for the blocks in a block decomposition of an equation in normal form are restricted by the invariant ΥE. We shall now make full use of that fact to show that the number of equations in Lex Normal Form in a single graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is bounded by a polynomial in |E| (Theorem 8.10), and as a consequence that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is also bounded by a polynomial in |E| (Theorem 8.11). By combining this result with Theorems 6.8 and 4.8, we can extend it from jumbled basic regular word equations to all regular word equations. Consequently, we can conclude that satisfiability of regular word equations is NP-complete (Theorem 8.12).

Since each equation in Lex Normal Form has a unique block decomposition, it is sufficient to count the possible block decompositions satisfying the conditions for Lex Normal Form for a given value of ΥE. We shall focus on conditions which force two blocks to be the same. We shall consider the cases of initial, standard and final blocks separately, but first we need the following lemmas, which take advantage of the invariant ΥE in order to limit the equations in normal form occurring in a single equivalence class \([E]_{\Rightarrow }\).

The first of these lemmas and the resulting corollary provide some intuition behind the definition of the block decomposition, and as to why the blocks are often fixed by the invariant ΥE (along with the leftmost variable which, aside from exceptional cases, is fixed by Lex Normal Form). Essentially, they show that the length-two factors αi (and thus \({\alpha _{i}^{R}}\)) occurring as per the definition of normal form are fixed exactly by the variables preceding them along with the invariant ΥE.
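Before stating the lemma, the following minimal sketch (an illustration of ours, not the paper's formal definition) computes the pairs that drive the arguments below: following the reading used in the proof of Lemma 8.1 and stated explicitly near the end of the proof of Lemma 9.2, (u, v) is collected whenever some variable z has uz as a factor of the LHS and vz as a factor of the RHS. The full definition of ΥE in Section 5 may contribute further pairs (for instance, the boundary pairs used in the proof of Lemma 8.8), which this sketch omits.

```python
# Illustrative sketch only: collect the 'common successor' pairs (u, v) such
# that uz is a factor of the LHS and vz is a factor of the RHS for some z.
# Sides are lists of variable names; in a regular equation each variable
# occurs at most once per side, so the predecessor maps are well defined.

def successor_pairs(lhs, rhs):
    pred_l = {lhs[i + 1]: lhs[i] for i in range(len(lhs) - 1)}
    pred_r = {rhs[i + 1]: rhs[i] for i in range(len(rhs) - 1)}
    return {(pred_l[z], pred_r[z]) for z in pred_l.keys() & pred_r.keys()}

# E.g. for x u a b y = y v b a x (the shape of E_1 in Lemma 8.1), the common
# successors are a and b, yielding exactly the pairs (u, b) and (a, v)
# noted in the proof of Lemma 8.1.
print(successor_pairs(list("xuaby"), list("yvbax")))  # {('u', 'b'), ('a', 'v')} (order may vary)
```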

Lemma 8.1

Let u, v, a, b ∈ X and let \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta _{1}^{\prime }, \beta _{2}^{\prime },\gamma \in X^{*}\) such that 1 ≤ |γ| ≤ 3. Let E1 and E2 be jumbled basic RWEs given by
$$ \begin{array}{@{}rcl@{}} && E_{1}: \quad \alpha_{1} u a b \alpha_{2} \doteq \beta_{1} v b a \beta_{2}\\ && E_{2}: \quad \alpha_{1}^{\prime} u \gamma \alpha_{2}^{\prime} \doteq \beta_{1}^{\prime} v \gamma^{R} \beta_{2}^{\prime}. \end{array} $$

If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) then γ = ab.

Proof

Let γ = c1c2…cn with ci ∈ X, 1 ≤ i ≤ n. Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that (a, v),(u, b) ∈ Υ. If |γ| = 1, then (u, v) ∈ Υ, which by Remark 5.2 implies a = u, a contradiction to the assumption that E1 is regular. Similarly, if |γ| = 3, then (c2, v),(u, c2) ∈ Υ, which by Remark 5.2 implies c2 = a = b, again a contradiction to the assumption that E1 is regular. Thus, it follows that |γ| = 2. In this case, we have that (c1, v),(u, c2) ∈ Υ. By Remark 5.2, it follows that c1 = a and c2 = b, so γ = ab as required. □

Corollary 8.2

Let \(k\in \mathbb {N}\). For 1 ≤ i ≤ 4 and 1 ≤ j ≤ k, let \(\mu _{i},\mu _{i}^{\prime }, \alpha _{j}, \beta _{j} \in X^{*}\) such that |αj| = |βj| = 2. Let E1 and E2 be the jumbled basic RWEs given by
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad \mu_{1} u \alpha_{1} \alpha_{2} {\ldots} \alpha_{k} \mu_{2} \doteq \mu_{3} v {\alpha_{1}^{R}} {\alpha_{2}^{R}}{\ldots} {\alpha_{k}^{R}} \mu_{4}\\ &&E_{2}: \quad \mu_{1}^{\prime} u \beta_{1} \beta_{2} {\ldots} \beta_{k} \mu_{2}^{\prime} \doteq \mu_{3}^{\prime} v {\beta_{1}^{R}} {\beta_{2}^{R}}{\ldots} {\beta_{k}^{R}} \mu_{4}^{\prime}. \end{array} $$

Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\). Then αj = βj for 1 ≤ j ≤ k.

Any initial block has the form \((x\alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, y{\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) where x, y ∈ X and αj ∈ X+ with |αj| = 2 for 1 ≤ j ≤ i. Since x, y are fixed by ΥE, it follows from Corollary 8.2 that all the αj factors, for 1 ≤ j ≤ i, are fixed exactly by the invariant ΥE. With a little additional effort, we can conclude the slightly more general statement that initial blocks occurring in the block decomposition of some equation E in normal form are fixed exactly by ΥE. Recall from the definitions that in a block decomposition (B0, B1,…,Bk) of an equation in normal form, B0 will be an initial block provided k ≥ 1 (if k = 0 then B0 = Bk will be a final block).

Lemma 8.3

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be the block decompositions of E1 and E2 respectively. Suppose that k, ℓ ≥ 1. Then B0 = C0.

Proof

Since E1 is in normal form, we may write it as \(x \alpha _{1} \alpha _{2}{\ldots } \alpha _{n} y \doteq y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{n}^{R}} x\) with x, y ∈ X and αi ∈ X+ for 1 ≤ i ≤ n such that |αi| ≤ 3 for 1 ≤ i < n. Similarly, we may write E2 as \(x^{\prime } \alpha _{1}^{\prime } \alpha _{2}^{\prime }{\ldots } \alpha _{m}^{\prime } y^{\prime } \doteq y^{\prime } \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{m}^{\prime R} x^{\prime }\) with \(x^{\prime },y^{\prime } \in X\) and \(\alpha _{i}^{\prime } \in X^{+}\) for 1 ≤ i ≤ m such that \(|\alpha _{i}^{\prime }| \leq 3\) for 1 ≤ i < m. Suppose that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\) and note that this implies var(E1) = var(E2). Similarly, it is easily verified (either from the definition of ⇒, or from Remark 5.2) that \(x = x^{\prime }\) and \(y = y^{\prime }\).

Since k, ℓ ≥ 1, there must exist \(p = \min \limits \{ i \mid 1\leq i < n \text { and } |\alpha _{i}| \not = 2\}\) and \(q = \min \limits \{ i \mid 1\leq i < m \text { and } |\alpha _{i}^{\prime }| \not = 2 \}\). It follows that \(B_{0} = (x\alpha _{1}\alpha _{2}{\ldots } \alpha _{p-1}, y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } \alpha _{p-1}^{R})\) and \(C_{0} = (x\alpha _{1}^{\prime }\alpha _{2}^{\prime }{\ldots } \alpha _{q-1}^{\prime }, y \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{q-1}^{\prime R})\). By Corollary 8.2, it follows that \(\alpha _{i} = \alpha ^{\prime }_{i}\) for \(1\leq i <\min \limits \{p,q\}\).

Suppose for contradiction that p ≠ q. W.l.o.g. suppose that p > q. Then we may write E1 and E2 as \(\mu _{1} u ab \mu _{2} \doteq \mu _{3} v ba \mu _{4}\) and \(\mu _{1}^{\prime } u \gamma \mu _{2}^{\prime } \doteq \mu _{3}^{\prime } v \gamma ^{R} \mu _{4}^{\prime }\) respectively, where μ1, μ2, μ3, μ4, \(\mu _{1}^{\prime }\), \(\mu _{2}^{\prime }\), \(\mu _{3}^{\prime }\), \(\mu _{4}^{\prime },\gamma \in X^{*}\), u, v, a, b ∈ X, and |γ| ∈ {1,3} (in particular, this is true for ab = αq and \(\gamma = \alpha ^{\prime }_{q}\)). However in this case, it follows from Lemma 8.1 that \({\varUpsilon }_{\!E_{1}} \not = {\varUpsilon }_{\!E_{2}}\), a contradiction. Thus we must have that p = q, and the fact that B0 = C0 follows immediately. □

Similarly to initial blocks, we can use Corollary 8.2 to restrict standard blocks which are Type A. These blocks will have the form \((z \alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, z {\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) where z ∈ X and αj ∈ X+ with |αj| = 2 for 1 ≤ j ≤ i. Hence the factors αj, 1 ≤ j ≤ i, are fixed completely by ΥE and z. For Type B blocks, which instead have the form \((abc \alpha _{1}\alpha _{2} {\ldots } \alpha _{i}, cba {\alpha _{1}^{R}}{\alpha _{2}^{R}} {\ldots } {\alpha _{i}^{R}})\) with a, b, c ∈ X, we need the following additional observation.

Lemma 8.4

Let u, v, a, b, c ∈ X and let \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime }, \beta _{1}^{\prime }, \beta _{2}^{\prime }, \gamma \in X^{*}\) such that 1 ≤ |γ| ≤ 3. Let E1 and E2 be the basic regular word equations given by
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad \alpha_{1} u a b c \alpha_{2} \doteq \beta_{1} v c b a \beta_{2}\\ && E_{2}: \quad \alpha_{1}^{\prime} u \gamma \alpha_{2}^{\prime} \doteq \beta_{1}^{\prime} v \gamma^{R} \beta_{2}^{\prime}. \end{array} $$

If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) then there exist \(a^{\prime },c^{\prime } \in X\) such that \(\gamma = a^{\prime }bc^{\prime }\). Moreover, if \(a^{\prime } = a\), then \(c^{\prime } = c\).

Proof

Let γ = e1e2…en with ei ∈ X, 1 ≤ i ≤ n. Suppose that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that (u, b),(a, c),(b, v) ∈ Υ. If |γ| = 1, then (u, v) ∈ Υ, and by Remark 5.2 we have that u = b, a contradiction to the assumption that E1 is regular. Thus we assume n ≥ 2. Then (u, e2),(en−1, v) ∈ Υ. Hence, we have e2 = en−1 = b, and since E2 is regular, this implies that n = 3, so the statement holds with \(a^{\prime } = e_{1}\), \(c^{\prime } = e_{3}\). Finally, we note that since \((a^{\prime },c^{\prime }) \in {\varUpsilon }\), by Remark 5.2, if \(a= a^{\prime }\) then \(c= c^{\prime }\) as claimed. □

In what follows we shall show that for two jumbled basic regular equations E1, E2 in Lex Normal Form with \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) and block decompositions of the same length, all blocks except the final blocks must be identical (Corollary 8.7). We have already shown in Lemma 8.3 that this is true for the initial blocks. The next step is to show that if the previous blocks in both block decompositions are identical, then the next blocks will have the same type.

Lemma 8.5

Let E1, E2 be jumbled basic regular word equations in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be block decompositions of E1 and E2 respectively. Suppose that \(i,j \in \mathbb {N}_{0}\) with i < k and j < ℓ such that Bi = Cj. Then Bi+1 and Cj+1 have the same type.

Proof

Since there are two types, it is sufficient to prove that Bi+1 is Type B if and only if Cj+1 is Type B. Suppose that Bi+1 is Type B and suppose for contradiction that Cj+1 is Type A. Then there exist γ1, γ2, γ3, γ4 ∈ X*, and a, b, c, d ∈ X such that Bi+1 = (abcγ1, cbaγ2) and Cj+1 = (dγ3, dγ4). Note that there exist u, v ∈ X such that Bi = Cj = (δ1u, δ2v) where δ1, δ2 ∈ X*. Hence there exist \(\alpha _{1},\alpha _{2},\beta _{1},\beta _{2},\alpha _{1}^{\prime },\alpha _{2}^{\prime },\beta _{1}^{\prime },\beta _{2}^{\prime } \in X^{*}\) such that E1 may be written as \(\alpha _{1}u abc \alpha _{2} \doteq \beta _{1} v cba \beta _{2}\) and E2 may be written as \(\alpha _{1}^{\prime } u d \alpha _{2}^{\prime } \doteq \beta _{1}^{\prime } v d \beta _{2}^{\prime }\). However, by Lemma 8.4, this implies \({\varUpsilon }_{E_{1}} \not = {\varUpsilon }_{E_{2}}\), a contradiction. Consequently, Cj+1 is Type B if Bi+1 is Type B. The proof that Bi+1 is Type B if Cj+1 is Type B is symmetric and can be obtained by simply swapping E1 and E2. □

We are now ready to show that standard blocks in a block decomposition are fixed entirely by the preceding block, the invariant ΥE, and the leftmost letter of the block. This is the primary motivation for the definition of Lex Normal Form, which restricts the choice for the leftmost letter of the block where possible, and thus restricts the possibilities for the standard blocks. In particular, it follows directly by a straightforward induction that for two jumbled basic RWEs in Lex Normal Form with the same invariant ΥE, if their block decompositions have the same length, then all but the final blocks will be identical.

Lemma 8.6

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions and let k, ℓ > 0. Suppose that Bi = Cj, for some i < k − 1, j < ℓ − 1. Let Bi+1 = (γ1, γ2) and Cj+1 = (δ1, δ2) with γ1, γ2, δ1, δ2 ∈ X+. If γ1[1] = δ1[1], then Bi+1 = Cj+1.

Proof

Note that since 0 < i + 1 < k and 0 < j + 1 < ℓ, the blocks Bi+1 and Cj+1 are both standard blocks. Note also that by Lemma 8.5, Bi+1 and Cj+1 have the same type. Hence, by definition, there exist α1, α2,…,αn, β1, β2,…,βm ∈ X+ such that \(B_{i+1} = (\alpha _{1}\alpha _{2}\ldots \alpha _{n}, {\alpha _{1}^{R}}{\alpha _{2}^{R}}{\ldots } {\alpha _{n}^{R}})\) and \(C_{j+1} = (\beta _{1}\beta _{2}\ldots \beta _{m}, {\beta _{1}^{R}}{\beta _{2}^{R}}\ldots {\beta _{m}^{R}})\), where |α1| = |β1| ∈ {1,3} and |αp|,|βq| = 2 for 2 ≤ p ≤ n and 2 ≤ q ≤ m. Since Bi = Cj, there exist u, v ∈ X and \(\mu _{1},\mu _{2},\nu _{1},\nu _{2},\mu _{1}^{\prime },\mu _{2}^{\prime },\nu _{1}^{\prime },\nu _{2}^{\prime }, \eta ,\eta ^{\prime } \in X^{*}\) with \(|\eta |, |\eta ^{\prime }| \in \{1,3\}\) and such that E1 is given by \(\mu _{1} u \alpha _{1} \alpha _{2} {\ldots } \alpha _{n} \eta \mu _{2} \doteq \nu _{1} v {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{n}^{R}} \eta ^{R} \nu _{2}\) and E2 is given by \(\mu _{1}^{\prime } u \beta _{1} \beta _{2} {\ldots } \beta _{m} \eta ^{\prime } \mu _{2}^{\prime } \doteq \nu _{1}^{\prime } v {\beta _{1}^{R}} {\beta _{2}^{R}} {\ldots } {\beta _{m}^{R}} \eta ^{\prime R} \nu _{2}^{\prime }\).

By the assumption that γ1[1] = δ1[1], we have that α1[1] = β1[1], meaning that if |α1| = |β1| = 1 then α1 = β1 holds trivially. Similarly, if |α1| = |β1| = 3, then it follows from Lemma 8.4 that α1 = β1. In both cases, it follows from Corollary 8.2 that, additionally, αp = βp for \(2\leq p \leq \min \limits \{n,m\}\). It follows from Lemma 8.1 that n = m. Hence we have Bi+1 = Cj+1 as required. □

Note that if the first i blocks are identical in the block decompositions of two jumbled basic RWEs in Lex Normal Form with the same invariant set ΥE, it follows that the set \({{\varGamma }}^{E}_{i+1}\) is also the same in both cases. Consequently, by definition of Lex Normal Form, if the (i + 1)th blocks are not final blocks, the leftmost variable will be the same in each case (namely the lexicographically minimal element of \({{\varGamma }}^{E}_{i+1}\)). Consequently, by Lemma 8.6, the (i + 1)th blocks will also be identical. By a simple induction, we can thus conclude the following.

Corollary 8.7

Let E1, E2 be jumbled basic RWEs in Lex Normal Form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions and suppose that k, ℓ > 0. Then Bi = Ci for \(0 \leq i < \min \limits (k,\ell )\).

Consequently, two equations in Lex Normal Form in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) with block decompositions containing the same number of blocks may differ only in the final block. Clearly, the number of blocks in a block decomposition is at most Card(var(E)). Thus, in order to bound the number of equations in Lex Normal Form in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), it suffices to count the possibilities for the final block.

Recall from the definition of normal form that the last (rightmost) αi factor is the only one which may have length greater than 3. Consequently, we need a counterpart to Lemmas 8.1 and 8.4 for this case, given by the following.

Lemma 8.8

Let \(u,v,x,y,x^{\prime },y^{\prime } \in X\) and let \(\alpha ,\beta ,\alpha ^{\prime }, \beta ^{\prime }, \gamma ,\gamma ^{\prime } \in X^{*}\) such that |γ|,|γ′| ≥ 1. Let E1 and E2 be the basic regular word equations given by \( x \alpha u \gamma y \doteq y \beta v \gamma ^{R} x\) and \( x^{\prime } \alpha ^{\prime } u \gamma ^{\prime } y^{\prime } \doteq y^{\prime } \beta ^{\prime } v \gamma ^{\prime R} x^{\prime }\) respectively. If \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) and \(\gamma [1] = \gamma ^{\prime }[1]\) then \(\gamma = \gamma ^{\prime }\).

Proof

Let z1, z2,…,zn, w1, w2,…,wm ∈ X be variables such that γ = z1z2…zn and \(\gamma ^{\prime } = w_{1}w_{2}{\ldots } w_{m}\) and suppose that z1 = w1. Suppose also that \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}} = {\varUpsilon }\). Note that for \(1\leq i \leq \min \limits \{n,m\}-2\), we have (zi, zi+2),(wi, wi+2) ∈ Υ. Moreover, if n, m ≥ 2, we also have that (u, z2),(u, w2) ∈ Υ. Consequently, by Remark 5.2, we have that wi = zi for \(1\leq i \leq \min \limits \{n,m\}\). If n = m we are done. Otherwise, suppose that n ≠ m, and note in particular that since E1, E2 are regular, this implies zn ≠ wm. However, (zn, z1),(wm, w1) ∈ Υ, and since w1 = z1, by Remark 5.2 we have that zn = wm, a contradiction. Thus we must have n = m and \(\gamma = \gamma ^{\prime }\) as claimed. □

The following lemma establishes conditions under which two final blocks must be identical, forming the basis for our bound on the number of possible final blocks in a block decomposition of an equation in Lex Normal Form, and consequently, a bound on the number of equations in Lex Normal Form itself.

Lemma 8.9

Let E1, E2 be jumbled basic RWEs in normal form such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}}\). Let (B0, B1,…,Bk) and (C0, C1,…,Cℓ) be their respective block decompositions. Suppose that k, ℓ > 0 and that Bk−1 = Cℓ−1. Let \(B_{k} = (\alpha _{1}\alpha _{2}\ldots \alpha _{n} y, {\alpha _{1}^{R}}{\alpha _{2}^{R}}\ldots {\alpha _{n}^{R}} x)\) and \(C_{\ell } = (\beta _{1}\beta _{2}\ldots \beta _{m} y, {\beta _{1}^{R}}{\beta _{2}^{R}}\ldots {\beta _{m}^{R}}x)\), where x, y ∈ X, α1, α2,…,αn, β1, β2,…,βm ∈ X+, |α1| = |β1| ∈ {1,3} and |αi|,|βj| = 2 for 2 ≤ i < n and 2 ≤ j < m. Then if α1[1] = β1[1], n = m, and αn[1] = βm[1], we have Bk = Cℓ.

Proof

Suppose that all the conditions of the lemma are met. Note that Bk and Cℓ are both end blocks. Note also that by Lemma 8.5, Bk and Cℓ have the same type.

Since Bk−1 = Cℓ−1, there exist u, v ∈ X and \(\mu _{1},\mu _{2},\mu _{1}^{\prime },\mu _{2}^{\prime } \in X^{*}\) such that E1 and E2 are given by:
$$ \begin{array}{@{}rcl@{}} &&E_{1}: \quad x \mu_{1} u \alpha_{1} \alpha_{2} {\ldots} \alpha_{n} y \doteq y \mu_{2} v {\alpha_{1}^{R}} {\alpha_{2}^{R}} {\ldots} {\alpha_{n}^{R}} x\\ &&E_{2}: \quad x \mu_{1}^{\prime} u \beta_{1} \beta_{2} {\ldots} \beta_{n} y \doteq y \mu_{2}^{\prime}v {\beta_{1}^{R}} {\beta_{2}^{R}} {\ldots} {\beta_{n}^{R}} x. \end{array} $$

By the assumption that α1[1] = β1[1], we have that if |α1| = |β1| = 1 then trivially α1 = β1, and if |α1| = |β1| = 3, then α1 = β1 by Lemma 8.4. In both cases, it follows from Corollary 8.2 that αi = βi for \(1\leq i < \min \limits \{n,m\}\). It follows from Lemma 8.1 that n = m, and from Lemma 8.8 that αn = βm. Consequently, we have Bk = Cℓ as claimed. □

Lemma 8.9 reveals that the options for the last block depend only on the choices of three parameters: α1[1], αn[1], and n. Since each of these can take at most |E| possible values, there are |E|3 possibilities altogether. Thus for each possible number of blocks, there are at most |E|3 possible block decompositions, and therefore only |E|4 possible block decompositions respecting the invariant ΥE in total. Since every equation in Lex Normal Form permits a unique block decomposition, this gives us our desired polynomial bound, as the schematic count below summarises.
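Schematically, with each parameter bounded crudely by |E|:
$$ \underbrace{|E|}_{\text{number of blocks}} \cdot \underbrace{|E|}_{\text{choices for } \alpha_{1}[1]} \cdot \underbrace{|E|}_{\text{choices for } \alpha_{n}[1]} \cdot \underbrace{|E|}_{\text{choices for } n} = |E|^{4}. $$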

Theorem 8.10

Let E be a jumbled basic RWE. Let S be the set of basic regular equations \(E^{\prime }\) in Lex Normal Form for which \({\varUpsilon }_{E} = {\varUpsilon }_{E^{\prime }}\). Then Card(S) ≤|E|4.

Proof

We shall count possible block decompositions of equations \(E^{\prime }\) for which \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{E} = {\varUpsilon }\). Since the block decomposition uniquely determines the equation, this count is an upper bound on the number of equations in S. Note that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{E}\) implies \({var}(E^{\prime }) = {var}(E)\).

It is straightforward from the definitions that any block decomposition of an equation \(E^{\prime }\) can have at most \({\text {Card}}({var}(E^{\prime })) = {\text {Card}}({var}(E)) < |E|\) blocks, so it is sufficient to count how many block decompositions with exactly N blocks are possible for each N ≤Card(var(E)).

We start with the case that the block decomposition consists of exactly one block (N = 1). Suppose we have two basic regular word equations E1, E2 in Lex Normal Form, such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\) (and so additionally var(E1) = var(E2) = var(E)). Suppose that (B0) and (C0) are the block decompositions of E1 and E2 respectively. By definition B0 = E1 and C0 = E2. It follows that \(B_{0} = (x\alpha _{1} \alpha _{2} {\ldots } \alpha _{o} y, y {\alpha _{1}^{R}} {\alpha _{2}^{R}} {\ldots } {\alpha _{o}^{R}} x)\) and \(C_{0} = (x^{\prime } \alpha _{1}^{\prime } \alpha _{2}^{\prime } {\ldots } \alpha _{m}^{\prime } y^{\prime }, y^{\prime } \alpha _{1}^{\prime R} \alpha _{2}^{\prime R} {\ldots } \alpha _{m}^{\prime R} x^{\prime } )\) where \(x,x^{\prime },y,y^{\prime } \in X\) and \(\alpha _{i},\alpha _{j}^{\prime } \in X^{+}\) for 1 ≤ i ≤ o, 1 ≤ j ≤ m and such that \(|\alpha _{i}|,|\alpha _{j}^{\prime }| = 2\) for 1 ≤ i < o and 1 ≤ j < m. It is easily verified (either from the definition of ⇒, or from Remark 5.2) that \(x = x^{\prime }\) and \(y = y^{\prime }\). Moreover, we clearly must have o, m < Card(var(E)). Now suppose that o = m. Then by Corollary 8.2, we may conclude that \(\alpha _{i} = \alpha _{i}^{\prime }\) for 1 ≤ i < o. Similarly, it follows from Lemma 8.9 that \(\alpha _{o} = \alpha ^{\prime }_{o}\), and thus B0 = C0. Hence, for each possible value of o, there is at most one possible block decomposition, meaning there are fewer than Card(var(E)) < |E| possible block decompositions containing only one block.

Now consider the cases that there is more than one block in the block decomposition (1 < N ≤ Card(var(E))). Suppose we have two basic regular word equations E1, E2 in Lex Normal Form, such that \({\varUpsilon }_{E_{1}} = {\varUpsilon }_{E_{2}} = {\varUpsilon }\). Suppose that (B0, B1, B2,…,Bn) and (C0, C1,…,Cn) are the block decompositions of E1 and E2 respectively, and that they have the same number of blocks 1 < n ≤ Card(var(E)). By Corollary 8.7, we have that Bi = Ci for 0 ≤ i ≤ n − 1. By Lemma 8.9, there are at most |E|3 possibilities for the end block Cn. Hence there are at most |E|3 block decompositions overall with exactly n blocks for 1 < n ≤ |E|, and therefore at most |E|4 possible block decompositions in total, and the statement of the theorem follows. □

For a jumbled basic RWE E, since every vertex in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is at a small (i.e. bounded by a polynomial in |E|) distance from a vertex in Lex Normal Form, and since there are only a small number of such vertices, it is straightforward to show that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) must also be small: indeed if we have a sufficiently long path between two vertices, then we must have a long path between two vertices which are close to the same vertex in Lex Normal Form. Since they are close to the same vertex, we can find a shortcut between them, and the initial long path is not minimal. Knowing that the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is bounded by a polynomial in |E| when E is jumbled and basic, it follows from Theorems 6.8 and 4.8 (see also Remark 4.6) and Proposition 3.5 that the diameter of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is bounded by a polynomial in |E| whenever E is regular.
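Schematically (with constant factors suppressed): writing d for the maximal distance from a vertex to its nearest equation in Lex Normal Form, a shortest path longer than 2d ⋅ Card(S) would, by the pigeonhole principle, pass through two vertices associated to the same Lex Normal Form equation yet lying more than 2d apart, and routing through that equation would shortcut the path. Hence, using d ∈ O(|E|4) (Theorem 7.11) and Card(S) ≤ |E|4 (Theorem 8.10),
$$ {diam}({\mathscr{G}}^{\Rightarrow}_{[E]}) \lesssim 2d \cdot {\text{Card}}(S) \in O(|E|^{4} \cdot |E|^{4}) = O(|E|^{8}), $$
which is exactly the bound derived formally in the proof of Theorem 8.11 below.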

Theorem 8.11

Let E be a basic RWE. Then \({diam}({\mathscr{G}}_{[E]}^{\Rightarrow }) \in O(|E|^{10})\). Consequently, for any RWE E, \({diam}({\mathscr{G}}_{[E]}^{\Rightarrow _{NT}}) \in O(|E|^{12})\).

Proof

We shall first consider the case of \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) when E is jumbled, basic and regular. Let \(S = \{ E^{\prime } \in [E]_{\Rightarrow } \mid E^{\prime } \text { is in Lex Normal Form}\}\). By Theorem 5.3, \({\varUpsilon }_{\!E_{1}} = {\varUpsilon }_{\!E_{2}}\) for all E1, E2 ∈ \([E]_{\Rightarrow }\). Thus, by Theorem 8.10, we have that Card(S) ≤ |E|4. Moreover, by Theorem 7.11, for every \(E^{\prime } \in [E]_{\Rightarrow }\), there exists some \(\hat {E^{\prime }} \in S\) such that \(E^{\prime }\) is at distance at most O(|E|4) from \(\hat {E^{\prime }}\), and \(\hat {E^{\prime }}\) is at distance at most O(|E|4) from \(E^{\prime }\) in the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\). From this, we may conclude that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E|^{8})\) as follows: suppose for contradiction that, for an appropriate constant c, there exist \(\overline {E}_{1},\overline {E}_{2} \in [E]_{\Rightarrow }\) such that the minimal path between them in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) has length at least 2c|E|8 + 1. Let that path be E1, E2,…,En where \(E_{1} = \overline {E}_{1}\), \(E_{n} = \overline {E}_{2}\), and \(E_{i} \Rightarrow E_{i+1}\) for 1 ≤ i < n, and such that n > 2c|E|8 + 1. Now, to each Ei, 1 ≤ i ≤ n, we may associate some \(\hat {E_{i}} \in S\) such that the distance from Ei to \(\hat {E_{i}}\) is at most c|E|4. Since Card(S) ≤ |E|4 and n > 2c|E|8 + 1, we must have that there exists \(\hat {E} \in S\) such that \(\hat {E} = \hat {E_{i}}\) for at least 2c|E|4 + 1 different values of i. This implies in particular that there exist i1, i2 with i2 − i1 > 2c|E|4 such that \(\hat {E_{i_{1}}} = \hat {E_{i_{2}}}\). It follows that the length of the path \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) is at least 2c|E|4 + 1, and moreover, since E1, E2,…,En is the shortest path between \(\overline {E}_{1}\) and \(\overline {E}_{2}\), \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) must also be the shortest path between \(E_{i_{1}}\) and \(E_{i_{2}}\). However, we have that \(E_{i_{1}}\) is at distance at most c|E|4 from \(\hat {E}\), and that \(\hat {E}\) is at distance at most c|E|4 from \(E_{i_{2}}\). Consequently, \(E_{i_{1}}\) is at distance at most 2c|E|4 from \(E_{i_{2}}\), a contradiction to the fact that \(E_{i_{1}}, E_{i_{1}+1}, {\ldots } E_{i_{2}}\) is the shortest possible path. Consequently, if E is jumbled, basic and regular, then \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E|^{8})\).

Now we shall consider \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]})\) in the case that E is basic and regular, but not necessarily jumbled. Suppose that E is given by \(\alpha \doteq \beta \). Let Y = var(E)∖Δ(E) and let \(E^{\prime }\) be the equation \(\pi _{Y}(\alpha ) \doteq \pi _{Y}(\beta )\). Clearly, \(E^{\prime }\) is basic, regular and \(|E^{\prime }| \leq |E|\). By Theorem 6.8, we have that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]})|E|^{2})\). Moreover, by Lemma 6.3, \(E^{\prime }\) is jumbled. Thus by our previous claim, it follows that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E]}) \in O(|E^{\prime }|^{8}|E|^{2}) = O(|E|^{10})\).

Finally, we consider the case of \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \) for arbitrary regular equations E. Let E be any regular word equation. Then by Proposition 3.5, \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq 1+(|E|+1)m\) where
$$ m= \max\{{diam}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \mid E \Rightarrow_{NT}^{*} E^{\prime}\}.$$
Now fix \(E^{\prime }\) such that \(E \Rightarrow _{NT}^{*} E^{\prime }\) and \({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}) = m\). Then since \(E \Rightarrow _{NT}^{*} E^{\prime }\), \(E^{\prime }\) is also regular and \(|E^{\prime }| \leq |E|\). Moreover by Theorem 4.8, there exists a basic regular equation \(E^{\prime \prime }\) such that \(|E^{\prime \prime }| \leq |E|\) and such that \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}\) is isomorphic to an isolated path compression of order \(|E^{\prime }|\) of \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). Thus (cf. Remark 4.6), we have \(m \leq |E^{\prime }| {diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]})\). Since \(E^{\prime \prime }\) is basic and regular, we have that \({diam}({\mathscr{G}}^{\Rightarrow }_{[E^{\prime \prime }]}) \in O(|E^{\prime \prime }|^{10})\). Since \(|E^{\prime \prime }|, |E^{\prime }| \leq |E|\), we therefore have m ∈ O(|E|11) and \({diam}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \in O(|E|^{12})\). □

Due to Proposition 3.4, we may infer directly from Theorem 8.11 that the satisfiability problem for regular word equations is in NP. It was already shown in [8] that this problem is NP-hard, and thus we obtain matching upper and lower bounds for its complexity.

Theorem 8.12

The satisfiability problem for RWEs is NP-complete.

Proof

Membership in NP follows directly from Theorem 8.11 and Proposition 3.4, while NP-hardness was shown in [8]. □

9 Size

While the diameter of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is one important parameter, being directly related to the complexity of the satisfiability problem, it is by no means the only interesting one. The overall size of the graphs will also play a central role in the practical performance of the algorithm described in Section 3.

For basic RWEs, we are able to give tight upper and lower bounds on the number of vertices in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), as well as identifying the cases in which these bounds are reached. Recalling Theorem 4.8, we are also able to translate these bounds into the case of general (i.e. not basic) RWEs. In particular, when moving to a general RWE from the corresponding basic one, the effect on the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is that ‘isolated paths’ of length linear in |E| are collapsed. In fact, an inspection of the proofs (in particular of Lemma 4.7) yields a tighter bound, namely that collapsed paths will have at most \(\max \limits (T_{1},T_{2})\) internal vertices where T1 and T2 are the number of occurrences of terminal symbols and single-occurrence variables in the LHS and RHS respectively.

Corollary 9.1

Let E be an RWE given by \(\alpha \doteq \beta \). Let Ebasic be the corresponding basic equation as per Theorem 4.8. Let n = Card(qv(E)) and let \(M = \max \limits \{ |\alpha |-n, |\beta |-n \}\). Then
$$ {\text{Card}}([E_{basic}]_{\Rightarrow}) \leq {\text{Card}}([E]_{\Rightarrow}) \leq M{\text{Card}}([E_{basic}]_{\Rightarrow}).$$

We begin with the upper bounds, which are attained in the case of basic regular-rotated word equations.

Lemma 9.2

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V\leq \frac {n!}{2}\). Moreover, \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

Proof

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). We shall begin with the claim that \(V \leq \frac {n!}{2}\). To do this, we recall that from Theorem 5.3, the set \(S_{{\varUpsilon }} = \{E^{\prime } \mid E^{\prime } \text { is a basic regular equation such that } {\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }_{\!E} \}\) is a (not necessarily strict) superset of [E]⇒. We shall show that the cardinality of SΥ is at most \(\frac {n!}{2}\). Let Υ = ΥE and let \(E^{\prime }\) be a regular basic equation such that \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }\). Now, it follows from the definition of Υ that \({var}(E^{\prime }) = {var}(E)\) and that the rightmost variables of the LHS (resp. RHS) of E and \(E^{\prime }\) are the same. More precisely, there exist x, y ∈ var(E) and \(\alpha ,\alpha ^{\prime },\beta ,\beta ^{\prime } \in X^{*}\) such that E may be written \(\alpha x \doteq \beta y\) and \(E^{\prime }\) may be written as \(\alpha ^{\prime } x \doteq \beta ^{\prime } y\). Clearly, there are at most (n − 1)! possibilities for \(\alpha ^{\prime }\). Moreover, since \({\varUpsilon }_{\!E^{\prime }} = {\varUpsilon }\) is fixed, we can, given \(\alpha ^{\prime }\), for each \(u \in {var}(\beta ^{\prime }) \backslash \{\alpha ^{\prime }[1],\beta ^{\prime }[1]\}\), determine uniquely the predecessor of u in \(\beta ^{\prime } y\). More precisely, there exist factors vu and \(v^{\prime }u\) of \(\alpha ^{\prime }x\) and \(\beta ^{\prime }y\) respectively where \(v,v^{\prime }\in {var}(E)\). Thus \((v,v^{\prime }) \in {\varUpsilon }\), so if v is fixed (i.e. by \(\alpha ^{\prime }\)) then \(v^{\prime }\) is also fixed by Υ. It follows directly that for each choice of \(\alpha ^{\prime }\), there exists a unique suffix γ of \(\beta ^{\prime }y\) having \(\alpha ^{\prime }[1]\) as a prefix. Moreover, once the variable occurring immediately to the left of γ (i.e. the predecessor of γ[1] in \(\beta ^{\prime }y\)) is fixed, then \(\beta ^{\prime }y\) is fixed entirely, meaning that there are n − |γ| possible choices for \(\beta ^{\prime }y\) once \(\alpha ^{\prime }\) is fixed.

Next, we shall show that for each k, 1 ≤ k ≤ n − 1, there are exactly (n − 2)! choices of \(\alpha ^{\prime }\) such that the corresponding γ has length exactly k. For other values of k, there are no possible choices of \(\alpha ^{\prime }\) due to the fact that every equation in SΥ is basic and regular (note in particular that the case k = n would result in an equation which is decomposable and therefore not basic). It follows from this that the cardinality of SΥ is at most \(\frac {n!}{2}\):
$$ {\text{Card}}(S_{{\varUpsilon}}) \leq \sum\limits_{k = 1}^{n-1}k(n-2)! = (n-2)!\sum\limits_{k = 1}^{n-1} k = (n-2)! \frac{n(n-1)}{2} = \frac{n!}{2}.$$
To see why there are exactly (n − 2)! choices of \(\alpha ^{\prime }\) such that the corresponding γ has length k, we shall take a slightly different approach to constructing/selecting \(\alpha ^{\prime }\) and \(\beta ^{\prime }\). In particular, we shall first choose γ and then see how many choices there are for \(\alpha ^{\prime }\). Let \(k \in \mathbb {N}\) such that 1 ≤ k < n.

By definition of ΥE, we must have that if γ = v1v2…vk−1y, then there exist u1, u2,…,uk−1 ∈ var(E) such that \(\alpha ^{\prime }[1] = v_{1}\) and (ui, vi) ∈ Υ for 1 ≤ i ≤ k − 2, (uk−1, vk−1) ∈ Υ, and such that uk−1y is a factor of \(\alpha ^{\prime }x\) and uivi+1 are factors of \(\alpha ^{\prime }x\) for 1 ≤ i ≤ k − 2. Since \(E^{\prime }\) is regular, it follows that vi ≠ x for 1 ≤ i ≤ k − 1. Consequently, there are \({{n-2}\choose {k-1}} (k-1)! = \frac {(n-2)!}{(n-k-1)!}\) possible ways of choosing γ. Once γ is fixed, then, since uk−1y is a factor of \(\alpha ^{\prime }x\) and uivi+1 are factors of \(\alpha ^{\prime }x\) for 1 ≤ i ≤ k − 2, we may infer that \(\alpha ^{\prime }\) is uniquely determined by the relative order of the variables in var(E)∖{x, y, v1, v2,…,vk−1}, and thus there are (n − k − 1)! possible choices for \(\alpha ^{\prime }\) for each choice of γ. Altogether we have \((n-k-1)! \frac {(n-2)!}{(n-k-1)!} = (n-2)!\) possible choices for \(\alpha ^{\prime }\) as claimed, and it follows that \(V \leq \frac {n!}{2}\).
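As a quick numeric sanity check of the closed form above (an aside, not part of the proof), the identity \({\sum }_{k=1}^{n-1} k(n-2)! = n!/2\) can be verified mechanically for small n:

```python
# Aside: verify sum_{k=1}^{n-1} k*(n-2)! == n!/2 for small n.
from math import factorial

for n in range(2, 12):
    total = sum(k * factorial(n - 2) for k in range(1, n))
    assert total == factorial(n) // 2, n
print("identity verified for n = 2, ..., 11")
```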

It remains to consider the claim that \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated. Note that since n > 1, and since \(E^{\prime }\) is basic (and therefore indecomposable) for all \(E^{\prime } \in [E]_{\Rightarrow }\), \(E^{\prime }\) is not regular ordered for all \(E^{\prime } \in [E]_{\Rightarrow }\).

We shall begin with the ‘if’ direction. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then we may assume w.l.o.g. that E is regular rotated and thus we can write E as \(y_{1}y_{2}{\ldots } y_{k} x_{1} y_{k+1} y_{k+2} {\ldots } y_{\ell } x_{2} \doteq y_{k+1} y_{k+2} {\ldots } y_{\ell } x_{2} y_{1}y_{2}{\ldots } y_{k} x_{1}\) where x1, x2, y1, y2,…,yℓ ∈ X, ℓ = n − 2 and k ≤ ℓ. Then Δ(E) = {y1, y2,…,yℓ}. Consequently, by Theorem 6.8, the set of equations
$$ S= \{\alpha x_{1} \beta x_{2} \doteq \beta x_{2} \alpha x_{1} \mid |\alpha\beta|_{y} = 1 \text{ if } y \in {{\varDelta}}(E) \text{ and } |\alpha\beta|_{y} = 0 \text{ otherwise}\}$$
is a subset of \([E]_{\Rightarrow }\). Now, for each i, 0 ≤ i ≤ ℓ = Card(Δ(E)), let the set \(S_{i} \subseteq S\) be the set
$$ S_{i}= \{\alpha x_{1} \beta x_{2} \doteq \beta x_{2} \alpha x_{1} \mid |\alpha| = i \text{ and } |\alpha\beta|_{y} = 1 \text{ if } y \in {{\varDelta}}(E) \text{ and } |\alpha\beta|_{y} = 0 \text{ otherwise} \}.$$
Clearly, we have \(S = \bigcup \limits _{0 \leq i \leq \ell } S_{i}\). Moreover, we have that Card(Si) = ℓ! = (n − 2)! for each i, 0 ≤ i ≤ ℓ. Finally, note that for each i, 0 ≤ i ≤ ℓ, if \(E^{\prime } \in S_{i}\), then for \(T_{E^{\prime }} = \{E^{\prime \prime } \mid E^{\prime } \Rightarrow _{R}^{*} E^{\prime \prime } \}\), we have that \({\text {Card}}(T_{E^{\prime }}) = i+1\). It is straightforward from the definitions that for E1, E2 ∈ S, if E1 ≠ E2, then \(T_{E_{1}} \cap T_{E_{2}} = \emptyset \). Consequently, we may conclude that
$$V \geq \sum\limits_{E^{\prime} \in S} {\text{Card}}(T_{E^{\prime}}) = \sum\limits_{0 \leq i \leq \ell} (i+1){\text{Card}}(S_{i}) = \frac{(\ell+1)(\ell+2)}{2}(n-2)! = \frac{n!}{2}.$$
We have already shown that \(V \leq \frac {n!}{2}\), so \(V= \frac {n!}{2}\) as required.
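The closed form used for this lower bound can be checked in the same spirit (again an aside, not part of the proof): with ℓ = n − 2, summing (i + 1) ⋅ ℓ! over 0 ≤ i ≤ ℓ gives (ℓ + 1)(ℓ + 2)/2 ⋅ ℓ! = n!/2.

```python
# Aside: verify sum_{i=0}^{l} (i+1)*l! == n!/2 with l = n-2, for small n.
from math import factorial

for n in range(2, 12):
    l = n - 2
    assert sum((i + 1) * factorial(l) for i in range(l + 1)) == factorial(n) // 2
print("lower-bound count verified for n = 2, ..., 11")
```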

Suppose now that \(E^{\prime }\) is not regular rotated for all \(E^{\prime } \in [E]_{\Rightarrow }\). To see that \(V< \frac {n!}{2}\), it suffices to notice that we can decrease the bound on Card(SΥ) if not all the previously considered possibilities for the left-hand-sides \(\alpha ^{\prime }x\) are actually possible.

Recall from Theorem 5.3 that \({{\varDelta }}(E) = {{\varDelta }}(E^{\prime })\) for all \(E^{\prime } \in [E]_{\Rightarrow }\). Moreover, it follows from the definitions that the rightmost variables on each side of the equation are not contained in Δ(E), and thus Card(Δ(E)) ≤ n − 2. Next, suppose (for contradiction) that Card(Δ(E)) = n − 2. Then there exist z1, z2,…,zn ∈ X and i, 1 ≤ i < n such that zn is a suffix of the LHS of E and zi is a suffix of the RHS of E, meaning that Δ(E) = {zj ∣ 1 ≤ j < n, j ≠ i}. Consequently, there exists j, i < j ≤ n such that E may be written \(z_{1} z_{2} {\ldots } z_{n} \doteq z_{j+1} {\ldots } z_{n-1} z_{n} z_{i+1} {\ldots } z_{j-1} z_{j} z_{1}{\ldots } z_{i-2} z_{i-1} z_{i}\). Thus \(E \Rightarrow _{L}^{*} E^{\prime }\) where \(E^{\prime }\) is given by \( z_{1} z_{2} {\ldots } z_{n} \doteq z_{i+1} {\ldots } z_{j-1} z_{j} z_{j+1} {\ldots } z_{n-1} z_{n} z_{1}{\ldots } z_{i-2} z_{i-1} z_{i}\). However, \(E^{\prime }\) is regular-rotated, a contradiction.

Hence, we may assume that Card(Δ(E)) < n − 2, and consequently, there exist pairwise distinct variables u, v, x, y ∈ var(E) such that (u, v),(x, y) ∈ ΥE. However, if this is the case, then the LHS of any equation in \([E]_{\Rightarrow }\) cannot contain both the factors uv and xy. Suppose for contradiction that both factors were present in the LHS. Then, by definition of ΥE, there must exist z ∈ X such that either uz is a factor of the LHS and vz is a factor of the RHS, or xz is a factor of the LHS and yz is a factor of the RHS. W.l.o.g. we may assume the first case, that uz is a factor of the LHS and vz is a factor of the RHS. However, by the assumption that uv is also a factor of the LHS, we have z = v, and consequently vv is a factor of the RHS, a contradiction to the fact that E is regular. It follows in this case that \({\text {Card}}(S_{{\varUpsilon }}) < \frac {n!}{2}\), and thus that \(V < \frac {n!}{2}\). □

We can use Corollary 9.1 to adapt Lemma 9.2 to general RWEs as follows. Let E be a RWE given by \(\alpha \doteq \beta \), let n = Card(qv(E)), and let \(T = \max \limits \{|\alpha |-n, |\beta |-n\}\). Let Ebasic be the corresponding basic RWE as per Theorem 4.8. Clearly, for Card([E]⇒) to be maximal, E should be indecomposable. Now, by Corollary 9.1, we have that \({\text {Card}}([E]_{\Rightarrow }) \leq T {\text {Card}}([E_{basic}]_{\Rightarrow }) \leq T\frac {n!}{2} \leq \frac {(n+T)!}{2} = \frac {(\max \limits \{|\alpha |,|\beta |\})!}{2} \).

Note also that if E is not regular-rotated, then either Ebasic is not regular-rotated, or E is decomposable and Ebasic is regular-rotated but with fewer variables. In either case it follows that the second inequality becomes strict. Similarly, if T≠ 0, then the third inequality becomes strict. Hence we get the following.

Corollary 9.3

Let E be a RWE given by \(\alpha \doteq \beta \). Let \(M = \max \limits \{|\alpha |,|\beta |\}\). Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V\leq \frac {M!}{2}\). Moreover, \(V = \frac {M!}{2}\) if and only if E is basic and there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

For lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we consider the class of regular-reversed equations. We shall eventually prove a statement similar to that of Lemma 9.2, but first we need some additional definitions and lemmas. Our reasoning in this case revolves primarily around a particular binary-tree-like structure arising locally in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\). The binary trees do not occur directly as subgraphs of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), but rather can be obtained by treating certain short paths as edges. The relation defining the ‘edges’ of the tree is given by ⊳, introduced formally below. By showing that these binary trees always occur in the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and by verifying that they are balanced and have height proportional to the number of edges, we are able to produce the lower bound on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) given in Lemma 9.11.

Definition 9.4 (→R,→L,⊳,W(E))

Let E be a basic RWE such that Card(var(E)) ≥ 2. Then we may write E in the form
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta$$
with x, y, z1, z2,…,zk, w1, w2,…,wk ∈ X such that {z1, z2,…,zk} = {w1, w2,…,wk}, and α, β, γ0, γ1,…,γk, δ0, δ1,…,δk ∈ (X∖{x, y, z1, z2,…,zk})∗ such that for each i, j, 0 ≤ i, j ≤ k, we have var(γi) ∩ var(δj) = ∅. Note that this decomposition is unique. We define W(E) = {x, y, z1, z2,…,zk}. Moreover, there exist i, j such that wi = zk and zj = wk. We define the relations →L and →R such that
$$ \begin{array}{@{}rcl@{}} && x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta\\ \to_{L} && {}x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} \!\ldots\! z_{k} \gamma_{k} y \alpha\! \doteq\! w_{i} \delta_{i} w_{i+1} \delta_{i+1} \!\ldots\! w_{k} \delta_{k} y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{i-1} \delta_{i-1} x \beta \end{array} $$
and
$$ \begin{array}{@{}rcl@{}} && x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta\\ \to_{R} && z_{j} \gamma_{j} z_{j+1} \gamma_{j+1} \!\ldots\! z_{k} \gamma_{k} x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} \!\ldots\! z_{j-1} \gamma_{j-1} y \alpha \!\doteq\! y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta \end{array} $$

Additionally, for convenience, we define ⊳ =→L∪→R.

The tree-structure we are interested in is the set \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) for a given basic RWE E with at least two variables (the one-variable case being trivial). An example is given by Fig. 7. The following fact can be verified directly from the definition, and confirms that the set S is indeed contained in the vertex set \([E]_{\Rightarrow }\) of \({\mathscr{G}}^{\Rightarrow }_{[E]}\).
Fig. 7

The set \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) occurring as a subset of the vertices of the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that E is given by \(x_{1}x_{2}x_{3}x_{4} \doteq x_{4}x_{2}x_{3}x_{1}\). In order to conserve space, for each vertex, the equation is arranged vertically with the LHS above and the RHS below. The vertices belonging to S are highlighted in bold, and E is shaded (blue). The tree structure induced by the relation ⊳ is given by the bold solid edges, while the edges of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) are dashed. Note that the edges due to ⊳ do not necessarily coincide with edges due to ⇒, but for every ⊳-edge, there is a corresponding path using ⇒-edges, guaranteeing that \(S \subseteq [E]_{\Rightarrow }\). In this case we have that W(E) = var(E) = {x1, x2, x3, x4}, so S forms a tree of height \(2^{4-2}-1 = 3\), and contains exactly \(2^{4-1}-1 = 7\) equations
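To make Definition 9.4 concrete, the following minimal sketch (our own illustration, not part of the original paper; the representation of basic RWEs as pairs of tuples of variable names, and all function names, are ours) computes the decomposition from Definition 9.4, applies →L and →R, and enumerates \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) for the equation of Fig. 7, reproducing the count given by Lemma 9.9 below.

def decompose(lhs, rhs):
    """Split a basic RWE per Definition 9.4: LHS = x N y alpha, RHS = y M x beta,
    where N = gamma_0 z_1 gamma_1 ... z_k gamma_k and
    M = delta_0 w_1 delta_1 ... w_k delta_k are the 'middles'."""
    x, y = lhs[0], rhs[0]
    N, alpha = lhs[1:lhs.index(y)], lhs[lhs.index(y) + 1:]
    M, beta = rhs[1:rhs.index(x)], rhs[rhs.index(x) + 1:]
    zs = [v for v in N if v in set(M)]   # z_1, ..., z_k in LHS order
    ws = [v for v in M if v in set(N)]   # w_1, ..., w_k in RHS order
    return x, y, N, alpha, M, beta, zs, ws

def step_L(E):
    """Apply ->_L: rotate the RHS so that it starts with the w_i equal to z_k."""
    lhs, rhs = E
    x, y, N, alpha, M, beta, zs, ws = decompose(lhs, rhs)
    if not zs:                            # Card(W(E)) = 2: a leaf w.r.t. the relation
        return None
    i = M.index(zs[-1])
    return (lhs, M[i:] + (y,) + M[:i] + (x,) + beta)

def step_R(E):
    """Apply ->_R: rotate the LHS so that it starts with the z_j equal to w_k."""
    lhs, rhs = E
    x, y, N, alpha, M, beta, zs, ws = decompose(lhs, rhs)
    if not ws:
        return None
    j = N.index(ws[-1])
    return (N[j:] + (x,) + N[:j] + (y,) + alpha, rhs)

def tree(E):
    """The set S = {E' reachable from E under ->_L and ->_R}."""
    S, todo = {E}, [E]
    while todo:
        cur = todo.pop()
        for child in (step_L(cur), step_R(cur)):
            if child is not None and child not in S:
                S.add(child)
                todo.append(child)
    return S

# The equation of Fig. 7: x1 x2 x3 x4 = x4 x2 x3 x1, with Card(W(E)) = 4.
E = (("x1", "x2", "x3", "x4"), ("x4", "x2", "x3", "x1"))
assert len(tree(E)) == 2 ** (4 - 1) - 1   # 7 equations, as in Lemma 9.9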

Fact 9.5

Let E1, E2 be basic RWEs with Card(var(E1)),Card(var(E2)) ≥ 2. Let Z ∈{L, R}. If E1 →Z E2, then \(E_{1} \Rightarrow _{Z}^{*} E_{2}\). Conversely, if E1 ⇒Z E2, then either \(E_{1} \to _{Z}^{*} E_{2}\) or \(E_{2} \to _{Z}^{*} E_{1}\).

In what follows, in order to understand the number of equations occurring in \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\), we shall show that S, when combined with the relation ⊳, becomes a balanced binary tree of height Card(W(E)) − 1. We proceed by noting two more facts following directly from the definition. Fact 9.6 provides the first step towards understanding why ⊳ induces a binary-tree-like structure on S: the leaf nodes are equations for which Card(W(E)) = 2, while all other equations have exactly two children w.r.t. ⊳.6

Fact 9.6

Let E be a basic RWE with Card(var(E)) ≥ 2. Then the following statements are equivalent.
  1. Card(W(E)) > 2,
  2. there exists \(E^{\prime }\) such that \(E \to _{L} E^{\prime }\),
  3. there exists \(E^{\prime }\) such that \(E \to _{R} E^{\prime }\).

Fact 9.7 allows us to infer exactly the height of the tree by establishing a natural ordering (namely the cardinality of W(E)) on equations. Note that by Fact 9.6, whenever we move from an equation to one of its children w.r.t. ⊳, we decrease Card(W(E)) by exactly one.

Fact 9.7

Let E1, E2 be basic RWEs with Card(var(E1)),Card(var(E2)) ≥ 2. Let Z ∈{L, R} and suppose that E1 →Z E2. Suppose that x, y ∈ X and let α1, α2, β1, β2 ∈ (X∖{x, y})∗ such that E1 may be written \(x\alpha _{1} y \alpha _{2} \doteq y \beta _{1} x \beta _{2}\). If Z = L, then W(E2) = W(E1)∖{y} and if Z = R, then W(E2) = W(E1)∖{x}.

Facts 9.7 and 9.6 are sufficient to observe that the set \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) combined with ⊳ forms a DAG of bounded height. However, this is not sufficient for our purposes of providing a lower bound on the number of equations contained in \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). The following lemma shows that this DAG is in fact a tree by confirming that for each equation (which is not a leaf node), the two ‘subtrees’ rooted at the two children of that equation do not share any vertices.

Lemma 9.8

Let E, E1, E2 be basic regular word equations such that E →L E1 and E →R E2. Let \(S_{1} = \{E_{1}^{\prime } \mid E_{1} \triangleright ^{*} E_{1}^{\prime }\}\) and let \(S_{2} = \{E_{2}^{\prime } \mid E_{2} \triangleright ^{*} E_{2}^{\prime }\}\). Then S1 ∩ S2 = ∅ and E ∉ S1 ∪ S2.

Proof

The fact that E ∉ S1 ∪ S2 follows from the fact that, by Fact 9.7, for all \(E^{\prime } \in S_{1} \cup S_{2}\), we have \({\text {Card}}(W(E^{\prime })) \leq {\text {Card}}(W(E_{1})) = {\text {Card}}(W(E_{2})) < {\text {Card}}(W(E))\). We shall next consider the claim that S1 ∩ S2 = ∅. Notice that it follows from the definitions of →R and →L that if \(E^{\prime } \triangleright E^{\prime \prime }\) and \(w \in {var}(E^{\prime }) \backslash W(E^{\prime })\), then firstly \(w \in {var}(E^{\prime \prime }) \backslash W(E^{\prime \prime })\), and secondly \(Q_{E^{\prime }}(w) = Q_{E^{\prime \prime }}(w)\) where \(Q_{E^{\prime }}, Q_{E^{\prime \prime }}\) are the functions defined in accordance with Definition 5.1. Now, if Card(W(E)) ≤ 2, then the statement follows trivially. Otherwise let x, y, z1, z2,…,zk, w1, w2,…,wk ∈ X such that {z1, z2,…,zk} = {w1, w2,…,wk}, and α, β, γ0, γ1,…,γk, δ0, δ1,…,δk ∈ (X∖{x, y, z1, z2,…,zk})∗ such that var(γi) ∩ var(δj) = ∅ for 0 ≤ i, j ≤ k and such that E may be written as:
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} \gamma_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} w_{1} \delta_{1} w_{2} \delta_{2} {\ldots} w_{k} \delta_{k} x \beta. $$
From Fact 9.7, it follows that y ∉ W(E1), so we may conclude that \(Q_{E^{\prime }}(y) = Q_{E_{1}}(y)\) for all \(E^{\prime } \in S_{1}\). Similarly, it follows from Fact 9.7 that x ∉ W(E2), and we may hence conclude that \(Q_{E^{\prime }}(x) = Q_{E_{2}}(x)\) for all \(E^{\prime } \in S_{2}\). Now, let u, v be the rightmost variables in zkγk and wkδk respectively. Then \(Q_{E_{1}}(y) = Q_{E_{2}}(x) = (u,v)\). However, since \(E^{\prime }\) is regular, x ≠ y, so by properties of the functions \(Q_{E^{\prime }}\) (namely that by Remark 5.2 they are injective), we cannot have that \(Q_{E^{\prime }}(x) = (u,v)\) for any \(E^{\prime } \in S_{1}\) and likewise we cannot have \(Q_{E^{\prime }}(y) = (u,v)\) for any \(E^{\prime } \in S_{2}\). Consequently, S1 ∩ S2 = ∅. □

Lemma 9.8, along with Facts 9.6 and 9.7, are sufficient to confirm our claim that the set \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) forms a balanced binary tree of height Card(W(E)) − 2. Thus we are now in a position to state the cardinality of \(\{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\) precisely as follows.

Lemma 9.9

Let E be a basic regular word equation such that Card(W(E)) ≥ 2. Let \(S = \{ E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). Then \({\text{Card}}(S) = 2^{{\text{Card}}(W(E))-1} - 1\).

Proof

We shall prove the claim by induction on Card(W(E)). If Card(W(E)) = 2 then S = {E} and the statement is immediate. Now suppose that the claim holds for all basic regular word equations E such that Card(W(E)) ≤ n for some n ≥ 2. Let E be a basic regular word equation such that Card(W(E)) = n + 1. Then Card(W(E)) > 2, so by Fact 9.6, there exist E1, E2 ∈ [E]⇒ such that E →L E1 and E →R E2. From the definitions, we have that S = {E}∪ S1 ∪ S2 where \(S_{1} = \{E_{1}^{\prime } \mid E_{1} \triangleright ^{*} E_{1}^{\prime }\}\) and \(S_{2} = \{E_{2}^{\prime } \mid E_{2} \triangleright ^{*} E_{2}^{\prime }\}\). By Lemma 9.8, it follows that Card(S) = 1 + Card(S1) + Card(S2). Moreover, since Card(W(E1)) = Card(W(E2)) = n, we have from our induction hypothesis that Card(S1) = Card(S2) = \(2^{n-1} - 1\). Thus we have Card(S) = \(2(2^{n-1} - 1) + 1 = 2^{(n+1)-1} - 1\) as required. □

Lemma 9.9 together with Fact 9.5 are sufficient to provide lower bounds on the number of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and we are nearly ready to provide the counterpart to Lemma 9.2. The final step before we do so is the following lemma which characterises the basic RWEs E for which the set of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is exactly \(S = \{E^{\prime } \mid E \triangleright ^{*} E^{\prime }\}\). Since by Fact 9.5, S is always a subset of the vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\), this naturally leads us to the extremal case in which the lower bound is obtained.

Lemma 9.10

Let E be a basic regular word equation. Let \(S = \{E^{\prime } \mid E\triangleright ^{*} E^{\prime }\}\). Then S = [E]⇒ if and only if E is regular reversed.

Proof

Let E be a basic regular word equation. If Card(var(E)) = 1 then E can be written as \(x \doteq x\), for some x ∈ X, meaning that E is regular reversed, and moreover, that S = [E]⇒ = {E}, so the statement holds trivially. Suppose henceforth that Card(var(E)) ≥ 2.

Consider first the case that \(E^{\prime }\) is not regular reversed for all \(E^{\prime } \in [E]_{\Rightarrow }\). Then by Lemma 6.13, there exists E1 ∈ [E]⇒ such that E1 has the form \(x \alpha y \doteq y \beta x\) where x, y ∈ X and α, β ∈ (X∖{x, y})∗. By our assumption, E1 is not regular reversed. Hence we may write E1 as:
$$x \alpha_{1} u \alpha_{2} v \alpha_{3} y \doteq y \beta_{1} u \beta_{2} v \beta_{3} x$$
where x, y, u, v ∈ X and α1, α2, α3, β1, β2, β3 ∈ (X∖{x, y, u, v})∗. Thus, by Lemma 7.2 we have that E2 ∈ [E]⇒ where E2 is given by \(x \alpha _{1} v \alpha _{3} u \alpha _{2} y \doteq y \beta _{1} v \beta _{3} u \beta _{2} x\). Moreover, Card(W(E1)) = Card(W(E2)) = Card(var(E)). Since by Fact 9.7, \(E^{\prime } \triangleright E^{\prime \prime }\) implies \({\text {Card}}(W(E^{\prime \prime })) < {\text {Card}}(W(E^{\prime }))\), and hence \({\text {Card}}(W(E^{\prime })) < {\text {Card}}(W(E))\) for all \(E^{\prime } \in S \backslash \{E\}\), we may immediately conclude that at least one of E1, E2 is not in S, and hence S≠[E]⇒.

Now suppose that E is regular reversed. We have the following claim:

Claim 9.10.1

Let \(E^{\prime } \in S\) be given by \(\alpha \doteq \beta \). Then the equation \(\pi _{W(E^{\prime })}(\alpha ) \doteq \pi _{W(E^{\prime })}(\beta ) \) is regular reversed.

Proof

We shall prove the claim by induction on \({\text {Card}}(W(E^{\prime }))\). In particular note that if \({\text {Card}}(W(E^{\prime })) = {\text {Card}}(W(E))\), then by Fact 9.7, we have \(E^{\prime } = E\) and the statement holds trivially. Now suppose for some n that the claim holds for all \(E^{\prime } \in S\) with \({\text {Card}}(W(E^{\prime })) \geq n\). Let \(E^{\prime } \in S\) such that \({\text {Card}}(W(E^{\prime })) = n-1\). By definition, since \(E^{\prime } \not = E\), there exists \(E^{\prime \prime } \in S\) such that \(E^{\prime \prime } \triangleright E^{\prime }\). By Fact 9.7, we have also that \({\text {Card}}(W(E^{\prime \prime })) = n\). Assume w.l.o.g. that \(E^{\prime \prime } \to _{R} E^{\prime }\). Then by the induction hypothesis, there exist x, y, z1, z2,…,zk ∈ X with k = n − 2, and α, β, γ0, γ1, γ2,…,γk, δ0, δ1, δ2,…,δk ∈ (X∖{x, y, z1, z2, …,zk})∗ such that var(γi) ∩ var(δj) = ∅ for 0 ≤ i, j ≤ k and such that \(E^{\prime \prime }\) is given by
$$x \gamma_{0} z_{1} \gamma_{1} z_{2} {\ldots} z_{k} \gamma_{k} y \alpha \doteq y \delta_{0} z_{k} \delta_{1} z_{k-1} \delta_{2} {\ldots} z_{1} \delta_{k} x \beta$$
and \(E^{\prime }\) is given by
$$ z_{1} \gamma_{1} z_{2} {\ldots} z_{k} \gamma_{k} x \gamma_{0} y \alpha \doteq y \delta_{0} z_{k} \delta_{1} z_{k-1} \delta_{2} {\ldots} z_{1} \delta_{k} x \beta.$$
Note that \(W(E^{\prime }) = W(E^{\prime \prime }) \backslash \{x\} = \{y,z_{1},z_{2},\ldots ,z_{k}\}\). Erasing all the variables not in \(W(E^{\prime })\) from \(E^{\prime }\) yields
$$ z_{1}z_{2} {\ldots} z_{k} y \doteq y z_{k} z_{k-1} {\ldots} z_{1}$$
which is regular reversed so the statement of the claim holds for \(E^{\prime }\). By induction, it holds for all \(E^{\prime } \in S\) as required. □
Now suppose for contradiction that [E]⇒ ≠ S. This implies that there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime } \notin S\). Now, by Fact 9.5, this implies that there exists a sequence E1, E2,…,En such that E1 = E, En ∉ S and such that either Ei ⊳ Ei+ 1 or Ei+ 1 ⊳ Ei for each i,1 ≤ i < n. Let us take the shortest such sequence. Note that this implies that Ei ∈ S for all i,1 ≤ i < n, and consequently, that Ei ⊳ Ei+ 1 for all i,1 ≤ i < n − 1, and that En− 1 ⋫ En, meaning that En ⊳ En− 1 instead. It follows from the fact that Card(W(E)) = Card(var(E)), and by Fact 9.7, that there does not exist \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime } \triangleright E\). Hence we may additionally conclude that n > 2. Moreover, since En− 2 ∈ S and En ∉ S, we have that En− 2 ≠ En. Thus we must necessarily have that either En− 2 →L En− 1 and En →R En− 1, or symmetrically En− 2 →R En− 1 and En →L En− 1. W.l.o.g. we may assume the first case holds. Then it follows from the definitions that there exist x1, x2, y1, y2, z1, z2, α1, α2, α3, β1, β2, β3, γ1, γ2, γ3, δ1, δ2, δ3 such that \({var}(\alpha _{1}\alpha _{2}) \subseteq {var}(\beta _{3})\), \({var}(\beta _{1}\beta _{2}) \subseteq {var}(\alpha _{3})\), \({var}(\gamma _{1}\gamma _{2}) \subseteq {var}(\delta _{3})\) and \({var}(\delta _{1}\delta _{2}) \subseteq {var}(\gamma _{3})\), and such that En− 2 is given by \(x_{1} \alpha _{1} z_{1} \alpha _{2} y_{1} \alpha _{3} \doteq y_{1} \beta _{1} z_{1} \beta _{2} x_{1} \beta _{3}\), En is given by \(x_{2} \gamma _{1} z_{2} \gamma _{2} y_{2} \gamma _{3} \doteq y_{2} \delta _{1} z_{2} \delta _{2} x_{2} \delta _{3}\), and therefore that En− 1 can be written both as
$$ z_{1} \alpha_{2} x_{1} \alpha_{1} y_{1} \alpha_{3} \doteq y_{1} \beta_{1} z_{1} \beta_{2} x_{1} \beta_{3} \text{ and as } x_{2} \gamma_{1} z_{2} \gamma_{2} y_{2} \gamma_{3} \doteq z_{2} \delta_{2} y_{2} \delta_{1} x_{2} \delta_{3}.$$
It follows that x2 = z1, z2 = y1, and thus that γ1 = α2x1α1, α3 = γ2y2γ3, β1 = δ2y2δ1, and δ3 = β2x1β3. Consequently, we may write En− 2 as:
$$ x_{1} \alpha_{1} z_{1} \alpha_{2} y_{1} \gamma_{2} y_{2} \gamma_{3} \doteq y_{1} \delta_{2} y_{2} \delta_{1} z_{1} \beta_{2} x_{1} \beta_{3}. $$
Now, let \(E_{n-1}^{\prime }\) be the equation
$$ x_{1} \alpha_{1} z_{1} \alpha_{2} y_{1} \gamma_{2} y_{2} \gamma_{3} \doteq y_{2} \delta_{1} z_{1} \beta_{2} y_{1} \delta_{2} x_{1} \beta_{3}. $$
Since \({var}(\delta _{2}) \subseteq {var}(\gamma _{3}) \subseteq {var}(\alpha _{3})\), we have that var(δ2) ∩ var(α1α2γ2) = ∅, and consequently, \(E_{n-1}^{\prime } \to _{L} E_{n-2}\). However, since \(z_{1},y_{1}\in W(E_{n-1}^{\prime })\), we can infer from Claim 9.10.1 that \(E_{n-1}^{\prime } \notin S\). However, this contradicts our earlier assumption that the sequence E1, E2,…,En is minimal, since \(E_{1}, E_{2}, {\ldots } E_{n-2}, E_{n-1}^{\prime }\) also satisfies that E1 = E, \(E_{n-1}^{\prime } \notin S\) and Ei ⊳ Ei+ 1 or Ei+ 1 ⊳ Ei for 1 ≤ i < n − 2 and \(E_{n-1}^{\prime } \triangleright E_{n-2}\). Thus, we must have that [E]⇒ = S as required. □

We are now ready to give the tight lower bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\), and to characterise those equations for which the lower bounds are achieved. The final step is to move from the bounds depending on Card(W(E)) given by Lemma 9.9 to bounds depending on Card(var(E)) by noting that, by Lemma 6.13, there is always an equation \(E^{\prime } \in [E]_{\Rightarrow }\) for which Card(var(E′)) = Card(W(E′)).

Lemma 9.11

Let E be a basic regular word equation. Let n = Card(var(E)) and suppose that n ≥ 2. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then \(V \geq 2^{n-1} - 1\). Moreover, \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed.

Proof

Let E be a basic regular word equation and let n = Card(var(E)) ≥ 2. Let V = Card([E]⇒) be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). W.l.o.g. by Lemma 6.13, we may assume that E has the form \(x \alpha y \doteq y \beta x\) for some x, y ∈ X and α, β ∈ (X∖{x, y})∗. Thus Card(W(E)) = n. Let \(S = \{E^{\prime } \mid E\triangleright ^{*} E^{\prime }\}\). Then by Fact 9.5, \(S \subseteq [E]_{\Rightarrow }\). By Lemma 9.9, \({\text{Card}}(S) = 2^{n-1} - 1\). Hence we have that \(V \geq 2^{n-1} - 1\). Moreover, by Lemma 9.10, S = [E]⇒ if and only if E is regular reversed. Hence \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed. □

It is worth noting that the lower bound given by Lemma 9.11 is already exponential in the number of variables, which, since we consider basic RWEs, is proportional to the length of the equation. In order to interpret these bounds in the more general (i.e. not basic) case we recall from Section 4 that for any RWE \(\alpha \doteq \beta \), there exist prefixes \(\alpha ^{\prime }, \beta ^{\prime }\) of α and β respectively such that \(E^{\prime }\) given by \(\alpha ^{\prime } \doteq \beta ^{\prime }\) is indecomposable, and such that \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is isomorphic to \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\). In this case, the lower bound on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) becomes \(2^{m-1} - 1\) where \(m = {\text {Card}}({qv}(E^{\prime }))\).

We conclude this section with the following theorem summarising the bounds on the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\).

Theorem 9.12

Let E be a basic RWE and let n = Card(var(E)). Suppose that n > 1. Let V be the number of vertices in \({\mathscr{G}}^{\Rightarrow }_{[E]}\). Then:
  1. \(2^{n-1}-1 \leq V \leq \frac {n!}{2}\),
  2. \(V = 2^{n-1} - 1\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular reversed,
  3. \(V = \frac {n!}{2}\) if and only if there exists \(E^{\prime } \in [E]_{\Rightarrow }\) such that \(E^{\prime }\) is regular rotated.

Proof

Directly from Lemmata 9.2 and 9.11. □

10 DAG-Width

In addition to the size we are also able to give some insights about the connectedness of the graphs, which, as discussed in Section 3.3, are of interest when solving RWEs modulo additional constraints. We show firstly that there exist classes of equations E for which \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]})\) may be arbitrarily large.

Theorem 10.1

Let x, y, z0, z1, z2,…,zn ∈ X. Let E be the equation given by
$$x z_{0} z_{1} z_{2} {\ldots} z_{n} y \doteq y z_{0} z_{n} z_{n-1} {\ldots} z_{1} x.$$
Then \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) > n\).

To prove Theorem 10.1, we make use of the k-cops and robber games for directed graphs as introduced by [5]. The following definition is taken directly from [5].

Definition 10.2 (Cops and robber game [5])

Given a directed graph G = (V, E), the k-cops and robber game on G is played between two players, the cop and the robber player. Positions of this game are pairs (X, r) where \(X \in V^{\leq k}\) are the vertices occupied by the cops and r ∈ V is the vertex occupied by the robber. The game is played as follows:
  • At the beginning, the cop player chooses \(X_{0} \in V^{\leq k}\), and the robber player chooses a vertex r0 ∈ V, giving position (X0, r0).

  • From position (Xi, ri), if ri ∉ Xi, then the cop player chooses \(X_{i+1} \in V^{\leq k}\), and the robber player chooses a vertex ri+ 1 ∈ V such that there is a directed path from ri to ri+ 1 in the graph G∖(Xi ∩ Xi+ 1).

  • A play in the game is a maximal (finite or infinite) sequence π = (X0, r0),(X1, r1), (X2, r2),… of positions given by the rules above.

  • A play π is winning for the cop player if and only if it is finite. (Note that, by the rules above, this implies that rm ∈ Xm for the last position (Xm, rm) of this play.) A play π is winning for the robber player if and only if it is infinite.

  • A (k-cop) strategy for the cop player is a function f from \(V^{\leq k} \times V\) to \(V^{\leq k}\). A play (X0, r0),(X1, r1),… is consistent with a strategy f if Xi+ 1 = f(Xi, ri) for all i. The strategy f is called a winning strategy if every play consistent with the strategy is winning for the cop player.

  • The cop number of a directed graph G is the least k such that the cop player has a strategy to win the k-cops and robber game on G.
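To illustrate the game, the following brute-force sketch (our own, not taken from [5]; it enumerates all positions and is exponential in k and |V|, so it is only suitable for toy instances) decides whether the cop player has a winning strategy by computing, as a least fixed point, the set of positions from which the cop player can force a finite play:

from itertools import chain, combinations

def reachable(V, edges, blocked, r):
    """Vertices reachable from r by a directed path avoiding 'blocked'
    (the empty path counts, so r itself is included unless blocked)."""
    if r in blocked:
        return set()
    seen, todo = {r}, [r]
    while todo:
        u = todo.pop()
        for a, b in edges:
            if a == u and b not in blocked and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

def cops_win(V, edges, k):
    """True iff the cop player wins the k-cops and robber game on (V, edges)."""
    cop_sets = [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(V), i) for i in range(k + 1))]
    # Terminal cop wins: positions (X, r) with r in X.
    win = {(X, r): r in X for X in cop_sets for r in V}
    changed = True
    while changed:                   # attractor (least fixed point) iteration
        changed = False
        for X in cop_sets:
            for r in V:
                if win[(X, r)]:
                    continue
                # The cops win from (X, r) if some move X2 leaves the robber
                # only cop-winning escape vertices in G minus (X intersect X2).
                if any(all(win[(X2, r2)]
                           for r2 in reachable(V, edges, X.intersection(X2), r))
                       for X2 in cop_sets):
                    win[(X, r)] = True
                    changed = True
    # Initial move: the cops choose X0 first, then the robber picks r0.
    return any(all(win[(X0, r0)] for r0 in V - X0) for X0 in cop_sets)

# On a directed 3-cycle, one cop never suffices, but two cops win.
V, edges = frozenset({0, 1, 2}), {(0, 1), (1, 2), (2, 0)}
assert not cops_win(V, edges, 1) and cops_win(V, edges, 2)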

It is shown in [5] (Theorem 16) that for any directed graph G, there is a DAG-decomposition of G of width at most k only if the cop player has a winning strategy in the k-cops and robber game on G. Thus, to show that a graph G has DAG-width greater than n, it is sufficient to show that there is no n-cop winning strategy in the n-cops and robber game on G. This equivalently amounts to providing a winning strategy for the robber. We shall use this fact to prove Theorem 10.1 as follows. Figure 8 provides an example and depicts how the winning strategy for the robber works.
Fig. 8

A depiction of the graph \({\mathscr{G}}^{\Rightarrow }_{[E]}\) in the case that \(E = x z_{0} z_{1} z_{2} y \doteq y z_{0} z_{2} z_{1} x\). Thus this is an example of Theorem 10.1 for the case n = 2. The graph is divided into sections corresponding to the (disjoint) sets \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\) for 0 ≤ i ≤ 2. The vertices Ei are highlighted in bold while vertices from \(S_{i}^{in}\) are coloured blue and vertices from \(S_{i}^{out}\) are coloured red. In order to conserve space, vertices belonging to one of these sets are displayed with the LHS and RHS of the equation arranged vertically while for other vertices the equations are omitted. Since there are three values for i, if there are two cops, there will always be at least one i such that no vertex in \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\) has a cop on it. The strategy of the robber is to always be on Ei for such a choice of i. This is due to the fact that for each i and j, there is a path from Ei to Ej visiting only vertices from \(S_{i}^{out}\) and \(S_{j}^{in}\) which can be used as an escape-route (an example for i = 1 and j = 3 is highlighted in bold in the figure). Thus, if at any given stage in the game, a cop moves to a vertex in \(\{E_{i}\} \cup S^{in}_{i} \cup S^{out}_{i}\), the robber can use the escape route to safely move to some Ej for which no vertex in \(\{E_{j}\} \cup S^{in}_{j} \cup S^{out}_{j}\) has a cop on it. The edges making up the escape-route paths needed for this strategy are given by solid arrows, while the other edges which are not used by the robber are dashed

Proof of Theorem 10.1.

Note that it is sufficient to show that the DAG-width of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is greater than n, since \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is a subgraph of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\). For 0 ≤ i ≤ n, let Ei be the (basic regular) equation given by:
$$x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} z_{2} {\ldots} z_{i-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{1} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x$$
where x, y, z0, z1,…,zn ∈ X. Note that E = E0. Let V = [E]⇒. Before describing a winning strategy for the robber in the n-cops and robber game on \({\mathscr{G}}^{\Rightarrow }_{[E]}\), we define some useful subsets of vertices of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) as follows. For each i, 0 ≤ i ≤ n and each j, 0 ≤ j ≤ n with j > i, let:
$$ \begin{array}{@{}rcl@{}} {T_{i}^{j}} &=& \{ z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ && z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ &&\qquad \qquad \qquad \qquad {\vdots} \\ && z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \} \\ &&\cup \{ z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq\\ &&\qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\} \\ &&\cup \{ z_{j+2} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x, \\ & &\qquad \qquad\qquad \qquad {\vdots} \\ & &z_{i-1}x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\}. \end{array} $$
Similarly, for each i, 0 ≤ i ≤ n and each j, 0 ≤ j ≤ n with j < i, let:
$$ \begin{array}{@{}rcl@{}} {T_{i}^{j}} = \{&& z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ & &z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x , \\ && \qquad \qquad \qquad \qquad {\vdots} \\ & &z_{j} z_{j+1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{j-1} y\! \doteq\! y z_{i} z_{i-1} \!\ldots\! z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \} \\ \cup \{& & z_{j} z_{j+1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1}{\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x\} \\ \cup \{ && z_{j+2} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1}{\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{j+1} y z_{j} z_{j-1}{\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x, \\ && \qquad \qquad \qquad \qquad {\vdots} \\ && z_{i-1} x z_{j} z_{j+1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{0} z_{n} z_{n-1}{\ldots} z_{i+1} x\}. \end{array} $$

For each i, 0 ≤ i ≤ n, let \(S^{out}_{i} = \bigcup \limits _{0 \leq j \leq n, i\not = j} {T_{i}^{j}}\) and let

$$ \begin{array}{@{}rcl@{}} S^{in}_{i} = \{ && x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} y \doteq z_{j} z_{j-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} yz_{i} z_{i-1} {\ldots} z_{j+1} x \mid j \leq i \}\\ \cup \{ && x z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} y \doteq z_{j} z_{j-1} {\ldots} z_{i+1} yz_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} x \mid j > i \}. \end{array} $$
Note that \(S^{in}_{i} = \{E^{\prime } \mid E^{\prime } \Rightarrow _{L}^{*} E_{i} \} \backslash \{E_{i}\}\). Moreover, we shall now show that for each Ei, Ej with i ≠ j, there exist \(F_{1}, F_{2},\ldots , F_{k} \in S^{out}_{i}\) and \(G_{1}, G_{2},{\ldots } G_{\ell } \in S^{in}_{j}\) such that
$$ E_{i} \Rightarrow F_{1} \Rightarrow F_{2} \Rightarrow {\ldots} F_{k} \Rightarrow G_{1} \Rightarrow G_{2} \Rightarrow {\ldots} \Rightarrow G_{\ell} \Rightarrow E_{j}. $$
(4)
Indeed, observe that
$$ \begin{array}{@{}rcl@{}} E_{i}& & \Rightarrow z_{i} z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ && \Rightarrow z_{i+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ && \qquad \qquad \qquad \qquad {\vdots} \\ && \Rightarrow z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq y z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{i+1} x \\ &&\Rightarrow z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ &&\Rightarrow z_{j+2} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} x z_{j} z_{j+1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ && \qquad \qquad \qquad \qquad {\vdots} \\ & &\Rightarrow z_{i-1}x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-2} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad \qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \\ &&\Rightarrow x z_{j} z_{j+1} {\ldots} z_{n} z_{0} z_{1} {\ldots} z_{i-1} z_{i} z_{i+1} {\ldots} z_{j-1} y \doteq \\ && \qquad\qquad z_{i} z_{i-1} {\ldots} z_{0} z_{n} z_{n-1} {\ldots} z_{j+1} y z_{j} z_{j-1} {\ldots} z_{i+1} x \in S_{j}^{in}. \end{array} $$

Thus, there exist \(F_{1}, F_{2},\ldots , F_{k} \in S^{out}_{i}\) and \(G_{1} \in S_{j}^{in}\) such that EiF1F2 ⇒… ⇒ FkG1. By definition, \(S^{in}_{j} = \{E^{\prime } \mid E^{\prime } \Rightarrow _{L}^{*} E_{j}\}\backslash \{E_{j}\}\), so it follows directly that there exist \(G_{2},\ldots ,G_{\ell } \in S^{in}_{j}\) such that G1G2 ⇒… ⇒ Ej as claimed.

Consequently, we may conclude that \(S^{in}_{i} \cup S^{out}_{i} \cup \{E_{i}\} \subset [E]_{\Rightarrow }\) for all i, 0 ≤ i ≤ n. Clearly, each Ei, 0 ≤ i ≤ n, is not contained in any \({S_{j}^{Z}}\) for 0 ≤ j ≤ n and Z ∈{in, out}. Furthermore, since the RHS of every equation in \(S_{i}^{out}\) has either yzi or zi as a prefix, \(S_{i}^{out} \cap S_{j}^{out} = \emptyset \) whenever i ≠ j. Similarly, since the LHS of every equation in \(S^{in}_{i}\) has xzi as a prefix, \(S_{i}^{in} \cap S_{j}^{in} = \emptyset \) whenever i ≠ j. Since the LHS of all equations in \(S^{in}_{i}\) has x as a prefix, and since the LHS of all equations in \(S^{out}_{j}\) does not have x as a prefix, we may conclude further that \({S^{Z}_{i}} \cap S^{Z^{\prime }}_{j} = \emptyset \) for all i ≠ j and \(Z,Z^{\prime } \in \{in,out\}\).

We are now ready to give the strategy for the robber in the n-cops and robber game on \({\mathscr{G}}^{\Rightarrow }_{[E]}\). We shall say that Ei is a ‘safe’ vertex if \(S_{i}^{in} \cup S_{i}^{out} \cup \{E_{i}\}\) contains no vertex with a cop on it. Since there are only n cops, it follows from the fact that the sets \(S_{i}^{in} \cup S_{i}^{out} \cup \{E_{i}\}\) are pairwise disjoint that, at any given time, there must be at least one i, 0 ≤ i ≤ n, such that Ei is safe. By definition, if the robber is on a safe vertex, then there is no cop also on that vertex, so the play continues.

Clearly, if the cop player chooses an initial placement \(X_{0} \in [E]_{\Rightarrow }^{\leq n}\), then the robber may be placed on a safe vertex \(r_{0} = E_{i_{1}}\) for some i1, 0 ≤ i1 ≤ n. Now, suppose after k steps in the game the position is (Xk, rk) where rk is a safe vertex. Then we shall show that, whatever the cop player chooses for Xk+ 1, the robber may choose rk+ 1 such that rk+ 1 is safe. Indeed, if \(r_{k} = E_{i_{k}}\) for some ik, 0 ≤ ik ≤ n, is safe, then \((S^{out}_{i_{k}} \cup \{E_{i_{k}}\}) \cap X_{k} = \emptyset \). Moreover, since there are only n cops, whatever the choice of Xk+ 1, there exists \(r_{k+1} = E_{i_{k+1}}\) for some ik+ 1, 0 ≤ ik+ 1 ≤ n, such that \(E_{i_{k+1}}\) is safe, meaning that \(X_{k+1} \cap (S^{in}_{i_{k+1}}\cup \{E_{i_{k+1}}\}) = \emptyset \). It follows that \(S_{i_{k}}^{out} \cup S_{i_{k+1}}^{in} \cup \{E_{i_{k}}, E_{i_{k+1}}\} \subset [E]_{\Rightarrow } \backslash (X_{k+1}\cap X_{k})\). We have already shown (Equation 4) that there is a directed path in \({\mathscr{G}}^{\Rightarrow }_{[E]}\) using only vertices from \(S_{i_{k}}^{out} \cup S_{i_{k+1}}^{in} \cup \{E_{i_{k}}, E_{i_{k+1}}\}\) from \(r_{k} (= E_{i_{k}})\) to \(r_{k+1} (= E_{i_{k+1}})\), and hence (Xk+ 1, rk+ 1) is a valid next position satisfying the rules of the game. Since rk+ 1 is also safe, this proves our claim, and by a simple induction, it follows that for any n-cop strategy, there is an infinite play (i.e. the robber wins). It follows that there is no winning n-cop strategy, so the DAG-width of \({\mathscr{G}}^{\Rightarrow }_{[E]}\) is greater than n as required. □

Since high connectivity can be seen as an obstacle to deciding the satisfiability problem with additional constraints, it is also worth noting classes for which the DAG-width is bounded by a small constant. If all variables occur at most once in an equation E, then it is not difficult to see that the graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) will be a DAG. However, when variables may occur more than once, the graphs of even very simple equations such as \(x {\mathtt {a}}{\mathtt {b}} \doteq {\mathtt {b}} {\mathtt {a}} x\) will contain cycles, and will therefore have DAG-width at least two. The following theorem describes an infinite class of equations for which the DAG-width of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) is at most two. It is worth pointing out that the NP-hardness result for the satisfiability problem for regular word equations from [8] applies to this class, and so, by Theorem 8.12, this class also has an NP-complete satisfiability problem.

Theorem 10.3

Let α1, α2,…,αn, β1, β2,…,βn ∈ X+ such that
  1. |αi| = |βi| ∈ {1,2,3} for 1 ≤ i ≤ n, and
  2. var(αi) = var(βi) for 1 ≤ i ≤ n, and
  3. var(αi) ∩ var(αj) = ∅ for 1 ≤ i, j ≤ n with i ≠ j.
Let E be the RWE \(\alpha _{1}\alpha _{2}{\ldots } \alpha _{n} \doteq \beta _{1}\beta _{2} {\ldots } \beta _{n}\). Then \({dgw}({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}) \leq 2\).

Proof

Let E be of the form described in the theorem. By Proposition 3.5,
$${dgw}(\mathscr{G}^{\Rightarrow_{NT}}_{[E]}) = \max \{ m \mid E \Rightarrow_{NT}^{*} E^{\prime} \text{ and } m = {dgw}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \}.$$
Let \({\mathscr{C}}\) be the subclass of RWEs of the form \( \alpha _{1}\alpha _{2}{\ldots } \alpha _{k} \doteq \beta _{1}\beta _{2}{\ldots } \beta _{k}\) where \(k \in \mathbb {N}_{0}\) such that:
  1. αi, βi ∈ X+ with |αi| = |βi| ∈ {1,2,3} for 1 ≤ i ≤ k, and
  2. var(αi) = var(βi) for 1 ≤ i ≤ k, and
  3. var(αi) ∩ var(αj) = ∅ for all i ≠ j, 1 ≤ i, j ≤ k.
Clearly, we have \(E \in {\mathscr{C}}\). Since k is not restricted, we may also assume w.l.o.g. that for any word equation in \({\mathscr{C}}\), the ‘sub-equations’ \(\alpha _{i} \doteq \beta _{i}\) are indecomposable. Moreover, for any equation in \({\mathscr{C}}\) other than \(\varepsilon \doteq \varepsilon \), we may also assume that |α1|≥ 1. Under these assumptions, it follows from Corollary 4.4 that for any \(E^{\prime } \in {\mathscr{C}}\), the graph \({\mathscr{G}}^{\Rightarrow }_{[E^{\prime }]}\) is isomorphic to the graph \({\mathscr{G}}^{\Rightarrow }_{[\alpha _{1} \doteq \beta _{1}]}\). There are five possibilities for \(\alpha _{1} \doteq \beta _{1}\) (up to a renaming of the variables, which does not alter the structure of the graph \({\mathscr{G}}^{\Rightarrow }_{[\alpha _{1} \doteq \beta _{1}]}\)), namely \(x \doteq x\), \(xy \doteq yx\), \(xyz \doteq zyx\), \(xyz \doteq yzx\) and \(xyz \doteq zxy\). It is easily verified by hand that in all cases the DAG-width is at most two (it is exactly two in the cases where |α1| = |β1| = 3). Moreover, it follows from the definitions that if \(E_{1} \in {\mathscr{C}}\) and E1 ⇒NT E2 for some E2, then \(E_{2} \in {\mathscr{C}}\). Consequently, we have that
$${dgw}(\mathscr{G}^{\Rightarrow_{NT}}_{[E]}) = \max \{ m \mid E \Rightarrow_{NT}^{*} E^{\prime} \text{ and } m= {dgw}(\mathscr{G}^{\Rightarrow}_{[E^{\prime}]}) \} \leq 2.$$

11 Extension to Systems of Equations

So far, we have considered individual equations. However, it is often the case that there is not just one equation to be solved, but a system of several equations which should be satisfied concurrently. While constructions exist which transform a system of equations into a single equation (see e.g. [17]), the resulting equation will generally not be quadratic/regular. We extend the definition of regular equations to regular systems as follows.

Definition 11.1 (Regular systems)

Let \({{\varTheta }} = \{\alpha _{1} \doteq \beta _{1},\alpha _{2} \doteq \beta _{2}, \ldots , \alpha _{n} \doteq \beta _{n}\}\) be a system of word equations. An orientation of Θ is any element of \(\{\alpha _{1} \doteq \beta _{1}, \beta _{1} \doteq \alpha _{1}\} \times \{\alpha _{2} \doteq \beta _{2}, \beta _{2} \doteq \alpha _{2}\} \times {\ldots } \times \{\alpha _{n} \doteq \beta _{n}, \beta _{n} \doteq \alpha _{n}\}\). We say that Θ is regular if it has an orientation for which each variable occurs at most once across all LHSs and at most once across all RHSs.
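For example (an illustration of ours, not from the original text): the system \(\{xy \doteq {\mathtt{a}}z,\ z{\mathtt{b}} \doteq y\}\) is regular, since in the orientation \((xy \doteq {\mathtt{a}}z,\ z{\mathtt{b}} \doteq y)\) each of x, y, z occurs at most once across the two LHSs and at most once across the two RHSs. By contrast, \(\{xy \doteq z,\ xz \doteq y\}\) is not regular, even though each individual equation is regular: in every one of the four orientations, some variable occurs twice across the LHSs or twice across the RHSs.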

We can easily adapt the algorithm from Section 3 to work more generally for systems of word equations, and with careful application, still make use of Theorem 8.11 in order to obtain (non-deterministic) polynomial running time. To do this, we need to extend the rewriting transformations (Nielsen transformations) underpinning the relation ⇒NT which we have thus far defined for single equations only. Note that each possible rewriting of a single equation can be achieved by first applying a morphism to both sides of the equation and then, if applicable, cancelling the longest identical prefixes of the new LHS and RHS. For example, the rewriting \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}} \Rightarrow _{NT} {\mathtt {a}} xy z {\mathtt {b}} {\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) consists of applying the morphism ψy>x (cf. Section 3) to both sides of the first equation in order to get \(x {\mathtt {a}} xy z {\mathtt {b}} {\mathtt {a}} \doteq x y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) and then cancelling the resulting leftmost occurrences of x.

The generalisation of the Nielsen Transformations to systems of equations is straightforward: we select one of the word equations E from the system, and apply any of the possible transformations to it as before. Then we simply need to apply the associated morphism to both sides of all the other equations in the system, followed by any further resulting cancellations. We shall say that such a transformation is rooted on the chosen equation E, and we shall write \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\) if \({{\varTheta }}, {{\varTheta }}^{\prime }\) are systems of word equations such that \({{\varTheta }}^{\prime }\) is the result of applying a transformation rooted on E to Θ. So if, for example, we have the system \(\{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} x\}\), then one possible transformation of the first equation is \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}} \Rightarrow _{NT} x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\) obtained by applying the morphism ψx>y and cancelling the resulting leftmost occurrences of y. To extend this transformation to the whole system, we just need to apply ψx>y to the other equation (no further cancellation is required in this case) so we have \(\{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} x\} \Rightarrow _{NT}^{E} \{x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}, w{\mathtt {b}} {\mathtt {a}} \doteq {\mathtt {a}}{\mathtt {b}} yx\}\) where E is the equation \(x {\mathtt {a}} y z {\mathtt {b}}{\mathtt {a}} \doteq y {\mathtt {b}} w {\mathtt {b}} z {\mathtt {a}}\).

Taking the length |Θ| of a system Θ of word equations to be the sum of the lengths of all the individual word equations, it is easily seen that the important properties of this rewriting carry over to the case of systems. Specifically, it is easily verified that for any regular system Θ of word equations each of the following holds:
  1. If E ∈ Θ and \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\), then \({{\varTheta }}^{\prime }\) is also regular,
  2. If E ∈ Θ and \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\), then \(|{{\varTheta }}^{\prime }| \leq |{{\varTheta }}|\),
  3. for any solution h to Θ, and for any E ∈ Θ with |E| > 0 there exists a system \({{\varTheta }}^{\prime }\) with a solution \(h^{\prime }\) such that \({{\varTheta }} \Rightarrow _{NT}^{E} {{\varTheta }}^{\prime }\) and either \(h^{\prime }\) is smaller than h or \(|{{\varTheta }}^{\prime }| < |{{\varTheta }}|\).

With this in mind, we are now able to extend our main result that solving regular word equations is in NP to include regular systems of equations.

Theorem 11.2

The satisfiability problem for regular systems of equations is NP-complete. Moreover, whether a system of word equations is regular can be decided in polynomial time.

Proof

Since the satisfiability problem is NP-hard for regular word equations, it is also NP-hard for regular systems of word equations. Next we shall show inclusion in NP. Let Θ = {E1, E2,…,En} be a regular system of equations. From Observations 1-3 above, there is a solution to Θ if and only if there exists a finite sequence of transformations
$${{\varTheta}}_{0} \Rightarrow_{NT}^{\hat{E}_{1}} {{\varTheta}}_{1} \Rightarrow_{NT}^{\hat{E}_{2}} {\ldots} \Rightarrow_{NT}^{\hat{E}_{m}}{{\varTheta}}_{m}$$
satisfying Θ = Θ0, \({{\varTheta }}_{m} = \{\varepsilon \doteq \varepsilon \}\) and \(\hat {E}_{i} \in {{\varTheta }}_{i-1}\) for 1 ≤ i ≤ m. In fact, by Observation 3, we may freely choose each \(\hat {E_{i}}\) to be any equation from Θi− 1, and such a finite sequence must still exist whenever there is a solution. Consequently, we may decide whether or not a solution exists with the following procedure (Algorithm 1) which searches for such a sequence by applying firstly transformations rooted on the first equation, followed by transformations rooted on the second equation, then the third, etc. For convenience, we shall represent Θ as an ordered list [E1, E2,…,En] rather than a set.
Algorithm 1

We begin by non-deterministically applying a sequence of Nielsen transformations (generalised for systems of word equations) rooted on the first equation in the list until we reach a system of the form \([\varepsilon \doteq \varepsilon ,E_{2}^{\prime },\ldots ,E_{n}^{\prime }]\). If we are not able to transform E1 into \(\varepsilon \doteq \varepsilon \), then no solution to E1 exists and the system has no solution.

Otherwise, once we have transformed E1 into \(\varepsilon \doteq \varepsilon \), we repeat the process of applying the generalised Nielsen transformations to the (new) second equation \(E_{2}^{\prime }\) until it has also been transformed into \(\varepsilon \doteq \varepsilon \) (note that none of the transformations will change the trivial equation \(\varepsilon \doteq \varepsilon \)). Continue to repeat this process for each equation, in increasing order, until either an equation is reached which cannot be transformed into \(\varepsilon \doteq \varepsilon \), or until we have transformed all equations into this form. In the former case, there is no solution, while in the latter case, a solution exists.
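The following sketch illustrates the idea (our own illustrative reconstruction, not the paper's Algorithm 1: it replaces the non-deterministic guessing by an exhaustive search over all reachable systems, and so runs in exponential rather than non-deterministic polynomial time; upper-case letters play the role of variables, lower-case letters of terminals):

def cancel(system):
    """Cancel longest identical prefixes and drop solved equations."""
    out = []
    for u, v in system:
        while u and v and u[0] == v[0]:
            u, v = u[1:], v[1:]
        if (u, v) != ("", ""):
            out.append((u, v))
    return tuple(out)

def successors(system):
    """All systems reachable by one transformation rooted on the first equation."""
    (u, v), rest = system[0], list(system[1:])
    if u and v and u[0] == v[0]:          # identical first symbols: cancel
        yield cancel([(u[1:], v[1:])] + rest)
        return
    for a, b in ((u, v), (v, u)):         # the two symmetric cases
        if a and a[0].isupper():          # side a starts with a variable x
            x = a[0]
            # Guess x -> empty word (a length-reducing transformation) ...
            yield cancel([(p.replace(x, ""), q.replace(x, ""))
                          for p, q in [(u, v)] + rest])
            # ... or guess that x begins with the first symbol of the other side.
            if b:
                r = b[0] + x
                yield cancel([(p.replace(x, r), q.replace(x, r))
                              for p, q in [(u, v)] + rest])

def satisfiable(system):
    """Exhaustively search the space of reachable systems."""
    start = cancel(system)
    seen, todo = {start}, [start]
    while todo:
        s = todo.pop()
        if s == ():                       # every equation reduced to eps = eps
            return True
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return False

assert satisfiable([("XabY", "YbaX")])    # e.g. x -> empty word, y -> a
assert not satisfiable([("Xa", "bX")])

For regular systems, properties 1–3 above guarantee that the total length never increases under these transformations, so only finitely many systems are reachable and the search terminates.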

It remains to show that we can implement the procedure just described such that it runs in non-deterministic polynomial time. For this, we need a few further observations. The first is that when applying transformations rooted on the ith equation, we are essentially traversing the same graph \({\mathscr{G}}^{\Rightarrow _{NT}}_{[\tilde {E_{i}}]}\) as if we were to consider in isolation the equation \(\tilde {E_{i}}\) obtained after transforming the first i − 1 equations into \(\varepsilon \doteq \varepsilon \). The only difference is that we are potentially changing the other equations as we go. The second important observation is that any transformation rooted on the ith equation which changes any of the other (non-root) equations must necessarily decrease the length of the ith equation. Finally, the equation on which a transformation is rooted never increases in length as a result of that transformation. Thus, by applying the transformations in the order specified, we never increase the length of the ith equation once it becomes the current root.

Consequently, when applying transformations which preserve the length of the ith equation, we may, without affecting the outcome, take the shortest path through the graph. Moreover, since we can only decrease the length of an equation a linear number of times, the maximum number of transformations rooted on the ith equation needed in order to find a solution when one exists is bounded above by
$$ C_{i} = |\tilde{E_{i}}| \max\{ {diam}(\mathscr{G}_{[E]}^{\Rightarrow}) \mid \tilde{E_{i}} \Rightarrow_{NT}^{*} E\}.$$
By Theorem 8.11, we can easily compute an upper bound \(C_{{{\varTheta }}} \geq \max \limits \{C_{i} \mid 1 \leq i \leq n\}\) on the number of transformations needed which allows us to restrict the above procedure such that it works in non-deterministic polynomial time without affecting the correctness.

Finally, we describe the following procedure (Algorithm 2) for determining if a system Θ = {E1, E2,…,En} is regular. First we check that each individual equation is regular and that no variable occurs more than twice across the whole system. We then initialise two sets L and R to the empty set. The sets L and R will keep track of variables occurring across the LHS’s and RHS’s of an orientation of Θ. We remove equations \(\alpha \doteq \beta \) from Θ one-by-one, deciding each time whether \(\alpha \doteq \beta \) or \(\beta \doteq \alpha \) should be included in the orientation and updating L and R accordingly.

While there are still equations left in the system, there are two cases to consider. The first is that there exists an equation \(\alpha \doteq \beta \in {{\varTheta }}\) which contains at least one variable x which is already in L or R. In this case, we can rule out at least one choice of \(\alpha \doteq \beta \) or \(\beta \doteq \alpha \) when constructing an orientation satisfying the definition for regular systems. In particular, if xL, then whichever of α, β contains x should be the RHS in the orientation (so, if x occurs in α, we include \(\beta \doteq \alpha \) in the orientation instead of \(\alpha \doteq \beta \)). Likewise if xR then whichever of α, β contains x should be the LHS. Once we have decided which of \(\alpha \doteq \beta \) and \(\beta \doteq \alpha \) is a bad choice (in that it would lead to two occurrences of x in either the LHS’s or RHS’s), we need to check that the remaining “oriented” equation does not lead to a similar conflict (possibly for one of the other variables). To do this, we simply need to check that the LHS does not share any variables with L and likewise that the RHS does not share any variables with R. If this test is failed then our system is not regular and we can stop and return “No”. Otherwise we add all the variables from the LHS of the oriented equation to L and all the variables from the RHS to R. Then we remove the equation \(\alpha \doteq \beta \) from the system Θ and continue.
Algorithm 2

The second case is when none of the variables occurring in the remaining equations are contained in either L or R. In this case, how we construct the rest of the orientation does not depend on the previous choices. Moreover, for any orientation satisfying the definition, we can find another by simply swapping the LHS’s and RHS’s of all equations. Thus by symmetry, we may include any single one of the remaining equations in the orientation without exchanging the LHS and RHS, and without affecting the possibility of constructing a valid orientation in the end. Thus, we then pick any of the remaining equations \(\alpha \doteq \beta \) at random and add the variables from α to L and all the variables from β to R, before removing \(\alpha \doteq \beta \) from Θ and continuing. If we are able to iterate through and discard all equations in the system like this without returning “No”, then the system is regular and we may return “Yes”. The correctness, along with the fact that the procedure runs in polynomial time, is easily verified. □
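A compact version of this procedure might look as follows (again our own sketch, not the paper's Algorithm 2; upper-case letters are variables):

from collections import Counter

def variables(word):
    return {c for c in word if c.isupper()}

def is_regular(system):
    """Decide whether the system admits an orientation as in Definition 11.1."""
    # Each equation must itself be regular: no variable repeats on a side.
    for eq in system:
        for side in eq:
            occ = [c for c in side if c.isupper()]
            if len(occ) != len(set(occ)):
                return False
    # No variable may occur more than twice across the whole system.
    count = Counter(c for eq in system for side in eq for c in side if c.isupper())
    if any(n > 2 for n in count.values()):
        return False
    # Greedily build an orientation, recording variables used on LHSs (L)
    # and on RHSs (R); an equation sharing a variable with L or R is forced.
    L, R, todo = set(), set(), list(system)
    while todo:
        i = next((i for i, (a, b) in enumerate(todo)
                  if (variables(a) | variables(b)) & (L | R)), 0)
        a, b = todo.pop(i)
        for lhs, rhs in ((a, b), (b, a)):   # try both orientations
            if not (variables(lhs) & L) and not (variables(rhs) & R):
                L |= variables(lhs)
                R |= variables(rhs)
                break
        else:
            return False                    # neither orientation works
    return True

assert is_regular([("XY", "aZ"), ("Zb", "Y")])      # orientation as written
assert not is_regular([("XY", "Z"), ("XZ", "Y")])   # no valid orientation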

12 Conclusions

A famous algorithm for solving quadratic word equations can be used to produce a (directed) graph containing all solutions to the equation. In the case of regular equations, we have described some underlying structures of these graphs with the intention of better understanding their solution sets. We give bounds on their diameter and number of vertices, as well as provide classes with bounded (resp. unbounded) DAG-width. Probably the most significant result arising from our analysis is that the satisfiability problem for regular word equations is in NP (and thus NP-complete), which we also extend to regular systems of equations.

We leave open many interesting problems, the most obvious of which is to generalise our results to the (full) quadratic case. We also believe that our analysis and techniques open up the possibility to investigate in far more detail the graphs \({\mathscr{G}}^{\Rightarrow }_{[E]}\), both in the case of regular equations and more generally. For example, in light of our results, it seems reasonable to suggest that determining whether \(E_{1} \Rightarrow _{NT}^{*} E_{2}\) holds for two regular equations E1 and E2 may be done in polynomial time. A particularly nice characterisation of the pairs E1, E2 such that \(E_{1} \Rightarrow _{NT}^{*} E_{2}\) might yield a much quicker algorithm than the one resulting from our bound on the diameter of \({\mathscr{G}}^{\Rightarrow _{NT}}_{[E]}\) by significantly reducing the degree of the polynomial. We also expect that a detailed analysis of the length-reducing transformations and the symmetries which may be found in these graphs would be particularly helpful in understanding further the structure of solution sets and the performance of algorithms solving regular equations in practice.

Finally, we mention the task of investigating the decidability of the satisfiability problem for regular equations with additional constraints, in particular length constraints, with the hope that having identified cases where the DAG-width is particularly high/low, along with improved means to describe precisely the structure of the solution-graphs, might provide some useful hints as to how to proceed in this direction.

Acknowledgements

We thank the anonymous referees for their detailed and thoughtful comments.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Footnotes
1

Each choice of edge in a walk can be seen as a decision about the corresponding solution. It is not necessarily true that different walks will result in different solutions. However, all possible decisions are accounted for, so it is guaranteed that for every solution there is a walk from E to \(\varepsilon \doteq \varepsilon \) which corresponds to that solution.

 
2

We consider the number of vertices, rather than edges, because it is the number of vertices which is relevant to the performance of the algorithm, and by definition of ⇒NT, the out-degree of the graph is bounded by a constant so the number of edges is linear in the number of vertices.

 
3

There are several possible variations on the definition of the length-reducing rewriting transformations ⇒> for which the algorithm remains correct and is guaranteed to terminate. However, for our results, the exact choice is not important as we concentrate our investigations on the length preserving part ⇒ of the rewriting relation for reasons described in Section 3.2.

 
4

The case that dgw(G1) = 1 and dgw(G2) = 2 is a special case arising from the possibility of ‘isolated cycles’ being compressed into singleton self-loops.

 
5

The first case corresponds to the possibility that QE(y) = (x, x) for some variable y. The second case corresponds to the possibility that QE(#) = (x, x), meaning that E has the form \(y\alpha _{1} xz \alpha _{2} \doteq z \beta _{1} xy \beta _{2} \), with x, y, z ∈ X and α1, α2, β1, β2 ∈ X∗, in which case \(E \Rightarrow \alpha _{1} xyz \alpha _{2} \doteq z \beta _{1} xy \beta _{2} \).

 
6

It is worth noting that since basic RWEs are indecomposable, Card(W(E)) ≥ 2 whenever Card(var(E)) ≥ 2.

 