08.09.2021 | Issue 4/2021 | Open Access

# On the generative capacity of matrix insertion-deletion systems of small sum-norm

Journal:
Natural Computing > Issue 4/2021
Authors:
Henning Fernau, Lakshmanan Kuppusamy, Indhumathi Raman
Important note
A preliminary version of this paper appeared in Proceedings of SOFSEM 2019, LNCS 11376, pp. 192–205, 2019.

## Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## 1 Introduction

Two common operations while processing natural languages are inserting and deleting words in between parts of sentences; such insertions and deletions are usually based on context information. The (context-free) insertion operation was first considered in Haussler ( 1983), but also recently in Verlan et al. ( 2020). The deletion operation as a basis of a grammatical derivation process was introduced in Kari ( 1991), where the deletion was motivated as a variant of the right-quotient operation that does not necessarily happen at the right end of the string. Insertion and deletion were considered with a linguistic motivation in Galiukschov ( 1981) and together were first studied in Kari and Thierrin ( 1996). The corresponding grammatical mechanism is called an insertion-deletion system (abbreviated as ins-del system). Informally, the insertion and deletion operations of an ins-del system are defined as follows: if a string $$\eta$$ is inserted between two parts $$w_1$$ and $$w_2$$ of a string $$w_1w_2$$ to get $$w_1 \eta w_2$$, we call the operation insertion, whereas if a substring $$\delta$$ is deleted from a string $$w_1 \delta w_2$$ to get $$w_1w_2$$, we call the operation deletion. Suffixes of $$w_1$$ and prefixes of $$w_2$$ are called contexts.
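The two operations can be made concrete with a short sketch. The helper names `insertions` and `deletions` are ours, chosen only for illustration; each function enumerates every position of the string where the rule's contexts match:

```python
def insertions(w, u, eta, v):
    """All strings obtained from w by applying the insertion rule
    (u, eta, v)_ins: eta is inserted between a left context u and
    a right context v."""
    return [w[:i] + eta + w[i:]
            for i in range(len(w) + 1)
            if w[:i].endswith(u) and w[i:].startswith(v)]

def deletions(w, u, delta, v):
    """All strings obtained from w by applying the deletion rule
    (u, delta, v)_del: an occurrence of delta between u and v is removed."""
    return [w[:i] + w[i + len(delta):]
            for i in range(len(w) - len(delta) + 1)
            if w[i:i + len(delta)] == delta
            and w[:i].endswith(u) and w[i + len(delta):].startswith(v)]
```

For example, `insertions("aa", "a", "x", "")` yields `["axa", "aax"]`: with an empty right context, x may be inserted after either occurrence of a.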
Several variants of ins-del systems have been considered in the literature, imposing regulation mechanisms on top, motivated by classical formal language theory (Dassow and Păun 1989). Some interesting variants (from our perspective) are ins-del P systems (Alhazov et al. 2011), tissue P systems with ins-del rules (Kuppusamy and Rama 2003), context-free ins-del systems (Margenstern et al. 2005), graph-controlled ins-del systems (Fernau et al. 2017b; Freund et al. 2010; Ivanov and Verlan 2017), matrix insertion systems (Marcus and Păun 1990), matrix ins-del systems (Kuppusamy et al. 2011; Petre and Verlan 2012; Kuppusamy and Mahendran 2016), random context and semi-conditional ins-del systems (Ivanov and Verlan 2015), etc. We refer to the survey (Verlan 2010) for more details of several variants of ins-del systems. In this paper, we focus on matrix ins-del systems (Kuppusamy et al. 2011; Petre and Verlan 2012; Kuppusamy and Mahendran 2016). Viewing insertions and deletions as elementary operations for biocomputing (Păun et al. 1998), matrices can be seen as a very simple control mechanism.
In a matrix ins-del system, the insertion-deletion rules are given in matrix form. If a matrix is chosen for derivation, then all the rules in that matrix are applied in order and no rule of the matrix is exempted. In the size $$s=(k;n,i',i'';m,j',j'')$$ of a matrix ins-del system, the parameters (from left to right) denote the maximum number of rules (the length) of any matrix, the maximal length of the inserted string, the maximal length of the left context for insertion, the maximal length of the right context for insertion, the maximal length of the deleted string, the maximal length of the left context for deletion, and the maximal length of the right context for deletion. We denote the language classes generated by matrix ins-del systems of size s by $$\mathrm {MAT}(s)$$. The tuple formed by the last six parameters, namely $$(n,i',i'';m,j',j'')$$, is called the ins-del size. We call the sum of its parameters the sum-norm of the (matrix) ins-del system.
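The relation between the size tuple and its sum-norm can be sketched as follows; the function name and the flat tuple encoding are illustrative, not from the paper:

```python
def sum_norm(size):
    """Sum-norm of a matrix ins-del system of size (k; n, i', i''; m, j', j''):
    the sum of the six ins-del parameters. The matrix length k is excluded."""
    k, n, i1, i2, m, j1, j2 = size
    return n + i1 + i2 + m + j1 + j2

# Size (3; 1, 0, 0; 1, 2, 0) has sum-norm 4; size (1; 1, 0, 0; 2, 0, 0) has sum-norm 3.
assert sum_norm((3, 1, 0, 0, 1, 2, 0)) == 4
assert sum_norm((1, 1, 0, 0, 2, 0, 0)) == 3
```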
It is known that ins-del systems are computationally complete, i.e., they characterize the family $$\mathrm {RE}$$ of recursively enumerable languages, which readily transfers to the mentioned variants. Descriptional complexity then aims at investigating which of the resources are really needed to obtain computational completeness. For instance, is it really necessary to permit insertion operations that check out contexts of arbitrary length? For resource restrictions that do not (or are not known to) suffice to achieve computational completeness, one is interested in seeing which known families of languages can be still generated. As in our case, for several families of matrix ins-del systems, it is even unknown if all of CF (the context-free languages) can be generated, we then look at the rather large sub-family $$\mathcal {L}_{reg}(\mathrm {LIN})$$, the regular closure of $$\mathrm {LIN}$$. In Table  1, we report on what resources are needed for a matrix ins-del system of sum-norm 3 or 4 to generate the class specified there, also giving a short literature survey. Further races for smaller sizes are described when we discuss the particularities of our results below.
Also, the open questions that we list are sometimes connected, sometimes not. For instance, by the closure of RE under reversal, $$\mathrm {MAT}(*;1,0,0;1,1,0)=\mathrm {RE}$$ iff $$\mathrm {MAT}(*;1,0,0;1,0,1)=\mathrm {RE}$$(here the left and right contexts size within insertion/deletion are exchanged), but the questions whether $$\mathrm {MAT}(*;1,0,0;2,1,0)=\mathrm {RE}$$ or whether $$\mathrm {MAT}(*;2,1,0;1,0,0)=\mathrm {RE}$$ (here all the parameters of insertions are swapped with deletion parameters) seem to be independent of each other.
Table 1
Generative power of $$\mathrm {MAT}(k;n,i',i'';m,j',j'')$$ with sum-norm 3 or 4, where $$n,m \in \{1,2\}$$ and $$i',i'',j',j'' \in \{0,1,2\}$$

| Sum-norm | ID size $$(n,i',i'';m,j',j'')$$ | Matrix length k | Language family relation | Remarks |
|---|---|---|---|---|
| 3 | (1, 0, 0; 2, 0, 0), (2, 0, 0; 1, 0, 0) | $$\ge 1$$ | $$\ne \mathrm {RE}$$ | Inferred from Krassovitskiy et al. (2011) |
| 3 | (1, 0, 0; 1, 1, 0), (1, 0, 0; 1, 0, 1) | 1 | $$\ne \mathrm {RE}$$ | Inferred from Krassovitskiy et al. (2011) |
| | | $$\ge 2$$ | ? | OPEN |
| 3 | (1, 1, 0; 1, 0, 0), (1, 0, 1; 1, 0, 0) | 3 | $$\supset \mathrm {MLIN}$$ | Fernau et al. (2018b) |
| | | 3 | $$\supset \mathcal {L}_{reg}(\mathrm {LIN})$$ | Fernau et al. (2019), Corollary 1 |
| 4 | (1, 0, 0; 1, 1, 1) | 3 | RE | Fernau et al. (2018b) |
| 4 | (1, 1, 1; 1, 0, 0) | 3 | RE | Fernau et al. (2018b) |
| | | 2 | RE | Theorem 4 |
| 4 | (1, 0, 0; 2, 1, 0), (1, 0, 0; 2, 0, 1) | $$\ge 1$$ | ? | OPEN |
| 4 | (1, 0, 0; 1, 2, 0), (1, 0, 0; 1, 0, 2) | 3 | RE | Fernau et al. (2019), Theorem 2 |
| 4 | (1, 2, 0; 1, 0, 0), (1, 0, 2; 1, 0, 0) | 3 | RE | Fernau et al. (2019) |
| | | 2 | RE | Theorem 3 |
| 4 | (1, 1, 0; 1, 1, 0), (1, 0, 1; 1, 0, 1) | 3 | RE | Petre and Verlan (2012) and Fernau et al. (2018b) |
| | | 2 | $$\supset$$ REG | Theorem 6 |
| | | 1 | $$\subset$$ CF | Krassovitskiy et al. (2011) |
| 4 | (1, 1, 0; 1, 0, 1), (1, 0, 1; 1, 1, 0) | 3 | RE | Petre and Verlan (2012) and Fernau et al. (2018b) |
| | | 1 | $$\ne \mathrm {RE}$$ | Krassovitskiy et al. (2011) |
| 4 | (1, 1, 0; 2, 0, 0), (1, 0, 1; 2, 0, 0) | 2 | RE | Petre and Verlan (2012) and Fernau et al. (2018b) |
| | | 1 | $$\ne \mathrm {RE}$$ | Krassovitskiy et al. (2011) |
| 4 | (2, 0, 0; 2, 0, 0) | $$\ge 1$$ | $$\ne \mathrm {RE}$$ | Krassovitskiy et al. (2011) |
| 4 | (2, 0, 0; 1, 1, 0), (2, 0, 0; 1, 0, 1) | 2 | RE | Petre and Verlan (2012) and Fernau et al. (2018b) |
| | | 1 | $$\ne \mathrm {RE}$$ | Krassovitskiy et al. (2011) |
| 4 | (2, 1, 0; 1, 0, 0), (2, 0, 1; 1, 0, 0) | 2 | $$\supset \mathrm {MLIN}$$ | Fernau et al. (2018b) |
| | | 2 | $$\supset \mathcal {L}_{reg}(\mathrm {LIN})$$ | Fernau et al. (2019), Corollary 1 |
Matrix ins-del systems of sizes (3; 1, 1, 0; 1, 1, 0) and (3; 1, 0, 1; 1, 0, 1) were shown to characterize $$\mathrm {RE}$$ earlier, and in Fernau et al. ( 2019), the preliminary version of this paper, an attempt was made to reduce the matrix length from 3 to 2. However, a flaw in the proof was later pointed out by one of the referees, and therefore this case remains open. Besides this size, matrix ins-del systems of sizes (3; 1, 1, 0;  1, 0, 0), (3; 1, 0, 1;  1, 0, 0), (2; 2, 1, 0;  1, 0, 0), (2; 2, 0, 1;  1, 0, 0) are not known to describe $$\mathrm {RE}$$, not even all of $$\mathrm {CF}$$. However, these systems have been shown to describe at least the class $$\mathrm {MLIN}$$ of metalinear languages (Fernau et al. 2018b). It is known that $$\mathrm {MAT}(1;1,1,0;1,0,0)\cup \mathrm {MAT}(3;1,0,0;1,0,0)\ne \mathrm {RE}$$ (Fernau and Kuppusamy 2017). In particular, it has been shown that the Parikh images of languages from $$\mathrm {MAT}(3;1,0,0;1,0,0)$$ coincide with the (commutative) Petri net languages, and that even allowing for longer matrices would not increase the generative power when keeping the ins-del size (1, 0, 0; 1, 0, 0). We consider a family of languages strictly between $$\mathrm {MLIN}$$ and $$\mathrm {CF}$$, namely the regular closure of $$\mathrm {LIN}$$, i.e., the smallest language class containing $$\mathrm {LIN}$$ that is closed under the three regular operations union, concatenation and Kleene star. We show that if $$\mathrm {LIN} \subseteq \mathrm {MAT}(s)$$, then $$\mathcal {L}_{reg}(\mathrm {LIN}) \subseteq \mathrm {MAT}(s)$$. As a consequence, matrix ins-del systems of the sizes mentioned above contain $$\mathcal {L}_{reg}(\mathrm {LIN})$$. Finally, we prove that matrix ins-del systems of sizes (2; 1, 1, 0;  1, 1, 0) and (2; 1, 0, 1;  1, 0, 1) can describe at least the regular languages and some non-regular languages. This is clearly only a weak replacement of the earlier claimed computational completeness result.
A further technical contribution consists in formulating a new normal form, called time separating special Geffert normal form (tsSGNF), that allows us to simplify some arguments, because in particular there is no way to have mixtures of terminals and nonterminals at the right end of a derivable sentential form. Hence, such mixed cases need not be considered when proving correctness of simulation results based on tsSGNF. This is important, as the non-existence of such mixed forms is often tacitly assumed in several proofs that use SGNF; replacing SGNF by tsSGNF should help to easily fix these results. In this respect, matrix control differs from graph control ($$\mathrm {GCID}$$) in connection with ins-del systems, since the latter control demands that if $$\mathrm {LIN} \subseteq \mathrm {GCID}(k;n,i',i'';m,j',j'')$$, then $$\mathcal {L}_{reg}(\mathrm {LIN}) \subseteq \mathrm {GCID}(k+2;n,i',i'';m,j',j'')$$; see Fernau et al. ( 2017b), Fernau et al. ( 2017d).
Parts of this work have been presented at the conference SOFSEM 2019; see Fernau et al. ( 2019). For easy reference, the following results were showcased in Fernau et al. ( 2019).
1.
$$\mathrm {MAT}(3;1,0,0;1,2,0)=\mathrm {MAT}(3;1,0,0;1,0,2)=\mathrm {RE}$$.

2.
$$\mathrm {MAT}(2;1,1,0;1,1,0)=\mathrm {MAT}(2;1,0,1;1,0,1)=\mathrm {RE}$$.

3.
$$\mathrm {MAT}(3;1,2,0;1,0,0)=\mathrm {MAT}(3;1,0,2;1,0,0)=\mathrm {RE}$$.

4.
$$\mathcal {L}_{reg}(\mathrm {LIN}) \subsetneq \mathrm {MAT}(3;1,1,0;1,0,0) \cap \mathrm {MAT}(3;1,0,1;1,0,0)$$.

5.
$$\mathcal {L}_{reg}(\mathrm {LIN}) \subsetneq \mathrm {MAT}(2;2,1,0;1,0,0) \cap \mathrm {MAT}(2;2,0,1;1,0,0)$$.

However, most of the proofs were suppressed in the conference paper. In this extended version, we retain the first one and the last two results stated above from Fernau et al. ( 2019), now with detailed proofs. We also thank one of the referees of this manuscript for pointing us to a problem with the construction suggested in Fernau et al. ( 2019) for proving the second result stated above. The matrix length in the third result stated above is improved from 3 to 2 in this paper, and a new result, $$\mathrm {MAT}(2;1,1,1;1,0,0)=\mathrm {RE}$$, is also proved here. A summary of the results of this paper can be seen in Tables  1 and  2 below.

## 2 Preliminaries

We assume that the readers are familiar with the standard notations in formal language theory. We recall a few notations here to keep the paper self-contained.
Let $$\Sigma ^*$$ denote the free monoid generated by the alphabet (finite set) $$\Sigma$$. The elements of $$\Sigma ^*$$ are called strings or words; $$\lambda$$ denotes the empty string, and $$L^R$$ and $$\mathcal {L}^R$$ denote the reversal of the language L and of the language family $$\mathcal {L}$$, respectively. $$\mathrm {RE}$$ and $$\mathrm {LIN}$$ denote the families of recursively enumerable languages and linear languages, respectively. Occasionally, we use the shuffle operator, written as $$\sqcup \!\sqcup$$.
For the computational completeness results, we use the fact that type-0 grammars in Special Geffert Normal Form (SGNF) (Freund et al. 2010) are known to characterize the recursively enumerable languages; this normal form is extensively used in Fernau et al. ( 2017a, 2017c, 2018a) and Petre and Verlan ( 2012). In fact, we slightly extend this notion in order to simplify certain arguments below. These simplifications were often tacitly assumed in previous works, but the following new definition gives good grounds for them.
Definition 1
A type-0 grammar $$G=(N,T,S,P)$$ is said to be in time separating special Geffert normal form, tsSGNF for short, if N is the nonterminal alphabet, T is the terminal alphabet, $$S\in N$$ is the start symbol and P is the set of production rules satisfying the following conditions.
• N decomposes as $$N=N^{(0)}\cup N'\cup N''$$, where $$N''=\{A,B,C,D\}$$ and $$S\in N^{(0)}$$, $$S'\in N'$$,
• the only non-context-free rules in P are the two erasing rules $$AB\rightarrow \lambda$$ and $$CD \rightarrow \lambda$$,
• the context-free rules are of one of the following forms:
(a)
$$X \rightarrow Yb$$ or $$X\rightarrow b'Y$$ where $$X\in N^{(0)}$$, $$Y\in N^{(0)}\cup N'$$, $$b\in T$$,

(b)
$$X \rightarrow Yb''$$ or $$X\rightarrow b'Y$$ where $$X,Y\in N'$$, or $$S'\rightarrow \lambda$$,
where $$b'\in \{A,C\}$$, $$b''\in \{B,D\}$$ and $$X\ne Y$$ in (a) and (b);

(c)
possibly, there is also the rule $$S\rightarrow \lambda$$.

Remark 1
Notice that, as these context-free rules are more of a linear type, it is easy to see that at most one nonterminal from $$N^{(0)}\cup N'$$ can be present in a sentential form of G. We exploit this observation in our proofs. According to the construction of this normal form described in Freund et al. ( 2010) and Geffert ( 1991), the derivation of a string is performed in two phases. In Phase I, the context-free rules are applied repeatedly. More precisely, this phase splits into two stages: in stage one, rules from (a) have left-hand sides from $$N^{(0)}$$; this stage produces a string of terminal symbols to the right of the only nonterminal from $$N^{(0)}$$ in the sentential form, while codings thereof are put on the left of this nonterminal occurrence; the transition to stage two is performed by using rules with left-hand sides from $$N^{(0)}$$ and one symbol from $$N'$$ occurring on the right-hand sides; in stage two, rules from (b) with left-hand sides from $$N'$$ are applied; importantly, here (and later) no further terminal symbols are produced. This separation into non-interacting times of the derivation process was the reason to call this normal form time separating. In the previous version of the normal form, rules from the two stages could also be applied in different orders; this could lead to problems in the simulation. The two erasing rules $$AB \rightarrow \lambda$$ and $$CD \rightarrow \lambda$$ are not applicable during the first phase as long as there is an S (or $$S'$$) in the middle. All the symbols A and C are generated on the left side of these middle symbols, and the corresponding symbols B and D are generated on the right side. Phase I is completed by applying the rule $$S' \rightarrow \lambda$$ in the derivation. In Phase II, only the non-context-free erasing rules are applied repeatedly, and the derivation ends.
By induction, it is clear that sentential forms derivable by tsSGNF grammars belong to $$\{A,C\}^*N^{(0)}T^*\cup \{A,C\}^*(N^{\prime }\cup \{\lambda \})\{B,D\}^*T^*$$.
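This invariant can be phrased as a regular-expression membership test. Below is a minimal sketch, assuming (for illustration only) single-character alphabets $$T=\{a,b\}$$ and $$N^{(0)}=\{S,X\}$$, with the character Q standing in for the symbols of $$N'$$:

```python
import re

# Illustrative alphabets; in the paper N^(0), N' and T are arbitrary finite sets.
N0, Nprime, T = "SX", "Q", "ab"

# {A,C}* N^(0) T*  union  {A,C}* (N' or empty) {B,D}* T*
pattern = re.compile(rf"[AC]*(?:[{N0}][{T}]*|[{Nprime}]?[BD]*[{T}]*)")

assert pattern.fullmatch("ACSab")      # Phase I, stage one
assert pattern.fullmatch("AACQBDDab")  # Phase I, stage two (Q plays the role of S')
assert pattern.fullmatch("ACBDab")     # Phase II: no middle nonterminal left
assert not pattern.fullmatch("BAS")    # B left of the middle symbol: not derivable
```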
Finally, notice that [similar to Fernau et al. ( 2017e)], we can also assume that $$X\ne Y$$ in the context-free rules, as we can alternate between even and odd derivation steps.
Let us remark that the idea of a time separating (special) Geffert normal form is useful even when the number of nonterminal symbols matter, as proven by us in a recent paper presented at ICMC 2020 (to appear in its LNCS volume).
Our reasoning shows in particular the following first result:
Theorem 1
For any recursively enumerable language L , i.e., $$L\in \mathrm {RE}$$ , there exists a type-0 grammar  G in tsSGNF with $$L=L(G)$$.

### 2.1 Matrix insertion-deletion systems

In this subsection, we describe matrix insertion-deletion systems as in  Kuppusamy et al. ( 2011), Petre and Verlan ( 2012) and Kuppusamy and Mahendran ( 2016).
Definition 2
A matrix insertion-deletion system is a construct $$\Gamma =(V, T,A, R)$$ where V is an alphabet, $$T \subseteq V$$, A is a finite language over V, and R is a finite set of matrices $$\{r_1, r_2, \ldots , r_l\}$$, where each $$r_i$$, $$1 \le i \le l$$, is a matrix of the form $$r_i= [(u_1, \alpha _1, v_1)_{t_1}, (u_2, \alpha _2, v_2)_{t_2}, \ldots ,(u_k,\alpha _k, v_k)_{t_k}]$$. For $$1 \le j \le k$$, $$u_j,~v_j \in V^*$$, $$\alpha _j \in V^+$$ and $$t_j \in \{ins,del\}$$.
The triplet $$(u_j, \alpha _j, v_j)_{t_j}$$ is called an ins-del rule, and the pair $$(u_j,v_j)$$ is termed the context, with $$u_j$$ as the left and $$v_j$$ as the right context for $$\alpha _j$$ in $$t_j$$; $$\alpha _j$$ is called the insertion string if $$t_j =ins$$ and the deletion string if $$t_j =del$$. The elements of A are called axioms. For all contexts of t where $$t \in \{ ins, del \}$$, if $$u=\lambda$$ or $$v=\lambda$$, then we call the context one-sided. If $$u=v=\lambda$$ for a rule, then the corresponding insertion/deletion can be performed anywhere in the string and is called context-free insertion/deletion. An insertion rule is of the form $$(u,\eta ,v)_{ins}$$, which means that the string $$\eta$$ is inserted between u and  v. A deletion rule is of the form $$(u, \delta ,v)_{del}$$, which means that the string $$\delta$$ is deleted between u and  v. Applying $$(u, \eta ,v)_{ins}$$ corresponds to applying the rewriting rule $$uv \rightarrow u \eta v$$, and applying $$(u, \delta ,v)_{del}$$ corresponds to applying the rewriting rule $$u \delta v \rightarrow uv$$.
At this point, we note that in a derivation, the rules of a matrix are applied sequentially one after another in the given order, and no rule is used in appearance checking, as is often the case in more classical matrix grammars with rewriting rules; see Dassow and Păun ( 1989). For $$x,y \in V^*$$ we write $$x \Rightarrow _{r_i} y$$ if y can be obtained from x by applying all the rules of a matrix $$r_i, ~1 \le i \le l$$, in order. The language $$L(\Gamma )$$ generated by  $$\Gamma$$ is defined as $$L(\Gamma )= \{w \in T^* \mid x \Rightarrow _{*} w,~\mathrm{for~ some}~ x \in A \},$$ where $$\Rightarrow _*$$ (as usual with matrix ins-del systems) denotes the reflexive and transitive closure of $$\Rightarrow :=\bigcup _{r\in R}\Rightarrow _r$$.
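The derivation relation $$\Rightarrow _{r_i}$$ can be sketched as follows. Matrices are encoded here as lists of 4-tuples (u, alpha, v, t), an encoding of our own choosing; if some rule of the matrix is not applicable, the matrix as a whole yields no successor string (there is no appearance checking):

```python
def apply_matrix(w, matrix):
    """All strings reachable from w by applying every rule of the matrix
    in the given order. Each rule is a tuple (u, alpha, v, t) with
    t in {"ins", "del"}."""
    current = {w}
    for (u, alpha, v, t) in matrix:
        nxt = set()
        for s in current:
            if t == "ins":
                # insert alpha at every position with left context u, right context v
                for i in range(len(s) + 1):
                    if s[:i].endswith(u) and s[i:].startswith(v):
                        nxt.add(s[:i] + alpha + s[i:])
            else:
                # delete an occurrence of alpha between contexts u and v
                for i in range(len(s) - len(alpha) + 1):
                    if (s[i:i + len(alpha)] == alpha and s[:i].endswith(u)
                            and s[i + len(alpha):].startswith(v)):
                        nxt.add(s[:i] + s[i + len(alpha):])
        current = nxt  # empty if this rule was inapplicable everywhere
    return current
```

For instance, applying the matrix $$[(a,a,\lambda )_{ins},(c,c,\lambda )_{ins}]$$ from Example 1 below to the string abcd yields the single string aabccd.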
If a matrix ins-del system has at most k rules in a matrix and the size of the underlying ins-del system is $$(n,i',i'';m,j',j'')$$, then we denote the corresponding class of language by $$\mathrm {MAT}(k;n,i',i'';m,j',j'')$$.
Example 1
The language $$L_1=\{a^nb^mc^nd^m \mid m,n \ge 1 \}$$ of cross-serial dependencies can be generated by a binary matrix insertion-deletion system as follows: $$\Gamma _1=(\{a,b,c,d\}, \{a,b,c,d\}, \{abcd\}, R)$$, where R consists of two matrices: $$m1= [(a,a,\lambda )_{ins},(c,c,\lambda )_{ins}]$$, $$m2= [(b,b,\lambda )_{ins},(d,d,\lambda )_{ins}]$$. We note that the matrices
\begin{aligned} m1'= [(\lambda ,a,a)_{ins},(\lambda ,c,c)_{ins}], \quad m2'= [(\lambda ,b,b)_{ins},(\lambda ,d,d)_{ins}] \end{aligned}
also generate $$L_1$$. Hence,
\begin{aligned} L_1 \in \mathrm {MAT}(2;1,1,0;0,0,0) \cap \mathrm {MAT}(2;1,0,1;0,0,0)\,. \end{aligned}
We refer to Stabler ( 2004) for further variants and a discussion of the linguistic relevance of this type of example.
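A brute-force enumeration of $$\Gamma _1$$ up to length 8 confirms that exactly the words $$a^nb^mc^nd^m$$ are reached. The helper `step` (our name, covering only the left-context insertion matrices used here) applies one matrix's rules in order:

```python
from itertools import product

def step(w, matrix):
    """All results of applying a matrix of insertion rules [(u1, a1), (u2, a2)]
    in order, where a_i is inserted immediately after an occurrence of u_i."""
    outs = {w}
    for (u, a) in matrix:
        outs = {s[:i] + a + s[i:]
                for s in outs
                for i in range(1, len(s) + 1) if s[:i].endswith(u)}
    return outs

m1, m2 = [("a", "a"), ("c", "c")], [("b", "b"), ("d", "d")]

# Breadth-first enumeration from the axiom abcd, up to length 8.
reachable, frontier = {"abcd"}, {"abcd"}
while frontier:
    frontier = {y for x in frontier for m in (m1, m2)
                for y in step(x, m) if len(y) <= 8} - reachable
    reachable |= frontier

expected = {"a"*n + "b"*m + "c"*n + "d"*m
            for n, m in product(range(1, 4), repeat=2) if 2*n + 2*m <= 8}
assert reachable == expected
```

Note that both insertions of a matrix necessarily fall inside the same letter block, so every derivable string keeps the shape $$a^nb^mc^nd^m$$.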

### 2.2 Regular closure of linear languages

Recall that a linear grammar is a context-free grammar $$G=(N,T,S,P)$$ whose productions are of the form $$A\rightarrow x$$, where A is a nonterminal symbol, and x is a word over $$N\cup T$$ with at most one occurrence of a nonterminal symbol. The language class $$\mathrm {LIN}$$ collects all languages that can be described by linear grammars. $$\mathrm {LIN}$$ can be characterized by linear grammars in normal form, which means that any rule $$A\rightarrow x$$ either obeys $$x\in T\cup \{\lambda \}$$ or $$x\in TN$$ or $$x\in NT$$. It is well known that $$\mathrm {LIN}$$ is not closed under concatenation and Kleene star. This motivates considering the class $$\mathcal {L}_{reg}(\mathrm {LIN})$$, the smallest class containing $$\mathrm {LIN}$$ that is closed under union, concatenation and Kleene star. Similarly, we can assume that any right-linear grammar that we consider is in normal form, i.e., it has only rules $$A\rightarrow aB$$ or $$A\rightarrow \lambda$$, with $$A\in N$$, $$B\in N{\setminus }\{A\}$$ and $$a\in T$$. The following grammatical characterization of $$\mathcal {L}_{reg}(\mathrm {LIN})$$ was shown in Fernau et al. ( 2018d).
Proposition 1
Fernau et al. ( 2018d) Let $$L\subseteq T^*$$ be some language. Then, $$L\in \mathcal {L}_{reg}(\mathrm {LIN})$$ if and only if there is a context-free grammar $$G=(N,T,S,P)$$ with $$L(G)=L$$ that satisfies the following properties.
• N can be partitioned into $$N_0$$ and $$N'$$.
• There is a right-linear grammar $$G_R=(N_0,N',S,P_0)$$.
• $$N'$$ can be further partitioned into $$N_1,\dots ,N_k$$ for some k, such that, for each i, the restriction $$P_i$$ of P to rules involving symbols from $$N_i\cup T$$ contains only linear rules, with T serving as the terminal alphabet.
• P can be partitioned into $$P_0,P_1,\dots , P_k$$.
Notice that this characterization corresponds to a two-stage approach: First, the right-linear grammar $$G_R$$ is used to produce a sequence of symbols from $$N'$$ that both serve as terminal symbols for $$G_R$$ and as nonterminal symbols for the linear grammar $$G_i$$ that can be obtained from G by using rules $$P_i$$ only. Here, it is not necessary but possible to insist on using $$N''\subseteq N'$$ instead of $$N'$$ as the terminal alphabet of $$G_R$$, such that $$N''\cap N_i=\{S_i\}$$ for each $$i\in [1\dots k]$$, i.e., we can single out a start symbol $$S_i$$ for each $$G_i$$. Clearly, the linear rules mentioned in the previous proposition can be assumed to be in normal form. In order to simplify the proofs of some of our main results, the following observations from Fernau et al. ( 2018b); Fernau et al. ( 2018c) are helpful.
Proposition 2
Fernau et al. ( 2018b) Let $$\mathcal {L}$$ be a language class that is closed under reversal. Then, for all non-negative integers $$k,n,i',i'',m,j',j''$$ , we have that
• $$\mathrm {MAT}(k;n,i',i'';m,j',j'')=[\mathrm {MAT}(k;n,i'',i';m,j'',j')]^R;$$
• $$\mathcal {L}=\mathrm {MAT}(k;n,i',i'';m,j',j'')$$ if and only if $$\mathcal {L}=\mathrm {MAT}(k;n,i'',i';m,j'',j')$$;
• $$\mathcal {L}\subseteq \mathrm {MAT}(k;n,i',i'';m,j',j'')$$ if and only if $$\mathcal {L}\subseteq \mathrm {MAT}(k;n,i'',i';m,j'',j')$$.
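The first item of Proposition 2 rests on mirroring each rule: a rule $$(u,\alpha ,v)_t$$ applied to a string w corresponds to $$(v^R,\alpha ^R,u^R)_t$$ applied to $$w^R$$, which is exactly what exchanges the context lengths $$(i',i'')$$ and $$(j',j'')$$ in the size. A small sketch (the tuple encoding is ours):

```python
def reverse_rule(rule):
    """Mirror an ins-del rule for the reversal argument:
    (u, alpha, v)_t applied to w corresponds to
    (v^R, alpha^R, u^R)_t applied to the reversal of w."""
    u, alpha, v, t = rule
    return (v[::-1], alpha[::-1], u[::-1], t)

# A rule with left context of length 2 and right context of length 1
# becomes one with left context of length 1 and right context of length 2.
assert reverse_rule(("ab", "x", "c", "ins")) == ("c", "x", "ba", "ins")
```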
Proposition 3
Fernau et al. ( 2018c) $$\mathcal {L}_{reg}(\mathrm {LIN})$$ is closed under reversal.

## 3 Computational completeness results

In this section, we show the computational completeness of matrix ins-del systems of sizes (3; 1, 0, 0; 1, 2, 0), (3; 1, 0, 0; 1, 0, 2), (2; 1, 2, 0; 1, 0, 0), (2; 1, 0, 2; 1, 0, 0), (2; 1, 1, 0; 1, 1, 0), (2; 1, 0, 1; 1, 0, 1), (2; 1, 1, 1; 1, 0, 0), by providing matrix ins-del systems of the above said sizes that simulate type-0 grammars in tsSGNF. How these results improve/complement previously known results is explained in Table  2.
Table 2
How the results of this section/paper improve/complement previous results

| Result | Theorem | Improves/relates to | References |
|---|---|---|---|
| $$\mathrm {MAT}(3;1,0,0;1,2,0)=\mathrm {RE}$$ | Theorem 2 | $$\mathrm {MAT}(3;1,0,0;1,1,1)=\mathrm {RE}$$ | Fernau et al. (2018b) |
| | | $$\mathrm {MAT}(3;1,0,0;1,1,0)=\mathrm {???}$$ | OPEN |
| $$\mathrm {MAT}(2;1,2,0;1,0,0)=\mathrm {RE}$$ | Theorem 3 | $$\mathrm {MAT}(3;1,2,0;1,0,0)=\mathrm {RE}$$ | Fernau et al. (2019) |
| | | $$\mathrm {MAT}(3;1,1,1;1,0,0)=\mathrm {RE}$$ | Fernau et al. (2018b) |
| $$\mathrm {MAT}(2;1,1,0;1,1,0)=\mathrm {???}$$ | OPEN | $$\mathrm {MAT}(2;1,1,0;1,1,1)=\mathrm {RE}$$ | Fernau et al. (2018b) |
| | | $$\mathrm {MAT}(2;1,1,1;1,1,0)=\mathrm {RE}$$ | Fernau et al. (2018b) |
| | | $$\mathrm {MAT}(3;1,1,0;1,1,0)=\mathrm {RE}$$ | Fernau et al. (2018b) |
| $$\mathrm {MAT}(2;1,1,1;1,0,0)=\mathrm {RE}$$ | Theorem 4 | $$\mathrm {MAT}(3;1,1,1;1,0,0)=\mathrm {RE}$$ | Fernau et al. (2018b) |
We often use labels from $$[1\dots |P|]$$ to uniquely address the rules of a grammar in tsSGNF. Then, such labels (and possibly also primed version thereof) will be used as rule markers that are therefore part of the nonterminal alphabet of the simulating matrix ins-del system. For the ease of reference, we collect in $$P_{ll}$$ the labels of the context-free rules of the form $$X\rightarrow Yb$$ (which resemble left-linear rules) and in $$P_{rl}$$ the labels of the context-free rules of the form $$X\rightarrow bY$$ (which resemble right-linear rules).
Some of the key features in our construction of matrix ins-del systems in this section are the following ones:
• There is at least one deletion rule in most simulating matrices, mostly for reasons of control. In some cases, the axiom contains the special symbol $$\$$ and we delete it in order to avoid repeating the matrix and to ensure that the matrix (containing the deletion of $$\$$ ) is applied only once until the intended rule simulation is completed. We insert $$\$$ again at the end of the simulation.
• In the majority of cases, at least one of the deletion rules of every matrix has a rule marker in the left context or the marker itself is deleted. A matrix of this type is said to be guarded. The importance of a matrix being guarded is that it can be applied only in the presence of the corresponding rule marker. This will avoid interference of any other matrix application.
• After successful application of every matrix, either a rule marker remains or the intended simulation is completed.
• As discussed in Remark  1, during Phase I, the symbols A and C are on the left of the middle nonterminal S or $$S'$$ and the corresponding symbols B and D are on the right of S or $$S'$$. When $$S'$$ is deleted from the center, the symbols from $$\{A,C\}$$ and $$\{B,D\}$$ may combine to be erased in Phase II.
• In the transition to Phase II, a special symbol Z is introduced that is assumed to stay to the left of AB or CD, whichever substring is to be deleted. Special matrices allow Z to move in the sentential form or to be (finally) deleted. This idea is novel and was not used in earlier papers.
• There is a subtlety if $$\lambda \in L$$. Then, we can assume that $$S\rightarrow \lambda$$ is in the simulated grammar, which would add one more erasing matrix, similar to h1 in Fig.  2, which deletes S and introduces Z.
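The effect of a guarded matrix described above can be illustrated as follows: a deletion rule with the rule marker in its left context simply cannot fire on a sentential form that lacks the marker (all names here are illustrative, not from the construction):

```python
def guarded_delete(w, marker, target):
    """Delete `target` only when it occurs immediately to the right of the
    rule marker, i.e., the matrix is guarded by the presence of the marker."""
    i = w.find(marker + target)
    if i < 0:
        return None  # matrix not applicable: marker absent
    return w[:i + len(marker)] + w[i + len(marker) + len(target):]

assert guarded_delete("ApXB", "p", "X") == "ApB"   # marker p present: rule fires
assert guarded_delete("AXB", "p", "X") is None     # no marker, no application
```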
We now proceed to present our results.
Theorem 2
$$\mathrm {MAT}(3;1,0,0;1,2,0)=\mathrm {MAT}(3;1,0,0;1,0,2)=\mathrm {RE}$$.
Proof
Formally, consider a type-0 grammar $$G=(N,T,P,S)$$ in tsSGNF. The rules from P are supposed to be labelled injectively with labels from the set $$[1 \ldots |P|]$$, with label sets $$P_{ll}$$ and $$P_{rl}$$ as defined above. Also recall that the nonterminal alphabet decomposes as $$N=N^{(0)}\cup N'\cup N''$$, $$N''=\{A,B,C,D\}$$, $$S\in N^{(0)},S'\in N'$$, according to the normal form. We construct a matrix insertion–deletion system $$\Gamma = (V,T,\{S \},M)$$, where the alphabet of $$\Gamma$$ is
\begin{aligned} V = N \cup T \cup \{p,p',p'',p'''\mid p\in P_{rl} \}\cup \{q,q',q''\mid q\in P_{ll} \}\cup \{Z\}\,. \end{aligned}
The set of matrices M of $$\Gamma$$ consists of the matrices described in the following.
We simulate a rule p: $$X \rightarrow bY$$, $$X,Y\in N^{(0)}\cup N'$$, $$b\in N''$$, i.e., $$p\in P_{rl}$$, by the four matrices displayed on the left-hand side of Fig.  1.
Similarly, we simulate the rule q: $$X \rightarrow Yb$$, $$X,Y\in N^{(0)}\cup N'$$, $$b\in N''\cup T$$, i.e., $$q\in P_{ll}$$, by the four matrices shown on the right-hand side of Fig.  1 in a symmetrical manner.
Recall that applying the rule $$h: S' \rightarrow \lambda$$ starts Phase II within the working of G. In the simulation, the presence of a new symbol, Z, indicates that we are in Phase II. This motivates the introduction of the five matrices listed in Fig.  2.
We now proceed to prove that $$L(\Gamma ) = L(G)$$. We first prove that $$L(G) \subseteq L(\Gamma )$$ by showing that $$\Gamma$$ correctly simulates the application of the rules of the types p, q, f, g, h, as discussed above. We mainly explain the working of the simulation matrices for the cases p and f, as the working of the q and g simulation matrices is similar, and as the working of the simulation of the h rule is clear. Notice that the transition from Phase I to Phase II (as accomplished by applying h in G) is now carried out by applying h1 and hence introducing Z, which will always be present when simulating Phase II with the system $$\Gamma$$.
Simulation of $${p: X \rightarrow bY}$$: consider the string $$\alpha X \beta$$ derivable from S in G, with $$X\in N^{(0)}\cup N'$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$ according to Remark  1. We now show that on applying the matrices introduced for simulating rules from $$P_{rl}$$, we can derive $$\alpha bY \beta$$ within $$\Gamma$$, starting from $$\alpha X \beta$$. First, we apply the rules of matrix p1. The markers p and $$p'$$ are randomly inserted by the first two rules, leading to a string in which p and $$p'$$ occur at arbitrary positions of $$\alpha X \beta$$. However, the third rule of p1 is applicable only when $$p'p$$ is inserted before the nonterminal X. This shows that $$\alpha X \beta \Rightarrow _{p1}\gamma _1$$ is possible if and only if $$\gamma _1=\alpha p'p\beta$$.
Now, on applying matrix p2, $$p''$$ and $$p'''$$ are inserted anywhere, so intermediately we arrive at a string in which $$p''$$ and $$p'''$$ occur at arbitrary positions. Then, $$p'$$ is deleted in the left context of $$p'''p''$$. So, we now arrive at the string $$\alpha p'''p''p \beta$$. This shows that $$\gamma _1 =\alpha p'p\beta \Rightarrow _{p2}\gamma _2$$ is possible if and only if $$\gamma _2=\alpha p'''p''p\beta$$. We now apply matrix p3. Hence, b is first inserted randomly, leading to a string with b at an arbitrary position. The left context in the second rule of p3 enforces that, inevitably, we arrive at $$\gamma _3=\alpha p'''bp\beta$$. Finally, we apply matrix p4. Here, Y is inserted anywhere by the first rule, but the second one enforces that we now look at $$\alpha p'''bY\beta$$, which yields $$\gamma _4=\alpha bY\beta$$ by the last rule. As G is in tsSGNF, we also know that $$Y\in N''$$. This shows that $$\gamma _3 =\alpha p'''bp\beta \Rightarrow _{p4}\gamma _4$$ is possible if and only if $$\gamma _4=\alpha bY\beta$$. The intended sequence of derivations is hence:
\begin{aligned} \alpha X\beta \Rightarrow _{p1} \alpha p'p\beta \Rightarrow _{p2}\alpha p'''p''p\beta \Rightarrow _{p3} \alpha p'''bp \beta \Rightarrow _{p4} \alpha bY \beta . \end{aligned}
This completes the simulation of rule p. Simulation of $$q: X \rightarrow Yb$$ is similar and symmetric to the working of the p rule simulation and hence omitted.
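The filtering effect of matrix p1 described above can be checked mechanically. In the sketch below, the markers p and $$p'$$ are represented by the illustrative characters "1" and "2", and p1 is taken, as in the discussion above, as two context-free insertions followed by deleting X in the left context $$p'p$$:

```python
def apply_p1(w, X="X", p="1", pp="2"):
    """All results of p1 on w: insert the marker p anywhere, insert the
    marker p' anywhere, then delete X in the left context p'p."""
    after_ins = {s[:j] + pp + s[j:]
                 for s in (w[:i] + p + w[i:] for i in range(len(w) + 1))
                 for j in range(len(s) + 1)}
    results = set()
    for s in after_ins:
        for i in range(len(s)):
            # the deletion of X is possible only directly to the right of p'p
            if s[i] == X and s[:i].endswith(pp + p):
                results.add(s[:i] + s[i + 1:])
    return results

# Whatever positions the markers were inserted at, the third rule filters the
# outcome down to the single intended string: alpha p'p beta.
assert apply_p1("ACXBD") == {"AC21BD"}
```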
Simulation of $${f: AB \rightarrow \lambda }$$ or $${g: CD \rightarrow \lambda }$$: consider the sentential form $$\alpha AB\beta$$ derivable in  G. This means that we are in Phase II. As said above, the symbol Z will be present in the corresponding sentential form derivable in $$\Gamma$$. Any string obtained by inserting Z at some position of $$\alpha AB\beta$$ can be transformed into $$\alpha ZAB\beta$$ by (repeatedly) using matrix $$\text {move-}Z$$. Now, $$\alpha ZAB\beta \Rightarrow _{f1}\alpha Z\beta$$ correctly simulates one application of f.
Now, we prove $$L(\Gamma )\subseteq L(G)$$. Formally, this is an inductive argument that proves the following properties of a string $$w\in V^*$$ such that $$S\Rightarrow _*w$$ in  $$\Gamma$$:
1.
At most one symbol from $$N^{(0)}\cup N'$$ is occurring in w.

2.
If one symbol X from $$N^{(0)}$$ occurs in w, then $$w=\alpha X u$$, where $$\alpha \in \{A,C\}^*$$ and $$u\in T^*$$: w is derivable in G;

3.
If one symbol X from $$N'$$ occurs in w, then $$w=\alpha X\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: w is derivable in G;

4.
If no symbol from $$N^{(0)}\cup N'$$ occurs in w, then Z occurs at most once in w.

5.
If no symbol from $$N^{(0)}\cup N'\cup \{Z\}$$ occurs in w, then
(a)
either $$w=\alpha r'r\beta u$$, where $$\alpha \in \{A,C\}^*$$, r is some context-free rule from  G with left-hand side X, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha X\beta u$$ is derivable in  G;

(b)
or $$w=\alpha p'''p''p\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$p\in P_{rl}$$ with $$p:X\rightarrow bY$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha X\beta u$$ is derivable in G;

(c)
or $$w=\alpha p'''bp\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$p\in P_{rl}$$ with $$p:X\rightarrow bY$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha X\beta u$$ is derivable in G;

(d)
or $$w\in (\{A,B,C,D\}\cup T)^*$$: w is derivable in G.

6.
If Z occurs in w, then Z occurs exactly once, and deleting this occurrence of Z from w yields a string $$\alpha \beta u$$, where $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha \beta u$$ is derivable in G.

These properties are surely true at the very beginning, as the sentential form  S satisfies 2. We discuss the induction step in what follows.
Conditions 2 and 3 Consider any sentential form w that satisfies 2 or 3. Hence, $$w=\alpha X\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$. In fact, the conditions are more specific about some details regarding X and $$\beta$$. All rules but r1 (for some context-free rule r) or h1 require the presence of some rule marker (or of Z) and are hence not applicable.
• If matrix r1 is applied, then $$w'=\alpha r'r\beta u$$ results, satisfying 5(a). The left-hand side of rule r must have been X. By induction, $$\alpha X \beta u$$ is derivable in  G.
• If h1 is applied, then $$w'$$ results, where $$w'$$ is obtained from w by replacing $$S'$$ with Z, satisfying 6. Matrix h1 checks the presence of $$X=S'$$. By induction, $$\alpha S'\beta u$$ is derivable in  G, so that applying h1 corresponds to an application of h on $$\alpha S'\beta u$$ in  G.
Condition 5(a) Consider any sentential form w that satisfies 5(a). Hence, $$w=\alpha r'r\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$. Moreover, $$\alpha X\beta u$$ is derivable in G, where X is the left-hand side of rule r. As no other rule markers are present in w, only matrix r2 is applicable. Now, we have to distinguish between $$r=p\in P_{rl}$$ and $$r=q\in P_{ll}$$. In the first case, we inevitably derive $$w'=\alpha p'''p''p\beta u$$ [see 5(b)], and in the second case, we arrive at $$w'=\alpha q'q''b\beta u$$ [see 5(d)]. In both cases, by induction hypothesis, the stated derivability conditions on G are true.
Condition 5(b) Consider any sentential form w that satisfies 5(b). Hence, $$w=\alpha p'''p''p\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$p\in P_{rl}$$ with $$p:X\rightarrow bY$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha X\beta u$$ is derivable in G. The only matrix that can cope with the presence of the rule markers p, $$p''$$ and $$p'''$$ (and only these) is p3. Now, inevitably, $$w\Rightarrow _{p3}w'=\alpha p'''bp\beta u$$, bringing us into Case 5(c), with $$\alpha X\beta u$$ still being derivable in G by induction hypothesis.
Condition 5(c) Consider any sentential form w that satisfies 5(c). Hence, $$w=\alpha p'''bp\beta u$$, where $$\alpha \in \{A,C\}^*$$, $$p\in P_{rl}$$ with $$p:X\rightarrow bY$$, $$\beta \in \{B,D\}^*$$ and $$u\in T^*$$: $$\alpha X\beta u$$ is derivable in G. The presence of the rule markers p and $$p'''$$ enforces us to apply matrix p4. Now, inevitably, $$w\Rightarrow _{p4}w'=\alpha bY\beta u$$, bringing us into Case 2 or 3, as G can derive this string from $$\alpha X\beta u$$ (induction hypothesis) by applying p.
Condition 5(d) As the reader may check, no matrix from $$\Gamma$$ is applicable in such a situation.
Condition 6 Consider any sentential form w that satisfies 6. The only applicable matrices are $$\text {move-}Z$$, $$\text {del-}Z$$, f1, g1. If we apply matrix $$\text {move-}Z$$, this brings us back to another situation of Case 6, whose claims readily hold by induction. A similar easy case is the application of $$\text {del-}Z$$, which leads us into Case 5(d). This is particularly interesting when $$w=Zu$$ with $$u\in T^*$$, as now a terminal string was derived in $$\Gamma$$ that is also generated by G. Matrix f1 can only be applied if Z sits immediately to the left of the substring AB. The effect of applying f1 corresponds to deleting this substring and hence to applying the rule f in G. This brings us back to Case 6. A similar argument holds for applying matrix g1.
These considerations complete the proof, as Condition 5(d) applies to every terminal string $$w\in T^*$$. The second equality follows by Proposition  2. $$\square$$
It is shown that $$\mathrm {MAT}(3;1,2,0;1,0,0)=\mathrm {MAT}(3;1,0,2;1,0,0)=\mathrm {RE}$$ in Fernau et al. ( 2019), which is the conference version of this paper. We now improve the matrix length from 3 to 2 in the following theorem.
Theorem 3
$$\mathrm {MAT}(2;1,2,0;1,0,0)=\mathrm {MAT}(2;1,0,2;1,0,0)=\mathrm {RE}$$.
Before we begin our proof, we first highlight the key feature of the markers. In order to simulate, say, $$AB \rightarrow \lambda$$, we have to use the deletion rules $$(\lambda ,A,\lambda )_{del}$$ and $$(\lambda ,B,\lambda )_{del}$$, as deletions cannot be performed under contexts. However, there is the danger of deleting unintended occurrences. So we have to carefully place markers before, after and between the chosen nonterminals A and B in order to check that they are adjacent. Also, the auxiliary symbol $$\$$$, which is present in the axiom itself, serves as a semaphore flag as known in concurrent programming, preventing simulation cycles from being interrupted. We will use this trick also in other simulations below.
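The danger mentioned above can be made concrete with a small sketch (ours, not part of the construction; the helper `apply_deletion` is a hypothetical name): a context-free deletion rule may erase any occurrence of a symbol, while a contextual check pins one occurrence down. Note that the actual construction cannot check contexts during deletion (the deletion size is (1, 0, 0)); there, adjacency is verified by placing the markers via contextual insertion rules instead.

```python
# Illustrative sketch: nondeterministic application of one deletion rule
# (left, sub, right)_del to a string; empty contexts mean "no check".
def apply_deletion(word, left, sub, right):
    """Return all strings obtainable by deleting one occurrence of `sub`
    that appears with left context `left` and right context `right`."""
    results = set()
    pattern = left + sub + right
    start = 0
    while True:
        i = word.find(pattern, start)
        if i < 0:
            break
        j = i + len(left)  # position of the occurrence of `sub`
        results.add(word[:j] + word[j + len(sub):])
        start = i + 1
    return results

# A context-free deletion (lambda, A, lambda)_del may pick ANY occurrence:
print(sorted(apply_deletion("ABAB", "", "A", "")))   # ['ABB', 'BAB']

# A marker '#' placed before the intended occurrence disambiguates,
# provided the rule may check '#' as left context:
print(sorted(apply_deletion("AB#AB", "#", "A", "")))  # ['AB#B']
```

The two calls show exactly the gap the markers close: without contexts, both A-occurrences are deletion candidates; with a marker-based check, only the intended one remains.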
Proof
Consider a type-0 grammar $$G=(N,T,P,S)$$ in tsSGNF, with the rules uniquely labelled with $$P_{ll}\cup P_{rl}$$. Recall the decomposition $$N=N^{(0)}\cup N'\cup N''$$ by tsSGNF. We can construct a matrix ins-del system $$\Gamma = (V,T,\{\$S\},M)$$ with alphabet $$V = N \cup T \cup P_{rl}\cup P_{ll} \cup K$$ where K is the following set of markers:
\begin{aligned}&\{p,p',p'',p'''\mid p\in P_{rl}\} \cup \ \{q,q',q'',q'''\mid q\in P_{ll}\}\\&\quad \cup \ \{f,f',f'',f^3,f^4,f^5,f^6,f^7, f^8, f^9, f^{10},g,g',g'',g^3,g^4,g^5,g^6,g^7, g^8, g^9, g^{10},\$\}\,. \end{aligned}
The set of matrices M is defined as follows. Rules $$p: X\rightarrow bY\in P_{rl}$$ and $$q: X \rightarrow Yb\in P_{ll}$$ are simulated by the matrices shown in Fig.  3a, b, respectively. We simulate rule $$f: AB\rightarrow \lambda$$ by the matrices shown in Fig.  4. Rule $$g: CD\rightarrow \lambda$$ is simulated alike.
Recalling that our axiom is $$\$S$$, we also have two additional matrices (i) $$\tau =[(\lambda ,\$,\lambda )_{del}]$$ for termination and (ii) $$h1=[(\lambda ,S',\lambda )_{del}]$$ to simulate the phase transition $$h: S' \rightarrow \lambda$$.
We now proceed to prove that $$L(\Gamma ) = L(G)$$, starting with the inclusion $$L(G)\subseteq L(\Gamma )$$. Consider the string $$\alpha X \beta$$ derivable from S in G, with $$X\in N^{(0)}\cup N'$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$ according to Remark  1. In the following, we write $$w^{\$}$$ for a string obtained from a string w by inserting the marker $$\$$$ at some position. We now show that on applying the matrices of Fig.  3a (introduced for simulating rules from $$P_{rl}$$ of the type $$X\rightarrow bY$$), we can derive some string $$w'^{\$}$$ within $$\Gamma$$, starting from some $$w^{\$}$$, simulating $$w=\alpha X \beta \Rightarrow \alpha bY \beta =w'$$ in  G.
Quite similarly, applying the matrices of Fig.  3b (introduced for simulating rules from $$P_{ll}$$ of the type $$X\rightarrow Yb$$), we can derive some string $$w'^{\$}$$ within $$\Gamma$$, starting from some $$w^{\$}$$, simulating $$w=\alpha X \beta \Rightarrow \alpha Yb \beta =w'$$ in  G.
Every time a context-free rule is simulated by either p or q rules, the marker  $$\$$$ is first deleted and finally re-inserted at a somewhat random position at the end of every simulation. In other words, the $$\$$$ gets shuffled around the string during Phase I. The phase transition rule $$h:S'\rightarrow \lambda$$ is simulated by applying h1, so that we can now speak about Phase II of G. As the working of the g rule simulation is similar to the working of the f rule simulation, we discuss only rule  f below. To actually produce a terminal word, $$\Gamma$$ has to apply $$\tau$$ at the very end.
We now discuss Phase II in detail, focussing on $$f: AB \rightarrow \lambda$$. Let $$w=\alpha AB\beta$$ be a sentential form derivable in G, with $$A,B \in N''$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$, ensured by tsSGNF. This means that $$w^{\$}$$ is derivable in $$\Gamma$$ (by induction). We can now see that, by applying the matrices f1 through f13 in the intended order, this occurrence of AB is deleted and a $$\$$$ is re-inserted.
The purpose of introducing a $$\$$$ in f13 is to enable another simulation of $$AB \rightarrow \lambda$$ or of $$CD\rightarrow \lambda$$. When all occurrences of AB and CD are deleted by repeated applications of the matrices designed for simulating the f and g rules, there is still a $$\$$$ at the end of every simulation. This $$\$$$ is deleted by applying rule $$\tau$$, thereby terminating Phase II of tsSGNF. Inductively, this shows that $$L(G)\subseteq L(\Gamma )$$.
Let us now prove the converse direction. Consider once more a string $$w= \alpha X \beta$$ derivable from S in G, with $$X\in N^{(0)}\cup N'$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$ according to Remark  1. We know by our previous arguments that the variation $$w^{\$}$$ can be derived in  $$\Gamma$$. In particular, the axiom fits into this description. We will now argue that, starting with such a string, we either terminate the first phase of the simulation by applying h1, which obviously corresponds to applying $$S'\rightarrow \lambda$$ in G, or we try to apply any of the other matrices. We will then show that any non-blocked derivation will again lead to a string of the form $$w'^{\$}$$, which justifies our starting point inductively. As all matrices of the form p2, ..., p7 or q2, ..., q7 or f2, ..., f13 require the presence of a marker symbol from $$K{\setminus } \{\$\}$$, we can focus on applying the matrices p1, q1 or f1. Here, we ignore the rules g1, ..., g13 due to their similarity to the matrices simulating the f rules. Also, the matrices simulating the p rules and the q rules are very similar, so that we only discuss the first ones.
If we apply f1, this blocks any other simulation branch, as the marker $$\$$$ is deleted and the marker f is inserted, but no other marker is present in the string $$w'$$ that can be thought of as being produced from w by randomly inserting some  f. As can be checked case-by-case, the only applicable matrix is now f2, which also makes clear that the f was inserted left to some A-occurrence (and hence within $$\alpha$$) in w. To highlight the chosen insertion place, let $$\alpha =\alpha 'A\alpha ''$$ such that $$w'=\alpha 'fA\alpha ''X\beta$$. Now, if $$w'\Rightarrow _{f2} w''$$, we can conclude that $$w''=\alpha 'ff'Af''\alpha ''X\beta$$. The only applicable matrix is now f4, leading to $$w''\Rightarrow _{f4}\alpha 'ff'f''f^6\alpha ''X\beta \Rightarrow _{f5}\alpha 'ff'\alpha ''X\beta$$. Notice that the application of f5 that we displayed is in fact the only possibility. But now, the derivation is stuck. Therefore, there is no use of applying f1 on $$w^{\$}$$.
We can only simulate p rules on $$w^{\$}$$ that have X as their left-hand side, because the correctness of this (unique) symbol is tested as left context in p1. This enforces $$w^{\$}\Rightarrow _{p1}\alpha Xp\beta$$ as intended. As $$\$$$ is deleted and no (variation of) f is present as a marker, in particular no fi matrix is applicable for $$i=1,\dots ,13$$. As p is the only marker in $$\alpha Xp\beta$$, only p2 is applicable indeed, which enforces $$\alpha Xp\beta \Rightarrow _{p2}\alpha pp'\beta$$ as intended. The presence of p and $$p'$$ makes three matrices interesting to apply: p2, p3 and p7. However, the absence of X renders applying p2 impossible. Yet, p7 could be applied, leading to $$\alpha p\beta$$. But how to continue from here? Any rule dealing with p either requires some symbol like X to the left of p (in matrix  p2) or some $$p'$$ to the right of p (in matrix  p3) or the presence of $$p''p'$$ (in matrix  p4). The absence of $$\$$$ also prohibits starting another rule simulation. In other words, the derivation is stuck. This shows that we have to continue with $$\alpha pp'\beta \Rightarrow _{p3}\alpha pp'''p'p''\beta$$, as was our intention. Observe that neither $$\$$$ nor X nor $$pp'$$ nor $$bp''$$ nor Y is present in the current sentential form, which means that only p4 or p7 might apply. After applying p7, we obtain $$\alpha pp'''p''\beta$$, which again has no substring like $$\$$$, X, $$pp'$$, $$bp''$$ or Y. Additionally, the substring $$p'''p'$$ is now lacking, which means that none of the rules is applicable and hence the derivation is blocked. Therefore, we have to apply matrix  p4 as intended, enforcing $$\alpha pp'''p'p''\beta \Rightarrow _{p4}\alpha p'''p'bp''\beta$$. The absence of $$\$$$, X (or Y) and p blocks pi for $$i=1,2,3,4,6$$, as well as f1. If we apply p7, we arrive at $$\alpha p'''bp''\beta$$.
Still, symbols X, Y and p are missing, so that p5 would be the only applicable rule, leading to $$\alpha p'''bp''\beta \Rightarrow _{p5}\alpha p'''bY\beta$$. Due to the absence of a $$\$$$-marker, the only matrix that can deal with the substring $$p'''b$$ is p6, which leads to $$\alpha bY\$\beta$$ as intended, although by following a strategy slightly different from the one presented before. Finally, we have to discuss what happens if we apply p5 on $$\alpha p'''p'bp''\beta$$ (as presented above). We arrive at $$\alpha p'''p'bY\beta$$. As the markers p and $$\$$$ are missing, only p6 and p7 are applicable. If p7 is applied first, then arguments similar to the previous ones show that now only p6 is applicable, leading to $$\alpha bY\$\beta$$ as intended, although by again following a slightly different strategy. Alternatively, we consider applying p6 on $$\alpha p'''p'bY\beta$$ as intended, leading to $$\alpha p'bY\$\beta$$. When we apply p7 now, we have arrived at $$\alpha bY\$\beta$$ with the derivation strategy presented above.
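Collecting the steps just discussed, the intended simulation of a rule $$p: X\rightarrow bY$$ in this construction hence reads as follows (the harmless variations via p7 were discussed above):
\begin{aligned} w^{\$} \Rightarrow _{p1} \alpha Xp\beta \Rightarrow _{p2} \alpha pp'\beta \Rightarrow _{p3} \alpha pp'''p'p''\beta \Rightarrow _{p4} \alpha p'''p'bp''\beta \Rightarrow _{p5} \alpha p'''p'bY\beta \Rightarrow _{p6} \alpha p'bY\$\beta \Rightarrow _{p7} \alpha bY\$\beta . \end{aligned}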
However, there is one last catch in this simulation. Nobody forces us to apply p7 on $$\alpha p'bY\$\beta$$; we could also keep $$p'$$ within the string and continue with simulations of context-free rules or (failed) simulations of non-context-free rules (as discussed above) or, after applying h1, we might even start (successfully) simulating non-context-free rules (as discussed below). A similar analysis is valid for the q rules, leading to a string like $$\alpha q'Yb\$\beta$$. Now, observe that any matrix (apart from p7 or q7 that erase $$p'$$ or $$q'$$) that makes use of $$p'$$ (or $$q'$$) expects p or $$p'''$$ ( q or $$q'''$$, respectively) to the left of $$p'$$ (or $$q'$$), which cannot happen, as the (unique) symbols from $$N^{(0)}\cup N'$$ of the tsSGNF grammar stay to the right of $$p'$$ (or $$q'$$). Therefore, the presence of single-primed rule markers is harmless; they can be deleted at any later time using p7 or q7. Hence, we ignore them in our discussions. In particular, assuming that we only have to discuss a string $$w= \alpha X \beta$$ derivable from S in G (as we did so far) does not invalidate our argumentation.
Now, assume that Phase I was simulated as intended. This means that we consider some string w that is derivable in Phase I by  G as well as by  $$\Gamma$$. As we are using tsSGNF, this means that $$w\in \{A,C\}^*\{B,D\}^*T^*$$ by Remark  1. Notice that by the discussions performed so far, this observation translates to the simulating grammar  $$\Gamma$$.
First, assume that $$w=\alpha A\zeta \beta$$ for some $$\alpha \in \{A,C\}^*$$, $$\zeta \in \{B,D\}$$ and $$\beta \in \{B,D\}^*T^*$$, or $$\zeta \in T$$ and $$\beta \in T^*$$, or $$w=\alpha A$$ is a sentential form derivable in  G, corresponding to some string $$w^{\$}$$ derivable in  $$\Gamma$$. The only applicable rule is  f1 (or  g1, discussed at the end), since other matrices demand the presence of a marker (say, either f or  $$f''$$). Also, no rule from the simulation of Phase I is applicable anymore due to the absence of symbols from $$N^{(0)}\cup N'$$. Recall that we ignore the possible presence of symbols like $$p'$$ as discussed above. Also, by the very structure of the matrices simulating an f rule, no progress on a sentential form $$\beta$$ with $$\beta \in \{B,D\}^*T^*$$ is observed beyond the possible application of  f1. One application of f1 on w deletes the symbol $$\$$$ that was present in the axiom and introduces a marker f randomly, hence leading to a string $$w_1$$. All matrices simulating the f rule require the presence of (multiply) primed versions of the marker f, apart from f2 which has to be applied next. Matrix f2 checks that an A must be immediately to the right of f, which means that f has been previously inserted within the prefix $$\alpha A$$ of w. Hence, the resulting string $$w_2$$ can be (equivalently) obtained from $$w_1$$ by inserting $$f'$$ immediately to the right of the marker f and $$f''$$ immediately to the right of the A-occurrence following f. The introduction of $$f'$$ between f and A spoils the pattern fA, thus f2 cannot be applied again. This leads to the string $$\alpha ' ff'Af''\alpha ''\zeta \beta$$, with $$\alpha A=\alpha 'A\alpha ''$$. All matrices fi but f3 require at least one of the substrings $$\$$$, fA, $$f'f''$$ or $$f^k$$ ( $$3 \le k \le 10$$) to be present, which renders them inapplicable.
However, matrix f3 also checks that the A-occurrence that we picked within w sees a B-occurrence to the immediate right of $$Af''$$. This enforces $$\zeta =B$$ in our string $$w=\alpha A\zeta \beta$$ and $$\alpha =\alpha '\alpha ''$$ (all other strings are stuck at this point). Therefore, after applying matrix f3, we arrive at the string $$w_3=\alpha ff'A f''f^3Bf^5\beta$$. Now, none of the matrices $$f5-f13$$ are applicable due to the absence of one of the markers $$f^4, f^6, f^7, f^9, f^{10}$$. The matrices f1,  f2,  f3 are also inapplicable due to the absence of $$\$$$, fA, and $$f''B$$, respectively. Hence, the only applicable rule is  f4. The first rule in  f4 demands the deletion of one occurrence of  A. Initially, it might be tempting to delete some other unintended A-occurrence. However, only if the intended A-occurrence (part of the substring $$f'Af''$$) is deleted will we have the substring $$f'f''$$ in order to apply the second rule of  f4. Hence, we get $$w_4 = \alpha ff'f''f^6f^3Bf^5\beta$$. Again, the matrices $$f1-f3$$, $$f6-f13$$ remain inapplicable for the same reasons as above. The only applicable matrices on $$w_4$$ are f4 (again) or f5.
Suppose f4 is applied again (repeatedly, for some $$k \ge 0$$ times) on $$w_4$$; then k occurrences of A in $$\alpha$$ are deleted freely and $$(f^6)^k$$ is inserted after $$f'f''$$, leading to a string $$w_4'=\alpha ' ff'f''(f^6)^kf^6f^3Bf^5\beta$$, where $$\alpha '$$ is obtained from $$\alpha$$ by deleting k occurrences of  A. For further continuation of the derivation, the only other applicable matrix is  f5. Since there is a unique occurrence of $$f''$$, the matrix f5 can be applied only once, which will delete only one $$f^6$$, leading to the string $$w_4''=\alpha ' ff'(f^6)^kf^3Bf^5\beta$$. Assuming f4 is applied no more, no other matrix (including f6) is applicable due to the absence of one of $$\$, fA, f''B, f'', f^4, f^7, f^9, f^{10}$$. Thus, the derivation is stuck here if $$k>0$$. Hence, we apply matrix f5 on $$w_4=\alpha ff'f''f^6f^3Bf^5\beta$$ to get $$w_5= \alpha ff'f^3Bf^5\beta$$.
Due to the absence of one of the markers or substrings $$\$, fA, f'', f^4, f^7, f^9,f^{10}$$, no matrix other than f6 is applicable. Applying f6 on $$w_5$$ yields $$w_6=\alpha ff'f^7Bf^5\beta$$. The absence of $$\$, fA, f'', f^3, f^4, f^7f^5, f^8, f^9,f^{10}$$ blocks the applicability of all fi matrices except  f7. Applying f7 on $$w_6$$ leads to the string $$w_7=\alpha f'f^7Bf^4f^5\beta$$ as intended. Matrices $$f1-f7$$ are not applicable, since the markers $$\$, f,f'',f^3$$ are not present in $$w_7$$. Similarly, matrices f9, f11, f12 and f13 are not applicable on $$w_7$$, since the markers $$f^8$$, $$f^9$$ and $$f^{10}$$ are not present. Additionally, f10 is inapplicable due to the absence of $$f^7f^5$$. Thus, the only applicable matrix is f8. The first rule in f8 demands the deletion of one occurrence of  B. Initially, it might be tempting to delete some other unintended B-occurrence. However, only if the intended B is deleted will we have the substring $$f^7f^4$$, so that the second rule of f8 can be applied. Hence, we get $$w_8 = \alpha f'f^7f^4f^8f^5\beta$$. The matrices $$f1-f7$$, $$f11-f13$$ are inapplicable due to the absence of one of the markers $$\$, f, f^3, f'', f^9,f^{10}$$. The matrix f10 is inapplicable, since we do not have the context $$f^7f^5$$ in $$w_8$$. The only applicable matrices on $$w_8$$ are f8 (again) or f9.
Suppose f8 is applied again (repeatedly, for some $$k \ge 0$$ times) on $$w_8$$; then k occurrences of B in $$\beta$$ are deleted freely and $$(f^8)^k$$ is inserted after $$f^7f^4$$, leading to a string $$w_8'=\alpha f'f^7f^4(f^8)^kf^8f^5\beta '$$, where $$\beta '$$ is obtained from $$\beta$$ by deleting k occurrences of B. For further continuation of the derivation, the only other applicable matrix is  f9. Since there is a unique occurrence of $$f^4$$, the matrix f9 can be applied only once, which will delete only one $$f^8$$, leading to the string $$w_9'=\alpha f'f^7(f^8)^kf^5\beta '$$. Due to the absence of any of $$\$, f, f'', f^3, f^4, f^9,f^{10}$$, no matrix is applicable on $$w_9'$$ but f10, and this requires the presence of the substring $$f^7f^5$$. This is only possible if $$k=0$$ (as intended), i.e., if $$w_8=w_8'$$ and $$w_9= w_9'=\alpha f'f^7f^5\beta$$.
The matrices $$f1-f9$$, $$f11-f13$$ are clearly inapplicable on $$w_9$$ due to the absence of the markers $$\$, f, f'', f^4, f^9,f^{10}$$. So we can now apply f10 on $$w_9$$, obtaining $$w_{10}= \alpha f'f^7f^9\beta$$. Due to the absence of essential markers, all matrices except f11 and f12 are inapplicable. If the latter is applied, then we get $$w_{10}'= \alpha f^7\beta$$ and the derivation is stuck, as no continuation is possible. In particular, we cannot apply matrix f7 due to the absence of f, a marker that we cannot introduce by applying f1 due to the absence of the marker  $$\$$$. So, f11 is applied onto $$w_{10}$$ and we get $$w_{11}= \alpha f'f^9f^{10}\beta$$. Now, the only applicable matrices are f12 and f13. Both matrices can be applied in any order (independently) to get $$\alpha \$\beta$$. Observe that there are also scenarios where only f13 is applied, so that we stick with $$\alpha f'f^9\$\beta$$. Now, we can indeed start a new cycle of simulation with f1 and f2, but then at the latest, in order to find the substring $$f''B$$ as required by  f3, we should apply f12 to connect $$\alpha$$ and $$\beta$$.
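For the intended choices (with $$w=\alpha AB\beta$$ and f inserted immediately to the left of the displayed A), the above case analysis assembles into the following overall simulation of rule f:
\begin{aligned} w^{\$}&\Rightarrow _{f1} \alpha fAB\beta \Rightarrow _{f2} \alpha ff'Af''B\beta \Rightarrow _{f3} \alpha ff'Af''f^3Bf^5\beta \Rightarrow _{f4} \alpha ff'f''f^6f^3Bf^5\beta \\&\Rightarrow _{f5} \alpha ff'f^3Bf^5\beta \Rightarrow _{f6} \alpha ff'f^7Bf^5\beta \Rightarrow _{f7} \alpha f'f^7Bf^4f^5\beta \Rightarrow _{f8} \alpha f'f^7f^4f^8f^5\beta \\&\Rightarrow _{f9} \alpha f'f^7f^5\beta \Rightarrow _{f10} \alpha f'f^7f^9\beta \Rightarrow _{f11} \alpha f'f^9f^{10}\beta \Rightarrow _{f12} \alpha f^{10}\beta \Rightarrow _{f13} \alpha \$\beta . \end{aligned}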
Let us finally return to the situation when we try to apply g1 on $$w^{\$}$$, with $$w=\alpha A\zeta \beta$$ for some $$\alpha \in \{A,C\}^*$$, $$\zeta \in \{B,D\}$$ and $$\beta \in \{B,D\}^*T^*$$, or $$\zeta \in T$$ and $$\beta \in T^*$$, or $$w=\alpha A$$ (i.e., $$\zeta =\beta =\lambda$$), being a sentential form derivable in  G, corresponding to some string $$w^{\$}$$ derivable in  $$\Gamma$$. If $$w^{\$}\Rightarrow _{g1}w'$$, then $$w'$$ is obtained from $$w^{\$}$$ by deleting $$\$$$ and inserting the marker g at an arbitrary position. If we now apply g2, this means that we have split $$\alpha$$ as $$\alpha 'C\alpha ''$$, so that for $$w'\Rightarrow _{g2}w''$$, $$w''=\alpha 'gg'Cg''\alpha ''A\zeta \beta$$, with $$\zeta$$ and $$\beta$$ satisfying the conditions stated above. However, now the derivation is stuck, as the string $$w''$$ does not contain a substring CD as required by matrix  g3.
These arguments show that also the non-context-free rules are correctly simulated and hence the whole simulation is correct, as no successful re-starts of simulations are possible on the strings resulting from completed simulations.
The second claimed computational completeness result follows by Proposition  2 and this concludes the proof. $$\square$$
In the previous two theorems, the maximum length of the insertion/deletion context was two, and the other operation, namely deletion/insertion, is done in a context-free manner. If we restrict the parameters in the size to be binary (0 or 1), then we achieve computational completeness using matrices of maximum length two; however, insertion is now performed under a 2-sided context.
Theorem 4
$$\mathrm {MAT}(2;1,1,1;1,0,0)=\mathrm {RE}$$.
Proof
Consider a type-0 grammar $$G=(N,T,P,S)$$ in tsSGNF. The rules from P are supposed to be labelled injectively with labels from the set $$[1 \ldots |P|]$$, with label sets $$P_{ll}$$ and $$P_{rl}$$ as defined above. Also recall that the nonterminal alphabet decomposes as $$N=N^{(0)}\cup N'\cup N''$$, $$N''=\{A,B,C,D\}$$, $$S\in N^{(0)},S'\in N'$$, according to the normal form. We construct a matrix insertion-deletion system $$\Gamma = (V,T,\{\$S\},M)$$ with alphabet set $$V = N \cup T \cup K$$, where
\begin{aligned} K= & {} \{p,p'\mid p\in P_{rl} \}\cup \{q,q'\mid q\in P_{ll} \}\cup \{\$\}\\&\cup \{f,f',f'',f^3,\ldots , f^{7},g,g',g'',g^3, \ldots ,g^{7}\}\,. \end{aligned}
The set of matrices M of $$\Gamma$$ consists of the matrices described in Fig.  5a, b and in Fig.  6.
Specifically, we simulate a rule p: $$X \rightarrow bY$$, $$X \ne Y$$, $$X,Y\in N^{(0)} \cup N'$$, $$b\in N''$$, i.e., $$p\in P_{rl}$$, by the matrices displayed in Fig.  5a. Similarly, we simulate a rule q: $$X \rightarrow Yb$$, $$X \ne Y$$, $$X,Y\in N^{(0)}\cup N'$$, $$b\in N''\cup T$$, i.e., $$q\in P_{ll}$$, by the matrices shown in Fig.  5b. We simulate rule $$f: AB\rightarrow \lambda$$ by the matrices shown in Fig.  6. The simulation of rule $$g: CD\rightarrow \lambda$$ is done in a similar manner.
Recalling that our axiom is $$\$S$$, we also have two additional matrices (i) $$\tau =[(\lambda ,\$,\lambda )_{del}]$$ for termination and (ii) $$h1=[(\lambda ,S',\lambda )_{del}]$$ to simulate the phase transition $$h: S' \rightarrow \lambda$$.
We now proceed to prove that $$L(\Gamma ) = L(G)$$, starting with the inclusion $$L(G)\subseteq L(\Gamma )$$. Consider a string $$w=\alpha X \beta$$ derivable from S in G, with $$X\in N^{(0)}\cup N'$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$ according to Remark  1, and let $$w^{\$}$$ denote a corresponding string, with the marker $$\$$$ inserted, that is derivable in $$\Gamma$$. We apply matrix p1, which deletes the only hanging marker $$\$$$ and then inserts p to the right of X, leading to $$w_1=\alpha Xp \beta$$. On applying matrix p2, which inserts a $$p'$$ between X and p and then deletes the nonterminal X, we have $$w_2 = \alpha p'p\beta$$. Applying matrices p3,  p4,  p5 in the specified order, we have the following:
\begin{aligned} w^{\$} \Rightarrow _{p1} \alpha Xp\beta \Rightarrow _{p2} \alpha p'p\beta \Rightarrow _{p3} \alpha bp\beta \Rightarrow _{p4} \alpha bYp\beta \Rightarrow _{p5} \alpha b\$Y\beta . \end{aligned}
We note that at every cycle of context-free rule simulation, the $$\$$$ is deleted at the beginning of the simulation and re-introduced at the end of the simulation so as to enable the start of the next simulation cycle.
The phase transition rule $$h:S'\rightarrow \lambda$$ is simulated by applying h1, so that we can now speak about Phase II of G. As the working of the g rule simulation is similar to the working of the f rule simulation, we discuss only rule  f below. To actually produce a terminal word, $$\Gamma$$ has to apply $$\tau$$ at the very end.
We now discuss Phase II in detail, focusing on $$f: AB \rightarrow \lambda$$. Let $$w=\alpha AB\beta$$ be a sentential form derivable in G, with $$A,B \in N''$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$, ensured by tsSGNF. This means that $$w^{\$}$$ is derivable in $$\Gamma$$ (by induction). We can now see that, by applying the matrices f1 through f10 in the intended order, this occurrence of AB is deleted and a $$\$$$ is re-inserted.
The purpose of introducing a $$\$$$ in the last rule f10 or g10 is to enable another simulation of $$AB \rightarrow \lambda$$ or of $$CD\rightarrow \lambda$$. When all occurrences of AB and CD are deleted by repeated applications of the simulations of the f and g rules, there is still a $$\$$$ at the end of every simulation. This $$\$$$ is deleted by applying rule $$\tau$$, thereby terminating Phase II of tsSGNF. Inductively, this shows that $$L(G)\subseteq L(\Gamma )$$.
Let us now consider the converse direction. Consider a string $$w= \alpha X \beta$$ derivable from S in G, with $$X\in N^{(0)}\cup N'$$ and $$\alpha \in \{A,C\}^*$$, $$\beta \in \{B,D\}^*T^*$$ according to Remark  1. We know by our previous arguments that the variation $$w^{\$}$$ can be derived in $$\Gamma$$. We will now argue that, starting with such a string, we either terminate the first phase of the simulation by applying h1, which obviously corresponds to applying $$S'\rightarrow \lambda$$ in G, or we try to apply any of the other matrices. As all matrices of the form p2, ..., p5 or q2, ..., q5 or f2, ..., f10 require the presence of a marker symbol from $$K{\setminus } \{\$\}$$, we can focus on applying p1, q1 or f1. Here, we ignore the matrices g1, ..., g10 due to their similarity to the fi matrices. Also, the simulations of the p rules and q rules are very similar, so that we only discuss the first ones.
We now start by applying p1 on $$w^{\$}$$, which deletes the marker $$\$$$ and then inserts a p to the right of X. Hence, $$w^{\$} \Rightarrow _{p1} \alpha Xp\beta =w_1$$. The absence of $$\$, p', Y$$ blocks the application of matrices p1,  p3,  p4,  p5. Deterministically, the next matrix application is p2 on $$w_1$$, which inserts $$p'$$ after X and then deletes the X to yield $$w_2 =\alpha p'p\beta$$. Note that the marker $$p'$$ is present to the left of p; therefore, matrix  p4 cannot be applied. Since there is no X, the matrices p1 and p2 cannot be applied. Further, the absence of bY prevents the application of matrix p5. We must apply matrix p3 now, introducing a b between $$p'$$ and p and then deleting the marker  $$p'$$. This corresponds to applying the rewriting rule $$p' \rightarrow b$$ to  $$w_2$$, yielding $$w_3 = \alpha bp\beta$$. The absence of $$\$, X, p', Y$$ forces us to apply the only applicable matrix p4 on $$w_3$$, which inserts a  Y before  p, yielding $$w_4 = \alpha bYp\beta$$. The only applicable matrix is now  p5, which inserts a $$\$$$ between the recently introduced b and Y and deletes the marker p, yielding $$w_5 = \alpha b\$Y\beta$$.
Now, assume that Phase I was simulated as intended. This means that we consider some string  w that is derivable in Phase I by  G as well as is by  $$\Gamma$$. As we are using tsSGNF, this means that $$w\in \{A,C\}^*\{B,D\}^*T^*$$ by Remark  1.
First, assume that $$w=\alpha A\zeta \beta$$ for some $$\alpha \in \{A,C\}^*$$, $$\zeta \in \{B,D\}$$ and $$\beta \in \{B,D\}^*T^*$$, or $$\zeta \in T$$ and $$\beta \in T^*$$, or $$w=\alpha A$$ is a sentential form derivable in G, corresponding to some string $$w^{\$}$$ derivable in $$\Gamma$$. The only applicable rule is f1 (or g1, discussed at the end), since other matrices demand the presence of a marker (say, either f or $$f''$$). Also, no rule from the simulation of Phase I is applicable anymore due to the absence of symbols from $$N^{(0)}\cup N'$$. Also, by the very structure of the matrices simulating an f rule, no progress on a sentential form $$\beta$$ with $$\beta \in \{B,D\}^*T^*$$ is observed beyond the possible application of f1. Applying f1 on w deletes the symbol $$\$$$ that was present in the axiom and introduces a marker f randomly, hence leading to a string $$w_1$$. Applying f1 again is not possible due to the absence of $$\$$$. Apart from the matrices f2 and f3, all other matrices simulating the f rule require the presence of (multiply) primed versions of the marker f and therefore cannot be applied.
Assume first we had applied matrix f3 on $$w_1$$. The obtained string $$w'$$ can be obtained from w by inserting an $$f''$$ between A and B (which must form the central part). Since $$f'$$ is missing, f4 cannot be applied. But $$f'$$ is introduced only by f2, which presupposes the presence of f, whose introduction in turn assumes the presence of $$\$$$. Can we get rid of $$f''$$ again? In order to do so, we must apply matrix f6, which assumes the presence of $$f^4$$. However, in order to introduce $$f^4$$, we must apply matrix f4, which is impossible. Hence, the derivation is stuck.
So, to proceed further, one has to apply matrix f2 on $$w_1$$. The first rule in matrix f2 checks that an A must be immediately to the right of f, which means that the previously inserted f has been introduced within the prefix $$\alpha A$$ of $$w_1$$, thus verifying the decomposition $$w_1=\alpha ' fA\alpha ''\zeta \beta$$, with $$\alpha A=\alpha 'A\alpha ''$$. A new marker $$f'$$ is introduced between f and A by the first rule of matrix f2. The second rule in f2 introduces the marker $$f^3$$ after any B, thus obtaining $$w_2=\alpha 'ff'A\alpha ''\beta 'Bf^3\beta ''$$, with $$\zeta \beta =\beta 'B\beta ''$$. The same matrix cannot be applied again, as the substring fA is no longer present in the derived string. Now, only two matrices, namely f3 and f8, are applicable to $$w_2$$, since the other matrices require markers which have not been introduced yet. Applying f8 makes no sense, as the just introduced markers $$f'$$ and $$f^3$$ would be deleted. Alternatively, the matrix f3 introduces the marker $$f''$$ between A and B; this ensures that the B must be immediately to the right of A in $$w_2$$, which enforces $$\zeta =B$$ in w (and also enforces that in $$w_2$$, $$\alpha ''$$ must end with A and $$\beta '$$ must start with a B), and it also deletes the marker f. Notice that this introduction of $$f''$$ also ensures that the central part of w was properly formed, avoiding mismatches like AD.
Hence, we have $$w_3=\alpha 'f'A\alpha ''f''\beta 'Bf^3\beta ''$$. The absence of the contexts $$\$$$, fA, AB and $$f^i$$ for $$i\ge 4$$ blocks the application of the matrices f1–f10 except f4, f5 and f8. If we apply matrix f8 now, the derivation is stuck, as then also f4 and f5 are inapplicable. A similar problem appears if we first apply f4 or f5 and then f8. The matrix f4 (applied on $$w_3$$) demands that if an A is deleted randomly, then we must obtain the substring $$f'f''$$ in the sentential form in order to apply the second rule of f4, and this is possible only if $$\alpha 'f'A\alpha ''f'' = \alpha ' f'Af''$$ in $$w_3$$, i.e., if $$\alpha ''=\lambda$$. Similarly, the matrix f5 demands that if a B is deleted randomly, then we must obtain the substring $$f''f^3$$ in the sentential form in order to apply the second rule of f5, and this is possible only if $$f''\beta 'Bf^3\beta ''=f''Bf^3\beta ''$$ in $$w_3$$, i.e., if $$\beta '=\lambda$$. In summary, we find that necessarily $$w_3=\alpha 'f'Af''Bf^3\beta ''$$. Now, matrices f4 and f5 are applied in any order. They basically simulate the rewriting rules $$f'A\rightarrow f'f^4$$ and $$f''B\rightarrow f''f^5$$, so that we get $$w_4=\alpha ' f'f^4f''f^5f^3\beta ''$$. As the only matrix (namely, f6) that requires (apart from an occurrence of $$f''$$) the presence of $$f^4$$ also requires the presence of $$f^5$$ (and vice versa), we have to apply both matrices f4 and f5 before proceeding.
Note that from now on the matrices f4 or f5 can be re-applied if and only if $$f'f''$$ or $$f''f^3$$, respectively, is a substring of our sentential form, because deleting A or B cannot produce these substrings that are required by the second rules of the matrices f4 or f5. We call this observation ob*. Now, what remains is the deletion of the markers that were introduced in this simulation, in some order, however taking care that neither $$f'f''$$ nor $$f''f^3$$ ever occurs as a substring of the resulting sentential form, so as to avoid an unintended (re)application of matrices f4 or f5, which would delete random occurrences of A or B (see ob*).
This danger is handled by replacing $$f''$$ with $$f^6$$ in matrix f6. One may wonder what happens if the matrix f6 is never applied to the sentential form $$w_4=\alpha ' f'f^4f''f^5f^3\beta ''$$. Due to ob*, the only matrices that are applicable to $$w_4$$ are f6 and f8. If f6 is avoided and f8 is applied, then the markers $$f', f^3$$ are deleted, leaving behind $$w_5=\alpha ' f^4f''f^5\beta ''$$. At this point, no matrix is applicable except f6. Hence, the matrix f6 is enforced to be applied on $$w_4$$ or $$w_5$$, replacing the marker $$f''$$ with $$f^6$$. Due to the absence of $$f''$$ and by ob*, matrices f4 and f5 can never be re-applied. If $$f''$$ had to be introduced again (using matrix f3), then AB would need to be present as a substring, which is not possible, as some markers are present in between them in the derived string $$w_4$$ or $$w_5$$. The purpose of matrix f7 is to make sure that $$f^4$$ and $$f^5$$ are deleted in different matrices. The markers are deleted using matrices f8 to f10 in any desired (applicable) order, yielding $$w_9\$$$. Three such derivations are shown below.
• $$w_4=\alpha ' f'f^4f''f^5f^3\beta '' \Rightarrow _{f6} \alpha ' f'f^4f^6f^5f^3\beta '' \Rightarrow _{f7} \alpha ' f'f^6f^7f^5f^3\beta '' \Rightarrow _{f8} \alpha ' f^6f^7f^5\beta '' \Rightarrow _{f9} \alpha ' f^7\beta '' \Rightarrow _{f10} w_9\$$$.
• $$w_4=\alpha ' f'f^4f''f^5f^3\beta '' \Rightarrow _{f8} w_5=\alpha ' f^4f''f^5\beta '' \Rightarrow _{f6} \alpha ' f^4f^6f^5\beta '' \Rightarrow _{f7} \alpha ' f^6f^7f^5\beta '' \Rightarrow _{f9} \alpha ' f^7\beta '' \Rightarrow _{f10} w_9\$$$.
• $$w_4=\alpha ' f'f^4f''f^5f^3\beta '' \Rightarrow _{f6} \alpha ' f'f^4f^6f^5f^3\beta '' \Rightarrow _{f7} \alpha ' f'f^6f^7f^5f^3\beta '' \Rightarrow _{f9} \alpha ' f'f^7f^3\beta '' \Rightarrow _{f8} \alpha ' f^7\beta '' \Rightarrow _{f10} w_9\$$$.
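As a sanity check, these three marker-elimination orders can be replayed mechanically. In the following sketch (ours, not the paper's notation), the digits 1–7 stand in for $$f',f'',f^3,f^4,f^5,f^6,f^7$$, each matrix is modelled only by its net effect on the string, and the context conditions are elided.

```python
# Net effects of matrices f6-f10 on the sentential form, with stand-ins
# 1=f', 2=f'', 3=f^3, 4=f^4, 5=f^5, 6=f^6, 7=f^7.
def f6(s):  return s.replace("2", "6", 1)                      # f'' replaced by f^6
def f7(s):  return s.replace("46", "67", 1)                    # delete f^4, insert f^7 after f^6
def f8(s):  return s.replace("1", "", 1).replace("3", "", 1)   # delete f' and f^3
def f9(s):  return s.replace("6", "", 1).replace("5", "", 1)   # delete f^6 and f^5
def f10(s): return s.replace("7", "$", 1)                      # delete f^7; marker $ reappears

w4 = "a14253b"   # alpha' f' f^4 f'' f^5 f^3 beta''  (alpha' = a, beta'' = b)
d1 = f10(f9(f8(f7(f6(w4)))))   # first derivation above
d2 = f10(f9(f7(f6(f8(w4)))))   # second derivation
d3 = f10(f8(f9(f7(f6(w4)))))   # third derivation
# all three orders end in the same string "a$b", i.e., alpha' $ beta''
```

All three compositions yield the same final string, illustrating that the deletion order of the markers is immaterial once f6 and f7 have done their work.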
Let us finally return to the situation when we try to apply g1 on $$w\$$$, with $$w=\alpha A\zeta \beta$$ for some $$\alpha \in \{A,C\}^*$$, $$\zeta \in \{B,D\}$$ and $$\beta \in \{B,D\}^*T^*$$, or $$\zeta \in T$$ and $$\beta \in T^*$$, or $$w=\alpha A$$ (i.e., $$\zeta =\beta =\lambda$$), being a sentential form derivable in G, corresponding to some string $$w\$$$ derivable in $$\Gamma$$. If $$w\$ \Rightarrow _{g1}w'$$, then $$w'$$ is obtained from w by deleting $$\$$$ and inserting the marker g at a random position. If we now apply g2, this means that we have split $$\alpha$$ as $$\alpha 'C\alpha ''$$ and $$\zeta \beta$$ as $$\beta 'D\beta ''$$, with $$\zeta$$ and $$\beta$$ satisfying the conditions stated above, so that for $$w'\Rightarrow _{g2}w''$$, $$w''=\alpha 'gg'C\alpha ''A\beta 'Dg^3\beta ''$$. However, now the derivation is stuck, as $$w''$$ contains no substring CD, as required by matrix g3.
These arguments show that also the non-context-free rules are correctly simulated, which completes the proof. $$\square$$
The reader might think that the markers $$f^6,f^7,f^8$$ and their relevant insertion and deletion rules are not necessary. One idea would be to construct matrices like $$f7' =[(\lambda ,f',\lambda )_{del},(\lambda ,f'',\lambda )_{del}]$$ and $$f8'=[(\lambda ,f^3,\lambda )_{del},(\lambda ,\$ ,\lambda )_{ins}]$$. The problem, however, is that such ideas do not work, as one could start to apply the matrices $$f7'$$ and $$f8'$$ right after applying f3, leading to unintended situations. This would make a correctness proof more tedious, if not impossible. With the introduction of the markers $$f^6,f^7,f^8$$, the proof is simplified, to say the least.

## 4 Describing the regular closure of linear languages

We showed in Theorem  4 that matrix ins-del systems of size (2; 1, 1, 1; 1, 0, 0) can describe $$\mathrm {RE}$$. Further, it is shown in Fernau et al. ( 2018b) that, if we have a one-sided context for insertion, then matrix ins-del systems of size (3; 1, 1, 0; 1, 0, 0) or (3; 1, 0, 1; 1, 0, 0) and also (2; 2, 1, 0; 1, 0, 0) or (2; 2, 0, 1; 1, 0, 0) can simulate (meta-)linear grammars. However, whether or not one can simulate general context-free grammars with matrix ins-del systems of the above-mentioned sizes is still open. We summarize these results now. Example  1 shows that there are non-meta-linear languages that can be described by these matrix ins-del systems.
Proposition 4
Fernau et al. ( 2018b) The following language relations are true.
• $$\mathrm {LIN} \subsetneq \mathrm {MAT}(3;1,1,0;1,0,0) \cap \mathrm {MAT}(3;1,0,1;1,0,0)$$,
• $$\mathrm {LIN} \subsetneq \mathrm {MAT}(2;2,1,0;1,0,0) \cap \mathrm {MAT}(2;2,0,1;1,0,0)$$.
For quick reference, we present in Fig. 7a the matrix ins-del rules of $$\mathrm {MAT}(3;1,1,0;1,0,0)$$ that simulate the linear rules $$p: X \rightarrow aY$$, $$q: X \rightarrow Ya$$, $$f: X \rightarrow \lambda$$, and in Fig. 7b the corresponding matrix ins-del rules of $$\mathrm {MAT}(2;2,1,0;1,0,0)$$.
Initially, our main objective was to find out how far beyond $$\mathrm {LIN}$$ a matrix ins-del system (of the four sizes stated in Proposition 4) can lead us. However, we then succeeded in providing a general result showing that if there exists a matrix ins-del system of size $$(t;n,i',i'';m,j',j'')$$ describing $$\mathrm {LIN}$$, then a system of the same size will describe $$\mathcal {L}_{reg}(\mathrm {LIN})$$.
Theorem 5
For all integers $$n,m \ge 1$$, $$t \ge 2$$ and $$i',i'',j',j'' \ge 0$$ with $$t+n \ge 4$$ and $$i'+i''\ge 1$$, if every $$L\in \mathrm {LIN}$$ can be generated by a $$\mathrm {MAT}(t;n,i',i'';m,j',j'')$$ system with a single axiom that is identical to the start symbol S of a linear grammar describing L, then $$\mathcal {L}_{reg}(\mathrm {LIN}) \subseteq \mathrm {MAT}(t;n,i',i'';m,j',j'')$$, as well. 3
Proof
Let $$L\in \mathcal {L}_{reg}(\mathrm {LIN})$$ for some $$L\subseteq T^*$$. By Proposition  1, we can assume that L is described by a context-free grammar $$G=(N,T,S,P)$$ that basically consists of a right-linear grammar $$G_R=(N_0,N'',S,P_0)$$ and linear grammars $$G_i=(N_i,T,S_i,P_i)$$ for $$1\le i\le k$$. For technical reasons that should become clear soon, we rather consider $$G_i'=(N_i',T,S_i,P_i')$$, where $$N_i'=N_i\cup \{\langle S_i,A\rangle \mid A\in N_0\}$$ and $$P_i'$$ contains, besides all rules from $$P_i$$, rules of the form $$\langle S_i,A\rangle \rightarrow w$$ whenever $$S_i\rightarrow w\in P_i$$ for some $$w\in (N_i\cup T)^*$$. This means, apart from $$L(G_i')=L(G_i)$$ (as the new nonterminals will never be used in terminating derivations), that also $$L((N_i',T,\langle S_i,A\rangle ,P_i'))=L(G_i)$$ for any $$A\in N_0$$.
Since $$\mathrm {LIN} \subseteq \mathrm {MAT}(t;n,i',i'';m,j',j'')$$, each $$G_i'$$ can be simulated by a matrix ins-del system $$\Gamma _i=(V_i,T,\{S_i\},R_i)$$ for $$1 \le i \le k$$, each of size $$(t;n,i',i'';m,j',j'')$$. We assume, without loss of generality, that $$V_i\cap V_j=T$$ if $$1\le i<j\le k$$. Let us first consider the case $$i' \ge 1$$ and $$i''=0$$. We construct a matrix ins-del system $$\Gamma$$ for G as follows 4: $$\Gamma = \left( V,T,\{\langle S_i,A\rangle A'\mid S\rightarrow S_iA\in P \},R \cup R'\right) \text {, where}$$ $$V = \left( \displaystyle \bigcup _{i=1}^{k} \left( V_i \cup \{\langle S_i,A\rangle \mid A \in N_0\}\right) \right) \cup N_0\cup \{A'\mid A\in N_0\}$$; $$R = \displaystyle \bigcup _{i=1}^{k} R_i$$; and for $$t\ge 3$$, $$R'$$ is the set $$\{m_p\mid p\in P_0\}$$, where:
• $$m_p = [(A', \langle S_i, B\rangle ,\lambda )_{ins},(\langle S_i, B\rangle , B',\lambda )_{ins},(\lambda ,A',\lambda )_{del}]$$ if $$p=A\rightarrow S_iB\in P_0$$,
• $$m_p = [(\lambda ,A',\lambda )_{del}]$$ if $$p=A\rightarrow \lambda \in P_0$$ (terminating matrix).
For $$t=2$$ and $$n\ge 2$$, we add the following matrix $$m_p$$ instead of the above-defined matrix $$m_p$$ into $$R'$$: $$m_p = [(A', \langle S_i, B\rangle B',\lambda )_{ins},(\lambda ,A',\lambda )_{del}]$$ for $$p=A\rightarrow S_iB\in P_0$$. Notice that $$A\ne B$$, as we assume that $$G_0$$ is in normal form, which is important for both variants of $$m_p$$.
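The construction of $$R'$$ is purely mechanical, which the following Python sketch illustrates; the function name and the tuple layout (left context, inserted or deleted string, right context, kind) are ours, not part of the formal construction.

```python
# Sketch: generating the matrices m_p of R' from the right-linear rules of P_0.

def matrices_for_P0(rules, t=3):
    """rules: triples (A, Si, B) for A -> Si B, or (A, None, None) for A -> lambda."""
    R_prime = []
    for A, Si, B in rules:
        if Si is None:                     # terminating matrix for A -> lambda
            R_prime.append([("", A + "'", "", "del")])
            continue
        pair = "<" + Si + "," + B + ">"    # the nonterminal <Si, B>
        if t >= 3:                         # m_p with three rules
            R_prime.append([
                (A + "'", pair, "", "ins"),   # insert <Si,B> right of A'
                (pair, B + "'", "", "ins"),   # insert B' right of <Si,B>
                ("", A + "'", "", "del"),     # delete A'
            ])
        else:                              # t = 2: one insertion of length 2
            R_prime.append([
                (A + "'", pair + B + "'", "", "ins"),
                ("", A + "'", "", "del"),
            ])
    return R_prime
```

For instance, the rule $$A\rightarrow S_1B$$ yields a three-rule matrix for $$t\ge 3$$ and a two-rule matrix (with an insertion string of length 2) for $$t=2$$, mirroring the two variants of $$m_p$$ defined above.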
The case when $$i'=0$$ and $$i'' \ge 1$$ follows from Propositions  2 and 3. $$\square$$
Combining Theorem 5 with results from Fernau et al. ( 2018b), we have the following corollary. The strictness of the subset relation in the theorem below follows from Example  1.
Corollary 1
The following assertions are true.
• $$\mathcal {L}_{reg}(\mathrm {LIN}) \subsetneq \mathrm {MAT}(3;1,1,0;1,0,0) \cap \mathrm {MAT}(3;1,0,1;1,0,0)$$,
• $$\mathcal {L}_{reg}(\mathrm {LIN}) \subsetneq \mathrm {MAT}(2;2,1,0;1,0,0) \cap \mathrm {MAT}(2;2,0,1;1,0,0)$$. $$\square$$

## 5 Describing regular languages

In Petre and Verlan ( 2012), matrices of maximum length 3 and size (1, 1, 0; 1, 1, 0) were used to describe RE. By Krassovitskiy et al. ( 2011), ins-del systems of size (1, 1, 0; 1, 1, 0) do not achieve computational completeness; rather, the corresponding language class is strictly contained in the class of context-free languages [see Krassovitskiy et al. ( 2011)]. This corresponds to matrix length one. In the following, we study the generative power of size (1, 1, 0; 1, 1, 0) with matrix length 2. With rules of size (2; 1, 1, 0; 1, 1, 0), we simulate the right-linear rules $$p: X \rightarrow bY$$ and $$r: X \rightarrow \lambda$$. The non-context-free rules of tsSGNF, namely $$AB \rightarrow \lambda$$ and $$CD \rightarrow \lambda$$, can be simulated by rules of size (2; 1, 0, 0; 1, 1, 0) (see Fig. 2). So, if we were able to show that $$q: X \rightarrow Yb$$ can also be simulated by some set of rules of size (2; 1, 1, 0; 1, 1, 0), then $$\mathrm {MAT}(2;1,1,0;1,1,0)$$ would be computationally complete. Unfortunately, however, the simulation of $$q: X \rightarrow Yb$$ by rules of size (2; 1, 1, 0; 1, 1, 0) is still open. This also explains the difficulty with our earlier claims in Fernau et al. ( 2019).
Theorem 6
$$\mathrm {REG} \subsetneq \mathrm {MAT}(2;1,1,0;1,1,0) \cap \mathrm {MAT}(2;1,0,1;1,0,1)$$.
Proof
Consider a type-3 grammar $$G=(N,T,P,S)$$ in right-linear form, with the rules uniquely labelled with $$p: X \rightarrow bY$$ and $$r: X \rightarrow \lambda$$ where $$X, Y \in N$$ and $$b \in T$$. We can construct a matrix ins-del system $$\Gamma = (V,T,S,M)$$ with alphabet
\begin{aligned} V = N \cup T \cup \{p,p',p'',p'''\}\,. \end{aligned}
The set of matrices shown in Fig.  8a, b constitutes M.
Consider a sentential form $$w=\alpha X$$ derivable in the grammar G, with $$\alpha \in T^*$$. Assume we are about to apply a concrete rule $$X\rightarrow bY\in P$$, with $$X,Y\in N$$, yielding $$w'=\alpha bY$$. Hence, the matrices listed in Fig. 8a should apply, one after the other, giving: $$\alpha X \Rightarrow _{p1} \alpha Xp'p \Rightarrow _{p2} \alpha p''p'p \Rightarrow _{p3} \alpha p'''p''p \Rightarrow _{p4} \alpha p'''bp \Rightarrow _{p5} \alpha p'''bY \Rightarrow _{p6} \alpha bY\,.$$
This shows that $$L(G)\subseteq L(\Gamma )$$. In the following, we argue why $$L(G)\supseteq L(\Gamma )$$.
Consider a string $$w=\alpha X$$ that is derivable from S in $$\Gamma$$. As no rule markers are present, only a matrix of type p1 or p2 can be applied.
Assume more concretely that matrix p1 or p2 belonging to $$p: X\rightarrow bY$$ is applied. If p2 is applied before p1, then X is replaced by the marker $$p''$$ and no further matrix can be applied, as all of them require either the markers p or $$p'$$ or X, which are not present in the derived string. So, only p1 can be applied first, and it can be applied any number of times, yielding $$w_1(n) = \alpha X (p'p)^n,~n \ge 1$$. On $$w_1(n)$$, matrices p3 and p4 are inapplicable due to the absence of $$p''$$. At first glance, it may appear that matrix p5 is applicable. However, looking closely at the two rules in the matrix p5, the matrix is applicable if and only if bp is a substring of our sentential form. We use this observation (calling it ob1) repeatedly. Note that every p in $$w_1(n)$$ is preceded by $$p'$$, and hence bp is not a substring of $$w_1(n)$$. This prohibits the application of p5 on $$w_1(n)$$. Finally, the absence of $$p'''$$ renders p6 inapplicable. So, assuming that p1 is not applied again, the only matrix applicable to $$w_1(n)$$ is p2. On applying p2, the marker $$p''$$ is introduced in the place of X, which yields $$w_2(n) =\alpha p''(p'p)^n$$. The matrices p1 and p2 are hereafter inapplicable due to the absence of X. Matrices p4 and p6 require the presence of $$p'''$$ and are hence not applicable to $$w_2(n)$$. Observation ob1 prevents us from applying p5. The only matrix that is applicable to $$w_2(n)$$ is p3, which introduces yet another new marker $$p'''$$ at a random position, while the second rule of p3 deletes the occurrence of $$p'$$ immediately to the right of $$p''$$. Thus we obtain a string $$w_3(n)$$ that can be described as $$\alpha p''p(p'p)^{n-1}$$ with the marker $$p'''$$ inserted at some arbitrary position. For convenience, we let $$\tau _n = (p'p)^{n-1}$$. The absence of X in $$w_3(n)$$ prevents applying matrices p1 and p2. We can now observe that the matrix p3 is applicable if and only if $$p''p'$$ is a substring (we call this observation ob2).
Since $$w_3(n)$$ contains neither bp nor $$p''p'$$ as a substring, matrices p5 and p3 are inapplicable on $$w_3(n)$$ due to ob1, ob2. Hence, the only matrices applicable on $$w_3(n)$$ are either p4 or p6. As a last observation  ob3, notice that matrices p4 and p6 are only applicable if $$p'''$$ is present.
Case 1: If we apply matrix p6 to $$w_3(n)$$, then the (randomly inserted) marker $$p'''$$ is deleted and we end up with $$w_3'(n)=\alpha p''p \tau _n$$. Due to the absence of any of $$X, p''p', p''', bp$$, all matrices become inapplicable (by ob1, ob2, ob3) and hence the derivation is stuck.
Case 2: If we apply matrix p4 to $$w_3(n)$$, then the marker $$p''$$ is deleted with the (randomly inserted) marker $$p'''$$ on its left. This enforces that the marker $$p'''$$, which was introduced at a random position, must have been placed to the left of $$p''$$. Hence, we now know that $$w_3(n)=\alpha p'''p''p \tau _n$$. So, the first rule of p4 guarantees the presence of $$p'''p''$$ as a substring and also deletes $$p''$$. The second rule of p4 then introduces a b (the intended b of the simulated rule $$X \rightarrow bY$$) to the right of $$p'''$$, thus yielding $$w_4(n) = \alpha p'''bp \tau _n$$. The matrix p4 actually simulates the rewriting rule $$p'' \rightarrow b$$.
The absence of X and $$p''$$ blocks the application of matrices p1 through p4. Now, either p5 or p6 is applicable. The order in which they are applied hereafter does not matter, as they are independent.
Case A: If we apply matrix p5 on $$w_4(n)$$, then we get $$w_5(n) = \alpha p'''bY \tau _n$$. The absence of $$X, p'', bp$$ enforces the application of matrix p6, which deletes the marker $$p'''$$, thus yielding $$w_6(n) = \alpha bY \tau _n$$. It is to be noted that matrix p6 need not even be applied immediately after p5; one can start a derivation based on some Y-rule, and p6 can be applied at any later time. In particular, sentential forms that contain various $$r'''$$ for different rules r are possible but do not invalidate our arguments, as all these triple-primed markers have to and can be deleted in a terminal derivation. Notice that if indeed several occurrences of $$p'''$$ are present upon applying matrix p4, there is clearly the danger of inserting b to the right of the wrong occurrence of $$p'''$$ by the second rule of p4. However, a successful application of p5 enforces the correct occurrence of $$p'''$$ to be chosen by p4, again by observation ob1.
Case B: If we first apply matrix p6 on $$w_4(n)$$, then we get $$w_5'(n) = \alpha bp \tau _n$$. The absence of $$X, p'', p'''$$ enforces the deterministic application of matrix p5 which yields $$w_6(n) = \alpha bY \tau _n$$.
Notice that should we “forget” to apply matrix p6, i.e., should $$p'''$$ stay in the string, then we cannot make mischievous use of that left-over occurrence of $$p'''$$ by applying the second rule of p4 to it, because then there would not be a b to the left of Y as required by matrix p5.
After applying both matrices, we get $$w_6(n) = \alpha bY\tau _n$$. Since no markers $$p', p$$ in $$\tau _n$$ can be deleted hereafter, this enforces that the matrix p1 was applied exactly once, i.e., $$n=1$$, and hence $$\tau _n=\lambda$$.
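The case analysis above can again be replayed mechanically. The rule encodings in the following Python sketch are our plausible reconstruction from the arguments above (the actual matrices are given in Fig. 8a); the letters q, r, s stand for the markers $$p', p'', p'''$$, and all contexts are left contexts, in line with size (2; 1, 1, 0; 1, 1, 0).

```python
# Minimal simulator for matrix ins-del rules with rules
# (kind, left_ctx, string, right_ctx), kind in {"ins", "del"}.

def apply_rule(strings, rule):
    kind, left, core, right = rule
    out = []
    for s in strings:
        if kind == "ins":
            for i in range(len(s) + 1):
                if s[:i].endswith(left) and s[i:].startswith(right):
                    out.append(s[:i] + core + s[i:])
        else:
            for i in range(len(s) - len(core) + 1):
                if (s[i:i + len(core)] == core and s[:i].endswith(left)
                        and s[i + len(core):].startswith(right)):
                    out.append(s[:i] + s[i + len(core):])
    return out

def apply_matrix(strings, matrix):
    for rule in matrix:
        strings = apply_rule(strings, rule)
    return strings

# Reconstructed encodings of p1..p6 (q = p', r = p'', s = p''').
p1 = [("ins", "X", "q", ""), ("ins", "q", "p", "")]   # append p'p after X
p2 = [("ins", "X", "r", ""), ("del", "", "X", "")]    # replace X by p''
p3 = [("ins", "", "s", ""), ("del", "r", "q", "")]    # p''' anywhere; delete p' right of p''
p4 = [("del", "s", "r", ""), ("ins", "s", "b", "")]   # needs p'''p''; simulates p'' -> b
p5 = [("ins", "b", "Y", ""), ("del", "Y", "p", "")]   # needs substring bp (observation ob1)
p6 = [("del", "", "s", "")]                           # delete p'''

forms = ["aX"]                                        # alpha = a, apply p1 once (n = 1)
for m in (p1, p2, p3, p4, p5, p6):
    forms = apply_matrix(forms, m)
# forms == ["abY"]: only the correctly placed p''' survives all filters

# Applying p1 twice (n = 2) leaves an undeletable residue p'p = qp:
forms2 = ["aXqpqp"]                                   # w_1(2) = alpha X (p'p)^2
for m in (p2, p3, p4, p5, p6):
    forms2 = apply_matrix(forms2, m)
# forms2 == ["abYqp"]: the trailing qp cannot be removed, so only n = 1 terminates
```

Although p3 inserts $$p'''$$ at every possible position, the later context checks of p4 and p5 filter out all misplaced occurrences, so the n = 1 run ends in the single string $$\alpha bY$$, while the n = 2 run keeps the residue $$\tau _2 = p'p$$, exactly as argued.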
The singleton matrix shown in Fig. 8b clearly simulates the rule $$r: X \rightarrow \lambda$$. The above arguments show the first inclusion $$\mathrm {REG} \subseteq \mathrm {MAT}(2;1,1,0;1,1,0)$$. The second inclusion follows by Proposition 2. The strictness of the inclusions follows from Example 1. These considerations complete the proof. $$\square$$

## 6 Conclusions and further research directions

In this paper, using matrix ins-del systems, we have obtained some (improved) computational completeness results and described the regular closure of linear languages with small resource needs. It is interesting to note that if one could describe linear languages by a matrix insertion-deletion system of size s, then with the same size s, we could describe the regular closure of linear languages, as well. We have also given a complete picture of the state of the art of the generative power of the matrix ins-del systems with sum-norm 3 or 4 in Table  1. Finally, we believe that the normal form tsSGNF that we introduced offers some features that could be used in other computational completeness proofs. In particular, no substrings with nonterminals to the right of terminals are derivable in this normal form.
Further to the open problems mentioned in the introduction, we now present some more concrete research questions.
• It would be interesting to explore closure properties for matrix ins-del systems of small sizes. For instance, is the family $$\mathrm {MAT}(2;2,1,0;1,0,0)$$ closed under reversal? If this were true, then $$\mathrm {MAT}(2;2,1,0;1,0,0) = \mathrm {MAT}(2;2,0,1;1,0,0)$$, which would also mean that the statement of Corollary  1 could be simplified.
• Do matrix ins-del systems of small sum-norm allow for efficient parsing? We are not aware of any research in this direction. This area seems to be largely neglected, although it is clearly of much importance when it comes to actually applying these generative devices in language processing.
• It has been argued in other places that ins-del systems could form the basis of biocomputing devices. Insertion and deletions would form the basic operations for such machines. Then the question arises how to program such devices. Following the paradigm of imperative programming, the most basic way of building programs would be to design program fragments consisting of basic operations that should be performed one after the other. This is exactly what matrix grammars can do. One future challenge would be to devise implementations that are based on these basic commands and their sequential execution.
• In a recent paper (Vu and Fernau 2021), Vu and Fernau studied matrix grammars with insertions, deletions and substitutions, a third operation whose relevance to biocomputing is explained in Beaver ( 1995) and Kari ( 1997), allowing for further restrictions on the resources studied in this paper. In this context, questions similar to the previous item arise.

## Acknowledgements

We are grateful to the referees of this paper for spotting some errors in one of the results of the previous version of the paper.

Footnotes
1
In papers like Krassovitskiy et al. ( 2011), our sum-norm is termed total size. We avoid the latter terminology to prevent confusion with the size s of MAT( s).

2
One should also keep in mind the slightly different meanings of the parameter lists.

3
The technical condition on MAT ins-del systems is not that severe, as we can always take a new start symbol and first generate any finite set with the resources at hand.

4
There is one subtlety with the case when $$\lambda \in L(G)$$: in that case, $$\lambda$$ should be added as an axiom of $$\Gamma$$.
