In this section, we characterize the termination of the semi-oblivious chase for sticky sets of TGDs via a graph-based condition. More precisely, we show that a set
\({\varSigma } \in \mathbb {S}\) belongs to
\({\mathbb {CT}}_{\forall }^{\mathsf {so}}\) iff a linearized version of it, i.e., a set of linear TGD obtained from
Σ, enjoys a condition inspired by weak-acyclicity [
17], called critical-weak-acyclicity, introduced in [
8]. Recall that linear TGDs are TGDs with only one body atom [
12]; we write
\({\mathbb {L}}\) for the class of linear TGDs. The proof of the above result boils down to showing that the given sticky set
Σ of TGDs can be rewritten into a set of linear TGDs, while this rewriting preserves the termination of the semi-oblivious chase. The latter heavily relies on Theorem 3, which establishes that non-termination of the semi-oblivious chase coincides with the existence of a linear infinite chase derivation of
cr(
Σ) w.r.t.
Σ. We can then apply the characterization for the termination of the semi-oblivious chase for linear TGDs from [
8], which states that a set
\({\varSigma } \in {\mathbb {L}}\) belongs to
\({{\mathbb {CT}}_{\forall }^{\mathsf {so}}}\) iff it is critically-weakly-acyclic. Let us first recall the notion of critical-weak-acyclicity for linear TGDs, which has been originally introduced in [
8].
5.1 Critically-Weakly-Acyclic Linear TGDs
Since critical-weak-acyclicity is inspired by weak-acyclicity, it is not surprising that it relies on the dependency graph of a set of TGDs introduced in [
17], that encodes how terms might be propagated during the chase. We assume a fixed ordering on the head-atoms of TGDs.
8 For a TGD
σ with
head(
σ) =
α1,…,
αk, we write (
σ,
i) for the TGD that has only one atom in its head, called
single-head, obtained from
σ by keeping only the atom
αi, and the existentially quantified variables in
αi. Recall that
pos(
α,
x) is the set of positions in
α at which
x occurs, while
pos(
body(
σ),
x) is the set of positions at which
x occurs in the body of
σ. We also write
pos(
sch(
Σ)) for the set of positions of
sch(
Σ), i.e., the set {(
R,
i)∣
R/
n ∈
sch(
Σ) and
i ∈ [
n]}.
Intuitively speaking, a normal edge \((\pi ,\pi ^{\prime })\) encodes the fact that a term may propagate from π to \(\pi ^{\prime }\) during the chase. Moreover, a special edge \((\pi ,\pi ^{\prime \prime })\) keeps track of the fact that the propagation of a term from π to \(\pi ^{\prime }\) also creates a new null at position \(\pi ^{\prime \prime }\). A simple example that illustrates the notion of the dependency graph follows:
The class of weakly-acyclic sets of TGDs is a well-known formalism, introduced in the context of data exchange, that guarantees the termination of the semi-oblivious chase [
17].
9 Formally, a set
Σ is
weakly-acyclic if there is no cycle in
dg(Σ) that contains a special edge. It would be extremely useful if, whenever we focus on linear TGDs, weak-acyclicity is also a necessary condition for the termination of the semi-oblivious chase. Unfortunately, this is not the case. A simple counterexample follows:
Interestingly, as it has been shown in [
8], there is an extension of weak-acyclicity, called critical-weak-acyclicity, that, whenever we focus on linear TGDs, provides a necessary and sufficient condition for the termination of the semi-oblivious chase. A key notion underlying critical-weak-acyclicity is the notion of compatibility among two single-head linear TGDs. Intuitively, if a single-head linear TGD
σ1 is compatible with a single-head linear TGD
σ2, then the atom obtained during the chase by applying
σ1 may trigger
σ2. To formally define the notion of compatibility, we first need to recall the standard notion of unification among atoms.
We say that two atoms
α and
β (containing only variables from
V)
unify if there exists a substitution
γ from the variables occurring in
α and
β to
V, called
unifier for
α and
β, such that
γ(
α) =
γ(
β). A
most general unifier (MGU) for
α and
β is a unifier for
α and
β, denoted
mgu(
α,
β), such that, for each other unifier
γ for
α and
β, there exists a substitution
\(\gamma ^{\prime }\) such that
\(\gamma = \gamma ^{\prime } \circ {\mathsf {mgu}({\alpha },{\beta })}\). It is well-known that if two atoms
α and
β unify, then they have an MGU that is unique (modulo variable renaming), and thus, we can refer to
the MGU for
α and
β [
1]. For brevity, given a TGD
σ and a variable
x, let
\({\varPi }_{x}^{\sigma } = \mathsf {pos}(\mathsf {body}(\sigma ),x)\). Let
var(
α,
π), where
α is an atom, and
π a set of positions, be the set of variables in
α at positions of
π.
Having the notion of compatibility among two single-head linear TGDs in place, we can recall the notion of resolvent of a sequence σ1,…,σn of single-head linear TGDs, which is in turn a single-head TGD. Roughly, such a resolvent mimics the behavior of the sequence σ1,…,σn during the chase. Notice that the existence of such a resolvent is not guaranteed, but if it exists, this implies that we may have a sequence of trigger applications that involve the TGDs σ1,…,σn in this order. In such a case, we call the sequence σ1,…,σn active.
At this point, one may think that the right extension of weak-acyclicity, which will provide a necessary condition for the termination of the semi-oblivious chase under linear TGDs, is to allow cycles with special edges in the underlying dependency graph as long as the corresponding sequence of single-head TGDs, which can be extracted from the edge labels, is not active. However, as thoroughly discussed in [
8], this is not enough. If a cycle with a special edge is labeled with an active sequence, then we can conclude that it will be traversed at least once during the chase. Nevertheless, it is not guaranteed that it will be traversed infinitely many times. A cycle that is labeled with an active sequence
σ1,…,
σn, and contains a special edge, will be certainly traversed infinitely many times if the resolvent of the sequence
ρ,…,
ρ of length
k, where
ρ = [
σ1,…,
σn], exists, for every
k > 0. Interestingly, for ensuring the latter condition, it suffices to consider sequences of length at most (
ω + 1), where
ω is the arity of the predicate of
body(
σ1). This brings us to critical sequences. For brevity, we write
σk for the sequence
σ,…,
σ of length
k.
We can now recall critical-weak-acyclicity as defined in [
8]. It is essentially weak-acyclicity, with the key difference that a cycle in the underlying graph is “dangerous”, not only if it contains a special edge, but if it is also labeled with a critical sequence of single-head linear TGDs. The formal definition follows.
The essence of critical-weak-acyclicity is revealed by the following result:
5.2 From Sticky to Linear TGDs
Before presenting the linearization procedure, we need to introduce some auxiliary notions. Given a tuple
\(\bar {t} = (t_{1},\ldots ,t_{n}) \in ({\mathbf {V}} \cup \{\natural \})^{n}\), we write
\({\mathsf {shape}(\bar {t})}\) for the tuple obtained from
\({\bar {t}}\) by replacing each variable of
V with the special symbol ∗. For example,
shape((
x,
y,
♮,
z,
x,
♮)) = (∗,∗,
♮,∗,∗,
♮). We also write
\(\mathsf {svar}(\bar {t})\) for the tuple obtained from
\(\bar {t}\) by removing all the occurrences of the constant
♮. For example,
svar((
x,
y,
♮,
z,
x,
♮)) = (
x,
y,
z,
x). For an atom
\(\alpha = R(\bar {t})\), let
$$ \natural\text{-}\mathsf{free}(\alpha)\ =\ R_{\mathsf{shape}(\bar{t})}(\mathsf{svar}(\bar{t})), $$
where
\(R_{\mathsf {shape}(\bar {t})}\) is of arity
\(|\mathsf {svar}(\bar {t})|\). In fact,
♮-
free(
α) is the constant-free version of
α, while the subscript
\(\mathsf {shape}(\bar {t})\) keeps track of the original shape of
α, i.e., where each occurrence of
♮ occurs in
α. Notice that, having the constant-free version of an atom
α in place, we can unambiguously write down
α. For a set of atoms
A, let
$$ \natural\text{-}\mathsf{free}(A)\ =\ \{\natural\text{-}\mathsf{free}(\alpha) \mid \alpha \in A\}. $$
Given a TGD
σ and an atom
α ∈
body(
σ), let
$$ V_{\alpha,\sigma}\ =\ \mathsf{var}(\mathsf{body}(\sigma) \setminus \{\alpha\}), $$
that is, the set of body variables of
σ that do not occur only in
α. Now, given a TGD
σ, and an atom
α ∈
body(
σ), let
$$ M_{\alpha,\sigma}\ =\ \{h : T {\rightarrow} \{\natural\} \mid V_{\alpha,\sigma} \subseteq T \subseteq \mathsf{var}(\mathsf{body}(\sigma))\}, $$
i.e., for each subset
T of
var(
body(
σ)) that contains all the variables of
Vα, σ,
Mα, σ contains a mapping that maps each variable of
T to the special constant
♮.
We are now ready to introduce the linearization of a set of TGDs. Note that the following definition talks about arbitrary TGDs. The notion of stickiness is used later for showing that the termination of the semi-oblivious chase is preserved.
The linearization procedure converts a TGD σ into a set of (constant-free) linear TGDs, where the body atom of each such linear TGD corresponds to an atom α of body(σ), while the variables in body(σ) ∖{α}, and possibly additional variables that occur only in α, are instantiated with the special constant ♮. An example follows:
Lemma 3 allows us to show that this procedure preserves the termination of the semi-oblivious chase whenever the input set of TGDs is sticky.
Proof
We assume, w.l.o.g., that Σ is in normal form, i.e., each TGD of Σ has only one atom in its head. Given a TGD \(\sigma = \phi (\bar x, \bar y) {\rightarrow } \exists \bar z\ R(\bar x,\bar z)\) of Σ, an atom α ∈body(σ), and a mapping h ∈ Mα, σ, we write σα, h for the linear TGD \({\natural \text {-}\mathsf {free}(h(\alpha ))} {\rightarrow } \exists \bar z\ {\natural \text {-}\mathsf {free}({h(R(\bar x, \bar z))})} \in {\mathsf {Lin}({\varSigma })}\). We are now ready to proceed with the proof.
(⇒) Assume that \(\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). By Corollary 1, there exists an infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Lin(Σ)) w.r.t. Lin(Σ), with \(I_{i} {\langle {\sigma ^{\alpha _{i},g_{i}}_{i},h_{i}} \rangle } I_{i+1}\), or, equivalently, \(I_{i+1} = I_{i} \cup {\mathsf {so}}\text {-}{\mathsf {result}({\sigma ^{\alpha _{i},g_{i}}_{i}},{h_{i}})}\), for each i ≥ 0. By construction, for each i ≥ 0, the TGD \(\sigma ^{\alpha _{i},g_{i}}_{i}\) is linear, with its body being an atom of the form \(R_{\bar t}(\bar x)\), where \(\bar x\) is a tuple of variables, and \(\bar t\) is a tuple over {∗,♮}. Let \(\hat {h}_{i}\) be the extension of hi that maps each variable in body(σi) that is not in \(\bar {x}\) to ♮.
Consider the infinite sequence of instances \(\delta ^{\prime } = (J_{i})_{i \ge 0}\), where J0 = cr(Σ), and \(J_{i+1} = J_{i} \cup \mathsf {so}\text {-}{\mathsf {result}({\sigma _{i}},{\hat {h}_{i}})}\), for each i ≥ 0. By construction of \(\hat {h}_{i}\), for each i ≥ 0, \((\sigma _{i},\hat {h}_{i})\) is a trigger for Σ on Ji, which implies that \(J_{i}{\langle {{\mathsf {so}},\sigma _{i},\hat {h}_{i}} \rangle }J_{i+1}\). However, the above sequence is not necessarily an infinite so-chase derivation of Σ w.r.t. cr(Σ) since some of the involved triggers may not be semi-obliviously active. We therefore consider the infinite sequence of instances \(\delta ^{\prime \prime } = (J^{\prime }_{i})_{i \geq 0}\) obtained from (Ji)i≥ 0 by simply removing the triggers that are not semi-obliviously active, or, more formally, by removing the instances obtained from triggers that are not semi-obliviously active. Recall that, for each i > 0, Ji is obtained via a trigger that is not semi-obliviously active iff Ji = Ji− 1. It remains to show that \(\delta ^{\prime \prime }\) is infinite, which in turn implies that \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). It suffices to show that, for each i ≥ 0, Ji occurs in \(\delta ^{\prime }\) finitely many times.
By contradiction, assume that there is k ≥ 0 such that Jk occurs in \(\delta ^{\prime }\) infinitely many times, which means that Jk = Jk+ 1 = Jk+ 2 = ⋯. Since Lin(Σ) is finite, there are indices k ≤ j < i such that \(\sigma ^{\alpha _{i},g_{i}}_{i} = \sigma ^{\alpha _{j},g_{j}}_{j}\); we refer to those TGDs as ρ. Since (Ii)i≥ 0 is a so-chase derivation of Lin(Σ) w.r.t. cr(Lin(Σ)), there is a variable x ∈fr(ρ) such that hi(x)≠hj(x). By definition, \(h_{i}(x) = \hat {h}_{i}(x)\) and \(h_{j}(x) = \hat {h}_{j}(x)\). This implies that \(\hat {h}_{i}(x) \neq \hat {h}_{j}(x)\), and therefore, Ji≠Jj, which is a contradiction.
(⇐) Assume now that
\({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). By Theorem 3, there exists a linear infinite
so-chase derivation
δ = (
Ii)
i≥ 0 of
cr(
Σ) w.r.t.
Σ, with
Ii〈
σi,
hi〉
Ii+ 1, for
i ≥ 0. In other words, there exists an infinite sequence of distinct atoms (
αi)
i≥ 0 such that
2.
for each i ≥ 0, αi+ 1 ∈ Ii+ 1 ∖ Ii, and there exists an atom βi ∈body(σi) such that αi = hi(βi) and \(h_{i}(\mathsf {body}({\sigma _{i}}) \setminus \{\beta _{i}\}) \subseteq {\mathsf {cr}({\varSigma })}\).
For every
i ≥ 0, let
Xi =
var(
body(
σi) ∖{
βi}),
\(g_{i} = h_{i | X_{i}}\), and
fi =
hi ∖
gi. Note that, since
δ is linear, for every
x ∈
Xi,
gi(
x) =
♮. Assuming that
σi is of the form
\(\phi _{i}(\bar x_{i}, \bar y_{i}) {\rightarrow } \exists \bar z_{i}\ R_{i}(\bar x_{i},\bar z_{i})\), let
$$ {\varSigma}^{\prime}\ =\ \left\{\underbrace{\natural\text{-}\mathsf{free}(g_{i}(\beta_{i})) {\rightarrow} \exists \bar z_{i} \natural\text{-}\mathsf{free}(g_{i}(R_{i}(\bar x_{i},\bar z_{i})))}_{\sigma^{\prime}_{i}}\right\}_{i \geq 0}. $$
By construction,
\({\varSigma }^{\prime } \subseteq \mathsf {Lin}({\varSigma })\), and therefore,
\({\varSigma }^{\prime } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\) implies
\(\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). Hence, to conclude the proof of Lemma 3, it suffices to exhibit an infinite
so-chase derivation of
\({\mathsf {cr}({\varSigma }^{\prime })} \subseteq {\mathsf {cr}({\varSigma })}\) w.r.t.
\({\varSigma }^{\prime }\). To this end, consider the infinite sequence of instances
\(\delta ^{\prime } = (J_{i})_{i \ge 0}\), where
\(J_{0} = \mathsf {cr}({\varSigma }^{\prime })\), and
\(J_{i} = J_{i-1} \cup \{\alpha ^{\prime }_{i}\}\), where
\(\alpha ^{\prime }_{i} = f_{i}({\mathsf {body}({{\sigma }_{i}^{\prime }})}) = f_{i}({\natural \text {-}\mathsf {free}(g_{i}(\beta _{i}))})\), for each
i > 0. It is not difficult to verify that
\(\delta ^{\prime }\) is an infinite
so-chase derivation of
\({\mathsf {cr}({\varSigma }^{\prime })}\) w.r.t.
\({\varSigma }^{\prime }\) (modulo null renaming). □
The main result of this section follows from Theorem 4 and Lemma 3:
We would like to remark that the above characterization of the termination of the
so-chase is different than the one presented in the conference paper [
10] (see Corollary 24). More precisely, the linearization procedure in [
10] produces TGDs
with constants, and thus, we had to properly extend the notion of critical-weak-acyclicity from [
8], which was defined only for constant-free TGDs. It turned out that we can directly use the notion of critical-weak-acyclicity as defined in [
8], at the price of a slightly more complex linearization procedure that produces constant-free TGDs.