Skip to main content
Erschienen in: Theory of Computing Systems 1/2021

Open Access 17.08.2020

Semi-Oblivious Chase Termination: The Sticky Case

verfasst von: Marco Calautti, Andreas Pieris

Erschienen in: Theory of Computing Systems | Ausgabe 1/2021

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The chase procedure is a fundamental algorithmic tool in database theory with a variety of applications. A key problem concerning the chase procedure is all-instances termination: for a given set of tuple-generating dependencies (TGDs), is it the case that the chase terminates for every input database? In view of the fact that this problem is undecidable, it is natural to ask whether known well-behaved classes of TGDs, introduced in different contexts such as ontological reasoning, ensure decidability. We consider a prominent paradigm that led to a robust TGD-based formalism, called stickiness. We show that for sticky sets of TGDs, all-instances chase termination is decidable if we focus on the (semi-)oblivious chase, and we pinpoint its exact complexity: PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity. These complexity results are obtained via a graph-based syntactic characterization of chase termination that is of independent interest.
Hinweise
This article belongs to the Topical Collection: Special Issue on Database Theory (ICDT 2019)
Guest Editor: Pablo Barceló

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The chase procedure (or simply chase) is a fundamental algorithmic tool that has been successfully applied to several database problems such as containment of queries under constraints [2], checking logical implication of constraints [5, 27], computing data exchange solutions [17], and query answering under constraints [11], to name a few. The chase procedure takes as input a database D and a set Σ of constraints, which, for this work, are tuple-generating dependencies (TGDs) of the form
$$ \forall \bar x \forall \bar y (\phi(\bar x,\bar y)\ {\rightarrow}\ \exists \bar z \psi(\bar x,\bar z)), $$
where ϕ (the body) and ψ (the head) are conjunctions of relational atoms, and it produces an instance DΣ that is a universal model of D and Σ, i.e., a model that can be homomorphically embedded into every other model of D and Σ. Somehow DΣ acts as a representative of all the models of D and Σ. This is the reason for the ubiquity of the chase in database theory, as discussed in [15]. Indeed, many database problems can be solved by simply exhibiting a universal model.

1.1 The Chase in a Nutshell

Roughly, the chase adds new tuples to the database D (possibly involving null values that act as witnesses for the existentially quantified variables), as dictated by the TGDs of Σ, and it keeps doing this until all the TGDs of Σ are satisfied. There are, in principle, three different ways for formalizing this simple idea, which lead to different versions of the chase procedure:
Oblivious Chase
The first one, which gives rise to the oblivious chase, is as follows: for each pair \((\bar t,\bar u)\) of tuples of terms from the instance I constructed so far, apply a TGD σ of the form \(\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )\) if \(\phi (\bar t,\bar u) \subseteq I\), and σ has not been applied in a previous chase step due to the same pair \((\bar t,\bar u)\), and add to I the set of atoms \(\psi (\bar t,\bar v)\), where \(\bar v\) is a tuple of new nulls not occurring in I.
Semi-oblivious Chase
The second one, which is a refinement of the oblivious chase, and it gives rise to the semi-oblivious chase, is as follows: for each pair \((\bar t,\bar u)\) of tuples of terms from the instance I constructed so far, apply a TGD σ of the form \(\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )\) if \(\phi (\bar t,\bar u) \subseteq I\), and σ has not been applied in a previous chase step due to a pair of tuples \((\bar t,\bar u^{\prime })\) with \(\phi (\bar t,\bar u^{\prime }) \subseteq I\), where \(\bar u\) and \(\bar u^{\prime }\) might be different, and add to I the set of atoms \(\psi (\bar t,\bar v)\), where \(\bar v\) is a tuple of new nulls not in I. In other words, a TGD σ of the above form is applied only once due to a certain witness \(\bar t\) for the variables \(\bar x\) that appear both in ϕ and ψ.
Restricted Chase
The third one, which is a refinement of the (semi-)oblivious chase that leads to the restricted (a.k.a. standard) chase, is as follows: for each pair \((\bar t,\bar u)\) of tuples of terms from the instance I constructed so far, apply a TGD σ of the form \(\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )\) if \(\phi (\bar t,\bar u) \subseteq I\), and there is no tuple \(\bar u^{\prime }\) of terms from I such that \(\psi (\bar t,\bar u^{\prime }) \subseteq I\), i.e., the TGD is not already satisfied, and add to I the set of atoms \(\psi (\bar t,\bar v)\), where \(\bar v\) is a tuple of new nulls not in I.
Thus, the key difference between the (semi-)oblivious and restricted versions of the chase is that the former apply a TGD whenever the body is satisfied, while the latter applies a TGD if the body is satisfied but the head is not.

1.2 Restricted vs. (Semi-)Oblivious Chase

It should not be difficult to verify that the restricted chase, in general, builds smaller instances than the (semi-)oblivious one. In fact, it is easy to devise an example where, according to the restricted chase, none of the TGDs should be applied, while the (semi-)oblivious chase builds an infinite instance.
Example 1
Consider the database D = {R(a, a)} and the TGD
$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(z,x)). $$
The restricted chase will detect that the database already satisfies the TGD, while the (semi-)oblivious chase will build the infinite instance
$$ \{R(a,a),R(\bot_{1},a),R(\bot_{2},\bot_{1}),R(\bot_{3},\bot_{2}),\ldots\}, $$
where ⊥1,⊥2,⊥3,… are (labeled) nulls.
It should be also clear that the semi-oblivious chase, in general, builds smaller instances than the oblivious one. This is illustrated by the following example.
Example 2
Consider the database D = {R(a, a)} and the TGD
$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(x,z)), $$
The semi-oblivious chase will build the instance {R(a, a),R(a,⊥)}, where ⊥ is a null, whereas the oblivious chase will build the infinite instance
$$ \{R(a,a),R(a,\bot_{1}),R(a,\bot_{2}),R(a,\bot_{3}),\ldots\}, $$
where ⊥1,⊥2,⊥3,… are nulls.
The restricted chase has a clear advantage over the (semi-)oblivious chase when it comes to the size of the final result. But, of course, this advantage does not come for free: at each application, the restricted chase has to check that there is no way to satisfy the head of the TGD at hand, and this can be computationally costly in practice. On the other hand, the advantage of the semi-oblivious chase over the oblivious chase comes without any additional overhead since both versions have to keep track of the TGDs and pairs of tuples that have been considered so far.
It has been recently observed that for RAM-based implementations the restricted chase is the indicated approach since the benefit from producing smaller instances justifies the additional effort for checking whether a TGD is already satisfied; see, e.g., [6, 22]. However, as discussed in [6], an RDBMS-based implementation of the restricted chase is quite challenging, whereas an efficient implementation of the semi-oblivious chase is feasible. Hence, both the semi-oblivious and restricted versions of the chase are relevant algorithmic tools for practical implementations, whereas the oblivious version of the chase is mostly of theoretical interest.

1.3 The Challenge of Non-Termination

There are indeed efficient implementations of the semi-oblivious and restricted chase that allow us to solve central database problems by adopting a materialization-based approach [6, 22, 30, 33]. Nevertheless, for this to be feasible in practice we need a guarantee that the chase terminates, which is not always the case. This fact motivated a long line of research on identifying classes of TGDs that ensure the termination of the semi-oblivious and/or restricted chase, no matter how the input database looks like. A prime example is the class of weakly-acyclic sets of TGDs [17], which has been introduced in the context of data exchange, and guarantees the termination of both the semi-oblivious and restricted chase. Many other termination criteria can be found in the literature; see, e.g., [4, 9, 1416, 21, 23, 28, 29].
With so much effort spent on identifying sufficient conditions for the termination of the chase, the question that immediately comes up is whether a sufficient condition that is also necessary exists. In other words, given a set Σ of TGDs, is it possible to decide whether, for every database D, the semi-oblivious or the restricted chase on D and Σ terminates? The answer is negative, no matter which version of the chase we consider [18]; this is actually true even for the oblivious version of the chase. The problem remains undecidable even if the database is known; this has been established in [15] for the restricted chase, and it was observed in [28] that the same proof shows undecidability also for the (semi-)oblivious chase.

1.4 Deciding the Termination of the Chase

The undecidability proof given in [18] constructs a sophisticated set of TGDs that goes beyond existing well-behaved classes of TGDs that enjoy certain syntactic properties, which in turn ensure useful model-theoretic properties. Such well-behaved classes of TGDs have been proposed in the context of ontological reasoning. The two main paradigms that led to robust TGD-based formalisms, without forcing the chase to terminate, are guardedness [4, 11, 12] and stickiness [13]:
Guardedness
A TGD is guarded if its body has an atom that contains (or “guards”) all the universally quantified variables. This condition has been inspired by the guarded fragment of first-order logic, and is powerful enough to capture important Description Logics (DLs) such as the members of the \(\mathcal {E}{\mathscr{L}}\) family. The key model-theoretic property of the class of guarded TGDs, which explains its robust behaviour, is the existence of tree-like universal models [11].
Stickiness
On the other hand, sticky sets of TGDs are powerful enough to model interesting statements that are inherently non-tree-like, and thus, not expressible via guarded TGDs. Such a statement, for example, consists of the TGDs
$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(y,z) \wedge P(z)) \qquad \forall x \forall y (P(x) \wedge P(y)\ {\rightarrow}\ S(x,y)), $$
which compute the cartesian product of an infinite unary relation, a useful modeling feature that, in DL parlance, is known as concept product [32].
The fact that the set of TGDs constructed in the undecidability proof of [18] is neither guarded nor sticky raised the following question: is the semi-oblivious/restricted chase termination problem decidable for guarded or sticky sets of TGDs? The current state of affairs concerning this central question is as follows:
  • For the semi-oblivious chase and guarded TGDs the problem is well-understood: it is 2ExpTime-complete in general, and ExpTime-complete for predicates of bounded arity [8]. The same paper [8] considered also linear TGDs, i.e., TGDs with only one body atom, which form a central subclass of guarded TGDs: the problem becomes PSpace-complete, and NLogSpace-complete for predicates of bounded arity. An alternative proof for linear TGDs, which relies on derivation trees, a notion that was originally introduced in the context of ontological query answering [3], has been recently proposed in [24].
  • For the restricted chase and guarded TGDs, it has been recently shown, by exploiting Monadic-Second Order Logic over infinite trees of bounded degree, that the problem is decidable in elementary time assuming only one atom in the head, whereas the case of more than one atoms in the head remains a challenging open problem [19]. The case of linear TGDs with only one atom in the head has been explicitly considered in [24], where the decidability of the problem in question has been shown by relying on derivation trees.
  • Finally, for the restricted chase and sticky sets of TGDs, it has been recently shown, by exploiting Büchi Automata, that the problem is decidable in elementary time assuming only one atom in the head, whereas the case of more than one atoms in the head remains a challenging open problem [19].
Towards completing the picture concerning the chase termination problem, in this work we concentrate on the semi-oblivious chase and sticky sets of TGDs, and provide precise complexity results: PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity. Our results apply also to the oblivious chase that, although is not very useful for practical purposes, it is a relevant technical tool due to its simplicity; for a discussion on the usefulness of the oblivious chase see [11].

1.5 Summary of Contributions

Our results can be summarized as follows:
  • In Section 4, we provide a semantic characterization of non-termination of the semi-oblivious chase under sticky sets of TGDs via the existence of “path-like” infinite chase derivations, which forms the basis for our decision procedure.
  • By exploiting the above semantic characterization, we then provide, in Section 5, a syntactic characterization of semi-oblivious chase termination via a graph-based condition. To this end, we exploit a recent syntactic characterization from [8] of the termination of the semi-oblivious chase under linear TGDs.
  • In Section 6, by using the graph-based syntactic characterization from the previous section, we establish the desired complexity upper bounds for our problem: PSpace in general, and NLogSpace for predicates of bounded arity. We finally establish matching lower bounds. The PSpace-hardness is obtained by simulating the behaviour of a polynomial space Turing machine, while the NLogSpace-hardness via a reduction from graph reachability.

2 Preliminaries

We consider the mutually disjoint countably infinite sets C, N, and V of constants, (labeled) nulls, and variables, respectively. We refer to constants, nulls and variables as terms. For an integer n > 0, we may write [n] for the set {1,…,n}.
Relational Databases
A schema S is a finite set of relation symbols (or predicates) with associated arity. We write R/n to denote that R has arity n ≥ 0; we may also write ar(R) for the integer n. A (predicate) position of S is a pair (R, i), where R/nS and i ∈ [n], that essentially identifies the i-th argument of R. We write pos(S) for the set of positions of S, that is, the set {(R, i)∣R/nS and i ∈ [n]}. An atom over S is an expression of the form \(R(\bar t)\), where R/nS and \(\bar t\) is an n-tuple of terms. A fact is an atom whose arguments consist only of constants. An (atom) position of \(R(\bar t)\) is a pair \((R(\bar t),i)\), where i ∈ [ar(R)], that essentially identifies the predicate position (R, i) in \(R(\bar t)\). We write \(R(\bar t)[i]\) for the term occurring at position \((R(\bar t),i)\). Moreover, for a variable x in \(\bar t\), \(\mathsf {pos}(R(\bar t), x)\) is the set of predicate positions \(\{(R,i) \mid R(\bar t)[i] = x\}\). We write \({\mathsf {var}({R(\bar t)})}\) for the set of variables in \(\bar t\). The notations pos(⋅,x) and var(⋅) extend to sets of atoms. An instance over S is a (possibly infinite) set of atoms over S with constants and nulls. A database over S is a finite set of facts over S. The active domain of an instance I, denoted dom(I), is the set of terms in I.
Substitutions and Homomorphisms
A substitution from a set of terms T to a set of terms \(T^{\prime }\) is a function \(h : T {\rightarrow } T^{\prime }\). Henceforth, we treat a substitution h as the set of mappings {th(t)∣tT}. The restriction of h to a subset S of T, denoted h|S, is the substitution {th(t)∣tS}. A homomorphism from a set of atoms A to a set of atoms B is a substitution h from the set of terms in A to the set of terms in B such that (i) tC implies h(t) = t, i.e., h is the identity on C, and (ii) R(t1,…,tn) ∈ A implies h(R(t1,…,tn)) = R(h(t1),…,h(tn)) ∈ B.
Tuple-Generating Dependencies
A tuple-generating dependency (TGD) σ is a first-order sentence (without constants) of the form
$$ \forall \bar x \forall \bar y(\phi(\bar x,\bar y)\ {\rightarrow}\ \exists \bar z \psi(\bar x,\bar z)), $$
where \(\bar x, \bar y\) and \(\bar z\) are mutually disjoint tuples of variables of V, while \(\phi (\bar x,\bar y)\) and \(\psi (\bar x,\bar z)\) are conjunctions of atoms. For brevity, we write σ as \(\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\), and use comma instead of ∧ for joining atoms. We refer to \(\phi (\bar x,\bar y)\) and \(\psi (\bar x,\bar z)\) as the body and head of σ, denoted body(σ) and head(σ), respectively. The frontier of the TGD σ, denoted fr(σ), is the set of variables \(\bar x\), i.e., the variables that appear both in the body and the head of σ. Note that, by abuse of notation, we may treat a tuple of variables as a set of variables. The schema of a set Σ of TGDs, denoted sch(Σ), is the set of predicates occurring in Σ, and we write ar(Σ) for the maximum arity over all those predicates. An instance I satisfies a TGD σ as the one above, written Iσ, if the following holds: whenever there exists a homomorphism h from \(\phi (\bar x, \bar y)\) to I, then there exists \(h^{\prime } \supseteq h_{|\bar x}\) that is a homomorphism from \(\psi (\bar x,\bar z)\) to I. Note that, by abuse of notation, we sometimes treat a conjunction of atoms as a set of atoms. The instance I satisfies a set Σ of TGDs, written IΣ, if Iσ for each σΣ.
Stickiness
One of the main syntactic paradigms for TGDs is stickiness [13]. The key property underlying this condition is as follows: variables that appear more than once in the body of a TGD should be inductively propagated (or “stick”) to every head atom, which can be graphically illustrated as
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figa_HTML.png
where the first set of TGDs is sticky, while the second is not. The formal definition relies on an inductive procedure that marks the variables that may violate the above property. The base step marks a variable in the body of a TGD that does not occur in every head atom. Then, the marking is inductively propagated as follows
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figb_HTML.png
Stickiness requires a marked variable to appear only once in the body of a TGD.
Let us now formalize the above intuitive discussion. Consider a set Σ of TGDs; we assume, w.l.o.g., that the TGDs in Σ do not share variables. Let σΣ and x a variable in body(σ). We inductively define when x is marked in Σ:
  • If x does not occur in every atom of head(σ), then x is marked in Σ.
  • Assuming that head(σ) contains an atom of the form \(R(\bar t)\) and \(x \in \bar t\), if there exists \(\sigma ^{\prime } \in {\varSigma }\) that has in its body an atom of the form \(R(\bar t^{\prime })\), and each variable in \(R(\bar t^{\prime })\) at a position of \(\mathsf {pos}({R(\bar t)},{x})\) is marked in Σ, then x is marked in Σ.
The set Σ is sticky if there is no TGD whose body contains two occurrences of a variable that is marked in Σ. We denote by \(\mathbb {S}\) the family of all sticky finite sets of TGDs.1

3 The Chase Procedure

The chase procedure (or simply chase) accepts as an input a database D and a set Σ of TGDs, and constructs a (possibly infinite) instance that contains D and satisfies Σ. Central notions in this context are those of active trigger and trigger application, which are coming into two different variations, oblivious and semi-oblivious, which in turn give rise to the oblivious [11] and the semi-oblivious [20, 28] chase. The key difference between these two versions of the chase, lies at the adopted naming scheme for the newly generated null values, which are used as witnesses for the existentially quantified variables occurring in the head of a TGD.
Definition 1 (Trigger and Trigger Application)
A trigger for a set Σ of TGDs on an instance I is a pair (σ, h), where σΣ and h is a homomorphism from body(σ) to I. The oblivious result and semi-oblivious result of (σ, h), denoted o-result(σ, h) and so-result(σ, h), is the set of atoms μo(head(σ)) and μso(head(σ)), respectively, where \(\mu _{\mathsf {o}},\mu _{\mathsf {so}} : \mathsf {var}({\mathsf {head}({\sigma })}) {\rightarrow } {\mathbf {C}} \cup {\mathbf {N}}\) are defined as
$$ \begin{array}{@{}rcl@{}} \mu_{\mathsf{o}}(x)\ =\ \left\{ \begin{array}{ll} h(x) & \quad \text{if } x \in \mathsf{fr}(\sigma)\\ &\\ \bot_{\sigma,h}^{x} & \quad \text{otherwise} \end{array} \right. \qquad \mu_{\mathsf{so}}(x)\ =\ \left\{ \begin{array}{ll} h(x) & \quad \text{if } x \in \mathsf{fr}(\sigma)\\ &\\ \bot_{\sigma,h_{|\mathsf{fr}(\sigma)}}^{x} & \quad \text{otherwise} \end{array} \right. \end{array} $$
where \(\bot _{\sigma ,h}^{x}, \bot _{\sigma ,h_{|\mathsf {fr}(\sigma )}}^{x}\) are nulls from N. We call the trigger (σ, h) obliviously active if o-\(\mathsf {result}(\sigma ,h) \not \subseteq I\), and semi-obliviously active if so-\(\mathsf {result}(\sigma ,h) \not \subseteq I\). The oblivious application of (σ, h) to I returns the instance J = Io-result(σ, h), and is denoted as Io, σ, hJ. Analogously, the semi-oblivious application of (σ, h) to I returns the instance J = Iso-result(σ, h), and is denoted as Iso, σ, hJ.
Observe that in the definition of ⋆-result(σ, h), where ⋆ ∈{o, so}, each existentially quantified variable x occurring in head(σ) is mapped by μ to a “fresh” null value of N whose name is uniquely determined by the trigger (σ, h) and x itself. Therefore, for a trigger (σ, h), we can unambiguously write down the set of atoms ⋆-result(σ, h). In our analysis, it would be useful to be able to refer to the terms occurring in ⋆-result(σ, h) that have been propagated (not invented) during the application of (σ, h). Formally, the frontier of ⋆-result(σ, h), denoted fr(⋆-result(σ, h)), is the set of terms dom(h(body(σ))) ∩dom(⋆-result(σ, h)).
(Semi-)Oblivious Chase
The main idea of the chase is, starting from a database D, to exhaustively apply active triggers for the given set Σ of TGDs on the instance constructed so far. We simultaneously define oblivious and semi-oblivious chase derivations. To this end, we distinguish the two cases where a derivation is finite or not:
  • A finite sequence (Ii)0≤in of instances, with D = I0 and n ≥ 0, is an oblivious (resp., semi-oblivious) chase derivation of D w.r.t. Σ if, for each i ∈{0,…,n − 1}, there exists an obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on Ii such that Iio, σ, hIi+ 1 (resp., Iiso, σ, hIi+ 1), and no obliviously (resp., semi-obliviously) active trigger for Σ on In exists.
  • An infinite sequence (Ii)i≥ 0 of instances, with D = I0, is an oblivious (resp., semi-oblivious) chase derivation of D w.r.t. Σ if, for each i ≥ 0, there exists an obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on Ii such that Iio, σ, hIi+ 1 (resp., Iiso, σ, hIi+ 1). Moreover, (Ii)i≥ 0 is fair if, for each i ≥ 0, and for every obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on Ii, there exists j > i such that (σ, h) is not an obliviously (resp., semi-obliviously) active trigger for Σ on Ij. The latter is known as the fairness condition, and guarantees that all the active triggers will eventually be deactivated.
A (semi-)oblivious chase derivation is valid if it is finite, or infinite and fair. Infinite but unfair chase derivations are not valid since they do not serve the main purpose of the chase procedure, i.e., build an instance that satisfies the given TGDs. Henceforth, we write o-chase and so-chase for oblivious and semi-oblivious chase, respectively.
In general, due to the adopted naming scheme and the definition of active triggers, the semi-oblivious chase builds smaller instances than the oblivious one. This is because a trigger that is semi-obliviously active it is also obliviously active, but the other direction is not always true. This has been already illustrated by Example 2 in Section 1, which, for the sake of readability, we recall below.
Example 3
Consider the database D = {R(a, a)}, and the TGD
$$ \sigma\ =\ R(x,y)\ {\rightarrow}\ \exists z R(x,z). $$
It is easy to verify that the only o-chase derivation of D w.r.t. {σ} is infinite. On the other hand, the only so-chase derivation of D w.r.t. {σ} is
$$ \{R(a,a)\}, \left\{R(a,a),R\left( a,\bot^{z}_{\sigma,\{x \mapsto a\}}\right)\right\}, $$
which is, of course, finite. Indeed, if we apply again the TGD σ, then we will obtain the atom \(R\left (a,\bot ^{z}_{\sigma ,\{x \mapsto a\}}\right )\), which is already present.
Chase Relation
A useful notion that we are going to use in our proofs is the so-called chase relation [13], which essentially describes how the atoms generated during the chase depend on each other. Consider a ⋆-chase derivation δ = (Ii)i≥ 0, where ⋆ ∈{o, so}, of a database D w.r.t. a set Σ of TGDs, and assume that for each i ≥ 0, Ii〈⋆,σi,hiIi+ 1, which means that Ii+ 1 = Ii ∪ ⋆-result(σi,hi). The chase relation of δ, denoted ≺δ, is a binary relation over \(\bigcup _{i \geq 0} I_{i}\) such that αδβ iff there exists i ≥ 0 such that αhi(body(σi)) and βIi+ 1Ii. Notice that the relation ≺δ is acyclic, or, in other words, it forms a directed acyclic graph over \(\bigcup _{i \geq 0} I_{i}\).

3.1 Chase Termination Problem

It is known that due to the existentially quantified variables, a valid ⋆-chase derivation, where ⋆ ∈{o, so}, may be infinite. This is true even for very simple settings: it is easy to verify that the only ⋆-chase derivation of D = {R(a, b)} w.r.t. the set Σ consisting of the single TGD \(R(x,y) {\rightarrow } \exists z R(y,z)\) is infinite. The question that comes up is, given a set Σ of TGDs, can we check whether, for every database D, all or some valid (semi-)oblivious chase derivations of D w.r.t. Σ are finite? Before formalizing the above problem, let us recall two central classes of sets of TGDs:
$$ \begin{array}{rcl} \mathbb{CT}_{\forall \forall}^{\star} &=& \left\{{\varSigma}\ \left| \begin{array}{l} \text{ for \textit{every} database } D,\\ \text{ \textit{every} valid $\star$-chase derivation} \text{ of } D \text{ w.r.t.~} {\varSigma} \text{ is finite} \end{array} \right.\right\} \end{array} $$
$$ \begin{array}{rcl} \mathbb{CT}_{\forall \exists}^{\star} &=& \left\{{\varSigma}\ \left| \begin{array}{l} \text{ for \textit{every} database } D,\\ \text{ there \textit{exists} a finite $\star$-chase derivation} \text{ of } D \text{ w.r.t.~} {\varSigma} \end{array} \right.\right\} \end{array} $$
The problems tackled in this work are as follows, where \(\mathbb {C}\) is a class of sets of TGDs:
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figc_HTML.png
It is well-known from [20] that the following holds:
$$ \mathbb{CT}_{\forall \forall}^{\mathsf{o}} = \mathbb{CT}_{\forall \exists}^{\mathsf{o}}\ \subset\ \mathbb{CT}_{\forall \forall}^{\mathsf{so}} = \mathbb{CT}_{\forall \exists}^{\mathsf{so}}. $$
This immediately implies that, after fixing the version of the chase in question, i.e., oblivious or semi-oblivious, the above decision problems are equivalent. Henceforth, for a class \(\mathbb {C}\) of sets of TGDs, we simply refer to the problem \({\mathsf {CT}_{\forall }^{\star }}(\mathbb {C})\), and we write \({\mathbb {CT}_{\forall }^{\star }}\) for the classes \({\mathbb {CT}_{\forall \forall }^{\star }}\) and \({\mathbb {CT}_{\forall \exists }^{\star }}\), where ⋆ ∈{o, so}.
We know that our main decision problems are, in general, undecidable. Assuming that \(\mathbb {TGD}\) denotes the class of arbitrary sets of TGDs, we have that:
Theorem 1 (18)
For ⋆ ∈{o, so}, \(\mathsf {CT}_{\forall }^{\star }(\mathbb {TGD})\) is undecidable, even if we focus on binary and ternary predicates.
However, the set of TGDs employed in the undecidability proof of [18] is not sticky. What about \(\mathsf {CT}_{\forall }^{\star }(\mathbb {S})\) then? This is a non-trivial problem, and pinpointing its complexity is the main goal of this work.

3.2 Some Useful Results

Before proceeding with the complexity analysis of \(\mathsf {CT}_{\forall }^{\star }(\mathbb {S})\), let us recall a couple of technical results that would allow us to significantly simplify our investigation.
Critical Database
It would be useful to have a special database of a very simple form that gives rise to a valid infinite chase derivation whenever there is a database with the same property. Interestingly, such a database exists, which is known as the critical database for a set Σ of TGDs [28]. Formally, the critical database for Σ is
$$ \mathsf{cr}({\varSigma})\ =\ \{R(\natural,\ldots,\natural) \mid R \in \mathsf{sch}({\varSigma})\}, $$
where C is a fixed constant. In other words, cr(Σ) consists of all the atoms that can be formed using the predicates of sch(Σ) and the constant . The following result states that cr(Σ) is indeed the desired database:
Proposition 1 ([28])
Consider a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent:
1.
\({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\star }\).
 
2.
There exists a valid infinite ⋆-chase derivation of cr(Σ) w.r.t. Σ.
 
Henceforth, we always use for the constant that appears in a critical database. Notice that this special constant does not depend on the set of TGDs in question.
Fairness
As one might expect, we are going to focus on the complement of \(\mathsf {CT}_{\forall }^{\star }(\mathbb {S})\), for ⋆ ∈{o, so}, and pinpoint the complexity of the following problem: given a set \({\varSigma } \in \mathbb {S}\), is there a valid infinite ⋆-chase derivation δ of cr(Σ) w.r.t. Σ (see Proposition 1). However, as observed in [8], where the same problem is studied but for the class of guarded TGDs, one of the main difficulties is to ensure that δ enjoys the fairness condition. Interestingly, as shown in [8], we can completely neglect the fairness condition since the existence of a (possibly unfair) infinite ⋆-chase derivation of some database w.r.t. Σ implies the existence of a fair one.
Proposition 2 (8)
Consider a database D and a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent:
1.
There exists a valid infinite ⋆-chase derivation of D w.r.t. Σ.
 
2.
There exists an infinite ⋆-chase derivation of D w.r.t. Σ
 
By combining Propositions 1 and 2, we obtain the following useful corollary:
Corollary 1
Consider a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent
1.
\({\varSigma } \not \in \mathbb {CT}_{\forall }^{\star }\).
 
2.
There exists an infinite ⋆-chase derivation of cr(Σ) w.r.t Σ.
 

3.3 Our Main Result and Plan of Attack

As discussed above, the main goal of this work is to pinpoint the complexity of chase termination under sticky sets of TGDs, focussing on the oblivious and semi-oblivious versions of the chase procedure. Our main result follows:
Theorem 2
\(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) and \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) are PSpace-complete, and NLogSpace-complete for predicates of bounded arity.
Consider a set \({\varSigma } \in \mathbb {S}\). By Corollary 1, our main challenge is to show that the problem of deciding whether there exists an infinite ⋆-chase derivation of cr(Σ) w.r.t. Σ, where ⋆ ∈{o, so}, is PSpace-complete, and NLogSpace-complete for predicates of bounded arity. In fact, the bulk of our work concentrates on establishing the desired upper bounds for the semi-oblivious chase, i.e., when ⋆ = so. We can then easily obtain the same upper bounds for the oblivious chase by exploiting a simple reduction from \({\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})\) to \({\mathsf {CT}_{\forall }^{{\mathsf {so}}}}(\mathbb {S})\). Our plan of attack follows:
  • The upper bounds heavily rely on the following semantic characterization given in Section 4: there exists an infinite so-chase derivation of cr(Σ) w.r.t. Σ iff there exists a “path-like” infinite so-chase derivation of cr(Σ) w.r.t. Σ.
  • The above semantic characterization allows us to provide, in Section 5, a syntactic graph-based characterization of the existence of an infinite so-chase derivation of cr(Σ) w.r.t. Σ. Actually, the latter coincides with the existence of a certain “bad” cycle in the dependency graph of a “linearized” version of Σ.
  • We show, in Section 6, that checking whether a “bad” cycle exists in the dependency graph of the “linearized” version of Σ is in PSpace, and in NLogSpace for predicates of bounded arity. This shows the desired upper bounds for \({\mathsf {CT}_{\forall }^{\mathsf {so}}}(\mathbb {S})\). We then explain how the upper bounds for \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) can be transferred to \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) by exploiting a simple construction known as enrichment [20]. We finally provide matching lower bounds for \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) and \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\). The PSpace-hardness is obtained by simulating a deterministic polynomial space Turing machine, while the NLogSpace-hardness by a reduction from graph reachability.
At this point, one may wonder whether a powerful termination criterion from the literature allows us to characterize the fragment of sticky sets of TGDs that guarantees the termination of the semi-oblivious chase, which in turn will lead to the desired complexity results. To the best of our knowledge, such a criterion does not exist. Interestingly, model-faithful acyclicity, one of the largest classes of TGDs that ensure the termination of the semi-oblivious chase known today [14], is not powerful enough for characterizing the class \(\mathbb {S} \cap {\mathbb {CT}_{\forall }^{{\mathsf {so}}}}\). For example, the sticky set of TGDs
$$ P(x,y,z)\ {\rightarrow}\ S(x,y,z) \qquad S(x,y,x)\ {\rightarrow}\ \exists z P(y,z,x) $$
belongs to \(\mathbb {CT}_{\forall }^{\mathsf {so}}\), but it violates the model-faithful acyclicity condition.

4 Semantic Characterization of Semi-Oblivious Chase Non-Termination

We proceed to characterize the non-termination of the so-chase under sticky sets of TGDs. For a set \({\varSigma } \in \mathbb {S}\), our goal is to show that, if there is an infinite so-chase derivation of cr(Σ) w.r.t. Σ, then we can isolate a “path-like” infinite so-chase derivation δ, which we call linear. Roughly speaking, linearity means that there exists an infinite simple path α0,α1,α2… in the chase relation of δ such that α0cr(Σ) and αi is constructed during the i-th trigger application, while all the atoms that are needed to construct this path, and are not already on the path, are atoms of cr(Σ).
Definition 2 (Linearity)
Consider a set Σ of TGDs. An infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Σ) w.r.t. Σ, where Iiso, σi,hiIi+ 1 for i ≥ 0, is called linear if there is an infinite sequence of distinct atoms (αi)i≥ 0 such that the following hold:
1.
α0cr(Σ).
 
2.
For each i ≥ 0, αi+ 1Ii+ 1Ii, and there exists βbody(σi) such that hi(β) = αi and \(h_{i}({\mathsf {body}({\sigma _{i}})} \setminus \{\beta \}) \subseteq \mathsf {cr}({\varSigma })\).
 
We are now ready to present our main characterization of non-termination of the semi-oblivious chase under sticky sets of TGDs.
Theorem 3
Consider a set \({\varSigma } \in \mathbb {S}\). The following are equivalent:
1.
There exists an infinite so-chase derivation of cr(Σ) w.r.t Σ.
 
2.
There exists a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.
 
It is clear that (2) ⇒ (1) holds trivially. The non-trivial direction is (1) ⇒ (2), which is established in two main steps:
1.
We show that the chase relation of an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ always contains a special path, called continuous, rooted at an atom of cr(Σ), which essentially guarantees the continuous propagation of a new null. Note that the existence of such a special path does not rely on stickiness.
 
2.
By exploiting the existence of a continuous path, we construct a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ. In fact, due to stickiness, we can convert an infinite suffix P of the continuous path in ≺δ, together with all the atoms that are needed to generate the atoms on P via a single trigger application, into a linear infinite so-chase derivation δ of cr(Σ) w.r.t. Σ. As we shall see, stickiness helps us to ensure that δ is linear, while continuity allows us to show that δ is infinite.
 
In the rest of this section, we fix a set \({\varSigma } \in \mathbb {S}\). For technical clarity, we assume that all the TGDs of Σ have a non-empty frontier, i.e., for every TGD σΣ, there exists at least one variable that appears both in body(σ) and head(σ). Furthermore, we assume that Σ is in normal form, i.e., each TGD of Σ has only one atom in its head. The normalization procedure, which preserves stickiness, is rather standard and can be found, e.g., in [13]. The above simplifying assumptions do not affect the generality of our proof. In other words, assuming that Σ is what we obtain after removing from \(\hat {{\varSigma }} \in \mathbb {S}\) all the TGDs with an empty frontier and then normalize it, we can easily show that there exists a linear infinite so-chase derivation of \({\mathsf {cr}({\hat {\varSigma }})}\) w.r.t. \(\hat {{\varSigma }}\) iff there exists a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.

4.1 Existence of a Continuous Path

Let us first formalize the notion of path in the chase relation of a derivation. Given an infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Σ) w.r.t. Σ, a finite δ-path is a finite sequence of atoms (αi)0≤in such that α0I0 and αiδαi+ 1. Analogously, we can define the notion of infinite δ-path, which refers to an infinite sequence of atoms rooted at an atom of I0. The intention underlying continuity is to ensure the continuous propagation of a new null on a path. Roughly, a δ-path (αi)i≥ 0 is continuous via a sequence of indices (i)i≥ 0, with 0 = 1, if for each i ≥ 0, a new null is invented in \(\alpha _{\ell _{i}}\) that is “necessarily propagated” up to the atom \(\alpha _{\ell _{i+1}}\). At this point, it is crucial to formalize what we mean by “necessarily propagated”.
Let α, β be atoms of \(\bigcup _{i \geq 0} I_{i}\), and assume that βIjIj− 1, for some j > 0, with Ij− 1so, σ, hIj, i.e., β = so-result(σ, h).2 We say that the i-th position of α and the j-th position of β are related (w.r.t δ), denoted \((\alpha ,i) \simeq _{\delta } (\beta ,j)\), if there exists an atom γbody(σ) such that α = h(γ) and γ[i] = head(σ)[j]. A finite δ-path (αi)0≤in is (s, t)-connected, where s ∈ [ar(R0)] and t ∈ [ar(Rn)], with R0 and Rn being the predicates of α0 and αn, respectively, if there exists a sequence of integers (k)1≤kn− 1 such that
$$ (\alpha_{0},s) \simeq_{\delta} (\alpha_{1},\ell_{1}) \simeq_{\delta} (\alpha_{2},\ell_{2}) \simeq_{\delta} {\cdots} \simeq_{\delta} (\alpha_{n-1},\ell_{n-1}) \simeq_{\delta} (\alpha_{n},t). $$
In simple words, the term occurring at position (α0,s) is necessarily propagated up to the position (αn,t) via the intermediate positions (α1,1),…,(αn− 1,n− 1).
For introducing continuity, we also need the notion of the birth atom of a null value. Consider a null ⊥ occurring in \(\bigcup _{i \geq 0} I_{i}\). The birth atom (w.r.t. δ) of ⊥, denoted birthδ(⊥), is the atom of \(\bigcup _{i \geq 0} I_{i}\) such that, for some j > 0, IjIj− 1 = {birthδ(⊥)} and ⊥∈dom(Ij) ∖dom(Ij− 1). It is clear that the atom birthδ(⊥) is unique (since we consider TGDs in normal form). We are now ready to introduce continuity.
Definition 3 (Continuity)
Let δ be an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ. A δ-path (αi)i≥ 0 is continuous if there exist an infinite sequence (i)i≥ 0 of integers, where 0 = 1 and 0 < 1 < ⋯, an infinite sequence (⊥i)i≥ 0 of nulls, and infinite sequences of integers (mi)i≥ 0 and (pi)i≥ 0 from [ar(Σ)], such that, for each k ≥ 0: \(\alpha _{\ell _{k}} = {\mathsf {birth}_{{\delta }}({\bot _{k}})}\), \(\bot _{k} = \alpha _{\ell _{k}}[m_{k}]\), and \((\alpha _{i})_{\ell _{k} \leq i \leq \ell _{k+1}}\) is (mk,pk)-connected.
A simple example that illustrates the notion of continuity follows:
Example 4
Assume that Σ = {σ1,σ2,σ3}, where3
$$ \begin{array}{@{}rcl@{}} \sigma_{1} &=& S(x)\ {\rightarrow}\ \exists y \exists z P(x,y,z),R(y,z)\\ \sigma_{2} &=& P(x,y,z), R(y,w)\ {\rightarrow}\ P(w,y,z)\\ \sigma_{3} &=& P(x,y,z),R(y,w)\ {\rightarrow} \exists v P(z,y,v),R(y,v),Q(w). \end{array} $$
It is easy to verify that there exists an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ such that the following is part of ≺δ; a black solid edge from α to β labeled by σ means that (α, β) belongs to ≺δ due to a trigger that involves the TGD σ:
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figd_HTML.png
It can be verified that the path with P-atoms is a continuous δ-path. Let us explain the reason. The first atom in which a null is born is P(c,⊥1,⊥2), with ⊥1,⊥2 being the new nulls, and continuity is satisfied since ⊥2 is propagated (this is indicated via the red dashed arrows) to the next atom where the new null ⊥3 is born. Now, since the null ⊥3 is propagated up to the next birth atom P(⊥3,⊥1,⊥6), continuity is satisfied. In the rest of the path the same pattern is repeated, and thus continuity is globally satisfied. In fact, the pattern that we can extract is graphically illustrated as
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Fige_HTML.png
where the continuous propagation of a new null (red dashed arrows) can be seen.
We are now ready to establish our main technical result concerning continuity. Note that the next result holds for arbitrary (not necessarily sticky) sets of TGDs.
Lemma 1
For every infinite so-chase derivation δ of cr(Σ) w.r.t Σ, there exists a δ-path that is continuous.
Proof
Assume that δ = (Ii)i≥ 0 is an infinite so-chase derivation of cr(Σ) w.r.t Σ, and let \(\mathcal {I} = \bigcup _{i \geq 0} I_{i}\). Given a term \(t \in {\mathsf {dom}({\mathcal {I}})}\), which is either the constant or a null, and a null \(\bot \in {\mathsf {dom}({\mathcal {I}})}\), we write \(t \vartriangleright _{\delta } \bot \) if t occurs in fr(birthδ(⊥)).4 Now, for each \(t \in {\mathsf {dom}({\mathcal {I}})}\), we inductively define the rank (w.r.t. δ) of t as follows:
$$ \begin{array}{@{}rcl@{}} &\mathsf{rank}_{\delta}(t)\ =\ \left\{ \begin{array}{ll} 0 & \quad \text{if } t = \natural,\\ &\\ 1 + \max\{ \mathsf{rank}(t^{\prime}) \mid t^{\prime} \vartriangleright_{\delta} t\} & \quad \text{otherwise.} \end{array} \right.& \end{array} $$
For a null \(\bot \in \mathsf {dom}(\mathcal I)\), we select a term \(t \in \mathsf {dom}(\mathcal I)\) such that rank(t) = rank(⊥) − 1 and \(t \vartriangleright _{\delta } \bot \), and we write \(t \blacktriangleright _{\delta } \bot \).5 Since, by assumption, all the TGDs of Σ have a non-empty frontier, it is easy to see that the binary relation \(\blacktriangleright _{\delta }\) over \(\mathsf {dom}(\mathcal I)\) forms an infinite rooted tree Tδ, where the root is the constant occurring in cr(Σ). The key property of Tδ is given by the following result:
Lemma 2
Tδ has finite out-degree.
Proof
We can show that, for each i ≥ 0, the set \(\{t \in \mathsf {dom}(\mathcal I) \mid \mathsf {rank}(t) = i\}\) is finite. This can be shown by induction on i ≥ 0, while the key fact is that only finitely many semi-obliviously active triggers can be formed due to which a null with rank i + 1 is generated (since, by induction hypothesis, the set of terms with rank at most i is finite). Therefore, the nodes of Tδ have finite out-degree. □
Having Lemma 2 in place, we can apply König’s Lemma on Tδ, and get that Tδ contains an infinite simple path starting from the root node.6 Let ⊥0,⊥1,⊥2,… be such a path, where ⊥0 = . By construction, for each i ≥ 0, ⊥ifr(birthδ(⊥i+ 1)). Moreover, there is a sequence of atoms
$$ {\alpha^{i}_{0}}, {\alpha^{i}_{1}}, {\alpha^{i}_{2}}, \ldots, \alpha^{i}_{m_{i}}, $$
where \({\alpha ^{i}_{0}} \in \mathsf {cr}({\varSigma })\) if i = 0, \({\alpha ^{i}_{0}} = \mathsf {birth}_{\delta }(\bot _{i})\) if i > 0, and \(\alpha ^{i}_{m_{i}} = \mathsf {birth}_{\delta }(\bot _{i+1})\), such that \({\alpha ^{i}_{k}} {\prec _{{\delta }} {\alpha }_{k+1}^{i}}\), for each k ∈{0,…,mi − 1}, and there are integers s, t ∈ [ar(Σ)] such that \(({{\alpha }_{j}^{i}})_{0 \leq j \leq m_{i}}\) is (s, t)-connected. Consequently, the infinite sequence
$$ {\alpha}_{0}^{0},{\alpha}_{1}^{0} = {\alpha}_{0}^{1},{\alpha}_{1}^{1},\ldots,{\alpha}_{m_{1}}^{1} = {\alpha}_{0}^{2},{\alpha}_{1}^{2},\ldots,{\alpha}_{m_{2}}^{2} = {\alpha}_{0}^{3},\ldots $$
is a continuous δ-path, and the claim follows. □

4.2 From Arbitrary to Linear Infinite Derivations

We can now show that an infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Σ) w.r.t. Σ can always be converted into a linear infinite so-chase derivation δ, which in turn establishes the non-trivial direction (1) ⇒ (2) of Theorem 3. The construction proceeds in three main steps, which are described below. We first describe those steps in a semi-formal way, and exploit Example 4 in order to illustrate the key ideas. We then proceed to give the formal construction.

4.2.1 Semi-Formal Description

Useful part of δ
We first isolate a useful part of the so-chase derivation δ = (Ii)i≥ 0. By Lemma 1, there exists a continuous infinite δ-path P = (αi)i≥ 0. Recall that continuity guarantees the continuous propagation on P of infinitely many nulls, which we call for the purpose of this discussion pivotal. By stickiness, there exists j ≥ 0 such that αj is the last atom on P in which a term t becomes sticky. By saying that the term t becomes sticky, we mean that the first time t participates in a join is during the trigger application that generates αj, and thus t occurs in (or sticks to) every atom of {αi}ij. Let kj be the integer such that αk is the first atom on P after αj in which a new pivotal null is invented. The useful part of δ that we are going to focus on is the infinite sequence of atoms (αi)ik, which we call the backbone, and the atoms of \(\bigcup _{i \geq 0} I_{i}\), which we call side atoms, that are needed to generate the atoms on the backbone via a single trigger application. In other words, for a backbone atom α, if α is obtained via the trigger (σ, h) for Σ on instance Ii, for some i ≥ 0, then the atoms h(body(σ)), excluding the backbone atoms, are side atoms.
Example 5
Consider again the set \({\varSigma } \in \mathbb {S}\) from Example 4. As discussed above, there exists an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ such that a continuous infinite δ-path exists (see the figures above). The useful part of δ is as shown below
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figf_HTML.png
Observe that the last atom on the continuous path in which a term becomes sticky is P(⊥2,⊥1,⊥3); in fact, the sticky term is ⊥1, which is the only sticky term on the continuous path. It happened that P(⊥2,⊥1,⊥3) invents also a new pivotal null, that is, ⊥3, and therefore the suffix of the continuous path that starts at P(⊥2,⊥1,⊥3) is the backbone. It is now easy to verify that all the other atoms, apart from S(), indeed contribute to the generation of a backbone atom via a single trigger application.
Renaming step
We proceed to rename some of the nulls that occur in backbone atoms or side atoms. In particular, for every null ⊥ occurring in a side atom α, we apply the following renaming steps; recall that is the constant occurring in cr(Σ): (i) every occurrence of ⊥ in α is replaced by , and (ii) every occurrence of ⊥ in a backbone atom β that is propagated from α to β is replaced by . For a backbone or side atom α, let ρ(α) be the atom obtained from α after globally applying the above renaming steps. We now define the sequence of instances \(\delta ^{\prime } = (J_{i})_{i \geq 0}\) as follows:
$$ J_{0}\ =\ \{\rho(\alpha) \mid \alpha \text{ is a side atom}\}, $$
which is a subset of cr(Σ), and
$$ J_{i}\ =\ J_{i-1} \cup \{\rho(\alpha_{k+i-1})\} \cup H, \quad \text{for } i>0, $$
where H is the set of atoms that are generated together with αk+i− 1, after renaming the propagated nulls that do not occur in ρ(αk+i− 1) to .7 It is crucial to observe that a new null generated in a backbone atom never participates in a join. This is because the first backbone atom αk comes after the atom αj, which is the last atom on P in which a term becomes sticky. This fact allows us to modify triggers from δ in order to construct, for every i ≥ 0, a trigger (σi,hi) such that Jiso, σi,hiJi+ 1.
Example 6
We consider again our running example. Before renaming the nulls that appear in side atoms, we first need to understand how nulls are propagated from side atoms to backbone atoms during the chase. This is depicted in the following figure
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figg_HTML.png
Note that the boldfaced occurrences of the pivotal nulls ⊥3,⊥6,… are not propagated from side atoms, but generated on the backbone, and thus will not be renamed. Recall that the existence of such nulls is guaranteed by continuity. By applying the renaming step, i.e., by replacing every null in a side atom with the constant , and then propagating it to the backbone as indicated above, we get the sequence of instances
$$ \begin{array}{@{}rcl@{}} &J_{0} = \{R(\natural,\natural),P(\natural,\natural,\natural)\},&\\ &J_{1} = J_{0} \cup \{P(\natural,\natural,\bot_{3}),R(\natural,\bot_{3})\},&\\ &J_{2} = J_{1} \cup \{P(\natural,\natural,\bot_{3})\},&\\ &J_{3} = J_{2} \cup \{P(\natural,\natural,\bot_{3})\},&\\ &J_{4} = J_{3} \cup \{P(\bot_{3},\natural,\bot_{6}),R(\natural,\bot_{6})\},&\\ &\vdots& \end{array} $$
Observe that, due to stickiness, none of the nulls ⊥3,⊥6,… generated on the backbone participates in a join. Hence, the renaming step preserves all the joins, and thus, by adapting triggers from δ, we can devise a trigger for each pair (Ji,Ji+ 1).
Pruning step
At this point, one may be tempted to think that \(\delta ^{\prime } = (J_{i})_{i \geq 0}\), with Jiso, σi,hiJi+ 1 for i ≥ 0, is the desired linear infinite so-chase derivation of cr(Σ) w.r.t. Σ. It is easy to verify that we have the infinite sequence of atoms (ρ(αi))ik− 1 such that ρ(αk− 1) ∈cr(Σ) since αk− 1 is a side atom, and for each ik − 1, \(J_{i-k+2} \supseteq J_{i-k+1} \cup \{\rho (\alpha _{i+1})\}\), and there exists βbody(σi) such that hi(βi) = αi and \(h_{i}(\mathsf {body}(\sigma _{i}) \setminus \{\beta \}) \subseteq \mathsf {cr}({\varSigma })\). However, we cannot conclude yet that \(\delta ^{\prime }\) is the desired so-chase derivation since we may apply a trigger that is not semi-obliviously active. This can be fixed by simply removing the instances that are obtained via a trigger that is not semi-obliviously active, which actually means that we keep only one occurrence of each instance. However, such a pruning step might be applied infinitely many times, and thus, the question that we need to answer is whether the obtained so-chase derivation \(\delta ^{\prime \prime }\) remains infinite. Interestingly, this is the case due to continuity. Since the backbone (αi)ik is part of a continuous δ-path, we get that all the (necessarily consecutive) occurrences of an instance appear between two instances in which new pivotal nulls are invented. Since the distance between such instances is finite, each instance occurs in \(\delta ^{\prime }\) finitely many times. Therefore, after the pruning step, the obtained so-chase derivation is infinite. Thus, \(\delta ^{\prime \prime }\) is a linear infinite so-chase derivation of J0 w.r.t. Σ. Since \(J_{0} \subseteq {\mathsf {cr}({\varSigma })}\), we can construct a linear infinite so-chase derivation δ of cr(Σ) w.r.t. Σ by adding to J0 the set of atoms cr(Σ) ∖ J0.
Example 7
Coming back to our running example, it can be seen that the sequence of instances devised in Example 6 is not the desired linear derivation due to non-active triggers, which implies that there are consecutive occurrences of the same instance, e.g., JI,J2,J3. However, after applying the pruning step, we get the sequence
$$ \begin{array}{@{}rcl@{}} &{J}_{0}^{\prime} = J_{0},&\\ &{J}_{1}^{\prime} = J_{1},&\\ &{J}_{2}^{\prime} = J^{\prime}_{1} \cup \{P(\bot_{3},\natural,\bot_{6}),R(\natural,\bot_{6})\},&\\ &{J}_{3}^{\prime} = J^{\prime}_{2} \cup \{P(\bot_{6},\natural,\bot_{9}),R(\natural,\bot_{9})\},&\\ &\vdots& \end{array} $$
Now, it is easy to verify that after adding the atoms S() and Q() to \(J^{\prime }_{0}\), we get a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.

4.2.2 The Formal Construction

Useful part of δ
In what follows, let \(\mathcal {I} = \bigcup _{i \ge 0} I_{i}\). By Lemma 1, there exists a continuous infinite δ-path P = (αi)i≥ 0. That is, there exist an infinite sequence (i)i≥ 0 of integers, where 0 = 1 and 0 < 1 < ⋯, an infinite sequence (⊥i)i≥ 0 of nulls, and infinite sequences of integers (mi)i≥ 0 and (pi)i≥ 0 from [ar(Σ)], such that, for each k ≥ 0, \(\alpha _{\ell _{k}}= {\mathsf {birth}_{\delta }(\bot _{k})}\), \(\bot _{k} = \alpha _{\ell _{k}}[m_{k}]\), and \((\alpha _{i})_{\ell _{k} \le i \le \ell _{k+1}}\) is (mk,pk)-connected. Given an atom αi = so-result(σ, h) for some i > 0, we say that a term \(t \in \mathsf {dom}(\mathcal {I})\) becomes sticky in αi if i is the smallest integer such that there is a variable xfr(σ) occurring more than once in body(σ) and t = h(x). By stickiness of Σ, and from the fact that the arity of each predicate in Σ is finite, there exists some j ≥ 0 such that, for every ij, no term \(t \in {\mathsf {dom}(\mathcal {I})}\) becomes sticky in αi. Let k ≥ 0 be the smallest integer such that no term becomes sticky in \(\alpha _{\ell _{k}}\). Note that this integer exists since the sequence of indices (i)i≥ 0 is infinite. We call the sequence of atoms \(B = (\alpha _{i})_{i \ge \ell _{k}}\) the backbone of δ; let \({\mathscr{B}} = \{\alpha _{i}\}_{i \ge \ell _{k}}\). Furthermore, we define the set
$$ S\ =\ \{\alpha \in \mathcal{I}\setminus \mathcal{B} \mid \text{there exists } \beta \in \mathcal{B} \text{ such that } \alpha \prec_{\delta} \beta\}, $$
which we call the side atoms of B. The set \(S \cup {\mathscr{B}}\) is the useful part of δ.
Renaming step
For notational convenience, let B = (βi)i≥ 0. We define \(B^{\prime }\) as the sequence of atoms obtained from B as follows. For every atom \(\beta = R(\bar t) \in {\mathscr{B}}\), and every integer j ∈ [ar(R)], if there exists an atom \(\alpha = S(\bar u) \in S\), and an integer i ∈ [ar(S)], with α[i] ∈N, such that an (i, j)-connected δ-path of the form α,…,β exists, then the term at position (β, j) is renamed to the constant . Furthermore, for each i ≥ 0, assuming that βi = so-result(σ, h), let \(\eta _{i} = (\sigma ,h^{\prime })\) be the trigger obtained from (σ, h) after updating h to \(h^{\prime }\) as dictated by the renaming step applied to βi. Assuming \(B^{\prime }=({\beta }_{i}^{\prime })_{i \ge 0}\), we define \(\delta ^{\prime }=(J_{i})_{i \ge 0}\) such that J0 = cr(Σ), and for i > 0, \(J_{i} = J_{i-1} \cup \{{\beta }_{i-1}^{\prime }\}\). Since
  • no term becomes sticky in some atom of B, and
  • for each null ⊥ such that \(\beta = \mathsf {birth}_{\delta }(\bot ) \in {\mathscr{B}}\), which means that ⊥ has been generated on B, ⊥ is not renamed on \(B^{\prime }\),
we get that, for i ≥ 0, ηi is a trigger for Σ on Ji, and thus, Ji+ 1 = Jiso-result(ηi).
Pruning step
Although \(\delta ^{\prime } = (J_{i})_{i\ge 0}\) is locally consistent, that is, for each i ≥ 0, Jiso, σi,hiJi+ 1, for some trigger (σi,hi) for Σ on Ji, it is not yet a so-chase derivation since some of the involved triggers might not be semi-obliviously active. In other words, there might exist i ≥ 0 such that Jiso, σi,hiJi+ 1, but so-result(σi,hi) ∈ Ji, which implies that Ji = Ji+ 1. Let \(\delta ^{\prime \prime }\) be the sequence of instances obtained from \(\delta ^{\prime }\) where only one occurrence of each instance is kept. Clearly, \(\delta ^{\prime \prime }\) is still locally consistent. Moreover, by exploiting continuity, we can show that, for each i ≥ 0, Ji occurs in \(\delta ^{\prime }\) finitely many times, which in turn implies that \(\delta ^{\prime \prime }\) is a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ, as needed.

5 Graph-Based Characterization of Semi-Oblivious Chase Termination

In this section, we characterize the termination of the semi-oblivious chase for sticky sets of TGDs via a graph-based condition. More precisely, we show that a set \({\varSigma } \in \mathbb {S}\) belongs to \({\mathbb {CT}}_{\forall }^{\mathsf {so}}\) iff a linearized version of it, i.e., a set of linear TGD obtained from Σ, enjoys a condition inspired by weak-acyclicity [17], called critical-weak-acyclicity, introduced in [8]. Recall that linear TGDs are TGDs with only one body atom [12]; we write \({\mathbb {L}}\) for the class of linear TGDs. The proof of the above result boils down to showing that the given sticky set Σ of TGDs can be rewritten into a set of linear TGDs, while this rewriting preserves the termination of the semi-oblivious chase. The latter heavily relies on Theorem 3, which establishes that non-termination of the semi-oblivious chase coincides with the existence of a linear infinite chase derivation of cr(Σ) w.r.t. Σ. We can then apply the characterization for the termination of the semi-oblivious chase for linear TGDs from [8], which states that a set \({\varSigma } \in {\mathbb {L}}\) belongs to \({{\mathbb {CT}}_{\forall }^{\mathsf {so}}}\) iff it is critically-weakly-acyclic. Let us first recall the notion of critical-weak-acyclicity for linear TGDs, which has been originally introduced in [8].

5.1 Critically-Weakly-Acyclic Linear TGDs

Since critical-weak-acyclicity is inspired by weak-acyclicity, it is not surprising that it relies on the dependency graph of a set of TGDs introduced in [17], that encodes how terms might be propagated during the chase. We assume a fixed ordering on the head-atoms of TGDs.8 For a TGD σ with head(σ) = α1,…,αk, we write (σ, i) for the TGD that has only one atom in its head, called single-head, obtained from σ by keeping only the atom αi, and the existentially quantified variables in αi. Recall that pos(α, x) is the set of positions in α at which x occurs, while pos(body(σ),x) is the set of positions at which x occurs in the body of σ. We also write pos(sch(Σ)) for the set of positions of sch(Σ), i.e., the set {(R, i)∣R/nsch(Σ) and i ∈ [n]}.
Definition 4 (Dependency Graph)
The dependency graph of a set Σ of TGDs is a labeled directed multigraph dg(Σ) = (N, E, λ), where N = pos(sch(Σ)), \(\lambda : E {\rightarrow } {\varSigma } \times \mathbb {N}\), and E contains only the following edges. For each σΣ with head(σ) = α1,…,αk, for each xfr(σ), and for each position πpos(body(σ),x):
  • For each i ∈ [k], and for each position \(\pi ^{\prime } \in \mathsf {pos}(\alpha _{i},x)\), there is a normal edge \(e = (\pi ,\pi ^{\prime }) \in E\) with λ(e) = (σ, i).
  • For each existentially quantified variable z in σ, for each i ∈ [k], and for each \(\pi ^{\prime } \in \mathsf {pos}(\alpha _{i},z)\), there is a special edge \(e = (\pi ,\pi ^{\prime }) \in E\) with λ(e) = (σ, i).
Intuitively speaking, a normal edge \((\pi ,\pi ^{\prime })\) encodes the fact that a term may propagate from π to \(\pi ^{\prime }\) during the chase. Moreover, a special edge \((\pi ,\pi ^{\prime \prime })\) keeps track of the fact that the propagation of a term from π to \(\pi ^{\prime }\) also creates a new null at position \(\pi ^{\prime \prime }\). A simple example that illustrates the notion of the dependency graph follows:
Example 8
Consider the set Σ consisting of the TGD
$$ \sigma\ =\ R(x,y)\ {\rightarrow}\ \exists z P(x,z),R(x,z). $$
The graph dg(Σ) is as follows
https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figh_HTML.png
where the dashed arrows represent special edges. The normal edges occur due to the variable x, while the special edges due to the existentially quantified variable z.
The class of weakly-acyclic sets of TGDs is a well-known formalism, introduced in the context of data exchange, that guarantees the termination of the semi-oblivious chase [17].9 Formally, a set Σ is weakly-acyclic if there is no cycle in dg(Σ) that contains a special edge. It would be extremely useful if, whenever we focus on linear TGDs, weak-acyclicity is also a necessary condition for the termination of the semi-oblivious chase. Unfortunately, this is not the case. A simple counterexample follows:
Example 9
Consider the set Σ of linear TGDs consisting of \( R(x,x) {\rightarrow } \exists z R(z,x). \) In dg(Σ) there is a cycle that contains a special edge. However, there exists only one so-chase derivation of cr(Σ) w.r.t. Σ that is finite, and thus, \({\varSigma } \in {{\mathbb {CT}}_{\forall }^{\mathsf {so}}}\).
Interestingly, as it has been shown in [8], there is an extension of weak-acyclicity, called critical-weak-acyclicity, that, whenever we focus on linear TGDs, provides a necessary and sufficient condition for the termination of the semi-oblivious chase. A key notion underlying critical-weak-acyclicity is the notion of compatibility among two single-head linear TGDs. Intuitively, if a single-head linear TGD σ1 is compatible with a single-head linear TGD σ2, then the atom obtained during the chase by applying σ1 may trigger σ2. To formally define the notion of compatibility, we first need to recall the standard notion of unification among atoms.
We say that two atoms α and β (containing only variables from V) unify if there exists a substitution γ from the variables occurring in α and β to V, called unifier for α and β, such that γ(α) = γ(β). A most general unifier (MGU) for α and β is a unifier for α and β, denoted mgu(α, β), such that, for each other unifier γ for α and β, there exists a substitution \(\gamma ^{\prime }\) such that \(\gamma = \gamma ^{\prime } \circ {\mathsf {mgu}({\alpha },{\beta })}\). It is well-known that if two atoms α and β unify, then they have an MGU that is unique (modulo variable renaming), and thus, we can refer to the MGU for α and β [1]. For brevity, given a TGD σ and a variable x, let \({\varPi }_{x}^{\sigma } = \mathsf {pos}(\mathsf {body}(\sigma ),x)\). Let var(α, π), where α is an atom, and π a set of positions, be the set of variables in α at positions of π.
Definition 5 (Compatibility)
Consider two single-head linear TGDs σ1 and σ2. We say that σ1 is compatible with σ2 if the following hold:
1.
head(σ1) and body(σ2) unify.
 
2.
For each xvar(body(σ2)), either \(\mathsf {var}(\mathsf {head}(\sigma _{1}),{\varPi }_{x}^{\sigma _{2}}) \subseteq \mathsf {fr}(\sigma _{1})\), or there is an existentially quantified variable z in σ1 such that \(\mathsf {var}(\mathsf {head}(\sigma _1),{\Pi }_{x}^{\sigma _2}) = \{z\}\).
 
Having the notion of compatibility among two single-head linear TGDs in place, we can recall the notion of resolvent of a sequence σ1,…,σn of single-head linear TGDs, which is in turn a single-head TGD. Roughly, such a resolvent mimics the behavior of the sequence σ1,…,σn during the chase. Notice that the existence of such a resolvent is not guaranteed, but if it exists, this implies that we may have a sequence of trigger applications that involve the TGDs σ1,…,σn in this order. In such a case, we call the sequence σ1,…,σn active.
Definition 6 (Resolvent)
The resolvent of a sequence σ1,…,σn of single-head linear TGDs, denoted [σ1,…,σn], is inductively defined as follows (for notational convenience, we simply write ρ for [σ1,…,σn− 1]):
1.
[σ1] = σ1.
 
2.
\([\sigma _{1},\ldots ,\sigma _{n}] = \gamma (\mathsf {body}(\rho )) {\rightarrow } \gamma (\mathsf {head}(\sigma _{n}))\), with γ = mgu(head(ρ),body(σn)), if \(\rho \neq \blacklozenge \) and ρ is compatible with σn; otherwise, \([\sigma _{1},\ldots ,\sigma _{n}] = \blacklozenge \).
 
The sequence σ1,…,σn is called active if \([\sigma _{1},\ldots ,\sigma _{n}] \neq \blacklozenge \).
At this point, one may think that the right extension of weak-acyclicity, which will provide a necessary condition for the termination of the semi-oblivious chase under linear TGDs, is to allow cycles with special edges in the underlying dependency graph as long as the corresponding sequence of single-head TGDs, which can be extracted from the edge labels, is not active. However, as thoroughly discussed in [8], this is not enough. If a cycle with a special edge is labeled with an active sequence, then we can conclude that it will be traversed at least once during the chase. Nevertheless, it is not guaranteed that it will be traversed infinitely many times. A cycle that is labeled with an active sequence σ1,…,σn, and contains a special edge, will be certainly traversed infinitely many times if the resolvent of the sequence ρ,…,ρ of length k, where ρ = [σ1,…,σn], exists, for every k > 0. Interestingly, for ensuring the latter condition, it suffices to consider sequences of length at most (ω + 1), where ω is the arity of the predicate of body(σ1). This brings us to critical sequences. For brevity, we write σk for the sequence σ,…,σ of length k.
Definition 7 (Critical Sequence)
A sequence σ1,…,σn of single-head linear TGDs is critical if σ1,…,σn is active, and [σ1,…,σn]ω+ 1 is active, where ω is the arity of the predicate of body(σ1).
We can now recall critical-weak-acyclicity as defined in [8]. It is essentially weak-acyclicity, with the key difference that a cycle in the underlying graph is “dangerous”, not only if it contains a special edge, but if it is also labeled with a critical sequence of single-head linear TGDs. The formal definition follows.
Definition 8 (Critical-Weak-Acyclicity)
Consider a set \({\varSigma } \in \mathbb {L}\) of TGDs, and let dg(Σ) = (N, E, λ). A cycle v0,v1,…,vn,v0 in ΣgΣ is critical if the sequence λ(v0,v1),λ(v1,v2),…,λ(vn,v0) of single-head linear TGDs is critical. We call Σ critically-weakly-acyclic if no critical cycle in dg(Σ) contains a special edge.
The essence of critical-weak-acyclicity is revealed by the following result:
Theorem 4 (8)
Consider a set \({\varSigma } \in \mathbb {L}\) of TGDs. The following are equivalent:
1.
\({\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\).
 
2.
Σ is critically-weakly-acyclic.
 

5.2 From Sticky to Linear TGDs

Before presenting the linearization procedure, we need to introduce some auxiliary notions. Given a tuple \(\bar {t} = (t_{1},\ldots ,t_{n}) \in ({\mathbf {V}} \cup \{\natural \})^{n}\), we write \({\mathsf {shape}(\bar {t})}\) for the tuple obtained from \({\bar {t}}\) by replacing each variable of V with the special symbol ∗. For example, shape((x, y, , z, x, )) = (∗,∗,,∗,∗,). We also write \(\mathsf {svar}(\bar {t})\) for the tuple obtained from \(\bar {t}\) by removing all the occurrences of the constant . For example, svar((x, y, , z, x, )) = (x, y, z, x). For an atom \(\alpha = R(\bar {t})\), let
$$ \natural\text{-}\mathsf{free}(\alpha)\ =\ R_{\mathsf{shape}(\bar{t})}(\mathsf{svar}(\bar{t})), $$
where \(R_{\mathsf {shape}(\bar {t})}\) is of arity \(|\mathsf {svar}(\bar {t})|\). In fact, -free(α) is the constant-free version of α, while the subscript \(\mathsf {shape}(\bar {t})\) keeps track of the original shape of α, i.e., where each occurrence of occurs in α. Notice that, having the constant-free version of an atom α in place, we can unambiguously write down α. For a set of atoms A, let
$$ \natural\text{-}\mathsf{free}(A)\ =\ \{\natural\text{-}\mathsf{free}(\alpha) \mid \alpha \in A\}. $$
Given a TGD σ and an atom αbody(σ), let
$$ V_{\alpha,\sigma}\ =\ \mathsf{var}(\mathsf{body}(\sigma) \setminus \{\alpha\}), $$
that is, the set of body variables of σ that do not occur only in α. Now, given a TGD σ, and an atom αbody(σ), let
$$ M_{\alpha,\sigma}\ =\ \{h : T {\rightarrow} \{\natural\} \mid V_{\alpha,\sigma} \subseteq T \subseteq \mathsf{var}(\mathsf{body}(\sigma))\}, $$
i.e., for each subset T of var(body(σ)) that contains all the variables of Vα, σ, Mα, σ contains a mapping that maps each variable of T to the special constant .
We are now ready to introduce the linearization of a set of TGDs. Note that the following definition talks about arbitrary TGDs. The notion of stickiness is used later for showing that the termination of the semi-oblivious chase is preserved.
Definition 9 (Linearization)
The linearization of a TGD σ of the form \(\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\), denoted Lin(σ), is the set of linear TGDs
$$ \underset{\alpha \in \phi(\bar x,\bar y)}{\bigcup} \underset{h \in M_{\alpha,\sigma}}{\bigcup} \left\{\natural\text{-}\mathsf{free}(h(\alpha))\ {\rightarrow}\ \exists \bar z {\natural\text{-}\mathsf{free}(h(\psi(\bar x,\bar z)))} \right\}. $$
The linearization of a set Σ of TGDs is defined as \(\mathsf {Lin}({\varSigma }) = \bigcup _{\sigma \in {\varSigma }} \mathsf {Lin}(\sigma )\).
The linearization procedure converts a TGD σ into a set of (constant-free) linear TGDs, where the body atom of each such linear TGD corresponds to an atom α of body(σ), while the variables in body(σ) ∖{α}, and possibly additional variables that occur only in α, are instantiated with the special constant . An example follows:
Example 10
Consider the TGD
$$ \sigma\ =\ \underbrace{P(x,y,z)}_{\alpha_{1}}, \underbrace{R(y,w)}_{\alpha_{2}}\ {\rightarrow}\ \exists u S(w,y,u), $$
We have that the set \(V_{\alpha _{1},\sigma }\) consists of all the variables in body(σ) ∖{α1}, while \(V_{\alpha _{2},\sigma }\) of all the variables in body(σ) ∖{α2}, that is,
$$ V_{\alpha_{1},\sigma}\ =\ \{y,w\} \qquad V_{\alpha_{2},\sigma}\ =\ \{x,y,z\}. $$
There are four sets of variables in body(σ) that are supersets of \(V_{\alpha _{1},\sigma }\):
$$ T_{1} = \{y,w\} \qquad T_{2} = \{x,y,w\} \qquad T_{3} = \{y,z,w\} \qquad T_{4} = \{x,y,z,w\}. $$
Moreover, there are two sets of variables in body(σ) that are supersets of \(V_{\alpha _{2},\sigma }\):
$$ T^{\prime}_{1} = \{x,y,z\} \qquad T^{\prime}_{2} = \{x,y,z,w\}. $$
Therefore, the set of mappings \(M_{\alpha _{1},\sigma }\) contains, for each i ∈{1,…,4}, a mapping of the form {xxTi}, whereas \(M_{\alpha _{2},\sigma }\) contains, for each i ∈{1,2}, a mapping of the form \(\{x \mapsto \natural \mid x \in T^{\prime }_{i}\}\). Finally, the linearization of σ is the set of linear TGDs Lin(σ) consisting of the TGDs
$$ \begin{array}{@{}rcl@{}} P_{{}_{(\ast,\natural,\ast)}}(x,z) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\natural,\natural,\ast)}}(z) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\ast,\natural,\natural)}}(x) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\natural,\natural,\natural)}}() &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u), \end{array} $$
due to the set of mappings \(M_{\alpha _{1},\sigma }\), and the TGDs
$$ \begin{array}{@{}rcl@{}} R_{{}_{(\natural,\ast)}}(w) &{\rightarrow}& \exists u S_{{}_{(\ast,\natural,\ast)}}(w,u),\\ R_{{}_{(\natural,\natural)}}() &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u). \end{array} $$
due to the set of mappings \(M_{\alpha _{2},\sigma }\).
Lemma 3 allows us to show that this procedure preserves the termination of the semi-oblivious chase whenever the input set of TGDs is sticky.
Lemma 3
For every set \({\varSigma } \in \mathbb {S}\) of TGDs, \({\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\) iff \(\mathsf {Lin}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\).
Proof
We assume, w.l.o.g., that Σ is in normal form, i.e., each TGD of Σ has only one atom in its head. Given a TGD \(\sigma = \phi (\bar x, \bar y) {\rightarrow } \exists \bar z\ R(\bar x,\bar z)\) of Σ, an atom αbody(σ), and a mapping hMα, σ, we write σα, h for the linear TGD \({\natural \text {-}\mathsf {free}(h(\alpha ))} {\rightarrow } \exists \bar z\ {\natural \text {-}\mathsf {free}({h(R(\bar x, \bar z))})} \in {\mathsf {Lin}({\varSigma })}\). We are now ready to proceed with the proof.
(⇒) Assume that \(\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). By Corollary 1, there exists an infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Lin(Σ)) w.r.t. Lin(Σ), with \(I_{i} {\langle {\sigma ^{\alpha _{i},g_{i}}_{i},h_{i}} \rangle } I_{i+1}\), or, equivalently, \(I_{i+1} = I_{i} \cup {\mathsf {so}}\text {-}{\mathsf {result}({\sigma ^{\alpha _{i},g_{i}}_{i}},{h_{i}})}\), for each i ≥ 0. By construction, for each i ≥ 0, the TGD \(\sigma ^{\alpha _{i},g_{i}}_{i}\) is linear, with its body being an atom of the form \(R_{\bar t}(\bar x)\), where \(\bar x\) is a tuple of variables, and \(\bar t\) is a tuple over {∗,}. Let \(\hat {h}_{i}\) be the extension of hi that maps each variable in body(σi) that is not in \(\bar {x}\) to .
Consider the infinite sequence of instances \(\delta ^{\prime } = (J_{i})_{i \ge 0}\), where J0 = cr(Σ), and \(J_{i+1} = J_{i} \cup \mathsf {so}\text {-}{\mathsf {result}({\sigma _{i}},{\hat {h}_{i}})}\), for each i ≥ 0. By construction of \(\hat {h}_{i}\), for each i ≥ 0, \((\sigma _{i},\hat {h}_{i})\) is a trigger for Σ on Ji, which implies that \(J_{i}{\langle {{\mathsf {so}},\sigma _{i},\hat {h}_{i}} \rangle }J_{i+1}\). However, the above sequence is not necessarily an infinite so-chase derivation of Σ w.r.t. cr(Σ) since some of the involved triggers may not be semi-obliviously active. We therefore consider the infinite sequence of instances \(\delta ^{\prime \prime } = (J^{\prime }_{i})_{i \geq 0}\) obtained from (Ji)i≥ 0 by simply removing the triggers that are not semi-obliviously active, or, more formally, by removing the instances obtained from triggers that are not semi-obliviously active. Recall that, for each i > 0, Ji is obtained via a trigger that is not semi-obliviously active iff Ji = Ji− 1. It remains to show that \(\delta ^{\prime \prime }\) is infinite, which in turn implies that \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). It suffices to show that, for each i ≥ 0, Ji occurs in \(\delta ^{\prime }\) finitely many times.
By contradiction, assume that there is k ≥ 0 such that Jk occurs in \(\delta ^{\prime }\) infinitely many times, which means that Jk = Jk+ 1 = Jk+ 2 = ⋯. Since Lin(Σ) is finite, there are indices kj < i such that \(\sigma ^{\alpha _{i},g_{i}}_{i} = \sigma ^{\alpha _{j},g_{j}}_{j}\); we refer to those TGDs as ρ. Since (Ii)i≥ 0 is a so-chase derivation of Lin(Σ) w.r.t. cr(Lin(Σ)), there is a variable xfr(ρ) such that hi(x)≠hj(x). By definition, \(h_{i}(x) = \hat {h}_{i}(x)\) and \(h_{j}(x) = \hat {h}_{j}(x)\). This implies that \(\hat {h}_{i}(x) \neq \hat {h}_{j}(x)\), and therefore, JiJj, which is a contradiction.
(⇐) Assume now that \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). By Theorem 3, there exists a linear infinite so-chase derivation δ = (Ii)i≥ 0 of cr(Σ) w.r.t. Σ, with Iiσi,hiIi+ 1, for i ≥ 0. In other words, there exists an infinite sequence of distinct atoms (αi)i≥ 0 such that
1.
α0cr(Σ), and
 
2.
for each i ≥ 0, αi+ 1Ii+ 1Ii, and there exists an atom βibody(σi) such that αi = hi(βi) and \(h_{i}(\mathsf {body}({\sigma _{i}}) \setminus \{\beta _{i}\}) \subseteq {\mathsf {cr}({\varSigma })}\).
 
For every i ≥ 0, let Xi = var(body(σi) ∖{βi}), \(g_{i} = h_{i | X_{i}}\), and fi = higi. Note that, since δ is linear, for every xXi, gi(x) = . Assuming that σi is of the form \(\phi _{i}(\bar x_{i}, \bar y_{i}) {\rightarrow } \exists \bar z_{i}\ R_{i}(\bar x_{i},\bar z_{i})\), let
$$ {\varSigma}^{\prime}\ =\ \left\{\underbrace{\natural\text{-}\mathsf{free}(g_{i}(\beta_{i})) {\rightarrow} \exists \bar z_{i} \natural\text{-}\mathsf{free}(g_{i}(R_{i}(\bar x_{i},\bar z_{i})))}_{\sigma^{\prime}_{i}}\right\}_{i \geq 0}. $$
By construction, \({\varSigma }^{\prime } \subseteq \mathsf {Lin}({\varSigma })\), and therefore, \({\varSigma }^{\prime } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\) implies \(\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). Hence, to conclude the proof of Lemma 3, it suffices to exhibit an infinite so-chase derivation of \({\mathsf {cr}({\varSigma }^{\prime })} \subseteq {\mathsf {cr}({\varSigma })}\) w.r.t. \({\varSigma }^{\prime }\). To this end, consider the infinite sequence of instances \(\delta ^{\prime } = (J_{i})_{i \ge 0}\), where \(J_{0} = \mathsf {cr}({\varSigma }^{\prime })\), and \(J_{i} = J_{i-1} \cup \{\alpha ^{\prime }_{i}\}\), where \(\alpha ^{\prime }_{i} = f_{i}({\mathsf {body}({{\sigma }_{i}^{\prime }})}) = f_{i}({\natural \text {-}\mathsf {free}(g_{i}(\beta _{i}))})\), for each i > 0. It is not difficult to verify that \(\delta ^{\prime }\) is an infinite so-chase derivation of \({\mathsf {cr}({\varSigma }^{\prime })}\) w.r.t. \({\varSigma }^{\prime }\) (modulo null renaming). □
The main result of this section follows from Theorem 4 and Lemma 3:
Theorem 5
Consider a set \({\varSigma } \in \mathbb {S}\) of TGDs. The following are equivalent:
1.
\({\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\).
 
2.
Lin(Σ) is critically-weakly-acyclic.
 
We would like to remark that the above characterization of the termination of the so-chase is different than the one presented in the conference paper [10] (see Corollary 24). More precisely, the linearization procedure in [10] produces TGDs with constants, and thus, we had to properly extend the notion of critical-weak-acyclicity from [8], which was defined only for constant-free TGDs. It turned out that we can directly use the notion of critical-weak-acyclicity as defined in [8], at the price of a slightly more complex linearization procedure that produces constant-free TGDs.

6 Complexity Analysis

We are now ready to complete the proof of our main result, that is, Theorem 2, which states that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) and \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) are PSpace-complete, and NLogSpace-complete for predicates of bounded arity. We first concentrate on the upper bounds.

6.1 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\)

We know that the problem \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {L})\) is in PSpace, and in NLogSpace for predicates of bounded arity [8]. In fact, these results exploit Theorem 4, and are obtained by showing that deciding whether a set of linear TGDs is critically-weakly-acyclic is in PSpace, and in NLogSpace for predicates of bounded arity. However, despite the fact that we can reduce \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) to \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {L})\) (see Lemma 3), we cannot directly exploit the complexity results for linear TGDs. The reason is because the linearization procedure takes exponential time, in general, and polynomial time in the case of bounded-arity predicates. Therefore, we cannot simply compute the set Lin(Σ), and then check for critical-weak-acyclicity, but a more refined approach is needed.
We focus on the complement of our problem, i.e., given a set \({\varSigma } \in \mathbb {S}\) of TGDs, we want to check whether \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). By Theorem 5, it suffices to show that Lin(Σ) is not critically-weakly-acyclic. The latter problem can be seen as a generalization of the standard graph reachability problem. Indeed, we need to check whether there exists a node in the dependency graph of Lin(Σ) that is reachable from itself via a critical cycle that contains a special edge. However, as discussed above, we cannot explicitly construct Lin(Σ) and its dependency graph G. Instead, the above reachability check should be performed on a compact representation of G, which is the set Σ itself.
Lemma 4
Consider a set Σ of TGDs. The problem of deciding whether Lin(Σ) is not critically-weakly-acyclic is in
$$ \textup{NSPACE}(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m), $$
where ω = ar(Σ), and m is the number of variables occurring in Σ.
Proof
We employ Algorithm 1, which takes as input a set Σ of TGDs, and checks whether Lin(Σ) is not critically-weakly-acyclic by non-deterministically searching for the existence of a critical cycle in dg(Lin(Σ)) that contains a special edge. Note that this is done without explicitly computing the graph dg(Lin(Σ)). Before showing that Algorithm 1 is correct, and analyzing its space complexity, let us first introduce an auxiliary notion that is used by the algorithm.
A Σ-label is a tuple = (σ, α, β, T), where σΣ, αbody(σ), βhead(σ), and \(V_{\alpha ,\sigma } \cap ({\mathsf {var}(\alpha )} \cup {\mathsf {var}(\beta )}) \subseteq T \subseteq {\mathsf {var}({{\mathsf {body}(\sigma )}})} \cap ({\mathsf {var}({\alpha })} \cup {\mathsf {var}({\beta }))}\). Note that a Σ-label induces a single-head TGD of Lin(Σ), denoted τ(), which might be the label of an edge in dg(Lin(Σ)); hence the name Σ-label. In fact, τ() is the TGD
$$ \natural\text{-}\mathsf{free}(h(\alpha))\ {\rightarrow}\ \exists \bar z {\natural\text{-}\mathsf{free}(h(\beta))}, $$
where h maps each variable of T to , and \(\bar z\) are the existentially quantified variables of σ occurring in β. Let us now proceed with the correctness of Algorithm 1. □
Correctness
It is an easy exercise to show the following claim concerning the procedure isEdge(⋅,⋅,⋅), which is described in Algorithm 2. Let dg(Lin(Σ)) = (N, E, λ).
Claim 1
Consider a Σ-label , and two positions v, usch(Lin(Σ)). Then:
1.
isEdge(, v, u) = (1,_) iff (v, u) ∈ E and λ((v, u)) = τ().
 
2.
isEdge(, v, u) = (1,1) iff (v, u) ∈ E is a special edge.
 
We can now show that Algorithm 1 accepts on input Σ iff there is a critical cycle in dg(Lin(Σ)) that contains a special edge:
(⇒) Assume first that Algorithm 1 accepts. This implies that there is an accepting computation that guesses a sequence of Σ-labels 1,…,n, and a sequence of positions v0,…,vn of pos(sch(Lin(Σ))). By construction, the following hold:
  • For each i ∈ [n], isEdge(i,vi− 1,vi) = (1,_), which implies that (vi− 1,vi) ∈ E and λ((vi− 1,vi)) = τ().
  • There exists i ∈ [n] such that isEdge(i,vi− 1,vi) = (1,1). Hence, (vi− 1,vi) ∈ E is a special edge.
  • For each i ∈ [n − 1], [τ(1),…,τ(i− 1)] is compatible with τ(i+ 1), which in turn implies that ρ = [τ(1),…,τ(n)] is active.
  • v0 = vn.
  • [ρ]ω+ 1 is active.
The above properties imply that v0,…,vn− 1,v0 is a critical cycle in dg(Lin(Σ)) that contains at least one special edge, and the claim follows.
(⇐) By hypothesis, there exists a critical cycle v0,…,vn− 1,v0 in dg(Lin(Σ)). Assume that, for each i ∈ [n − 1], λ((vi− 1,vi)) = σi and λ((vn− 1,v0)) = σn. Observe that each single-head linear TGD σi, for i ∈ [n], corresponds to a Σ-label i. Consider now the computation of Algorithm 1 on input Σ that guesses the sequence of labels 1,…,n, and the sequence of positions v0,…,vn− 1,v0. It should be clear that this is an accepting computation, and the claim follows.
Space Complexity
It remains to show that the space needed at each step of a computation of Algorithm 1 on input Σ is
$$ O(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$
We proceed to analyze the space needed to store a Σ-label, a position of sch(Lin(Σ)), and a single-head linear TGD τ() for a Σ-label . We also analyze the space for a compatibility check, for constructing a resolvent, and for checking whether [ρ]ω+ 1 is active. It will be then apparent that indeed the space used at each step of a computation of Algorithm 1 on input Σ is what we claimed above. Note that the latter ensures the termination of Algorithm 1 since we can always force a space-bounded algorithm to terminate and reject after exponentially many steps in the required space [31].
  • The required space for a Σ-label = (σ, α, β, T) is
    $$ O(\log |{\varSigma}| + \log |\mathsf{sch}({\varSigma})| + \omega \cdot \log m). $$
    A TGD σΣ takes \(O(\log |{\varSigma }|)\) space. For storing an atom occurring in Σ we need to store a predicate of sch(Σ), and, in the worst-case, ω variables occurring in Σ, which can be done in space \(O(\log |{\mathsf {sch}({\varSigma })}| + \omega \cdot \log m)\). Finally, since T contains at most 2ω variables occurring in Σ, it can be stored in space \(O(\omega \cdot \log m)\). Summing up, requires the space stated above.
  • A position of sch(Lin(Σ)) takes space
    $$ O(\log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega})). $$
    This follows from the fact that, by construction, the number of predicates occurring in Lin(Σ) is at most |sch(Σ)|⋅ 2ω.
  • Given a Σ-label , the single-head linear TGD τ() takes space
    $$ O(\log(|\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$
    It is clear that the two predicates occurring in τ() require O(|sch(Σ)|⋅ 2ω) space. We also need to store, in the worst-case, 2ω variables occurring in Σ, which takes \(O(\omega \cdot \log m)\) space. Summing up, τ() requires the space stated above.
  • To check whether two single-head linear TGDs σ1,σ2, computed during the execution of the algorithm, are compatible, we only need to check that, for each xvar(body(σ2)), either \(\mathsf {var}({\mathsf {head}(\sigma _{1})},{\varPi }_{x}^{\sigma _{2}}) \subseteq \mathsf {fr}(\sigma _{1})\), or there is an existentially quantified variable z in σ1 such that \({\mathsf {var}({{\mathsf {head}(\sigma _{1})}},{\varPi }_{x}^{\sigma _{2}})} = \{z\}\). The latter can be performed using space
    $$ O(\omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega})), $$
    which is the space needed for storing \({\varPi }_{x}^{\sigma _{2}}\). Note that head(σ1) and body(σ2) always unify since, at the point that we perform the compatibility check, we know that they have the same predicate (this has been checked by isEdge), and thus, there is no way for the unification check to fail.
  • Given two single-head linear TGDs σ1,σ2, computed during the execution of the algorithm, that are compatible, constructing [σ1,σ2] can be done using space
    $$ O(\omega \cdot \log m), $$
    which is essentially the space needed for storing mgu(head(σ1),body(σ2)).10
  • Finally, given the single-head linear TGD ρ, computed after the execution of the repeat-until, checking whether [ρ]ω+ 1 is active can be done using space
    $$ O(\omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$
    This easily follows from the analysis performed above.
Summing up, each step of a computation of Algorithm 1 on input Σ takes space
$$ O(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m), $$
and the claim follows.
Having Lemma 4 in place, it is clear that the complement of \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\), and thus \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) itself, is in PSpace, and in NLogSpace for predicates of bounded arity.11

6.2 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\)

We proceed to explain how the upper bounds established above for \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) can be transferred to \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\). Since the semi-oblivious chase is a refined version of the oblivious chase, it is not surprising that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {TGD})\) can be reduced to \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {TGD})\). This relies on a very simple construction, known as enrichment [20]. Formally, the enrichment of a set Σ of TGDs, denoted enrichment(Σ), is the set of TGDs obtained by replacing each TGD σΣ of the form \(\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z)\) with the TGD
$$ \phi(\bar x,\bar y) {\rightarrow} \exists \bar z\ \psi(\bar x,\bar z), \text{Aux}_{\sigma}(\bar x, \bar y), $$
where Auxσ is an auxiliary predicate of arity \((|\bar x|+|\bar y|)\) not occurring in sch(Σ). It is easy to show that the notion of enrichment provides the desired reduction from \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {TGD})\) to \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {TGD})\). More precisely:
Lemma 5
For a set Σ of TGDs, the following are equivalent:
1.
\({\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}\).
 
2.
\(\mathsf {enrichment}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\).
 
Proof
The fact that (2) implies (1) is a consequence of Theorem 6.1 in [20]. We proceed to show that (1) implies (2). Assume that \(\mathsf {enrichment}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\). Therefore, there exists a database D, and an infinite so-chase derivation δ = (Ii)i≥ 0 of D w.r.t. enrichment(Σ), where \(I_{i} {\langle {{\mathsf {so}},{\sigma }_{i}^{\prime }, {h_{i}}\rangle }, I_{i+1}}\), and, for each i ≥ 0, \(\sigma _{i}^{\prime } \in {\mathsf {enrichment}({\varSigma })}\) is the TGD corresponding to some TGD σiΣ. We can easily construct an infinite o-chase derivation \(\delta ^{\prime } = (J_{i})_{i \ge 0}\) of D w.r.t. Σ: let J0 = D, and, for each i > 0, Ji is obtained from Ii by removing every atom with a predicate symbol of the form Auxσ, for some σΣ, and by replacing every null of the form \(\bot ^{z}_{\sigma ^{\prime },h}\) with the null \(\bot ^{z}_{\sigma ,h}\). The fact that Jio, σi,hiJi+ 1 and (σi,h) is obliviously active, for each i ≥ 0, follows by construction and the observation that, for each TGD \(\sigma _{i}^{\prime } \in {\mathsf {enrichment}({\varSigma })}\) for i ≥ 0, \({\mathsf {fr}({\sigma _{i}^{\prime }})}\) coincides with var(body(σi)). □
It is also crucial to observe that the class of sticky sets of TGDs is closed under enrichment. In other words:
Lemma 6
For each set \({\varSigma } \in \mathbb {S}\), \(\mathsf {enrichment}({\varSigma }) \in \mathbb {S}\).
Proof
Consider a set \({\varSigma } \in \mathbb {S}\). The claim follows from the fact that, for each σΣ of the form \(\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z)\), the corresponding TGD \(\sigma ^{\prime } \in {\mathsf {enrichment}({\varSigma })}\) of the form \(\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z), \text {Aux}_{\sigma }(\bar x, \bar y)\) is such that the atom \(\text {Aux}_{\sigma }(\bar x, \bar y)\) contains every body variable of σ, and no TGD in enrichment(Σ) has an atom with predicate Auxσ in its body. Hence, no variable in enrichment(Σ) is marked. □
From Lemmas 5 and 6, we can conclude that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) can be reduced in logspace to \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\). Since, as shown above, \({\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})\) is in PSpace, and PSpace is closed under logspace reductions, we immediately get that \({\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})\) is in PSpace, as needed. However, in the case of predicates of bounded arity, we cannot immediately inherit the NLogSpace upper bound since the reduction provided by Lemmas 5 and 6 introduces auxiliary predicates of the form Auxσ, where σ is a TGD, of unbounded arity. Indeed, the arity of this auxiliary predicate is the number of variables occurring in the body of the TGD σ, which can be unbounded even if we use only bounded-arity predicates. Nevertheless, a predicate Auxσ does not occur in the body of a TGD, which means that it cannot be part of a cycle in the underlying dependency graph. Therefore, we can consider a slightly modified version of Algorithm 1 that simply ignores the atoms that mention a predicate Auxσ. By relying on this modified algorithm, we get that for a set Σ of TGDs over predicates of bounded arity, checking whether Lin(enrichment(Σ)) is not critically-weakly-acyclic is in NLogSpace, which implies that \({\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})\) is in NLogSpace for predicates of bounded arity.

6.3 Lower Bounds

We conclude the proof of Theorem 2 by providing matching lower bounds. We show that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) and \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) are PSpace-hard, and NLogSpace-hard for predicates of bounded arity. Let us start with the general case of unbounded-arity predicates.
Predicates of Unbounded Arity
Since, as discussed above, \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) can be reduced in logspace to \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\), it suffices to show that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) is PSpace-hard, or, equivalently, its complement is PSpace-hard. We show that every problem in PSpace can be reduced in logspace to the complement of \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\). Fix a problem π in PSpace, and let M = (S, Λ, f, s1) be the deterministic polynomial space Turing machine that solves π, where S = {s1,…,sq} is the set of states of M, Λ = {0,1,⊔} is the tape alphabet of M with ⊔ being the blank symbol, \(f : S \times {\varLambda } {\rightarrow } (S \times {\varLambda } \times \{\leftarrow ,-,\rightarrow \})\) is the transition function of M, and s1S is the initial state. We assume, w.l.o.g., that M is well-behaved and never tries to read beyond its tape boundaries, always halts, and uses exactly n = mk tape cells, where k > 0 and m is the size of the input string. We also assume that the machine accepts if it reaches a configuration with state s2. For the purposes of the current proof, we represent a configuration of M as a string
$$ s,t_{1},c_{1},t_{2},c_{2},\ldots,t_{n},c_{n}, $$
where sS, (ti,ci) ∈Λ×{, }, for each i ∈ [n], and there is exactly one i ∈ [n] such that ci =. Such a string encodes the configuration where the state is s, the i-th cell of the machine contains the symbol ti, and, assuming that ci =, the cursor is on the i-th cell of the machine. For example, the initial configuration of M on input I = a1,…,am is
$$ s_{1},a_{1},\uparrow,a_{2},\flat,\ldots,a_{m},\flat,\underbrace{\sqcup,\flat,\ldots,\sqcup,\flat}_{n-m}. $$
Our goal is to construct a set \({\varSigma } \in \mathbb {S}\) such that M accepts on input I = a1,…,am iff \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}\), or, equivalently, there exists an infinite o-chase derivation of cr(Σ) w.r.t. Σ. The high-level idea is, starting from an atom of the form Start(,⊥,), where ⊥ is a null, to apply a TGD σstart and generate the initial configuration of M on input I, which will be stored in a predicate Config. Then, each application of a TGD will mimic a transition rule of f and generate a valid configuration of M. Once an accepting configuration is reached, then an atom of the form \(\text {Start}(\bot ,\bot ^{\prime },\natural )\), where \(\bot ^{\prime }\) is a null different than ⊥, will be generated, which will trigger again the TGD σstart. This will give rise to an infinite o-chase derivation of cr(Σ) w.r.t. Σ. To achieve this, however, via a sticky set of TGDs, we need a proper encoding of a configuration of M as a Config-atom generated during the execution of the chase. We proceed to explain this encoding, which will then allow us to define our sticky set of TGDs.
The key idea is to encode a state of M, the symbols of Λ, and the symbols , , as tuples consisting of a null ⊥∈N and a single occurrence of the constant , while the position at which occurs in this tuple identifies the state or symbol in question. In particular, a state siS, for i ∈ [q], will be represented in the chase as a tuple
$$ \underbrace{\bot,\ldots,\bot}_{i-1},\natural,\underbrace{\bot,\ldots,\bot}_{|S|-i}. $$
The tape alphabet 0,1,⊔ will be represented as
$$ \natural,\bot,\bot \qquad \bot,\natural,\bot \qquad \bot,\bot,\natural, $$
respectively. Finally, the symbols , will be represented as
$$ \natural,\bot \qquad \bot,\natural, $$
respectively. For example, assuming that I = 1,1,…,1,0, that is, ai = 1, for each i ∈ [m − 1], and am = 0, the initial configuration of M on input I will be represented in the chase via an atom of the form \(\text {Config}(\bar t_{1},\bar t_{2},\bar t_{3})\), where
$$ \bar t_1\ =\ \overbrace{\natural,\underbrace{\bot,\ldots,\bot}_{|S|-1}}^{s_1} $$
$$ \bar t_2\ =\ \overbrace{\bot,{\natural},\bot}^{1},\overbrace{{\natural},\bot}^{\uparrow}, \underbrace{\overbrace{\bot,{\natural},\bot}^{1},\overbrace{\bot,{\natural}}^{\flat},\ldots,\overbrace{\bot,{\natural},\bot}^{1},\overbrace{\bot,{\natural}}^{\flat}}_{m-2}, \overbrace{{\natural},\bot,\bot}^{0},\overbrace{\bot,{\natural}}^{\flat}, $$
$$ \bar t_3\ =\ \underbrace{\overbrace{\bot,\bot,\natural}^{\sqcup},\overbrace{\bot,\natural}^{\flat},\ldots,\overbrace{\bot,\bot,\natural}^{\sqcup},\overbrace{\bot,\natural}^{\flat}}_{n-m} $$
We are now ready to devise the desired set Σ of TGDs. For notional convenience, we are going to use the following abbreviations for tuples of variables:
$$ \text{for each } i \in [q],\ \pmb{s_{i}}\ =\ \underbrace{y,\ldots,y}_{i-1},z,\underbrace{y,\ldots,y}_{|S|-i} $$
$$ \pmb{0}\ =\ z,y,y \qquad \pmb{1}\ =\ y,z,y \qquad \pmb{\sqcup}\ = y,y,z \qquad \pmb{\uparrow}\ =\ z,y \qquad \pmb{\flat}\ =\ y,z $$
Observe that the above abbreviations follow exactly the same pattern as the encodings described above. In fact, whenever we need to refer to the state si in a TGD, we are going to use the abbreviation si, which means that the variable y will be mapped to a null, and the variable z to the constant . Analogously, if we need to refer to a tape symbol or to , , we are going to use the abbreviations 0, 1, ⊔, and , respectively. The set Σ consists of the following TGDs:
$$ \text{Start}(x,y,z)\ {\rightarrow}\ \text{Config}(\pmb{s_{1}},\pmb{a_{1}}, \pmb{\uparrow},\pmb{a_{2}},\pmb{\flat},\ldots,\pmb{a_{m}},\pmb{\flat},\underbrace{\pmb{\sqcup},\pmb{\flat},\ldots,\pmb{\sqcup},\pmb{\flat}}_{n-m},y,z) $$
which generates the initial configuration of M on input I = a1,…,am. The rest of the TGDs are responsible for simulating the transition function of M.
Before defining those TGDs, let us introduce some additional abbreviations. Let
$$ \text{for each } i \in [n],\ \pmb{x_{i}}\ =\ {x}_{i}^{0},{x}_{i}^{1},{x}_{i}^{\sqcup} \quad \text{and} \quad \pmb{{x}_{i}^{c}}\ =\ {x}_{i}^{\uparrow},{x}_{i}^{\flat} $$
which will be used as placeholders for the content of the i-th cell, and the position of the cursor. In fact, the above abbreviations will simply help us to copy the unchanged cells from a configuration to its subsequent one. In addition, for tSΛ∪{, }, we write \(\hat {\pmb {t}}\) for the tuple of variables obtained from the tuple t after replacing each occurrence of the variable y at some position i with the new variable \({y}_{t}^{i}\), while the variable z remains unchanged. More precisely:
$$ \text{for each } i \in [q],\ \hat{\pmb{s_{i}}}\ =\ {y}_{s_{i}}^{1},\ldots,{y}_{s_{i}}^{i-1},z,{y}_{s_{i}}^{i+1},\ldots,{y}_{s_{i}}^{q} $$
$$ \hat{\pmb{0}} = z,{y}_{0}^{1},{y}_{0}^{2} \qquad \hat{\pmb{1}} = {y}_{1}^{1},z,{y}_{1}^{2} \qquad \hat{\pmb{\sqcup}} = {y}_{\sqcup}^{1},{y}_{\sqcup}^{2},z \qquad \hat{\pmb{\uparrow}} = z,{y}_{\uparrow}^{1} \qquad \hat{\pmb{\flat}} = {y}_{\flat}^{1},z $$
The above abbreviations will allow us to break some unnecessary joins over the variable y in the bodies of the TGDs, which in turn will ensure that the obtained set of TGDs is sticky. We proceed to introduce the rest of the TGDs.
We first introduce several TGDs that simulate the transition rules of f where the cursor moves to the left. In particular, for each transition rule \(f(s_{i},a) = (s_{j},b,\leftarrow )\), and for each ∈ [n], we have the TGD
$$ \begin{array}{@{}rcl@{}} \text{Config}({\hat{\pmb{s_i}}},{\pmb{x_1}},{\pmb{x_1^c}}, \ldots,{\pmb{x_{\ell-1}}},{\pmb{x_{\ell-1}^c}}, {\hat{\pmb{a}}},{\hat{\pmb{\uparrow}}}, {\pmb{x_{\ell+1}}},{\pmb{x_{\ell+1}^c}},\ldots,{\pmb{x_{n}}},{\pmb{x_{n}^c}},y,z)\ \rightarrow\\ \text{Config}({\pmb{s_j}}, {\pmb{x_1}},{\pmb{x_1^c}},\ldots,{\pmb{x_{\ell-1}}},{\pmb{\uparrow}},{\pmb{b}},{\pmb{\flat}},{\pmb{x_{\ell+1}}},{\pmb{x_{\ell+1}^c}},\ldots,{\pmb{x_{n}}},{\pmb{x_{n}^c}},y,z). \end{array} $$
Analogously, we have TGDs that simulate the transition rules of f where the cursor moves to the right. In particular, for each transition rule \(f(s_{i},a) = (s_{j},b,\rightarrow )\), and for each ∈ [n], we have the TGD
$$ \begin{array}{@{}rcl@{}} &&\text{Config}(\hat{\pmb{s_{i}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}}, \ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}}, \hat{\pmb{a}},\hat{\pmb{\uparrow}}, \pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z)\ {\rightarrow}\\ &&\quad\quad\text{Config}(\pmb{s_{j}}, \pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell+1}^{c}},\pmb{b},\pmb{\flat},\pmb{x_{\ell+1}},\pmb{\uparrow},\pmb{x_{\ell+2}},\pmb{x_{\ell+2}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z). \end{array} $$
We complete the simulation of the transition function by adding TGDs that simulate the transition rules of f where the cursor stays at the same position. In particular, for each transition rule f(si,a) = (sj,b,−), and for each ∈ [n], we have the TGD
$$ \begin{array}{@{}rcl@{}} &&\text{Config}(\hat{\pmb{s_{i}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}}, \ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}}, \hat{\pmb{a}},\hat{\pmb{\uparrow}}, \pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z)\ {\rightarrow}\\ &&\quad\quad\quad\quad\text{Config}(\pmb{s_{j}}, \pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}},\pmb{b},\pmb{\uparrow},\pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z). \end{array} $$
Finally, we introduce a TGD that checks whether an accepting configuration has been reached; recall that, by assumption, the machine accepts if it reaches a configuration with state s2:
$$ \text{Config}(\hat{\pmb{s_{2}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z) {\rightarrow} \exists w \text{Start}(y,w,z). $$
Observe that, if an accepting configuration is reached, then a new atom of the form Start(_,⊥,) is obtained, where ⊥ is a new null value, which will trigger again the first TGD that generates the initial configuration. This completes the definition of Σ.
It is now easy to see that M accepts on input I iff there exists an infinite o-chase derivation of cr(Σ) w.r.t. Σ, or, equivalently, \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{{\mathsf {o}}}\). Let us remark that starting from cr(Σ), the first TGD that will be applied is the one with the Start predicate in the head, and an atom α of the form Start(,⊥,), where ⊥ is a null, will be generated. Then, α will trigger the TGD that generates the initial configuration of M on input I, which will then guarantee a faithful simulation of the computation of M on I. Moreover, it is easy to verify that \({\varSigma } \in \mathbb {S}\). The key reason why this holds is because the only variable that occurs more than once in the body of a TGD is z, which is always propagated to the head at the same position, which is actually the last one.
Predicates of Bounded Arity.
By Lemmas 5 and 6, and the definition of enrichment, for every set \({\varSigma } \in \mathbb {S}\), it holds that:
  • \({\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}\) iff \(\mathsf {enrichment}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}\).
  • \(\mathsf {enrichment}({\varSigma }) \in \mathbb {S}\).
  • \(\mathsf {ar}(\mathsf {enrichment}({\varSigma })) = \max \limits \{\mathsf {ar}({\varSigma }),k\}\), where k ≥ 0 is the maximum number of variables occurring in the body of a TGD of Σ.
Therefore, to show that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) and \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\) are NLogSpace-hard for predicates of bounded arity, it suffices to concentrate on \({\mathbb {CT}}_{\forall }^{\mathsf {o}}\) and provide a hardness result even for TGDs that can have at most k body variables for some fixed integer k ≥ 0. This can be easily shown via a logspace reduction from graph reachability. Consider a directed graph G = (N, E), and two nodes s, tN. The problem of deciding whether in G the node t is reachable from s is NLogSpace-hard. We are going to construct a sticky set Σ of TGDs that uses only unary predicates, and each TGD has only one body variable, such that t is reachable from s in G iff \({\varSigma } \not \in {\mathbb {CT}_{\forall }^{\mathsf {o}}}\). This in turn implies that \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\) is NLogSpace-hard since coNLogSpace = NLogSpace. The set Σ consists of the following TGDs:
$$ \begin{array}{@{}rcl@{}} \text{ Loop}(x) &\rightarrow& P_s(x)\\ P_v(x) &\rightarrow& P_u(x), \text{ for each } (v,u) \in E\\ P_t(x) &\rightarrow& \exists z \text{ Loop}(z). \end{array} $$
It is easy to see that indeed t is reachable from s iff \({\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}\), and the claim follows.

7 Discussion and Future Work

We have studied all-instances (semi-)oblivious chase termination for sticky sets of TGDs, and provide precise complexity results. It turned out that our results and techniques allow us to obtain further complexity results concerning the chase procedure.
Further Results
Interestingly, our main result has already found applications in the context of (semi-)oblivious chase boundedness, which has been recently studied in [7]. Roughly, chase boundedness guarantees, not only the termination of the chase procedure, but also the existence of a uniform bound on the depth of the chase. As shown in [7], in the case of sticky sets of TGDs, (semi-)oblivious chase termination and (semi-)oblivious chase boundedness coincide. Therefore, from the main result of our work, we immediately get the complexity of deciding (semi-)oblivious chase boundedness in the case of sticky sets of TGDs.
Moreover, the techniques that have been developed in this work can also be applied to shy sets of TGDs, another well-known TGD-based formalism [25, 26]. Shy sets of TGDs are not powerful enough to join over null values that have been invented during the execution of the chase. This central property of shy sets of TGDs allows us to reuse the machinery developed for stickiness, and show that the (semi-)oblivious chase termination problem for shy sets of TGDs is PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity.
Future Work
Although all-instances (semi-)oblivious chase termination for sticky sets of TGDs is well-understood, there are still interesting and highly non-trivial open questions that we are planning to address in the near future:
1.
Whenever the (semi-)oblivious chase terminates under a sticky set of TGDs, what is the exact size of the final result?
 
2.
Even if the (semi-)oblivious chase does not terminate, it might be the case that a finite universal model exists [15]. Can we decide, in the case of sticky sets of TGDs, whether a finite universal model exists? And what about its size?
 

Acknowledgements

This work has been funded by the EPSRC grant EP/S003800/1 “EQUID”, and the EPSRC programme grant EP/M025268/ “VADA”.
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creativecommons.​org/​licenses/​by/​4.​0/​.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Fußnoten
1
We work with finite sets of TGDs only. Thus, in the rest of the paper, a set of TGDs is always finite.
 
2
Recall that Σ is in normal form, which means that head(σ) is a single atom. By abuse of notation, we treat so-result(σ, h) as an atom instead of a singleton set.
 
3
Although we assume that Σ is in normal form, for the sake of readability, in our example we use TGDs with more than one atom in the head. Moreover, instead of following the naming scheme for nulls introduced in Definition 1, we are going to use abbreviated names of the form ⊥i, where i > 0.
 
4
Since we focus on TGDs with only one head atom, and since we apply only active triggers, given an atom \(\alpha \in \mathcal {I} \setminus {\mathsf {cr}({\varSigma })}\) that is the birth atom of a null, we can extract the trigger that generates α, and therefore, we can unambiguously refer to its frontier.
 
5
We assume that there exists some fixed mechanism that selects t. For example, t can be the lexicographically first element of \(\{t^{\prime } \mid \mathsf {rank}_{\delta }(t^{\prime }) = \mathsf {rank}(\bot ) - 1 \text { and } t^{\prime } \vartriangleright _{\delta } \bot \}\).
 
6
König’s Lemma is a well-known result from graph theory, which states that an infinite rooted tree with finite out-degree has an infinite directed simple path starting from the root.
 
7
Recall that, although we assume TGDs in normal form, for the sake of readability, in Example 4 (and thus, also in Example 5), we use TGDs with more than one atom in the head. This is why the set of atoms H is considered. Notice that H is empty in case of single-head TGDs.
 
8
Let us clarify that in this section, unlike Section 4, we consider arbitrary TGDs not in normal form. In the previous section, we assumed TGDs in normal form only for technical reasons.
 
9
Let us clarify that [17] does not consider the semi-oblivious, but the restricted (a.k.a. the standard) version of the chase. However, the exact same proof applies in the case of the semi-oblivious chase.
 
10
For constructing the MGU, one can use a simplified version of Robinson’s unification algorithm that does not consider functions symbols.
 
11
Recall that by the Immerman-Szelepcsényi Theorem, coNLogSpace = NLogSpace.
 
Literatur
1.
Zurück zum Zitat Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading, MA (1995)MATH Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading, MA (1995)MATH
2.
Zurück zum Zitat Aho, A.V., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. Database Syst. 4(4), 435–454 (1979)CrossRef Aho, A.V., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. Database Syst. 4(4), 435–454 (1979)CrossRef
3.
Zurück zum Zitat Baget, J., Mugnier, M., Rudolph, S., Thomazo, M.: Walking the Complexity Lines for Generalized Guarded Existential Rules. In: IJCAI, pp 712–717 (2011) Baget, J., Mugnier, M., Rudolph, S., Thomazo, M.: Walking the Complexity Lines for Generalized Guarded Existential Rules. In: IJCAI, pp 712–717 (2011)
4.
Zurück zum Zitat Baget, J.F., Leclère, M., Mugnier, M.L., Salvat, E.: On rules with existential variables: walking the decidability line. Artif. Intell. 175 (9-10), 1620–1654 (2011)MathSciNetCrossRef Baget, J.F., Leclère, M., Mugnier, M.L., Salvat, E.: On rules with existential variables: walking the decidability line. Artif. Intell. 175 (9-10), 1620–1654 (2011)MathSciNetCrossRef
6.
Zurück zum Zitat Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., Tsamoura, E.: Benchmarking the Chase. In: PODS, pp 37–52 (2017) Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., Tsamoura, E.: Benchmarking the Chase. In: PODS, pp 37–52 (2017)
7.
Zurück zum Zitat Bourhis, P., Leclėre, M., Mugnier, M., Tison, S., Ulliana, F., Gallois, L.: Oblivious and Semi-Oblivious Boundedness for Existential Rules. In: IJCAI, pp 1581–1587 (2019) Bourhis, P., Leclėre, M., Mugnier, M., Tison, S., Ulliana, F., Gallois, L.: Oblivious and Semi-Oblivious Boundedness for Existential Rules. In: IJCAI, pp 1581–1587 (2019)
8.
Zurück zum Zitat Calautti, M., Gottlob, G., Pieris, A.: Chase Termination for Guarded Existential Rules. In: PODS, pp 91–103 (2015) Calautti, M., Gottlob, G., Pieris, A.: Chase Termination for Guarded Existential Rules. In: PODS, pp 91–103 (2015)
9.
Zurück zum Zitat Calautti, M., Greco, S., Molinaro, C., Trubitsyna, I.: Exploiting equality generating dependencies in checking chase termination. PVLDB 9(5), 396–407 (2016) Calautti, M., Greco, S., Molinaro, C., Trubitsyna, I.: Exploiting equality generating dependencies in checking chase termination. PVLDB 9(5), 396–407 (2016)
10.
Zurück zum Zitat Calautti, M., Pieris, A.: Oblivious Chase Termination: The Sticky Case. In: ICDT, pp 17:1–17:18 (2019) Calautti, M., Pieris, A.: Oblivious Chase Termination: The Sticky Case. In: ICDT, pp 17:1–17:18 (2019)
11.
Zurück zum Zitat Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)MathSciNetCrossRef Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)MathSciNetCrossRef
12.
Zurück zum Zitat Calì, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012)CrossRef Calì, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012)CrossRef
13.
Zurück zum Zitat Calì, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: the query answering problem. Artif. Intell. 193, 87–128 (2012)MathSciNetCrossRef Calì, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: the query answering problem. Artif. Intell. 193, 87–128 (2012)MathSciNetCrossRef
14.
Zurück zum Zitat Cuenca Grau, B., Horrocks, I., Krȯtzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity notions for existential rules and their application to query answering in ontologies. J. Artif. Intell. Res. 47, 741–808 (2013)MathSciNetCrossRef Cuenca Grau, B., Horrocks, I., Krȯtzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity notions for existential rules and their application to query answering in ontologies. J. Artif. Intell. Res. 47, 741–808 (2013)MathSciNetCrossRef
15.
Zurück zum Zitat Deutsch, A., Nash, A., Remmel, J.B.: The Chase Revisisted. In: PODS, pp 149–158 (2008) Deutsch, A., Nash, A., Remmel, J.B.: The Chase Revisisted. In: PODS, pp 149–158 (2008)
16.
Zurück zum Zitat Deutsch, A., Tannen, V.: Reformulation of XML Queries and Constraints. In: ICDT, pp 225–241 (2003) Deutsch, A., Tannen, V.: Reformulation of XML Queries and Constraints. In: ICDT, pp 225–241 (2003)
17.
Zurück zum Zitat Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRef Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRef
18.
Zurück zum Zitat Gogacz, T., Marcinkowski, J.: All-Instances Termination of Chase is Undecidable. In: ICALP, pp 293–304 (2014) Gogacz, T., Marcinkowski, J.: All-Instances Termination of Chase is Undecidable. In: ICALP, pp 293–304 (2014)
19.
Zurück zum Zitat Gogacz, T., Marcinkowski, J., Pieris, A.: All-Instances Restricted Chase Termination. In: PODS. To appear (2020) Gogacz, T., Marcinkowski, J., Pieris, A.: All-Instances Restricted Chase Termination. In: PODS. To appear (2020)
21.
Zurück zum Zitat Greco, S., Spezzano, F., Trubitsyna, I.: Stratification criteria and rewriting techniques for checking chase termination. PVLDB 4(11), 1158–1168 (2011) Greco, S., Spezzano, F., Trubitsyna, I.: Stratification criteria and rewriting techniques for checking chase termination. PVLDB 4(11), 1158–1168 (2011)
22.
Zurück zum Zitat Krötzsch, M., Marx, M., Rudolph, S.: The Power of the Terminating Chase (Invited Talk). In: ICDT, pp 3:1–3:17 (2019) Krötzsch, M., Marx, M., Rudolph, S.: The Power of the Terminating Chase (Invited Talk). In: ICDT, pp 3:1–3:17 (2019)
23.
Zurück zum Zitat Krötzsch, M., Rudolph, S.: Extending Decidable Existential Rules by Joining Acyclicity and Guardedness. In: IJCAI, pp 963–968 (2011) Krötzsch, M., Rudolph, S.: Extending Decidable Existential Rules by Joining Acyclicity and Guardedness. In: IJCAI, pp 963–968 (2011)
24.
Zurück zum Zitat Leclère, M., Mugnier, M., Thomazo, M., Ulliana, F.: A Single Approach to Decide Chase Termination on Linear Existential Rules. In: ICDT, pp 18:1–18:19 (2019) Leclère, M., Mugnier, M., Thomazo, M., Ulliana, F.: A Single Approach to Decide Chase Termination on Linear Existential Rules. In: ICDT, pp 18:1–18:19 (2019)
25.
Zurück zum Zitat Leone, N., Manna, M., Terracina, G., Veltri, P.: Efficiently Computable Datalog∃ Programs. In: KR (2012) Leone, N., Manna, M., Terracina, G., Veltri, P.: Efficiently Computable Datalog∃ Programs. In: KR (2012)
26.
Zurück zum Zitat Leone, N., Manna, M., Terracina, G., Veltri, P.: Fast query answering over existential rules. ACM Trans. Comput. Log. 20(2), 12:1–12:48 (2019)MathSciNetCrossRef Leone, N., Manna, M., Terracina, G., Veltri, P.: Fast query answering over existential rules. ACM Trans. Comput. Log. 20(2), 12:1–12:48 (2019)MathSciNetCrossRef
27.
Zurück zum Zitat Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans Database Syst. 4(4), 455–469 (1979)CrossRef Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans Database Syst. 4(4), 455–469 (1979)CrossRef
28.
Zurück zum Zitat Marnette, B.: Generalized Schema-Mappings: from Termination to Tractability. In: PODS, pp 13–22 (2009) Marnette, B.: Generalized Schema-Mappings: from Termination to Tractability. In: PODS, pp 13–22 (2009)
29.
Zurück zum Zitat Meier, M., Schmidt, M., Lausen, G.: On chase termination beyond stratification. PVLDB 2(1), 970–981 (2009) Meier, M., Schmidt, M., Lausen, G.: On chase termination beyond stratification. PVLDB 2(1), 970–981 (2009)
30.
Zurück zum Zitat Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: Rdfox: a Highly-Scalable RDF Store. In: ISWC, pp 3–20 (2015) Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: Rdfox: a Highly-Scalable RDF Store. In: ISWC, pp 3–20 (2015)
31.
Zurück zum Zitat Papadimitriou, C.H.: Computational complexity. Addison-Wesley, Reading, MA (1994)MATH Papadimitriou, C.H.: Computational complexity. Addison-Wesley, Reading, MA (1994)MATH
32.
Zurück zum Zitat Rudolph, S., Krötzsch, M., Hitzler, P.: All Elephants are Bigger than All Mice. In: DL (2008) Rudolph, S., Krötzsch, M., Hitzler, P.: All Elephants are Bigger than All Mice. In: DL (2008)
33.
Zurück zum Zitat Urbani, J., Krötzsch, M., Jacobs, C.J.H., Dragoste, I., Carral, D.: Efficient Model Construction for Horn Logic with Vlog - System Description. In: IJCAR, pp 680–688 (2018) Urbani, J., Krötzsch, M., Jacobs, C.J.H., Dragoste, I., Carral, D.: Efficient Model Construction for Horn Logic with Vlog - System Description. In: IJCAR, pp 680–688 (2018)
Metadaten
Titel
Semi-Oblivious Chase Termination: The Sticky Case
verfasst von
Marco Calautti
Andreas Pieris
Publikationsdatum
17.08.2020
Verlag
Springer US
Erschienen in
Theory of Computing Systems / Ausgabe 1/2021
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI
https://doi.org/10.1007/s00224-020-09994-5

Weitere Artikel der Ausgabe 1/2021

Theory of Computing Systems 1/2021 Zur Ausgabe