nach oben

Theory of Computing Systems

Erschienen in:

Open Access 17.08.2020

Semi-Oblivious Chase Termination: The Sticky Case

verfasst von: Marco Calautti, Andreas Pieris

Erschienen in: Theory of Computing Systems | Ausgabe 1/2021

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Patentsuche

Aus

Abstract

The chase procedure is a fundamental algorithmic tool in database theory with a variety of applications. A key problem concerning the chase procedure is all-instances termination: for a given set of tuple-generating dependencies (TGDs), is it the case that the chase terminates for every input database? In view of the fact that this problem is undecidable, it is natural to ask whether known well-behaved classes of TGDs, introduced in different contexts such as ontological reasoning, ensure decidability. We consider a prominent paradigm that led to a robust TGD-based formalism, called stickiness. We show that for sticky sets of TGDs, all-instances chase termination is decidable if we focus on the (semi-)oblivious chase, and we pinpoint its exact complexity: PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity. These complexity results are obtained via a graph-based syntactic characterization of chase termination that is of independent interest.

This article belongs to the Topical Collection: Special Issue on Database Theory (ICDT 2019)

Guest Editor: Pablo Barceló

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The chase procedure (or simply chase) is a fundamental algorithmic tool that has been successfully applied to several database problems such as containment of queries under constraints [2], checking logical implication of constraints [5, 27], computing data exchange solutions [17], and query answering under constraints [11], to name a few. The chase procedure takes as input a database D and a set Σ of constraints, which, for this work, are tuple-generating dependencies (TGDs) of the form

$$ \forall \bar x \forall \bar y (\phi(\bar x,\bar y)\ {\rightarrow}\ \exists \bar z \psi(\bar x,\bar z)), $$

where ϕ (the body) and ψ (the head) are conjunctions of relational atoms, and it produces an instance D_Σ that is a universal model of D and Σ, i.e., a model that can be homomorphically embedded into every other model of D and Σ. Somehow D_Σ acts as a representative of all the models of D and Σ. This is the reason for the ubiquity of the chase in database theory, as discussed in [15]. Indeed, many database problems can be solved by simply exhibiting a universal model.

1.1 The Chase in a Nutshell

Roughly, the chase adds new tuples to the database D (possibly involving null values that act as witnesses for the existentially quantified variables), as dictated by the TGDs of Σ, and it keeps doing this until all the TGDs of Σ are satisfied. There are, in principle, three different ways for formalizing this simple idea, which lead to different versions of the chase procedure:

Oblivious Chase

The first one, which gives rise to the oblivious chase, is as follows: for each pair $(\bar t,\bar u)$ of tuples of terms from the instance I constructed so far, apply a TGD σ of the form $\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )$ if $\phi (\bar t,\bar u) \subseteq I$, and σ has not been applied in a previous chase step due to the same pair $(\bar t,\bar u)$, and add to I the set of atoms $\psi (\bar t,\bar v)$, where $\bar v$ is a tuple of new nulls not occurring in I.

Semi-oblivious Chase

The second one, which is a refinement of the oblivious chase, and it gives rise to the semi-oblivious chase, is as follows: for each pair $(\bar t,\bar u)$ of tuples of terms from the instance I constructed so far, apply a TGD σ of the form $\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )$ if $\phi (\bar t,\bar u) \subseteq I$, and σ has not been applied in a previous chase step due to a pair of tuples $(\bar t,\bar u^{\prime })$ with $\phi (\bar t,\bar u^{\prime }) \subseteq I$, where $\bar u$ and $\bar u^{\prime }$ might be different, and add to I the set of atoms $\psi (\bar t,\bar v)$, where $\bar v$ is a tuple of new nulls not in I. In other words, a TGD σ of the above form is applied only once due to a certain witness $\bar t$ for the variables $\bar x$ that appear both in ϕ and ψ.

Restricted Chase

The third one, which is a refinement of the (semi-)oblivious chase that leads to the restricted (a.k.a. standard) chase, is as follows: for each pair $(\bar t,\bar u)$ of tuples of terms from the instance I constructed so far, apply a TGD σ of the form $\forall \bar x \forall \bar y \left (\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)\right )$ if $\phi (\bar t,\bar u) \subseteq I$, and there is no tuple $\bar u^{\prime }$ of terms from I such that $\psi (\bar t,\bar u^{\prime }) \subseteq I$, i.e., the TGD is not already satisfied, and add to I the set of atoms $\psi (\bar t,\bar v)$, where $\bar v$ is a tuple of new nulls not in I.

Thus, the key difference between the (semi-)oblivious and restricted versions of the chase is that the former apply a TGD whenever the body is satisfied, while the latter applies a TGD if the body is satisfied but the head is not.

1.2 Restricted vs. (Semi-)Oblivious Chase

It should not be difficult to verify that the restricted chase, in general, builds smaller instances than the (semi-)oblivious one. In fact, it is easy to devise an example where, according to the restricted chase, none of the TGDs should be applied, while the (semi-)oblivious chase builds an infinite instance.

Example 1

Consider the database D = {R(a, a)} and the TGD

$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(z,x)). $$

The restricted chase will detect that the database already satisfies the TGD, while the (semi-)oblivious chase will build the infinite instance

$$ \{R(a,a),R(\bot_{1},a),R(\bot_{2},\bot_{1}),R(\bot_{3},\bot_{2}),\ldots\}, $$

where ⊥₁,⊥₂,⊥₃,… are (labeled) nulls.

It should be also clear that the semi-oblivious chase, in general, builds smaller instances than the oblivious one. This is illustrated by the following example.

Example 2

Consider the database D = {R(a, a)} and the TGD

$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(x,z)), $$

The semi-oblivious chase will build the instance {R(a, a),R(a,⊥)}, where ⊥ is a null, whereas the oblivious chase will build the infinite instance

$$ \{R(a,a),R(a,\bot_{1}),R(a,\bot_{2}),R(a,\bot_{3}),\ldots\}, $$

where ⊥₁,⊥₂,⊥₃,… are nulls.

The restricted chase has a clear advantage over the (semi-)oblivious chase when it comes to the size of the final result. But, of course, this advantage does not come for free: at each application, the restricted chase has to check that there is no way to satisfy the head of the TGD at hand, and this can be computationally costly in practice. On the other hand, the advantage of the semi-oblivious chase over the oblivious chase comes without any additional overhead since both versions have to keep track of the TGDs and pairs of tuples that have been considered so far.

It has been recently observed that for RAM-based implementations the restricted chase is the indicated approach since the benefit from producing smaller instances justifies the additional effort for checking whether a TGD is already satisfied; see, e.g., [6, 22]. However, as discussed in [6], an RDBMS-based implementation of the restricted chase is quite challenging, whereas an efficient implementation of the semi-oblivious chase is feasible. Hence, both the semi-oblivious and restricted versions of the chase are relevant algorithmic tools for practical implementations, whereas the oblivious version of the chase is mostly of theoretical interest.

1.3 The Challenge of Non-Termination

There are indeed efficient implementations of the semi-oblivious and restricted chase that allow us to solve central database problems by adopting a materialization-based approach [6, 22, 30, 33]. Nevertheless, for this to be feasible in practice we need a guarantee that the chase terminates, which is not always the case. This fact motivated a long line of research on identifying classes of TGDs that ensure the termination of the semi-oblivious and/or restricted chase, no matter how the input database looks like. A prime example is the class of weakly-acyclic sets of TGDs [17], which has been introduced in the context of data exchange, and guarantees the termination of both the semi-oblivious and restricted chase. Many other termination criteria can be found in the literature; see, e.g., [4, 9, 14‐16, 21, 23, 28, 29].

With so much effort spent on identifying sufficient conditions for the termination of the chase, the question that immediately comes up is whether a sufficient condition that is also necessary exists. In other words, given a set Σ of TGDs, is it possible to decide whether, for every database D, the semi-oblivious or the restricted chase on D and Σ terminates? The answer is negative, no matter which version of the chase we consider [18]; this is actually true even for the oblivious version of the chase. The problem remains undecidable even if the database is known; this has been established in [15] for the restricted chase, and it was observed in [28] that the same proof shows undecidability also for the (semi-)oblivious chase.

1.4 Deciding the Termination of the Chase

The undecidability proof given in [18] constructs a sophisticated set of TGDs that goes beyond existing well-behaved classes of TGDs that enjoy certain syntactic properties, which in turn ensure useful model-theoretic properties. Such well-behaved classes of TGDs have been proposed in the context of ontological reasoning. The two main paradigms that led to robust TGD-based formalisms, without forcing the chase to terminate, are guardedness [4, 11, 12] and stickiness [13]:

Guardedness

A TGD is guarded if its body has an atom that contains (or “guards”) all the universally quantified variables. This condition has been inspired by the guarded fragment of first-order logic, and is powerful enough to capture important Description Logics (DLs) such as the members of the $\mathcal {E}{\mathscr{L}}$ family. The key model-theoretic property of the class of guarded TGDs, which explains its robust behaviour, is the existence of tree-like universal models [11].

Stickiness

On the other hand, sticky sets of TGDs are powerful enough to model interesting statements that are inherently non-tree-like, and thus, not expressible via guarded TGDs. Such a statement, for example, consists of the TGDs

$$ \forall x \forall y (R(x,y)\ {\rightarrow}\ \exists z R(y,z) \wedge P(z)) \qquad \forall x \forall y (P(x) \wedge P(y)\ {\rightarrow}\ S(x,y)), $$

which compute the cartesian product of an infinite unary relation, a useful modeling feature that, in DL parlance, is known as concept product [32].

The fact that the set of TGDs constructed in the undecidability proof of [18] is neither guarded nor sticky raised the following question: is the semi-oblivious/restricted chase termination problem decidable for guarded or sticky sets of TGDs? The current state of affairs concerning this central question is as follows:

For the semi-oblivious chase and guarded TGDs the problem is well-understood: it is 2ExpTime-complete in general, and ExpTime-complete for predicates of bounded arity [8]. The same paper [8] considered also linear TGDs, i.e., TGDs with only one body atom, which form a central subclass of guarded TGDs: the problem becomes PSpace-complete, and NLogSpace-complete for predicates of bounded arity. An alternative proof for linear TGDs, which relies on derivation trees, a notion that was originally introduced in the context of ontological query answering [3], has been recently proposed in [24].
For the restricted chase and guarded TGDs, it has been recently shown, by exploiting Monadic-Second Order Logic over infinite trees of bounded degree, that the problem is decidable in elementary time assuming only one atom in the head, whereas the case of more than one atoms in the head remains a challenging open problem [19]. The case of linear TGDs with only one atom in the head has been explicitly considered in [24], where the decidability of the problem in question has been shown by relying on derivation trees.
Finally, for the restricted chase and sticky sets of TGDs, it has been recently shown, by exploiting Büchi Automata, that the problem is decidable in elementary time assuming only one atom in the head, whereas the case of more than one atoms in the head remains a challenging open problem [19].

Towards completing the picture concerning the chase termination problem, in this work we concentrate on the semi-oblivious chase and sticky sets of TGDs, and provide precise complexity results: PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity. Our results apply also to the oblivious chase that, although is not very useful for practical purposes, it is a relevant technical tool due to its simplicity; for a discussion on the usefulness of the oblivious chase see [11].

1.5 Summary of Contributions

Our results can be summarized as follows:

In Section 4, we provide a semantic characterization of non-termination of the semi-oblivious chase under sticky sets of TGDs via the existence of “path-like” infinite chase derivations, which forms the basis for our decision procedure.
By exploiting the above semantic characterization, we then provide, in Section 5, a syntactic characterization of semi-oblivious chase termination via a graph-based condition. To this end, we exploit a recent syntactic characterization from [8] of the termination of the semi-oblivious chase under linear TGDs.
In Section 6, by using the graph-based syntactic characterization from the previous section, we establish the desired complexity upper bounds for our problem: PSpace in general, and NLogSpace for predicates of bounded arity. We finally establish matching lower bounds. The PSpace-hardness is obtained by simulating the behaviour of a polynomial space Turing machine, while the NLogSpace-hardness via a reduction from graph reachability.

2 Preliminaries

We consider the mutually disjoint countably infinite sets C, N, and V of constants, (labeled) nulls, and variables, respectively. We refer to constants, nulls and variables as terms. For an integer n > 0, we may write [n] for the set {1,…,n}.

Relational Databases

A schema S is a finite set of relation symbols (or predicates) with associated arity. We write R/n to denote that R has arity n ≥ 0; we may also write ar(R) for the integer n. A (predicate) position of S is a pair (R, i), where R/n ∈S and i ∈ [n], that essentially identifies the i-th argument of R. We write pos(S) for the set of positions of S, that is, the set {(R, i)∣R/n ∈S and i ∈ [n]}. An atom over S is an expression of the form $R(\bar t)$, where R/n ∈S and $\bar t$ is an n-tuple of terms. A fact is an atom whose arguments consist only of constants. An (atom) position of $R(\bar t)$ is a pair $(R(\bar t),i)$, where i ∈ [ar(R)], that essentially identifies the predicate position (R, i) in $R(\bar t)$. We write $R(\bar t)[i]$ for the term occurring at position $(R(\bar t),i)$. Moreover, for a variable x in $\bar t$, $\mathsf {pos}(R(\bar t), x)$ is the set of predicate positions $\{(R,i) \mid R(\bar t)[i] = x\}$. We write ${\mathsf {var}({R(\bar t)})}$ for the set of variables in $\bar t$. The notations pos(⋅,x) and var(⋅) extend to sets of atoms. An instance over S is a (possibly infinite) set of atoms over S with constants and nulls. A database over S is a finite set of facts over S. The active domain of an instance I, denoted dom(I), is the set of terms in I.

Substitutions and Homomorphisms

A substitution from a set of terms T to a set of terms $T^{\prime }$ is a function $h : T {\rightarrow } T^{\prime }$. Henceforth, we treat a substitution h as the set of mappings {t↦h(t)∣t ∈ T}. The restriction of h to a subset S of T, denoted h_|S, is the substitution {t↦h(t)∣t ∈ S}. A homomorphism from a set of atoms A to a set of atoms B is a substitution h from the set of terms in A to the set of terms in B such that (i) t ∈C implies h(t) = t, i.e., h is the identity on C, and (ii) R(t₁,…,t_n) ∈ A implies h(R(t₁,…,t_n)) = R(h(t₁),…,h(t_n)) ∈ B.

Tuple-Generating Dependencies

A tuple-generating dependency (TGD) σ is a first-order sentence (without constants) of the form

$$ \forall \bar x \forall \bar y(\phi(\bar x,\bar y)\ {\rightarrow}\ \exists \bar z \psi(\bar x,\bar z)), $$

where $\bar x, \bar y$ and $\bar z$ are mutually disjoint tuples of variables of V, while $\phi (\bar x,\bar y)$ and $\psi (\bar x,\bar z)$ are conjunctions of atoms. For brevity, we write σ as $\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)$, and use comma instead of ∧ for joining atoms. We refer to $\phi (\bar x,\bar y)$ and $\psi (\bar x,\bar z)$ as the body and head of σ, denoted body(σ) and head(σ), respectively. The frontier of the TGD σ, denoted fr(σ), is the set of variables $\bar x$, i.e., the variables that appear both in the body and the head of σ. Note that, by abuse of notation, we may treat a tuple of variables as a set of variables. The schema of a set Σ of TGDs, denoted sch(Σ), is the set of predicates occurring in Σ, and we write ar(Σ) for the maximum arity over all those predicates. An instance I satisfies a TGD σ as the one above, written I⊧σ, if the following holds: whenever there exists a homomorphism h from $\phi (\bar x, \bar y)$ to I, then there exists $h^{\prime } \supseteq h_{|\bar x}$ that is a homomorphism from $\psi (\bar x,\bar z)$ to I. Note that, by abuse of notation, we sometimes treat a conjunction of atoms as a set of atoms. The instance I satisfies a set Σ of TGDs, written I⊧Σ, if I⊧σ for each σ ∈Σ.

Stickiness

One of the main syntactic paradigms for TGDs is stickiness [13]. The key property underlying this condition is as follows: variables that appear more than once in the body of a TGD should be inductively propagated (or “stick”) to every head atom, which can be graphically illustrated as

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figa_HTML.png

where the first set of TGDs is sticky, while the second is not. The formal definition relies on an inductive procedure that marks the variables that may violate the above property. The base step marks a variable in the body of a TGD that does not occur in every head atom. Then, the marking is inductively propagated as follows

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figb_HTML.png

Stickiness requires a marked variable to appear only once in the body of a TGD.

Let us now formalize the above intuitive discussion. Consider a set Σ of TGDs; we assume, w.l.o.g., that the TGDs in Σ do not share variables. Let σ ∈Σ and x a variable in body(σ). We inductively define when x is marked in Σ:

If x does not occur in every atom of head(σ), then x is marked in Σ.
Assuming that head(σ) contains an atom of the form $R(\bar t)$ and $x \in \bar t$, if there exists $\sigma ^{\prime } \in {\varSigma }$ that has in its body an atom of the form $R(\bar t^{\prime })$, and each variable in $R(\bar t^{\prime })$ at a position of $\mathsf {pos}({R(\bar t)},{x})$ is marked in Σ, then x is marked in Σ.

The set Σ is sticky if there is no TGD whose body contains two occurrences of a variable that is marked in Σ. We denote by $\mathbb {S}$ the family of all sticky finite sets of TGDs.¹

3 The Chase Procedure

The chase procedure (or simply chase) accepts as an input a database D and a set Σ of TGDs, and constructs a (possibly infinite) instance that contains D and satisfies Σ. Central notions in this context are those of active trigger and trigger application, which are coming into two different variations, oblivious and semi-oblivious, which in turn give rise to the oblivious [11] and the semi-oblivious [20, 28] chase. The key difference between these two versions of the chase, lies at the adopted naming scheme for the newly generated null values, which are used as witnesses for the existentially quantified variables occurring in the head of a TGD.

Definition 1 (Trigger and Trigger Application)

A trigger for a set Σ of TGDs on an instance I is a pair (σ, h), where σ ∈Σ and h is a homomorphism from body(σ) to I. The oblivious result and semi-oblivious result of (σ, h), denoted o-result(σ, h) and so-result(σ, h), is the set of atoms μ_o(head(σ)) and μ_so(head(σ)), respectively, where $\mu _{\mathsf {o}},\mu _{\mathsf {so}} : \mathsf {var}({\mathsf {head}({\sigma })}) {\rightarrow } {\mathbf {C}} \cup {\mathbf {N}}$ are defined as

$$ \begin{array}{@{}rcl@{}} \mu_{\mathsf{o}}(x)\ =\ \left\{ \begin{array}{ll} h(x) & \quad \text{if } x \in \mathsf{fr}(\sigma)\\ &\\ \bot_{\sigma,h}^{x} & \quad \text{otherwise} \end{array} \right. \qquad \mu_{\mathsf{so}}(x)\ =\ \left\{ \begin{array}{ll} h(x) & \quad \text{if } x \in \mathsf{fr}(\sigma)\\ &\\ \bot_{\sigma,h_{|\mathsf{fr}(\sigma)}}^{x} & \quad \text{otherwise} \end{array} \right. \end{array} $$

where $\bot _{\sigma ,h}^{x}, \bot _{\sigma ,h_{|\mathsf {fr}(\sigma )}}^{x}$ are nulls from N. We call the trigger (σ, h) obliviously active if o-$\mathsf {result}(\sigma ,h) \not \subseteq I$, and semi-obliviously active if so-$\mathsf {result}(\sigma ,h) \not \subseteq I$. The oblivious application of (σ, h) to I returns the instance J = I ∪o-result(σ, h), and is denoted as I〈o, σ, h〉J. Analogously, the semi-oblivious application of (σ, h) to I returns the instance J = I ∪so-result(σ, h), and is denoted as I〈so, σ, h〉J.

Observe that in the definition of ⋆-result(σ, h), where ⋆ ∈{o, so}, each existentially quantified variable x occurring in head(σ) is mapped by μ_⋆ to a “fresh” null value of N whose name is uniquely determined by the trigger (σ, h) and x itself. Therefore, for a trigger (σ, h), we can unambiguously write down the set of atoms ⋆-result(σ, h). In our analysis, it would be useful to be able to refer to the terms occurring in ⋆-result(σ, h) that have been propagated (not invented) during the application of (σ, h). Formally, the frontier of ⋆-result(σ, h), denoted fr(⋆-result(σ, h)), is the set of terms dom(h(body(σ))) ∩dom(⋆-result(σ, h)).

(Semi-)Oblivious Chase

The main idea of the chase is, starting from a database D, to exhaustively apply active triggers for the given set Σ of TGDs on the instance constructed so far. We simultaneously define oblivious and semi-oblivious chase derivations. To this end, we distinguish the two cases where a derivation is finite or not:

A finite sequence (I_i)_0≤i≤n of instances, with D = I₀ and n ≥ 0, is an oblivious (resp., semi-oblivious) chase derivation of D w.r.t. Σ if, for each i ∈{0,…,n − 1}, there exists an obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on I_i such that I_i〈o, σ, h〉I_i+ 1 (resp., I_i〈so, σ, h〉I_i+ 1), and no obliviously (resp., semi-obliviously) active trigger for Σ on I_n exists.
An infinite sequence (I_i)_i≥ 0 of instances, with D = I₀, is an oblivious (resp., semi-oblivious) chase derivation of D w.r.t. Σ if, for each i ≥ 0, there exists an obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on I_i such that I_i〈o, σ, h〉I_i+ 1 (resp., I_i〈so, σ, h〉I_i+ 1). Moreover, (I_i)_i≥ 0 is fair if, for each i ≥ 0, and for every obliviously (resp., semi-obliviously) active trigger (σ, h) for Σ on I_i, there exists j > i such that (σ, h) is not an obliviously (resp., semi-obliviously) active trigger for Σ on I_j. The latter is known as the fairness condition, and guarantees that all the active triggers will eventually be deactivated.

A (semi-)oblivious chase derivation is valid if it is finite, or infinite and fair. Infinite but unfair chase derivations are not valid since they do not serve the main purpose of the chase procedure, i.e., build an instance that satisfies the given TGDs. Henceforth, we write o-chase and so-chase for oblivious and semi-oblivious chase, respectively.

In general, due to the adopted naming scheme and the definition of active triggers, the semi-oblivious chase builds smaller instances than the oblivious one. This is because a trigger that is semi-obliviously active it is also obliviously active, but the other direction is not always true. This has been already illustrated by Example 2 in Section 1, which, for the sake of readability, we recall below.

Example 3

Consider the database D = {R(a, a)}, and the TGD

$$ \sigma\ =\ R(x,y)\ {\rightarrow}\ \exists z R(x,z). $$

It is easy to verify that the only o-chase derivation of D w.r.t. {σ} is infinite. On the other hand, the only so-chase derivation of D w.r.t. {σ} is

$$ \{R(a,a)\}, \left\{R(a,a),R\left( a,\bot^{z}_{\sigma,\{x \mapsto a\}}\right)\right\}, $$

which is, of course, finite. Indeed, if we apply again the TGD σ, then we will obtain the atom $R\left (a,\bot ^{z}_{\sigma ,\{x \mapsto a\}}\right )$, which is already present.

Chase Relation

A useful notion that we are going to use in our proofs is the so-called chase relation [13], which essentially describes how the atoms generated during the chase depend on each other. Consider a ⋆-chase derivation δ = (I_i)_i≥ 0, where ⋆ ∈{o, so}, of a database D w.r.t. a set Σ of TGDs, and assume that for each i ≥ 0, I_i〈⋆,σ_i,h_i〉I_i+ 1, which means that I_i+ 1 = I_i ∪ ⋆-result(σ_i,h_i). The chase relation of δ, denoted ≺_δ, is a binary relation over $\bigcup _{i \geq 0} I_{i}$ such that α ≺_δβ iff there exists i ≥ 0 such that α ∈ h_i(body(σ_i)) and β ∈ I_i+ 1 ∖ I_i. Notice that the relation ≺_δ is acyclic, or, in other words, it forms a directed acyclic graph over $\bigcup _{i \geq 0} I_{i}$.

3.1 Chase Termination Problem

It is known that due to the existentially quantified variables, a valid ⋆-chase derivation, where ⋆ ∈{o, so}, may be infinite. This is true even for very simple settings: it is easy to verify that the only ⋆-chase derivation of D = {R(a, b)} w.r.t. the set Σ consisting of the single TGD $R(x,y) {\rightarrow } \exists z R(y,z)$ is infinite. The question that comes up is, given a set Σ of TGDs, can we check whether, for every database D, all or some valid (semi-)oblivious chase derivations of D w.r.t. Σ are finite? Before formalizing the above problem, let us recall two central classes of sets of TGDs:

$$ \begin{array}{rcl} \mathbb{CT}_{\forall \forall}^{\star} &=& \left\{{\varSigma}\ \left| \begin{array}{l} \text{ for \textit{every} database } D,\\ \text{ \textit{every} valid $\star$-chase derivation} \text{ of } D \text{ w.r.t.~} {\varSigma} \text{ is finite} \end{array} \right.\right\} \end{array} $$

$$ \begin{array}{rcl} \mathbb{CT}_{\forall \exists}^{\star} &=& \left\{{\varSigma}\ \left| \begin{array}{l} \text{ for \textit{every} database } D,\\ \text{ there \textit{exists} a finite $\star$-chase derivation} \text{ of } D \text{ w.r.t.~} {\varSigma} \end{array} \right.\right\} \end{array} $$

The problems tackled in this work are as follows, where $\mathbb {C}$ is a class of sets of TGDs:

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figc_HTML.png

It is well-known from [20] that the following holds:

$$ \mathbb{CT}_{\forall \forall}^{\mathsf{o}} = \mathbb{CT}_{\forall \exists}^{\mathsf{o}}\ \subset\ \mathbb{CT}_{\forall \forall}^{\mathsf{so}} = \mathbb{CT}_{\forall \exists}^{\mathsf{so}}. $$

This immediately implies that, after fixing the version of the chase in question, i.e., oblivious or semi-oblivious, the above decision problems are equivalent. Henceforth, for a class $\mathbb {C}$ of sets of TGDs, we simply refer to the problem ${\mathsf {CT}_{\forall }^{\star }}(\mathbb {C})$, and we write ${\mathbb {CT}_{\forall }^{\star }}$ for the classes ${\mathbb {CT}_{\forall \forall }^{\star }}$ and ${\mathbb {CT}_{\forall \exists }^{\star }}$, where ⋆ ∈{o, so}.

We know that our main decision problems are, in general, undecidable. Assuming that $\mathbb {TGD}$ denotes the class of arbitrary sets of TGDs, we have that:

Theorem 1 (18)

For ⋆ ∈{o, so}, $\mathsf {CT}_{\forall }^{\star }(\mathbb {TGD})$ is undecidable, even if we focus on binary and ternary predicates.

However, the set of TGDs employed in the undecidability proof of [18] is not sticky. What about $\mathsf {CT}_{\forall }^{\star }(\mathbb {S})$ then? This is a non-trivial problem, and pinpointing its complexity is the main goal of this work.

3.2 Some Useful Results

Before proceeding with the complexity analysis of $\mathsf {CT}_{\forall }^{\star }(\mathbb {S})$, let us recall a couple of technical results that would allow us to significantly simplify our investigation.

Critical Database

It would be useful to have a special database of a very simple form that gives rise to a valid infinite chase derivation whenever there is a database with the same property. Interestingly, such a database exists, which is known as the critical database for a set Σ of TGDs [28]. Formally, the critical database for Σ is

$$ \mathsf{cr}({\varSigma})\ =\ \{R(\natural,\ldots,\natural) \mid R \in \mathsf{sch}({\varSigma})\}, $$

where ♮ ∈C is a fixed constant. In other words, cr(Σ) consists of all the atoms that can be formed using the predicates of sch(Σ) and the constant ♮. The following result states that cr(Σ) is indeed the desired database:

Proposition 1 ([28])

Consider a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent:

${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\star }$.

There exists a valid infinite ⋆-chase derivation of cr(Σ) w.r.t. Σ.

Henceforth, we always use ♮ for the constant that appears in a critical database. Notice that this special constant does not depend on the set of TGDs in question.

Fairness

As one might expect, we are going to focus on the complement of $\mathsf {CT}_{\forall }^{\star }(\mathbb {S})$, for ⋆ ∈{o, so}, and pinpoint the complexity of the following problem: given a set ${\varSigma } \in \mathbb {S}$, is there a valid infinite ⋆-chase derivation δ of cr(Σ) w.r.t. Σ (see Proposition 1). However, as observed in [8], where the same problem is studied but for the class of guarded TGDs, one of the main difficulties is to ensure that δ enjoys the fairness condition. Interestingly, as shown in [8], we can completely neglect the fairness condition since the existence of a (possibly unfair) infinite ⋆-chase derivation of some database w.r.t. Σ implies the existence of a fair one.

Proposition 2 (8)

Consider a database D and a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent:

There exists a valid infinite ⋆-chase derivation of D w.r.t. Σ.

There exists an infinite ⋆-chase derivation of D w.r.t. Σ

By combining Propositions 1 and 2, we obtain the following useful corollary:

Corollary 1

Consider a set Σ of TGDs. For ⋆ ∈{o, so}, the following are equivalent

${\varSigma } \not \in \mathbb {CT}_{\forall }^{\star }$.

There exists an infinite ⋆-chase derivation of cr(Σ) w.r.t Σ.

3.3 Our Main Result and Plan of Attack

As discussed above, the main goal of this work is to pinpoint the complexity of chase termination under sticky sets of TGDs, focussing on the oblivious and semi-oblivious versions of the chase procedure. Our main result follows:

Theorem 2

$\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ and $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ are PSpace-complete, and NLogSpace-complete for predicates of bounded arity.

Consider a set ${\varSigma } \in \mathbb {S}$. By Corollary 1, our main challenge is to show that the problem of deciding whether there exists an infinite ⋆-chase derivation of cr(Σ) w.r.t. Σ, where ⋆ ∈{o, so}, is PSpace-complete, and NLogSpace-complete for predicates of bounded arity. In fact, the bulk of our work concentrates on establishing the desired upper bounds for the semi-oblivious chase, i.e., when ⋆ = so. We can then easily obtain the same upper bounds for the oblivious chase by exploiting a simple reduction from ${\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})$ to ${\mathsf {CT}_{\forall }^{{\mathsf {so}}}}(\mathbb {S})$. Our plan of attack follows:

The upper bounds heavily rely on the following semantic characterization given in Section 4: there exists an infinite so-chase derivation of cr(Σ) w.r.t. Σ iff there exists a “path-like” infinite so-chase derivation of cr(Σ) w.r.t. Σ.
The above semantic characterization allows us to provide, in Section 5, a syntactic graph-based characterization of the existence of an infinite so-chase derivation of cr(Σ) w.r.t. Σ. Actually, the latter coincides with the existence of a certain “bad” cycle in the dependency graph of a “linearized” version of Σ.
We show, in Section 6, that checking whether a “bad” cycle exists in the dependency graph of the “linearized” version of Σ is in PSpace, and in NLogSpace for predicates of bounded arity. This shows the desired upper bounds for ${\mathsf {CT}_{\forall }^{\mathsf {so}}}(\mathbb {S})$. We then explain how the upper bounds for $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ can be transferred to $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ by exploiting a simple construction known as enrichment [20]. We finally provide matching lower bounds for $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ and $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$. The PSpace-hardness is obtained by simulating a deterministic polynomial space Turing machine, while the NLogSpace-hardness by a reduction from graph reachability.

At this point, one may wonder whether a powerful termination criterion from the literature allows us to characterize the fragment of sticky sets of TGDs that guarantees the termination of the semi-oblivious chase, which in turn will lead to the desired complexity results. To the best of our knowledge, such a criterion does not exist. Interestingly, model-faithful acyclicity, one of the largest classes of TGDs that ensure the termination of the semi-oblivious chase known today [14], is not powerful enough for characterizing the class $\mathbb {S} \cap {\mathbb {CT}_{\forall }^{{\mathsf {so}}}}$. For example, the sticky set of TGDs

$$ P(x,y,z)\ {\rightarrow}\ S(x,y,z) \qquad S(x,y,x)\ {\rightarrow}\ \exists z P(y,z,x) $$

belongs to $\mathbb {CT}_{\forall }^{\mathsf {so}}$, but it violates the model-faithful acyclicity condition.

4 Semantic Characterization of Semi-Oblivious Chase Non-Termination

We proceed to characterize the non-termination of the so-chase under sticky sets of TGDs. For a set ${\varSigma } \in \mathbb {S}$, our goal is to show that, if there is an infinite so-chase derivation of cr(Σ) w.r.t. Σ, then we can isolate a “path-like” infinite so-chase derivation δ_ℓ, which we call linear. Roughly speaking, linearity means that there exists an infinite simple path α₀,α₁,α₂… in the chase relation of δ_ℓ such that α₀ ∈cr(Σ) and α_i is constructed during the i-th trigger application, while all the atoms that are needed to construct this path, and are not already on the path, are atoms of cr(Σ).

Definition 2 (Linearity)

Consider a set Σ of TGDs. An infinite so-chase derivation δ = (I_i)_i≥ 0 of cr(Σ) w.r.t. Σ, where I_i〈so, σ_i,h_i〉I_i+ 1 for i ≥ 0, is called linear if there is an infinite sequence of distinct atoms (α_i)_i≥ 0 such that the following hold:

α₀ ∈cr(Σ).

For each i ≥ 0, α_i+ 1 ∈ I_i+ 1 ∖ I_i, and there exists β ∈body(σ_i) such that h_i(β) = α_i and $h_{i}({\mathsf {body}({\sigma _{i}})} \setminus \{\beta \}) \subseteq \mathsf {cr}({\varSigma })$.

We are now ready to present our main characterization of non-termination of the semi-oblivious chase under sticky sets of TGDs.

Theorem 3

Consider a set ${\varSigma } \in \mathbb {S}$. The following are equivalent:

There exists an infinite so-chase derivation of cr(Σ) w.r.t Σ.

There exists a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.

It is clear that (2) ⇒ (1) holds trivially. The non-trivial direction is (1) ⇒ (2), which is established in two main steps:

We show that the chase relation of an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ always contains a special path, called continuous, rooted at an atom of cr(Σ), which essentially guarantees the continuous propagation of a new null. Note that the existence of such a special path does not rely on stickiness.

By exploiting the existence of a continuous path, we construct a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ. In fact, due to stickiness, we can convert an infinite suffix P of the continuous path in ≺_δ, together with all the atoms that are needed to generate the atoms on P via a single trigger application, into a linear infinite so-chase derivation δ_ℓ of cr(Σ) w.r.t. Σ. As we shall see, stickiness helps us to ensure that δ_ℓ is linear, while continuity allows us to show that δ_ℓ is infinite.

In the rest of this section, we fix a set ${\varSigma } \in \mathbb {S}$. For technical clarity, we assume that all the TGDs of Σ have a non-empty frontier, i.e., for every TGD σ ∈Σ, there exists at least one variable that appears both in body(σ) and head(σ). Furthermore, we assume that Σ is in normal form, i.e., each TGD of Σ has only one atom in its head. The normalization procedure, which preserves stickiness, is rather standard and can be found, e.g., in [13]. The above simplifying assumptions do not affect the generality of our proof. In other words, assuming that Σ is what we obtain after removing from $\hat {{\varSigma }} \in \mathbb {S}$ all the TGDs with an empty frontier and then normalize it, we can easily show that there exists a linear infinite so-chase derivation of ${\mathsf {cr}({\hat {\varSigma }})}$ w.r.t. $\hat {{\varSigma }}$ iff there exists a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.

4.1 Existence of a Continuous Path

Let us first formalize the notion of path in the chase relation of a derivation. Given an infinite so-chase derivation δ = (I_i)_i≥ 0 of cr(Σ) w.r.t. Σ, a finite δ-path is a finite sequence of atoms (α_i)_0≤i≤n such that α₀ ∈ I₀ and α_i≺_δα_i+ 1. Analogously, we can define the notion of infinite δ-path, which refers to an infinite sequence of atoms rooted at an atom of I₀. The intention underlying continuity is to ensure the continuous propagation of a new null on a path. Roughly, a δ-path (α_i)_i≥ 0 is continuous via a sequence of indices (ℓ_i)_i≥ 0, with ℓ₀ = 1, if for each i ≥ 0, a new null is invented in $\alpha _{\ell _{i}}$ that is “necessarily propagated” up to the atom $\alpha _{\ell _{i+1}}$. At this point, it is crucial to formalize what we mean by “necessarily propagated”.

Let α, β be atoms of $\bigcup _{i \geq 0} I_{i}$, and assume that β ∈ I_j ∖ I_j− 1, for some j > 0, with I_j− 1〈so, σ, h〉I_j, i.e., β = so-result(σ, h).² We say that the i-th position of α and the j-th position of β are related (w.r.t δ), denoted $(\alpha ,i) \simeq _{\delta } (\beta ,j)$, if there exists an atom γ ∈body(σ) such that α = h(γ) and γ[i] = head(σ)[j]. A finite δ-path (α_i)_0≤i≤n is (s, t)-connected, where s ∈ [ar(R₀)] and t ∈ [ar(R_n)], with R₀ and R_n being the predicates of α₀ and α_n, respectively, if there exists a sequence of integers (ℓ_k)_{1≤k≤n− 1} such that

$$ (\alpha_{0},s) \simeq_{\delta} (\alpha_{1},\ell_{1}) \simeq_{\delta} (\alpha_{2},\ell_{2}) \simeq_{\delta} {\cdots} \simeq_{\delta} (\alpha_{n-1},\ell_{n-1}) \simeq_{\delta} (\alpha_{n},t). $$

In simple words, the term occurring at position (α₀,s) is necessarily propagated up to the position (α_n,t) via the intermediate positions (α₁,ℓ₁),…,(α_n− 1,ℓ_n− 1).

For introducing continuity, we also need the notion of the birth atom of a null value. Consider a null ⊥ occurring in $\bigcup _{i \geq 0} I_{i}$. The birth atom (w.r.t. δ) of ⊥, denoted birth_δ(⊥), is the atom of $\bigcup _{i \geq 0} I_{i}$ such that, for some j > 0, I_j ∖ I_j− 1 = {birth_δ(⊥)} and ⊥∈dom(I_j) ∖dom(I_j− 1). It is clear that the atom birth_δ(⊥) is unique (since we consider TGDs in normal form). We are now ready to introduce continuity.

Definition 3 (Continuity)

Let δ be an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ. A δ-path (α_i)_i≥ 0 is continuous if there exist an infinite sequence (ℓ_i)_i≥ 0 of integers, where ℓ₀ = 1 and ℓ₀ < ℓ₁ < ⋯, an infinite sequence (⊥_i)_i≥ 0 of nulls, and infinite sequences of integers (m_i)_i≥ 0 and (p_i)_i≥ 0 from [ar(Σ)], such that, for each k ≥ 0: $\alpha _{\ell _{k}} = {\mathsf {birth}_{{\delta }}({\bot _{k}})}$, $\bot _{k} = \alpha _{\ell _{k}}[m_{k}]$, and $(\alpha _{i})_{\ell _{k} \leq i \leq \ell _{k+1}}$ is (m_k,p_k)-connected.

A simple example that illustrates the notion of continuity follows:

Example 4

Assume that Σ = {σ₁,σ₂,σ₃}, where³

$$ \begin{array}{@{}rcl@{}} \sigma_{1} &=& S(x)\ {\rightarrow}\ \exists y \exists z P(x,y,z),R(y,z)\\ \sigma_{2} &=& P(x,y,z), R(y,w)\ {\rightarrow}\ P(w,y,z)\\ \sigma_{3} &=& P(x,y,z),R(y,w)\ {\rightarrow} \exists v P(z,y,v),R(y,v),Q(w). \end{array} $$

It is easy to verify that there exists an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ such that the following is part of ≺_δ; a black solid edge from α to β labeled by σ means that (α, β) belongs to ≺_δ due to a trigger that involves the TGD σ:

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figd_HTML.png

It can be verified that the path with P-atoms is a continuous δ-path. Let us explain the reason. The first atom in which a null is born is P(c,⊥₁,⊥₂), with ⊥₁,⊥₂ being the new nulls, and continuity is satisfied since ⊥₂ is propagated (this is indicated via the red dashed arrows) to the next atom where the new null ⊥₃ is born. Now, since the null ⊥₃ is propagated up to the next birth atom P(⊥₃,⊥₁,⊥₆), continuity is satisfied. In the rest of the path the same pattern is repeated, and thus continuity is globally satisfied. In fact, the pattern that we can extract is graphically illustrated as

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Fige_HTML.png

where the continuous propagation of a new null (red dashed arrows) can be seen.

We are now ready to establish our main technical result concerning continuity. Note that the next result holds for arbitrary (not necessarily sticky) sets of TGDs.

Lemma 1

For every infinite so-chase derivation δ of cr(Σ) w.r.t Σ, there exists a δ-path that is continuous.

Proof

Assume that δ = (I_i)_i≥ 0 is an infinite so-chase derivation of cr(Σ) w.r.t Σ, and let $\mathcal {I} = \bigcup _{i \geq 0} I_{i}$. Given a term $t \in {\mathsf {dom}({\mathcal {I}})}$, which is either the constant ♮ or a null, and a null $\bot \in {\mathsf {dom}({\mathcal {I}})}$, we write $t \vartriangleright _{\delta } \bot $ if t occurs in fr(birth_δ(⊥)).⁴ Now, for each $t \in {\mathsf {dom}({\mathcal {I}})}$, we inductively define the rank (w.r.t. δ) of t as follows:

$$ \begin{array}{@{}rcl@{}} &\mathsf{rank}_{\delta}(t)\ =\ \left\{ \begin{array}{ll} 0 & \quad \text{if } t = \natural,\\ &\\ 1 + \max\{ \mathsf{rank}(t^{\prime}) \mid t^{\prime} \vartriangleright_{\delta} t\} & \quad \text{otherwise.} \end{array} \right.& \end{array} $$

For a null $\bot \in \mathsf {dom}(\mathcal I)$, we select a term $t \in \mathsf {dom}(\mathcal I)$ such that rank(t) = rank(⊥) − 1 and $t \vartriangleright _{\delta } \bot $, and we write $t \blacktriangleright _{\delta } \bot $.⁵ Since, by assumption, all the TGDs of Σ have a non-empty frontier, it is easy to see that the binary relation $\blacktriangleright _{\delta }$ over $\mathsf {dom}(\mathcal I)$ forms an infinite rooted tree T_δ, where the root is the constant occurring in cr(Σ). The key property of T_δ is given by the following result:

Lemma 2

T_δ has finite out-degree.

Proof

We can show that, for each i ≥ 0, the set $\{t \in \mathsf {dom}(\mathcal I) \mid \mathsf {rank}(t) = i\}$ is finite. This can be shown by induction on i ≥ 0, while the key fact is that only finitely many semi-obliviously active triggers can be formed due to which a null with rank i + 1 is generated (since, by induction hypothesis, the set of terms with rank at most i is finite). Therefore, the nodes of T_δ have finite out-degree. □

Having Lemma 2 in place, we can apply König’s Lemma on T_δ, and get that T_δ contains an infinite simple path starting from the root node.⁶ Let ⊥₀,⊥₁,⊥₂,… be such a path, where ⊥₀ = ♮. By construction, for each i ≥ 0, ⊥_i ∈fr(birth_δ(⊥_i+ 1)). Moreover, there is a sequence of atoms

$$ {\alpha^{i}_{0}}, {\alpha^{i}_{1}}, {\alpha^{i}_{2}}, \ldots, \alpha^{i}_{m_{i}}, $$

where ${\alpha ^{i}_{0}} \in \mathsf {cr}({\varSigma })$ if i = 0, ${\alpha ^{i}_{0}} = \mathsf {birth}_{\delta }(\bot _{i})$ if i > 0, and $\alpha ^{i}_{m_{i}} = \mathsf {birth}_{\delta }(\bot _{i+1})$, such that ${\alpha ^{i}_{k}} {\prec _{{\delta }} {\alpha }_{k+1}^{i}}$, for each k ∈{0,…,m_i − 1}, and there are integers s, t ∈ [ar(Σ)] such that $({{\alpha }_{j}^{i}})_{0 \leq j \leq m_{i}}$ is (s, t)-connected. Consequently, the infinite sequence

$$ {\alpha}_{0}^{0},{\alpha}_{1}^{0} = {\alpha}_{0}^{1},{\alpha}_{1}^{1},\ldots,{\alpha}_{m_{1}}^{1} = {\alpha}_{0}^{2},{\alpha}_{1}^{2},\ldots,{\alpha}_{m_{2}}^{2} = {\alpha}_{0}^{3},\ldots $$

is a continuous δ-path, and the claim follows. □

4.2 From Arbitrary to Linear Infinite Derivations

We can now show that an infinite so-chase derivation δ = (I_i)_i≥ 0 of cr(Σ) w.r.t. Σ can always be converted into a linear infinite so-chase derivation δ_ℓ, which in turn establishes the non-trivial direction (1) ⇒ (2) of Theorem 3. The construction proceeds in three main steps, which are described below. We first describe those steps in a semi-formal way, and exploit Example 4 in order to illustrate the key ideas. We then proceed to give the formal construction.

4.2.1 Semi-Formal Description

Useful part of δ

We first isolate a useful part of the so-chase derivation δ = (I_i)_i≥ 0. By Lemma 1, there exists a continuous infinite δ-path P = (α_i)_i≥ 0. Recall that continuity guarantees the continuous propagation on P of infinitely many nulls, which we call for the purpose of this discussion pivotal. By stickiness, there exists j ≥ 0 such that α_j is the last atom on P in which a term t becomes sticky. By saying that the term t becomes sticky, we mean that the first time t participates in a join is during the trigger application that generates α_j, and thus t occurs in (or sticks to) every atom of {α_i}_i≥j. Let k ≥ j be the integer such that α_k is the first atom on P after α_j in which a new pivotal null is invented. The useful part of δ that we are going to focus on is the infinite sequence of atoms (α_i)_i≥k, which we call the backbone, and the atoms of $\bigcup _{i \geq 0} I_{i}$, which we call side atoms, that are needed to generate the atoms on the backbone via a single trigger application. In other words, for a backbone atom α, if α is obtained via the trigger (σ, h) for Σ on instance I_i, for some i ≥ 0, then the atoms h(body(σ)), excluding the backbone atoms, are side atoms.

Example 5

Consider again the set ${\varSigma } \in \mathbb {S}$ from Example 4. As discussed above, there exists an infinite so-chase derivation δ of cr(Σ) w.r.t. Σ such that a continuous infinite δ-path exists (see the figures above). The useful part of δ is as shown below

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figf_HTML.png

Observe that the last atom on the continuous path in which a term becomes sticky is P(⊥₂,⊥₁,⊥₃); in fact, the sticky term is ⊥₁, which is the only sticky term on the continuous path. It happened that P(⊥₂,⊥₁,⊥₃) invents also a new pivotal null, that is, ⊥₃, and therefore the suffix of the continuous path that starts at P(⊥₂,⊥₁,⊥₃) is the backbone. It is now easy to verify that all the other atoms, apart from S(♮), indeed contribute to the generation of a backbone atom via a single trigger application.

Renaming step

We proceed to rename some of the nulls that occur in backbone atoms or side atoms. In particular, for every null ⊥ occurring in a side atom α, we apply the following renaming steps; recall that ♮ is the constant occurring in cr(Σ): (i) every occurrence of ⊥ in α is replaced by ♮, and (ii) every occurrence of ⊥ in a backbone atom β that is propagated from α to β is replaced by ♮. For a backbone or side atom α, let ρ(α) be the atom obtained from α after globally applying the above renaming steps. We now define the sequence of instances $\delta ^{\prime } = (J_{i})_{i \geq 0}$ as follows:

$$ J_{0}\ =\ \{\rho(\alpha) \mid \alpha \text{ is a side atom}\}, $$

which is a subset of cr(Σ), and

$$ J_{i}\ =\ J_{i-1} \cup \{\rho(\alpha_{k+i-1})\} \cup H, \quad \text{for } i>0, $$

where H is the set of atoms that are generated together with α_k+i− 1, after renaming the propagated nulls that do not occur in ρ(α_k+i− 1) to ♮.⁷ It is crucial to observe that a new null generated in a backbone atom never participates in a join. This is because the first backbone atom α_k comes after the atom α_j, which is the last atom on P in which a term becomes sticky. This fact allows us to modify triggers from δ in order to construct, for every i ≥ 0, a trigger (σ_i,h_i) such that J_i〈so, σ_i,h_i〉J_i+ 1.

Example 6

We consider again our running example. Before renaming the nulls that appear in side atoms, we first need to understand how nulls are propagated from side atoms to backbone atoms during the chase. This is depicted in the following figure

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figg_HTML.png

Note that the boldfaced occurrences of the pivotal nulls ⊥₃,⊥₆,… are not propagated from side atoms, but generated on the backbone, and thus will not be renamed. Recall that the existence of such nulls is guaranteed by continuity. By applying the renaming step, i.e., by replacing every null in a side atom with the constant ♮, and then propagating it to the backbone as indicated above, we get the sequence of instances

$$ \begin{array}{@{}rcl@{}} &J_{0} = \{R(\natural,\natural),P(\natural,\natural,\natural)\},&\\ &J_{1} = J_{0} \cup \{P(\natural,\natural,\bot_{3}),R(\natural,\bot_{3})\},&\\ &J_{2} = J_{1} \cup \{P(\natural,\natural,\bot_{3})\},&\\ &J_{3} = J_{2} \cup \{P(\natural,\natural,\bot_{3})\},&\\ &J_{4} = J_{3} \cup \{P(\bot_{3},\natural,\bot_{6}),R(\natural,\bot_{6})\},&\\ &\vdots& \end{array} $$

Observe that, due to stickiness, none of the nulls ⊥₃,⊥₆,… generated on the backbone participates in a join. Hence, the renaming step preserves all the joins, and thus, by adapting triggers from δ, we can devise a trigger for each pair (J_i,J_i+ 1).

Pruning step

At this point, one may be tempted to think that $\delta ^{\prime } = (J_{i})_{i \geq 0}$, with J_i〈so, σ_i,h_i〉J_i+ 1 for i ≥ 0, is the desired linear infinite so-chase derivation of cr(Σ) w.r.t. Σ. It is easy to verify that we have the infinite sequence of atoms (ρ(α_i))_i≥k− 1 such that ρ(α_k− 1) ∈cr(Σ) since α_k− 1 is a side atom, and for each i ≥ k − 1, $J_{i-k+2} \supseteq J_{i-k+1} \cup \{\rho (\alpha _{i+1})\}$, and there exists β ∈body(σ_i) such that h_i(β_i) = α_i and $h_{i}(\mathsf {body}(\sigma _{i}) \setminus \{\beta \}) \subseteq \mathsf {cr}({\varSigma })$. However, we cannot conclude yet that $\delta ^{\prime }$ is the desired so-chase derivation since we may apply a trigger that is not semi-obliviously active. This can be fixed by simply removing the instances that are obtained via a trigger that is not semi-obliviously active, which actually means that we keep only one occurrence of each instance. However, such a pruning step might be applied infinitely many times, and thus, the question that we need to answer is whether the obtained so-chase derivation $\delta ^{\prime \prime }$ remains infinite. Interestingly, this is the case due to continuity. Since the backbone (α_i)_i≥k is part of a continuous δ-path, we get that all the (necessarily consecutive) occurrences of an instance appear between two instances in which new pivotal nulls are invented. Since the distance between such instances is finite, each instance occurs in $\delta ^{\prime }$ finitely many times. Therefore, after the pruning step, the obtained so-chase derivation is infinite. Thus, $\delta ^{\prime \prime }$ is a linear infinite so-chase derivation of J₀ w.r.t. Σ. Since $J_{0} \subseteq {\mathsf {cr}({\varSigma })}$, we can construct a linear infinite so-chase derivation δ_ℓ of cr(Σ) w.r.t. Σ by adding to J₀ the set of atoms cr(Σ) ∖ J₀.

Example 7

Coming back to our running example, it can be seen that the sequence of instances devised in Example 6 is not the desired linear derivation due to non-active triggers, which implies that there are consecutive occurrences of the same instance, e.g., J_I,J₂,J₃. However, after applying the pruning step, we get the sequence

$$ \begin{array}{@{}rcl@{}} &{J}_{0}^{\prime} = J_{0},&\\ &{J}_{1}^{\prime} = J_{1},&\\ &{J}_{2}^{\prime} = J^{\prime}_{1} \cup \{P(\bot_{3},\natural,\bot_{6}),R(\natural,\bot_{6})\},&\\ &{J}_{3}^{\prime} = J^{\prime}_{2} \cup \{P(\bot_{6},\natural,\bot_{9}),R(\natural,\bot_{9})\},&\\ &\vdots& \end{array} $$

Now, it is easy to verify that after adding the atoms S(♮) and Q(♮) to $J^{\prime }_{0}$, we get a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ.

4.2.2 The Formal Construction

Useful part of δ

In what follows, let $\mathcal {I} = \bigcup _{i \ge 0} I_{i}$. By Lemma 1, there exists a continuous infinite δ-path P = (α_i)_i≥ 0. That is, there exist an infinite sequence (ℓ_i)_i≥ 0 of integers, where ℓ₀ = 1 and ℓ₀ < ℓ₁ < ⋯, an infinite sequence (⊥_i)_i≥ 0 of nulls, and infinite sequences of integers (m_i)_i≥ 0 and (p_i)_i≥ 0 from [ar(Σ)], such that, for each k ≥ 0, $\alpha _{\ell _{k}}= {\mathsf {birth}_{\delta }(\bot _{k})}$, $\bot _{k} = \alpha _{\ell _{k}}[m_{k}]$, and $(\alpha _{i})_{\ell _{k} \le i \le \ell _{k+1}}$ is (m_k,p_k)-connected. Given an atom α_i = so-result(σ, h) for some i > 0, we say that a term $t \in \mathsf {dom}(\mathcal {I})$ becomes sticky in α_i if i is the smallest integer such that there is a variable x ∈fr(σ) occurring more than once in body(σ) and t = h(x). By stickiness of Σ, and from the fact that the arity of each predicate in Σ is finite, there exists some j ≥ 0 such that, for every i ≥ j, no term $t \in {\mathsf {dom}(\mathcal {I})}$ becomes sticky in α_i. Let k ≥ 0 be the smallest integer such that no term becomes sticky in $\alpha _{\ell _{k}}$. Note that this integer exists since the sequence of indices (ℓ_i)_i≥ 0 is infinite. We call the sequence of atoms $B = (\alpha _{i})_{i \ge \ell _{k}}$ the backbone of δ; let ${\mathscr{B}} = \{\alpha _{i}\}_{i \ge \ell _{k}}$. Furthermore, we define the set

$$ S\ =\ \{\alpha \in \mathcal{I}\setminus \mathcal{B} \mid \text{there exists } \beta \in \mathcal{B} \text{ such that } \alpha \prec_{\delta} \beta\}, $$

which we call the side atoms of B. The set $S \cup {\mathscr{B}}$ is the useful part of δ.

Renaming step

For notational convenience, let B = (β_i)_i≥ 0. We define $B^{\prime }$ as the sequence of atoms obtained from B as follows. For every atom $\beta = R(\bar t) \in {\mathscr{B}}$, and every integer j ∈ [ar(R)], if there exists an atom $\alpha = S(\bar u) \in S$, and an integer i ∈ [ar(S)], with α[i] ∈N, such that an (i, j)-connected δ-path of the form α,…,β exists, then the term at position (β, j) is renamed to the constant ♮. Furthermore, for each i ≥ 0, assuming that β_i = so-result(σ, h), let $\eta _{i} = (\sigma ,h^{\prime })$ be the trigger obtained from (σ, h) after updating h to $h^{\prime }$ as dictated by the renaming step applied to β_i. Assuming $B^{\prime }=({\beta }_{i}^{\prime })_{i \ge 0}$, we define $\delta ^{\prime }=(J_{i})_{i \ge 0}$ such that J₀ = cr(Σ), and for i > 0, $J_{i} = J_{i-1} \cup \{{\beta }_{i-1}^{\prime }\}$. Since

no term becomes sticky in some atom of B, and
for each null ⊥ such that $\beta = \mathsf {birth}_{\delta }(\bot ) \in {\mathscr{B}}$, which means that ⊥ has been generated on B, ⊥ is not renamed on $B^{\prime }$,

we get that, for i ≥ 0, η_i is a trigger for Σ on J_i, and thus, J_i+ 1 = J_i ∪so-result(η_i).

Pruning step

Although $\delta ^{\prime } = (J_{i})_{i\ge 0}$ is locally consistent, that is, for each i ≥ 0, J_i〈so, σ_i,h_i〉J_i+ 1, for some trigger (σ_i,h_i) for Σ on J_i, it is not yet a so-chase derivation since some of the involved triggers might not be semi-obliviously active. In other words, there might exist i ≥ 0 such that J_i〈so, σ_i,h_i〉J_i+ 1, but so-result(σ_i,h_i) ∈ J_i, which implies that J_i = J_i+ 1. Let $\delta ^{\prime \prime }$ be the sequence of instances obtained from $\delta ^{\prime }$ where only one occurrence of each instance is kept. Clearly, $\delta ^{\prime \prime }$ is still locally consistent. Moreover, by exploiting continuity, we can show that, for each i ≥ 0, J_i occurs in $\delta ^{\prime }$ finitely many times, which in turn implies that $\delta ^{\prime \prime }$ is a linear infinite so-chase derivation of cr(Σ) w.r.t. Σ, as needed.

5 Graph-Based Characterization of Semi-Oblivious Chase Termination

In this section, we characterize the termination of the semi-oblivious chase for sticky sets of TGDs via a graph-based condition. More precisely, we show that a set ${\varSigma } \in \mathbb {S}$ belongs to ${\mathbb {CT}}_{\forall }^{\mathsf {so}}$ iff a linearized version of it, i.e., a set of linear TGD obtained from Σ, enjoys a condition inspired by weak-acyclicity [17], called critical-weak-acyclicity, introduced in [8]. Recall that linear TGDs are TGDs with only one body atom [12]; we write ${\mathbb {L}}$ for the class of linear TGDs. The proof of the above result boils down to showing that the given sticky set Σ of TGDs can be rewritten into a set of linear TGDs, while this rewriting preserves the termination of the semi-oblivious chase. The latter heavily relies on Theorem 3, which establishes that non-termination of the semi-oblivious chase coincides with the existence of a linear infinite chase derivation of cr(Σ) w.r.t. Σ. We can then apply the characterization for the termination of the semi-oblivious chase for linear TGDs from [8], which states that a set ${\varSigma } \in {\mathbb {L}}$ belongs to ${{\mathbb {CT}}_{\forall }^{\mathsf {so}}}$ iff it is critically-weakly-acyclic. Let us first recall the notion of critical-weak-acyclicity for linear TGDs, which has been originally introduced in [8].

5.1 Critically-Weakly-Acyclic Linear TGDs

Since critical-weak-acyclicity is inspired by weak-acyclicity, it is not surprising that it relies on the dependency graph of a set of TGDs introduced in [17], that encodes how terms might be propagated during the chase. We assume a fixed ordering on the head-atoms of TGDs.⁸ For a TGD σ with head(σ) = α₁,…,α_k, we write (σ, i) for the TGD that has only one atom in its head, called single-head, obtained from σ by keeping only the atom α_i, and the existentially quantified variables in α_i. Recall that pos(α, x) is the set of positions in α at which x occurs, while pos(body(σ),x) is the set of positions at which x occurs in the body of σ. We also write pos(sch(Σ)) for the set of positions of sch(Σ), i.e., the set {(R, i)∣R/n ∈sch(Σ) and i ∈ [n]}.

Definition 4 (Dependency Graph)

The dependency graph of a set Σ of TGDs is a labeled directed multigraph dg(Σ) = (N, E, λ), where N = pos(sch(Σ)), $\lambda : E {\rightarrow } {\varSigma } \times \mathbb {N}$, and E contains only the following edges. For each σ ∈Σ with head(σ) = α₁,…,α_k, for each x ∈fr(σ), and for each position π ∈pos(body(σ),x):

For each i ∈ [k], and for each position $\pi ^{\prime } \in \mathsf {pos}(\alpha _{i},x)$, there is a normal edge $e = (\pi ,\pi ^{\prime }) \in E$ with λ(e) = (σ, i).
For each existentially quantified variable z in σ, for each i ∈ [k], and for each $\pi ^{\prime } \in \mathsf {pos}(\alpha _{i},z)$, there is a special edge $e = (\pi ,\pi ^{\prime }) \in E$ with λ(e) = (σ, i).

Intuitively speaking, a normal edge $(\pi ,\pi ^{\prime })$ encodes the fact that a term may propagate from π to $\pi ^{\prime }$ during the chase. Moreover, a special edge $(\pi ,\pi ^{\prime \prime })$ keeps track of the fact that the propagation of a term from π to $\pi ^{\prime }$ also creates a new null at position $\pi ^{\prime \prime }$. A simple example that illustrates the notion of the dependency graph follows:

Example 8

Consider the set Σ consisting of the TGD

$$ \sigma\ =\ R(x,y)\ {\rightarrow}\ \exists z P(x,z),R(x,z). $$

The graph dg(Σ) is as follows

https://static-content.springer.com/image/art%3A10.1007%2Fs00224-020-09994-5/MediaObjects/224_2020_9994_Figh_HTML.png

where the dashed arrows represent special edges. The normal edges occur due to the variable x, while the special edges due to the existentially quantified variable z.

The class of weakly-acyclic sets of TGDs is a well-known formalism, introduced in the context of data exchange, that guarantees the termination of the semi-oblivious chase [17].⁹ Formally, a set Σ is weakly-acyclic if there is no cycle in dg(Σ) that contains a special edge. It would be extremely useful if, whenever we focus on linear TGDs, weak-acyclicity is also a necessary condition for the termination of the semi-oblivious chase. Unfortunately, this is not the case. A simple counterexample follows:

Example 9

Consider the set Σ of linear TGDs consisting of $ R(x,x) {\rightarrow } \exists z R(z,x). $ In dg(Σ) there is a cycle that contains a special edge. However, there exists only one so-chase derivation of cr(Σ) w.r.t. Σ that is finite, and thus, ${\varSigma } \in {{\mathbb {CT}}_{\forall }^{\mathsf {so}}}$.

Interestingly, as it has been shown in [8], there is an extension of weak-acyclicity, called critical-weak-acyclicity, that, whenever we focus on linear TGDs, provides a necessary and sufficient condition for the termination of the semi-oblivious chase. A key notion underlying critical-weak-acyclicity is the notion of compatibility among two single-head linear TGDs. Intuitively, if a single-head linear TGD σ₁ is compatible with a single-head linear TGD σ₂, then the atom obtained during the chase by applying σ₁ may trigger σ₂. To formally define the notion of compatibility, we first need to recall the standard notion of unification among atoms.

We say that two atoms α and β (containing only variables from V) unify if there exists a substitution γ from the variables occurring in α and β to V, called unifier for α and β, such that γ(α) = γ(β). A most general unifier (MGU) for α and β is a unifier for α and β, denoted mgu(α, β), such that, for each other unifier γ for α and β, there exists a substitution $\gamma ^{\prime }$ such that $\gamma = \gamma ^{\prime } \circ {\mathsf {mgu}({\alpha },{\beta })}$. It is well-known that if two atoms α and β unify, then they have an MGU that is unique (modulo variable renaming), and thus, we can refer to the MGU for α and β [1]. For brevity, given a TGD σ and a variable x, let ${\varPi }_{x}^{\sigma } = \mathsf {pos}(\mathsf {body}(\sigma ),x)$. Let var(α, π), where α is an atom, and π a set of positions, be the set of variables in α at positions of π.

Definition 5 (Compatibility)

Consider two single-head linear TGDs σ₁ and σ₂. We say that σ₁ is compatible with σ₂ if the following hold:

head(σ₁) and body(σ₂) unify.

For each x ∈var(body(σ₂)), either $\mathsf {var}(\mathsf {head}(\sigma _{1}),{\varPi }_{x}^{\sigma _{2}}) \subseteq \mathsf {fr}(\sigma _{1})$, or there is an existentially quantified variable z in σ₁ such that $\mathsf {var}(\mathsf {head}(\sigma _1),{\Pi }_{x}^{\sigma _2}) = \{z\}$.

Having the notion of compatibility among two single-head linear TGDs in place, we can recall the notion of resolvent of a sequence σ₁,…,σ_n of single-head linear TGDs, which is in turn a single-head TGD. Roughly, such a resolvent mimics the behavior of the sequence σ₁,…,σ_n during the chase. Notice that the existence of such a resolvent is not guaranteed, but if it exists, this implies that we may have a sequence of trigger applications that involve the TGDs σ₁,…,σ_n in this order. In such a case, we call the sequence σ₁,…,σ_n active.

Definition 6 (Resolvent)

The resolvent of a sequence σ₁,…,σ_n of single-head linear TGDs, denoted [σ₁,…,σ_n], is inductively defined as follows (for notational convenience, we simply write ρ for [σ₁,…,σ_n− 1]):

[σ₁] = σ₁.

$[\sigma _{1},\ldots ,\sigma _{n}] = \gamma (\mathsf {body}(\rho )) {\rightarrow } \gamma (\mathsf {head}(\sigma _{n}))$, with γ = mgu(head(ρ),body(σ_n)), if $\rho \neq \blacklozenge $ and ρ is compatible with σ_n; otherwise, $[\sigma _{1},\ldots ,\sigma _{n}] = \blacklozenge $.

The sequence σ₁,…,σ_n is called active if $[\sigma _{1},\ldots ,\sigma _{n}] \neq \blacklozenge $.

At this point, one may think that the right extension of weak-acyclicity, which will provide a necessary condition for the termination of the semi-oblivious chase under linear TGDs, is to allow cycles with special edges in the underlying dependency graph as long as the corresponding sequence of single-head TGDs, which can be extracted from the edge labels, is not active. However, as thoroughly discussed in [8], this is not enough. If a cycle with a special edge is labeled with an active sequence, then we can conclude that it will be traversed at least once during the chase. Nevertheless, it is not guaranteed that it will be traversed infinitely many times. A cycle that is labeled with an active sequence σ₁,…,σ_n, and contains a special edge, will be certainly traversed infinitely many times if the resolvent of the sequence ρ,…,ρ of length k, where ρ = [σ₁,…,σ_n], exists, for every k > 0. Interestingly, for ensuring the latter condition, it suffices to consider sequences of length at most (ω + 1), where ω is the arity of the predicate of body(σ₁). This brings us to critical sequences. For brevity, we write σ^k for the sequence σ,…,σ of length k.

Definition 7 (Critical Sequence)

A sequence σ₁,…,σ_n of single-head linear TGDs is critical if σ₁,…,σ_n is active, and [σ₁,…,σ_n]^ω+ 1 is active, where ω is the arity of the predicate of body(σ₁).

We can now recall critical-weak-acyclicity as defined in [8]. It is essentially weak-acyclicity, with the key difference that a cycle in the underlying graph is “dangerous”, not only if it contains a special edge, but if it is also labeled with a critical sequence of single-head linear TGDs. The formal definition follows.

Definition 8 (Critical-Weak-Acyclicity)

Consider a set ${\varSigma } \in \mathbb {L}$ of TGDs, and let dg(Σ) = (N, E, λ). A cycle v₀,v₁,…,v_n,v₀ in ΣgΣ is critical if the sequence λ(v₀,v₁),λ(v₁,v₂),…,λ(v_n,v₀) of single-head linear TGDs is critical. We call Σ critically-weakly-acyclic if no critical cycle in dg(Σ) contains a special edge.

The essence of critical-weak-acyclicity is revealed by the following result:

Theorem 4 (8)

Consider a set ${\varSigma } \in \mathbb {L}$ of TGDs. The following are equivalent:

${\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$.

Σ is critically-weakly-acyclic.

5.2 From Sticky to Linear TGDs

Before presenting the linearization procedure, we need to introduce some auxiliary notions. Given a tuple $\bar {t} = (t_{1},\ldots ,t_{n}) \in ({\mathbf {V}} \cup \{\natural \})^{n}$, we write ${\mathsf {shape}(\bar {t})}$ for the tuple obtained from ${\bar {t}}$ by replacing each variable of V with the special symbol ∗. For example, shape((x, y, ♮, z, x, ♮)) = (∗,∗,♮,∗,∗,♮). We also write $\mathsf {svar}(\bar {t})$ for the tuple obtained from $\bar {t}$ by removing all the occurrences of the constant ♮. For example, svar((x, y, ♮, z, x, ♮)) = (x, y, z, x). For an atom $\alpha = R(\bar {t})$, let

$$ \natural\text{-}\mathsf{free}(\alpha)\ =\ R_{\mathsf{shape}(\bar{t})}(\mathsf{svar}(\bar{t})), $$

where $R_{\mathsf {shape}(\bar {t})}$ is of arity $|\mathsf {svar}(\bar {t})|$. In fact, ♮-free(α) is the constant-free version of α, while the subscript $\mathsf {shape}(\bar {t})$ keeps track of the original shape of α, i.e., where each occurrence of ♮ occurs in α. Notice that, having the constant-free version of an atom α in place, we can unambiguously write down α. For a set of atoms A, let

$$ \natural\text{-}\mathsf{free}(A)\ =\ \{\natural\text{-}\mathsf{free}(\alpha) \mid \alpha \in A\}. $$

Given a TGD σ and an atom α ∈body(σ), let

$$ V_{\alpha,\sigma}\ =\ \mathsf{var}(\mathsf{body}(\sigma) \setminus \{\alpha\}), $$

that is, the set of body variables of σ that do not occur only in α. Now, given a TGD σ, and an atom α ∈body(σ), let

$$ M_{\alpha,\sigma}\ =\ \{h : T {\rightarrow} \{\natural\} \mid V_{\alpha,\sigma} \subseteq T \subseteq \mathsf{var}(\mathsf{body}(\sigma))\}, $$

i.e., for each subset T of var(body(σ)) that contains all the variables of V_{α, σ}, M_{α, σ} contains a mapping that maps each variable of T to the special constant ♮.

We are now ready to introduce the linearization of a set of TGDs. Note that the following definition talks about arbitrary TGDs. The notion of stickiness is used later for showing that the termination of the semi-oblivious chase is preserved.

Definition 9 (Linearization)

The linearization of a TGD σ of the form $\phi (\bar x,\bar y) {\rightarrow } \exists \bar z \psi (\bar x,\bar z)$, denoted Lin(σ), is the set of linear TGDs

$$ \underset{\alpha \in \phi(\bar x,\bar y)}{\bigcup} \underset{h \in M_{\alpha,\sigma}}{\bigcup} \left\{\natural\text{-}\mathsf{free}(h(\alpha))\ {\rightarrow}\ \exists \bar z {\natural\text{-}\mathsf{free}(h(\psi(\bar x,\bar z)))} \right\}. $$

The linearization of a set Σ of TGDs is defined as $\mathsf {Lin}({\varSigma }) = \bigcup _{\sigma \in {\varSigma }} \mathsf {Lin}(\sigma )$.

The linearization procedure converts a TGD σ into a set of (constant-free) linear TGDs, where the body atom of each such linear TGD corresponds to an atom α of body(σ), while the variables in body(σ) ∖{α}, and possibly additional variables that occur only in α, are instantiated with the special constant ♮. An example follows:

Example 10

Consider the TGD

$$ \sigma\ =\ \underbrace{P(x,y,z)}_{\alpha_{1}}, \underbrace{R(y,w)}_{\alpha_{2}}\ {\rightarrow}\ \exists u S(w,y,u), $$

We have that the set $V_{\alpha _{1},\sigma }$ consists of all the variables in body(σ) ∖{α₁}, while $V_{\alpha _{2},\sigma }$ of all the variables in body(σ) ∖{α₂}, that is,

$$ V_{\alpha_{1},\sigma}\ =\ \{y,w\} \qquad V_{\alpha_{2},\sigma}\ =\ \{x,y,z\}. $$

There are four sets of variables in body(σ) that are supersets of $V_{\alpha _{1},\sigma }$:

$$ T_{1} = \{y,w\} \qquad T_{2} = \{x,y,w\} \qquad T_{3} = \{y,z,w\} \qquad T_{4} = \{x,y,z,w\}. $$

Moreover, there are two sets of variables in body(σ) that are supersets of $V_{\alpha _{2},\sigma }$:

$$ T^{\prime}_{1} = \{x,y,z\} \qquad T^{\prime}_{2} = \{x,y,z,w\}. $$

Therefore, the set of mappings $M_{\alpha _{1},\sigma }$ contains, for each i ∈{1,…,4}, a mapping of the form {x↦♮∣x ∈ T_i}, whereas $M_{\alpha _{2},\sigma }$ contains, for each i ∈{1,2}, a mapping of the form $\{x \mapsto \natural \mid x \in T^{\prime }_{i}\}$. Finally, the linearization of σ is the set of linear TGDs Lin(σ) consisting of the TGDs

$$ \begin{array}{@{}rcl@{}} P_{{}_{(\ast,\natural,\ast)}}(x,z) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\natural,\natural,\ast)}}(z) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\ast,\natural,\natural)}}(x) &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u),\\ P_{{}_{(\natural,\natural,\natural)}}() &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u), \end{array} $$

due to the set of mappings $M_{\alpha _{1},\sigma }$, and the TGDs

$$ \begin{array}{@{}rcl@{}} R_{{}_{(\natural,\ast)}}(w) &{\rightarrow}& \exists u S_{{}_{(\ast,\natural,\ast)}}(w,u),\\ R_{{}_{(\natural,\natural)}}() &{\rightarrow}& \exists u S_{{}_{(\natural,\natural,\ast)}}(u). \end{array} $$

due to the set of mappings $M_{\alpha _{2},\sigma }$.

Lemma 3 allows us to show that this procedure preserves the termination of the semi-oblivious chase whenever the input set of TGDs is sticky.

Lemma 3

For every set ${\varSigma } \in \mathbb {S}$ of TGDs, ${\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$ iff $\mathsf {Lin}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$.

Proof

We assume, w.l.o.g., that Σ is in normal form, i.e., each TGD of Σ has only one atom in its head. Given a TGD $\sigma = \phi (\bar x, \bar y) {\rightarrow } \exists \bar z\ R(\bar x,\bar z)$ of Σ, an atom α ∈body(σ), and a mapping h ∈ M_{α, σ}, we write σ^{α, h} for the linear TGD ${\natural \text {-}\mathsf {free}(h(\alpha ))} {\rightarrow } \exists \bar z\ {\natural \text {-}\mathsf {free}({h(R(\bar x, \bar z))})} \in {\mathsf {Lin}({\varSigma })}$. We are now ready to proceed with the proof.

(⇒) Assume that $\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. By Corollary 1, there exists an infinite so-chase derivation δ = (I_i)_i≥ 0 of cr(Lin(Σ)) w.r.t. Lin(Σ), with $I_{i} {\langle {\sigma ^{\alpha _{i},g_{i}}_{i},h_{i}} \rangle } I_{i+1}$, or, equivalently, $I_{i+1} = I_{i} \cup {\mathsf {so}}\text {-}{\mathsf {result}({\sigma ^{\alpha _{i},g_{i}}_{i}},{h_{i}})}$, for each i ≥ 0. By construction, for each i ≥ 0, the TGD $\sigma ^{\alpha _{i},g_{i}}_{i}$ is linear, with its body being an atom of the form $R_{\bar t}(\bar x)$, where $\bar x$ is a tuple of variables, and $\bar t$ is a tuple over {∗,♮}. Let $\hat {h}_{i}$ be the extension of h_i that maps each variable in body(σ_i) that is not in $\bar {x}$ to ♮.

Consider the infinite sequence of instances $\delta ^{\prime } = (J_{i})_{i \ge 0}$, where J₀ = cr(Σ), and $J_{i+1} = J_{i} \cup \mathsf {so}\text {-}{\mathsf {result}({\sigma _{i}},{\hat {h}_{i}})}$, for each i ≥ 0. By construction of $\hat {h}_{i}$, for each i ≥ 0, $(\sigma _{i},\hat {h}_{i})$ is a trigger for Σ on J_i, which implies that $J_{i}{\langle {{\mathsf {so}},\sigma _{i},\hat {h}_{i}} \rangle }J_{i+1}$. However, the above sequence is not necessarily an infinite so-chase derivation of Σ w.r.t. cr(Σ) since some of the involved triggers may not be semi-obliviously active. We therefore consider the infinite sequence of instances $\delta ^{\prime \prime } = (J^{\prime }_{i})_{i \geq 0}$ obtained from (J_i)_i≥ 0 by simply removing the triggers that are not semi-obliviously active, or, more formally, by removing the instances obtained from triggers that are not semi-obliviously active. Recall that, for each i > 0, J_i is obtained via a trigger that is not semi-obliviously active iff J_i = J_i− 1. It remains to show that $\delta ^{\prime \prime }$ is infinite, which in turn implies that ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. It suffices to show that, for each i ≥ 0, J_i occurs in $\delta ^{\prime }$ finitely many times.

By contradiction, assume that there is k ≥ 0 such that J_k occurs in $\delta ^{\prime }$ infinitely many times, which means that J_k = J_k+ 1 = J_k+ 2 = ⋯. Since Lin(Σ) is finite, there are indices k ≤ j < i such that $\sigma ^{\alpha _{i},g_{i}}_{i} = \sigma ^{\alpha _{j},g_{j}}_{j}$; we refer to those TGDs as ρ. Since (I_i)_i≥ 0 is a so-chase derivation of Lin(Σ) w.r.t. cr(Lin(Σ)), there is a variable x ∈fr(ρ) such that h_i(x)≠h_j(x). By definition, $h_{i}(x) = \hat {h}_{i}(x)$ and $h_{j}(x) = \hat {h}_{j}(x)$. This implies that $\hat {h}_{i}(x) \neq \hat {h}_{j}(x)$, and therefore, J_i≠J_j, which is a contradiction.

(⇐) Assume now that ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. By Theorem 3, there exists a linear infinite so-chase derivation δ = (I_i)_i≥ 0 of cr(Σ) w.r.t. Σ, with I_i〈σ_i,h_i〉I_i+ 1, for i ≥ 0. In other words, there exists an infinite sequence of distinct atoms (α_i)_i≥ 0 such that

α₀ ∈cr(Σ), and

for each i ≥ 0, α_i+ 1 ∈ I_i+ 1 ∖ I_i, and there exists an atom β_i ∈body(σ_i) such that α_i = h_i(β_i) and $h_{i}(\mathsf {body}({\sigma _{i}}) \setminus \{\beta _{i}\}) \subseteq {\mathsf {cr}({\varSigma })}$.

For every i ≥ 0, let X_i = var(body(σ_i) ∖{β_i}), $g_{i} = h_{i | X_{i}}$, and f_i = h_i ∖ g_i. Note that, since δ is linear, for every x ∈ X_i, g_i(x) = ♮. Assuming that σ_i is of the form $\phi _{i}(\bar x_{i}, \bar y_{i}) {\rightarrow } \exists \bar z_{i}\ R_{i}(\bar x_{i},\bar z_{i})$, let

$$ {\varSigma}^{\prime}\ =\ \left\{\underbrace{\natural\text{-}\mathsf{free}(g_{i}(\beta_{i})) {\rightarrow} \exists \bar z_{i} \natural\text{-}\mathsf{free}(g_{i}(R_{i}(\bar x_{i},\bar z_{i})))}_{\sigma^{\prime}_{i}}\right\}_{i \geq 0}. $$

By construction, ${\varSigma }^{\prime } \subseteq \mathsf {Lin}({\varSigma })$, and therefore, ${\varSigma }^{\prime } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$ implies $\mathsf {Lin}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. Hence, to conclude the proof of Lemma 3, it suffices to exhibit an infinite so-chase derivation of ${\mathsf {cr}({\varSigma }^{\prime })} \subseteq {\mathsf {cr}({\varSigma })}$ w.r.t. ${\varSigma }^{\prime }$. To this end, consider the infinite sequence of instances $\delta ^{\prime } = (J_{i})_{i \ge 0}$, where $J_{0} = \mathsf {cr}({\varSigma }^{\prime })$, and $J_{i} = J_{i-1} \cup \{\alpha ^{\prime }_{i}\}$, where $\alpha ^{\prime }_{i} = f_{i}({\mathsf {body}({{\sigma }_{i}^{\prime }})}) = f_{i}({\natural \text {-}\mathsf {free}(g_{i}(\beta _{i}))})$, for each i > 0. It is not difficult to verify that $\delta ^{\prime }$ is an infinite so-chase derivation of ${\mathsf {cr}({\varSigma }^{\prime })}$ w.r.t. ${\varSigma }^{\prime }$ (modulo null renaming). □

The main result of this section follows from Theorem 4 and Lemma 3:

Theorem 5

Consider a set ${\varSigma } \in \mathbb {S}$ of TGDs. The following are equivalent:

${\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$.

Lin(Σ) is critically-weakly-acyclic.

We would like to remark that the above characterization of the termination of the so-chase is different than the one presented in the conference paper [10] (see Corollary 24). More precisely, the linearization procedure in [10] produces TGDs with constants, and thus, we had to properly extend the notion of critical-weak-acyclicity from [8], which was defined only for constant-free TGDs. It turned out that we can directly use the notion of critical-weak-acyclicity as defined in [8], at the price of a slightly more complex linearization procedure that produces constant-free TGDs.

6 Complexity Analysis

We are now ready to complete the proof of our main result, that is, Theorem 2, which states that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ and $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ are PSpace-complete, and NLogSpace-complete for predicates of bounded arity. We first concentrate on the upper bounds.

6.1 Upper Bounds for $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$

We know that the problem $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {L})$ is in PSpace, and in NLogSpace for predicates of bounded arity [8]. In fact, these results exploit Theorem 4, and are obtained by showing that deciding whether a set of linear TGDs is critically-weakly-acyclic is in PSpace, and in NLogSpace for predicates of bounded arity. However, despite the fact that we can reduce $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ to $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {L})$ (see Lemma 3), we cannot directly exploit the complexity results for linear TGDs. The reason is because the linearization procedure takes exponential time, in general, and polynomial time in the case of bounded-arity predicates. Therefore, we cannot simply compute the set Lin(Σ), and then check for critical-weak-acyclicity, but a more refined approach is needed.

We focus on the complement of our problem, i.e., given a set ${\varSigma } \in \mathbb {S}$ of TGDs, we want to check whether ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. By Theorem 5, it suffices to show that Lin(Σ) is not critically-weakly-acyclic. The latter problem can be seen as a generalization of the standard graph reachability problem. Indeed, we need to check whether there exists a node in the dependency graph of Lin(Σ) that is reachable from itself via a critical cycle that contains a special edge. However, as discussed above, we cannot explicitly construct Lin(Σ) and its dependency graph G. Instead, the above reachability check should be performed on a compact representation of G, which is the set Σ itself.

Lemma 4

Consider a set Σ of TGDs. The problem of deciding whether Lin(Σ) is not critically-weakly-acyclic is in

$$ \textup{NSPACE}(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m), $$

where ω = ar(Σ), and m is the number of variables occurring in Σ.

Proof

We employ Algorithm 1, which takes as input a set Σ of TGDs, and checks whether Lin(Σ) is not critically-weakly-acyclic by non-deterministically searching for the existence of a critical cycle in dg(Lin(Σ)) that contains a special edge. Note that this is done without explicitly computing the graph dg(Lin(Σ)). Before showing that Algorithm 1 is correct, and analyzing its space complexity, let us first introduce an auxiliary notion that is used by the algorithm.

A Σ-label is a tuple ℓ = (σ, α, β, T), where σ ∈Σ, α ∈body(σ), β ∈head(σ), and $V_{\alpha ,\sigma } \cap ({\mathsf {var}(\alpha )} \cup {\mathsf {var}(\beta )}) \subseteq T \subseteq {\mathsf {var}({{\mathsf {body}(\sigma )}})} \cap ({\mathsf {var}({\alpha })} \cup {\mathsf {var}({\beta }))}$. Note that a Σ-label induces a single-head TGD of Lin(Σ), denoted τ(ℓ), which might be the label of an edge in dg(Lin(Σ)); hence the name Σ-label. In fact, τ(ℓ) is the TGD

$$ \natural\text{-}\mathsf{free}(h(\alpha))\ {\rightarrow}\ \exists \bar z {\natural\text{-}\mathsf{free}(h(\beta))}, $$

where h maps each variable of T to ♮, and $\bar z$ are the existentially quantified variables of σ occurring in β. Let us now proceed with the correctness of Algorithm 1. □

Correctness

It is an easy exercise to show the following claim concerning the procedure isEdge(⋅,⋅,⋅), which is described in Algorithm 2. Let dg(Lin(Σ)) = (N, E, λ).

Claim 1

Consider a Σ-label ℓ, and two positions v, u ∈sch(Lin(Σ)). Then:

isEdge(ℓ, v, u) = (1,_) iff (v, u) ∈ E and λ((v, u)) = τ(ℓ).

isEdge(ℓ, v, u) = (1,1) iff (v, u) ∈ E is a special edge.

We can now show that Algorithm 1 accepts on input Σ iff there is a critical cycle in dg(Lin(Σ)) that contains a special edge:

(⇒) Assume first that Algorithm 1 accepts. This implies that there is an accepting computation that guesses a sequence of Σ-labels ℓ₁,…,ℓ_n, and a sequence of positions v₀,…,v_n of pos(sch(Lin(Σ))). By construction, the following hold:

For each i ∈ [n], isEdge(ℓ_i,v_i− 1,v_i) = (1,_), which implies that (v_i− 1,v_i) ∈ E and λ((v_i− 1,v_i)) = τ(ℓ).
There exists i ∈ [n] such that isEdge(ℓ_i,v_i− 1,v_i) = (1,1). Hence, (v_i− 1,v_i) ∈ E is a special edge.
For each i ∈ [n − 1], [τ(ℓ₁),…,τ(ℓ_i− 1)] is compatible with τ(ℓ_i+ 1), which in turn implies that ρ = [τ(ℓ₁),…,τ(ℓ_n)] is active.
v₀ = v_n.
[ρ]^ω+ 1 is active.

The above properties imply that v₀,…,v_n− 1,v₀ is a critical cycle in dg(Lin(Σ)) that contains at least one special edge, and the claim follows.

(⇐) By hypothesis, there exists a critical cycle v₀,…,v_n− 1,v₀ in dg(Lin(Σ)). Assume that, for each i ∈ [n − 1], λ((v_i− 1,v_i)) = σ_i and λ((v_n− 1,v₀)) = σ_n. Observe that each single-head linear TGD σ_i, for i ∈ [n], corresponds to a Σ-label ℓ_i. Consider now the computation of Algorithm 1 on input Σ that guesses the sequence of labels ℓ₁,…,ℓ_n, and the sequence of positions v₀,…,v_n− 1,v₀. It should be clear that this is an accepting computation, and the claim follows.

Space Complexity

It remains to show that the space needed at each step of a computation of Algorithm 1 on input Σ is

$$ O(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$

We proceed to analyze the space needed to store a Σ-label, a position of sch(Lin(Σ)), and a single-head linear TGD τ(ℓ) for a Σ-label ℓ. We also analyze the space for a compatibility check, for constructing a resolvent, and for checking whether [ρ]^ω+ 1 is active. It will be then apparent that indeed the space used at each step of a computation of Algorithm 1 on input Σ is what we claimed above. Note that the latter ensures the termination of Algorithm 1 since we can always force a space-bounded algorithm to terminate and reject after exponentially many steps in the required space [31].

The required space for a Σ-label ℓ = (σ, α, β, T) is
$$ O(\log |{\varSigma}| + \log |\mathsf{sch}({\varSigma})| + \omega \cdot \log m). $$
A TGD σ ∈Σ takes $O(\log |{\varSigma }|)$ space. For storing an atom occurring in Σ we need to store a predicate of sch(Σ), and, in the worst-case, ω variables occurring in Σ, which can be done in space $O(\log |{\mathsf {sch}({\varSigma })}| + \omega \cdot \log m)$. Finally, since T contains at most 2ω variables occurring in Σ, it can be stored in space $O(\omega \cdot \log m)$. Summing up, ℓ requires the space stated above.
A position of sch(Lin(Σ)) takes space
$$ O(\log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega})). $$
This follows from the fact that, by construction, the number of predicates occurring in Lin(Σ) is at most |sch(Σ)|⋅ 2^ω.
Given a Σ-label ℓ, the single-head linear TGD τ(ℓ) takes space
$$ O(\log(|\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$
It is clear that the two predicates occurring in τ(ℓ) require O(|sch(Σ)|⋅ 2^ω) space. We also need to store, in the worst-case, 2ω variables occurring in Σ, which takes $O(\omega \cdot \log m)$ space. Summing up, τ(ℓ) requires the space stated above.
To check whether two single-head linear TGDs σ₁,σ₂, computed during the execution of the algorithm, are compatible, we only need to check that, for each x ∈var(body(σ₂)), either $\mathsf {var}({\mathsf {head}(\sigma _{1})},{\varPi }_{x}^{\sigma _{2}}) \subseteq \mathsf {fr}(\sigma _{1})$, or there is an existentially quantified variable z in σ₁ such that ${\mathsf {var}({{\mathsf {head}(\sigma _{1})}},{\varPi }_{x}^{\sigma _{2}})} = \{z\}$. The latter can be performed using space
$$ O(\omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega})), $$
which is the space needed for storing ${\varPi }_{x}^{\sigma _{2}}$. Note that head(σ₁) and body(σ₂) always unify since, at the point that we perform the compatibility check, we know that they have the same predicate (this has been checked by isEdge), and thus, there is no way for the unification check to fail.
Given two single-head linear TGDs σ₁,σ₂, computed during the execution of the algorithm, that are compatible, constructing [σ₁,σ₂] can be done using space
$$ O(\omega \cdot \log m), $$
which is essentially the space needed for storing mgu(head(σ₁),body(σ₂)).¹⁰
Finally, given the single-head linear TGD ρ, computed after the execution of the repeat-until, checking whether [ρ]^ω+ 1 is active can be done using space
$$ O(\omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m). $$
This easily follows from the analysis performed above.

Summing up, each step of a computation of Algorithm 1 on input Σ takes space

$$ O(\log |{\varSigma}| + \omega \cdot \log(\omega \cdot |\mathsf{sch}({\varSigma})| \cdot 2^{\omega}) + \omega \cdot \log m), $$

and the claim follows.

Having Lemma 4 in place, it is clear that the complement of $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$, and thus $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ itself, is in PSpace, and in NLogSpace for predicates of bounded arity.¹¹

6.2 Upper Bounds for $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$

We proceed to explain how the upper bounds established above for $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ can be transferred to $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$. Since the semi-oblivious chase is a refined version of the oblivious chase, it is not surprising that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {TGD})$ can be reduced to $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {TGD})$. This relies on a very simple construction, known as enrichment [20]. Formally, the enrichment of a set Σ of TGDs, denoted enrichment(Σ), is the set of TGDs obtained by replacing each TGD σ ∈Σ of the form $\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z)$ with the TGD

$$ \phi(\bar x,\bar y) {\rightarrow} \exists \bar z\ \psi(\bar x,\bar z), \text{Aux}_{\sigma}(\bar x, \bar y), $$

where Aux_σ is an auxiliary predicate of arity $(|\bar x|+|\bar y|)$ not occurring in sch(Σ). It is easy to show that the notion of enrichment provides the desired reduction from $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {TGD})$ to $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {TGD})$. More precisely:

Lemma 5

For a set Σ of TGDs, the following are equivalent:

${\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}$.

$\mathsf {enrichment}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$.

Proof

The fact that (2) implies (1) is a consequence of Theorem 6.1 in [20]. We proceed to show that (1) implies (2). Assume that $\mathsf {enrichment}({\varSigma }) \not \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$. Therefore, there exists a database D, and an infinite so-chase derivation δ = (I_i)_i≥ 0 of D w.r.t. enrichment(Σ), where $I_{i} {\langle {{\mathsf {so}},{\sigma }_{i}^{\prime }, {h_{i}}\rangle }, I_{i+1}}$, and, for each i ≥ 0, $\sigma _{i}^{\prime } \in {\mathsf {enrichment}({\varSigma })}$ is the TGD corresponding to some TGD σ_i ∈Σ. We can easily construct an infinite o-chase derivation $\delta ^{\prime } = (J_{i})_{i \ge 0}$ of D w.r.t. Σ: let J₀ = D, and, for each i > 0, J_i is obtained from I_i by removing every atom with a predicate symbol of the form Aux_σ, for some σ ∈Σ, and by replacing every null of the form $\bot ^{z}_{\sigma ^{\prime },h}$ with the null $\bot ^{z}_{\sigma ,h}$. The fact that J_i〈o, σ_i,h_i〉J_i+ 1 and (σ_i,h) is obliviously active, for each i ≥ 0, follows by construction and the observation that, for each TGD $\sigma _{i}^{\prime } \in {\mathsf {enrichment}({\varSigma })}$ for i ≥ 0, ${\mathsf {fr}({\sigma _{i}^{\prime }})}$ coincides with var(body(σ_i)). □

It is also crucial to observe that the class of sticky sets of TGDs is closed under enrichment. In other words:

Lemma 6

For each set ${\varSigma } \in \mathbb {S}$, $\mathsf {enrichment}({\varSigma }) \in \mathbb {S}$.

Proof

Consider a set ${\varSigma } \in \mathbb {S}$. The claim follows from the fact that, for each σ ∈Σ of the form $\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z)$, the corresponding TGD $\sigma ^{\prime } \in {\mathsf {enrichment}({\varSigma })}$ of the form $\phi (\bar x,\bar y) {\rightarrow } \exists \bar z\ \psi (\bar x,\bar z), \text {Aux}_{\sigma }(\bar x, \bar y)$ is such that the atom $\text {Aux}_{\sigma }(\bar x, \bar y)$ contains every body variable of σ, and no TGD in enrichment(Σ) has an atom with predicate Aux_σ in its body. Hence, no variable in enrichment(Σ) is marked. □

From Lemmas 5 and 6, we can conclude that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ can be reduced in logspace to $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$. Since, as shown above, ${\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})$ is in PSpace, and PSpace is closed under logspace reductions, we immediately get that ${\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})$ is in PSpace, as needed. However, in the case of predicates of bounded arity, we cannot immediately inherit the NLogSpace upper bound since the reduction provided by Lemmas 5 and 6 introduces auxiliary predicates of the form Aux_σ, where σ is a TGD, of unbounded arity. Indeed, the arity of this auxiliary predicate is the number of variables occurring in the body of the TGD σ, which can be unbounded even if we use only bounded-arity predicates. Nevertheless, a predicate Aux_σ does not occur in the body of a TGD, which means that it cannot be part of a cycle in the underlying dependency graph. Therefore, we can consider a slightly modified version of Algorithm 1 that simply ignores the atoms that mention a predicate Aux_σ. By relying on this modified algorithm, we get that for a set Σ of TGDs over predicates of bounded arity, checking whether Lin(enrichment(Σ)) is not critically-weakly-acyclic is in NLogSpace, which implies that ${\mathsf {CT}_{\forall }^{{\mathsf {o}}}}(\mathbb {S})$ is in NLogSpace for predicates of bounded arity.

6.3 Lower Bounds

We conclude the proof of Theorem 2 by providing matching lower bounds. We show that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ and $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ are PSpace-hard, and NLogSpace-hard for predicates of bounded arity. Let us start with the general case of unbounded-arity predicates.

Predicates of Unbounded Arity

Since, as discussed above, $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ can be reduced in logspace to $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$, it suffices to show that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ is PSpace-hard, or, equivalently, its complement is PSpace-hard. We show that every problem in PSpace can be reduced in logspace to the complement of $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$. Fix a problem π in PSpace, and let M = (S, Λ, f, s₁) be the deterministic polynomial space Turing machine that solves π, where S = {s₁,…,s_q} is the set of states of M, Λ = {0,1,⊔} is the tape alphabet of M with ⊔ being the blank symbol, $f : S \times {\varLambda } {\rightarrow } (S \times {\varLambda } \times \{\leftarrow ,-,\rightarrow \})$ is the transition function of M, and s₁ ∈ S is the initial state. We assume, w.l.o.g., that M is well-behaved and never tries to read beyond its tape boundaries, always halts, and uses exactly n = m^k tape cells, where k > 0 and m is the size of the input string. We also assume that the machine accepts if it reaches a configuration with state s₂. For the purposes of the current proof, we represent a configuration of M as a string

$$ s,t_{1},c_{1},t_{2},c_{2},\ldots,t_{n},c_{n}, $$

where s ∈ S, (t_i,c_i) ∈Λ×{↑, ♭}, for each i ∈ [n], and there is exactly one i ∈ [n] such that c_i =↑. Such a string encodes the configuration where the state is s, the i-th cell of the machine contains the symbol t_i, and, assuming that c_i =↑, the cursor is on the i-th cell of the machine. For example, the initial configuration of M on input I = a₁,…,a_m is

$$ s_{1},a_{1},\uparrow,a_{2},\flat,\ldots,a_{m},\flat,\underbrace{\sqcup,\flat,\ldots,\sqcup,\flat}_{n-m}. $$

Our goal is to construct a set ${\varSigma } \in \mathbb {S}$ such that M accepts on input I = a₁,…,a_m iff ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}$, or, equivalently, there exists an infinite o-chase derivation of cr(Σ) w.r.t. Σ. The high-level idea is, starting from an atom of the form Start(♮,⊥,♮), where ⊥ is a null, to apply a TGD σ_start and generate the initial configuration of M on input I, which will be stored in a predicate Config. Then, each application of a TGD will mimic a transition rule of f and generate a valid configuration of M. Once an accepting configuration is reached, then an atom of the form $\text {Start}(\bot ,\bot ^{\prime },\natural )$, where $\bot ^{\prime }$ is a null different than ⊥, will be generated, which will trigger again the TGD σ_start. This will give rise to an infinite o-chase derivation of cr(Σ) w.r.t. Σ. To achieve this, however, via a sticky set of TGDs, we need a proper encoding of a configuration of M as a Config-atom generated during the execution of the chase. We proceed to explain this encoding, which will then allow us to define our sticky set of TGDs.

The key idea is to encode a state of M, the symbols of Λ, and the symbols ↑, ♭, as tuples consisting of a null ⊥∈N and a single occurrence of the constant ♮, while the position at which ♮ occurs in this tuple identifies the state or symbol in question. In particular, a state s_i ∈ S, for i ∈ [q], will be represented in the chase as a tuple

$$ \underbrace{\bot,\ldots,\bot}_{i-1},\natural,\underbrace{\bot,\ldots,\bot}_{|S|-i}. $$

The tape alphabet 0,1,⊔ will be represented as

$$ \natural,\bot,\bot \qquad \bot,\natural,\bot \qquad \bot,\bot,\natural, $$

respectively. Finally, the symbols ↑, ♭ will be represented as

$$ \natural,\bot \qquad \bot,\natural, $$

respectively. For example, assuming that I = 1,1,…,1,0, that is, a_i = 1, for each i ∈ [m − 1], and a_m = 0, the initial configuration of M on input I will be represented in the chase via an atom of the form $\text {Config}(\bar t_{1},\bar t_{2},\bar t_{3})$, where

$$ \bar t_1\ =\ \overbrace{\natural,\underbrace{\bot,\ldots,\bot}_{|S|-1}}^{s_1} $$

$$ \bar t_2\ =\ \overbrace{\bot,{\natural},\bot}^{1},\overbrace{{\natural},\bot}^{\uparrow}, \underbrace{\overbrace{\bot,{\natural},\bot}^{1},\overbrace{\bot,{\natural}}^{\flat},\ldots,\overbrace{\bot,{\natural},\bot}^{1},\overbrace{\bot,{\natural}}^{\flat}}_{m-2}, \overbrace{{\natural},\bot,\bot}^{0},\overbrace{\bot,{\natural}}^{\flat}, $$

$$ \bar t_3\ =\ \underbrace{\overbrace{\bot,\bot,\natural}^{\sqcup},\overbrace{\bot,\natural}^{\flat},\ldots,\overbrace{\bot,\bot,\natural}^{\sqcup},\overbrace{\bot,\natural}^{\flat}}_{n-m} $$

We are now ready to devise the desired set Σ of TGDs. For notional convenience, we are going to use the following abbreviations for tuples of variables:

$$ \text{for each } i \in [q],\ \pmb{s_{i}}\ =\ \underbrace{y,\ldots,y}_{i-1},z,\underbrace{y,\ldots,y}_{|S|-i} $$

$$ \pmb{0}\ =\ z,y,y \qquad \pmb{1}\ =\ y,z,y \qquad \pmb{\sqcup}\ = y,y,z \qquad \pmb{\uparrow}\ =\ z,y \qquad \pmb{\flat}\ =\ y,z $$

Observe that the above abbreviations follow exactly the same pattern as the encodings described above. In fact, whenever we need to refer to the state s_i in a TGD, we are going to use the abbreviation s_i, which means that the variable y will be mapped to a null, and the variable z to the constant ♮. Analogously, if we need to refer to a tape symbol or to ↑, ♭, we are going to use the abbreviations 0, 1, ⊔, ↑ and ♭, respectively. The set Σ consists of the following TGDs:

$$ \text{Start}(x,y,z)\ {\rightarrow}\ \text{Config}(\pmb{s_{1}},\pmb{a_{1}}, \pmb{\uparrow},\pmb{a_{2}},\pmb{\flat},\ldots,\pmb{a_{m}},\pmb{\flat},\underbrace{\pmb{\sqcup},\pmb{\flat},\ldots,\pmb{\sqcup},\pmb{\flat}}_{n-m},y,z) $$

which generates the initial configuration of M on input I = a₁,…,a_m. The rest of the TGDs are responsible for simulating the transition function of M.

Before defining those TGDs, let us introduce some additional abbreviations. Let

$$ \text{for each } i \in [n],\ \pmb{x_{i}}\ =\ {x}_{i}^{0},{x}_{i}^{1},{x}_{i}^{\sqcup} \quad \text{and} \quad \pmb{{x}_{i}^{c}}\ =\ {x}_{i}^{\uparrow},{x}_{i}^{\flat} $$

which will be used as placeholders for the content of the i-th cell, and the position of the cursor. In fact, the above abbreviations will simply help us to copy the unchanged cells from a configuration to its subsequent one. In addition, for t ∈ S ∪Λ∪{↑, ♭}, we write $\hat {\pmb {t}}$ for the tuple of variables obtained from the tuple t after replacing each occurrence of the variable y at some position i with the new variable ${y}_{t}^{i}$, while the variable z remains unchanged. More precisely:

$$ \text{for each } i \in [q],\ \hat{\pmb{s_{i}}}\ =\ {y}_{s_{i}}^{1},\ldots,{y}_{s_{i}}^{i-1},z,{y}_{s_{i}}^{i+1},\ldots,{y}_{s_{i}}^{q} $$

$$ \hat{\pmb{0}} = z,{y}_{0}^{1},{y}_{0}^{2} \qquad \hat{\pmb{1}} = {y}_{1}^{1},z,{y}_{1}^{2} \qquad \hat{\pmb{\sqcup}} = {y}_{\sqcup}^{1},{y}_{\sqcup}^{2},z \qquad \hat{\pmb{\uparrow}} = z,{y}_{\uparrow}^{1} \qquad \hat{\pmb{\flat}} = {y}_{\flat}^{1},z $$

The above abbreviations will allow us to break some unnecessary joins over the variable y in the bodies of the TGDs, which in turn will ensure that the obtained set of TGDs is sticky. We proceed to introduce the rest of the TGDs.

We first introduce several TGDs that simulate the transition rules of f where the cursor moves to the left. In particular, for each transition rule $f(s_{i},a) = (s_{j},b,\leftarrow )$, and for each ℓ ∈ [n], we have the TGD

$$ \begin{array}{@{}rcl@{}} \text{Config}({\hat{\pmb{s_i}}},{\pmb{x_1}},{\pmb{x_1^c}}, \ldots,{\pmb{x_{\ell-1}}},{\pmb{x_{\ell-1}^c}}, {\hat{\pmb{a}}},{\hat{\pmb{\uparrow}}}, {\pmb{x_{\ell+1}}},{\pmb{x_{\ell+1}^c}},\ldots,{\pmb{x_{n}}},{\pmb{x_{n}^c}},y,z)\ \rightarrow\\ \text{Config}({\pmb{s_j}}, {\pmb{x_1}},{\pmb{x_1^c}},\ldots,{\pmb{x_{\ell-1}}},{\pmb{\uparrow}},{\pmb{b}},{\pmb{\flat}},{\pmb{x_{\ell+1}}},{\pmb{x_{\ell+1}^c}},\ldots,{\pmb{x_{n}}},{\pmb{x_{n}^c}},y,z). \end{array} $$

Analogously, we have TGDs that simulate the transition rules of f where the cursor moves to the right. In particular, for each transition rule $f(s_{i},a) = (s_{j},b,\rightarrow )$, and for each ℓ ∈ [n], we have the TGD

$$ \begin{array}{@{}rcl@{}} &&\text{Config}(\hat{\pmb{s_{i}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}}, \ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}}, \hat{\pmb{a}},\hat{\pmb{\uparrow}}, \pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z)\ {\rightarrow}\\ &&\quad\quad\text{Config}(\pmb{s_{j}}, \pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell+1}^{c}},\pmb{b},\pmb{\flat},\pmb{x_{\ell+1}},\pmb{\uparrow},\pmb{x_{\ell+2}},\pmb{x_{\ell+2}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z). \end{array} $$

We complete the simulation of the transition function by adding TGDs that simulate the transition rules of f where the cursor stays at the same position. In particular, for each transition rule f(s_i,a) = (s_j,b,−), and for each ℓ ∈ [n], we have the TGD

$$ \begin{array}{@{}rcl@{}} &&\text{Config}(\hat{\pmb{s_{i}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}}, \ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}}, \hat{\pmb{a}},\hat{\pmb{\uparrow}}, \pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z)\ {\rightarrow}\\ &&\quad\quad\quad\quad\text{Config}(\pmb{s_{j}}, \pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{\ell-1}},\pmb{x_{\ell-1}^{c}},\pmb{b},\pmb{\uparrow},\pmb{x_{\ell+1}},\pmb{x_{\ell+1}^{c}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z). \end{array} $$

Finally, we introduce a TGD that checks whether an accepting configuration has been reached; recall that, by assumption, the machine accepts if it reaches a configuration with state s₂:

$$ \text{Config}(\hat{\pmb{s_{2}}},\pmb{x_{1}},\pmb{{x_{1}^{c}}},\ldots,\pmb{x_{n}},\pmb{{x_{n}^{c}}},y,z) {\rightarrow} \exists w \text{Start}(y,w,z). $$

Observe that, if an accepting configuration is reached, then a new atom of the form Start(_,⊥,♮) is obtained, where ⊥ is a new null value, which will trigger again the first TGD that generates the initial configuration. This completes the definition of Σ.

It is now easy to see that M accepts on input I iff there exists an infinite o-chase derivation of cr(Σ) w.r.t. Σ, or, equivalently, ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{{\mathsf {o}}}$. Let us remark that starting from cr(Σ), the first TGD that will be applied is the one with the Start predicate in the head, and an atom α of the form Start(♮,⊥,♮), where ⊥ is a null, will be generated. Then, α will trigger the TGD that generates the initial configuration of M on input I, which will then guarantee a faithful simulation of the computation of M on I. Moreover, it is easy to verify that ${\varSigma } \in \mathbb {S}$. The key reason why this holds is because the only variable that occurs more than once in the body of a TGD is z, which is always propagated to the head at the same position, which is actually the last one.

Predicates of Bounded Arity.

By Lemmas 5 and 6, and the definition of enrichment, for every set ${\varSigma } \in \mathbb {S}$, it holds that:

${\varSigma } \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}$ iff $\mathsf {enrichment}({\varSigma }) \in {\mathbb {CT}}_{\forall }^{\mathsf {so}}$.
$\mathsf {enrichment}({\varSigma }) \in \mathbb {S}$.
$\mathsf {ar}(\mathsf {enrichment}({\varSigma })) = \max \limits \{\mathsf {ar}({\varSigma }),k\}$, where k ≥ 0 is the maximum number of variables occurring in the body of a TGD of Σ.

Therefore, to show that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ and $\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})$ are NLogSpace-hard for predicates of bounded arity, it suffices to concentrate on ${\mathbb {CT}}_{\forall }^{\mathsf {o}}$ and provide a hardness result even for TGDs that can have at most k body variables for some fixed integer k ≥ 0. This can be easily shown via a logspace reduction from graph reachability. Consider a directed graph G = (N, E), and two nodes s, t ∈ N. The problem of deciding whether in G the node t is reachable from s is NLogSpace-hard. We are going to construct a sticky set Σ of TGDs that uses only unary predicates, and each TGD has only one body variable, such that t is reachable from s in G iff ${\varSigma } \not \in {\mathbb {CT}_{\forall }^{\mathsf {o}}}$. This in turn implies that $\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})$ is NLogSpace-hard since coNLogSpace = NLogSpace. The set Σ consists of the following TGDs:

$$ \begin{array}{@{}rcl@{}} \text{ Loop}(x) &\rightarrow& P_s(x)\\ P_v(x) &\rightarrow& P_u(x), \text{ for each } (v,u) \in E\\ P_t(x) &\rightarrow& \exists z \text{ Loop}(z). \end{array} $$

It is easy to see that indeed t is reachable from s iff ${\varSigma } \not \in {\mathbb {CT}}_{\forall }^{\mathsf {o}}$, and the claim follows.

7 Discussion and Future Work

We have studied all-instances (semi-)oblivious chase termination for sticky sets of TGDs, and provide precise complexity results. It turned out that our results and techniques allow us to obtain further complexity results concerning the chase procedure.

Further Results

Interestingly, our main result has already found applications in the context of (semi-)oblivious chase boundedness, which has been recently studied in [7]. Roughly, chase boundedness guarantees, not only the termination of the chase procedure, but also the existence of a uniform bound on the depth of the chase. As shown in [7], in the case of sticky sets of TGDs, (semi-)oblivious chase termination and (semi-)oblivious chase boundedness coincide. Therefore, from the main result of our work, we immediately get the complexity of deciding (semi-)oblivious chase boundedness in the case of sticky sets of TGDs.

Moreover, the techniques that have been developed in this work can also be applied to shy sets of TGDs, another well-known TGD-based formalism [25, 26]. Shy sets of TGDs are not powerful enough to join over null values that have been invented during the execution of the chase. This central property of shy sets of TGDs allows us to reuse the machinery developed for stickiness, and show that the (semi-)oblivious chase termination problem for shy sets of TGDs is PSpace-complete in general, and NLogSpace-complete for predicates of bounded arity.

Future Work

Although all-instances (semi-)oblivious chase termination for sticky sets of TGDs is well-understood, there are still interesting and highly non-trivial open questions that we are planning to address in the near future:

Whenever the (semi-)oblivious chase terminates under a sticky set of TGDs, what is the exact size of the final result?

Even if the (semi-)oblivious chase does not terminate, it might be the case that a finite universal model exists [15]. Can we decide, in the case of sticky sets of TGDs, whether a finite universal model exists? And what about its size?

Acknowledgements

This work has been funded by the EPSRC grant EP/S003800/1 “EQUID”, and the EPSRC programme grant EP/M025268/ “VADA”.

Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Vorheriger Artikel Index-based, High-dimensional, Cosine Threshold Querying with Optimality Guarantees

Nächster Artikel Consistent Query Answering for Primary Keys in Datalog

We work with finite sets of TGDs only. Thus, in the rest of the paper, a set of TGDs is always finite.

Recall that Σ is in normal form, which means that head(σ) is a single atom. By abuse of notation, we treat so-result(σ, h) as an atom instead of a singleton set.

Although we assume that Σ is in normal form, for the sake of readability, in our example we use TGDs with more than one atom in the head. Moreover, instead of following the naming scheme for nulls introduced in Definition 1, we are going to use abbreviated names of the form ⊥_i, where i > 0.

Since we focus on TGDs with only one head atom, and since we apply only active triggers, given an atom $\alpha \in \mathcal {I} \setminus {\mathsf {cr}({\varSigma })}$ that is the birth atom of a null, we can extract the trigger that generates α, and therefore, we can unambiguously refer to its frontier.

We assume that there exists some fixed mechanism that selects t. For example, t can be the lexicographically first element of $\{t^{\prime } \mid \mathsf {rank}_{\delta }(t^{\prime }) = \mathsf {rank}(\bot ) - 1 \text { and } t^{\prime } \vartriangleright _{\delta } \bot \}$.

König’s Lemma is a well-known result from graph theory, which states that an infinite rooted tree with finite out-degree has an infinite directed simple path starting from the root.

Recall that, although we assume TGDs in normal form, for the sake of readability, in Example 4 (and thus, also in Example 5), we use TGDs with more than one atom in the head. This is why the set of atoms H is considered. Notice that H is empty in case of single-head TGDs.

Let us clarify that in this section, unlike Section 4, we consider arbitrary TGDs not in normal form. In the previous section, we assumed TGDs in normal form only for technical reasons.

Let us clarify that [17] does not consider the semi-oblivious, but the restricted (a.k.a. the standard) version of the chase. However, the exact same proof applies in the case of the semi-oblivious chase.

For constructing the MGU, one can use a simplified version of Robinson’s unification algorithm that does not consider functions symbols.

Recall that by the Immerman-Szelepcsényi Theorem, coNLogSpace = NLogSpace.

Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading, MA (1995)MATH

Aho, A.V., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. Database Syst. 4(4), 435–454 (1979)CrossRef

Baget, J., Mugnier, M., Rudolph, S., Thomazo, M.: Walking the Complexity Lines for Generalized Guarded Existential Rules. In: IJCAI, pp 712–717 (2011)

Baget, J.F., Leclère, M., Mugnier, M.L., Salvat, E.: On rules with existential variables: walking the decidability line. Artif. Intell. 175 (9-10), 1620–1654 (2011)MathSciNetCrossRef

Beeri, C., Vardi, M.Y.: A proof procedure for data dependencies. J. ACM 31(4), 718–741 (1984)MathSciNetCrossRef

Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., Tsamoura, E.: Benchmarking the Chase. In: PODS, pp 37–52 (2017)

Bourhis, P., Leclėre, M., Mugnier, M., Tison, S., Ulliana, F., Gallois, L.: Oblivious and Semi-Oblivious Boundedness for Existential Rules. In: IJCAI, pp 1581–1587 (2019)

Calautti, M., Gottlob, G., Pieris, A.: Chase Termination for Guarded Existential Rules. In: PODS, pp 91–103 (2015)

Calautti, M., Greco, S., Molinaro, C., Trubitsyna, I.: Exploiting equality generating dependencies in checking chase termination. PVLDB 9(5), 396–407 (2016)

10.

Calautti, M., Pieris, A.: Oblivious Chase Termination: The Sticky Case. In: ICDT, pp 17:1–17:18 (2019)

11.

Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)MathSciNetCrossRef

12.

Calì, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012)CrossRef

13.

Calì, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: the query answering problem. Artif. Intell. 193, 87–128 (2012)MathSciNetCrossRef

14.

Cuenca Grau, B., Horrocks, I., Krȯtzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity notions for existential rules and their application to query answering in ontologies. J. Artif. Intell. Res. 47, 741–808 (2013)MathSciNetCrossRef

15.

Deutsch, A., Nash, A., Remmel, J.B.: The Chase Revisisted. In: PODS, pp 149–158 (2008)

16.

Deutsch, A., Tannen, V.: Reformulation of XML Queries and Constraints. In: ICDT, pp 225–241 (2003)

17.

Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRef

18.

Gogacz, T., Marcinkowski, J.: All-Instances Termination of Chase is Undecidable. In: ICALP, pp 293–304 (2014)

19.

Gogacz, T., Marcinkowski, J., Pieris, A.: All-Instances Restricted Chase Termination. In: PODS. To appear (2020)

20.

Grahne, G., Onet, A.: Anatomy of the chase. Fundam. Inform. 157(3), 221–270 (2018)MathSciNetCrossRef

21.

Greco, S., Spezzano, F., Trubitsyna, I.: Stratification criteria and rewriting techniques for checking chase termination. PVLDB 4(11), 1158–1168 (2011)

22.

Krötzsch, M., Marx, M., Rudolph, S.: The Power of the Terminating Chase (Invited Talk). In: ICDT, pp 3:1–3:17 (2019)

23.

Krötzsch, M., Rudolph, S.: Extending Decidable Existential Rules by Joining Acyclicity and Guardedness. In: IJCAI, pp 963–968 (2011)

24.

Leclère, M., Mugnier, M., Thomazo, M., Ulliana, F.: A Single Approach to Decide Chase Termination on Linear Existential Rules. In: ICDT, pp 18:1–18:19 (2019)

25.

Leone, N., Manna, M., Terracina, G., Veltri, P.: Efficiently Computable Datalog∃ Programs. In: KR (2012)

26.

Leone, N., Manna, M., Terracina, G., Veltri, P.: Fast query answering over existential rules. ACM Trans. Comput. Log. 20(2), 12:1–12:48 (2019)MathSciNetCrossRef

27.

Maier, D., Mendelzon, A.O., Sagiv, Y.: Testing implications of data dependencies. ACM Trans Database Syst. 4(4), 455–469 (1979)CrossRef

28.

Marnette, B.: Generalized Schema-Mappings: from Termination to Tractability. In: PODS, pp 13–22 (2009)

29.

Meier, M., Schmidt, M., Lausen, G.: On chase termination beyond stratification. PVLDB 2(1), 970–981 (2009)

30.

Nenov, Y., Piro, R., Motik, B., Horrocks, I., Wu, Z., Banerjee, J.: Rdfox: a Highly-Scalable RDF Store. In: ISWC, pp 3–20 (2015)

31.

Papadimitriou, C.H.: Computational complexity. Addison-Wesley, Reading, MA (1994)MATH

32.

Rudolph, S., Krötzsch, M., Hitzler, P.: All Elephants are Bigger than All Mice. In: DL (2008)

33.

Urbani, J., Krötzsch, M., Jacobs, C.J.H., Dragoste, I., Carral, D.: Efficient Model Construction for Horn Logic with Vlog - System Description. In: IJCAR, pp 680–688 (2018)

Titel: Semi-Oblivious Chase Termination: The Sticky Case
verfasst von: Marco Calautti
Andreas Pieris
Publikationsdatum: 17.08.2020
Verlag: Springer US
Erschienen in: Theory of Computing Systems / Ausgabe 1/2021
Print ISSN: 1432-4350
Elektronische ISSN: 1433-0490
DOI: https://doi.org/10.1007/s00224-020-09994-5

Springer Professional

Semi-Oblivious Chase Termination: The Sticky Case

Abstract

Publisher’s Note

1 Introduction

1.1 The Chase in a Nutshell

1.2 Restricted vs. (Semi-)Oblivious Chase

1.3 The Challenge of Non-Termination

1.4 Deciding the Termination of the Chase

1.5 Summary of Contributions

2 Preliminaries

3 The Chase Procedure

3.1 Chase Termination Problem

3.2 Some Useful Results

3.3 Our Main Result and Plan of Attack

4 Semantic Characterization of Semi-Oblivious Chase Non-Termination

4.1 Existence of a Continuous Path

4.2 From Arbitrary to Linear Infinite Derivations

4.2.1 Semi-Formal Description

4.2.2 The Formal Construction

5 Graph-Based Characterization of Semi-Oblivious Chase Termination

5.1 Critically-Weakly-Acyclic Linear TGDs

5.2 From Sticky to Linear TGDs

6 Complexity Analysis

6.1 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\)

6.2 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\)

6.3 Lower Bounds

7 Discussion and Future Work

Acknowledgements

Publisher’s Note

Springer Professional

Abstract

Publisher’s Note

1 Introduction

1.1 The Chase in a Nutshell

1.2 Restricted vs. (Semi-)Oblivious Chase

1.3 The Challenge of Non-Termination

1.4 Deciding the Termination of the Chase

1.5 Summary of Contributions

2 Preliminaries

3 The Chase Procedure

3.1 Chase Termination Problem

3.2 Some Useful Results

3.3 Our Main Result and Plan of Attack

4 Semantic Characterization of Semi-Oblivious Chase Non-Termination

4.1 Existence of a Continuous Path

4.2 From Arbitrary to Linear Infinite Derivations

4.2.1 Semi-Formal Description

4.2.2 The Formal Construction

5 Graph-Based Characterization of Semi-Oblivious Chase Termination

5.1 Critically-Weakly-Acyclic Linear TGDs

5.2 From Sticky to Linear TGDs

6 Complexity Analysis

6.1 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {so}}(\mathbb {S})\)

6.2 Upper Bounds for \(\mathsf {CT}_{\forall }^{\mathsf {o}}(\mathbb {S})\)

6.3 Lower Bounds

7 Discussion and Future Work

Acknowledgements

Publisher’s Note

Weitere Artikel der Ausgabe 1/2021

Consistent Query Answering for Primary Keys in Datalog

Correction to: On the Expressive Power of Linear Algebra on Graphs

Special Issue on Database Theory

Characterizing Tractability of Simple Well-Designed Pattern Trees with Projection

Index-based, High-dimensional, Cosine Threshold Querying with Optimality Guarantees

On the Expressive Power of Linear Algebra on Graphs