Top

2011 | Book

Read chapter Read first chapter

Datalog Reloaded

First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers

Editors: Oege de Moor, Georg Gottlob, Tim Furche, Andrew Sellers

Publisher: Springer Berlin Heidelberg

Book Series : Lecture Notes in Computer Science

Part of: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

About this book

This book constitutes the thoroughly refereed post-workshop proceedings of the First International Workshop on Datalog 2.0, held in Oxford, UK, in March 2010. The 22 revised full papers presented were carefully selected during two rounds of reviewing and improvements from numerous submissions. The papers showcase the state-of-the-art in theory and systems for datalog, divided in three sections: Properties, applications, and extensions of datalog.

Frontmatter

Section 1: Theoretical Aspects of Datalog

Datalog-Based Program Analysis with BES and RWL

Abstract

This paper describes two techniques for Datalog query evaluation and their application to object-oriented program analysis. The first technique transforms Datalog programs into an implicit Boolean Equation System (Bes) that can then be solved by using linear-time complexity algorithms that are available in existing, general purpose verification toolboxes such as Cadp. In order to improve scalability and to enable analyses involving advanced meta-programming features, we develop a second methodology that transforms Datalog programs into rewriting logic (Rwl) theories. This method takes advantage of the preeminent features and facilities that are available within the high-performance system Maude, which provides a very efficient implementation of Rwl. We provide evidence of the practicality of both approaches by reporting on some experiments with a number of real-world Datalog-based analyses.

María Alpuente, Marco Antonio Feliú, Christophe Joubert, Alicia Villanueva

Datalog for Security, Privacy and Trust

Abstract

Logic-based policy languages are appreciated because of their clean semantics and expressiveness. Datalog has been used for a long time as a foundation of many security models and policy languages. Recently, Description Logics (DLs for short) have been adopted as policy languages, too. In this paper we carry out a comparison of Datalog and Description Logics as policy languages, based both on expressiveness analysis and on an assessment of the current maturity of the two fields, expressly related to the representation and reasoning tasks involved in policy authoring, enforcement, and management. We shall argue that Datalog-based approaches are currently more powerful and mature than those based on pure DLs, although the ongoing research on the latter might change the picture in a near future. The potential of hybrid approaches will be briefly discussed.

Piero A. Bonatti

Answer Set Modules for Logical Agents

Abstract

Various approaches exist to the application of Answer Set Programming (ASP) in the agent realm. Nonetheless, a controversial point is how to combine answer set modules with the other modules an agent is composed of, considering that an agent can be seen as a set of “capabilities” that in suitable combination produce the overall agent behavior as an emergent behavior. In this paper, we outline a possible fruitful integration of ASP into many agent architectures, by introducing two kinds of modules: one that allows for complex reaction, the other one that allows for reasoning about necessity and possibility.

Stefania Costantini

First-Order Encodings for Modular Nonmonotonic Datalog Programs

Abstract

Recently Modular Nonmonotonic Logic Programs (MLP) have been introduced which incorporate a call-by-value mechanism and allow for unrestricted calls between modules, including mutual and self recursion, as an approach to provide module constructs akin to those in conventional programming in Nonmonotonic Logic Programming under Answer Set Semantics. This paper considers MLPs in a Datalog setting and provides characterizations of their answers sets in terms of classical (Herbrand) models of a first-order formula, extending a line of research for ordinary logic programs. To this end, we lift the well-known loop formulas method to MLPs, and we also consider the recent ordered completion approach that avoids explicit construction of loop formulas using auxiliary predicates. Independent of computational perspectives, the novel characterizations widen our understanding of MLPs and they may prove useful for semantic investigations.

Minh Dao-Tran, Thomas Eiter, Michael Fink, Thomas Krennwallner

Datalog Programs and Their Stable Models

Abstract

This paper is about the functionality of software systems used in answer set programming (ASP). ASP languages are viewed here, in the spirit of Datalog, as mechanisms for characterizing intensional (output) predicates in terms of extensional (input) predicates. Our approach to the semantics of ASP programs is based on the concept of a stable model defined in terms of a modification of parallel circumscription.

Vladimir Lifschitz

Exploiting Bounded Treewidth with Datalog (A Survey)

Abstract

Many intractable problems have been shown to become tractable if the treewidth of the underlying structure is bounded by a constant. An important tool for deriving such results is Courcelle’s Theorem, which states that all properties definable by Monadic Second Order (MSO) sentences are fixed-parameter tractable with respect to the treewidth. In principle, algorithms can be generated automatically from the MSO definition of a problem by exploiting the correspondence between MSO and finite tree automata (FTA). However, this approach has turned out to be problematic, since even relatively simple MSO formulae may lead to a ”state explosion” of the FTA.

Recently, monadic datalog (i.e., datalog where all intensional predicate symbols are unary) has been proposed as an alternative method to tackle this class of fixed-parameter tractable problems. On the one hand, if some property of finite structures is expressible in MSO then this property can also be expressed by means of a monadic datalog program. Moreover, the resulting fragment of datalog can be evaluated in linear time (both with respect to the program size and with respect to the data size). In this survey, we present the main ideas of this approach and its extension to counting and enumeration problems.

Reinhard Pichler

Equivalence between Extended Datalog Programs — A Brief Survey

Abstract

This paper gives a brief overview about the research field on equivalences in Answer-Set Programming. More precisely, we are concerned here with disjunctive logic programs under the stable-model semantics. Such programs can be understood as extended datalog queries (i.e., datalog augmented by default negation and disjunction). In particular, we shall report on characterizations and complexity results for the notions of strong and respectively uniform equivalence. Most notably, uniform equivalence becomes undecidable in the presence of default negation, while strong equivalence remains decidable for full disjunctive datalog. We also consider a restricted setting where the arity of predicates is bounded by a fixed constant.

Stefan Woltran

Section 2: Applications of Datalog

Cluster Computing, Recursion and Datalog

Abstract

The cluster-computing environment typified by Hadoop, the open-source implementation of map-reduce, is receiving serious attention as the way to execute queries and other operations on very large-scale data. Datalog execution presents several unusual issues for this enviroment. We discuss the best way to execute a round of seminaive evaluation on a computing cluster using the map-reduce. Using transitive closure as an example, we examine the cost of executing recursions in several different ways. Recursive processes such as evaluation of a recursive Datalog program do not fit the key map-reduce assumption that tasks deliver output only when they are completed. As a result, the resilience under compute-node failure that is a key element of the map-reduce framework is not supported for recursive programs. We discuss extensions to this framework that are suitable for executing recursive Datalog programs on very large-scale data in a way that allows progress to continue after node failures, without restarting the entire job.

Foto N. Afrati, Vinayak Borkar, Michael Carey, Neoklis Polyzotis, Jeffrey D. Ullman

Datalog-Related Aspects in Lixto Visual Developer

Abstract

Lixto Visual Developer is an integrated development environment specifically geared towards the visual development of Web data extraction programs, supporting complex navigation and extraction tasks on highly dynamic Web applications. Internally, created extraction rules are reflected in a declarative extraction language called Elog, which relies on a datalog syntax and semantics. It is ideally suited for representing and successively incrementing the knowledge about patterns described by application designers. In this paper, we illustrate aspects of the Visual Developer and the Elog language exploiting some examples.

Robert Baumgartner

Informing Datalog through Language Intelligence – A Personal Perspective

Abstract

Despite AI’s paramount aim of developing convincing similes of true natural language “understanding”, crucial knowledge that is increasingly becoming available to computers in text form on web repositories remains in fact decipherable only by humans. In this position paper, we present our views on the reasons for this failure, and we argue that for bringing computers closer to becoming true extensions of the human brain, we need to endow them with a cognitively-informed web by integrating new methodologies in the inter-disciplines involved, around the pivot of Logic Programming and Datalog.

Veronica Dahl

Dyna: Extending Datalog for Modern AI

Abstract

Modern statistical AI systems are quite large and complex; this interferes with research, development, and education. We point out that most of the computation involves database-like queries and updates on complex views of the data. Specifically, recursive queries look up and aggregate relevant or potentially relevant values. If the results of these queries are memoized for reuse, the memos may need to be updated through change propagation. We propose a declarative language, which generalizes Datalog, to support this work in a generic way. Through examples, we show that a broad spectrum of AI algorithms can be concisely captured by writing down systems of equations in our notation. Many strategies could be used to actually solve those systems. Our examples motivate certain extensions to Datalog, which are connected to functional and object-oriented programming paradigms.

Jason Eisner, Nathaniel W. Filardo

Datalog for the Web 2.0: The Case of Social Network Data Management

Abstract

The clean representation of recursive queries enabled by Datalog makes it a strong candidate to be used as the reference query language for social network data management. In this extended abstract we try to identify the capabilities that should be provided by a language for the manipulation of social data.

Matteo Magnani, Danilo Montesi

Context Modelling and Context-Aware Querying

(Can Datalog Be of Help?)

Abstract

Many interpretations of the notion of context have emerged in various fields and context-aware systems are pervading everyday life, becoming an expanding research field. Context has often a significant impact on the way humans (or machines) act, and on how they interpret things; furthermore, a change in context causes a transformation in the experience that is going to be lived. Accordingly, while the computer science community has initially perceived the context simply as a matter of user time and location, in the last few years this notion has been considered not simply as a state, but as part of a process in which users are involved; thus, sophisticated and general context models and systems have been proposed to support context-aware applications. In this paper we propose a foundational framework for the life-cycle of context-aware system, in which the system design and management activities consider context as an orthogonal, first-class citizen. In doing so, we present a Datalog-based formulation for the definition of context-aware databases.

Giorgio Orsi, Letizia Tanca

Using Datalog for Fast and Easy Program Analysis

Abstract

Our recent work introduced the Doop framework for points-to analysis of Java programs. Although Datalog has been used for points-to analyses before, Doop is the first implementation to express full end-to-end context-sensitive analyses in Datalog. This includes key elements such as call-graph construction as well as the logic dealing with various semantic complexities of the Java language (native methods, reflection, threading, etc.).

The findings from the Doop research effort have been surprising. We set out to create a framework that would be highly complete and elegant without sacrificing performance “too much”. By the time Doop reached maturity, it was a full order-of-magnitude faster than Lhoták and Hendren’s Paddle—the state-of-the-art framework for context-sensitive points-to analyses. For the exact same logical points-to definitions (and, consequently, identical precision) Doop is more than 15x faster than Paddle for a 1-call-site sensitive analysis, with lower but still substantial speedups for other important analyses. Additionally, Doop scales to very precise analyses that are impossible with prior frameworks, directly addressing open problems in past literature. Finally, our implementation is modular and can be easily configured to analyses with a wide range of characteristics, largely due to its declarativeness.

Although this performance difference is largely attributable to architectural choices (e.g., the use of an explicit representation vs. BDDs), we believe that our ability to efficiently optimize our implementation was largely due to the declarative specifications of analyses. Working at the Datalog level eliminated much of the artificial complexity of a points-to analysis implementation, allowing us to concentrate on indexing optimizations and on the algorithmic essence of each analysis.

Yannis Smaragdakis, Martin Bravenboer

Section 3: New Languages Extending Datalog

Distributed Datalog Revisited

Introduction

The emergence of Web 2.0 and social network applications has enabled more and more users to share sensitive information over the Web. The information they manipulate has many facets: personal data (e.g., pictures, movies, music, contacts, emails), social data (e.g., annotations, recommendations, contacts), localization information (e.g., bookmarks), access information (e.g., login, keys), web services (e.g., legacy data, search engines), access rights, ontologies, beliefs, time and provenance information, etc. The tasks they perform are very diverse: search, query, update, authentication, data extraction, etc. We believe that all this should be viewed in the holistic context of the management of a distributed knowledge base. Furthermore, we believe that datalog (and its extensions) forms the sound formal basis for representing such information and supporting these tasks. In this paper, we revisit datalog with this goal in mind. The focus of the presentation is on the formal extension of the model of distributed datalog and does not consider the implementation or the evaluation of the corresponding system [8].

Serge Abiteboul, Meghyn Bienvenu, Alban Galland, Marie-Christine Rousset

Dedalus: Datalog in Time and Space

Abstract

Recent research has explored using Datalog-based languages to express a distributed system as a set of logical invariants. Two properties of distributed systems proved difficult to model in Datalog. First, the state of any such system evolves with its execution. Second, deductions in these systems may be arbitrarily delayed, dropped, or reordered by the unreliable network links they must traverse. Previous efforts addressed the former by extending Datalog to include updates, key constraints, persistence and events, and the latter by assuming ordered and reliable delivery while ignoring delay. These details have a semantics outside Datalog, which increases the complexity of the language and its interpretation, and forces programmers to think operationally. We argue that the missing component from these previous languages is a notion of time.

In this paper we present Dedalus, a foundation language for programming and reasoning about distributed systems. Dedalus reduces to a subset of Datalog with negation, aggregate functions, successor and choice, and adds an explicit notion of logical time to the language. We show that Dedalus provides a declarative foundation for the two signature features of distributed systems: mutable state, and asynchronous processing and communication. Given these two features, we address two important properties of programs in a domain-specific manner: a notion of safety appropriate to non-terminating computations, and stratified monotonic reasoning with negation over time. We also provide conservative syntactic checks for our temporal notions of safety and stratification. Our experience implementing full-featured systems in variants of Datalog suggests that Dedalus is well-suited to the specification of rich distributed services and protocols, and provides both cleaner semantics and richer tests of correctness.

Peter Alvaro, William R. Marczak, Neil Conway, Joseph M. Hellerstein, David Maier, Russell Sears

The Disjunctive Datalog System DLV

Abstract

DLV is one of the most successful and widely used answer set programming (ASP) systems. It supports a powerful language extending Disjunctive Datalog with many expressive constructs, including aggregates, strong and weak constraints, functions, lists, and sets. The system provides database connectivity offering a simple way for powerful reasoning on top of relational databases. In this paper, we provide an ample overview of the DLV system. We illustrate its input language and give indications on how to use it for representing knowledge. We also provide a panorama on the system architecture and the main optimizations it incorporates. We then focus on DLV^DB, an extension of the basic system which allows for tight coupling with traditional database systems. Finally, we report on a number industrial applications which rely on DLV.

Mario Alviano, Wolfgang Faber, Nicola Leone, Simona Perri, Gerald Pfeifer, Giorgio Terracina

Datalog as a Query Language for Data Exchange Systems

Abstract

The class of unions of conjunctive queries (UCQ) has been shown to be particularly well-behaved for data exchange; its certain answers can be computed in polynomial time (in terms of data complexity). However, this is not the only class with this property; the certain answers to any Datalog program can also can be computed in polynomial time. The problem is that both UCQ and Datalog do not allow for negated atoms, while most database query languages are equipped with negation. Unfortunately, adding an unrestricted form of negation to these languages yields to intractability of the problem of computing certain answers.

In order to face this challenge, we have recently proposed a language, called Datalog ^C( ≠ ) [5], that extends Datalog with a restricted form of negation while keeping the good properties of Datalog, and UCQ, for data exchange. In this article, we provide evidence in favor of the use of Datalog ^C( ≠ ) as a query language for data exchange systems. More precisely, we introduce the syntax and semantics of Datalog ^C( ≠ ), we present some of the fundamental results about this language shown in [5], and we extend those results to the case of data exchange settings that allow for constraints in the target schema. All of these results provide justification for the use of Datalog ^C( ≠ ) in practice.

Marcelo Arenas, Pablo Barceló, Juan Reutter

Datalog Relaunched: Simulation Unification and Value Invention

Abstract

For reasoning on the Web, Datalog is lacking data extraction and value invention. This article proposes to overcome these limitations with “simulation unification” and “RDFLog”.

Simulation unification is a non-standard unification inspired from regular path queries. Like standard unification, it yields bindings for variables in both terms to unify. Unlike standard unification, it does not try to make the two terms identical but instead to embed the query into the data. Simulation unification is decidable. Without variables, it has polynomial complexity. With variables it is, like standard unification, np-complete. We identify a number of interesting special cases of unification, e.g., in presence or absence of term injectivity. In particular, we show that simulation unification without term injectivity on tree data is linear and in presence of injectivity it is still polynomial even on unordered trees in contrast to the np-complete unordered tree inclusion problem.

RDFLog is Datalog with arbitrary quantifier alternation: Blank nodes, i.e., existentially quantified variables, in rule heads may be governed by universally quantified variables, universally quantified variables by blank nodes. RDFLog’s declarative semantics is defined in terms of RDF entailment; its sound and complete operational semantics, in terms of Skolemization, standard Datalog evaluation, and un-Skolemization. We show that RDFLog limited to ∀ ^* ∃ ^* prefixes is (up to unique helper predicates) equivalent to RDFLog with full quantifier alternation. A light-weight implementation points to the efficiency of the approach.

François Bry, Tim Furche, Clemens Ley, Bruno Marnette, Benedikt Linse, Sebastian Schaffert

Datalog+/-: A Family of Languages for Ontology Querying

Abstract

In ontology-based data access, an extensional database is enhanced by an ontology that generates new intensional knowledge which has to be considered when answering queries. In this setting, tractable data complexity (i.e., complexity w.r.t. the data only) of query answering is crucial, given the need to deal with large data sets. This paper summarizes results on a recently introduced family of Datalog-based languages, called Datalog+/-, which is a new framework for tractable ontology querying. Plain Datalog is extended by allowing existential quantifiers, the equality predicate, and the truth constant false to appear in rule heads. At the same time, the resulting language is syntactically restricted, so as to achieve decidability and even tractability.

Andrea Calì, Georg Gottlob, Thomas Lukasiewicz, Andreas Pieris

Knowledge Representation Language P-Log – A Short Introduction

Abstract

The paper gives a short informal introduction to the knowledge representation language P-Log. The language allows natural and elaboration tolerant representation of commonsense knowledge involving logic and probabilities. The logical framework of P-Log is Answer Set Prolog which can be viewed as a significant extension of Datalog. On the probabilistic side, the authors adopt the view which understands probabilistic reasoning as commonsense reasoning about degrees of belief of a rational agent, and use causal Bayes nets as P-log probabilistic foundation. Several examples are aimed at explaining the syntax and semantics of the language and the methodology of its use.

Michael Gelfond

Living with Inconsistency and Taming Nonmonotonicity

Abstract

In this paper we consider rule-based query languages with negation in bodies and heads of rules, traditionally denoted by Datalog ^¬¬. Tractable and at the same time intuitive semantics for Datalog ^¬¬ has not been provided even though the area of deductive databases is over 30 years old. In this paper we identify sources of the problem and propose a query language, which we call 4QL.

The 4QL language supports a modular and layered architecture and provides a tractable framework for many forms of rule-based reasoning both monotonic and nonmonotonic. As the underpinning principle we assume openness of the world, which may lead to the lack of knowledge. Negation in rule heads may lead to inconsistencies. To reduce the unknown/inconsistent zones we introduce simple constructs which provide means for application-specific disambiguation of inconsistent information, the use of Local Closed World Assumption (thus also Closed World Assumption, if needed), as well as various forms of default and defeasible reasoning.

Jan Małuszyński, Andrzej Szałas

Backmatter

Title: Datalog Reloaded
Editors: Oege de Moor
Georg Gottlob
Tim Furche
Andrew Sellers
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-24206-9
Print ISBN: 978-3-642-24205-2
DOI: https://doi.org/10.1007/978-3-642-24206-9

Springer Professional

Datalog Reloaded

First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers

About this book

Table of Contents

Frontmatter

Section 1: Theoretical Aspects of Datalog

Datalog-Based Program Analysis with BES and RWL

Datalog for Security, Privacy and Trust

Answer Set Modules for Logical Agents

First-Order Encodings for Modular Nonmonotonic Datalog Programs

Datalog Programs and Their Stable Models

Exploiting Bounded Treewidth with Datalog (A Survey)

Equivalence between Extended Datalog Programs — A Brief Survey

Section 2: Applications of Datalog

Cluster Computing, Recursion and Datalog

Datalog-Related Aspects in Lixto Visual Developer

Informing Datalog through Language Intelligence – A Personal Perspective

Dyna: Extending Datalog for Modern AI

Datalog for the Web 2.0: The Case of Social Network Data Management

Context Modelling and Context-Aware Querying

Using Datalog for Fast and Easy Program Analysis

Section 3: New Languages Extending Datalog

Distributed Datalog Revisited

Dedalus: Datalog in Time and Space

The Disjunctive Datalog System DLV

Datalog as a Query Language for Data Exchange Systems

Datalog Relaunched: Simulation Unification and Value Invention

Datalog+/-: A Family of Languages for Ontology Querying

Knowledge Representation Language P-Log – A Short Introduction

Living with Inconsistency and Taming Nonmonotonicity

Backmatter

Premium Partner