
2008 | Book

Semantics in Data and Knowledge Bases

Third International Workshop, SDKB 2008, Nantes, France, March 29, 2008, Revised Selected Papers

Edited by: Klaus-Dieter Schewe, Bernhard Thalheim

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

This book constitutes the thoroughly refereed post-workshop proceedings of the Third International Workshop on Semantics in Data and Knowledge Bases, SDKB 2008, held in Nantes, France, on March 29, 2008. The 6 revised full papers, presented together with 4 invited papers and a survey of the state of the art in the field, were carefully reviewed and selected for inclusion in the book. The SDKB workshop presented original contributions demonstrating the use of logic, discrete mathematics, combinatorics, domain theory and other mathematical theories of semantics for databases and knowledge bases, computational linguistics and semiotics, and information and knowledge-based systems.

Table of Contents

Frontmatter

Introduction

Semantics in Data and Knowledge Bases
Abstract
Semantics is the study of meaning, i.e., how meaning is constructed, interpreted, clarified, obscured, illustrated, simplified, negotiated, contradicted and paraphrased [Wan87]. It has been treated differently in the scientific community, e.g., in the area of knowledge bases and by database users.
  • The scientific community prefers the treatment of ‘always valid’ semantics based on mathematical logic. A constraint is valid if this is the case in every correct database.
  • Database modellers often use a ‘strong’ semantics for several classes of constraints. Cardinality constraints are based on the requirement that databases exist for both the minimal and the maximal case.
  • Database mining is based on a ‘may be valid’ semantics. A constraint is considered to be a candidate for a valid formula.
  • Users usually employ a weak ‘in most cases valid’ semantics. They consider a constraint to be valid if it holds in the usual case.
  • Different groups of users use an ‘epistemic’ semantics. For each group, its set of constraints is valid in that group's data. The constraint sets of different groups may even contradict each other.
Semantics is currently one of the most overused notions in modern computer science literature. Its understanding ranges from a mere synonym for structuring, or for structuring on the basis of words, to precisely defined semantics. This partial misuse results in a mismatch of languages, in the neglect of formal foundations, and in brute-force definitions of the meaning of syntactic constructions.
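A minimal illustration of how these readings diverge, using a hypothetical relationship type WorksFor between Person and Project, is the cardinality constraint

  card(WorksFor, Person) = (1, 3).

Under the ‘always valid’ reading every correct database must satisfy it: each person occurs in at least one and at most three WorksFor relationships. Under the ‘may be valid’ reading of database mining the constraint is merely a candidate hypothesis supported by the current instance, and under the users’ ‘in most cases valid’ reading a few exceptional persons do not invalidate it.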
Klaus-Dieter Schewe, Bernhard Thalheim

Invited Papers

Data Integration through ${\textit{DL-Lite}_{\mathcal A}}$ Ontologies
Abstract
The goal of data integration is to provide uniform access to a set of heterogeneous data sources, freeing the user from having to know where the data are, how they are stored, and how they can be accessed. One of the outcomes of the research work carried out on data integration in recent years is a clear conceptual architecture, comprising a global schema, the source schema, and the mapping between the source and the global schema. In this paper, we present a comprehensive approach to, and a complete system for, ontology-based data integration. In this system, the global schema is expressed in terms of a TBox of the tractable Description Logic \({\textit{DL-Lite}_{\mathcal A}}\), the sources are relations, and the mapping language allows for expressing GAV sound mappings between the sources and the global schema. The mapping language has specific mechanisms for addressing the so-called impedance mismatch problem, which arises from the fact that, while the data sources store values, the instances of concepts in the ontology are objects. By virtue of the careful design of the various languages used in our system, answering unions of conjunctive queries can be done through a very efficient technique (LogSpace with respect to data complexity) which reduces this task to standard SQL query evaluation. We also show that even very slight extensions of the expressive abilities of our system lead beyond this complexity bound.
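A minimal sketch of such a GAV mapping, with hypothetical source and ontology names: an SQL query over a source relation is mapped to an ontology assertion whose object terms are built from the retrieved values by function symbols, which is how the impedance mismatch between values and objects is bridged.

  TBox:     Student ⊑ Person,   ∃attends ⊑ Student
  Mapping:  SELECT sid, cid FROM ENROLLMENT   ⇝   attends(st(sid), crs(cid))

Here st(·) and crs(·) construct ontology objects from the stored values sid and cid; a union of conjunctive queries posed over the ontology is first rewritten with respect to the TBox and then unfolded through the mapping into plain SQL over ENROLLMENT.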
Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Antonella Poggi, Riccardo Rosati, Marco Ruzzi
The Relational Polynomial-Time Hierarchy and Second-Order Logic
Abstract
We explore the connection between the concept of relational complexity introduced by S. Abiteboul, M. Vardi and V. Vianu and the restricted second-order logic SO^ω introduced by A. Dawar. In relational complexity, the basis for measuring the complexity of computing queries with relational machines is the number of different FO^k-types realized by the input database. In SO^ω, the second-order quantifiers range over relations that are closed under equality of FO^k-types of k-tuples. We give a direct proof (in the style of the proof of Fagin’s theorem) of A. Dawar’s result that the existential fragment of SO^ω captures relational NP. We then formally define the concept of a relational machine with a relational oracle, and show the exact correspondence between the prenex fragments of SO^ω and the levels of the relational polynomial-time hierarchy.
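The closure condition on second-order quantification can be written out as a worked formula: a relation X on k-tuples is an admissible value for a quantified variable only if

  ∀u ∀v ( u ≡^k v → ( X(u) ↔ X(v) ) ),

where u and v are k-tuples and u ≡^k v means that they realize the same FO^k-type. Quantified relations therefore cannot separate tuples that no FO^k formula can separate, which is what ties the logic to relational rather than standard complexity classes.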
Flavio A. Ferrarotti, José M. Turull Torres
Qualitative Knowledge Discovery
Abstract
Knowledge discovery and data mining deal with the task of finding useful information, and especially rules, in unstructured data. Most knowledge discovery approaches associate conditional probabilities with discovered rules in order to specify their strength. In this paper, we propose a qualitative approach to knowledge discovery. We do so by abstracting from actual probabilities to qualitative information and, in particular, by developing a method for the computation of an ordinal conditional function from a possibly noisy probability distribution. The link between structural and numerical knowledge is established by a powerful algebraic theory of conditionals. By applying this theory, we develop an algorithm that computes sets of default rules from the qualitative abstraction of the input distribution. In particular, we show how sparse information can be dealt with appropriately in our framework. By making use of the duality between inductive reasoning and knowledge discovery within the algebraic theory of conditionals, we can ensure that the discovered rules can be considered most informative in a strict, formal sense.
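One common way to perform such a qualitative abstraction, given here as an illustrative sketch rather than the paper's own construction, is to map each possible world to the order of magnitude of its probability, yielding an ordinal conditional function (ranking function) in which more plausible worlds receive lower ranks:

    import math

    def ranking_from_distribution(dist, eps=0.01):
        """Abstract a probability distribution over possible worlds into an
        ordinal conditional function: kappa(w) grows as P(w) shrinks, and the
        most plausible worlds receive rank 0."""
        ranks = {}
        for world, p in dist.items():
            if p == 0.0:
                ranks[world] = math.inf                # impossible world
            else:
                ranks[world] = round(math.log(p) / math.log(eps))
        finite = [r for r in ranks.values() if r != math.inf]
        shift = min(finite) if finite else 0
        return {w: (r - shift if r != math.inf else r) for w, r in ranks.items()}

    # A default rule "if A then typically B" is accepted by the ranking kappa
    # exactly when kappa(A and B) < kappa(A and not-B).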
Gabriele Kern-Isberner, Matthias Thimm, Marc Finthammer
On the Notion of an XML Key
Abstract
Ongoing efforts in academia and industry to advance the management of XML data have created an increasing interest in research on integrity constraints for XML. In particular, keys have recently gained much attention. Keys help to discover and capture relevant semantics of XML data, and are crucial for developing better methods and tools for storing, querying and manipulating XML data. Various notions of keys have been proposed and investigated over the past few years. Due to the different ways of picking and comparing the data items involved, these proposals give rise to constraint classes that differ in their expressive power and in the tractability of the associated decision problems. This paper provides an overview of XML key proposals that enjoy popularity in the research literature.
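To give the flavour of such proposals, one widely used style of specification, going back to Buneman et al., pairs a context path and a target path with a set of key paths; a hypothetical example is

  ( /library, ( ./book, { ./@isbn } ) )

stating that, within each subtree rooted at a library node, a book node is uniquely identified by the value reached through its isbn attribute. The proposals surveyed differ, among other things, in the path language allowed for these components and in whether the selected data items are compared by node identity or by value equality.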
Sven Hartmann, Henning Köhler, Sebastian Link, Thu Trinh, Jing Wang

Research Papers

Embodied Context Semantics
Abstract
It is a desirable feature of knowledge bases that they are able to accommodate and reason across the different perspectives that may exist on a particular theory or situation. With the aim of obtaining an adequate logic for this problem, the knowledge representation community has extensively researched the formalization of contexts as first-class citizens. However, most of the proposed logics of context only deal with the propositional case, which for many applications is not enough, and those tackling the quantificational case face many counterintuitive restrictions. In this paper, we present a model-theoretic semantics that, based on a cognitive approach to the notions of context and meaning, succeeds in addressing the quantificational case in a flexible manner that overcomes the limitations of previous initiatives. The expressive power of the system is evaluated by formalizing some of the benchmark examples found in the literature.
Ander Altuna
Definition and Formalization of Entity Resolution Functions for Everyday Information Integration
Abstract
Data integration on a human-manageable scale, by users without database expertise, is a more common activity than integration of large databases. Users often gather fine-grained data and organize it in an entity-centric way, developing tables of information regarding real-world objects, ideas, or people. Often, they do this by copying and pasting bits of data from e-mails, databases, or text files into a spreadsheet. During this process, users evolve their notions of entities and attributes. They combine sets of entities or attributes, split them again, update attribute values, and retract those updates. These functions are neither well supported by current tools, nor formally well understood. Our research seeks to capture and make explicit the data integration decisions made during these activities. In this paper, we formally define entity resolution and de-resolution, and show that these functions behave predictably and intuitively in the presence of attribute value updates.
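A minimal sketch of the resolve/de-resolve idea, with hypothetical structures far simpler than the paper's formal model: the merged entity is kept alongside the original records, so a resolution can later be retracted even after further attribute updates have been applied.

    from dataclasses import dataclass

    @dataclass
    class Entity:
        """An entity-centric record: attribute name -> value."""
        attrs: dict

    def resolve(e1: Entity, e2: Entity):
        """Merge two records believed to describe the same real-world entity.
        On conflicting attributes the first record wins; both originals are
        returned so that the merge can later be retracted."""
        merged = {**e2.attrs, **e1.attrs}
        return Entity(merged), (e1, e2)

    def update(e: Entity, attr, value):
        """Attribute value update on a (possibly merged) entity."""
        e.attrs[attr] = value
        return e

    def deresolve(history):
        """De-resolution: undo a merge by restoring the original records.
        How such an undo should interact with updates applied after the
        merge is exactly what the paper's formal treatment pins down."""
        e1, e2 = history
        return e1, e2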
David W. Archer, Lois M. L. Delcambre
A Theoretical Framework for the Declarative Debugging of Datalog Programs
Abstract
The logic programming language Datalog has been extensively researched as a query language for deductive databases. Although similar to Prolog, the operational mechanisms of Datalog are more intricate, leading to computations that are quite hard to debug with traditional approaches. In this paper, we present a theoretical framework for debugging Datalog programs based on the ideas of declarative debugging. In our setting, a debugging session starts when the user detects an unexpected answer for some query, and ends with the debugger pointing to either an erroneous predicate or a set of mutually recursive predicates as the cause of the unexpected answer. Instead of representing the computations by means of trees, as is usual in declarative debugging, we propose graphs as a more convenient structure in the case of Datalog, and we formally prove the soundness and completeness of the debugging technique. We also present a debugging tool that follows this theoretical framework, implemented in the publicly available deductive database system DES.
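A minimal example of the kind of error such a debugger targets, using a hypothetical reachability program rather than one from the paper:

  path(X, Y) :- edge(X, Y).
  path(X, Y) :- edge(X, Z), path(Y, Z).   % intended: path(Z, Y)

The swapped arguments in the second rule can produce answers for path that are not in the intended transitive closure. When the user flags such an unexpected answer, a declarative debugger navigates a structure built from the computed predicate answers, asks the user which of them match the intended semantics, and finally blames the predicate path (here, its second rule) rather than a particular execution trace.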
R. Caballero, Y. García-Ruiz, F. Sáenz-Pérez
Semantic Bijectivity and the Uniqueness of Constant-Complement Updates in the Relational Context
Abstract
Within the context of the relational model, a general technique is presented for establishing that the translation of a view update defined by constant complement is independent of the choice of complement. In contrast to previous results, the uniqueness is not limited to order-based updates (those constructed from insertions and deletions), nor is it limited to those well-behaved complements which define closed update strategies. Rather, the approach is based upon optimizing the change of information in the main schema which the view update entails. The only requirement is that the view and its complement together possess a property called semantic bijectivity relative to the information measure. It is furthermore established that a very wide range of views has this property. This result formalizes the intuition, long observed in examples, that it is difficult to find different complements which define distinct but reasonable update strategies.
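A minimal illustration of the constant-complement idea, with a hypothetical schema not taken from the paper: let the main schema hold R[Emp, Dept, Mgr] with the functional dependency Dept → Mgr, let the view be Γ = π_{Emp,Dept}(R), and take as complement Γ' = π_{Dept,Mgr}(R). The dependency makes the decomposition lossless, so R = Γ ⋈ Γ'; holding Γ' constant therefore determines unambiguously how an update to Γ must be translated back to R. The question addressed here is whether a different choice of complement could yield a different, yet still reasonable, translation.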
Stephen J. Hegner
A Top-Down Approach to Rewriting Conjunctive Queries Using Views
Abstract
The problem of answering queries using views is concerned with finding answers to a query using only the answers to views. In the data integration context with the Local-As-View approach, this problem translates to finding a maximally contained rewriting for a given query. Existing solutions follow a bottom-up approach and, for efficiency reasons, often require a post-processing phase, which comes at an additional cost.
We propose a solution which follows a top-down approach. For this, we first present a graph-based model for conjunctive queries and views, and identify conditions that, if satisfied, ensure the maximality of a rewriting. Using this model as a basis, we then introduce a novel top-down algorithm, TreeWise, which efficiently generates maximally contained rewritings that are in general less expensive to evaluate than those of the bottom-up algorithms, without requiring post-processing. The preliminary results of our experiments indicate that while TreeWise has comparable performance, it generally produces better-quality rewritings.
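A minimal example of a maximally contained rewriting, with hypothetical relations not taken from the paper: suppose the query asks for one-stop connections, and the only available view records flights that land at hub airports.

  query:  q(S, T) :- flight(S, M), flight(M, T).
  view:   v(A, B) :- flight(A, B), hub(B).

The rewriting q'(S, T) :- v(S, M), v(M, T) is contained in q, since expanding the view atoms yields the query's atoms plus extra hub conditions, and with v as the only view no contained rewriting can return more answers; it is therefore maximally contained, even though it misses connections through non-hub airports, which is exactly why such rewritings are contained in, rather than equivalent to, the original query.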
Nima Mohajerin, Nematollaah Shiri
Rewriting Conjunctive Queries over Description Logic Knowledge Bases
Abstract
We consider the problems of conjunctive query answering and rewriting for information integration systems in which a Description Logic ontology is used to provide a global view of the data. We present a resolution-based query rewriting algorithm for DL-Lite^+ ontologies, and use it to show that query answering in this setting is NLogSpace-complete with respect to data complexity. We also show that our algorithm produces an optimal rewriting when the input ontology is expressed in the language DL-Lite. Finally, we sketch an extended version of the algorithm that would, we are confident, be optimal for several DL languages with data complexity of query answering ranging from LogSpace to PTime-complete.
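A minimal illustration of the rewriting idea, using hypothetical names and a plain DL-Lite axiom rather than the paper's resolution-based machinery: given the TBox axiom Professor ⊑ ∃teaches and the query

  q(X) :- teaches(X, Y).

the axiom contributes the additional rewriting q(X) :- Professor(X), because every professor is entailed to teach something even when no corresponding teaches fact is stored. The union of the two rules is then evaluated directly over the data, which is what keeps the data complexity of query answering low.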
Héctor Pérez-Urbina, Boris Motik, Ian Horrocks
Backmatter
Metadata
Title
Semantics in Data and Knowledge Bases
Edited by
Klaus-Dieter Schewe
Bernhard Thalheim
Copyright Year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-88594-8
Print ISBN
978-3-540-88593-1
DOI
https://doi.org/10.1007/978-3-540-88594-8
