Skip to main content
Top

2005 | Book

Database Programming Languages

10th International Workshop, DBPL 2005, Trondheim, Norway, August 28-29, 2005, Revised Selected Papers

Editors: Gavin Bierman, Christoph Koch

Publisher: Springer Berlin Heidelberg

Book Series : Lecture Notes in Computer Science

insite
SEARCH

Table of Contents

Frontmatter
Patterns and Types for Querying XML Documents
Abstract
Among various proposals for primitives for deconstructing XML data two approaches seem to clearly stem from practice: path expressions, widely adopted by the database community, and regular expression patterns, mainly developed and studied in the programming language community. We think that the two approaches are complementary and should be both integrated in languages for XML, and we see in that an opportunity of collaboration between the two communities. With this aim, we give a presentation of regular expression patterns and the type systems they are tightly coupled with. Although this article advocates a construction promoted by the programming language community, we will try to stress some characteristics that the database community, we hope, may find interesting.
Giuseppe Castagna
Dual Syntax for XML Languages
Abstract
XML is successful as a machine processable data interchange format, but it is often too verbose for human use. For this reason, many XML languages permit an alternative more legible non-XML syntax. XSLT stylesheets are often used to convert from the XML syntax to the alternative syntax; however, such transformations are not reversible since no general tool exists to automatically parse the alternative syntax back into XML.
We present XSugar, which makes it possible to manage dual syntax for XML languages. An XSugar specification is built around a context-free grammar that unifies the two syntaxes of a language. Given such a specification, the XSugar tool can translate from alternative syntax to XML and vice versa. Moreover, the tool statically checks that the transformations are reversible and that all XML documents generated from the alternative syntax are valid according to a given XML schema.
Claus Brabrand, Anders Møller, Michael I. Schwartzbach
Exploiting Schemas in Data Synchronization
Abstract
Increased reliance on optimistic data replication has led to burgeoning interest in tools and frameworks for synchronizing disconnected updates to replicated data. We have implemented a generic synchronization framework, called Harmony, that can be used to build statebased synchronizers for a wide variety of tree-structured data formats. A novel feature of this framework is that the synchronization process—in particular, the recognition of conflicts—is driven by the schema of the structures being synchronized. We formalize Harmony’s synchronization algorithm, state a simple and intuitive specification, and illustrate how it can be used to synchronize trees representing a variety of specific forms of application data, including sets, records, and tuples.
J. Nathan Foster, Michael B. Greenwald, Christian Kirkegaard, Benjamin C. Pierce, Alan Schmitt
Efficiently Enumerating Results of Keyword Search
Abstract
Various approaches for keyword search in different settings (e.g., relational databases, XML and the Web) actually deal with the problem of enumerating K-fragments. For a given set of keywords K, a K-fragment is a subtree T of the given data graph, such that T contains all the keywords of K and no proper subtree of T has this property. There are three types of K-fragments: rooted, undirected and strong. This paper describes the first provably efficient algorithms for enumerating K-fragments. Specifically, for all three types of K-fragments, algorithms are given for enumerating all K-fragments with polynomial delay. For rooted K-fragments and acyclic data graphs, an algorithm is given for enumerating with polynomial delay in the order of increasing weight (i.e., the ranked order), assuming that K is of a fixed size. Finally, an efficient algorithm is described for enumerating K-fragments in a heuristically ranked order.
Benny Kimelfeld, Yehoshua Sagiv
Mapping Maintenance in XML P2P Databases
Abstract
Unstructured p2p database systems are usually characterized by the presence of schema mappings among peers. In these systems, the detection of corrupted mappings is a key problem. A corrupted mapping fails in matching the target or the source schema, hence it is not able to transform data conforming to a schema \(\mathcal{S}_i\) into data conforming to a schema \(\mathcal{S}_j\), nor it can be used for effective query reformulation.
This paper describes a novel technique for maintaining mappings in XML p2p databases, based on a semantic notion of mapping correctness.
Dario Colazzo, Carlo Sartiani
Inconsistency Tolerance in P2P Data Integration: An Epistemic Logic Approach
Abstract
We study peer-to-peer data integration, where each peer models an autonomous system that exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas, rather than through a global schema. We propose a multi-modal epistemic semantics based on the idea that each peer is conceived as a rational agent that exchanges knowledge/belief with other peers, thus nicely modeling the modular structure of the system. We then address the issue of dealing with possible inconsistencies, and distinguish between two types of inconsistencies, called local and P2P, respectively. We define a nonmonotonic extension of our logic that is able to reason on the beliefs of peers under inconsistency tolerance. Tolerance to local inconsistency essentially means that the presence of inconsistency within one peer does not affect the consistency of the whole system. Tolerance to P2P inconsistency means being able to resolve inconsistencies arising from the interaction between peers. We study query answering and its data complexity in this setting, and we present an algorithm that is sound and complete with respect to the proposed semantics, and optimal with respect to worst-case complexity.
Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Riccardo Rosati
XML Data Integration with Identification
Abstract
Data integration is the problem of combining data residing at different sources, and providing the user with a virtual view, called global schema, which is independent from the model and the physical origin of the sources. Whereas many data integration systems and theoretical works have been proposed for relational data, not much investigation has been focused yet on XML data integration. Our goal is therefore to address some of its related issues. In particular, we highlight two major issues that emerge in the XML context: (i) the global schema may be characterized by a set of constraints, expressed by means of a DTD and XML integrity constraints, (ii) the concept of node identity requires to introduce semantic criteria to identify nodes coming from different sources. We propose a formal framework for XML data integration systems based on an expressive XML global schema, a set of XML data sources and a set of mappings specified by means of a simple tree language. Then, we define an identification function that aims at globally identifying nodes coming from different sources. Finally, we propose algorithms to answer queries under different assumptions for the mappings.
Antonella Poggi, Serge Abiteboul
Satisfiability of XPath Queries with Sibling Axes
Abstract
We study the satisfiability problem for XPath fragments supporting the following-sibling and preceding-sibling axes. Although this problem was recently studied for XPath fragments without sibling axes, little is known about the impact of the sibling axes on the satisfiability analysis. To this end we revisit the satisfiability problem for a variety of XPath fragments with sibling axes, in the presence of DTDs, in the absence of DTDs, and under various restricted DTDs. In these settings we establish complexity bounds ranging from NLOGSPACE to undecidable. Our main conclusion is that in many cases, the presence of sibling axes complicates the satisfiability analysis. Indeed, we show that there are XPath satisfiability problems that are in PTIME and PSPACE in the absence of sibling axes, but that become NP-hard and EXPTIME-hard, respectively, when sibling axes are used instead of the corresponding vertical modalities (e.g., the wildcard and the descendant axis).
Floris Geerts, Wenfei Fan
XML Subtree Queries: Specification and Composition
Abstract
A frequent task encountered in XML processing is to filter an input document to produce a subdocument; that is, a document whose root-to-leaf paths are root-to-leaf paths of the original document and which inherits the tree structure of the original document. These are what we mean by subtree queries, and while they are similar to XPath filters, they cannot be naturally specified either in XPath or in XQuery. Special-purpose subtree query languages provide a natural idiom for specifying this class of queries, but both composition and evaluation are problematic. In this paper we show that for natural fragments of XPath, the resulting subtree query languages are closed under composition. This closure property allows a sequence of subtree queries to be rewritten as a single subtree query, which can then be evaluated either by a subtree-query specific evaluator or via translation to XQuery. We provide a set of composition algorithms for each common XPath fragment and discuss their complexity.
Michael Benedikt, Irini Fundulaki
On the Expressive Power of XQuery Fragments
Abstract
XQuery is known to be a powerful XML query language with many bells and whistles. For many common queries we do not need all the expressive power of XQuery. We investigate the effect of omitting certain features of XQuery on the expressive power of the language. We start from a simple base fragment which can be extended by several optional features being aggregation functions such as count and sum, sequence generation, node construction, position information in for loops, and recursion. In this way we obtain 64 different XQuery fragments which can be divided into 17 different equivalence classes such that two fragments can express the same functions iff they are in the same equivalence class. Moreover, we investigate the relationships between these equivalence classes.
Jan Hidders, Stefania Marrara, Jan Paredaens, Roel Vercammen
A Type Safe DOM API
Abstract
DOM (Document Object Model) is the W3C recommendation for an API for manipulating XML documents. The API is stated in terms of a language neutral IDL with normative bindings given for Java and ECMAScript. The type system underlying the DOM API is a simple object-based one which relies entirely on interfaces and interface extension. However, this simplicity is deceiving because the DOM architects implicitly impose a large number of constraints which are only stated informally. The long list of exceptions that some DOM methods may raise witnesses these constraints.
The present work defines a refinement of Java’s type system which makes most of these constraints accessible and thus checkable to the compiler. Technically, we graft a polymorphic annotation system on top of the host language’s types and propagate the annotations using ideas borrowed from affine type systems. We provide a type soundness proof with respect to an operational semantics of a Java core language.
Peter Thiemann
Type-Based Optimization for Regular Patterns
Abstract
Pattern matching mechanisms based on regular expressions feature in a number of recent languages for processing XML. The flexibility of these mechanisms demands novel approaches to the familiar problems of pattern-match compilation—how to minimize the number of tests performed during pattern matching while keeping the size of the output code small.
We describe semantic compilation methods in which we use the schema of the value flowing into a pattern matching expression to generate efficient target code. We start by discussing a pragmatic algorithm used currently in the compiler of Xtatic and report some preliminary performance results. For a more fundamental analysis, we define an optimality criterion of “no useless tests” and show that it is not satisfied by Xtatic’s algorithm. We constructively demonstrate that the problem of generating optimal pattern matching code is decidable for finite (non-recursive) patterns.
Michael Y. Levin, Benjamin C. Pierce
Efficient Memory Representation of XML Documents
Abstract
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows to represent the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by “compressing” their tree structure; the latter means to detect and remove repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This allows to directly execute queries (and in particular, bulk operations) without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
Giorgio Busatto, Markus Lohrey, Sebastian Maneth
N-Ary Queries by Tree Automata
Abstract
We investigate n-ary node selection queries in trees by successful runs of tree automata. We show that run-based n-ary queries capture MSO, contribute algorithms for enumerating answers of n-ary queries, and study the complexity of the problem. We investigate the subclass of run-based n-ary queries by unambiguous tree automata.
Joachim Niehren, Laurent Planque, Jean-Marc Talbot, Sophie Tison
Minimizing Tree Automata for Unranked Trees
Abstract
Automata for unranked trees form a foundation for XML schemas, querying and pattern languages. We study the problem of efficiently minimizing such automata. We start with the unranked tree automata (UTAs) that are standard in database theory, assuming bottom-up determinism and that horizontal recursion is represented by deterministic finite automata. We show that minimal UTAs in that class are not unique and that minimization is np-hard. We then study more recent automata classes that do allow for polynomial time minimization. Among those, we show that bottom-up deterministic stepwise tree automata yield the most succinct representations.
Wim Martens, Joachim Niehren
Dependency-Preserving Normalization of Relational and XML Data
Abstract
Having a database design that avoids redundant information and update anomalies is the main goal of normalization techniques. Ideally, data as well as constraints should be preserved. However, this is not always achievable: while BCNF eliminates all redundancies, it may not preserve constraints, and 3NF, which achieves dependency preservation, may not always eliminate all redundancies.
Our first goal is to investigate how much redundancy 3NF tolerates in order to achieve dependency preservation. We apply an information-theoretic measure and show that only prime attributes admit redundant information in 3NF, but their information content may be arbitrarily low.
Then we study the possibility of achieving both redundancy elimination and dependency preservation by a hierarchical representation of relational data in XML. We provide a characterization of cases when an XML normal form called XNF guarantees both.
Finally, we deal with dependency preservation in XML and show that like in the relational case, normalizing XML documents to achieve non-redundant data can result in losing constraints. By modifying the definition of XNF, we define another normal form for XML documents, X3NF, that generalizes 3NF for the case of XML and achieves dependency preservation.
Solmaz Kolahi
Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints
Abstract
Consistent query answering is the problem of computing the answers from a database that are consistent with respect to certain integrity constraints that the database as a whole may fail to satisfy. Those answers are characterized as those that are invariant under minimal forms of restoring the consistency of the database. In this context, we study the problem of repairing databases by fixing integer numerical values at the attribute level with respect to denial and aggregation constraints. We introduce a quantitative definition of database fix, and investigate the complexity of several decision and optimization problems, including DFP, i.e. the existence of fixes within a given distance from the original instance, and CQA, i.e. deciding consistency of answers to aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identify relevant tractable cases; and introduce approximation algorithms for some of those that are intractable. More specifically, we obtain results like undecidability of existence of fixes for aggregation constraints; MAXSNP-hardness of DFP, but a good approximation algorithm for a relevant special case; and intractability but good approximation for CQA for aggregate queries for one database atom denials (plus built-ins).
Leopoldo Bertossi, Loreto Bravo, Enrico Franconi, Andrei Lopatenko
Consistent Query Answers on Numerical Databases Under Aggregate Constraints
Abstract
The problem of extracting consistent information from relational databases violating integrity constraints on numerical data is addressed. In particular, aggregate constraints defined as linear inequalities on aggregate-sum queries on input data are considered. The notion of repair as consistent set of updates at attribute-value level is exploited, and the characterization of several data-complexity issues related to repairing data and computing consistent query answers is provided.
Sergio Flesca, Filippo Furfaro, Francesco Parisi
Backmatter
Metadata
Title
Database Programming Languages
Editors
Gavin Bierman
Christoph Koch
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-31445-5
Print ISBN
978-3-540-30951-2
DOI
https://doi.org/10.1007/11601524

Premium Partner