2009 | Book

Dataspace: The Final Frontier

26th British National Conference on Databases, BNCOD 26, Birmingham, UK, July 7-9, 2009. Proceedings

About this book

This book constitutes the refereed proceedings of the 26th British National Conference on Databases, BNCOD 26, held in Birmingham, UK, in July 2009. The 12 revised full papers, 2 short papers and 5 poster papers presented together with 2 keynote talks, 2 tutorial papers and summaries of 3 co-located workshops were carefully reviewed and selected from 33 submissions. The papers are organized in topical sections on data integration, warehousing and privacy; alternative data models; querying; and path queries and XML; data mining and privacy, data integration, stream and event data processing, and query processing and optimisation.

Table of Contents

Frontmatter

Keynote Talks

Dataspaces: Progress and Prospects
Abstract
The concept of Dataspaces was proposed in a 2005 paper by Franklin, Halevy and Maier as a new data management paradigm to help unify the diverse efforts on flexible data models, data cleaning, and data integration being pursued by the database community. The key idea is to expand the scope of database technology by moving away from the schema-first nature of traditional data integration techniques and instead, support a spectrum of degrees of structure, schema conformance, entity resolution, quality and so on. Dataspaces offer an initial, lightweight set of useful services over a group of data sources, allowing users, administrators, or programs to improve the semantic relationships between the data in an incremental and iterative manner. This “pay-as-you-go” philosophy is at the heart of the Dataspaces approach. In this talk, I will revisit the motivation behind Dataspaces, outline the evolution of thinking on the topic since the 2005 proposal, and describe our current work focused on developing a metrics-driven framework for Dataspace systems.
Michael J. Franklin
XtreemOS: Towards a Grid-Enabled Linux-Based Operating System
Abstract
The term “Grid Computing” was introduced at the end of the 1990s by Foster and Kesselman; it was envisioned as “an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation” [1].
Domenico Laforenza

Tutorials

The iMeMex Dataspace Management System: Architecture, Concepts, and Lessons Learned
Abstract
The iMeMex Project was one of the first systems trying to build a so-called dataspace management system. This tutorial presents the core concepts of iMeMex. We discuss system design concepts, dataspace modelling, dataspace indexing, dataspace query processing, and pay-as-you-go information integration. We will present some important lessons learned from this project and also discuss ongoing and open research challenges.
Jens Dittrich
Conditional Dependencies: A Principled Approach to Improving Data Quality
Abstract
Real-life data is often dirty and costs billions of pounds to businesses worldwide each year. This paper presents a promising approach to improving data quality. It effectively detects and fixes inconsistencies in real-life data based on conditional dependencies, an extension of database dependencies by enforcing bindings of semantically related data values. It accurately identifies records from unreliable data sources by leveraging relative candidate keys, an extension of keys for relations by supporting similarity and matching operators across relations. In contrast to traditional dependencies that were developed for improving the quality of schema, the revised constraints are proposed to improve the quality of data. These constraints yield practical techniques for data repairing and record matching in a uniform framework.
Wenfei Fan, Floris Geerts, Xibei Jia

Data Integration, Warehousing and Privacy

A Prioritized Collective Selection Strategy for Schema Matching across Query Interfaces
Abstract
Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed. These approaches make use of different types of information about schemas, including structures, linguistic features and data types, to measure different types of similarity between the attributes of two schemas. They then combine different types of similarity and use the combined similarity to select a collection of attribute correspondences for every source attribute. Thresholds, which have to be set manually and are matcher and domain dependent, are usually used for filtering out likely incorrect attribute correspondences. A selection strategy is also used to resolve any conflicts between attribute correspondences of different source attributes. In this paper, we propose a new prioritized collective selection strategy that has two distinct characteristics. First, this strategy clusters a set of attribute correspondences into a number of clusters and collectively selects attribute correspondences from each of these clusters in a prioritized order. Second, it introduces the use of a null correspondence for each source attribute, which represents the option that the source attribute has no attribute correspondence. By considering this option, our strategy does not need a threshold to filter out likely incorrect attribute correspondences. Our experimental results show that our approach is highly effective.
Zhongtian He, Jun Hong, David A. Bell
An Alternative Data Warehouse Reference Architectural Configuration
Abstract
In the last few years the amount of data stored on computer systems has been growing at an accelerated rate. These data are frequently managed within data warehouses. However, the current data warehouse architectures based on n-ary-Relational DBMSs are reaching their limits in efficiently managing such large amounts of data. Some DBMSs are able to load huge amounts of data; nevertheless, the response times become unacceptable for business users during information retrieval. In this paper we describe an alternative data warehouse reference architectural configuration (ADW) which addresses many issues that organisations are facing. The ADW approach considers a Binary-Relational DBMS as an underlying data repository. As a result, a number of improvements have been achieved, such as increased data density, reduced data sparsity, dramatically decreased query response times, and significant workload reduction for data loading, backup and restore tasks.
Victor González-Castro, Lachlan M. MacKinnon, María del Pilar Angeles
A Data Privacy Taxonomy
Abstract
Privacy has become increasingly important to the database community, which is reflected by a noteworthy increase in research papers appearing in the literature. While researchers often assume that their definition of “privacy” is universally held by all readers, this is rarely the case; so many papers addressing key challenges in this domain have actually produced results that do not consider the same problem, even when using similar vocabularies. This paper provides an explicit definition of data privacy suitable for ongoing work in data repositories such as a DBMS or for data mining. The work contributes by briefly providing the larger context for the way privacy is defined legally and legislatively but primarily provides a taxonomy capable of thinking of data privacy technologically. We then demonstrate the taxonomy’s utility by illustrating how this perspective makes it possible to understand the important contribution made by researchers to the issue of privacy. The conclusion of this paper is that privacy is indeed multifaceted, so no single current research effort adequately addresses the true breadth of the issues necessary to fully understand the scope of this important issue.
Ken Barker, Mina Askari, Mishtu Banerjee, Kambiz Ghazinour, Brenan Mackas, Maryam Majedi, Sampson Pun, Adepele Williams

Alternative Data Models

Dimensions of Dataspaces
Abstract
The vision of dataspaces has been articulated as providing various of the benefits of classical data integration, but with reduced up-front costs, combined with opportunities for incremental refinement, enabling a “pay as you go” approach. However, results that seek to realise the vision exhibit considerable variety in their contexts, priorities and techniques, to the extent that the definitional characteristics of dataspaces are not necessarily becoming clearer over time. With a view to clarifying the key concepts in the area, encouraging the use of consistent terminology, and enabling systematic comparison of proposals, this paper defines a collection of dimensions that capture both the components that a dataspace management system may contain and the lifecycle it may support, and uses these dimensions to characterise representative proposals.
Cornelia Hedeler, Khalid Belhajjame, Alvaro A. A. Fernandes, Suzanne M. Embury, Norman W. Paton
The Use of the Binary-Relational Model in Industry: A Practical Approach
Abstract
In recent years there has been a growing interest in the research community in the utilisation of alternative data models that abandon the relational record storage and manipulation structure. The authors have already reported experimental considerations of the behaviour of n-ary Relational, Binary-Relational, Associative and Transrelational models within the context of Data Warehousing [1], [2], [3] to address issues of storage efficiency and combinatorial explosion through data repetition. In this paper we present the results obtained during the industrial usage of Binary-Relational model based DBMS within a reference architectural configuration. These industrial results are similar to the ones obtained during the experimental stage of this research at the University laboratory [4] where improvements on query speed, data load and considerable reductions on disk space are achieved. These industrial tests considered a wide set of industries: Manufacturing, Government, Retail, Telecommunications and Finance.
Victor González-Castro, Lachlan M. MacKinnon, María del Pilar Angeles
Hyperset Approach to Semi-structured Databases: Computation of Bisimulation in the Distributed Case
Abstract
We will briefly describe the recently implemented hyperset approach to semi-structured or Web-like and possibly distributed databases with the query system available online at http://www.csc.liv.ac.uk/~molyneux/t/ . As this approach is crucially based on the bisimulation relation, the main stress in this paper is on its computation in the distributed case by using a so-called bisimulation engine and local approximations of the global bisimulation relation.
Richard Molyneux, Vladimir Sazonov

Querying

Multi-Join Continuous Query Optimization: Covering the Spectrum of Linear, Acyclic, and Cyclic Queries
Abstract
Traditional optimization algorithms that guarantee optimal plans have exponential time complexity and are thus not viable in streaming contexts. Continuous query optimizers commonly adopt heuristic techniques such as Adaptive Greedy to attain polynomial-time execution. However, these techniques are known to produce optimal plans only for linear and star shaped join queries. Motivated by the prevalence of acyclic, cyclic and even complete query shapes in stream applications, we conduct an extensive experimental study of the behavior of the state-of-the-art algorithms. This study has revealed that heuristic-based techniques tend to generate sub-standard plans even for simple acyclic join queries. For general acyclic join queries we extend the classical IK approach to the streaming context to define an algorithm TreeOpt that is guaranteed to find an optimal plan in polynomial time. For the case of cyclic queries, for which finding optimal plans is known to be NP-complete, we present an algorithm FAB which improves other heuristic-based techniques by (i) increasing the likelihood of finding an optimal plan and (ii) improving the effectiveness of finding a near-optimal plan when an optimal plan cannot be found in polynomial time. To handle the entire spectrum of query shapes from acyclic to cyclic we propose a Q-Aware approach that selects the optimization algorithm used for generating the join order, based on the shape of the query.
Venkatesh Raghavan, Yali Zhu, Elke A. Rundensteiner, Daniel Dougherty
An XML-Based Model for Supporting Context-Aware Query and Cache Management
Abstract
Database systems (DBSs) can play an essential role in facilitating the query and cache management in context-aware mobile information systems (CAMIS). Two of the fundamental aspects of such management are update notifications and context-aware query processing. Unfortunately, DBSs do not provide a built-in update notification function and are not aware of the context of their usage. This paper presents an XML model called XReAl (XML-based Relational Algebra) that assists DBSs in extending their capabilities to support context-aware queries and cache management for mobile environments.
Essam Mansour, Hagen Höpfner
Answering Multiple-Item Queries in Data Broadcast Systems
Abstract
While a lot of research has been done on answering single-item queries, only a few studies have looked at answering multiple-item queries in data broadcast systems. The few that did have proposed approaches that are less responsive to changes in the query queue. It is not immediately clear how single-item scheduling algorithms will perform when used in answering pull-based multiple-item queries. This paper investigates the performance of existing single-item scheduling algorithms in answering multiple-item queries in pull-based data broadcast systems. We observed that Longest Wait First, a near-optimal single-item data scheduling algorithm, has been used in environments where users’ data access pattern is skewed. This paper also investigates the performance of Longest Wait First under various user access patterns. We propose \(\mathcal{Q}\)LWF: an online data broadcast scheduling algorithm for answering multiple-item queries in pull-based data broadcast systems. For the purpose of comparison with \(\mathcal{Q}\)LWF, we adapted an existing pull single-item algorithm, push single-item algorithm, and push multiple-item algorithm to answer multiple-item queries in pull environments. Results from extensive sets of experiments show that \(\mathcal{Q}\)LWF has a superior performance compared with the adapted algorithms.
Adesola Omotayo, Ken Barker, Moustafa Hammad, Lisa Higham, Jalal Kawash

Path Queries and XML

A Study of a Positive Fragment of Path Queries: Expressiveness, Normal Form, and Minimization
Abstract
We study the expressiveness of a positive fragment of path queries, denoted Path\(^+\), on node-labeled tree documents. The expressiveness of Path\(^+\) is studied from two angles. First, we establish that Path\(^+\) is equivalent in expressive power to a particular sub-fragment as well as to the class of tree queries, a sub-class of the first-order conjunctive queries defined over label, parent-child, and child-parent predicates. The translation algorithm from tree queries to Path\(^+\) yields a normal form for Path\(^+\) queries. Using this normal form, we can decompose a Path\(^+\) query into sub-queries that can be expressed in a very small sub-fragment of Path\(^+\) for which efficient evaluation strategies are available. Second, we characterize the expressiveness of Path\(^+\) in terms of its ability to resolve nodes in a document. This result is used to show that each tree query can be translated to a unique, equivalent, and minimal tree query. The combination of these results yields an effective strategy to evaluate a large class of path queries on documents.
Yuqing Wu, Dirk Van Gucht, Marc Gyssens, Jan Paredaens
Metamodel-Based Optimisation of XPath Queries
Abstract
To date, query performance in XML databases remains a difficult problem. XML documents are often very large, making fast access to nodes within document trees cumbersome for query processors. Many research teams have addressed this issue with efficient algorithms and powerful indexes, but XML systems still cannot perform at the same level as relational databases. In this paper, we present a metamodel, which enables us to efficiently solve relationships between nodes in an XML database using standard SQL. By implementing the metamodel presented here, one can turn any off-the-shelf relational database into a high performance XPath processor. We will demonstrate the significant improvement achieved over three leading databases, and identify where each database is strongest in relation to XPath query performance.
Gerard Marks, Mark Roantree
Compacting XML Structures Using a Dynamic Labeling Scheme
Abstract
Due to the growing popularity of XML as a data exchange and storage format, the need to develop efficient techniques for storing and querying XML documents has emerged. A common approach to achieve this is to use labeling techniques. However, their main problem is that they either do not support updating XML data dynamically or impose huge storage requirements. On the other hand, with the verbosity and redundancy problem of XML, which can lead to increased cost for processing XML documents, compaction of XML documents has become an increasingly important research issue. In this paper, we propose an approach called CXDLS that combines the strengths of both labeling and compaction techniques. Our approach exploits repetitive consecutive subtrees and tags for compacting the structure of XML documents by taking advantage of the ORDPATH labeling scheme. In addition, it stores the compacted structure and the data values separately. Using our proposed approach, it is possible to support efficient query and update processing on compacted XML documents and to reduce storage space dramatically. Results of a comprehensive performance study are provided to show the advantages of CXDLS.
Ramez Alkhatib, Marc H. Scholl

Short Papers

Evaluating a Peer-to-Peer Database Server Based on BitTorrent
Abstract
Database systems have traditionally used a Client-Server architecture. As the server becomes overloaded, clients experience an increase in query response time, and in the worst case the server may be unable to provide any service at all.
In file-sharing, the problem of server overloading has been addressed by the use of Peer-to-Peer (P2P) techniques in which users (peers) supply files to each other, so sharing the load. This paper describes the Wigan P2P Database System, which was designed to investigate if P2P techniques for reducing server load, thus increasing system scalability, could be applied successfully in a database environment. It is based on the BitTorrent file-sharing approach.
This paper introduces the Wigan system architecture, explaining how the BitTorrent approach must be modified for a P2P database server. It presents and analyses experimental results, including the TPC-H benchmark, which show that the approach can succeed in delivering scalability in particular cases.
John Colquhoun, Paul Watson
Towards Building a Knowledge Base for Research on Andean Weaving
Abstract
We are working on a knowledge base to store 3D Andean textile patterns together with rich cultural and historic context information. This will allow ontological studies in museum collections as well as on ethnographic and archaeological fieldwork. We build on an existing ontology, extending it to incorporate more content and make it more accessible. This goes well beyond storing and retrieving textile patterns and enables us to capture the semantics and wider context of these patterns.
Denise Y. Arnold, Sven Helmer, Rodolfo Velásquez Arando

Posters

The Adaptation Model of a Runtime Adaptable DBMS
Abstract
Nowadays maintenance of database management systems (DBMSs) often requires offline operations for enhancement of functionality or security updates. This hampers the availability of the provided services and can cause undesirable implications. Therefore it is essential to minimize the downtime of DBMSs. We present the CoBRA DB (Component Based Runtime Adaptable DataBase) project that allows the adaptation and extension of a modular DBMS at runtime. In this paper we focus on the definition of an adaptation model describing the semantics of adaptation processes.
Florian Irmert, Thomas Fischer, Frank Lauterwald, Klaus Meyer-Wegener
Schema Merging Based on Semantic Mappings
Abstract
In model management, the Merge operator takes as input a pair of schemas, together with a set of mappings between their objects, and returns an integrated schema. In this paper we present a new approach to implementing the Merge operator based on semantic mappings between objects. Our approach improves upon previous work by (1) using formal low-level transformation rules that can be translated into higher-level rules and (2) specifying precise BAV mappings, which merge schemas without any information loss or gain.
Nikos Rizopoulos, Peter McBrien
A Database System for Absorbing Conflicting and Uncertain Information from Multiple Correspondents
Abstract
This paper discusses a database system which absorbs assertions about the data from a community of correspondents, also capturing the uncertainty of the assertion and taking account of the potential unreliability of the correspondent. The paper describes a system comprising the capture of such assertions, the ability to impose an authorised version of a value, the maintenance of a reliability measure for each correspondent and a querying system which returns the most likely values.
Richard Cooper, Laura Devenny
Semantic Exploitation of Engineering Models: An Application to Oilfield Models
Abstract
Engineering development activities rely on computer-based models, which enclose technical data issued from different sources. In this heterogeneous context, retrieving, re-using and merging information is a challenge. We propose to annotate engineering models with concepts of domain ontologies, which provide data with explicit semantics. The semantic annotation makes it possible to formulate queries using the semantic concepts that are significant to the domain of the engineers. This work is inspired by a petroleum engineering case study and we validate our approach by presenting an implementation of this case study.
Laura Silveira Mastella, Yamine Aït-Ameur, Stéphane Jean, Michel Perrin, Jean-François Rainaud
Ontology-Based Method for Schema Matching in a Peer-to-Peer Database System
Abstract
In a P2P DBS, the databases are often developed independently so their schemas are highly heterogeneous. Creating matching rules (henceforth MR) between a given mediated schema and each peer schema at design time is not suitable for a volatile P2P environment, in which a peer may participate in the system only once. For this reason, the MR must be created at run time. Schema designers are often the only persons who know the semantics of their schemas. At run time, one (or both) of the schema designers may not be available; hence the user must be able to create the MR to support his/her changing requirements. Given that the semantics of a domain ontology is explicitly explained, and in order to help the user create the MR, we propose a schema matching method based on a domain ontology, which plays a role similar to that played by a given mediated schema.
Raddad Al King, Abdelkader Hameurlain, Franck Morvan

Co-located Workshops

Ideas in Teaching, Learning and Assessment of Databases: A Communication of the 7th International Workshop on Teaching, Learning and Assessment of Databases (TLAD 2009)
Abstract
This paper is a record of the Seventh International Workshop on the Teaching, Learning and Assessment of Databases (TLAD 2009). Based on the contributions received, the paper describes the efforts that academics based in the UK and elsewhere around the world have made towards finding and disseminating new and interesting ways to enhance the teaching and learning of the database subject area. This year most of the submissions centred around the following areas: issues in teaching for databases and how to resolve these; methods for teaching enterprise systems development and the relevance of databases; interactive and collaborative learning support environments; and tools to aid learning in the database area.
Anne James, David Nelson, Karen Davis, Richard Cooper, Alastair Monger
Research Directions in Database Architectures for the Internet of Things: A Communication of the First International Workshop on Database Architectures for the Internet of Things (DAIT 2009)
Abstract
This paper is a record of the First International Workshop on Database Architectures for the Internet of Things. The Internet of Things refers to the future internet which will contain trillions of nodes representing various objects from small ubiquitous sensor devices and handhelds to large web servers and supercomputer clusters. The workshop investigated a number of areas appertaining to data management in the Internet of Things from storage structures, through database management methods, to service-oriented architectures and new approaches to information search. Running orthogonal to these layers, the matter of security was also considered. Taking a philosophical viewpoint, our whole current framework for understanding data management may be ill-equipped to meet the challenges of the Internet of Things. The workshop gave participants the chance to discuss these matters, exchange ideas and explore future collaborations.
Anne James, Joshua Cooper, Keith Jeffery, Gunter Saake
Design Challenges and Solutions: Review of the 4th International Workshop on Ubiquitous Computing (iUBICOM 2009)
Abstract
This paper provides an overview of several approaches, methods and techniques of ubiquitous and collaborative computing discussed in the papers submitted to the International Workshop on Ubiquitous Computing (iUBICOM-09). In this workshop, we aimed to balance discussion of technological factors with human aspects in order to explore implications for better design. The theme was information retrieval, decision making processes, and user needs in the context of ubiquitous computing. This paper includes work carried out on different dimensions focusing on technological as well as social aspects of ubiquitous and collaborative computing.
John Halloran, Rahat Iqbal, Dzmitry Aliakseyeu, Martinez Fernando, Richard Cooper, Adam Grzywaczewski, Ratvinder Grewal, Anne James, Chris Greenhalgh
Backmatter
Metadata
Title
Dataspace: The Final Frontier
Edited by
Alan P. Sexton
Copyright year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-02843-4
Print ISBN
978-3-642-02842-7
DOI
https://doi.org/10.1007/978-3-642-02843-4
