Skip to main content

Über dieses Buch

This book constitutes the refereed proceedings of the 15th International Conference on Advances in Databases and Information Systems, ADBIS 2011, held in Vienna, Austria, in September 2011. The 30 revised full papers presented together with 2 full length invited talks were carefully reviewed and selected from 105 submissions. They are organized in topical sections on query processing; data warehousing; DB systems; spatial data; information systems; physical DB design; evolution, integrity, security; and data semantics.



Invited Papers

Ontological Query Answering via Rewriting

Ontological databases extend traditional databases with ontological constraints. This technology is crucial for many applications such as semantic data publishing and integration as well as model-driven database design. For many classes of ontological constraints, query answering can be solved via query rewriting. In particular, given a conjunctive query and a set of ontological constraints, the query is compiled into a first-order query, called the perfect rewriting, that encodes the intensional knowledge implied by the constraints. Then, for every database


, the answer is obtained by directly evaluating the perfect rewriting over


. Since first-order queries can be easily translated into SQL, ontological query answering can be delegated to traditional DBMSs. This allows us to utilize all the query optimization techniques available in the underlying DBMS. This paper surveys current approaches to rewriting-based query answering of ontological databases.

Georg Gottlob, Giorgio Orsi, Andreas Pieris

On the Convergence of Data and Process Engineering

It is common practice in contemporary information systems engineering to combine data engineering methods with process engineering methods. However, these methods are applied rather independently and at different layers of an information system. This situation engenders an impedance mismatch between the process layer and the business logic and data layers in contemporary information systems. We expose some of the issues that this impedance mismatch raises by means of a concrete example. We then discuss emerging paradigms for seamlessly integrating data and process engineering.

Marlon Dumas

Query Processing 1

Mixing Bottom-Up and Top-Down XPath Query Evaluation

Available XPath evaluators basically follow one of two strategies to evaluate an XPath query on hierarchical XML data: either they evaluate it topdown or they evaluate it bottom-up. In this paper, we present an approach that allows evaluating an XPath query in arbitrary directions, including a mixture of bottom-up and top-down direction. For each location step, it can be decided whether to evaluate it top-down or bottom-up, such that we can start e.g. with a location step of low selectivity and evaluate all child-axis steps top-down at the same time. As our experiments have shown, this approach allows for a very efficient XPath evaluation which is 15 times faster than the JDK1.6 XPath query evaluation (JAXP) and which is several times faster than MonetDB if the file size is ≤ 30 MB or the query to be evaluated contains at least one location step that has a low selectivity. Furthermore, our approach is applicable to most compressed XML formats too, which may prevent swapping when a large XML document does not fit into main memory but its compressed representation does.

Markus Benter, Stefan Böttcher, Rita Hartel

Querying Versioned Software Repositories

Large parts of today’s data is stored in text documents that undergo a series of changes during their lifetime. For instance during the development of a software product the source code changes frequently. Currently, managing such data relies on version control systems (VCSs). Extracting information from large documents and their different versions is a manual and tedious process. We present


, a system that allows to declaratively query documents. It leverages information about the structure of a document that is available as a context-free grammar and allows to declaratively query document versions through a grammar annotated with relational algebra expressions. We define and illustrate the annotation of grammars with relational algebra expressions and show how to translate the annotations to easy to use SQL views.

Dietrich Christopeit, Michael Böhlen, Carl-Christian Kanne, Arturas Mazeika

Subsuming Multiple Sliding Windows for Shared Stream Computation

Shared evaluation of multiple user requests is an utmost priority for stream processing engines in order to achieve high throughput and provide timely results. Given that most continuous queries specify windowing constraints, we suggest a multi-level scheme for concurrent evaluation of time-based sliding windows seeking for potential subsumptions among them. As requests may be registered or suspended dynamically, we develop a technique for choosing the most suitable embedding of a given window into a group composed of multi-grained time frames already employed for other queries. Intuitively, the proposed methodology ”clusters” windowed operators into common hierarchical constructs, thus drastically reducing the need for their separate evaluation. Our empirical study confirms that such a scheme achieves dramatic memory savings with almost negligible maintenance cost.

Kostas Patroumpas, Timos Sellis

Data Warehousing 1

Revisiting the Partial Data Cube Materialization

The problem of selecting views and/or indexes to materialize has been extensively studied in the context of query optimization. Traditionally, the problem is formalized as follows: given a set of queries and a budget e.g., an available memory space, find the objects to materialize (views and/or indexes) that (1) satisfy the given budget and (2) minimize the query cost. In this paper, we depart from this setting by adopting a user-centric point of view: given a constraint on query evaluation, namely a maximal query cost the user does accept, find the objects (1) whose materialization needs the minimal storage space and (2) that guarantee the query evaluation constraint. We study this problem in the data cube setting and provide exact and approximate solutions.

Nicolas Hanusse, Sofian Maabout, Radu Tofan

Mining Preferences from OLAP Query Logs for Proactive Personalization

The goal of personalization is to deliver information that is relevant to an individual or a group of individuals in the most appropriate format and layout. In the OLAP context personalization is quite beneficial, because queries can be very complex and they may return huge amounts of data. Aimed at making the user’s experience with OLAP as plain as possible, in this paper we propose a proactive approach that couples an MDX-based language for expressing OLAP preferences to a mining technique for automatically deriving preferences. First, the log of past MDX queries issued by that user is mined to extract a set of association rules that relate sets of frequent query fragments; then, given a specific query, a subset of pertinent and effective rules is selected; finally, the selected rules are translated into a preference that is used to annotate the user’s query. A set of experimental results proves the effectiveness and efficiency of our approach.

Julien Aligon, Matteo Golfarelli, Patrick Marcel, Stefano Rizzi, Elisa Turricchia

A Novel Integrated Classifier for Handling Data Warehouse Anomalies

Within databases employed in various commercial sectors, anomalies continue to persist and hinder the overall integrity of data. Typically,






observations of spatial-temporal data causes the user to be not able to accurately utilise recorded information. In literature, different methods have been mentioned to clean data which fall into the category of either deterministic and probabilistic approaches. However, we believe that to ensure the maximum integrity, a data cleaning methodology must have properties of both of these categories to effectively eliminate the anomalies. To realise this, we have proposed a method which relies both on integrated deterministic and probabilistic classifiers using fusion techniques. We have empirically evaluated the proposed concept with state-of-the-art techniques and found that our approach improves the integrity of the resulting data set.

Peter Darcy, Bela Stantic, Abdul Sattar

Data Warehousing 2

Variable Granularity Space Filling Curve for Indexing Multidimensional Data

Efficiently accessing multidimensional data is a challenge for building modern database applications that involve many folds of data such as temporal, spatial, data warehousing, bio-informatics, etc. This problem stems from the fact that multidimensional data have no given order that preserves proximity. The majority of the existing solutions to this problem cannot be easily integrated into the current relational database systems since they require modifications to the kernel. A prominent class of methods that can use existing access structures are ‘space filling curves’. In this study, we describe a method that is also based on the space filling curve approach, but in contrast to earlier methods, it connects regions of various sizes rather than points in multidimensional space. Our approach allows an efficient transformation of interval queries into regions of data that results in significant improvements when accessing the data. A detailed empirical study demonstrates that the proposed method outperforms the best available off-the-shelf methods for accessing multidimensional data.

Justin Terry, Bela Stantic, Paolo Terenziani, Abdul Sattar

MOLAP Cube Based on Parallel Scan Algorithm

This paper describes a new approach to multidimensional OLAP cubes implementation by employing a massively parallel scan operation. This task requires dedicated data structures, setting up and querying algorithms. A prototype implementation is evaluated in aspects of robustness and scalability for both time and storage.

Krzysztof Kaczmarski, Tomasz Rudny

Real-Time Computation of Advanced Rules in OLAP Databases

In Online Analytical Processing (OLAP) users view data through a multidimensional model known as the

data cube

, allowing the aggregation of information along different attributes and operations such as slicing and dicing. In-memory OLAP systems keep all relevant data in main memory and also support efficient updates of cube data, enabling interactive planning, forecasting, and what-if analysis. Since usually only the base data is stored and all aggregations and other calculations are computed on the fly, complex computations may seriously downgrade performance. We present an approach that uses graphics processing units (GPUs) as parallel coprocessors for high performance in-memory OLAP operations. In particular, our method accelerates the calculation of compute-intensive


, which represent business dependencies that are more complex than mere aggregates. In addition to the data structures and algorithms, we describe how to extend the approach to multi-GPU systems in order to scale it to larger data sets.

Steffen Wittmer, Tobias Lauer, Amitava Datta

DB Systems

Designing a Flash-Aware Two-Level Cache

The random read efficiency of flash memory, combined with its growing density and dropping price, make it well-suited for use as a read cache. We explore how a system can use flash memory as a cache layer between the main memory buffer pool and the magnetic disk. We study the problem of deciding which data pages to cache on flash and propose alternatives that serve different purposes. We give an analytical model to decide the optimal caching scheme for any workload, taking into account the physical properties of the flash disk used. We discuss implementation issues such as the effect of the flash cache block size on performance. Our experimental evaluation shows that questions on systems with flash-resident caches cannot be given universal answers that hold across all flash disks and workloads. Rather, our cost model should be applied per case to provide an optimal setup with confidence.

Ioannis Koltsidas, Stratis D. Viglas

Declarative Serializable Snapshot Isolation

Snapshot isolation (SI) is a popular concurrency control protocol, but it permits non-serializable schedules that violate database integrity. The Serializable Snapshot Isolation (SSI) protocol ensures (view) serializability by preventing pivot structures in SI schedules. In this paper, we leverage the SSI approach and develop the Declarative Serializable Snapshot Isolation (DSSI) protocol, an SI protocol that guarantees serializable schedules. Our approach requires no analysis of application programs or changes to the underlying DBMS. We present an implementation and prove that it ensures serializability.

Christian Tilgner, Boris Glavic, Michael Böhlen, Carl-Christian Kanne

Resource Scheduling Methods for Query Optimization in Data Grid Systems

Resource allocation (RA) is one of the most important stages of distributed query processing in Data Grid systems. Recently, a number of papers that propose different methods for RA were published. To deal with specific characteristics of the data grid systems, such as dynamicity, heterogeneity and large-scale, many studies extend classic methods from distributed and parallel databases domains. Others invite fundamentally different methods based on incentives for autonomous nodes. The present study provides a brief description, qualitative comparison and performance evaluation of the most interesting approaches (extended classic and incentive-based) for RA. Both approaches are promising and appropriate for successful data grid systems.

Igor Epimakhov, Abdelkader Hameurlain, Tharam Dillon, Franck Morvan

Spatial Data

A Recommendation Technique for Spatial Data

Recommendation functionalities have been recently considered in traditional database systems as an approach for guaranteeing a satisfactory interaction with the database also to users with a low or moderate technical skill or in presence of huge volumes of, potentially heterogeneous, data. Recommendation is performed by extending query results with additional and potentially interesting items. Among the proposed techniques, current-state approaches exploit the content and the schema of a query result as well as the database instance in order to recommend new items. While some preliminary current-state approaches have been proposed for relational databases, in this paper, we claim that current-state approaches can also be relevant for providing new ways of interactions in spatial databases. To support our claim, we present a current-state recommendation approach for spatial data and topological queries. The proposed approach exploits the principles of locality and similarity between topological predicates to recommend new spatial objects besides those precisely returned by a query. An index-based query processing algorithm for the proposed recommendation operator is also proposed, to guarantee an efficient computation of recommended items.

Barbara Catania, Maria Teresa Pinto, Paola Podestà, Davide Pomerano

Processing (Multiple) Spatio-temporal Range Queries in Multicore Settings

Research in Moving Objects Databases (MOD) has addressed various aspects of storing and querying trajectories of moving objects: from modelling, through linguistic constructs and formalisms/ algebras, to indexing structures and efficient processing of different query-categories have been subjects to a large body of works. Given the architectural trends of multicore CPUs becoming a commonplace, in this work we focus on efficient processing of spatio-temporal range queries in such settings. We postulate that coupling the semantics of the problem domain into the query processing algorithms in a manner that is aware of the multicore features, can yield performance improvements that surpass the gains obtained by relying solely on the compiler-generated threads parallelization. Towards that end, we present and evaluate heuristics for processing variants spatio-temporal range queries in multicore settings by partitioning the load (i.e., data set) and assigning partial tasks to the individual cores. Our experiments demonstrate that 5-fold speed-ups can be achieved, when compared to the (semi) naive approach which relies on the compiler to generate the multicore-compatible code.

Goce Trajcevski, Anan Yaagoub, Peter Scheuermann

Performance Comparison of xBR-trees and R*-trees for Single Dataset Spatial Queries

Processing of spatial queries has been studied extensively in the literature. In most cases, it is accomplished by indexing spatial data by an access method. For queries involving a single dataset, like the Point Location Query, the Window (Distance Range) Query, the (Constrained) K Nearest Neighbor Query, the R*-tree (a data-driven structure) is a very popular choice of such a method. In this paper, we compare the performance of the R*-tree for processing single dataset spatial queries to the performance of a disk based structure that belongs to the Quadtree family, the xBR-tree (a space-driven structure). We demonstrate performance results (I/O efficiency and execution time) of extensive experimentation that was based on real datasets, using these two index structures. The winner depends on several parameters and the results show that the xBR-tree is a promising alternative for these spatial operations.

George Roumelis, Michael Vassilakopoulos, Antonio Corral

Query Processing 2

Efficient Detection of Minimal Failing Subqueries in a Fuzzy Querying Context

This paper deals with conjunctive fuzzy queries that yield an empty or unsatisfactory answer set. We propose a cooperative answering approach which efficiently retrieves the minimal failing subqueries of the initial query (which can then be used to explain the failure). The detection of the minimal failing subqueries relies on a prior step of fuzzy cardinalities computation. The main advantage of this strategy is to imply a single scan of the database. Moreover, the storage of such knowledge about the data distributions easily fits in memory.

Olivier Pivert, Grégory Smits, Allel Hadjali, Hélène Jaudoin

Rewriting Fuzzy Queries Using Imprecise Views

This paper proposes an approach to the tolerant rewriting of queries in terms of views when the views and the queries may involve


value constraints in the context of a Local-As-View mediation system. These constraints describe attribute values as a set of elements attached with a degree in


that expresses the plausibility attached to a given element, i.e., attribute values more or less plausible/typical in the views, while in the queries, they denotes preferences, i.e., more or less desired values. The problem of rewriting queries is formalized in the setting of the description logic

${\cal FL}_0$

extended to fuzzy value constraints. We propose an algorithm of gradual and structural subsumption for this extended logic, that plays a key role in the query rewriting algorithm. Finally, we characterize the tolerant query rewriting forms and propose an algorithm to compute them.

Hélène Jaudoin, Olivier Pivert

Personalizing Queries over Large Data Tables

We present a formal framework for the processing of preference queries over large data tables, in which user preferences are expressed as comparisons between attribute values (e.g. “I prefer




”).The main contributions of the paper are as follows: (a) a formal framework for the statement of the problem, under no restrictions whatsoever on the preferences expressed by the user, (b) a rewriting algorithm that takes as input a preference query and returns a sequence of ordinary sub-queries whose evaluations construct the answer to the preference query, (c) a general definition of “skyline” and (d) a user-friendly interface supporting preference query formulation and incremental query evaluation with on-the-fly modification.

Nicolas Spyratos, Tsuyoshi Sugibuchi, Jitao Yang

Information Systems

An Analysis of the Structure and Dynamics of Large-Scale Q/A Communities

In recent years, the World Wide Web (WWW) has transformed to a gigantic social network where people interact and collaborate in diverse online communities. By using Web 2.0 tools, people contribute content and knowledge at a rapid pace. Knowledge-intensive social networks such as Q/A communities offer a great source of expertise for crowdsourcing applications. Companies desiring to outsource human tasks to the crowd, however, demand for certain guarantees such as quality that can be expected from returned tasks. We argue that the quality of crowd-sourced tasks greatly depends on incentives and the users’ dynamically evolving expertise and interests. Here we propose expertise mining techniques that are applied in online social communities. Our approach recommends users by considering contextual properties of Q/A communities such as participation degree and topic-sensitive expertise. Furthermore, we discuss prediction mechanisms to estimate answering dynamics considering a person’s interest and social preferences.

Daniel Schall, Florian Skopik

Forcasting Evolving Time Series of Energy Demand and Supply

Real-time balancing of energy demand and supply requires accurate and efficient forecasting in order to take future consumption and production into account. These balancing capabilities are reasoned by emerging energy market developments, which also pose new challenges to forecasting in the energy domain not addressed so far: First, real-time balancing requires accurate forecasts at any point in time. Second, the hierarchical market organization motivates forecasting in a distributed system environment. In this paper, we present an approach that adapts forecasting to the hierarchical organization of today’s energy markets. Furthermore, we introduce a forecasting framework, which allows efficient forecasting and forecast model maintenance of time series that evolve due to continuous streams of measurements. This framework includes model evaluation and adaptation techniques that enhance the model maintenance process by exploiting context knowledge from previous model adaptations. With this approach (1) more accurate forecasts can be produced within the same time budget, or (2) forecasts with similar accuracy can be produced in less time.

Lars Dannecker, Matthias Böhm, Wolfgang Lehner, Gregor Hackenbroich

The NestFlow Interpretation of Workflow Control-Flow Patterns

Business process models are designed using a set of control-flow and data-flow constructs provided by the chosen Business Process Modeling Language (BPML). As research confirms, the adoption of a structured control-flow is always desirable for enhancing model comprehensibility and reducing the presence of errors. However, existing BPMLs cannot promote a fully structured approach to control-flow design because any restriction imposed on the existing language constructs results in a loss of expressiveness in terms of definable models. This paper proposes a novel BPML called NestFlow, characterized by a small set of language constructs that together overcome the aforementioned limitation. NestFlow expressiveness is discussed in terms of supported Workflow Control-Flow Patterns (WCPs), showing how the right combination of control-flow and data-flow constructs allows one to express most of these patterns in a structured way.

Carlo Combi, Mauro Gambini, Sara Migliorini

Physical DB Design

On Simplifying Integrated Physical Database Design

This paper deals with the problem of integrated physical database design involving two optimization techniques: horizontal data partitioning (HDP) and bitmap join indexes (BJI). These techniques compete for the same resource representing selection attributes. This competition incurs

attribute interchangeability

phenomena, where same attribute(s) may be used to select either HDP or BJI schemes. Existing studies dealing with integrated physical database design problem not consider this competition. We propose to study its contribution on simplifying the complexity of our problem. Instead of tackling it in an integrated way, we propose to start by


to each technique its own attributes and then it


its own selection algorithm. This assignment is done using the K-Means method. Our design is compared with the state of the art work using APB1 benchmark. The results show that an

interchangeability attribute-aware database designer

can improve significantly query performance within the less space budget.

Rima Bouchakri, Ladjel Bellatreche

Generic Information System Architecture for Distributed Multimedia Indexation and Management

Effective and scalable distributed solutions for data indexation and management should provide solutions for three sensitive issues: the system’s architecture that integrates the mentioned facilities, the management of the indexation techniques and the modeling of the metadata that describe the multimedia contents. This paper presents a generic and scalable distributed architectural solution that integrates a flexible indexation management technique and enables to deal with any metadata models. Our solution was developed in the context of the LINDO project, by capitalizing and enhancing the existing solutions for each issue while also respecting the project’s requirements. The solution proposes an innovative indexing management technique, and includes the system’s architecture and the metadata model that sustain it. In order to validate our proposal, several implementations are presented which are related to broadcast, video surveillance and archive activities.

Mihaela Brut, Sébastien Laborie, Ana-Maria Manzat, Florence Sèdes

Automatic Physical Database Tuning Middleware for Web-Based Applications

In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We do a time intensive search over different table layouts, and at each iteration, physically create the configurations, and evaluate the total throughput of the system. We report our empirical results of two forms. First, we empirically validate as facts the heuristics that Database Administrators (DBAs) currently use as in doing this task manually: for tables that have a high ratio of update, delete, and insert to retrieval queries one should horizontally partition, but for a small ratio one should fully replicate a table. Such rules of thumb are reasonable, however we want to parameterize some common guidelines that DBAs can use. Our second empirical result is that we applied this search to our existing data test case and found a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce the investment in hardware). To make this search more practical, we want to learn reasonable rules to guide the search to eliminate many layout configurations that are not very likely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input into a machine learning system, to create general rules about when to use the different layout operators.

Jozsef Patvarczki, Neil T. Heffernan

Evolution, Integrity, Security

XML Data Transformations as Schema Evolves

One of the key characteristics of XML applications is their dynamic nature. When a system grows and evolves, old user requirements change and/or new requirements accumulate. Apart from changes in the interface, it is also necessary to modify the existing documents with each new version, so they are valid against the new specification. The approach presented in this paper extends an existing XML conceptual model with the support for multiple versions of the model. Thanks to this extension, it is possible to define a set of changes between two versions of a schema. This work contains an outline of an algorithm that compares two versions of a schema and produces a revalidation script in XSL.

Jakub Malý, Irena Mlýnková, Martin Nečaský

Partial Repairs That Tolerate Inconsistency

The consistency of databases can be supported by enforcing integrity constraints on the stored data. Constraints that are violated should be repaired by eliminating the causes of the violations. Traditionally, repairs are conceived to be total. However, it may be unfeasible to eliminate all violations. We show that it is possible to get by with partial repairs that tolerate extant inconsistencies. They may not eliminate all causes of integrity violations but preserve the consistent parts of the database. Remaining violations can be controlled by measuring inconsistency, and further reduced by inconsistency-tolerant integrity checking.

Hendrik Decker

Modularisation in Maude of Parametrized RBAC for Row Level Access Control

We formalize a Parametrized Role-Based Access Control in the language Maude. We demonstrate how this formalization can be used to specify a row level access control policy in a database and how module algebra capabilities of Maude assist in modularization of such specification.

Ścibor Sobieski, Bartosz Zieliński

Data Semantics

A Clustering-Based Approach for Large-Scale Ontology Matching

Schema and ontology matching have attracted a great deal of interest among researchers. Despite the advances achieved, the large matching problem still presents a real challenge, such as it is a time-consuming and memory-intensive process. We therefore propose a scalable, clustering-based matching approach that breaks up the large matching problem into smaller matching problems. In particular, we first introduce a structure-based clustering approach to partition each schema graph into a set of disjoint subgraphs (clusters). Then, we propose a new measure that efficiently determines similar clusters between every two sets of clusters to obtain a set of small matching tasks. Finally, we adopt the matching prototype COMA++ to solve individual matching tasks and combine their results. The experimental analysis reveals that the proposed method permits encouraging and significant improvements.

Alsayed Algergawy, Sabine Massmann, Erhard Rahm

Automatic Building of an Appropriate Global Ontology

Our objective is to automatically build a global ontology from several data sources, annotated with local ontologies and aiming to share their data in a specific application domain. The originality of our proposal lies in the use of a background knowledge, i.e. a reference ontology, as a mediation support for data integration. We represent ontologies using Description Logics and we combine syntactic-matching with logical-reasoning in order to build the shared global TBox from both the TBoxes of sources and that of the reference ontology.

Cheikh Niang, Béatrice Bouchou, Moussa Lo, Yacine Sam

Semantic Interoperation of Information Systems by Evolving Ontologies through Formalized Social Processes

For autonomously developed information systems to interoperate in a meaningful manner, ontologies capturing the intended semantics of that interoperation have to be developed by a community of stakeholders in those information systems. As the requirements of the ontology and the ontology itself evolve, so in general will the community, and vice versa. Ontology construction should thus be viewed as a complex activity leading to formalized semantic agreement involving various social processes within the community, and that may translate into a number of ontology evolution operators to be implemented. The hybrid ontologies that emerge in this way indeed need to support both the social agreement processes in the stakeholder communities and the eventual reasoning implemented in the information systems that are governed by these ontologies. In this paper, we discuss formal aspects of the social processes involved, a so-called fact-oriented methodology and formalism to structure and describe these, as well as certain relevant aspects of the communities in which they occur. We also report on a prototypical tool set that supports such a methodology, and on examples of some early experiments.

Christophe Debruyne, Robert Meersman


Weitere Informationen

Premium Partner