Top

2013 | Book

Read chapter Read first chapter

Advanced Query Processing

Volume 1: Issues and Trends

Editors: Barbara Catania, Lakhmi C. Jain

Publisher: Springer Berlin Heidelberg

Book Series : Intelligent Systems Reference Library

Part of: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

About this book

This research book presents key developments, directions, and challenges concerning advanced query processing for both traditional and non-traditional data. A special emphasis is devoted to approximation and adaptivity issues as well as to the integration of heterogeneous data sources.

The book will prove useful as a reference book for senior undergraduate or graduate courses on advanced data management issues, which have a special focus on query processing and data integration. It is aimed for technologists, managers, and developers who want to know more about emerging trends in advanced query processing.

Frontmatter

Advanced Query Processing: An Introduction

Abstract

Traditional query processing techniques have played a major role in the success of relational Database Management Systems over the last decade. However, they do not obviously extend to much more challenging, unorganized and unpredictable data providers, typical of emerging data intensive applications and novel processing environments. For them, advanced query processing and data integration approaches have been proposed with the aim of still guaranteeing an effective and efficient data access in such more complex data management scenarios. The aim of this chapter is to present the main issues and trends arising in advanced query processing and to relate them to the various parts of this book. For each part, a brief description of the background concepts and of the presented contributions is also provided.

Barbara Catania, Lakhmi Jain

On Skyline Queries and How to Choose from Pareto Sets

Abstract

Skyline queries are well known for their intuitive query formalization and easy to understand semantics when selecting the most interesting database objects in a personalized fashion. They naturally fill the gap between set-based SQL queries and rank-aware database retrieval and thus have emerged in the last few years as a popular tool for personalized retrieval in the database research community. Unfortunately, the Skyline paradigm also exhibits some significant drawbacks. Most prevalent among those problems is the so called “curse of dimensionality” which often leads to unmanageable result set sizes. This flood of query results, usually containing a significant portion of the original database, in turn severely hampers the paradigm’s applicability in real-life systems. In this chapter, we will provide a survey of techniques to remedy this problem by choosing the most interesting objects from the multitude of skyline objects in order to obtain truly manageable and personalized query results.

Christoph Lofi, Wolf-Tilo Balke

Processing Framework for Ranking and Skyline Queries

Abstract

In the previous chapter, the need to support ranking and skyline queries for multi-criteria decision-making for given user preferences was motivated. We now survey existing algorithms for each query and show a ‘meta-algorithm’ framework for each query. The goal of this chapter is to show that how this framework and cost model enable us to (a) generalize existing algorithms and (b) observe important principles not observed from individual algorithms.

Seung-won Hwang

Preference-Based Query Personalization

Abstract

In the context of database queries, computational methods for handling preferences can be broadly divided into two categories. Query personalization methods consider that user preferences are provided as a user profile separately from the query and dynamically determine how this profile will affect the query results. On the other hand, preferential query answering methods consider that preferences are explicitly expressed within queries. The focus of this chapter is on query personalization methods. We will first describe how preferences can be represented and stored in user profiles. Then, we will discuss how preferences are selected from a user profile and applied to a query.

Georgia Koutrika, Evaggelia Pitoura, Kostas Stefanidis

Approximate Queries for Spatial Data

Abstract

Approximation techniques for spatial data traditionally concern data capture and data representation issues. On the other hand, more recently developed approximation techniques refer to the query to be executed and not to data representation as in the the past monolithic Geographic Information Systems and for this reason they are called query-based approximation techniques. The aim of this chapter is to survey such approximation techniques and to identify the issues that from our point of view have still to be investigated to complete the picture. In particular, we observe that most of the proposed approaches for spatial approximate queries rely on the usage of quantitative, i.e., metric (distance-based), information. On the other hand, only few of them take into account qualitative information, e.g., topological and cardinal spatial relations. Based on this consideration, we provide new types of queries relying on qualitative relations and we discuss how the query processing algorithms already defined for metric relations can be extended to cope with qualitative information.

Alberto Belussi, Barbara Catania, Sara Migliorini

Approximate XML Query Processing

Abstract

The standard XML query languages, XPath and XQuery, are built on the assumption of a regular structure with well-defined parent/child relationships between nodes and exact conditions on nodes. Full text extensions to both languages allow Information Retrieval (IR) style queries over text-rich documents. Important applications exist for which the purely textual information is not predominant and documents exhibit a structure, that is however not relatively regular. Thus, approaches to relax both content and structure conditions in queries on XML document collections and to rank results according to some measure to assess similarity have been proposed, as well as processing approaches to efficiently evaluate them. In the chapter, the various dimensions of query relaxation and alternative approaches to approximate processing will be discussed.

Giovanna Guerrini

Progressive and Approximate Join Algorithms on Data Streams

Abstract

In this chapter, we discuss the design and implementation of join algorithms for data streaming systems, wherememory is often limited relative to the data that needs to be processed.We first focus on progressive join algorithms for various data models. We introduce a framework for progressive join processing, called the Result Rate based Progressive Join (RRPJ) framework which can be used for join processing for various data models, and discuss its various instantiations for processing relational, high-dimensional, spatial and XML data.

We then consider progressive and approximate join algorithms. The need for approximate join algorithms is motivated by the observation that users often do not require complete set of answers. Some answers, which we refer to as an approximate result, are often sufficient. Users expect the approximate result to be either the largest possible or the most representative (or both) given the resources available. We discuss the tradeoffs between maximizing quantity and quality of the approximate result. To address the different tradeoffs, we discuss a family of algorithms for progressive and approximate join processing.

Wee Hyong Tok, Stéphane Bressan

Online Aggregation

Abstract

In this chapter, we introduce a new promising technique for query processing, online aggregation. Online aggregation is proposed based on the assumption that for some applications, the precise results are not always required. Instead, the approximate results can provide a good enough estimation. Compared to the precise results, computing the approximate ones are more cost effective, especially for large-scale datasets. To generate the approximate result, online aggregation retrieves samples continuously from the database. The samples are streamed to the query engine for processing the query. The accuracy of the approximate result is described by a statistical model. Normally, the result is refined as more samples are obtained. The user can terminate the processing at any time, when he/she is satisfied with the quality of the result.

The performance of online aggregation relies on the sampling approach and estimation model. In this chapter, our discussion is focused on these two components. Besides introducing the basic principles of online aggregation, we also review some new applications built on top of it. We complete the chapter by discussing the challenges of online aggregation and some future directions.

Sai Wu, Beng Chin Ooi, Kian-Lee Tan

Adaptive Query Processing in Distributed Settings

Abstract

In this survey chapter,we discuss adaptive query processing (AdQP) techniques for distributed environments. We also investigate the issues involved in extending AdQP techniques originally proposed for single-node processing so that they become applicable to multi-node environments as well. In order to make it easier for the reader to understand the similarities among the various proposals, we adopt a common framework, which decomposes the adaptivity loop into the monitoring, analysis, planning and actuation (or execution) phase. The main distributed AdQP techniques developed so far tend to differ significantly from their centralized counterparts, both in their objectives and in their focus. The objectives in distributed AdQP are more tailored to distributed settings, whereas more attention is paid to issues relating to the adaptivity cost, which is significant, especially when operators and data are moved over the network.

Anastasios Gounaris, Efthymia Tsamoura, Yannis Manolopoulos

Approximate Queries with Adaptive Processing

Abstract

The traditional query processing approach, by which queries are executed exactly according to a query execution plan selected before query execution starts, breaks down in heterogeneous and dynamic processing environments that are becoming more and more common as query processing contexts. In such environments, queries are often relaxed and query processing is forced to be adaptive and approximate, either to cope with limited processing resources or with limited data knowledge and data heterogeneity.When approximation and adaptivity are applied in order to cope with limited processing resources, possibly sacrificing result quality, we refer to as Quality of Service (QoS)-oriented techniques. On the other hand, when they are a means to improve the quality of results, in presence of limited data knowledge and data heterogeneity, we refer to as Quality of Data (QoD)-oriented techniques. While both kinds of approximation techniques have been proposed, most adaptive solutions are QoS-oriented. In this chapter, we first survey both kinds of approximation and introduce adaptive query processing techniques; then, we show that techniques which apply a QoD-oriented approximation in a QoD-oriented adaptive way, though demonstrated potentially useful on some examples, are still largely neglected.

Barbara Catania, Giovanna Guerrini

Querying Conflicting Web Data Sources

Abstract

Over the last twenty years, information integration has received considerable efforts from both industry and academia. Approaches to information integration developed so far can be categorized as follows: (1) first-generation approaches, that require the definition of a global schema and a semantic integration which should be performed upfront (before query execution); (2) second-generation approaches, well illustrated by the dataspace management concept, which promote a pay-asyou-go data integration. The first category has led to well known mediation approaches such as GAV (Global as View), LAV (Local as View), GLAV (Generalized Local As View), BAV (Both As View), and BGLAV (BYU Global-Local-as-View). Approaches pertaining to the second category are geared towards the development of dataspace management systems and are currently gaining a lot of attention. In this chapter we are interested in exploiting both types of approaches in querying conflicting data spread over multiple web sources. To this aim, first we show how an XML-based BGLAV approach can handle these conflicting data sources, then we describe how the same problem can be addressed by using the Multi Fusion Approach (MFA), an approach pertaining to second-generation techniques. Both BGLAV and MFA are illustrated in using genomic data sources accessible through the Web.

Gilles Nachouki, Mohamed Quafafou, Omar Boucelma, François-Marie Colonna

A Functional Model for Dataspace Management Systems

Abstract

Dataspace management systems (DSMSs) hold the promise of pay-as-you-go data integration. We describe a comprehensive model of DSMS functionality using an algebraic style. We begin by characterizing a dataspace life cycle and highlighting opportunities for both automation and user-driven improvement techniques. Building on the observation that many of the techniques developed in model management are of use in data integration contexts as well, we briefly introduce the model management area and explain how previous work on both data integration and model management needs extending if the full dataspace life cycle is to be supported.We show that many model management operators already enable important functionalities (e.g., the merging of schemas, the composition of mappings, etc.) and formulate these capabilities in an algebraic structure, thereby giving rise to the notion of the core functionality of a DSMS as a many-sorted algebra. Given this view, we show how core tasks in the dataspace life cycle can be enacted by means of algebraic programs. An extended case study illustrates how such algebraic programs capture a challenging, practical scenario.

Cornelia Hedeler, Alvaro A. A. Fernandes, Khalid Belhajjame, Lu Mao, Chenjuan Guo, Norman W. Paton, Suzanne M. Embury

Backmatter

Title: Advanced Query Processing
Editors: Barbara Catania
Lakhmi C. Jain
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-28323-9
Print ISBN: 978-3-642-28322-2
DOI: https://doi.org/10.1007/978-3-642-28323-9

Springer Professional

Advanced Query Processing

Volume 1: Issues and Trends

About this book

Table of Contents

Frontmatter

Advanced Query Processing: An Introduction

On Skyline Queries and How to Choose from Pareto Sets

Processing Framework for Ranking and Skyline Queries

Preference-Based Query Personalization

Approximate Queries for Spatial Data

Approximate XML Query Processing

Progressive and Approximate Join Algorithms on Data Streams

Online Aggregation

Adaptive Query Processing in Distributed Settings

Approximate Queries with Adaptive Processing

Querying Conflicting Web Data Sources

A Functional Model for Dataspace Management Systems

Backmatter

Premium Partner