
About this book

Researchers in data management have recently recognized the importance of a new class of data-intensive applications that requires managing data streams, i.e., data composed of a continuous, real-time sequence of items. Streaming applications pose new and interesting challenges for data management systems. Such application domains require queries to be evaluated continuously, as opposed to the one-time evaluation of a query in traditional applications. Streaming data sets grow continuously, and queries must be evaluated on such unbounded data sets. These, as well as other challenges, require a major rethink of almost all aspects of traditional database management systems to support streaming applications.

Stream Data Management comprises eight invited chapters by researchers active in stream data management. The collected chapters provide an exposition of the algorithms, languages, and systems proposed and implemented for managing streaming data.

Stream Data Management is designed to appeal to researchers or practitioners already involved in stream data management, as well as to those starting out in this area. This book is also suitable for graduate students in computer science interested in learning about stream data management.



Chapter 1. Introduction to Stream Data Management

In recent years, a new class of applications has emerged that requires managing data streams, i.e., data composed of a continuous, real-time sequence of items. This chapter introduces issues and solutions in managing stream data. Some typical applications requiring support for streaming data are described, and the challenges for data management systems in supporting these requirements are identified. This is followed by a description of solutions aimed at providing the required functionality. The chapter concludes with a tour of the rest of the chapters in the book.
Nauman A. Chaudhry

Chapter 2. Query Execution and Optimization

Query execution and optimization for streaming data revisits almost all aspects of query execution and optimization over traditional, disk-bound database systems. The reason is that two fundamental assumptions of disk-bound systems are dropped: (i) the data resides on disk, and (ii) the data is finite. As such, new evaluation algorithms and new optimization metrics need to be devised. The approaches can be broadly classified into two categories. First, there are static approaches that follow the traditional optimize-then-execute paradigm by assuming that optimization-time assumptions will continue to hold during execution; the environment is expected to be relatively static in that respect. Alternatively, there are adaptive approaches that assume the environment is completely dynamic and highly unpredictable. In this chapter we explore both approaches and present novel query optimization and evaluation techniques for queries over streaming sources.
Stratis D. Viglas

Chapter 3. Filtering, Punctuation, Windows and Synopses

This chapter addresses some of the problems raised by the high-volume, nonterminating nature of many data streams. We begin by outlining challenges for query processing over such streams, such as outstripping CPU or memory resources, operators that wait for the end of input, and unbounded query state. We then consider various techniques for meeting those challenges. Filtering attempts to reduce stream volume in order to save on system resources. Punctuations incorporate semantics on the structure of a stream into the stream itself, and can help unblock query operators and reduce the state they must retain. Windowing modifies a query so that processing takes place on finite subsets of full streams. Synopses are compact, efficiently maintained summaries of data that can provide approximate answers to particular queries.
David Maier, Peter A. Tucker, Minos Garofalakis
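
As a rough illustration of the windowing idea described in the Chapter 3 abstract, the following Python sketch computes a running average over the most recent items of an unbounded stream. It is not taken from the book: the function name `sliding_window_average` and the count-based window of `size` items are assumptions chosen for illustration only.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of the most recent `size` items after each arrival.

    Hypothetical helper: a count-based sliding window, one of several
    possible window definitions. State is bounded by `size`, so the
    query can run over a stream that never terminates.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    window: Deque[float] = deque()
    total = 0.0
    for item in stream:
        window.append(item)
        total += item
        # Evict the oldest item once the window exceeds its capacity.
        if len(window) > size:
            total -= window.popleft()
        yield total / len(window)


if __name__ == "__main__":
    for average in sliding_window_average([1, 2, 3, 4], size=2):
        print(average)
```

Because the generator keeps at most `size` items, its memory use stays constant however long the input runs, which is the property that lets a windowed query operate on finite subsets of a full stream.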

Chapter 4. XML & Data Streams

XQuery path queries form the basis of complex matching and processing of XML data. Most current XML query processing techniques can be divided into two groups. Navigation-based algorithms compute results by analyzing an input stream of documents one tag at a time. In contrast, index-based algorithms take advantage of (precomputed or computed-on-demand) numbering schemes over each input XML document in the stream. In this chapter, we present an index-based technique, Index-Filter, to answer multiple path queries. Index-Filter uses indexes built over the document tags to avoid processing large portions of an input document that are guaranteed not to be part of any match. We analyze Index-Filter, compare it against Y-Filter, a state-of-the-art navigation-based technique, and present the advantages of each technique.
Nicolas Bruno, Luis Gravano, Nick Koudas, Divesh Srivastava

Chapter 5. CAPE: A Constraint-Aware Adaptive Stream Processing Engine

Without Abstract
Elke A. Rundensteiner, Luping Ding, Yali Zhu, Timothy Sutherland, Braeford Pielech

Chapter 6. Efficient Support for Time Series Queries in Data Stream Management Systems

There is much current interest in supporting continuous queries on data streams using generalizations of database query languages, such as SQL. The research challenges faced by this approach include (i) overcoming the expressive power limitations of database languages on data stream applications, and (ii) providing query processing and optimization techniques for the data stream execution environment that is so different from that of traditional databases. In particular, SQL must be extended to support sequence queries on time series, and to overcome the loss of expressive power due to the exclusion of blocking query operators. Furthermore, the query processing techniques of relational databases must be replaced with techniques that optimize execution of time-series queries and the utilization of main memory. The Expressive Stream Language for Time Series (ESL-TS) and its query optimization techniques solve these problems efficiently and are part of the data stream management system prototype developed at UCLA.
Yijian Bai, Chang R. Luo, Hetal Thakkar, Carlo Zaniolo

Chapter 7. Managing Distributed Geographical Data Streams with the GIDB Portal System

The Naval Research Laboratory (NRL) has developed a portal system, called the Geospatial Information Database (GIDB®), which links together several hundred geographic information databases. The GIDB portal enables users to access many distributed data sources with a single protocol and from a single source. This chapter highlights the current functionality of the GIDB Portal System and describes specific applications to military and homeland security uses.
John T. Sample, Frank P. McCreedy, Michael Thomas

Chapter 8. Streaming Data Dissemination Using Peer-Peer Systems

Many characteristics of peer-peer systems make them suitable for addressing the traditional problems of information storage and dissemination. Peer-peer systems give a distributed solution to these problems. Typically, peer-peer systems (research prototypes or commercial systems) have dynamic topologies where peers join and leave the network at any point. However, the information that is stored and queried in these peers is assumed to be static. Most of these current peer-peer systems do not deal with data that is changing dynamically, i.e., data that changes rapidly and unpredictably. This chapter first examines a few of the existing peer-peer systems and the various issues that they address. It then discusses some of the research issues in using peer-peer systems for managing dynamic or streaming data and presents a peer-peer solution for the dissemination of dynamic data.
Shetal Shah, Krithi Ramamritham

