Article

Adapting to source properties in processing data integration queries

Authors:
Zachary G. Ives

University of Pennsylvania

University of Pennsylvania
View Profile

,
Alon Y. Halevy

University of Washington

University of Washington
View Profile

,
Daniel S. Weld

University of Washington

University of Washington
View Profile

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of dataJune 2004Pages 395–406https://doi.org/10.1145/1007568.1007613

Published:13 June 2004Publication History

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Pages 395–406

ABSTRACT

An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about sources' properties, which necessitates the use of adaptive query processing techniques to adjust query processing on-the-fly. Prior work in adaptive query processing has focused on compensating for delays and adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each executed by different, complementary plans. We show how this model can be applied in novel ways to not only correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.

References

G. Antoshenkov and M. Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229--237, 1996. Google ScholarDigital Library
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD '00. Google ScholarDigital Library
N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In SIGMOD '02, 2002. Google ScholarDigital Library
S. Chaudhuri and K. Shim. Including group-by in query optimization. In VLDB '94. Google ScholarDigital Library
C.-M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In SIGMOD '94. Google ScholarDigital Library
R. L. Cole and G. Graefe. Optimization of dynamic query evaluation plans. In SIGMOD '94. Google ScholarDigital Library
D. Donjerkovic, Y. E. Ioannidis, and R. Ramakrishnan. Dynamic histograms: Capturing evolving data sets. In ICDE '00.Google ScholarCross Ref
D. Florescu, A. Y. Levy, I. Manolescu, and D. Suciu. Query optimization in the presence of limited access patterns. In SIGMOD '99. Google ScholarDigital Library
P. B. Gibbons. Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. TODS, 27(3), 2002. Google ScholarDigital Library
G. Graefe, R. Bunker, and S. Cooper. Hash joins and hashteams in Microsoft SQL Server. In VLDB '98. Google ScholarDigital Library
L. M. Haas, D. Kossmann, E. L. Wimmers, and J. Yang. Optimizing queries across diverse data sources. In VLDB '97. Google ScholarDigital Library
P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD '99. Google ScholarDigital Library
P. J. Haas, J. F. Naughton, S. Seshadri, and L. Stokes. Sampling-based estimation of the number of distinct values of an attribute. In VLDB '95. Google ScholarDigital Library
A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4):270--294, 2001. Google ScholarDigital Library
J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In SIGMOD '97. Google ScholarDigital Library
Z. G. Ives. Efficient Query Processing for Data Integration. PhD thesis, University of Washington, August 2002. Google ScholarDigital Library
Z. G. Ives, D. Florescu, M. T. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In SIGMOD '99. Google ScholarDigital Library
Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML query engine for network-bound data. VLDB Journal, 11(4):380--402, December 2002. Google ScholarDigital Library
N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD '98. Google ScholarDigital Library
S. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In SIGMOD '02. Google ScholarDigital Library
V. Raman, A. Deshpande, and J. M. Hellerstein. Using state modules for adaptive query processing. In ICDE '03.Google Scholar
L. Raschid and S. Y. W. Su. A parallel processing strategy for evaluating recursive queries. In VLDB '86. Google ScholarDigital Library
M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO ---DB2's LEearning Optimizer. In VLDB '01. Google ScholarDigital Library
F. Tian and D. J. DeWitt. Tuple routing strategies for distributed eddies. In VLDB '03. Google ScholarDigital Library
T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In VLDB '01. Google ScholarDigital Library
T. Urhan and M. J. Franklin. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin, 23(2), June 2000.Google Scholar
T. Urhan, M. J. Franklin, and L. Amsaleg. Cost based query scrambling for initial delays. In SIGMOD '98. Google ScholarDigital Library

Adapting to source properties in processing data integration queries
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Recommendations

Processing Aggregate Queries with Materialized Views in Data Warehouse Environment

Materialized views, which are derived from base relations and stored in the database, offer opportunities for significant performance gain in query evaluation by providing quick access to the pre-computed data. A materialized view can be utilized in ...
Read More
Scalable and efficient processing of top-k multiple-type integrated queries
Abstract
In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Read More
Processing multi-join queries
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN:1581138598
DOI:10.1145/1007568
Conference Chairs:
Arnd Christian König
Microsoft Research
,
Stefan Dessloch
University of Kaiserslautern, Germany
,
General Chair:
Patrick Valduriez
INRIA, France
,
Program Chair:
Gerhard Weikum
University of the Saarland
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 66
  Total Citations
  View Citations
- 860
  Total Downloads
- Downloads (Last 12 months)13
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Adapting to source properties in processing data integration queries

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Recommendations

Processing Aggregate Queries with Materialized Views in Data Warehouse Environment

Scalable and efficient processing of top-k multiple-type integrated queries

Processing multi-join queries