ABSTRACT
An effective query optimizer finds a query plan that exploits the characteristics of the source data. In data integration, little is known in advance about sources' properties, which necessitates the use of adaptive query processing techniques to adjust query processing on-the-fly. Prior work in adaptive query processing has focused on compensating for delays and adjusting for mis-estimated cardinality or selectivity values. In this paper, we present a generalized architecture for adaptive query processing and introduce a new technique, called adaptive data partitioning (ADP), which is based on the idea of dividing the source data into regions, each executed by different, complementary plans. We show how this model can be applied in novel ways to not only correct for underestimated selectivity and cardinality values, but also to discover and exploit order in the source data, and to detect and exploit source data that can be effectively pre-aggregated. We experimentally compare a number of alternative strategies and show that our approach is effective.
- G. Antoshenkov and M. Ziauddin. Query processing and optimization in Oracle Rdb. VLDB Journal, 5(4):229--237, 1996. Google ScholarDigital Library
- R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD '00. Google ScholarDigital Library
- N. Bruno and S. Chaudhuri. Exploiting statistics on query expressions for optimization. In SIGMOD '02, 2002. Google ScholarDigital Library
- S. Chaudhuri and K. Shim. Including group-by in query optimization. In VLDB '94. Google ScholarDigital Library
- C.-M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In SIGMOD '94. Google ScholarDigital Library
- R. L. Cole and G. Graefe. Optimization of dynamic query evaluation plans. In SIGMOD '94. Google ScholarDigital Library
- D. Donjerkovic, Y. E. Ioannidis, and R. Ramakrishnan. Dynamic histograms: Capturing evolving data sets. In ICDE '00.Google ScholarCross Ref
- D. Florescu, A. Y. Levy, I. Manolescu, and D. Suciu. Query optimization in the presence of limited access patterns. In SIGMOD '99. Google ScholarDigital Library
- P. B. Gibbons. Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. TODS, 27(3), 2002. Google ScholarDigital Library
- G. Graefe, R. Bunker, and S. Cooper. Hash joins and hashteams in Microsoft SQL Server. In VLDB '98. Google ScholarDigital Library
- L. M. Haas, D. Kossmann, E. L. Wimmers, and J. Yang. Optimizing queries across diverse data sources. In VLDB '97. Google ScholarDigital Library
- P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD '99. Google ScholarDigital Library
- P. J. Haas, J. F. Naughton, S. Seshadri, and L. Stokes. Sampling-based estimation of the number of distinct values of an attribute. In VLDB '95. Google ScholarDigital Library
- A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4):270--294, 2001. Google ScholarDigital Library
- J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In SIGMOD '97. Google ScholarDigital Library
- Z. G. Ives. Efficient Query Processing for Data Integration. PhD thesis, University of Washington, August 2002. Google ScholarDigital Library
- Z. G. Ives, D. Florescu, M. T. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In SIGMOD '99. Google ScholarDigital Library
- Z. G. Ives, A. Y. Halevy, and D. S. Weld. An XML query engine for network-bound data. VLDB Journal, 11(4):380--402, December 2002. Google ScholarDigital Library
- N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In SIGMOD '98. Google ScholarDigital Library
- S. Madden, M. A. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In SIGMOD '02. Google ScholarDigital Library
- V. Raman, A. Deshpande, and J. M. Hellerstein. Using state modules for adaptive query processing. In ICDE '03.Google Scholar
- L. Raschid and S. Y. W. Su. A parallel processing strategy for evaluating recursive queries. In VLDB '86. Google ScholarDigital Library
- M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO ---DB2's LEearning Optimizer. In VLDB '01. Google ScholarDigital Library
- F. Tian and D. J. DeWitt. Tuple routing strategies for distributed eddies. In VLDB '03. Google ScholarDigital Library
- T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In VLDB '01. Google ScholarDigital Library
- T. Urhan and M. J. Franklin. XJoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin, 23(2), June 2000.Google Scholar
- T. Urhan, M. J. Franklin, and L. Amsaleg. Cost based query scrambling for initial delays. In SIGMOD '98. Google ScholarDigital Library
- Adapting to source properties in processing data integration queries
Recommendations
Processing Aggregate Queries with Materialized Views in Data Warehouse Environment
Materialized views, which are derived from base relations and stored in the database, offer opportunities for significant performance gain in query evaluation by providing quick access to the pre-computed data. A materialized view can be utilized in ...
Scalable and efficient processing of top-k multiple-type integrated queries
AbstractIn this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Comments