ABSTRACT
Virtually every commercial query optimizer chooses the best plan for a query using a cost model that relies heavily on accurate cardinality estimation. Cardinality estimation errors can occur due to the use of inaccurate statistics, invalid assumptions about attribute independence, parameter markers, and so on. Cardinality estimation errors may cause the optimizer to choose a sub-optimal plan. We present an approach to query processing that is extremely robust because it is able to detect and recover from cardinality estimation errors. We call this approach "progressive query optimization" (POP). POP validates cardinality estimates against actual values as measured during query execution. If there is significant disagreement between estimated and actual values, execution might be stopped and re-optimization might occur. Oscillation between optimization and execution steps can occur any number of times. A re-optimization step can exploit both the actual cardinality and partial results, computed during a previous execution step. Checkpoint operators (CHECK) validate the optimizer's cardinality estimates against actual cardinalities. Each CHECK has a condition that indicates the cardinality bounds within which a plan is valid. We compute this validity range through a novel sensitivity analysis of query plan operators. If the CHECK condition is violated, CHECK triggers re-optimization. POP has been prototyped in a leading commercial DBMS. An experimental evaluation of POP using TPC-H queries illustrates the robustness POP adds to query processing, while incurring only negligible overhead. A case-study applying POP to a real-world database and workload shows the potential of POP, accelerating complex OLAP queries by almost two orders of magnitude.
- AH00 R. Avnur and J. M. Hellerstein, Eddies: Continuously Adaptive Query Optimization, SIGMOD 2000 Google ScholarDigital Library
- ARM89 R. Ahad, K. V. B. Rao, and D. McLeod, On Estimating the Cardinality of the Projection of a Database Relation, TODS 14(1), 1989 Google ScholarDigital Library
- BC02 N. Bruno and S. Chaudhuri. Exploiting Statistics on Query Expressions for Optimization, SIGMOD 2002 Google ScholarDigital Library
- C+81 D. D. Chamberlin et al., Support for Repetitive Transactions and Ad-Hoc Query in System R, TODS 6(1), 1981. Google ScholarDigital Library
- CG94 R. Cole and G. Graefe. Optimization of Dynamic query evaluation plans, SIGMOD 1994. Google ScholarDigital Library
- Ge193 A. Van Gelder, Multiple Join Size Estimation by Virtual Domains, PODS 1993. Google ScholarDigital Library
- GM93 G. Graefe and W. McKenna, The Volcano Optimizer Generator: Extensibility and Efficient Search. ICDE, 1993 Google ScholarDigital Library
- GW89 G. Graefe and K. Ward, Dynamic Query Evaluation Plans. SIGMOD 1989 Google ScholarDigital Library
- HS93 P. Haas and A. Swami, Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics, IBM Research Report, 1993Google Scholar
- HS02 A. Hulgeri and S. Sudarshan. Parametric Query Optimization for Linear and Piecewise Linear Cost Functions. VLDB 2002.Google Scholar
- IC91 Y.E. Ioannidis and S. Christodoulakis. Propagation of Errors in the Size of Join Results, SIGMOD 1991 Google ScholarDigital Library
- Ives 02 Z. Ives, Efficient Query Processing for Data Integration, Ph.D thesis, University of Washington, 2002 Google ScholarDigital Library
- KD98 N. Kabra and D. DeWitt, Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans, SIGMOD 1998 Google ScholarDigital Library
- MS02 S. Madden, M. Shah, J. M. Hellerstein, V. Raman. Continuously Adaptive Continuous Queries. SIGMOD 2000. Google ScholarDigital Library
- PIH+96 V. Poosala, et. al, Improved histograms for selectivity estimation of range predicates, SIGMOD 1996 Google ScholarDigital Library
- P197 V. Poosala and Y. Ioannidis, Selectivity Estimation without value independence, VLDB 1997 Google ScholarDigital Library
- RAH03 V. Raman, A. Deshpande, and J. M. Hellerstein, Using State Modules for Adaptive Query Optimization. ICDE 2003Google Scholar
- SAC+79 P.G. Selinger et al. Access Path Selection in a Relational DBMS. SIGMOD 1979 Google ScholarDigital Library
- SS94 A. N. Swami and K. B. Schiefer, On the Estimation of Join Result Sizes, EDBT 1994 Google ScholarDigital Library
- SWK96 M. Stonebraker, E. Wong and P. Kreps. The Design and Implementation of INGRES. TODS 1(3), 1976. Google ScholarDigital Library
- UFA98 T. Urhan, M. J. Franklin, and L. Amsaleg, Cost-based Query Scrambling for Initial Delays, SIGMOD 1998 Google ScholarDigital Library
- SLM+01 M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO --- DB2's Learning Optimizer, VLDB 2001 Google ScholarDigital Library
- ZCL+00 M. Zaharioudakis et. al: Answering Complex SQL Queries Using ASTs. SIGMOD 2000 Google ScholarDigital Library
- Robust query processing through progressive optimization
Recommendations
View-based query processing: On the relationship between rewriting, answering and losslessness
As a result of the extensive research in view-based query processing, three notions have been identified as fundamental, namely rewriting, answering, and losslessness. Answering amounts to computing the tuples satisfying the query in all databases ...
Comments