Abstract
Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. We first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: (1) We provide bounds such that if the q-error is smaller than this bound, the query optimizer constructs an optimal plan. (2) If the q-error is bounded by a number q, we show that the cost of the produced plan is at most a factor of q4 worse than the optimal plan. Motivated by these findings, we next show how to find the best approximation under the q-error. These techniques can then be used to build synopsis for size estimates. Finally, we give some experimental results where we apply the developed techniques.
- J. F. Bonnans and A. Shapiro. Pertubation Analysis of Optimization Problems. Springer, 2000.Google ScholarCross Ref
- M. Charikar, S. Chaudhuri, R. Motwani, and V. Narasayya. Towards estimation error guarantees for distinct values. In PODS, pages 268--279, 2000. Google ScholarDigital Library
- D. Donoho and M. Elad. Optimally sparse representation in general (non-orthogonal) dictionaries via l1 minimization. Proc. of the National Academy of Sciences, 100(5), 2003.Google Scholar
- M. N. Garofalakis and A. Kumar. Wavelet synopses for general error metrics. ACM Trans. Database Syst., 30(4), 2005. Google ScholarDigital Library
- P. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In VLDB, pages 541--550, 2001. Google ScholarDigital Library
- L. Haas, M. Carey, M. Livny, and A. Shukla. Seeking the truth about ad hoc join costs. VLDB Journal, 6(3), 1997. Google ScholarDigital Library
- T. Ibaraki and T. Kameda. Optimal nesting for computing n-relational joins. ACM Trans. Database Syst., 9(3), 1984. Google ScholarDigital Library
- Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google ScholarDigital Library
- Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991. Google ScholarDigital Library
- H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In VLDB, 1998. Google ScholarDigital Library
- A. C. König and G. Weikum. Combining histograms and parametric curve fitting for feedback-driven query result-size estimation. In VLDB, 1999. Google ScholarDigital Library
- R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of nonrecursive queries. In VLDB, 1986. Google ScholarDigital Library
- Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, 1998. Google ScholarDigital Library
- G. Moerkotte. Best approximation under a convex paranorm. Technical Report MA-08-07, University of Mannheim, 2008.Google Scholar
- T. Neumann and S. Michel. Smooth interpolating histograms with error guarantees. In BNCOD, 2008. Google ScholarDigital Library
- V. Poosola, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimates of range predicates. In SIGMOD, 1996. Google ScholarDigital Library
- N. Reddy and J. R. Haritsa. Analyzing plan diagrams of database query optimizers. In VLDB, 2005. Google ScholarDigital Library
- R. Rockafellar. Convex Analysis. Princeton University Press, 1970.Google Scholar
- D. W. Scott. Multivariate Density Estimation: Theory, practice, and visualization. Wiley, 1992.Google ScholarCross Ref
- P. Spellucci. Numerische Verfahren der Nichtlinearen Optimierung. Birkhäuser, 1993.Google Scholar
- J. Ullman. Database and Knowledge Base Systems, volume Volume 1. Computer Science Press, 1989.Google Scholar
- G. Watson. Approximation Theory and Numerical Methods. Addison-Wesley, 1980.Google Scholar
Index Terms
- Preventing bad plans by bounding the impact of cardinality estimation errors
Recommendations
Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataIn this work we introduce a novel approach to the problem of cardinality estimation over multijoin queries. Our approach leveraging randomized hashing and data sketching to tighten these bounds beyond the current state of the art. We demonstrate that ...
Bounded-error estimation using dead zone and bounding ellipsoid
The use of a dead zone and a bounding ellipsoid for parameter estimation when measurement errors are bounded is discussed. the size of the dead zone is set to be exactly equal to the assumed noise bound. The algorithm retains the properties of computing ...
Comments