skip to main content
research-article

Preventing bad plans by bounding the impact of cardinality estimation errors

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

Query optimizers rely on accurate estimations of the sizes of intermediate results. Wrong size estimations can lead to overly expensive execution plans. We first define the q-error to measure deviations of size estimates from actual sizes. The q-error enables the derivation of two important results: (1) We provide bounds such that if the q-error is smaller than this bound, the query optimizer constructs an optimal plan. (2) If the q-error is bounded by a number q, we show that the cost of the produced plan is at most a factor of q4 worse than the optimal plan. Motivated by these findings, we next show how to find the best approximation under the q-error. These techniques can then be used to build synopsis for size estimates. Finally, we give some experimental results where we apply the developed techniques.

References

  1. J. F. Bonnans and A. Shapiro. Pertubation Analysis of Optimization Problems. Springer, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  2. M. Charikar, S. Chaudhuri, R. Motwani, and V. Narasayya. Towards estimation error guarantees for distinct values. In PODS, pages 268--279, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Donoho and M. Elad. Optimally sparse representation in general (non-orthogonal) dictionaries via l1 minimization. Proc. of the National Academy of Sciences, 100(5), 2003.Google ScholarGoogle Scholar
  4. M. N. Garofalakis and A. Kumar. Wavelet synopses for general error metrics. ACM Trans. Database Syst., 30(4), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In VLDB, pages 541--550, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Haas, M. Carey, M. Livny, and A. Shukla. Seeking the truth about ad hoc join costs. VLDB Journal, 6(3), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. T. Ibaraki and T. Kameda. Optimal nesting for computing n-relational joins. ACM Trans. Database Syst., 9(3), 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In VLDB, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. C. König and G. Weikum. Combining histograms and parametric curve fitting for feedback-driven query result-size estimation. In VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of nonrecursive queries. In VLDB, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Moerkotte. Best approximation under a convex paranorm. Technical Report MA-08-07, University of Mannheim, 2008.Google ScholarGoogle Scholar
  15. T. Neumann and S. Michel. Smooth interpolating histograms with error guarantees. In BNCOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. V. Poosola, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimates of range predicates. In SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. Reddy and J. R. Haritsa. Analyzing plan diagrams of database query optimizers. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Rockafellar. Convex Analysis. Princeton University Press, 1970.Google ScholarGoogle Scholar
  19. D. W. Scott. Multivariate Density Estimation: Theory, practice, and visualization. Wiley, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  20. P. Spellucci. Numerische Verfahren der Nichtlinearen Optimierung. Birkhäuser, 1993.Google ScholarGoogle Scholar
  21. J. Ullman. Database and Knowledge Base Systems, volume Volume 1. Computer Science Press, 1989.Google ScholarGoogle Scholar
  22. G. Watson. Approximation Theory and Numerical Methods. Addison-Wesley, 1980.Google ScholarGoogle Scholar

Index Terms

  1. Preventing bad plans by bounding the impact of cardinality estimation errors

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader