skip to main content
research-article

Lightweight graphical models for selectivity estimation without independence assumptions

Published:01 August 2011Publication History
Skip Abstract Section

Abstract

As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today's optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL's query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.

References

  1. S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD, pp. 275--286, 1999.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Berry, J. R. S. Blair, P. Heggernes, and B. W. Peyton. Maximum cardinality search for computing minimal triangulations of graphs. Algorithmica, 39(4):287--298, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462467, 1968.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. G. Cowell, P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilisitc Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer, 1999.Google ScholarGoogle Scholar
  5. A. Deshpande, M. N. Garofalakis, and M. I. Jordan. Efficient stepwise selection in decomposable models. In UAI, pp. 128--135, 2001.Google ScholarGoogle Scholar
  6. A. Deshpande, M. N. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In SIGMOD, pp. 199--210, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Getoor, B. Taskar, and D. Koller. Selectivity estimation using probabilistic models. In SIGMOD, pp. 461--472, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. In SIGMOD, pp. 647--658, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. E. Ioannidis. The history of histograms (abridged). In VLDB, pp. 19--30, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, pp. 268--277, 1991.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. F. V. Jensen and F. Jensen. Optimal junction trees. In UAI, pp. 360--366, 1994.Google ScholarGoogle Scholar
  12. H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation aware database designer for materialized views and indexes. PVLDB, 3(1):1103--1113, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, pp. 448--459, 1998.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB, 2(1):982--993, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. V. Poosala and Y. E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In VLDB, pp. 486--495, 1997.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD, pp. 23--34, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Spiegel and N. Polyzotis. Graph-based synopses for relational selectivity estimation. In SIGMOD, pp. 205--216, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. Tzoumas, A. Deshpande, and C. S. Jensen. Sharing-aware horizontal partitioning for exploiting correlations during query processing. PVLDB, 3(1):542--553, 2010.Google ScholarGoogle Scholar

Index Terms

  1. Lightweight graphical models for selectivity estimation without independence assumptions
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image Proceedings of the VLDB Endowment
          Proceedings of the VLDB Endowment  Volume 4, Issue 11
          August 2011
          520 pages

          Publisher

          VLDB Endowment

          Publication History

          • Published: 1 August 2011
          Published in pvldb Volume 4, Issue 11

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader