research-article

Lightweight graphical models for selectivity estimation without independence assumptions

Authors:
Kostas Tzoumas

Aalborg University, Denmark

Aalborg University, Denmark
View Profile

,
Amol Deshpande

University of Maryland

University of Maryland
View Profile

,
Christian S. Jensen

Aarhus University, Denmark

Aarhus University, Denmark
View Profile

Proceedings of the VLDB Endowment Volume 4 Issue 11pp 852–863https://doi.org/10.14778/3402707.3402724

Published:01 August 2011Publication History

Proceedings of the VLDB Endowment

Abstract

As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today's optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL's query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.

References

S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In SIGMOD, pp. 275--286, 1999.Google ScholarDigital Library
A. Berry, J. R. S. Blair, P. Heggernes, and B. W. Peyton. Maximum cardinality search for computing minimal triangulations of graphs. Algorithmica, 39(4):287--298, 2004.Google ScholarDigital Library
C. K. Chow and C. N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462467, 1968.Google ScholarDigital Library
R. G. Cowell, P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilisitc Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer, 1999.Google Scholar
A. Deshpande, M. N. Garofalakis, and M. I. Jordan. Efficient stepwise selection in decomposable models. In UAI, pp. 128--135, 2001.Google Scholar
A. Deshpande, M. N. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In SIGMOD, pp. 199--210, 2001.Google ScholarDigital Library
L. Getoor, B. Taskar, and D. Koller. Selectivity estimation using probabilistic models. In SIGMOD, pp. 461--472, 2001.Google ScholarDigital Library
I. F. Ilyas, V. Markl, P. J. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. In SIGMOD, pp. 647--658, 2004.Google ScholarDigital Library
Y. E. Ioannidis. The history of histograms (abridged). In VLDB, pp. 19--30, 2003.Google ScholarDigital Library
Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD, pp. 268--277, 1991.Google ScholarDigital Library
F. V. Jensen and F. Jensen. Optimal junction trees. In UAI, pp. 360--366, 1994.Google Scholar
H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation aware database designer for materialized views and indexes. PVLDB, 3(1):1103--1113, 2010.Google ScholarDigital Library
Y. Matias, J. S. Vitter, and M. Wang. Wavelet-based histograms for selectivity estimation. In SIGMOD, pp. 448--459, 1998.Google ScholarDigital Library
G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB, 2(1):982--993, 2009.Google ScholarDigital Library
V. Poosala and Y. E. Ioannidis. Selectivity estimation without the attribute value independence assumption. In VLDB, pp. 486--495, 1997.Google ScholarDigital Library
P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In SIGMOD, pp. 23--34, 1979.Google ScholarDigital Library
J. Spiegel and N. Polyzotis. Graph-based synopses for relational selectivity estimation. In SIGMOD, pp. 205--216, 2006.Google ScholarDigital Library
K. Tzoumas, A. Deshpande, and C. S. Jensen. Sharing-aware horizontal partitioning for exploiting correlations during query processing. PVLDB, 3(1):542--553, 2010.Google Scholar

Index Terms

Lightweight graphical models for selectivity estimation without independence assumptions
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
  2. Information systems applications
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor cardinality ...
Read More
Selectivity estimation for range predicates using lightweight models

Query optimizers depend on selectivity estimates of query predicates to produce a good execution plan. When a query contains multiple predicates, today's optimizers use a variety of assumptions, such as independence between predicates, to estimate ...
Read More
Efficiently adapting graphical models for selectivity estimation

Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 4, Issue 11
August 2011
520 pages
ISSN:2150-8097
Editor:
H. V. Jagadish
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 August 2011
Published in pvldb Volume 4, Issue 11
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 38
  Total Citations
  View Citations
- 91
  Total Downloads
- Downloads (Last 12 months)30
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Lightweight graphical models for selectivity estimation without independence assumptions

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries

Selectivity estimation for range predicates using lightweight models

Efficiently adapting graphical models for selectivity estimation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Lightweight graphical models for selectivity estimation without independence assumptions

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries

Selectivity estimation for range predicates using lightweight models

Efficiently adapting graphical models for selectivity estimation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media