research-article

Worst-case optimal join algorithms: [extended abstract]

Authors:
Hung Q. Ngo

University at Buffalo, SUNY, Amherst, NY, USA

University at Buffalo, SUNY, Amherst, NY, USA
View Profile

,
Ely Porat

Bar-Ilan University, Ramat-Gan, Israel

Bar-Ilan University, Ramat-Gan, Israel
View Profile

,
Christopher Ré

University of Wisconsin, Madison, USA

University of Wisconsin, Madison, USA
View Profile

,
Atri Rudra

University at Buffalo, SUNY, Amherst, NY, USA

University at Buffalo, SUNY, Amherst, NY, USA
View Profile

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database SystemsMay 2012Pages 37–48https://doi.org/10.1145/2213556.2213565

Published:21 May 2012Publication History

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems

Pages 37–48

ABSTRACT

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.

References

S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. In PODS, pages 10--20, 1999. Google ScholarDigital Library
N. Alon, I. Newman, A. Shen, G. Tardos, and N. K. Vereshchagin. Partitioning multi-dimensional sets in a small number of "uniform" parts. Eur. J. Comb., 28(1):134--144, 2007. Google ScholarDigital Library
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748. IEEE, 2008. Google ScholarDigital Library
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272, 2000. Google ScholarDigital Library
S. Babu, P. Bizarro, and D. J. DeWitt. Proactive reoptimization. In SIGMOD Conference, pages 107--118, 2005. Google ScholarDigital Library
B. Bollobás and A. Thomason. Projections of bodies and hereditary properties of hypergraphs. Bull. London Math. Soc., 27(5), 1995.Google Scholar
F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A, 43(1):23--37, 1986. Google ScholarDigital Library
A. Deligiannakis, M. N. Garofalakis, and N. Roussopoulos. Extended wavelets for multiple measures. TODS, 32(2):10, 2007. Google ScholarDigital Library
J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716--752, 2002. Google ScholarDigital Library
A. C. Gilbert, H. Q. Ngo, E. Porat, A. Rudra, and M. J. Strauss. Efficiently decodable l 2/l 2 for each compressed sensing with tiny failure probability, November 2011. Manuscript.Google Scholar
G. Gottlob, S. T. Lee, and G. Valiant. Size and treewidth bounds for conjunctive queries. In PODS, pages 45--54, 2009. Google ScholarDigital Library
G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS, 2001. Google ScholarDigital Library
G. Gottlob, Z. Miklós, and T. Schwentick. Generalized hypertree decompositions: np-hardness and tractable variants. In PODS, 2007. Google ScholarDigital Library
G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73--170, June 1993. Google ScholarDigital Library
M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298, 2006. Google ScholarDigital Library
K. Gyarmati, M. Matolcsi, and I. Z. Ruzsa. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica, 30(2):163--174, 2010. Google ScholarDigital Library
T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36(2):133--156, 1978.Google ScholarCross Ref
Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google ScholarDigital Library
Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, 1991. Google ScholarDigital Library
D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004. Google ScholarDigital Library
H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, 1998. Google ScholarDigital Library
A. C. König and G. Weikum. Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. In VLDB, 1999. Google ScholarDigital Library
A. R. Lehman and E. Lehman. Network coding: does the model need tuning? In SODA, pages 499--504, 2005. Google ScholarDigital Library
L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc, 55:961--962, 1949.Google ScholarCross Ref
R. Lyons. Probability on trees and networks, jun 2011. with Yuval Peres url: http://php.indiana.edu/ rdlyons/prbtree/prbtree.html.Google Scholar
V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005. Google ScholarDigital Library
D. Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC, pages 735--744, 2010. Google ScholarDigital Library
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms, 2012. arXiv:1203.1952 {cs.DB}.Google ScholarDigital Library
H. Q. Ngo, E. Porat, and A. Rudra. Personal Communciation.Google Scholar
A. Pagh and R. Pagh. Scalable computation of acyclic joins. In PODS, pages 225--232, 2006. Google ScholarDigital Library
V. Poosala, Y. Ioannidis, P. Haas, and E. J. Shekita. Improved histograms for selectivity estimation of range predicates. In SIGMOD, pages 294--305, 1996. Google ScholarDigital Library
A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.Google Scholar
U. Srivastava, P. J. Haas, V. Markl,M. Kutsch, and T. M. Tran. Isomer: Consistent histogram construction using query feedback. In ICDE, page 39, 2006. Google ScholarDigital Library
K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.Google ScholarDigital Library
L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85--93, 1986. Google ScholarDigital Library
D. E. Willard. Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci., 52(1), 1996. Google ScholarDigital Library
Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, 2008. Google ScholarDigital Library

Index Terms

Worst-case optimal join algorithms: [extended abstract]
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Relational database model
    2. Query languages
      1. Relational database query languages
2. Theory of computation
  1. Randomness, geometry and discrete structures

Recommendations

Worst-case Optimal Join Algorithms

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in ...
Read More
Relations between Average-Case and Worst-Case Complexity

The consequences of the worst-case assumption NP=P are very well understood. On the other hand, we only know a few consequences of the analogous average-case assumption “NP is easy on average.” In this paper we establish several new results on the worst-...
Read More
On Worst-Case to Average-Case Reductions for NP Problems

We show that if an NP-complete problem has a nonadaptive self-corrector with respect to any samplable distribution, then coNP is contained in NP/poly and the polynomial hierarchy collapses to the third level. Feigenbaum and Fortnow [SIAM J. Comput., 22 (...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
May 2012
332 pages
ISBN:9781450312486
DOI:10.1145/2213556
Editor:
Markus Krötzsch
University of Oxford, UK
,
General Chair:
Maurizio Lenzerini
University of Rome La Sapienza, Italy
,
Program Chair:
Michael Benedikt
University of Oxford, UK
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 May 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
bollob?s-thomason inequality
fractional cover bound
join algorithms
loomis-whitney inequality
Qualifiers
- research-article
Conference

Acceptance Rates
PODS '12 Paper Acceptance Rate26of101submissions,26%Overall Acceptance Rate642of2,707submissions,24%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 107
  Total Citations
  View Citations
- 843
  Total Downloads
- Downloads (Last 12 months)45
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Worst-case optimal join algorithms: [extended abstract]

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Worst-case Optimal Join Algorithms

Relations between Average-Case and Worst-Case Complexity

On Worst-Case to Average-Case Reductions for NP Problems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Worst-case optimal join algorithms: [extended abstract]

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Worst-case Optimal Join Algorithms

Relations between Average-Case and Worst-Case Complexity

On Worst-Case to Average-Case Reductions for NP Problems

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media