ABSTRACT
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. In PODS, pages 10--20, 1999. Google ScholarDigital Library
- N. Alon, I. Newman, A. Shen, G. Tardos, and N. K. Vereshchagin. Partitioning multi-dimensional sets in a small number of "uniform" parts. Eur. J. Comb., 28(1):134--144, 2007. Google ScholarDigital Library
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
- A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748. IEEE, 2008. Google ScholarDigital Library
- R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272, 2000. Google ScholarDigital Library
- S. Babu, P. Bizarro, and D. J. DeWitt. Proactive reoptimization. In SIGMOD Conference, pages 107--118, 2005. Google ScholarDigital Library
- B. Bollobás and A. Thomason. Projections of bodies and hereditary properties of hypergraphs. Bull. London Math. Soc., 27(5), 1995.Google Scholar
- F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A, 43(1):23--37, 1986. Google ScholarDigital Library
- A. Deligiannakis, M. N. Garofalakis, and N. Roussopoulos. Extended wavelets for multiple measures. TODS, 32(2):10, 2007. Google ScholarDigital Library
- J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716--752, 2002. Google ScholarDigital Library
- A. C. Gilbert, H. Q. Ngo, E. Porat, A. Rudra, and M. J. Strauss. Efficiently decodable l 2/l 2 for each compressed sensing with tiny failure probability, November 2011. Manuscript.Google Scholar
- G. Gottlob, S. T. Lee, and G. Valiant. Size and treewidth bounds for conjunctive queries. In PODS, pages 45--54, 2009. Google ScholarDigital Library
- G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS, 2001. Google ScholarDigital Library
- G. Gottlob, Z. Miklós, and T. Schwentick. Generalized hypertree decompositions: np-hardness and tractable variants. In PODS, 2007. Google ScholarDigital Library
- G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73--170, June 1993. Google ScholarDigital Library
- M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298, 2006. Google ScholarDigital Library
- K. Gyarmati, M. Matolcsi, and I. Z. Ruzsa. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica, 30(2):163--174, 2010. Google ScholarDigital Library
- T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36(2):133--156, 1978.Google ScholarCross Ref
- Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google ScholarDigital Library
- Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, 1991. Google ScholarDigital Library
- D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004. Google ScholarDigital Library
- H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, 1998. Google ScholarDigital Library
- A. C. König and G. Weikum. Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. In VLDB, 1999. Google ScholarDigital Library
- A. R. Lehman and E. Lehman. Network coding: does the model need tuning? In SODA, pages 499--504, 2005. Google ScholarDigital Library
- L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc, 55:961--962, 1949.Google ScholarCross Ref
- R. Lyons. Probability on trees and networks, jun 2011. with Yuval Peres url: http://php.indiana.edu/ rdlyons/prbtree/prbtree.html.Google Scholar
- V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005. Google ScholarDigital Library
- D. Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC, pages 735--744, 2010. Google ScholarDigital Library
- H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms, 2012. arXiv:1203.1952 {cs.DB}.Google ScholarDigital Library
- H. Q. Ngo, E. Porat, and A. Rudra. Personal Communciation.Google Scholar
- A. Pagh and R. Pagh. Scalable computation of acyclic joins. In PODS, pages 225--232, 2006. Google ScholarDigital Library
- V. Poosala, Y. Ioannidis, P. Haas, and E. J. Shekita. Improved histograms for selectivity estimation of range predicates. In SIGMOD, pages 294--305, 1996. Google ScholarDigital Library
- A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.Google Scholar
- U. Srivastava, P. J. Haas, V. Markl,M. Kutsch, and T. M. Tran. Isomer: Consistent histogram construction using query feedback. In ICDE, page 39, 2006. Google ScholarDigital Library
- K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.Google ScholarDigital Library
- L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85--93, 1986. Google ScholarDigital Library
- D. E. Willard. Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci., 52(1), 1996. Google ScholarDigital Library
- Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, 2008. Google ScholarDigital Library
Index Terms
- Worst-case optimal join algorithms: [extended abstract]
Recommendations
Worst-case Optimal Join Algorithms
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in ...
Relations between Average-Case and Worst-Case Complexity
The consequences of the worst-case assumption NP=P are very well understood. On the other hand, we only know a few consequences of the analogous average-case assumption “NP is easy on average.” In this paper we establish several new results on the worst-...
On Worst-Case to Average-Case Reductions for NP Problems
We show that if an NP-complete problem has a nonadaptive self-corrector with respect to any samplable distribution, then coNP is contained in NP/poly and the polynomial hierarchy collapses to the third level. Feigenbaum and Fortnow [SIAM J. Comput., 22 (...
Comments