skip to main content
10.1145/2213556.2213565acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Worst-case optimal join algorithms: [extended abstract]

Published:21 May 2012Publication History

ABSTRACT

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieve these optimal bounds. An answer to this question may be interesting to database practice, as we show in this paper that any project-join plan is polynomially slower than the optimal bound for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. In addition, we show that this bound is equivalent to a geometric inequality by Bollobás and Thomason, one of whose special cases is the famous Loomis-Whitney inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. In PODS, pages 10--20, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Alon, I. Newman, A. Shen, G. Tardos, and N. K. Vereshchagin. Partitioning multi-dimensional sets in a small number of "uniform" parts. Eur. J. Comb., 28(1):134--144, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  5. A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. In FOCS, pages 739--748. IEEE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Babu, P. Bizarro, and D. J. DeWitt. Proactive reoptimization. In SIGMOD Conference, pages 107--118, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Bollobás and A. Thomason. Projections of bodies and hereditary properties of hypergraphs. Bull. London Math. Soc., 27(5), 1995.Google ScholarGoogle Scholar
  9. F. R. K. Chung, R. L. Graham, P. Frankl, and J. B. Shearer. Some intersection theorems for ordered sets and graphs. J. Combin. Theory Ser. A, 43(1):23--37, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Deligiannakis, M. N. Garofalakis, and N. Roussopoulos. Extended wavelets for multiple measures. TODS, 32(2):10, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Flum, M. Frick, and M. Grohe. Query evaluation via tree-decompositions. J. ACM, 49(6):716--752, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. C. Gilbert, H. Q. Ngo, E. Porat, A. Rudra, and M. J. Strauss. Efficiently decodable l 2/l 2 for each compressed sensing with tiny failure probability, November 2011. Manuscript.Google ScholarGoogle Scholar
  13. G. Gottlob, S. T. Lee, and G. Valiant. Size and treewidth bounds for conjunctive queries. In PODS, pages 45--54, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Gottlob, Z. Miklós, and T. Schwentick. Generalized hypertree decompositions: np-hardness and tractable variants. In PODS, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73--170, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Grohe and D. Marx. Constraint solving via fractional edge covers. In SODA, pages 289--298, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. K. Gyarmati, M. Matolcsi, and I. Z. Ruzsa. A superadditivity and submultiplicativity property for cardinalities of sumsets. Combinatorica, 30(2):163--174, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. T. S. Han. Nonnegative entropy measures of multivariate symmetric correlations. Information and Control, 36(2):133--156, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  20. Y. E. Ioannidis. The history of histograms (abridged). In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64(9):1017--1026, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. In VLDB, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. C. König and G. Weikum. Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. In VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. R. Lehman and E. Lehman. Network coding: does the model need tuning? In SODA, pages 499--504, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. L. H. Loomis and H. Whitney. An inequality related to the isoperimetric inequality. Bull. Amer. Math. Soc, 55:961--962, 1949.Google ScholarGoogle ScholarCross RefCross Ref
  27. R. Lyons. Probability on trees and networks, jun 2011. with Yuval Peres url: http://php.indiana.edu/ rdlyons/prbtree/prbtree.html.Google ScholarGoogle Scholar
  28. V. Markl, N. Megiddo, M. Kutsch, T. M. Tran, P. J. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In VLDB, pages 373--384, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. Marx. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC, pages 735--744, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms, 2012. arXiv:1203.1952 {cs.DB}.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. H. Q. Ngo, E. Porat, and A. Rudra. Personal Communciation.Google ScholarGoogle Scholar
  32. A. Pagh and R. Pagh. Scalable computation of acyclic joins. In PODS, pages 225--232, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. V. Poosala, Y. Ioannidis, P. Haas, and E. J. Shekita. Improved histograms for selectivity estimation of range predicates. In SIGMOD, pages 294--305, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Schrijver. Combinatorial optimization. Polyhedra and efficiency. Vol. A, volume 24 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 2003.Google ScholarGoogle Scholar
  35. U. Srivastava, P. J. Haas, V. Markl,M. Kutsch, and T. M. Tran. Isomer: Consistent histogram construction using query feedback. In ICDE, page 39, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. K. Tzoumas, A. Deshpande, and C. S. Jensen. Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB, 4(11):852--863, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique solutions. Theor. Comput. Sci., 47(3):85--93, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. D. E. Willard. Applications of range query theory to relational data base join and selection operations. J. Comput. Syst. Sci., 52(1), 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Y. Xu, P. Kostamaa, X. Zhou, and L. Chen. Handling data skew in parallel joins in shared-nothing systems. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Worst-case optimal join algorithms: [extended abstract]

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
          May 2012
          332 pages
          ISBN:9781450312486
          DOI:10.1145/2213556

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 21 May 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          PODS '12 Paper Acceptance Rate26of101submissions,26%Overall Acceptance Rate642of2,707submissions,24%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader