skip to main content
research-article
Public Access

Joins via Geometric Resolutions: Worst Case and Beyond

Published:08 November 2016Publication History
Skip Abstract Section

Abstract

We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in BTrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution, transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.

Skip Supplemental Material Section

Supplemental Material

References

  1. Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Mahmoud Abo Khamis, Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2015. Joins via geometric resolutions: Worst-case and beyond. In Proceedings of the 34th ACM Symposium on Principles of Database Systems (PODS’15). ACM, New York, NY, 213--228. DOI:http://dx.doi.org/10.1145/2745754.2745776 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. 2009. Instance-optimal geometric algorithms. In FOCS. 129--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Eric Allender, Lisa Hellerstein, Paul McCabe, Toniann Pitassi, and Michael E. Saks. 2006. Minimizing DNF formulas and AC0d circuits given a truth table. In IEEE Conference on Computational Complexity (CCC’06). IEEE Computer Society, 237--251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Noga Alon. 1981. On the number of subgraphs of prescribed type of graphs with a given number of edges. Israel J. Math. 38, 1--2 (1981), 116--130. DOI:http://dx.doi.org/10.1007/BF02761855Google ScholarGoogle ScholarCross RefCross Ref
  6. Stefan Arnborg and Andrzej Proskurowski. 1989. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Appl. Math. 23, 1 (1989), 11--24. DOI:http://dx.doi.org/10.1016/0166-218X(89)90031-0 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In FOCS. IEEE Computer Society, 739--748. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jérémy Barbay and Claire Kenyon. 2002. Adaptive intersection and t-threshold problems. In SODA. 390--399. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Jérémy Barbay and Claire Kenyon. 2008. Alternation and redundancy analysis of the intersection problem. ACM Trans. Algorithms 4, 1 (2008), 4:1--4:18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Elisa Bertino, Beng Chin Ooi, Ron Sacks-Davis, Kian-Lee Tan, Justin Zobel, Boris Shidlovsky, and Daniele Andronico. 2012. Indexing Techniques for Advanced Database Systems. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Olaf Beyersdorff, Nicola Galesi, and Massimo Lauria. 2013. Parameterized complexity of DPLL search procedures. ACM Trans. Comput. Log. 14, 3 (2013), 20. DOI:http://dx.doi.org/10.1145/2499937.2499941 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bozhena Bidyuk and Rina Dechter. 2004. On finding minimal w-cutset. In UAI, David Maxwell Chickering and Joseph Y. Halpern (Eds.). AUAI Press, 43--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD. ACM, 37--48. DOI:http://dx.doi.org/10.1145/1989323.1989328 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. M. Chan. 2013. Klee’s measure problem made easy. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS’13). 410--419. DOI:http://dx.doi.org/10.1109/FOCS.2013.51 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Surajit Chaudhuri. 1998. An overview of query optimization in relational systems. In PODS. ACM, 34--43. DOI:http://dx.doi.org/10.1145/275487.275492 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2 (2000), 211--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jianer Chen, Songjian Lu, Sing-Hoi Sze, and Fenghui Zhang. 2007. Improved algorithms for path, matching, and packing problems. In SODA, Nikhil Bansal, Kirk Pruhs, and Clifford Stein (Eds.). SIAM, 298--307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Martin Davis, George Logemann, and Donald Loveland. 1962. A machine program for theorem-proving. Comm. ACM 5 (1962), 394--397. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Martin Davis and Hilary Putnam. 1960. A computing procedure for quantification theory. J. Assoc. Comput. Mach. 7 (1960), 201--215. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. G. de Bruijn, Ca. van Ebbenhorst Tengbergen, and D. Kruyswijk. 1951. On the set of divisors of a number. Nieuw Arch. Wiskunde (2) 23 (1951), 191--193.Google ScholarGoogle Scholar
  21. Rina Dechter. 1990. Enhancement schemes for constraint processing: Backjumping, learning, and cutset decomposition. Artif. Intell. 41, 3 (1990), 273--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Rina Dechter. 2003. Constraint Processing. Morgan Kaufmann Publishers, San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Rina Dechter and Judea Pearl. 1988. Tree-clustering schemes for constraint-processing. In AAAI, Howard E. Shrobe, Tom M. Mitchell, and Reid G. Smith (Eds.). AAAI Press/MIT Press, 150--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Rina Dechter and Judea Pearl. 1989. Tree clustering for constraint networks. Artific. Intell. 38, 3 (1989), 353--366. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rina Dechter and Irina Rish. 1994. Directional resolution: The Davis-Putnam procedure, revisited. In KR, Jon Doyle, Erik Sandewall, and Pietro Torasso (Eds.). Morgan Kaufmann, 134--145.Google ScholarGoogle Scholar
  26. Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. 2000. Adaptive set intersections, unions, and differences. In SODA. 743--752. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Niklas Eén and Niklas Sörensson. 2003. An extensible SAT-solver. In SAT (Lecture Notes in Computer Science), Enrico Giunchiglia and Armando Tacchella (Eds.), Vol. 2919. Springer, 502--518.Google ScholarGoogle Scholar
  28. Ronald Fagin. 1983. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM 30, 3 (1983), 514--550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Ronald Fagin, Amnon Lotem, and Moni Naor. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’01). ACM, New York, NY, 102--113. DOI:http://dx.doi.org/10.1145/375551.375567 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Wenfei Fan, Floris Geerts, Yang Cao, Ting Deng, and Ping Lu. 2015. Querying big data by accessing small data. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’15). ACM, New York, NY, 173--184. DOI:http://dx.doi.org/10.1145/2745754.2745771 Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Ehud Friedgut and Jeff Kahn. 1998. On the number of copies of one hypergraph in another. Israel J. Math. 105 (1998), 251--256. DOI:http://dx.doi.org/10.1007/BF02780332Google ScholarGoogle ScholarCross RefCross Ref
  32. Andreas Goerdt. 1992. Davis-Putnam resolution versus unrestricted resolution. Ann. Math. Artific. Intell. 6, 1--3 (1992), 169--184. DOI:http://dx.doi.org/10.1007/BF01531027Google ScholarGoogle Scholar
  33. Georg Gottlob, Nicola Leone, and Francesco Scarcello. 2003. Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. J. Comput. Syst. Sci. 66, 4 (2003), 775--808. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Goetz Graefe. 1993. Query evaluation techniques for large databases. Comput. Surv. 25, 2 (June 1993), 73--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jerrold R. Griggs. 1984. Maximum antichains in the product of chains. Order 1, 1 (1984), 21--28. DOI:http://dx.doi.org/10.1007/BF00396270Google ScholarGoogle ScholarCross RefCross Ref
  36. Martin Grohe. 2013. Bounds and algorithms for joins via fractional edge covers. In Search of Elegance in the Theory and Practice of Computation (Lecture Notes in Computer Science), Val Tannen, Limsoon Wong, Leonid Libkin, Wenfei Fan, Wang-Chiew Tan, and Michael P. Fourman (Eds.), Vol. 8000. Springer, 321--338.Google ScholarGoogle Scholar
  37. Martin Grohe and Dániel Marx. 2006. Constraint solving via fractional edge covers. In SODA. ACM, 289--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Marc Gyssens, Peter Jeavons, and David A. Cohen. 1994. Decomposing constraint satisfaction problems using database techniques. Artif. Intell. 66, 1 (1994), 57--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Marc Gyssens and Jan Paredaens. 1982. A decomposition methodology for cyclic databases. In Advances in Data Base Theory. 85--122.Google ScholarGoogle Scholar
  40. Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Phokion G. Kolaitis and Moshe Y. Vardi. 2000. Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61, 2 (2000), 302--332. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. David Maier. 1983. The Theory of Relational Databases. Computer Science Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Joao Marques-Silva, Inês Lynce, and Sharad Malik. 2009. Conflict-driven clause learning SAT solvers. Handbook of Satisfiability 185 (2009), 131--153.Google ScholarGoogle Scholar
  44. Dániel Marx. 2010a. Approximating fractional hypertree width. ACM Trans. Algorithms 6, 2 (2010), 29:1--29:17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Dániel Marx. 2010b. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC. 735--744. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Dániel Marx. 2011. Tractable structures for constraint satisfaction with truth tables. Theory Comput. Syst. 48, 3 (2011), 444--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Dániel Marx. 2013. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J. ACM 60, 6 (2013), 42. DOI:http://dx.doi.org/10.1145/2535926 Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. W. J. Masek. 1979. Some NP-complete set covering problems. Unpublished manuscript.Google ScholarGoogle Scholar
  49. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (October 2002), 824--827.Google ScholarGoogle ScholarCross RefCross Ref
  50. Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. 2001. Chaff: Engineering an efficient SAT solver. In DAC. ACM, 530--535. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Hung Q. Ngo, Dung T. Nguyen, Christopher Ré, and Atri Rudra. 2014. Beyond worst-case analysis for joins with minesweeper. In PODS. 234--245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst-case optimal join algorithms {Extended abstract}. In PODS. 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2013. Skew strikes back: New developments in the theory of join algorithms. In SIGMOD RECORD. 5--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Dung Nguyen, Molham Aref, Martin Bravenboer, George Kollias, Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2015. Join processing for graph patterns: An old dog with new tricks. In Proceedings of the GRADES’15 (GRADES’15). ACM, New York, NY, Article 2, 8 pages. DOI:http://dx.doi.org/10.1145/2764947.2764948 Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Dan Olteanu and Jakub Zavodny. 2015. Size bounds for factorised representations of query results. ACM Trans. Database Syst. 40, 1 (March 2015), Article No. 2, 2:1--2:44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Patrick E. O’Neil and Goetz Graefe. 1995. Multi-table joins through bitmapped join indices. SIGMOD Rec. 24, 3 (1995), 8--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Patrick E. O’Neil and Dallan Quass. 1997. Improved query performance with variant indexes. In SIGMOD Conference, Joan Peckham (Ed.). ACM Press, 38--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Mark H. Overmars and Chee-Keng Yap. 1991. New upper bounds in Klees measure problem. SIAM J. Comput. 20, 6 (1991), 1034--1045. DOI:http://dx.doi.org/10.1137/0220065 Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Christos H. Papadimitriou and Mihalis Yannakakis. 1997. On the complexity of database queries. In PODS. 12--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Mihai Pǎtraşcu. 2010. Towards polynomial lower bounds for dynamic problems. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC’10). 603--610. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Judea Pearl. 1989. Probabilistic Reasoning in Intelligent Systems - Networks of Plausible Inference. Morgan Kaufmann, I--XIX, 1--552 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Natasa Przulj, Derek G. Corneil, and Igor Jurisica. 2004. Modeling interactome: Scale-free or geometric? Bioinformatics 20, 18 (2004), 3508--3515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Raghu Ramakrishnan and Johannes Gehrke. 2003. Database Management Systems (3rd ed.). McGraw-Hill, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Neil Robertson and P. D. Seymour. 1986. Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms 7, 3 (1986), 309--322. DOI:http://dx.doi.org/10.1016/0196-6774(86)90023-4Google ScholarGoogle ScholarCross RefCross Ref
  65. Tim Roughgarden. 2009. Lecture notes on beyond worst-case analysis (CS264). http://theory.stanford.edu/∼tim/notes.html.Google ScholarGoogle Scholar
  66. Francesco Scarcello. 2005. Query answering exploiting structural properties. SIGMOD Rec. 34, 3 (2005), 91--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. João P. Marques Silva and Karem A. Sakallah. 1996. GRASP - a new search algorithm for satisfiability. In ICCAD. 220--227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. João P. Marques Silva and Karem A. Sakallah. 1999. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Comput. 48, 5 (1999), 506--521. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Israel Spiegler and Rafi Maayan. 1985. Storage and retrieval considerations of binary data bases. Inf. Process. Manage. 21, 3 (1985), 233--254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW. 607--614. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In ICDM. IEEE Computer Society, 608--617. DOI:http://dx.doi.org/10.1109/ICDM.2008.72 Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Jeffrey D. Ullman. 1989. Principles of Database and Knowledge-Base Systems, Vol. II. Computer Science Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Todd L. Veldhuizen. 2014. Triejoin: A simple, worst-case optimal join algorithm. In ICDT. 96--106.Google ScholarGoogle Scholar
  74. Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. In VLDB. 82--94. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. Hantao Zhang. 1997. SATO: An efficient propositional prover. In CADE (Lecture Notes in Computer Science), William McCune (Ed.), Vol. 1249. Springer, 272--275. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Joins via Geometric Resolutions: Worst Case and Beyond

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 41, Issue 4
      Invited Paper from EDBT 2015, Invited Paper from PODS 2015 and Regular Papers
      December 2016
      309 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/3014437
      Issue’s Table of Contents

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 November 2016
      • Revised: 1 June 2016
      • Accepted: 1 June 2016
      • Received: 1 December 2015
      Published in tods Volume 41, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader