Abstract
We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in BTrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution, transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, Joins via Geometric Resolutions: Worst Case and Beyond
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Mahmoud Abo Khamis, Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2015. Joins via geometric resolutions: Worst-case and beyond. In Proceedings of the 34th ACM Symposium on Principles of Database Systems (PODS’15). ACM, New York, NY, 213--228. DOI:http://dx.doi.org/10.1145/2745754.2745776 Google ScholarDigital Library
- Peyman Afshani, Jérémy Barbay, and Timothy M. Chan. 2009. Instance-optimal geometric algorithms. In FOCS. 129--138. Google ScholarDigital Library
- Eric Allender, Lisa Hellerstein, Paul McCabe, Toniann Pitassi, and Michael E. Saks. 2006. Minimizing DNF formulas and AC0d circuits given a truth table. In IEEE Conference on Computational Complexity (CCC’06). IEEE Computer Society, 237--251. Google ScholarDigital Library
- Noga Alon. 1981. On the number of subgraphs of prescribed type of graphs with a given number of edges. Israel J. Math. 38, 1--2 (1981), 116--130. DOI:http://dx.doi.org/10.1007/BF02761855Google ScholarCross Ref
- Stefan Arnborg and Andrzej Proskurowski. 1989. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Appl. Math. 23, 1 (1989), 11--24. DOI:http://dx.doi.org/10.1016/0166-218X(89)90031-0 Google ScholarDigital Library
- Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In FOCS. IEEE Computer Society, 739--748. Google ScholarDigital Library
- Jérémy Barbay and Claire Kenyon. 2002. Adaptive intersection and t-threshold problems. In SODA. 390--399. Google ScholarDigital Library
- Jérémy Barbay and Claire Kenyon. 2008. Alternation and redundancy analysis of the intersection problem. ACM Trans. Algorithms 4, 1 (2008), 4:1--4:18. Google ScholarDigital Library
- Elisa Bertino, Beng Chin Ooi, Ron Sacks-Davis, Kian-Lee Tan, Justin Zobel, Boris Shidlovsky, and Daniele Andronico. 2012. Indexing Techniques for Advanced Database Systems. Springer. Google ScholarDigital Library
- Olaf Beyersdorff, Nicola Galesi, and Massimo Lauria. 2013. Parameterized complexity of DPLL search procedures. ACM Trans. Comput. Log. 14, 3 (2013), 20. DOI:http://dx.doi.org/10.1145/2499937.2499941 Google ScholarDigital Library
- Bozhena Bidyuk and Rina Dechter. 2004. On finding minimal w-cutset. In UAI, David Maxwell Chickering and Joseph Y. Halpern (Eds.). AUAI Press, 43--50. Google ScholarDigital Library
- Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD. ACM, 37--48. DOI:http://dx.doi.org/10.1145/1989323.1989328 Google ScholarDigital Library
- T. M. Chan. 2013. Klee’s measure problem made easy. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS’13). 410--419. DOI:http://dx.doi.org/10.1109/FOCS.2013.51 Google ScholarDigital Library
- Surajit Chaudhuri. 1998. An overview of query optimization in relational systems. In PODS. ACM, 34--43. DOI:http://dx.doi.org/10.1145/275487.275492 Google ScholarDigital Library
- Chandra Chekuri and Anand Rajaraman. 2000. Conjunctive query containment revisited. Theor. Comput. Sci. 239, 2 (2000), 211--229. Google ScholarDigital Library
- Jianer Chen, Songjian Lu, Sing-Hoi Sze, and Fenghui Zhang. 2007. Improved algorithms for path, matching, and packing problems. In SODA, Nikhil Bansal, Kirk Pruhs, and Clifford Stein (Eds.). SIAM, 298--307. Google ScholarDigital Library
- Martin Davis, George Logemann, and Donald Loveland. 1962. A machine program for theorem-proving. Comm. ACM 5 (1962), 394--397. Google ScholarDigital Library
- Martin Davis and Hilary Putnam. 1960. A computing procedure for quantification theory. J. Assoc. Comput. Mach. 7 (1960), 201--215. Google ScholarDigital Library
- N. G. de Bruijn, Ca. van Ebbenhorst Tengbergen, and D. Kruyswijk. 1951. On the set of divisors of a number. Nieuw Arch. Wiskunde (2) 23 (1951), 191--193.Google Scholar
- Rina Dechter. 1990. Enhancement schemes for constraint processing: Backjumping, learning, and cutset decomposition. Artif. Intell. 41, 3 (1990), 273--312. Google ScholarDigital Library
- Rina Dechter. 2003. Constraint Processing. Morgan Kaufmann Publishers, San Francisco, CA. Google ScholarDigital Library
- Rina Dechter and Judea Pearl. 1988. Tree-clustering schemes for constraint-processing. In AAAI, Howard E. Shrobe, Tom M. Mitchell, and Reid G. Smith (Eds.). AAAI Press/MIT Press, 150--154. Google ScholarDigital Library
- Rina Dechter and Judea Pearl. 1989. Tree clustering for constraint networks. Artific. Intell. 38, 3 (1989), 353--366. Google ScholarDigital Library
- Rina Dechter and Irina Rish. 1994. Directional resolution: The Davis-Putnam procedure, revisited. In KR, Jon Doyle, Erik Sandewall, and Pietro Torasso (Eds.). Morgan Kaufmann, 134--145.Google Scholar
- Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. 2000. Adaptive set intersections, unions, and differences. In SODA. 743--752. Google ScholarDigital Library
- Niklas Eén and Niklas Sörensson. 2003. An extensible SAT-solver. In SAT (Lecture Notes in Computer Science), Enrico Giunchiglia and Armando Tacchella (Eds.), Vol. 2919. Springer, 502--518.Google Scholar
- Ronald Fagin. 1983. Degrees of acyclicity for hypergraphs and relational database schemes. J. ACM 30, 3 (1983), 514--550. Google ScholarDigital Library
- Ronald Fagin, Amnon Lotem, and Moni Naor. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’01). ACM, New York, NY, 102--113. DOI:http://dx.doi.org/10.1145/375551.375567 Google ScholarDigital Library
- Wenfei Fan, Floris Geerts, Yang Cao, Ting Deng, and Ping Lu. 2015. Querying big data by accessing small data. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’15). ACM, New York, NY, 173--184. DOI:http://dx.doi.org/10.1145/2745754.2745771 Google ScholarDigital Library
- Ehud Friedgut and Jeff Kahn. 1998. On the number of copies of one hypergraph in another. Israel J. Math. 105 (1998), 251--256. DOI:http://dx.doi.org/10.1007/BF02780332Google ScholarCross Ref
- Andreas Goerdt. 1992. Davis-Putnam resolution versus unrestricted resolution. Ann. Math. Artific. Intell. 6, 1--3 (1992), 169--184. DOI:http://dx.doi.org/10.1007/BF01531027Google Scholar
- Georg Gottlob, Nicola Leone, and Francesco Scarcello. 2003. Robbers, marshals, and guards: Game theoretic and logical characterizations of hypertree width. J. Comput. Syst. Sci. 66, 4 (2003), 775--808. Google ScholarDigital Library
- Goetz Graefe. 1993. Query evaluation techniques for large databases. Comput. Surv. 25, 2 (June 1993), 73--170. Google ScholarDigital Library
- Jerrold R. Griggs. 1984. Maximum antichains in the product of chains. Order 1, 1 (1984), 21--28. DOI:http://dx.doi.org/10.1007/BF00396270Google ScholarCross Ref
- Martin Grohe. 2013. Bounds and algorithms for joins via fractional edge covers. In Search of Elegance in the Theory and Practice of Computation (Lecture Notes in Computer Science), Val Tannen, Limsoon Wong, Leonid Libkin, Wenfei Fan, Wang-Chiew Tan, and Michael P. Fourman (Eds.), Vol. 8000. Springer, 321--338.Google Scholar
- Martin Grohe and Dániel Marx. 2006. Constraint solving via fractional edge covers. In SODA. ACM, 289--298. Google ScholarDigital Library
- Marc Gyssens, Peter Jeavons, and David A. Cohen. 1994. Decomposing constraint satisfaction problems using database techniques. Artif. Intell. 66, 1 (1994), 57--89. Google ScholarDigital Library
- Marc Gyssens and Jan Paredaens. 1982. A decomposition methodology for cyclic databases. In Advances in Data Base Theory. 85--122.Google Scholar
- Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389. Google ScholarDigital Library
- Phokion G. Kolaitis and Moshe Y. Vardi. 2000. Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61, 2 (2000), 302--332. Google ScholarDigital Library
- David Maier. 1983. The Theory of Relational Databases. Computer Science Press. Google ScholarDigital Library
- Joao Marques-Silva, Inês Lynce, and Sharad Malik. 2009. Conflict-driven clause learning SAT solvers. Handbook of Satisfiability 185 (2009), 131--153.Google Scholar
- Dániel Marx. 2010a. Approximating fractional hypertree width. ACM Trans. Algorithms 6, 2 (2010), 29:1--29:17. Google ScholarDigital Library
- Dániel Marx. 2010b. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. In STOC. 735--744. Google ScholarDigital Library
- Dániel Marx. 2011. Tractable structures for constraint satisfaction with truth tables. Theory Comput. Syst. 48, 3 (2011), 444--464. Google ScholarDigital Library
- Dániel Marx. 2013. Tractable hypergraph properties for constraint satisfaction and conjunctive queries. J. ACM 60, 6 (2013), 42. DOI:http://dx.doi.org/10.1145/2535926 Google ScholarDigital Library
- W. J. Masek. 1979. Some NP-complete set covering problems. Unpublished manuscript.Google Scholar
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5594 (October 2002), 824--827.Google ScholarCross Ref
- Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. 2001. Chaff: Engineering an efficient SAT solver. In DAC. ACM, 530--535. Google ScholarDigital Library
- Hung Q. Ngo, Dung T. Nguyen, Christopher Ré, and Atri Rudra. 2014. Beyond worst-case analysis for joins with minesweeper. In PODS. 234--245. Google ScholarDigital Library
- Hung Q. Ngo, Ely Porat, Christopher Ré, and Atri Rudra. 2012. Worst-case optimal join algorithms {Extended abstract}. In PODS. 37--48. Google ScholarDigital Library
- Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2013. Skew strikes back: New developments in the theory of join algorithms. In SIGMOD RECORD. 5--16. Google ScholarDigital Library
- Dung Nguyen, Molham Aref, Martin Bravenboer, George Kollias, Hung Q. Ngo, Christopher Ré, and Atri Rudra. 2015. Join processing for graph patterns: An old dog with new tricks. In Proceedings of the GRADES’15 (GRADES’15). ACM, New York, NY, Article 2, 8 pages. DOI:http://dx.doi.org/10.1145/2764947.2764948 Google ScholarDigital Library
- Dan Olteanu and Jakub Zavodny. 2015. Size bounds for factorised representations of query results. ACM Trans. Database Syst. 40, 1 (March 2015), Article No. 2, 2:1--2:44. Google ScholarDigital Library
- Patrick E. O’Neil and Goetz Graefe. 1995. Multi-table joins through bitmapped join indices. SIGMOD Rec. 24, 3 (1995), 8--11. Google ScholarDigital Library
- Patrick E. O’Neil and Dallan Quass. 1997. Improved query performance with variant indexes. In SIGMOD Conference, Joan Peckham (Ed.). ACM Press, 38--49. Google ScholarDigital Library
- Mark H. Overmars and Chee-Keng Yap. 1991. New upper bounds in Klees measure problem. SIAM J. Comput. 20, 6 (1991), 1034--1045. DOI:http://dx.doi.org/10.1137/0220065 Google ScholarDigital Library
- Christos H. Papadimitriou and Mihalis Yannakakis. 1997. On the complexity of database queries. In PODS. 12--19. Google ScholarDigital Library
- Mihai Pǎtraşcu. 2010. Towards polynomial lower bounds for dynamic problems. In Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC’10). 603--610. Google ScholarDigital Library
- Judea Pearl. 1989. Probabilistic Reasoning in Intelligent Systems - Networks of Plausible Inference. Morgan Kaufmann, I--XIX, 1--552 pages. Google ScholarDigital Library
- Natasa Przulj, Derek G. Corneil, and Igor Jurisica. 2004. Modeling interactome: Scale-free or geometric? Bioinformatics 20, 18 (2004), 3508--3515. Google ScholarDigital Library
- Raghu Ramakrishnan and Johannes Gehrke. 2003. Database Management Systems (3rd ed.). McGraw-Hill, New York, NY. Google ScholarDigital Library
- Neil Robertson and P. D. Seymour. 1986. Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms 7, 3 (1986), 309--322. DOI:http://dx.doi.org/10.1016/0196-6774(86)90023-4Google ScholarCross Ref
- Tim Roughgarden. 2009. Lecture notes on beyond worst-case analysis (CS264). http://theory.stanford.edu/∼tim/notes.html.Google Scholar
- Francesco Scarcello. 2005. Query answering exploiting structural properties. SIGMOD Rec. 34, 3 (2005), 91--99. Google ScholarDigital Library
- João P. Marques Silva and Karem A. Sakallah. 1996. GRASP - a new search algorithm for satisfiability. In ICCAD. 220--227. Google ScholarDigital Library
- João P. Marques Silva and Karem A. Sakallah. 1999. GRASP: A search algorithm for propositional satisfiability. IEEE Trans. Comput. 48, 5 (1999), 506--521. Google ScholarDigital Library
- Israel Spiegler and Rafi Maayan. 1985. Storage and retrieval considerations of binary data bases. Inf. Process. Manage. 21, 3 (1985), 233--254. Google ScholarDigital Library
- Siddharth Suri and Sergei Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW. 607--614. Google ScholarDigital Library
- Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In ICDM. IEEE Computer Society, 608--617. DOI:http://dx.doi.org/10.1109/ICDM.2008.72 Google ScholarDigital Library
- Jeffrey D. Ullman. 1989. Principles of Database and Knowledge-Base Systems, Vol. II. Computer Science Press.Google ScholarDigital Library
- Todd L. Veldhuizen. 2014. Triejoin: A simple, worst-case optimal join algorithm. In ICDT. 96--106.Google Scholar
- Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. In VLDB. 82--94. Google ScholarDigital Library
- Hantao Zhang. 1997. SATO: An efficient propositional prover. In CADE (Lecture Notes in Computer Science), William McCune (Ed.), Vol. 1249. Springer, 272--275. Google ScholarDigital Library
Index Terms
- Joins via Geometric Resolutions: Worst Case and Beyond
Recommendations
Joins via Geometric Resolutions: Worst-case and Beyond
PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsWe present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. ...
The SB-index and the HSB-index: efficient indices for spatial data warehouses
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding ...
HSJ-Solver: a new method based on GHD for answering conjunctive queries and solving constraint satisfaction problems
AbstractEvaluating conjunctive queries (CQs) is NP-hard in general; however, acyclic CQs or nearest acyclic CQs can be evaluated in polynomial time. Many structural methods for characterising such classes are proposed in the literature. However, ...
Comments