skip to main content
research-article

Adopting worst-case optimal joins in relational database systems

Published:01 July 2020Publication History
Skip Abstract Section

Abstract

Worst-case optimal join algorithms are attractive from a theoretical point of view, as they offer asymptotically better runtime than binary joins on certain types of queries. In particular, they avoid enumerating large intermediate results by processing multiple input relations in a single multi-way join. However, existing implementations incur a sizable overhead in practice, primarily since they rely on suitable ordered index structures on their input. Systems that support worst-case optimal joins often focus on a specific problem domain, such as read-only graph analytic queries, where extensive precomputation allows them to mask these costs.

In this paper, we present a comprehensive implementation approach for worst-case optimal joins that is practical within general-purpose relational database management systems supporting both hybrid transactional and analytical workloads. The key component of our approach is a novel hash-based worst-case optimal join algorithm that relies only on data structures that can be built efficiently during query execution. Furthermore, we implement a hybrid query optimizer that intelligently and transparently combines both binary and multi-way joins within the same query plan. We demonstrate that our approach far outperforms existing systems when worst-case optimal joins are beneficial while sacrificing no performance when they are not.

References

  1. C. Aberger. EmptyHeaded GitHub repository. https://github.com/HazyResearch/EmptyHeaded.Google ScholarGoogle Scholar
  2. C. R. Aberger, A. Lamb, K. Olukotun, and C. Ré. LevelHeaded: A unified engine for business intelligence and linear algebra querying. In ICDE, pages 449--460. IEEE Computer Society, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  3. C. R. Aberger, A. Lamb, S. Tu, A. Nötzli, K. Olukotun, and C. Ré. EmptyHeaded: A relational engine for graph processing. ACM Trans. Database Syst., 42(4):20:1--20:44, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. N. Afrati and J. D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282--1298, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Alway, E. Blais, and S. Salihoglu. Box covers and domain orderings for beyond worst-case join processing. CoRR, abs/1909.12102, 2019.Google ScholarGoogle Scholar
  6. K. Ammar, F. McSherry, S. Salihoglu, and M. Joglekar. Distributed evaluation of subgraph queries using worst-case optimal and low-memory dataflows. PVLDB, 11(6):691--704, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Appleby. Murmurhash GitHub repository. https://github.com/aappleby/smhasher.Google ScholarGoogle Scholar
  8. M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pasalic, T. L. Veldhuizen, and G. Washburn. Design and implementation of the LogicBlox system. In SIGMOD Conference, pages 1371--1382. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. SIAM J. Comput., 42(4):1737--1767, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272. ACM, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373. IEEE Computer Society, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Chu, M. Balazinska, and D. Suciu. From theory to practice: Efficient join query evaluation in a parallel database system. In SIGMOD Conference, pages 63--78. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Fekete, B. Franks, H. Jordan, and B. Scholz. Worst-case optimal radix triejoin. CoRR, abs/1912.12747, 2019.Google ScholarGoogle Scholar
  15. M. Freitag, M. Bandle, T. Schmidt, A. Kemper, and T. Neumann. Combining worst-case optimal and traditional binary join processing. Technical Report TUM-I2082, Technische Universität München, 2020.Google ScholarGoogle Scholar
  16. M. Freitag, M. Bandle, T. Schmidt, A. Kemper, and T. Neumann. Queries used in the experimental evaluation, Jan. 2020. https://github.com/freitmi/queries-vldb2020.Google ScholarGoogle Scholar
  17. M. Freitag and T. Neumann. Every row counts: Combining sketches and sampling for accurate group-by result estimates. In CIDR, 2019.Google ScholarGoogle Scholar
  18. G. Gottlob, M. Grohe, N. Musliu, M. Samer, and F. Scarcello. Hypertree decompositions: Structure, algorithms, and applications. In WG, volume 3787 of Lecture Notes in Computer Science, pages 1--15. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. G. Graefe, R. Bunker, and S. Cooper. Hash joins and hash teams in microsoft SQL server. In VLDB, pages 86--97. Morgan Kaufmann, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Helmer, R. Aly, T. Neumann, and G. Moerkotte. Indexing set-valued attributes with a multi-level extendible hashing scheme. In DEXA, volume 4653 of Lecture Notes in Computer Science, pages 98--108. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Henderson and R. Lawrence. Are multi-way joins actually useful? In ICEIS (1), pages 13--22. SciTePress, 2013.Google ScholarGoogle Scholar
  22. A. Hogan, C. Riveros, C. Rojas, and A. Soto. A worst-case optimal join algorithm for SPARQL. In ISWC (1), volume 11778 of Lecture Notes in Computer Science, pages 258--275. Springer, 2019.Google ScholarGoogle Scholar
  23. S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.Google ScholarGoogle Scholar
  24. Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, pages 268--277. ACM Press, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. Joglekar, R. Puttagunta, and C. Ré. Aggregations over generalized hypertree decompositions. CoRR, abs/1508.07532, 2015.Google ScholarGoogle Scholar
  26. M. R. Joglekar, R. Puttagunta, and C. Ré. AJAR: aggregations and joins over annotated relations. In PODS, pages 91--106. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. O. Kalinsky, Y. Etsion, and B. Kimelfeld. Flexible caching in trie joins. In EDBT, pages 282--293. OpenProceedings.org, 2017.Google ScholarGoogle Scholar
  28. A. Kara, H. Q. Ngo, M. Nikolic, D. Olteanu, and H. Zhang. Counting triangles under updates in worst-case optimal time. In ICDT, volume 127 of LIPIcs, pages 4:1--4:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.Google ScholarGoogle Scholar
  29. A. Kara and D. Olteanu. Covers of query results. In ICDT, volume 98 of LIPIcs, pages 16:1--16:22. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.Google ScholarGoogle Scholar
  30. A. Kemper, D. Kossmann, and C. Wiesner. Generalised hash teams for join and group-by. In VLDB, pages 30--41. Morgan Kaufmann, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195--206, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. A. Khamis, R. R. Curtin, B. Moseley, H. Q. Ngo, X. Nguyen, D. Olteanu, and M. Schleich. On functional aggregate queries with additive inequalities. In PODS, pages 414--431. ACM, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. A. Khamis, H. Q. Ngo, C. Ré, and A. Rudra. Joins via geometric resolutions: Worst case and beyond. ACM Trans. Database Syst., 41(4):22:1--22:45, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: questions asked frequently. In PODS, pages 13--28. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT, volume 48 of LIPIcs, pages 8:1--8:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.Google ScholarGoogle Scholar
  36. H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. V. Leis, P. A. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference, pages 743--754. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. J. Leskovec, D. P. Huttenlocher, and J. M. Kleinberg. Signed networks in social media. In CHI, pages 1361--1370. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014. http://snap.stanford.edu/data.Google ScholarGoogle Scholar
  41. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  42. A. Mhedhbi and S. Salihoglu. Optimizing subgraph queries by combining binary and worst-case optimal joins. PVLDB, 12(11):1692--1704, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. T. Neumann and M. Freitag. Umbra: A disk-based system with in-memory performance. In CIDR, 2020.Google ScholarGoogle Scholar
  45. H. Q. Ngo, D. T. Nguyen, C. Ré, and A. Rudra. Beyond worst-case analysis for joins with minesweeper. In PODS, pages 234--245. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. J. ACM, 65(3):16:1--16:40, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Record, 42(4):5--16, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. D. T. Nguyen, M. Aref, M. Bravenboer, G. Kollias, H. Q. Ngo, C. Ré, and A. Rudra. Join processing for graph patterns: An old dog with new tricks. In GRADES@SIGMOD/PODS, pages 2:1--2:8. ACM, 2015.Google ScholarGoogle Scholar
  49. A. Prokopec, P. Bagwell, and M. Odersky. Lock-free resizeable concurrent tries. In LCPC, volume 7146 of Lecture Notes in Computer Science, pages 156--170. Springer, 2011.Google ScholarGoogle Scholar
  50. S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker. Brief announcement: Prefix hash tree. In PODC, page 368. ACM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. M. Richardson, R. Agrawal, and P. M. Domingos. Trust management for the semantic web. In International Semantic Web Conference, volume 2870 of Lecture Notes in Computer Science, pages 351--368. Springer, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. J. A. Rogers. AquaHash GitHub repository. https://github.com/jandrewrogers/AquaHash.Google ScholarGoogle Scholar
  53. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In WEA, volume 3503 of Lecture Notes in Computer Science, pages 606--609. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. T. L. Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google ScholarGoogle Scholar
  55. A. Vogelsgesang, M. Haubenschild, J. Finis, A. Kemper, V. Leis, T. Mühlbauer, T. Neumann, and M. Then. Get real: How benchmarks fail to represent the real world. In DBTest@SIGMOD, pages 1:1--1:6. ACM, 2018.Google ScholarGoogle Scholar
  56. H. Wu, D. Zinn, M. Aref, and S. Yalamanchili. Multipredicate join algorithms for accelerating relational graph processing on GPUs. In ADMS@VLDB, pages 1--12, 2014.Google ScholarGoogle Scholar
  57. K. Xirogiannopoulos and A. Deshpande. Memory-efficient group-by aggregates over multi-way joins. CoRR, abs/1906.05745, 2019.Google ScholarGoogle Scholar
  58. J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 745--754. IEEE Computer Society, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. W. Zhang, R. Cheng, and B. Kao. Evaluating multi-way joins over discounted hitting time. In ICDE, pages 724--735. IEEE Computer Society, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  60. X. Zhang, L. Chen, and M. Wang. Efficient multi-way theta-join processing using mapreduce. PVLDB, 5(11):1184--1195, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Z. Zhang, H. Deshmukh, and J. M. Patel. Data partitioning for in-memory systems: Myths, challenges, and opportunities. In CIDR, 2019.Google ScholarGoogle Scholar
  62. G. Zhu, X. Wu, L. Yin, H. Wang, R. Gu, C. Yuan, and Y. Huang. HyMJ: A hybrid structure-aware approach to distributed multi-way join query. In ICDE, pages 1726--1729. IEEE, 2019.Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 13, Issue 12
    August 2020
    1710 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 July 2020
    Published in pvldb Volume 13, Issue 12

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader