research-article

Adopting worst-case optimal joins in relational database systems

Authors:
Michael Freitag

Technische Universität München

Technische Universität München
View Profile

,
Maximilian Bandle

Technische Universität München

Technische Universität München
View Profile

,
Tobias Schmidt

Technische Universität München

Technische Universität München
View Profile

,
Alfons Kemper

Technische Universität München

Technische Universität München
View Profile

,
Thomas Neumann

Technische Universität München

Technische Universität München
View Profile

Proceedings of the VLDB Endowment Volume 13 Issue 12pp 1891–1904https://doi.org/10.14778/3407790.3407797

Published:01 July 2020Publication History

Proceedings of the VLDB Endowment

Abstract

Worst-case optimal join algorithms are attractive from a theoretical point of view, as they offer asymptotically better runtime than binary joins on certain types of queries. In particular, they avoid enumerating large intermediate results by processing multiple input relations in a single multi-way join. However, existing implementations incur a sizable overhead in practice, primarily since they rely on suitable ordered index structures on their input. Systems that support worst-case optimal joins often focus on a specific problem domain, such as read-only graph analytic queries, where extensive precomputation allows them to mask these costs.

In this paper, we present a comprehensive implementation approach for worst-case optimal joins that is practical within general-purpose relational database management systems supporting both hybrid transactional and analytical workloads. The key component of our approach is a novel hash-based worst-case optimal join algorithm that relies only on data structures that can be built efficiently during query execution. Furthermore, we implement a hybrid query optimizer that intelligently and transparently combines both binary and multi-way joins within the same query plan. We demonstrate that our approach far outperforms existing systems when worst-case optimal joins are beneficial while sacrificing no performance when they are not.

References

C. Aberger. EmptyHeaded GitHub repository. https://github.com/HazyResearch/EmptyHeaded.Google Scholar
C. R. Aberger, A. Lamb, K. Olukotun, and C. Ré. LevelHeaded: A unified engine for business intelligence and linear algebra querying. In ICDE, pages 449--460. IEEE Computer Society, 2018.Google ScholarCross Ref
C. R. Aberger, A. Lamb, S. Tu, A. Nötzli, K. Olukotun, and C. Ré. EmptyHeaded: A relational engine for graph processing. ACM Trans. Database Syst., 42(4):20:1--20:44, 2017. Google ScholarDigital Library
F. N. Afrati and J. D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282--1298, 2011. Google ScholarDigital Library
K. Alway, E. Blais, and S. Salihoglu. Box covers and domain orderings for beyond worst-case join processing. CoRR, abs/1909.12102, 2019.Google Scholar
K. Ammar, F. McSherry, S. Salihoglu, and M. Joglekar. Distributed evaluation of subgraph queries using worst-case optimal and low-memory dataflows. PVLDB, 11(6):691--704, 2018. Google ScholarDigital Library
A. Appleby. Murmurhash GitHub repository. https://github.com/aappleby/smhasher.Google Scholar
M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pasalic, T. L. Veldhuizen, and G. Washburn. Design and implementation of the LogicBlox system. In SIGMOD Conference, pages 1371--1382. ACM, 2015. Google ScholarDigital Library
A. Atserias, M. Grohe, and D. Marx. Size bounds and query plans for relational joins. SIAM J. Comput., 42(4):1737--1767, 2013.Google ScholarDigital Library
R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD Conference, pages 261--272. ACM, 2000. Google ScholarDigital Library
L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44--54. ACM, 2006. Google ScholarDigital Library
C. Balkesen, J. Teubner, G. Alonso, and M. T. Özsu. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373. IEEE Computer Society, 2013. Google ScholarDigital Library
S. Chu, M. Balazinska, and D. Suciu. From theory to practice: Efficient join query evaluation in a parallel database system. In SIGMOD Conference, pages 63--78. ACM, 2015. Google ScholarDigital Library
A. Fekete, B. Franks, H. Jordan, and B. Scholz. Worst-case optimal radix triejoin. CoRR, abs/1912.12747, 2019.Google Scholar
M. Freitag, M. Bandle, T. Schmidt, A. Kemper, and T. Neumann. Combining worst-case optimal and traditional binary join processing. Technical Report TUM-I2082, Technische Universität München, 2020.Google Scholar
M. Freitag, M. Bandle, T. Schmidt, A. Kemper, and T. Neumann. Queries used in the experimental evaluation, Jan. 2020. https://github.com/freitmi/queries-vldb2020.Google Scholar
M. Freitag and T. Neumann. Every row counts: Combining sketches and sampling for accurate group-by result estimates. In CIDR, 2019.Google Scholar
G. Gottlob, M. Grohe, N. Musliu, M. Samer, and F. Scarcello. Hypertree decompositions: Structure, algorithms, and applications. In WG, volume 3787 of Lecture Notes in Computer Science, pages 1--15. Springer, 2005. Google ScholarDigital Library
G. Graefe, R. Bunker, and S. Cooper. Hash joins and hash teams in microsoft SQL server. In VLDB, pages 86--97. Morgan Kaufmann, 1998. Google ScholarDigital Library
S. Helmer, R. Aly, T. Neumann, and G. Moerkotte. Indexing set-valued attributes with a multi-level extendible hashing scheme. In DEXA, volume 4653 of Lecture Notes in Computer Science, pages 98--108. Springer, 2007. Google ScholarDigital Library
M. Henderson and R. Lawrence. Are multi-way joins actually useful? In ICEIS (1), pages 13--22. SciTePress, 2013.Google Scholar
A. Hogan, C. Riveros, C. Rojas, and A. Soto. A worst-case optimal join algorithm for SPARQL. In ISWC (1), volume 11778 of Lecture Notes in Computer Science, pages 258--275. Springer, 2019.Google Scholar
S. Idreos, F. Groffen, N. Nes, S. Manegold, K. S. Mullender, and M. L. Kersten. MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull., 35(1):40--45, 2012.Google Scholar
Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In SIGMOD Conference, pages 268--277. ACM Press, 1991. Google ScholarDigital Library
M. Joglekar, R. Puttagunta, and C. Ré. Aggregations over generalized hypertree decompositions. CoRR, abs/1508.07532, 2015.Google Scholar
M. R. Joglekar, R. Puttagunta, and C. Ré. AJAR: aggregations and joins over annotated relations. In PODS, pages 91--106. ACM, 2016. Google ScholarDigital Library
O. Kalinsky, Y. Etsion, and B. Kimelfeld. Flexible caching in trie joins. In EDBT, pages 282--293. OpenProceedings.org, 2017.Google Scholar
A. Kara, H. Q. Ngo, M. Nikolic, D. Olteanu, and H. Zhang. Counting triangles under updates in worst-case optimal time. In ICDT, volume 127 of LIPIcs, pages 4:1--4:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.Google Scholar
A. Kara and D. Olteanu. Covers of query results. In ICDT, volume 98 of LIPIcs, pages 16:1--16:22. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.Google Scholar
A. Kemper, D. Kossmann, and C. Wiesner. Generalised hash teams for join and group-by. In VLDB, pages 30--41. Morgan Kaufmann, 1999. Google ScholarDigital Library
A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195--206, 2011. Google ScholarDigital Library
M. A. Khamis, R. R. Curtin, B. Moseley, H. Q. Ngo, X. Nguyen, D. Olteanu, and M. Schleich. On functional aggregate queries with additive inequalities. In PODS, pages 414--431. ACM, 2019. Google ScholarDigital Library
M. A. Khamis, H. Q. Ngo, C. Ré, and A. Rudra. Joins via geometric resolutions: Worst case and beyond. ACM Trans. Database Syst., 41(4):22:1--22:45, 2016. Google ScholarDigital Library
M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: questions asked frequently. In PODS, pages 13--28. ACM, 2016. Google ScholarDigital Library
P. Koutris, P. Beame, and D. Suciu. Worst-case optimal algorithms for parallel query processing. In ICDT, volume 48 of LIPIcs, pages 8:1--8:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.Google Scholar
H. Kwak, C. Lee, H. Park, and S. B. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600. ACM, 2010. Google ScholarDigital Library
V. Leis, P. A. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD Conference, pages 743--754. ACM, 2014. Google ScholarDigital Library
V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? PVLDB, 9(3):204--215, 2015. Google ScholarDigital Library
J. Leskovec, D. P. Huttenlocher, and J. M. Kleinberg. Signed networks in social media. In CHI, pages 1361--1370. ACM, 2010. Google ScholarDigital Library
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection, June 2014. http://snap.stanford.edu/data.Google Scholar
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29--123, 2009.Google ScholarCross Ref
A. Mhedhbi and S. Salihoglu. Optimizing subgraph queries by combining binary and worst-case optimal joins. PVLDB, 12(11):1692--1704, 2019. Google ScholarDigital Library
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, 2011. Google ScholarDigital Library
T. Neumann and M. Freitag. Umbra: A disk-based system with in-memory performance. In CIDR, 2020.Google Scholar
H. Q. Ngo, D. T. Nguyen, C. Ré, and A. Rudra. Beyond worst-case analysis for joins with minesweeper. In PODS, pages 234--245. ACM, 2014. Google ScholarDigital Library
H. Q. Ngo, E. Porat, C. Ré, and A. Rudra. Worst-case optimal join algorithms. J. ACM, 65(3):16:1--16:40, 2018. Google ScholarDigital Library
H. Q. Ngo, C. Ré, and A. Rudra. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Record, 42(4):5--16, 2013. Google ScholarDigital Library
D. T. Nguyen, M. Aref, M. Bravenboer, G. Kollias, H. Q. Ngo, C. Ré, and A. Rudra. Join processing for graph patterns: An old dog with new tricks. In GRADES@SIGMOD/PODS, pages 2:1--2:8. ACM, 2015.Google Scholar
A. Prokopec, P. Bagwell, and M. Odersky. Lock-free resizeable concurrent tries. In LCPC, volume 7146 of Lecture Notes in Computer Science, pages 156--170. Springer, 2011.Google Scholar
S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker. Brief announcement: Prefix hash tree. In PODC, page 368. ACM, 2004. Google ScholarDigital Library
M. Richardson, R. Agrawal, and P. M. Domingos. Trust management for the semantic web. In International Semantic Web Conference, volume 2870 of Lecture Notes in Computer Science, pages 351--368. Springer, 2003. Google ScholarDigital Library
J. A. Rogers. AquaHash GitHub repository. https://github.com/jandrewrogers/AquaHash.Google Scholar
T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In WEA, volume 3503 of Lecture Notes in Computer Science, pages 606--609. Springer, 2005. Google ScholarDigital Library
T. L. Veldhuizen. Leapfrog triejoin: A simple, worst-case optimal join algorithm. In ICDT, pages 96--106, 2014.Google Scholar
A. Vogelsgesang, M. Haubenschild, J. Finis, A. Kemper, V. Leis, T. Mühlbauer, T. Neumann, and M. Then. Get real: How benchmarks fail to represent the real world. In DBTest@SIGMOD, pages 1:1--1:6. ACM, 2018.Google Scholar
H. Wu, D. Zinn, M. Aref, and S. Yalamanchili. Multipredicate join algorithms for accelerating relational graph processing on GPUs. In ADMS@VLDB, pages 1--12, 2014.Google Scholar
K. Xirogiannopoulos and A. Deshpande. Memory-efficient group-by aggregates over multi-way joins. CoRR, abs/1906.05745, 2019.Google Scholar
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 745--754. IEEE Computer Society, 2012. Google ScholarDigital Library
W. Zhang, R. Cheng, and B. Kao. Evaluating multi-way joins over discounted hitting time. In ICDE, pages 724--735. IEEE Computer Society, 2014.Google ScholarCross Ref
X. Zhang, L. Chen, and M. Wang. Efficient multi-way theta-join processing using mapreduce. PVLDB, 5(11):1184--1195, 2012. Google ScholarDigital Library
Z. Zhang, H. Deshmukh, and J. M. Patel. Data partitioning for in-memory systems: Myths, challenges, and opportunities. In CIDR, 2019.Google Scholar
G. Zhu, X. Wu, L. Yin, H. Wang, R. Gu, C. Yuan, and Y. Huang. HyMJ: A hybrid structure-aware approach to distributed multi-way join query. In ICDE, pages 1726--1729. IEEE, 2019.Google ScholarCross Ref

Recommendations

Worst-case Optimal Join Algorithms

Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a new algorithm to process these queries optimally in ...
Read More
Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins
Best of PODS 2019 and Regular Papers

We study the problem of optimizing one-time and continuous subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in ...
Read More
On optimizing relational self-joins
EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology

Self-join, which joins a relation with itself, is a prevalent operation in relational database systems. Despite its wide applicability, there has been little attention devoted to improving its performance. In this paper, we present SCALE (<u>S</u>ort ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 13, Issue 12
August 2020
1710 pages
ISSN:2150-8097
Editors:
Magdalena Balazinska
University of Washington
,
Xiaofang Zhou
University of Queensland, Australia
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 July 2020
Published in pvldb Volume 13, Issue 12
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 20
  Total Citations
  View Citations
- 600
  Total Downloads
- Downloads (Last 12 months)180
- Downloads (Last 6 weeks)17
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Adopting worst-case optimal joins in relational database systems

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Worst-case Optimal Join Algorithms

Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins

On optimizing relational self-joins

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Adopting worst-case optimal joins in relational database systems

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Worst-case Optimal Join Algorithms

Optimizing One-time and Continuous Subgraph Queries using Worst-case Optimal Joins

On optimizing relational self-joins

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media