The VLDB Journal

02-11-2021 | Regular Paper

Accelerating multi-way joins on the GPU

Authors: Zhuohang Lai, Xibo Sun, Qiong Luo, Xiaolong Xie

Published in: The VLDB Journal | Issue 3/2022

Abstract

Graphics processing units (GPUs) have been employed as hardware accelerators for online analytics. However, multi-way joins, which are common in analytic workloads, are inefficient on GPUs. Therefore, we propose to accelerate two representative multi-way join algorithms on the GPU: a multi-way hash join (MHJ) and the worst-case optimal Leapfrog Triejoin (LFTJ). Specifically, we design a warp-based parallelization strategy to reduce thread divergence and to facilitate coalesced memory access in parallel searches in a table. We further enhance our implementations with a set of GPU-friendly optimizations, including dynamic workload sharing among threads and elimination of the result counting phase. Additionally, we enable out-of-core multi-way joins with software pipelining. Our experiments show that our optimized MHJ and LFTJ outperform the state-of-the-art GPU algorithms by a factor of up to 67 on an NVIDIA V100 GPU.
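For readers unfamiliar with LFTJ, the following is a minimal, sequential CPU sketch of the leapfrog intersection step that Leapfrog Triejoin applies at each attribute level. It is not the authors' GPU implementation: it omits the warp-based parallel search, workload sharing, and pipelining described in the abstract. The function name `leapfrog_intersect` and the choice of binary search (`bisect_left`) for the seek step are assumptions made for illustration.

```python
from bisect import bisect_left


def leapfrog_intersect(lists):
    """Intersect k sorted, duplicate-free lists by leapfrogging.

    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    if not lists or any(len(values) == 0 for values in lists):
        return []

    k = len(lists)
    pos = [0] * k  # current cursor into each list
    # Visit the lists in order of their first key, smallest first.
    order = sorted(range(k), key=lambda i: lists[i][0])
    max_key = lists[order[-1]][0]  # largest key currently under any cursor
    p = 0  # index into `order` of the list holding the smallest key
    out = []

    while True:
        i = order[p]
        key = lists[i][pos[i]]
        if key == max_key:
            # Smallest key equals largest key: every cursor agrees.
            out.append(key)
            pos[i] += 1
        else:
            # Seek: jump to the first key >= max_key.
            pos[i] = bisect_left(lists[i], max_key, pos[i])
        if pos[i] >= len(lists[i]):
            return out  # one list is exhausted, so no more matches
        max_key = lists[i][pos[i]]
        p = (p + 1) % k


if __name__ == "__main__":
    print(leapfrog_intersect([[1, 3, 4, 5, 7], [2, 3, 5, 8], [3, 5, 6, 7, 9]]))
```

Each seek skips past keys that cannot match, which is the per-level search that the abstract's warp-based strategy is designed to parallelize on the GPU.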

To be consistent with AMHJ, we use join order in ALFTJ to refer to the attribute order.

The profiler cannot profile the data prefetch from the CPU to the GPU because of a bug in the NVIDIA driver used with CUDA 10.2. Therefore, we invoke a dummy kernel in Stream 16 right before the prefetch operation to mark its start position in the timeline.

Title: Accelerating multi-way joins on the GPU
Authors: Zhuohang Lai, Xibo Sun, Qiong Luo, Xiaolong Xie
Publication date: 02-11-2021
Publisher: Springer Berlin Heidelberg
Published in: The VLDB Journal / Issue 3/2022
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI: https://doi.org/10.1007/s00778-021-00708-y
