Skip to main content
Top
Published in: The VLDB Journal 3/2022

02-11-2021 | Regular Paper

Accelerating multi-way joins on the GPU

Authors: Zhuohang Lai, Xibo Sun, Qiong Luo, Xiaolong Xie

Published in: The VLDB Journal | Issue 3/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Graphic processing units (GPUs) have been employed as hardware accelerators for online analytics. However, multi-way joins, which are common in analytic workloads, are inefficient on GPUs. Therefore, we propose to accelerate two representative multi-way join algorithms on the GPU: a multi-way hash join (MHJ) and the worst-case optimal Leapfrog Triejoin (LFTJ). Specifically, we design a warp-based parallelization strategy to reduce thread divergence and to facilitate coalesced memory access in parallel searches in a table. We further enhance our implementations with a set of GPU-friendly optimizations, including dynamic workload sharing among threads and elimination of the result counting phase. Additionally, we enable out-of-core multi-way joins with software pipelining. Our experiments show that our optimized MHJ and LFTJ outperform the state-of-the-art GPU algorithms by a factor of up to 67 on an NVIDIA V100 GPU.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
To be consistent with AMHJ, we use join order in ALFTJ to refer to the attribute order.
 
2
The Profiler cannot profile the data prefetch from CPU to GPU due to a bug of the Nvidia driver along with CUDA 10.2. Therefore, we invoke a dummy kernel in Stream 16 right before the prefetch operation to identify its start position in the timeline.
 
Literature
2.
go back to reference Aghajarian, D., Puri, S., Prasad, S.K.: GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 18:1–18:10 (2016). https://doi.org/10.1145/2996913.2996982 Aghajarian, D., Puri, S., Prasad, S.K.: GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 18:1–18:10 (2016). https://​doi.​org/​10.​1145/​2996913.​2996982
4.
go back to reference Alcantara, D.A., Volkov, V., Sengupta, S., Mitzenmacher, M., Owens, J.D., Amenta, N.: Building an efficient hash table on the GPU. In: GPU Computing Gems Jade Edition, pp. 39–53 (2012) Alcantara, D.A., Volkov, V., Sengupta, S., Mitzenmacher, M., Owens, J.D., Amenta, N.: Building an efficient hash table on the GPU. In: GPU Computing Gems Jade Edition, pp. 39–53 (2012)
7.
go back to reference Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013) Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013)
9.
go back to reference Barber, R., Lohman, G.M., Pandis, I., Raman, V., Sidle, R., Attaluri, G.K., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. PVLDB 8(4), 353–364 (2014) Barber, R., Lohman, G.M., Pandis, I., Raman, V., Sidle, R., Attaluri, G.K., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. PVLDB 8(4), 353–364 (2014)
10.
go back to reference Barthels, C., Alonso, G., Hoefler, T., Schneider, T., Müller, I.: Distributed join algorithms on thousands of cores. PVLDB 10(5), 517–528 (2017) Barthels, C., Alonso, G., Hoefler, T., Schneider, T., Müller, I.: Distributed join algorithms on thousands of cores. PVLDB 10(5), 517–528 (2017)
13.
go back to reference Böhm, C., Noll, R., Plant, C., Zherdin, A.: Index-supported similarity join on graphics processors. DBIS (2009) Böhm, C., Noll, R., Plant, C., Zherdin, A.: Index-supported similarity join on graphics processors. DBIS (2009)
14.
go back to reference Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009) Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009)
28.
go back to reference He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013) He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013)
29.
go back to reference He, J., Zhang, S., He, B.: In-cache query co-processing on coupled CPU-GPU architectures. PVLDB 8(4), 329–340 (2014) He, J., Zhang, S., He, B.: In-cache query co-processing on coupled CPU-GPU architectures. PVLDB 8(4), 329–340 (2014)
30.
go back to reference Heimel, M., Saecker, M., Pirk, H., Manegold, S., Markl, V.: Hardware-oblivious parallelism for in-memory column-stores. PVLDB 6(9), 709–720 (2013) Heimel, M., Saecker, M., Pirk, H., Manegold, S., Markl, V.: Hardware-oblivious parallelism for in-memory column-stores. PVLDB 6(9), 709–720 (2013)
32.
go back to reference Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L., Monet, D.B.: Two decades of research in column-oriented database architectures. IEEE Data. Eng. Bull. 35(1), 40–45 (2012) Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L., Monet, D.B.: Two decades of research in column-oriented database architectures. IEEE Data. Eng. Bull. 35(1), 40–45 (2012)
33.
37.
go back to reference Kersten, T., Leis, V., Kemper, A., Neumann, T., Pavlo, A., Boncz, P.A.: Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. PVLDB 11(13), 2209–2222 (2018) Kersten, T., Leis, V., Kemper, A., Neumann, T., Pavlo, A., Boncz, P.A.: Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. PVLDB 11(13), 2209–2222 (2018)
38.
go back to reference Kim, C., Sedlar, E., Chhugani, J., Kaldewey, T., Nguyen, A.D., Blas, A.D., Lee, V.W., Satish, N., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009) Kim, C., Sedlar, E., Chhugani, J., Kaldewey, T., Nguyen, A.D., Blas, A.D., Lee, V.W., Satish, N., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)
42.
go back to reference Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011) Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)
47.
go back to reference Pirk, H., Manegold, S., Kersten, M.L.: Accelerating foreign-key joins using asymmetric memory channels. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2011, pp. 27–35 (2011). http://www.adms-conf.org/p27-PIRK.pdf Pirk, H., Manegold, S., Kersten, M.L.: Accelerating foreign-key joins using asymmetric memory channels. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2011, pp. 27–35 (2011). http://​www.​adms-conf.​org/​p27-PIRK.​pdf
55.
go back to reference Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the GPU. In: HPDC (2016) Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the GPU. In: HPDC (2016)
56.
go back to reference Wu, H., Zinn, D., Aref, M., Yalamanchili, S.: Multipredicate join algorithms for accelerating relational graph processing on GPUs. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2014, pp. 1–12 (2014). http://www.adms-conf.org/2014/adms14_wu.pdf Wu, H., Zinn, D., Aref, M., Yalamanchili, S.: Multipredicate join algorithms for accelerating relational graph processing on GPUs. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2014, pp. 1–12 (2014). http://​www.​adms-conf.​org/​2014/​adms14_​wu.​pdf
59.
go back to reference Yuan, Y., Lee, R., Zhang, X.: The yin and yang of processing data warehousing queries on GPU devices. PVLDB 6(10), 817–828 (2013) Yuan, Y., Lee, R., Zhang, X.: The yin and yang of processing data warehousing queries on GPU devices. PVLDB 6(10), 817–828 (2013)
60.
go back to reference Zinn, D., Wu, H., Wang, J., Aref, M., Yalamanchili, S.: General-purpose join algorithms for large graph triangle listing on heterogeneous systems. In: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU@PPoPP 2016, pp. 12–21 (2016). https://doi.org/10.1145/2884045.2884054 Zinn, D., Wu, H., Wang, J., Aref, M., Yalamanchili, S.: General-purpose join algorithms for large graph triangle listing on heterogeneous systems. In: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU@PPoPP 2016, pp. 12–21 (2016). https://​doi.​org/​10.​1145/​2884045.​2884054
Metadata
Title
Accelerating multi-way joins on the GPU
Authors
Zhuohang Lai
Xibo Sun
Qiong Luo
Xiaolong Xie
Publication date
02-11-2021
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 3/2022
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-021-00708-y

Other articles of this Issue 3/2022

The VLDB Journal 3/2022 Go to the issue

Premium Partner