Skip to main content
Erschienen in: The VLDB Journal 3/2022

02.11.2021 | Regular Paper

Accelerating multi-way joins on the GPU

verfasst von: Zhuohang Lai, Xibo Sun, Qiong Luo, Xiaolong Xie

Erschienen in: The VLDB Journal | Ausgabe 3/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graphic processing units (GPUs) have been employed as hardware accelerators for online analytics. However, multi-way joins, which are common in analytic workloads, are inefficient on GPUs. Therefore, we propose to accelerate two representative multi-way join algorithms on the GPU: a multi-way hash join (MHJ) and the worst-case optimal Leapfrog Triejoin (LFTJ). Specifically, we design a warp-based parallelization strategy to reduce thread divergence and to facilitate coalesced memory access in parallel searches in a table. We further enhance our implementations with a set of GPU-friendly optimizations, including dynamic workload sharing among threads and elimination of the result counting phase. Additionally, we enable out-of-core multi-way joins with software pipelining. Our experiments show that our optimized MHJ and LFTJ outperform the state-of-the-art GPU algorithms by a factor of up to 67 on an NVIDIA V100 GPU.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
To be consistent with AMHJ, we use join order in ALFTJ to refer to the attribute order.
 
2
The Profiler cannot profile the data prefetch from CPU to GPU due to a bug of the Nvidia driver along with CUDA 10.2. Therefore, we invoke a dummy kernel in Stream 16 right before the prefetch operation to identify its start position in the timeline.
 
Literatur
2.
Zurück zum Zitat Aghajarian, D., Puri, S., Prasad, S.K.: GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 18:1–18:10 (2016). https://doi.org/10.1145/2996913.2996982 Aghajarian, D., Puri, S., Prasad, S.K.: GCMF: an efficient end-to-end spatial join system over large polygonal datasets on GPGPU platform. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 18:1–18:10 (2016). https://​doi.​org/​10.​1145/​2996913.​2996982
4.
Zurück zum Zitat Alcantara, D.A., Volkov, V., Sengupta, S., Mitzenmacher, M., Owens, J.D., Amenta, N.: Building an efficient hash table on the GPU. In: GPU Computing Gems Jade Edition, pp. 39–53 (2012) Alcantara, D.A., Volkov, V., Sengupta, S., Mitzenmacher, M., Owens, J.D., Amenta, N.: Building an efficient hash table on the GPU. In: GPU Computing Gems Jade Edition, pp. 39–53 (2012)
7.
Zurück zum Zitat Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013) Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort vs. hash revisited. PVLDB 7(1), 85–96 (2013)
9.
Zurück zum Zitat Barber, R., Lohman, G.M., Pandis, I., Raman, V., Sidle, R., Attaluri, G.K., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. PVLDB 8(4), 353–364 (2014) Barber, R., Lohman, G.M., Pandis, I., Raman, V., Sidle, R., Attaluri, G.K., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. PVLDB 8(4), 353–364 (2014)
10.
Zurück zum Zitat Barthels, C., Alonso, G., Hoefler, T., Schneider, T., Müller, I.: Distributed join algorithms on thousands of cores. PVLDB 10(5), 517–528 (2017) Barthels, C., Alonso, G., Hoefler, T., Schneider, T., Müller, I.: Distributed join algorithms on thousands of cores. PVLDB 10(5), 517–528 (2017)
13.
Zurück zum Zitat Böhm, C., Noll, R., Plant, C., Zherdin, A.: Index-supported similarity join on graphics processors. DBIS (2009) Böhm, C., Noll, R., Plant, C., Zherdin, A.: Index-supported similarity join on graphics processors. DBIS (2009)
14.
Zurück zum Zitat Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009) Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009)
28.
Zurück zum Zitat He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013) He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013)
29.
Zurück zum Zitat He, J., Zhang, S., He, B.: In-cache query co-processing on coupled CPU-GPU architectures. PVLDB 8(4), 329–340 (2014) He, J., Zhang, S., He, B.: In-cache query co-processing on coupled CPU-GPU architectures. PVLDB 8(4), 329–340 (2014)
30.
Zurück zum Zitat Heimel, M., Saecker, M., Pirk, H., Manegold, S., Markl, V.: Hardware-oblivious parallelism for in-memory column-stores. PVLDB 6(9), 709–720 (2013) Heimel, M., Saecker, M., Pirk, H., Manegold, S., Markl, V.: Hardware-oblivious parallelism for in-memory column-stores. PVLDB 6(9), 709–720 (2013)
32.
Zurück zum Zitat Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L., Monet, D.B.: Two decades of research in column-oriented database architectures. IEEE Data. Eng. Bull. 35(1), 40–45 (2012) Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L., Monet, D.B.: Two decades of research in column-oriented database architectures. IEEE Data. Eng. Bull. 35(1), 40–45 (2012)
33.
37.
Zurück zum Zitat Kersten, T., Leis, V., Kemper, A., Neumann, T., Pavlo, A., Boncz, P.A.: Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. PVLDB 11(13), 2209–2222 (2018) Kersten, T., Leis, V., Kemper, A., Neumann, T., Pavlo, A., Boncz, P.A.: Everything you always wanted to know about compiled and vectorized queries but were afraid to ask. PVLDB 11(13), 2209–2222 (2018)
38.
Zurück zum Zitat Kim, C., Sedlar, E., Chhugani, J., Kaldewey, T., Nguyen, A.D., Blas, A.D., Lee, V.W., Satish, N., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009) Kim, C., Sedlar, E., Chhugani, J., Kaldewey, T., Nguyen, A.D., Blas, A.D., Lee, V.W., Satish, N., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)
42.
Zurück zum Zitat Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011) Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)
47.
Zurück zum Zitat Pirk, H., Manegold, S., Kersten, M.L.: Accelerating foreign-key joins using asymmetric memory channels. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2011, pp. 27–35 (2011). http://www.adms-conf.org/p27-PIRK.pdf Pirk, H., Manegold, S., Kersten, M.L.: Accelerating foreign-key joins using asymmetric memory channels. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2011, pp. 27–35 (2011). http://​www.​adms-conf.​org/​p27-PIRK.​pdf
55.
Zurück zum Zitat Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the GPU. In: HPDC (2016) Wang, L., Wang, Y., Owens, J.D.: Fast parallel subgraph matching on the GPU. In: HPDC (2016)
56.
Zurück zum Zitat Wu, H., Zinn, D., Aref, M., Yalamanchili, S.: Multipredicate join algorithms for accelerating relational graph processing on GPUs. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2014, pp. 1–12 (2014). http://www.adms-conf.org/2014/adms14_wu.pdf Wu, H., Zinn, D., Aref, M., Yalamanchili, S.: Multipredicate join algorithms for accelerating relational graph processing on GPUs. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures, ADMS 2014, pp. 1–12 (2014). http://​www.​adms-conf.​org/​2014/​adms14_​wu.​pdf
59.
Zurück zum Zitat Yuan, Y., Lee, R., Zhang, X.: The yin and yang of processing data warehousing queries on GPU devices. PVLDB 6(10), 817–828 (2013) Yuan, Y., Lee, R., Zhang, X.: The yin and yang of processing data warehousing queries on GPU devices. PVLDB 6(10), 817–828 (2013)
60.
Zurück zum Zitat Zinn, D., Wu, H., Wang, J., Aref, M., Yalamanchili, S.: General-purpose join algorithms for large graph triangle listing on heterogeneous systems. In: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU@PPoPP 2016, pp. 12–21 (2016). https://doi.org/10.1145/2884045.2884054 Zinn, D., Wu, H., Wang, J., Aref, M., Yalamanchili, S.: General-purpose join algorithms for large graph triangle listing on heterogeneous systems. In: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU@PPoPP 2016, pp. 12–21 (2016). https://​doi.​org/​10.​1145/​2884045.​2884054
Metadaten
Titel
Accelerating multi-way joins on the GPU
verfasst von
Zhuohang Lai
Xibo Sun
Qiong Luo
Xiaolong Xie
Publikationsdatum
02.11.2021
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 3/2022
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-021-00708-y

Weitere Artikel der Ausgabe 3/2022

The VLDB Journal 3/2022 Zur Ausgabe

Premium Partner