Joins on high-bandwidth memory: a new level in the memory hierarchy

Pohl, Constantin; Sattler, Kai-Uwe; Graefe, Goetz

doi:10.1007/s00778-019-00546-z

Special Issue Paper
Published: 13 July 2019

Volume 29, pages 797–817, (2020)

The VLDB Journal

Abstract

High-bandwidth memory (HBM) offers an additional opportunity for hardware performance gains. Its high bandwidth compared to regular DRAM allows many threads to execute in parallel, avoiding memory stalls through many concurrent memory accesses. This is especially interesting for database join algorithms optimized for multicore CPUs, and even more so when running on a manycore processor such as the Xeon Phi Knights Landing (KNL). The drawbacks of HBM, however, are its small capacity and its under-utilization under random memory access patterns. In this paper, we analyze the impact of HBM on join processing on the KNL architecture. We evaluate main-memory hash join and sort-merge join algorithms of relational DBMSs as well as data stream joins, comparing execution times in different HBM configurations. Our results show performance gains of up to 3× for joins when HBM is used. Finally, we summarize our lessons learned, give additional advice for HBM utilization, and discuss generalizations to other levels of the memory hierarchy.

Acknowledgements

We thank Schloss Dagstuhl and the participants of Seminar 18251 for the valuable discussions and motivation that helped us to improve this paper. This work was partially funded by the German Research Foundation (DFG) within the SPP2037 under Grant No. SA 782/28.

Author information

Authors and Affiliations

TU Ilmenau, Ilmenau, Germany
Constantin Pohl & Kai-Uwe Sattler
Google, Madison, WI, USA
Goetz Graefe

Corresponding author

Correspondence to Constantin Pohl.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pohl, C., Sattler, KU. & Graefe, G. Joins on high-bandwidth memory: a new level in the memory hierarchy. The VLDB Journal 29, 797–817 (2020). https://doi.org/10.1007/s00778-019-00546-z

Received: 30 November 2018
Revised: 30 April 2019
Accepted: 13 June 2019
Published: 13 July 2019
Issue Date: May 2020
DOI: https://doi.org/10.1007/s00778-019-00546-z
