Joins on high-bandwidth memory: a new level in the memory hierarchy

  • Special Issue Paper
The VLDB Journal

Abstract

High-bandwidth memory (HBM) offers an additional opportunity to gain performance from the hardware. Its bandwidth, higher than that of regular DRAM, allows many threads to execute in parallel and avoids memory stalls by serving many concurrent memory accesses. This is especially relevant for database join algorithms optimized for multicore CPUs, and even more so on a manycore processor such as the Xeon Phi Knights Landing (KNL). The drawbacks of HBM, however, are its small capacity and its under-utilization under random memory access patterns. In this paper, we analyze the impact of HBM on join processing on the KNL architecture. We evaluate main-memory hash join and sort-merge join algorithms of relational DBMSs as well as data stream joins, comparing execution times in different HBM configurations. Our results show performance gains of up to 3× for joins when HBM is used. Finally, we summarize our lessons learned, give additional advice on HBM utilization, and discuss generalizations to other levels of the memory hierarchy.
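To make the three join families named in the abstract concrete, here is a minimal, single-threaded Python sketch of textbook versions: a no-partitioning hash join, a sort-merge join, and a symmetric (pipelined) hash join of the kind used for stream joins. These are illustrative assumptions, not the paper's implementations; parallelism, radix partitioning, SIMD, windowing, and any HBM or NUMA placement are all omitted, and the function names are hypothetical. Conceptually, the hash probe touches an unpredictable table location per tuple (the random access pattern the abstract associates with HBM under-utilization), whereas the merge step scans both sorted inputs sequentially.

```python
from collections import defaultdict


def hash_join(build, probe):
    """No-partitioning hash join of two lists of (key, payload) tuples.

    Build phase: hash every tuple of the build relation (ideally the smaller
    one) on its join key. Probe phase: look up each probe tuple's key, which
    conceptually touches an unpredictable table location per tuple.
    Returns (key, build_payload, probe_payload) triples.
    """
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    result = []
    for key, payload in probe:
        # dict.get does not insert missing keys, unlike table[key]
        for match in table.get(key, ()):
            result.append((key, match, payload))
    return result


def sort_merge_join(left, right):
    """Sort both inputs on the key, then merge them with two sequential cursors.

    Runs of equal keys on both sides produce their cross product.
    Returns (key, left_payload, right_payload) triples.
    """
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    result.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return result


def symmetric_hash_join(arrivals):
    """Symmetric (pipelined) hash join over an interleaved stream.

    `arrivals` yields (side, key, payload) with side "L" or "R". Each tuple is
    inserted into its own side's table and immediately probes the other side's
    table, so a result is emitted as soon as both partners have arrived.
    Windows and tuple expiration are deliberately omitted.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in arrivals:
        other = "R" if side == "L" else "L"
        tables[side][key].append(payload)
        for match in tables[other].get(key, ()):
            # Always report (key, left_payload, right_payload).
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
```

For the same inputs, all three functions return the same multiset of (key, left, right) triples; only the order of results and the memory access pattern differ. The symmetric variant additionally produces results incrementally, in arrival order, which is what makes it suitable for unbounded streams once a window bounds the table sizes.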


Figs. 1–25

Notes

  1. Dagstuhl report https://www.dagstuhl.de/18251.

  2. https://colfaxresearch.com/knl-numa/.

  3. https://software.intel.com/en-us/articles/intelr-memory-latency-checker.

  4. http://www.cs.virginia.edu/stream/.

  5. https://www.systutorials.com/docs/linux/man/8-numactl/.

  6. http://memkind.github.io/memkind/.

  7. https://github.com/dbis-ilm/pipefabric.

  8. http://web.cse.ohio-state.edu/~blanas.2/.

  9. https://www.systems.ethz.ch/projects/paralleljoins.

  10. https://www.systutorials.com/docs/linux/man/8-numastat/.

Acknowledgements

We thank Schloss Dagstuhl and the participants of Seminar 18251 for the valuable discussions and motivation that helped us to improve this paper. This work was partially funded by the German Research Foundation (DFG) within the SPP2037 under Grant No. SA 782/28.

Author information

Corresponding author

Correspondence to Constantin Pohl.

About this article

Cite this article

Pohl, C., Sattler, KU. & Graefe, G. Joins on high-bandwidth memory: a new level in the memory hierarchy. The VLDB Journal 29, 797–817 (2020). https://doi.org/10.1007/s00778-019-00546-z

  • DOI: https://doi.org/10.1007/s00778-019-00546-z
