skip to main content
10.1145/3085504.3085521acmotherconferencesArticle/Chapter ViewAbstractPublication PagesssdbmConference Proceedingsconference-collections
research-article
Public Access

Fast Equi-Join Algorithms on GPUs: Design and Implementation

Authors Info & Claims
Published:27 June 2017Publication History

ABSTRACT

Processing relational joins on modern GPUs has attracted much attention in the past few years. With the rapid development on the hardware and software environment in the GPU world, the existing GPU join algorithms designed for earlier architecture cannot make the most out of latest GPU products. In this paper, we report new design and implementation of join algorithms with high performance under today's GPGPU environment. This is a key component of our scientific database engine named G-SDMS. In particular, we overhaul the popular radix hash join and redesign sort-merge join algorithms on GPUs by applying a series of novel techniques to utilize the hardware capacity of latest Nvidia GPU architecture and new features of the CUDA programming framework. Our algorithms take advantage of revised hardware arrangement, larger register file and shared memory, native atomic operation, dynamic parallelism, and CUDA Streams. Experiments show that our new hash join algorithm is 2.0 to 14.6 times as efficient as existing GPU implementation, while the new sort-merge join achieves a speedup of 4.0X to 4.9X. Compared to the best CPU sort-merge join and hash join known to date, our optimized code achieves up to 10.5X and 5.5X speedup. Moreover, we extend our design to scenarios where large data tables cannot fit in the GPU memory.

References

  1. 2005. GPU Gems 2, Chapter 46. (Mar 2005). http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter46.html.Google ScholarGoogle Scholar
  2. Martina-Cezara Albutiu, Alfons Kemper, and Thomas Neumann. 2012. Massively Parallel Sort-merge Joins in Main Memory Multi-core Database Systems. Proc. VLDB Endow. 5, 10 (June 2012), 1064--1075. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Peter Bakkum and Kevin Skadron. 2010. Accelerating SQL Database Operations on a GPU with CUDA. In Procs. 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU '10). 94--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Özsu. 2013. Multi-core, Main-memory Joins: Sort vs. Hash Revisited. Proc. VLDB Endow. 7, 1 (Sept. 2013), 85--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Balkesen, J. Teubner, G. Alonso, and M. T...zsu. 2013. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE. 362--373. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Nagender Bandi, Chengyu Sun, Divyakant Agrawal, and Amr El Abbadi. 2004. Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations. In Procs. of VLDB. 1021--1032. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. 2014. Memory-efficient Hash Joins. Proc. VLDB Endow. 8, 4 (Dec. 2014), 353--364. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. In Procs. of SIGMOD. 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, and Dinesh Manocha. 2004. Fast Computation of Database Operations Using Graphics Processors. In Procs. of SIGMOD. 215--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Oded Green, Robert McColl, and David A. Bader. 2012. GPU Merge Path: A GPU Merging Algorithm. In Procs of ICS. 331--340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bingsheng He, Mian Lu, Ke Yang, Rui Fang, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. 2009. Relational Query Coprocessing on Graphics Processors. ACM Trans. Database Syst. 34, 4, Article 21 (Dec. 2009), 39 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, and Pedro Sander. 2008. Relational Joins on Graphics Processors. In Procs. of SIGMOD. 511--524. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Jiong He, Mian Lu, and Bingsheng He. 2013. Revisiting Co-processing for Hash Joins on the Coupled CPU-GPU Architecture. Proc. VLDB Endowment 6, 10 (Aug. 2013), 889--900. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Tim Kaldewey, Guy Lohman, Rene Mueller, and Peter Volk. 2012. GPU Join Processing Revisited. In Procs. DaMoN. 55--62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. 2010. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit. News 38, 3 (June 2010), 451--460. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Manegold, P. Boncz, and M. Kersten. 2002. Optimizing main-memory join on modern hardware. IEEE TKDE 14, 4 (Jul 2002), 709--730. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Odeh, O. Green, Z. Mwassi, O. Shmueli, and Y. Birk. 2012. Merge Path - Parallel Merging Made Simple. In IPDPSW. 1611--1618. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Ran Rui, Hao Li, and Yi-Cheng Tu. 2015. Join algorithms on GPUs: A revisit after seven years. In Big Data. 2541--2550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Evangelia A. Sitaridi and Kenneth A. Ross. 2012. Ameliorating Memory Contention of OLAP Operators on GPU Processors. In DaMoN. 39--47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Chengyu Sun, Divyakant Agrawal, and Amr El Abbadi. 2003. Hardware Acceleration for Spatial Selections and Joins. In Procs. of ACM Intl. Conf. on Management of Data (SIGMOD). 455--466. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Yi-Cheng Tu, Anand Kumar, Di Yu, Ran Rui, and Ryan Wheeler. 2013. Data Management Systems on GPUs: Promises and Challenges. In SSDBM. Article 33, 4 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Haicheng Wu, Gregory Diamos, Tim Sheard, Molham Aref, Sean Baxter, Michael Garland, and Sudhakar Yalamanchili. 2014. Red Fox: An Execution Environment for Relational Query Processing on GPUs. In Procs. CGO. Article 44, 11 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. Proc. VLDB Endowment 6,10 (Aug. 2013), 817--828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Marco Zagha and Guy E. Blelloch. 1991. Radix Sort for Vector Multiprocessors. In Procs. 1991 ACM/IEEE Conference on Supercomputing (SC '91). 712--721. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management
    June 2017
    373 pages
    ISBN:9781450352826
    DOI:10.1145/3085504

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 27 June 2017

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate56of146submissions,38%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader