research-article

Public Access

Fast Equi-Join Algorithms on GPUs: Design and Implementation

Authors:
Ran Rui

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA
View Profile

,
Yi-Cheng Tu

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA

Department of Computer Science and Engineering, University of South Florida, Tampa, Florida, USA
View Profile

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database ManagementJune 2017Article No.: 17Pages 1–12https://doi.org/10.1145/3085504.3085521

Published:27 June 2017Publication History

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management

Pages 1–12

ABSTRACT

Processing relational joins on modern GPUs has attracted much attention in the past few years. With the rapid development on the hardware and software environment in the GPU world, the existing GPU join algorithms designed for earlier architecture cannot make the most out of latest GPU products. In this paper, we report new design and implementation of join algorithms with high performance under today's GPGPU environment. This is a key component of our scientific database engine named G-SDMS. In particular, we overhaul the popular radix hash join and redesign sort-merge join algorithms on GPUs by applying a series of novel techniques to utilize the hardware capacity of latest Nvidia GPU architecture and new features of the CUDA programming framework. Our algorithms take advantage of revised hardware arrangement, larger register file and shared memory, native atomic operation, dynamic parallelism, and CUDA Streams. Experiments show that our new hash join algorithm is 2.0 to 14.6 times as efficient as existing GPU implementation, while the new sort-merge join achieves a speedup of 4.0X to 4.9X. Compared to the best CPU sort-merge join and hash join known to date, our optimized code achieves up to 10.5X and 5.5X speedup. Moreover, we extend our design to scenarios where large data tables cannot fit in the GPU memory.

References

2005. GPU Gems 2, Chapter 46. (Mar 2005). http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter46.html.Google Scholar
Martina-Cezara Albutiu, Alfons Kemper, and Thomas Neumann. 2012. Massively Parallel Sort-merge Joins in Main Memory Multi-core Database Systems. Proc. VLDB Endow. 5, 10 (June 2012), 1064--1075. Google ScholarDigital Library
Peter Bakkum and Kevin Skadron. 2010. Accelerating SQL Database Operations on a GPU with CUDA. In Procs. 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU '10). 94--103. Google ScholarDigital Library
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Özsu. 2013. Multi-core, Main-memory Joins: Sort vs. Hash Revisited. Proc. VLDB Endow. 7, 1 (Sept. 2013), 85--96. Google ScholarDigital Library
C. Balkesen, J. Teubner, G. Alonso, and M. T...zsu. 2013. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE. 362--373. Google ScholarDigital Library
Nagender Bandi, Chengyu Sun, Divyakant Agrawal, and Amr El Abbadi. 2004. Hardware Acceleration in Commercial Databases: A Case Study of Spatial Operations. In Procs. of VLDB. 1021--1032. Google ScholarDigital Library
R. Barber, G. Lohman, I. Pandis, V. Raman, R. Sidle, G. Attaluri, N. Chainani, S. Lightstone, and D. Sharpe. 2014. Memory-efficient Hash Joins. Proc. VLDB Endow. 8, 4 (Dec. 2014), 353--364. Google ScholarDigital Library
Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-core CPUs. In Procs. of SIGMOD. 37--48. Google ScholarDigital Library
Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, and Dinesh Manocha. 2004. Fast Computation of Database Operations Using Graphics Processors. In Procs. of SIGMOD. 215--226. Google ScholarDigital Library
Oded Green, Robert McColl, and David A. Bader. 2012. GPU Merge Path: A GPU Merging Algorithm. In Procs of ICS. 331--340. Google ScholarDigital Library
Bingsheng He, Mian Lu, Ke Yang, Rui Fang, Naga K. Govindaraju, Qiong Luo, and Pedro V. Sander. 2009. Relational Query Coprocessing on Graphics Processors. ACM Trans. Database Syst. 34, 4, Article 21 (Dec. 2009), 39 pages. Google ScholarDigital Library
Bingsheng He, Ke Yang, Rui Fang, Mian Lu, Naga Govindaraju, Qiong Luo, and Pedro Sander. 2008. Relational Joins on Graphics Processors. In Procs. of SIGMOD. 511--524. Google ScholarDigital Library
Jiong He, Mian Lu, and Bingsheng He. 2013. Revisiting Co-processing for Hash Joins on the Coupled CPU-GPU Architecture. Proc. VLDB Endowment 6, 10 (Aug. 2013), 889--900. Google ScholarDigital Library
Tim Kaldewey, Guy Lohman, Rene Mueller, and Peter Volk. 2012. GPU Join Processing Revisited. In Procs. DaMoN. 55--62. Google ScholarDigital Library
Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-core CPUs. Proc. VLDB Endow. 2, 2 (Aug. 2009), 1378--1389. Google ScholarDigital Library
Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, and Pradeep Dubey. 2010. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit. News 38, 3 (June 2010), 451--460. Google ScholarDigital Library
S. Manegold, P. Boncz, and M. Kersten. 2002. Optimizing main-memory join on modern hardware. IEEE TKDE 14, 4 (Jul 2002), 709--730. Google ScholarDigital Library
S. Odeh, O. Green, Z. Mwassi, O. Shmueli, and Y. Birk. 2012. Merge Path - Parallel Merging Made Simple. In IPDPSW. 1611--1618. Google ScholarDigital Library
Ran Rui, Hao Li, and Yi-Cheng Tu. 2015. Join algorithms on GPUs: A revisit after seven years. In Big Data. 2541--2550. Google ScholarDigital Library
Evangelia A. Sitaridi and Kenneth A. Ross. 2012. Ameliorating Memory Contention of OLAP Operators on GPU Processors. In DaMoN. 39--47. Google ScholarDigital Library
Chengyu Sun, Divyakant Agrawal, and Amr El Abbadi. 2003. Hardware Acceleration for Spatial Selections and Joins. In Procs. of ACM Intl. Conf. on Management of Data (SIGMOD). 455--466. Google ScholarDigital Library
Yi-Cheng Tu, Anand Kumar, Di Yu, Ran Rui, and Ryan Wheeler. 2013. Data Management Systems on GPUs: Promises and Challenges. In SSDBM. Article 33, 4 pages. Google ScholarDigital Library
Haicheng Wu, Gregory Diamos, Tim Sheard, Molham Aref, Sean Baxter, Michael Garland, and Sudhakar Yalamanchili. 2014. Red Fox: An Execution Environment for Relational Query Processing on GPUs. In Procs. CGO. Article 44, 11 pages. Google ScholarDigital Library
Yuan Yuan, Rubao Lee, and Xiaodong Zhang. 2013. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices. Proc. VLDB Endowment 6,10 (Aug. 2013), 817--828. Google ScholarDigital Library
Marco Zagha and Guy E. Blelloch. 1991. Radix Sort for Vector Multiprocessors. In Procs. 1991 ACM/IEEE Conference on Supercomputing (SC '91). 712--721. Google ScholarDigital Library

Recommendations

Join algorithms on GPUs: A revisit after seven years
BIG DATA '15: Proceedings of the 2015 IEEE International Conference on Big Data (Big Data)

Implementing database operations on parallel platforms has gain a lot of momentum in the past decade. A number of studies have shown the potential of using GPUs to speed up database operations. In this paper, we present empirical evaluations of a state-...
Read More
Fast Parallel Connected Components Algorithms on GPUs
Revised Selected Papers, Part I, of the Euro-Par 2014 International Workshops on Parallel Processing - Volume 8805

We study parallel connected components algorithms on GPUs in comparison with CPUs. Although straightforward implementation of PRAM algorithms performs relatively better on GPUs than on CPUs, the GPU memory subsystem performance is poor due to non-...
Read More
Brook for GPUs: stream computing on graphics hardware
SIGGRAPH '04: ACM SIGGRAPH 2004 Papers

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management
June 2017
373 pages
ISBN:9781450352826
DOI:10.1145/3085504
General Chair:
Alok Choudhary,
Program Chair:
Kesheng Wu,
Publications Chair:
Bin Dong
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 June 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
- Research
- Refereed limited
Conference

Acceptance Rates
Overall Acceptance Rate56of146submissions,38%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 29
  Total Citations
  View Citations
- 636
  Total Downloads
- Downloads (Last 12 months)123
- Downloads (Last 6 weeks)11
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Fast Equi-Join Algorithms on GPUs: Design and Implementation

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management

ABSTRACT

References

Cited By

Recommendations

Join algorithms on GPUs: A revisit after seven years

Fast Parallel Connected Components Algorithms on GPUs

Brook for GPUs: stream computing on graphics hardware

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Fast Equi-Join Algorithms on GPUs: Design and Implementation

SSDBM '17: Proceedings of the 29th International Conference on Scientific and Statistical Database Management

ABSTRACT

References

Cited By

Recommendations

Join algorithms on GPUs: A revisit after seven years

Fast Parallel Connected Components Algorithms on GPUs

Brook for GPUs: stream computing on graphics hardware

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media