research-article

Relational joins on graphics processors

Authors:
Bingsheng He

Hong Kong Univ. of Science and Technology, Hong Kong, China

Hong Kong Univ. of Science and Technology, Hong Kong, China
View Profile

,
Ke Yang

Zhejiang University, China, China

Zhejiang University, China, China
View Profile

,
Rui Fang

Highbridge Capital Management LLC, USA, USA

Highbridge Capital Management LLC, USA, USA
View Profile

,
Mian Lu

Hong Kong Univ. of Science and Technology, Hong Kong, China

Hong Kong Univ. of Science and Technology, Hong Kong, China
View Profile

,
Naga Govindaraju

Microsoft Corporation, USA, Seattle, USA

Microsoft Corporation, USA, Seattle, USA
View Profile

,
Qiong Luo

Hong Kong Univ. of Science and Technology, Hong Kong, China

Hong Kong Univ. of Science and Technology, Hong Kong, China
View Profile

,
Pedro Sander

Hong Kong Univ. of Science and Technology, Hong Kong, China

Hong Kong Univ. of Science and Technology, Hong Kong, China
View Profile

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataJune 2008Pages 511–524https://doi.org/10.1145/1376616.1376670

Published:09 June 2008Publication History

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 511–524

ABSTRACT

We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.

References

A. Ailamaki, N. Govindaraju, S. Harizopoulos and D. Manocha. Query co-processing on commodity processors. VLDB, 2006. Google ScholarDigital Library
AMD CTM, http://ati.amd.com/products/streamprocessor/.Google Scholar
S. Azadegan, A. R. Tripathi. Parallel join algorithms for SIMD models. ICPP (3) 1991: 125--133.Google Scholar
S. Azadegan, A. R. Tripathi. A parallel join algorithm for SIMD architectures. Journal of Systems and Software 39(3), 1997. Google ScholarDigital Library
N. Bandi, C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004. Google ScholarDigital Library
D. Blythe. The Direct3D 10 system. SIGGRAPH 2006. Google ScholarDigital Library
C. Boyd. Mass market applications of massively parallel computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.Google Scholar
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. SIGGRAPH 2004. Google ScholarDigital Library
G. E. Blelloch. Prefix sums and their applications. Technical report, CMU-CS-90-190, Nov 1990.Google Scholar
P. Boncz, S. Manegold and M. Kersten. Database architecture optimized for the new bottleneck: memory access. VLDB, 1999. Google ScholarDigital Library
J. Cieslewicz, J. Berry, B. Hendrickson and K. A. Ross. Realizing parallelism in database operations: insights from a massively multithreaded architecture. DaMoN, 2006. Google ScholarDigital Library
T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms, Second Edition. 2001. Google ScholarDigital Library
D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. In Communications of the ACM, Vol. 35, No. 6, June 1992. Google ScholarDigital Library
N. Govindaraju, J. Gray, R. Kumar and D. Manocha. GPUTeraSort: high performance graphics coprocessor sorting for large database management. SIGMOD, 2006. Google ScholarDigital Library
N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. SIGMOD, 2004. Google ScholarDigital Library
N. Govindaraju, N. Raghuvanshi and D. Manocha. Fast and approximate stream mining of quantiles and frequencies using graphics processors. SIGMOD, 2005. Google ScholarDigital Library
N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: limitations and opportunities. CIDR, 2007.Google Scholar
M. Harris, J. Owens, S. Sengupta, Y. Zhang and A. Davidson. CUDPP: CUDA Data Parallel Primitives Library. http://www.gpgpu.org/developer/cudpp/, 2007.Google Scholar
B. He, N. Govindaraju, Q. Luo and B. Smith. Efficient gather and scatter operations on graphics processors. ACM/IEEE Supercomputing, 2007. Google ScholarDigital Library
D. Horn. Stream reduction operations for GPGPU applications. In GPU Gems 2, Ed. Addison Wesley, 2005.Google Scholar
K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. VLDB, 1991. Google ScholarDigital Library
A. Lamarca and R. Ladner. The influence of caches on the performance of sorting. SODA, 1997. Google ScholarDigital Library
M. D. Lieberman, J. Sankaranarayanan, H. Samet. A fast similarity join algorithm using graphics processing units. ICDE, 2008. Google ScholarDigital Library
B. Liu and E. Rundensteiner. Revisiting pipelined parallelism in multi-join query processing. VLDB, 2005. Google ScholarDigital Library
H. Lu, K. Tan and M. Shan. Hash-based join algorithms for multiprocessor computers with shared memory. VLDB, 1990. Google ScholarDigital Library
MonetDB. http://monetdb.cwi.nl/.Google Scholar
NVIDIA CUDA (Compute Unified Device Architecture), http://developer.nvidia.com/object/cuda.html.Google Scholar
OpenGL, http://www.opengl.org/.Google Scholar
OpenMP, http://www.openmp.org/.Google Scholar
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum (26), 2007.Google Scholar
J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. VLDB, 1999. Google ScholarDigital Library
D. A. Schneider and D. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD, 1989. Google ScholarDigital Library
S. Sengupta, M. Harris, Y. Zhang, J. D. Owens. Scan primitives for GPU computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007. Google ScholarDigital Library
A. Shatdal, C. Kant and J. F. Naughton. Cache conscious algorithms for relational query processing. VLDB, 1994. Google ScholarDigital Library
M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran and S. Zdonik. C-Store: A column-oriented DBMS. VLDB, 2005. Google ScholarDigital Library
C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration for spatial selections and joins. SIGMOD, 2003. Google ScholarDigital Library
D. Tarditi, S. Puri and J. Oglesby. Accelerator: using data parallelism to program GPUs for general-purpose uses. ASPLOS, 2006. Google ScholarDigital Library
A. Thall. Extended-precision floating-point numbers for GPU computation. SIGGRAPH, 2006. Google ScholarDigital Library
M. Zagha and G. E. Blelloch. Radix sort for vector multiprocessors. ACM/IEEE Supercomputing, 1991. Google ScholarDigital Library
M. Zukowski, S. Heman, N. Nes, P. Boncz. Super-Scalar RAM-CPU Cache Compression. ICDE, 2006. Google ScholarDigital Library

Index Terms

Relational joins on graphics processors
1. Information systems
  1. Data management systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Efficient gather and scatter operations on graphics processors
SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Gather and scatter are two fundamental data-parallel operations, where a large number of data items are read (gathered) from or are written (scattered) to given locations. In this paper, we study these two operations on graphics processing units (GPUs).
...
Read More
Relational query coprocessing on graphics processors

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs ...
Read More
A performance study of general-purpose applications on graphics processors using CUDA

Graphics processors (GPUs) provide a vast number of simple, data-parallel, deeply multithreaded cores and high memory bandwidths. GPU architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
General Chairs:
Laks V. S. Lakshmanan
University of British Columbia, Canada
,
Raymond T. Ng
University of British Columbia, Canada
,
Dennis Shasha
New York University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graphics processors
join
parallel processing
primitive
relational database
sort
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 232
  Total Citations
  View Citations
- 1,926
  Total Downloads
- Downloads (Last 12 months)67
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Relational joins on graphics processors

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Efficient gather and scatter operations on graphics processors

Relational query coprocessing on graphics processors

A performance study of general-purpose applications on graphics processors using CUDA