skip to main content
10.1145/1376616.1376670acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Relational joins on graphics processors

Authors Info & Claims
Published:09 June 2008Publication History

ABSTRACT

We present a novel design and implementation of relational join algorithms for new-generation graphics processing units (GPUs). The most recent GPU features include support for writing to random memory locations, efficient inter-processor communication, and a programming model for general-purpose computing. Taking advantage of these new features, we design a set of data-parallel primitives such as split and sort, and use these primitives to implement indexed or non-indexed nested-loop, sort-merge and hash joins. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. We have implemented our algorithms on a PC with an NVIDIA G80 GPU and an Intel quad-core CPU. Our GPU-based join algorithms are able to achieve a performance improvement of 2-7X over their optimized CPU-based counterparts.

References

  1. A. Ailamaki, N. Govindaraju, S. Harizopoulos and D. Manocha. Query co-processing on commodity processors. VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AMD CTM, http://ati.amd.com/products/streamprocessor/.Google ScholarGoogle Scholar
  3. S. Azadegan, A. R. Tripathi. Parallel join algorithms for SIMD models. ICPP (3) 1991: 125--133.Google ScholarGoogle Scholar
  4. S. Azadegan, A. R. Tripathi. A parallel join algorithm for SIMD architectures. Journal of Systems and Software 39(3), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Bandi, C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Blythe. The Direct3D 10 system. SIGGRAPH 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Boyd. Mass market applications of massively parallel computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007.Google ScholarGoogle Scholar
  8. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. SIGGRAPH 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. E. Blelloch. Prefix sums and their applications. Technical report, CMU-CS-90-190, Nov 1990.Google ScholarGoogle Scholar
  10. P. Boncz, S. Manegold and M. Kersten. Database architecture optimized for the new bottleneck: memory access. VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Cieslewicz, J. Berry, B. Hendrickson and K. A. Ross. Realizing parallelism in database operations: insights from a massively multithreaded architecture. DaMoN, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Cormen, C. Leiserson, R. Rivest and C. Stein. Introduction to Algorithms, Second Edition. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. In Communications of the ACM, Vol. 35, No. 6, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. Govindaraju, J. Gray, R. Kumar and D. Manocha. GPUTeraSort: high performance graphics coprocessor sorting for large database management. SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. N. Govindaraju, N. Raghuvanshi and D. Manocha. Fast and approximate stream mining of quantiles and frequencies using graphics processors. SIGMOD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. Hardavellas, I. Pandis, R. Johnson, N. Mancheril, A. Ailamaki, and B. Falsafi. Database servers on chip multiprocessors: limitations and opportunities. CIDR, 2007.Google ScholarGoogle Scholar
  18. M. Harris, J. Owens, S. Sengupta, Y. Zhang and A. Davidson. CUDPP: CUDA Data Parallel Primitives Library. http://www.gpgpu.org/developer/cudpp/, 2007.Google ScholarGoogle Scholar
  19. B. He, N. Govindaraju, Q. Luo and B. Smith. Efficient gather and scatter operations on graphics processors. ACM/IEEE Supercomputing, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. Horn. Stream reduction operations for GPGPU applications. In GPU Gems 2, Ed. Addison Wesley, 2005.Google ScholarGoogle Scholar
  21. K. A. Hua and C. Lee. Handling data skew in multiprocessor database computers using partition tuning. VLDB, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Lamarca and R. Ladner. The influence of caches on the performance of sorting. SODA, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. D. Lieberman, J. Sankaranarayanan, H. Samet. A fast similarity join algorithm using graphics processing units. ICDE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. Liu and E. Rundensteiner. Revisiting pipelined parallelism in multi-join query processing. VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. H. Lu, K. Tan and M. Shan. Hash-based join algorithms for multiprocessor computers with shared memory. VLDB, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. MonetDB. http://monetdb.cwi.nl/.Google ScholarGoogle Scholar
  27. NVIDIA CUDA (Compute Unified Device Architecture), http://developer.nvidia.com/object/cuda.html.Google ScholarGoogle Scholar
  28. OpenGL, http://www.opengl.org/.Google ScholarGoogle Scholar
  29. OpenMP, http://www.openmp.org/.Google ScholarGoogle Scholar
  30. J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A. E. Lefohn and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum (26), 2007.Google ScholarGoogle Scholar
  31. J. Rao and K. A. Ross. Cache conscious indexing for decision-support in main memory. VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. A. Schneider and D. DeWitt. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Sengupta, M. Harris, Y. Zhang, J. D. Owens. Scan primitives for GPU computing. ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Shatdal, C. Kant and J. F. Naughton. Cache conscious algorithms for relational query processing. VLDB, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran and S. Zdonik. C-Store: A column-oriented DBMS. VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. Sun, D. Agrawal and A. El Abbadi. Hardware acceleration for spatial selections and joins. SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. D. Tarditi, S. Puri and J. Oglesby. Accelerator: using data parallelism to program GPUs for general-purpose uses. ASPLOS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Thall. Extended-precision floating-point numbers for GPU computation. SIGGRAPH, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Zagha and G. E. Blelloch. Radix sort for vector multiprocessors. ACM/IEEE Supercomputing, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Zukowski, S. Heman, N. Nes, P. Boncz. Super-Scalar RAM-CPU Cache Compression. ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Relational joins on graphics processors

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
            June 2008
            1396 pages
            ISBN:9781605581026
            DOI:10.1145/1376616

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 9 June 2008

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader