skip to main content
10.1145/1142473.1142511acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

GPUTeraSort: high performance graphics co-processor sorting for large database management

Published:27 June 2006Publication History

ABSTRACT

We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC -- a fast and dedicated memory interface on the GPU along with the main memory interface for CPU computations. As a result, we achieve higher memory bandwidth as compared to CPU-based algorithms running on commodity PCs. Our approach takes into account the limited communication bandwidth between the CPU and the GPU, and reduces the data communication between the two processors. Our algorithm also improves the performance of disk transfers and achieves close to peak I/O performance. We have tested the performance of our algorithm on the SortBenchmark and applied it to large databases composed of a few hundred Gigabytes of data. Our results on a 3 GHz Pentium IV PC with $300 NVIDIA 7800 GT GPU indicate a significant performance improvement over optimized CPU-based algorithms on high-end PCs with 3.6 GHz Dual Xeon processors. Our implementation is able to outperform the current high-end PennySort benchmark and results in a higher performance to price ratio. Overall, our results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.

References

  1. Clearspeed technology. (http://www.clearspeed.com), 2005.Google ScholarGoogle Scholar
  2. Ramesh C. Agarwal. A super scalar sort algorithm for RISC processors. SIGMOD Record (ACM Special Interest Group on Management of Data), 25(2):240--246, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alok Aggarwal and S. Vitter Jeffrey. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Ailamaki. Database architectures for new hardware. VLDB Tutorial Notes, 2004.Google ScholarGoogle Scholar
  5. Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and Marios Skounakis. Weaving relations for cache performance. In Proceedings of the Twenty-seventh International Conference on Very Large Data Bases, pages 169--180, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. DBMSs on a modern processor: Where does time go? In The VLDB Journal, pages 266--277, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Anon. et al. A measure of transaction processing power. Datamation, page 112, 1 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein, and David A. Patterson. High-performance sorting on networks of workstations. SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2):243--254, June 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Bandi, C. Sun, D. Agrawal, and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. VLDB, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. K.E. Batcher. Sorting networks and their applications. In AFIPS Spring Joint Computer Conference, 1968.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. Baugsto, J. Griepsland, and J. Kamerbeck. Sorting large data files on poma. Proc. of COMPAR-90 VAPPIV, pages 536--547, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Peter A. Boncz, Stefan Manegold, and Martin L. Kersten. Database architecture optimized for the new bottleneck: Memory access. In Proceedings of the Twenty-fifth International Conference on Very Large Databases, pages 54--65, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, 2nd edition, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. D. Dewitt, J. Naughton, and D. Schneider. Parallel sorting on a shared-nothing architecture using probabilistic splitting. Proc. of International Conference on Parallel and Distributed Information Systems, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Anonymous et al. A measure of transaction processing power. Datamation, 31(7):112--118, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. E. Friend. Sorting on electronic computer systems. Journal of the ACM, 3(3):134--168, 1956. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. Garcia and H. F. Korth. Multithreaded architectures and the sort benchmark. Proc. of First International Workshop on the Data Management on New Hardware, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. Proc. of ACM SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. N. Govindaraju and D. Manocha. Efficient relational database management on graphics processors. Proc. of ACM Workshop on Data Management on New Hardware, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. Govindaraju, N. Raghuvanshi, and D. Manocha. Fast and approximate stream mining of quantiles and frequencies using graphics processors. Proc. of ACM SIGMOD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Naga K. Govindaraju, Jim Gray, and Dinesh Manocha. Cache-efficient general purpose algorithms on GPUs. Technical report, Microsoft Research, December 2005.Google ScholarGoogle Scholar
  22. P. Kipfer, M. Segal, and R. Westermann. Uber ow: A gpu-based particle engine. SIGGRAPH/Eurographics Workshop on Graphics Hardware, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 1973.Google ScholarGoogle Scholar
  24. A. LaMarca and R. Ladner. The in uence of caches on the performance of sorting. Proc. of SODA, pages 370--379, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. X. Li, G. Linoff, S. Smith, C. Stanfill, and K. Thearling. A practical external ort for shared disk mpps. Proc. of SuperComputing, pages 666--675, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Stefan Manegold, Peter A. Boncz, and Martin L. Kersten. What happens during a join? Dissecting CPU and memory optimization effects. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, pages 339--350, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Stefan Manegold, Peter A. Boncz, and Martin L. Kersten. Generic database cost models for hierarchical memory systems. In VLDB 2002: proceedings of the Twenty-Eighth International Conference on Very Large Data Bases, pages 191--202, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Shintaro Meki and Yahiko Kambayashi. Acceleration of relational database operations on vector processors. Systems and Computers in Japan, 31(8):79--88, August 2000.Google ScholarGoogle ScholarCross RefCross Ref
  29. S. Muthukrishnan. Data streams: Algorithms and applications. Proc. of 14th ACM-SIAM Symposium on Discrete Algorithms, 2003. http://athos.rutgers.edu/ muthu/stream-1-1.ps. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. H. Nodine and J. S. Vitter. Greed sort: An optimal sorting algorithm for multiple disks. Journal of the ACM, 42(4):919--933, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Nsort: Fast parallel sorting. http://www.ordinal.com/.Google ScholarGoogle Scholar
  32. C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D. Lomet. Alphasort: A risc machine sort. SIGMOD, pages 233--242, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Chris Nyberg, Tom Barclay, Zarka Cvetanovic, Jim Gray, and David B. Lomet. AlphaSort: A cache-sensitive parallel external sort. VLDB Journal: Very Large Data Bases, 4(4):603--627, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. Lefohn, and T. Purcell. A survey of general-purpose computation on graphics hardware. 2005.Google ScholarGoogle Scholar
  35. T. Purcell, C. Donner, M. Cammarano, H. Jensen, and P. Hanrahan. Photon mapping on programmable graphics hardware. ACM SIGGRAPH/Eurographics Conference on Graphics Hardware, pages 41--50, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Jun Rao and Kenneth A. Ross. Cache conscious indexing for decision-support in main memory. In Proceedings of the Twenty-fifth International Conference on Very Large Databases, pages 78--89, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kenneth A. Ross. Conjunctive selection conditions in main memory. In ACM, editor, Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems: PODS 2002: Madison, Wisconsin, June 3-5, 2002, pages 109--120, New York, NY 10036, USA, 2002. ACM Press. ACM order number 475021. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Betty Salzberg, Alex Tsukerman, Jim Gray, Michael Stuewart, Susan Uren, and Bonnie Vaughan. FastSort: A distributed single-input single-output external sort. SIGMOD Record (ACM Special Interest Group on Management of Data), 19(2):94--101, June 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Ambuj Shatdal, Chander Kant, and Jeffrey F. Naughton. Cache conscious algorithms for relational query processing. In Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo, editors, 20th International Conference on Very Large Data Bases, September 12-15, 1994, Santiago, Chile proceedings, pages 510--521, Los Altos, CA 94022, USA, 1994. Morgan Kaufmann Publishers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. C. Sun, D. Agrawal, and A. El Abbadi. Hardware acceleration for spatial selections and joins. SIGMOD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. S. Venkatasubramanian. The graphics card as a stream computer. Workshop on Management and Processing of Data Streams, 2003.Google ScholarGoogle Scholar
  42. J. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, pages 209--271, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. S. Vitter and E. A. M. Shriver. Algorithms for parallel memory II: Hierarchical multilevel memories. Algorithmica, 12:148--169, 1994.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Zagha and G. Blelloch. Radix sort for vector multiprocessors. Proc. of SuperComputing, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Jingren Zhou and Kenneth A. Ross. Implementing database operations using SIMD instructions. In Michael Franklin, Bongki Moon, and Anastassia Ailamaki, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, June 3-6, 2002, Madison, WI, USA, pages 145--156, New York, NY 10036, USA, 2002. ACM Press. ACM order number 475020. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GPUTeraSort: high performance graphics co-processor sorting for large database management

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
      June 2006
      830 pages
      ISBN:1595934340
      DOI:10.1145/1142473

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 June 2006

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader