skip to main content
research-article

Relational query coprocessing on graphics processors

Published:14 December 2009Publication History
Skip Abstract Section

Abstract

Graphics processors (GPUs) have recently emerged as powerful coprocessors for general purpose computation. Compared with commodity CPUs, GPUs have an order of magnitude higher computation power as well as memory bandwidth. Moreover, new-generation GPUs allow writes to random memory locations, provide efficient interprocessor communication through on-chip local memory, and support a general purpose parallel programming model. Nevertheless, many of the GPU features are specialized for graphics processing, including the massively multithreaded architecture, the Single-Instruction-Multiple-Data processing style, and the execution model of a single application at a time. Additionally, GPUs rely on a bus of limited bandwidth to transfer data to and from the CPU, do not allow dynamic memory allocation from GPU kernels, and have little hardware support for write conflicts. Therefore, a careful design and implementation is required to utilize the GPU for coprocessing database queries.

In this article, we present our design, implementation, and evaluation of an in-memory relational query coprocessing system, GDB, on the GPU. Taking advantage of the GPU hardware features, we design a set of highly optimized data-parallel primitives such as split and sort, and use these primitives to implement common relational query processing algorithms. Our algorithms utilize the high parallelism as well as the high memory bandwidth of the GPU, and use parallel computation and memory optimizations to effectively reduce memory stalls. Furthermore, we propose coprocessing techniques that take into account both the computation resources and the GPU-CPU data transfer cost so that each operator in a query can utilize suitable processors—the CPU, the GPU, or both—for an optimized overall performance. We have evaluated our GDB system on a machine with an Intel quad-core CPU and an NVIDIA GeForce 8800 GTX GPU. Our workloads include microbenchmark queries on memory-resident data as well as TPC-H queries that involve complex data types and multiple query operators on data sets larger than the GPU memory. Our results show that our GPU-based algorithms are 2--27x faster than their optimized CPU-based counterparts on in-memory data. Moreover, the performance of our coprocessing scheme is similar to, or better than, both the GPU-only and the CPU-only schemes.

References

  1. Abadi, D., Madden, S., and Ferreira, M. 2006. Integrating compression and execution in column-oriented database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). ACM, New York, NY, 671--682. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ailamaki, A., DeWitt, D. J., Hill, M. D., and Wood, D. A. 1999. DBMSs on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, 266--277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ailamaki, A., Govindaraju, N., Harizopoulos, S., and Manocha, D. 2006. Query co-processing on commodity processors. In Proceedings of the International Conference on Very Large Data Bases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bandi, N., Sun, C., Agrawal, D., and Abbadi, A. E. 2004. Hardware acceleration in commercial databases: a case study of spatial operations. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 1021--1032. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Blelloch, G. E. 1990. Prefix sums and their applications. Tech. rep. CMU-CS-90-190, Carnegie Mellan University.Google ScholarGoogle Scholar
  6. Blythe, D. 2006. The direct3d 10 system. SIGGRAPH, ACM Press, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Boncz, P. A., Manegold, S., and Kersten, M. L. 1999. Database architecture optimized for the new bottleneck: Memory access. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, 54--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. 2004. Brook for gpus: Stream computing on graphics hardware. SIGGRAPH, ACM Press, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cieslewicz, J., Berry, J., Hendrickson, B., and Ross, K. A. 2006. Realizing parallelism in database operations: insights from a massively multithreaded architecture. In Proceedings of the 2nd International Workshop on Data Management on New Hardware (DaMoN). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Cieslewicz, J. and Ross, K. A. 2007. Adaptive aggregation on chip multiprocessors. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 339--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. DeWitt, D. and Gray, J. 1992. Parallel database systems: the future of high performance database systems. Comm. ACM 35, 6, 85--98. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. 1999. Cache-oblivious algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society, Washington, DC, 285. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gedik, B., Bordawekar, R. R., and Yu, P. S. 2007. Cellsort: High performance sorting on the cell processor. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 1286--1297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Gedik, B., Yu, P. S., and Bordawekar, R. R. 2007. Executing stream joins on the cell processor. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 363--374. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gold, B., Ailamaki, A., Huston, L., and Falsafi, B. 2005. Accelerating database operators using a network processor. In Proceedings of the 1st International Workshop on Data Management on New Hardware (DaMoN). ACM, New York, NY, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Govindaraju, N., Gray, J., Kumar, R., and Manocha, D. 2006. Gputerasort: high performance graphics co-processor sorting for large database management. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 325--336. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Govindaraju, N., Lloyd, B., Wang, W., Lin, M., and Manocha, D. 2004. Fast computation of database operations using graphics processors. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 215--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Govindaraju, N. K., Raghuvanshi, N., and Manocha, D. 2005. Fast and approximate stream mining of quantiles and frequencies using graphics processors. In Proceedings of the ACM SIGMOD International Conference on Management of Data. New York, NY, 611--622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25, 2, 73--169. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Harizopoulos, S., Abadi, D. J., Madden, S., and Stonebraker, M. 2008. OLTP through the looking glass, and what we found there. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, NY, 981--992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Harris, M., Owens, J., Sengupta, S., Zhang, Y., and Davidson, A. 2007. CUDPP: CUDA data parallel primitives library.Google ScholarGoogle Scholar
  22. He, B., Govindaraju, N. K., Luo, Q., and Smith, B. 2007. Efficient gather and scatter operations on graphics processors. In Proceedings of the ACM/IEEE Conference on Supercomputing. ACM, New York, NY, 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. He, B. and Luo, Q. 2008. Cache-oblivious databases: Limitations and opportunities. ACM Trans. Datab. Syst. 33, 2, 1--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., and Sander, P. 2008. Relational joins on graphics processors. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 511--524. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Héman, S., Nes, N., Zukowski, M., and Boncz, P. 2007. Vectorized data processing on the cell broadband engine. In Proceedings of the 3rd International Workshop on Data Management on New Hardware (DaMoN). ACM, New York, NY, 1--6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Hong, W. and Stonebraker, M. 1991. Optimization of parallel query execution plans in xprs. In Proceedings of the 1st International Conference on Parallel and Distributed Information Systems (PDIS). IEEE Computer Society Press, Los Alamitos, CA, 218--225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Johnson, R., Harizopoulos, S., Hardavellas, N., Sabirli, K., Pandis, I., Ailamaki, A., Mancheril, N. G., and Falsafi, B. 2007. To share or not to share? In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 351--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. LaMarca, A. and Ladner, R. E. 1997. The influence of caches on the performance of sorting. In Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics, Philadelphia, PA, 370--379. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lieberman, M. D., Sankaranarayanan, J., and Samet, H. 2008. A fast similarity join algorithm using graphics processing units. In Proceedings of the International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Liu, B. and Rundensteiner, E. A. 2005. Revisiting pipelined parallelism in multi-join query processing. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 829--840. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Lu, H., Tan, K.-L., and Sahn, M.-C. 1990. Hash-based join algorithms for multiprocessor computers with shared memory. In Proceedings of the 16th International Conference on Very Large Databases. Morgan Kaufmann Publishers Inc., San Francisco, CA, 198--209. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Manegold, S., Boncz, P., and Kersten, M. 2002. Generic database cost models for hierarchical memory systems. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Nguyen, H. 2008. GPU gems 3. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Kruger, J., Lefohn, A. E., and Purcell, T. J. 2007. A survey of general-purpose computation on graphics hardware. Comput. Graph. Forum 26.Google ScholarGoogle Scholar
  35. Rao, J. and Ross, K. A. 1999. Cache conscious indexing for decision-support in main memory. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, 78--89. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Schneider, D. A. and DeWitt, D. J. 1989. A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. SIGMOD Rec. 18, 2, 110--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., and Price, T. G. 1979. Access path selection in a relational database management system. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM Press, New York, NY, 23--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for gpu computing. Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Shao, M., Ailamaki, A., and Falsafi, B. 2005. DBmbench: Fast and accurate database workload representation on modern microarchitecture. In Proceedings of the Conference of the Centre for Advanced Studies on Collaborative Research (CASCON). IBM Press, 254--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Shatdal, A., Kant, C., and Naughton, J. F. 1994. Cache conscious algorithms for relational query processing. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann Publishers Inc., San Francisco, CA, 510--521. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O'Neil, E., O'Neil, P., Rasin, A., Tran, N., and Zdonik, S. 2005. C-store: A column-oriented dbms. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 553--564. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Sun, C., Agrawal, D., and Abbadi, A. E. 2003. Hardware acceleration for spatial selections and joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, NY, 455--466. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Tarditi, D., Puri, S., and Oglesby, J. 2006. Accelerator: using data parallelism to program GPUS for general-purpose uses. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Zhou, J., Cieslewicz, J., Ross, K. A., and Shah, M. 2005. Improving database performance on simultaneous multithreading processors. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Zukowski, M., Heman, S., Nes, N., and Boncz, P. 2006. Super-scalar RAM-CPU cache compression. In Proceedings of the 22nd International Conference on Data Engineering (ICDE). IEEE Computer Society, Washington, DC, 59. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Relational query coprocessing on graphics processors

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Database Systems
            ACM Transactions on Database Systems  Volume 34, Issue 4
            December 2009
            271 pages
            ISSN:0362-5915
            EISSN:1557-4644
            DOI:10.1145/1620585
            Issue’s Table of Contents

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 14 December 2009
            • Accepted: 1 August 2009
            • Revised: 1 March 2009
            • Received: 1 August 2008
            Published in tods Volume 34, Issue 4

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader