
The Yin and Yang of processing data warehousing queries on GPU devices

Published: 01 August 2013

Abstract

The database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, GPUs have hardly been adopted in major production warehousing systems. To prepare for integrating GPUs into warehousing systems, we have identified and addressed several critical issues in a three-dimensional study of warehousing queries on GPUs that varies query characteristics, software techniques, and GPU hardware configurations. We also propose an analytical model to understand and predict query performance on GPUs. Based on our study, we present performance insights for executing warehousing queries on GPUs. The objective of our work is to provide comprehensive guidance for GPU architects, software system designers, and database practitioners on narrowing the speed gap between GPU kernel execution (the fast mode) and the data transfer that prepares GPU execution (the slow mode), so as to achieve high performance in processing data warehousing queries. The GPU query engine developed in this work is open source.
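To make the fast-mode/slow-mode gap concrete, here is a toy back-of-the-envelope sketch. It is not the analytical model proposed in the paper. It assumes that a query's cost is a host-to-device copy of its input followed by one bandwidth-bound scan kernel. The class name `ToyGpuModel` and both bandwidth figures (12 GB/s effective host-to-device, 200 GB/s effective device memory) are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyGpuModel:
    """Hypothetical two-phase cost model (illustration only, not the paper's
    model): copy input to the GPU, then run a bandwidth-bound scan kernel.
    Both bandwidths are assumed values, not measurements."""
    pcie_gbps: float = 12.0     # assumed effective host-to-device bandwidth, GB/s
    device_gbps: float = 200.0  # assumed effective device-memory bandwidth, GB/s

    def transfer_seconds(self, nbytes: float) -> float:
        # "Slow mode": copying the input columns from host memory to the GPU.
        return nbytes / (self.pcie_gbps * 1e9)

    def kernel_seconds(self, nbytes: float, passes: float = 1.0) -> float:
        # "Fast mode": a scan-like kernel that reads the data `passes` times.
        return passes * nbytes / (self.device_gbps * 1e9)

    def query_seconds(self, nbytes: float, passes: float = 1.0,
                      resident: bool = False) -> float:
        # If the data already resides in device memory, no copy is needed.
        t_xfer = 0.0 if resident else self.transfer_seconds(nbytes)
        return t_xfer + self.kernel_seconds(nbytes, passes)

    def transfer_fraction(self, nbytes: float, passes: float = 1.0) -> float:
        # Share of total (non-resident) query time spent on the copy.
        t_xfer = self.transfer_seconds(nbytes)
        return t_xfer / (t_xfer + self.kernel_seconds(nbytes, passes))


if __name__ == "__main__":
    model = ToyGpuModel()
    column_bytes = 4 * 100_000_000  # hypothetical column: 100M 4-byte values
    print(f"transfer: {model.transfer_seconds(column_bytes):.4f} s")
    print(f"kernel:   {model.kernel_seconds(column_bytes):.4f} s")
    print(f"transfer share: {model.transfer_fraction(column_bytes):.1%}")
```

Under these assumed numbers the copy dominates the run time in this toy setting, so keeping data resident on the device (the `resident=True` path) or shrinking the bytes transferred helps more than speeding up the kernel. The sketch ignores overlapping transfers with computation, which a real engine may do.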

Published in: Proceedings of the VLDB Endowment (PVLDB), Volume 6, Issue 10, August 2013, 180 pages
Publisher: VLDB Endowment
