Abstract
The database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, GPUs have hardly been adopted in major warehousing production systems. In preparation for integrating GPUs into warehousing systems, we have identified and addressed several critical issues in a three-dimensional study of warehousing queries on GPUs, varying query characteristics, software techniques, and GPU hardware configurations. We also propose an analytical model to understand and predict query performance on GPUs. Based on our study, we present our performance insights for warehousing query execution on GPUs. The objective of our work is to provide comprehensive guidance for GPU architects, software system designers, and database practitioners on narrowing the speed gap between GPU kernel execution (the fast mode) and the data transfer that prepares GPU execution (the slow mode), so as to achieve high performance in processing data warehousing queries. The GPU query engine developed in this work is publicly available as open source.
Index Terms
- The Yin and Yang of processing data warehousing queries on GPU devices