ABSTRACT
There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs, which sit on the same silicon die as the conventional CPU. With the advent of new programming models such as Heterogeneous System Architecture (HSA), these integrated GPUs are treated as first-class compute units, with transparent access to CPU virtual addresses and very low overhead for offloading computation. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. We therefore develop high-performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and provides an average 45% reduction in energy on 16 TPC-H queries.
Title: Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries