research-article

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

Authors:
Jason Power, Yinan Li, Mark D. Hill, Jignesh M. Patel, David A. Wood

Department of Computer Sciences, University of Wisconsin–Madison

DaMoN'15: Proceedings of the 11th International Workshop on Data Management on New Hardware, May 2015, Article No. 11, pages 1–8. https://doi.org/10.1145/2771937.2771941

Published: 31 May 2015

ABSTRACT

There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order-of-magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, integrated GPUs, which sit on the same silicon die as the conventional CPU, are now widespread. With the advent of new programming models such as the Heterogeneous System Architecture (HSA), these integrated GPUs are treated as first-class compute units, with transparent access to CPU virtual addresses and very low overhead for offloading computation. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. Motivated by this result, we develop high-performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and provides an average 45% reduction in energy on 16 TPC-H queries.
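
A scan-aggregate query filters rows with a predicate over one column and sums another column over the qualifying rows. The minimal Python sketch below is not the authors' implementation. It only illustrates the two-phase structure that data-parallel hardware commonly uses for such queries: each chunk of the columns (standing in for a GPU work-group) produces a partial sum, and the partial sums are then reduced. The column names, the `max_qty` predicate (loosely modeled on the `l_quantity < 24` filter in TPC-H Q6), and the `group_size` parameter are hypothetical choices for illustration.

```python
from typing import List, Sequence


def scan_aggregate(quantity: Sequence[int], price: Sequence[int],
                   max_qty: int, group_size: int = 256) -> int:
    """Compute SUM(price) WHERE quantity < max_qty in two phases.

    Phase 1: each hypothetical "work-group" scans one contiguous chunk of
    the columnar data and emits a single partial sum.
    Phase 2: the per-group partial sums are reduced to the final aggregate.
    """
    if len(quantity) != len(price):
        raise ValueError("columns must have the same length")
    if group_size <= 0:
        raise ValueError("group_size must be positive")

    n = len(quantity)
    partials: List[int] = []
    for begin in range(0, n, group_size):
        end = min(n, begin + group_size)
        # The predicate outcome (True/False) is used as a 0/1 multiplier,
        # mirroring the branch-free style common in SIMD and GPU scan code.
        partials.append(sum((quantity[i] < max_qty) * price[i]
                            for i in range(begin, end)))

    # Phase 2: final reduction over the partial sums.
    return sum(partials)


quantity = [5, 30, 12, 40, 7, 24]
price = [100, 200, 300, 400, 500, 600]
print(scan_aggregate(quantity, price, max_qty=24, group_size=4))  # 900
```

On real hardware each chunk would be processed concurrently by a GPU work-group or CPU thread, which this sequential sketch does not model. A larger `group_size` shrinks the final reduction, while a smaller one exposes more parallelism.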

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Information systems
  1. Data management systems
    1. Database management system engines

Published in
DaMoN'15: Proceedings of the 11th International Workshop on Data Management on New Hardware
May 2015
100 pages
ISBN:9781450336383
DOI:10.1145/2771937
Editors: Ippokratis Pandis, Martin Kersten
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
DaMoN'15 paper acceptance rate: 12 of 16 submissions, 75%
Overall acceptance rate: 80 of 102 submissions, 78%