GPL: A GPU-based Pipelined Query Processing Engine

Authors:
Johns Paul (Nanyang Technological University, Singapore)
Jiong He (Nanyang Technological University, Singapore)
Bingsheng He (National University of Singapore, Singapore)

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, June 2016, Pages 1935–1950. https://doi.org/10.1145/2882903.2915224

Published: 26 June 2016

ABSTRACT

Graphics Processing Units (GPUs) have evolved into powerful query co-processors for main-memory On-Line Analytical Processing (OLAP) databases. However, existing GPU-based query processors adopt a kernel-based execution approach, which optimizes individual kernels for resource utilization and executes the GPU kernels in the query plan one by one. Such a kernel-based approach cannot utilize all GPU resources efficiently, owing to the resource underutilization of individual kernels and memory ping-pong across kernel executions. In this paper, we propose GPL, a novel pipelined query execution engine that improves the resource utilization of query co-processing on the GPU. Unlike existing kernel-based execution, GPL takes advantage of hardware features of new-generation GPUs, including concurrent kernel execution and efficient data communication channels between kernels. We further develop an analytical model to guide the generation of the optimal pipelined query plan, so that the tile size of the pipelined query execution can be adapted in a cost-based manner. We evaluate GPL with TPC-H queries on both AMD and NVIDIA GPUs. The experimental results show that 1) the analytical model can guide the choice of suitable parameter values for the pipelined query execution plan, and 2) GPL significantly outperforms state-of-the-art kernel-based query processing approaches, with improvements of up to 48%.
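To make the contrast in the abstract concrete, the following is a minimal CPU-side sketch in Python, not GPL's actual implementation. It assumes a hypothetical two-operator plan (a selection feeding a sum) and compares kernel-at-a-time execution, which materializes the whole intermediate result, with tiled pipelined execution, where each tile flows through both operators before the next tile is read. All names (`select_kernel`, `sum_kernel`, `kernel_based`, `pipelined`, `tile_size`) are illustrative assumptions, and the sketch does not model concurrent kernel execution or inter-kernel channels on a real GPU.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, float]  # hypothetical (key, price) tuple


def select_kernel(rows: List[Row], threshold: float) -> List[Row]:
    """Selection operator: keep rows whose price exceeds the threshold."""
    return [row for row in rows if row[1] > threshold]


def sum_kernel(rows: Iterable[Row]) -> float:
    """Aggregation operator: sum the price column."""
    return sum(price for _, price in rows)


def kernel_based(rows: List[Row], threshold: float) -> Tuple[float, int]:
    """Run operators one by one; the full intermediate result is materialized.

    Returns the aggregate and the peak intermediate size in rows.
    """
    intermediate = select_kernel(rows, threshold)
    return sum_kernel(intermediate), len(intermediate)


def pipelined(rows: List[Row], threshold: float, tile_size: int) -> Tuple[float, int]:
    """Push one tile at a time through both operators.

    Only a tile-sized intermediate exists at any moment.
    Returns the aggregate and the peak intermediate size in rows.
    """
    total = 0.0
    peak = 0
    for start in range(0, len(rows), tile_size):
        tile = rows[start:start + tile_size]
        passed = select_kernel(tile, threshold)  # intermediate for this tile only
        peak = max(peak, len(passed))
        total += sum_kernel(passed)
    return total, peak
```

Calling `kernel_based(rows, t)` and `pipelined(rows, t, tile_size)` on the same input yields the same aggregate, while the second returned value shows how the tile size bounds the intermediate data held between operators. In this toy setting the tile size is a free parameter; the abstract states that GPL instead chooses it with a cost-based analytical model.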

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization
      2. DBMS engine architectures

Published in
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
General Chairs: Fatma Özcan (IBM Research, USA), Georgia Koutrika (HP Labs, USA)
Program Chair: Sam Madden (Massachusetts Institute of Technology, USA)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
KBE
channel
pipelined execution
tiling
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- Total Citations: 55
- Total Downloads: 767
- Downloads (Last 12 months): 52
- Downloads (Last 6 weeks): 6