Concurrent analytical query processing with GPUs

Authors:
Kaibo Wang, The Ohio State University
Kai Zhang, University of Science and Technology of China and The Ohio State University
Yuan Yuan, The Ohio State University
Siyuan Ma, The Ohio State University
Rubao Lee, The Ohio State University
Xiaoning Ding, New Jersey Institute of Technology
Xiaodong Zhang, The Ohio State University

Proceedings of the VLDB Endowment, Volume 7, Issue 11, pp. 1011–1022
https://doi.org/10.14778/2732967.2732976

Published: 01 July 2014

Abstract

In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of main GPU resources is only up to 25%. The underutilization leads to low system throughput.

To address the problem, this paper proposes concurrent query execution as an effective solution. To efficiently share GPUs among concurrent queries for high throughput, the major challenge is to provide software support to control and resolve resource contention incurred by the sharing. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it intensively. The experiment results confirm the effectiveness and performance advantage of our approach. By executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.
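
The abstract names GPU query scheduling and device memory swapping but does not describe the policies themselves. Purely as an illustration of the underlying idea, the sketch below shows one simple way a scheduler might limit contention for device memory: it groups queries, in arrival order, into batches whose combined memory demand fits on the device, so that each batch could run concurrently without swapping. The `Query` class, the `mem_mb` estimates, the `plan_batches` function, and the 4000 MB capacity are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    mem_mb: int  # hypothetical estimate of peak device-memory demand


def plan_batches(queries, device_mem_mb):
    """Greedily group queries (in arrival order) into batches whose total
    memory demand fits within device_mem_mb. Queries in the same batch are
    candidates for concurrent execution. Illustrative only."""
    batches, current, used = [], [], 0
    for q in queries:
        if q.mem_mb > device_mem_mb:
            raise ValueError(f"{q.name} does not fit on the device")
        # Close the current batch when the next query would overflow memory.
        if used + q.mem_mb > device_mem_mb:
            batches.append(current)
            current, used = [], 0
        current.append(q)
        used += q.mem_mb
    if current:
        batches.append(current)
    return batches


queries = [
    Query("q1", 1500),
    Query("q2", 1000),
    Query("q3", 2500),
    Query("q4", 500),
]
batches = plan_batches(queries, device_mem_mb=4000)
print([[q.name for q in b] for b in batches])
# prints [['q1', 'q2'], ['q3', 'q4']]
```

Under dedicated processing these four queries would run one after another; in this toy plan they run as two concurrent pairs. A real system would also need to handle inaccurate memory estimates and contention for compute resources, which this sketch ignores.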


Index Terms

Concurrent analytical query processing with GPUs
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 11
July 2014
92 pages
ISSN: 2150-8097

Editors:
H. V. Jagadish, University of Michigan
Aoying Zhou, East China Normal University, China

Publisher
VLDB Endowment

Publication History
- Published: 1 July 2014

Qualifiers
- research-article

