Abstract
In current databases, GPUs are used as dedicated accelerators that process one query at a time. Sharing a GPU among concurrent queries is not supported, which causes serious resource underutilization. Profiling an open-source GPU query engine on commonly used single-query data warehousing workloads, we observe that utilization of the main GPU resources reaches at most 25%. This underutilization leads to low system throughput.
To address this problem, this paper proposes concurrent query execution. The major challenge in sharing GPUs efficiently among concurrent queries for high throughput is providing software support to control and resolve the resource contention that sharing incurs. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it extensively. The experimental results confirm the effectiveness and performance advantage of our approach: by executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.