
Concurrent analytical query processing with GPUs

Published: 01 July 2014

Abstract

In current databases, GPUs are used as dedicated accelerators that process one query at a time. Sharing a GPU among concurrent queries is not supported, which causes serious resource underutilization. By profiling an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of the main GPU resources reaches at most 25%. This underutilization leads to low system throughput.

To address the problem, this paper proposes concurrent query execution as an effective solution. The major challenge in sharing GPUs among concurrent queries for high throughput is to provide software support that controls and resolves the resource contention the sharing incurs. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it extensively. The experimental results confirm the effectiveness and performance advantage of our approach: by executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.
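The abstract describes sharing a single GPU among concurrent queries through query scheduling and device memory swapping. As a rough illustration of the general mechanism only, and not the authors' actual scheduler or swapping policy, the CUDA sketch below launches two toy query kernels in separate streams and gates each launch on a free-memory check; the kernel and helper names (scanFilter, admit) are hypothetical.

```
// Illustrative sketch only, not the paper's implementation: two toy "query"
// kernels share one GPU by running in separate CUDA streams, and a crude
// free-memory check stands in for the paper's scheduling and swapping
// policies. Kernel and helper names (scanFilter, admit) are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

// Toy selection kernel standing in for one query's scan-filter step.
__global__ void scanFilter(const int *in, int *out, int n, int threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (in[i] > threshold) ? in[i] : 0;
}

// Admit a query only if its working set fits in free device memory;
// a real policy would queue the query or swap out another query's data.
static bool admit(size_t bytesNeeded) {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    return freeBytes >= bytesNeeded;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(int);

    int *h_in = new int[n];
    for (int i = 0; i < n; ++i) h_in[i] = i % 100;

    cudaStream_t stream[2];
    int *d_in[2] = {nullptr, nullptr}, *d_out[2] = {nullptr, nullptr};
    bool admitted[2] = {false, false};

    // Launch two concurrent "queries", each on its own stream.
    for (int q = 0; q < 2; ++q) {
        if (!admit(2 * bytes)) { printf("query %d deferred\n", q); continue; }
        admitted[q] = true;
        cudaStreamCreate(&stream[q]);
        cudaMalloc((void **)&d_in[q], bytes);
        cudaMalloc((void **)&d_out[q], bytes);
        cudaMemcpyAsync(d_in[q], h_in, bytes, cudaMemcpyHostToDevice, stream[q]);
        scanFilter<<<(n + 255) / 256, 256, 0, stream[q]>>>(d_in[q], d_out[q], n, 50 + q);
    }

    cudaDeviceSynchronize();  // wait for all admitted queries to finish
    for (int q = 0; q < 2; ++q) {
        if (!admitted[q]) continue;
        cudaFree(d_in[q]); cudaFree(d_out[q]); cudaStreamDestroy(stream[q]);
    }
    delete[] h_in;
    return 0;
}
```

In practice, truly overlapping the two queries' transfers and kernels would also require pinned host buffers, and the paper's policies go further by deciding which queries to schedule and which working sets to swap out when device memory is oversubscribed.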



Published in

Proceedings of the VLDB Endowment, Volume 7, Issue 11
July 2014, 92 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
