DOI: 10.1145/2771937.2771941
research-article

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

Published: 31 May 2015

ABSTRACT

There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs which are on the same silicon die as the conventional CPU. With the advent of new programming models like heterogeneous system architecture, these integrated GPUs are considered first-class compute units, with transparent access to CPU virtual addresses and very low overhead for computation offloading. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. Therefore, we develop high performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and provides an average of 45% reduction in energy on 16 TPC-H queries.
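
For readers unfamiliar with the term, the following is a minimal sketch of what a simple scan-aggregate query computes. It is plain sequential Python for illustration only and is not the paper's GPU algorithm; the function names, the toy columns, and the example query are all assumptions introduced here.

```python
from typing import List


def scan(column: List[int], low: int, high: int) -> List[bool]:
    """Scan phase: evaluate the predicate low <= value < high on every row."""
    return [low <= v < high for v in column]


def aggregate(column: List[int], selected: List[bool]) -> int:
    """Aggregate phase: sum the values of the rows that the scan selected."""
    return sum(v for v, keep in zip(column, selected) if keep)


# Hypothetical toy table with two columns (not data from the paper).
quantity = [5, 30, 12, 48, 7, 24]
price = [100, 250, 80, 400, 60, 150]

# Roughly: SELECT SUM(price) FROM t WHERE quantity >= 10 AND quantity < 25
selected = scan(quantity, 10, 25)
total = aggregate(price, selected)
print(total)
```

The two phases are kept separate because the scan touches only the predicate column and yields a per-row selection, which the aggregate then applies to a different column. Each row's predicate is independent of the others, which is what makes this kind of query a natural fit for data-parallel hardware.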

      • Published in

        DaMoN'15: Proceedings of the 11th International Workshop on Data Management on New Hardware
        May 2015
        100 pages
        ISBN:9781450336383
        DOI:10.1145/2771937

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        DaMoN'15 Paper Acceptance Rate: 12 of 16 submissions, 75%
        Overall Acceptance Rate: 80 of 102 submissions, 78%