Query Processing on Heterogeneous CPU/GPU Systems

Published: 17 January 2022

Abstract

Due to their high computational power and internal memory bandwidth, graphics processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same time has to solve many challenges, including how to distribute the workload across processors with different capabilities, how to overcome the data transfer bottleneck, and how to support implementations for multiple processors efficiently. In this survey, we devise a classification scheme to categorize techniques developed to address these challenges. Based on this scheme, we classify existing query processing systems for heterogeneous CPU/GPU hardware and identify open research problems.

Published in

ACM Computing Surveys, Volume 55, Issue 1
January 2023, 860 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3492451

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

                        Publisher

                        Association for Computing Machinery

                        New York, NY, United States

                        Publication History

                        • Published: 17 January 2022
                        • Accepted: 1 August 2021
                        • Revised: 1 June 2021
                        • Received: 1 December 2020
