ABSTRACT
Energy efficiency is a major design and optimization factor for database query co-processing on embedded devices. Recently, the GPUs of new-generation embedded devices have gained the programmability and computational capability needed for general-purpose applications. Such CPU-GPU architectures offer an opportunity to revisit GPU query co-processing in embedded environments with energy efficiency in mind. In this paper, we experimentally evaluate and analyze the performance and energy consumption of a GPU query co-processor on such hybrid embedded architectures. Specifically, we study four major database operators as micro-benchmarks and evaluate TPC-H queries on CARMA, which has a quad-core ARM Cortex-A9 CPU and an NVIDIA Quadro 1000M GPU. We observe that the CPU delivers both better performance and lower energy consumption than the GPU for simple operators such as selection and aggregation. However, the GPU outperforms the CPU for sort and hash join in terms of both performance and energy consumption. We further show that, with proper tuning and optimizations, CPU-GPU query co-processing can be an effective means of energy-efficient query processing in embedded systems.
Index Terms
- Energy-Efficient Query Processing on Embedded CPU-GPU Architectures