Abstract
Heterogeneous computing leverages more than one kind of processor to boost the performance of user-space applications written in heterogeneous programming languages such as OpenCL. While some work has been done to accelerate the computations required by Linux kernel software, existing solutions are either application-specific or tightly coupled with particular computing platforms, and they cannot support general-purpose in-kernel acceleration across different types of processors. In this article, we propose a general-purpose software framework called Kernel acceleration with OpenCL (KOCL) to tackle this problem. KOCL exposes a set of high-level programming interfaces that let Linux kernel module developers offload compute-intensive tasks onto different hardware accelerators without managing and coordinating platform-specific computing and memory resources. This simplified programming effort is made possible by KOCL's platform management and memory models, which provide a systematic means of managing heterogeneous hardware resources. In addition, KOCL offers one-copy and zero-copy data-buffering schemes, so that offloaded tasks deliver high performance on platforms with different memory architectures. We have developed a prototype system to accelerate Network-Attached Storage (NAS) server applications, and it achieves significant performance improvements with three types of accelerators: a multicore processor, an integrated GPU, and a discrete GPU. We believe that KOCL is useful in the design of embedded appliances, where it allows designers to evaluate the performance of design alternatives.
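The abstract says KOCL chooses between one-copy and zero-copy data buffering according to the platform's memory architecture, but it does not show KOCL's interface. The C sketch below illustrates that idea in user space under stated assumptions: every name in it (`struct accel`, `struct kbuf`, `kbuf_map`, `kbuf_unmap`, `offload`) is hypothetical and not KOCL's actual API, `malloc` stands in for device-memory allocation, and a byte-increment loop stands in for an OpenCL kernel. In this model, "one-copy" is assumed to mean a single copy per direction (into device memory and back) for accelerators with separate memory, and "zero-copy" means the accelerator works directly on the module's buffer when memory is shared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory architectures a target accelerator may have. */
enum mem_arch {
    MEM_SEPARATE, /* e.g., a discrete GPU with its own device memory */
    MEM_SHARED    /* e.g., a multicore CPU or an integrated GPU sharing DRAM */
};

/* Hypothetical descriptor for one accelerator. */
struct accel {
    const char *name;
    enum mem_arch arch;
};

/* Hypothetical buffer handle tracking where the accelerator reads data. */
struct kbuf {
    unsigned char *host; /* data owned by the kernel module */
    unsigned char *dev;  /* memory the accelerator operates on */
    size_t len;
    int copies;          /* memcpy operations performed so far */
};

/* Choose a buffering scheme from the accelerator's memory architecture. */
static int kbuf_map(struct kbuf *b, const struct accel *a,
                    unsigned char *host, size_t len)
{
    b->host = host;
    b->len = len;
    b->copies = 0;
    if (a->arch == MEM_SHARED) {
        b->dev = host;        /* zero-copy: accelerator uses the same pages */
        return 0;
    }
    b->dev = malloc(len);     /* one-copy: stage data in "device" memory */
    if (b->dev == NULL)
        return -1;
    memcpy(b->dev, host, len);
    b->copies++;
    return 0;
}

/* Stand-in for an offloaded compute kernel: add 1 to every byte. */
static void run_task(unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] = (unsigned char)(data[i] + 1);
}

/* Make results visible to the module and release any staging memory. */
static void kbuf_unmap(struct kbuf *b)
{
    assert(b->dev != NULL);
    if (b->dev != b->host) {
        memcpy(b->host, b->dev, b->len); /* one-copy: copy results back */
        b->copies++;
        free(b->dev);
    }
    b->dev = NULL;
}

/* Offload one task; returns the number of copies made, or -1 on error. */
int offload(const struct accel *a, unsigned char *data, size_t len)
{
    struct kbuf b;

    if (kbuf_map(&b, a, data, len) != 0)
        return -1;
    run_task(b.dev, b.len);
    kbuf_unmap(&b);
    return b.copies;
}
```

For example, offloading a 4-byte buffer to an accelerator declared as `{ "discrete GPU", MEM_SEPARATE }` returns 2 (one copy in, one copy out), while the same call with `MEM_SHARED` returns 0 and updates the caller's buffer in place. Hiding this choice behind a single call is what lets a module developer write the same offload code regardless of which accelerator is selected.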
Index Terms
- Augmenting Operating Systems with OpenCL Accelerators