research-article

Augmenting Operating Systems with OpenCL Accelerators

Published: 28 March 2019

Abstract

Heterogeneous computing leverages more than one kind of processor to boost the performance of user-space applications written in heterogeneous programming languages such as OpenCL. While some work has been done to accelerate the computations required by Linux kernel software, existing approaches are either application-specific or tightly coupled to particular computing platforms, and they cannot support general-purpose in-kernel acceleration across different types of processors. To tackle this problem, this article proposes Kernel acceleration with OpenCL (KOCL), a general-purpose software framework. KOCL exposes a set of high-level programming interfaces that let Linux kernel module developers offload compute-intensive tasks to different hardware accelerators without managing and coordinating platform-specific computing and memory resources. This simplified programming effort is made possible by KOCL's platform management and memory models, which provide a systematic means of managing heterogeneous hardware resources. In addition, KOCL offers one-copy and zero-copy data-buffering schemes, so that offloaded tasks deliver high performance on platforms with different memory architectures. We have developed a prototype system that accelerates Network-Attached Storage (NAS) server applications. Significant performance improvements are achieved with three different types of accelerators: a multicore processor, an integrated GPU, and a discrete GPU. We believe that KOCL is useful in the design of embedded appliances, where it can be used to evaluate the performance of design alternatives.

        • Published in

          ACM Transactions on Design Automation of Electronic Systems  Volume 24, Issue 3
          May 2019
          266 pages
          ISSN:1084-4309
          EISSN:1557-7309
          DOI:10.1145/3319359
          • Editor: Naehyuck Chang

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 28 March 2019
          • Accepted: 1 January 2019
          • Revised: 1 December 2018
          • Received: 1 September 2018

          Qualifiers

          • research-article
          • Research
          • Refereed
