research-article

Augmenting Operating Systems with OpenCL Accelerators

Authors:
Chia-Heng Tu, National Cheng Kung University, Taiwan (R.O.C.), ORCID 0000-0001-8967-1385
Te-Sheng Lin, National Cheng Kung University, Taiwan (R.O.C.)

ACM Transactions on Design Automation of Electronic Systems, Volume 24, Issue 3, Article No. 30, pp. 1–29. https://doi.org/10.1145/3315569

Published: 28 March 2019

Abstract

Heterogeneous computing leverages more than one kind of processor to boost the performance of user-space applications written in heterogeneous programming languages such as OpenCL. Some work has been done to accelerate the computations required by Linux kernel software. However, existing solutions are either application-specific or tightly coupled with particular computing platforms, and they cannot support general-purpose in-kernel acceleration on different types of processors. In this article, a general-purpose software framework called Kernel acceleration with OpenCL (KOCL) is proposed to tackle this problem. KOCL exposes a set of high-level programming interfaces that let Linux kernel module developers offload compute-intensive tasks to different hardware accelerators without managing and coordinating platform-specific computing and memory resources. This simpler programming model is made possible by the platform management and memory models developed in KOCL, which provide a systematic means of managing heterogeneous hardware resources. In addition, KOCL offers one-copy and zero-copy data-buffering schemes so that offloaded tasks deliver high performance on platforms with different memory architectures. We have developed a prototype system to accelerate Network-Attached Storage server applications. Significant performance improvements are achieved with three different types of accelerators: a multicore processor, an integrated GPU, and a discrete GPU. We believe that KOCL is useful in the design of embedded appliances for evaluating the performance of design alternatives.
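The abstract's one-copy and zero-copy buffering schemes can be illustrated with a small user-space C model that chooses a scheme from the accelerator's memory architecture. This is a sketch under stated assumptions, not KOCL's interface:

- The types `accel` and `offload_buf` and the functions `buf_attach`, `run_task`, and `buf_detach` are hypothetical.
- `malloc` stands in for device memory.
- Treating the multicore processor and integrated GPU as shared-memory devices and the discrete GPU as having separate memory is our assumption.
- We also read "one-copy" as a single staging copy per direction between the caller's buffer and device-side memory. That reading is an interpretation, not something the abstract states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory-architecture tag (illustrative only, not KOCL's API). */
enum mem_arch { MEM_SHARED, MEM_DISCRETE };

struct accel {
    const char *name;
    enum mem_arch arch;
};

/* Hypothetical handle for data handed to an offloaded task. */
struct offload_buf {
    unsigned char *host; /* caller's (kernel-side) data */
    unsigned char *dev;  /* memory the accelerator operates on */
    size_t size;
    bool zero_copy;      /* true when dev aliases host */
    int copies;          /* memcpy calls made so far */
};

/* Select the buffering scheme from the accelerator's memory architecture. */
static int buf_attach(struct offload_buf *b, const struct accel *a,
                      unsigned char *data, size_t size)
{
    assert(b != NULL && a != NULL && data != NULL);
    b->host = data;
    b->size = size;
    b->copies = 0;
    if (a->arch == MEM_SHARED) {
        /* Zero-copy: the device addresses the caller's memory directly. */
        b->dev = data;
        b->zero_copy = true;
        return 0;
    }
    /* One-copy: stage the input once into a separate buffer;
     * malloc() stands in for device memory in this model. */
    b->zero_copy = false;
    b->dev = malloc(size);
    if (b->dev == NULL)
        return -1;
    memcpy(b->dev, data, size);
    b->copies++;
    return 0;
}

/* Toy stand-in for an offloaded compute task: invert every byte in place. */
static void run_task(struct offload_buf *b)
{
    for (size_t i = 0; i < b->size; i++)
        b->dev[i] = (unsigned char)~b->dev[i];
}

/* Make results visible to the caller and release any staging memory. */
static void buf_detach(struct offload_buf *b)
{
    if (!b->zero_copy) {
        memcpy(b->host, b->dev, b->size); /* copy results back once */
        b->copies++;
        free(b->dev);
    }
    b->dev = NULL;
}
```

A caller prepares its data and calls `buf_attach` with the target accelerator. It then runs the task and calls `buf_detach`. On a shared-memory device the results are already in place, while a discrete device needs one copy in and one copy back. Hiding this choice inside an attach/detach pair is one plausible way for a framework to spare module developers from platform-specific memory handling. A real kernel module would instead use kernel allocators and an OpenCL runtime.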

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
  2. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Embedded software

Published in
ACM Transactions on Design Automation of Electronic Systems, Volume 24, Issue 3, May 2019, 266 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/3319359
Editor: Naehyuck Chang, Korea Advanced Institute of Science and Technology, Korea
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery, New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 28 March 2019
- Accepted: 1 January 2019
- Revised: 1 December 2018
- Received: 1 September 2018

Author Tags
Heterogeneous computing
OpenCL acceleration
encrypted file system
kernel programming
kernel samepage merging
linux kernel
shared memory architecture
Qualifiers
- research-article
- Research
- Refereed