Cluster Computing

Published in:

31-07-2019

Design of an adaptive GPU sharing and scheduling scheme in container-based cluster

Authors: Qichen Chen, Jisun Oh, Seoyoung Kim, Yoonhee Kim

Published in: Cluster Computing | Issue 3/2020

Abstract

Container-based virtualization is an innovative technology that accelerates software development by making applications portable and maintainable. Recently, a growing number of workloads, such as high-performance computing (HPC) and deep learning (DL), have been deployed in container-based environments. However, GPU resource management in container-based clusters remains challenging, especially GPU memory oversubscription, which causes substantial performance loss. This paper proposes an adaptive fair-share method for sharing GPUs effectively in a container-based virtualization environment, as well as an execution rescheduling method that manages the execution order of the containers to maximize the performance gain. We also propose a checkpoint-based mechanism, aimed at DL workloads running with TensorFlow, which can efficiently solve the GPU memory oversubscription problem. We demonstrate that our approach improves overall performance and raises resource utilization compared with the default and static fair-share methods, for both homogeneous and heterogeneous workloads. Compared with these two baselines, the results show that the proposed method reduces average execution time by 16.37% and 15.61% and increases average GPU memory utilization by approximately 52.46% and 10.3%, respectively. We also evaluated the checkpoint-based mechanism by running multiple CNN workloads with TensorFlow at the same time; the results show that the mechanism lets every workload run safely without out-of-memory (OOM) errors.
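The abstract contrasts a static fair-share baseline with an adaptive one but does not spell out either algorithm. As a rough illustration only, the sketch below assumes the static scheme caps every container at an equal slice of GPU memory, and stands in for the adaptive scheme with classic max-min fairness, which hands unused slices to containers that still need memory. This is not the authors' method; the function names, container names, and memory figures are all hypothetical.

```python
def static_fair_share(capacity_mib, demands_mib):
    """Equal fixed slice per container; unused slices are simply wasted."""
    if not demands_mib:
        return {}
    share = capacity_mib / len(demands_mib)
    return {name: min(demand, share) for name, demand in demands_mib.items()}


def max_min_fair_share(capacity_mib, demands_mib):
    """Max-min fairness: small demands are met first, leftovers are re-split."""
    allocation = {}
    remaining = capacity_mib
    # Serve containers in ascending order of demand.
    pending = sorted(demands_mib.items(), key=lambda item: item[1])
    for index, (name, demand) in enumerate(pending):
        # Equal split of what is left among containers not yet served.
        equal_share = remaining / (len(pending) - index)
        allocation[name] = min(demand, equal_share)
        remaining -= allocation[name]
    return allocation


# Hypothetical example: a 12000 MiB GPU shared by three containers.
demands = {"small": 2000, "medium": 5000, "large": 9000}
static = static_fair_share(12000, demands)
adaptive = max_min_fair_share(12000, demands)
print("static  :", static, "used:", sum(static.values()))
print("adaptive:", adaptive, "used:", sum(adaptive.values()))
```

In this toy case the static split leaves 2000 MiB idle because the smallest container cannot use its full slice, while the max-min split redistributes that memory and uses the whole device. It illustrates the kind of utilization gap the abstract reports, not the reported numbers themselves.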

Title: Design of an adaptive GPU sharing and scheduling scheme in container-based cluster
Authors: Qichen Chen, Jisun Oh, Seoyoung Kim, Yoonhee Kim
Publication date: 31-07-2019
Publisher: Springer US
Published in: Cluster Computing / Issue 3/2020
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-019-02969-3
