Published in: Cluster Computing 3/2020

31-07-2019

Design of an adaptive GPU sharing and scheduling scheme in container-based cluster

Authors: Qichen Chen, Jisun Oh, Seoyoung Kim, Yoonhee Kim

Abstract

Container-based virtualization is an innovative technology that accelerates software development by making applications portable and maintainable. Recently, a growing number of workloads, such as high-performance computing (HPC) and deep learning (DL), have been deployed in container-based environments. However, GPU resource management in container-based clusters remains challenging, especially GPU memory oversubscription, which causes substantial performance loss. This paper proposes an adaptive fair-share method for sharing GPUs effectively in container-based virtualization environments, together with an execution rescheduling method that manages the execution order of containers to maximize the performance gain. We also propose a checkpoint-based mechanism, specifically for DL workloads running with TensorFlow, that efficiently solves the GPU memory oversubscription problem. We demonstrate that our approach improves overall performance and achieves higher resource utilization than the default and static fair-share methods with both homogeneous and heterogeneous workloads. Compared with these two methods, the proposed method reduces average execution time by 16.37% and 15.61% and increases average GPU memory utilization by approximately 52.46% and 10.3%, respectively. We also evaluated the checkpoint-based mechanism by running multiple CNN workloads with TensorFlow at the same time; the results show that it lets each workload execute safely without out-of-memory (OOM) errors.
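
To make the contrast between a static and a demand-aware fair share concrete, the following is a minimal sketch. The abstract does not specify the paper's allocation rule, so the sketch uses generic max-min fairness as a stand-in rather than the authors' method; the function names, container names, and memory figures are all hypothetical.

```python
from typing import Dict


def static_fair_share(total_mb: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Split GPU memory equally, ignoring what each container actually needs."""
    if not demands:
        return {}
    share = total_mb // len(demands)
    return {name: share for name in demands}


def max_min_fair_share(total_mb: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Generic max-min fairness (an assumption, not the paper's algorithm).

    No container receives more than it demands, and memory left over by
    small containers is redistributed among those still unsatisfied.
    """
    alloc = {name: 0 for name in demands}
    remaining = total_mb
    unsatisfied = set(demands)
    while unsatisfied and remaining > 0:
        share = remaining // len(unsatisfied)
        if share == 0:
            break  # less than 1 MB per container left to hand out
        for name in sorted(unsatisfied):
            grant = min(share, demands[name] - alloc[name])
            alloc[name] += grant
            remaining -= grant
        unsatisfied = {n for n in unsatisfied if alloc[n] < demands[n]}
    return alloc


# Hypothetical demands (in MB) on a hypothetical 12000 MB GPU.
demands = {"a": 2000, "b": 5000, "c": 9000}
static = static_fair_share(12000, demands)
adaptive = max_min_fair_share(12000, demands)
```

In this example the static split reserves 4000 MB for container "a", which needs only 2000 MB, while the demand-aware split hands the surplus to "b" and "c". The total granted never exceeds the GPU capacity, which is the property that avoids oversubscription.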


Metadata
Title
Design of an adaptive GPU sharing and scheduling scheme in container-based cluster
Authors
Qichen Chen
Jisun Oh
Seoyoung Kim
Yoonhee Kim
Publication date
31-07-2019
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2020
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-019-02969-3
