nach oben

Cluster Computing

Erschienen in:

08.09.2018

On the effect of using rCUDA to provide CUDA acceleration to Xen virtual machines

verfasst von: Javier Prades, Carlos Reaño, Federico Silla

Erschienen in: Cluster Computing | Ausgabe 1/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Nowadays, many data centers use virtual machines (VMs) in order to achieve a more efficient use of hardware resources. The use of VMs provides a reduction in equipment and maintenance expenses as well as a lower electricity consumption. Nevertheless, current virtualization solutions, such as Xen, do not easily provide graphics processing units (GPUs) to applications running in the virtualized domain with the flexibility usually required in data centers (i.e., managing virtual GPU instances and concurrently sharing them among several VMs). Therefore, the execution of GPU-accelerated applications within VMs is hindered by this lack of flexibility. In this regard, remote GPU virtualization solutions may address this concern. In this paper we analyze the use of the remote GPU virtualization mechanism to accelerate scientific applications running inside Xen VMs. We conduct our study with six different applications, namely CUDA-MEME, CUDASW++, GPU-BLAST, LAMMPS, a triangle count application, referred to as TRICO, and a synthetic benchmark used to emulate different application behaviors. Our experiments show that the use of remote GPU virtualization is a feasible approach to address the current concerns of sharing GPUs among several VMs, featuring a very low overhead if an InfiniBand fabric is already present in the cluster.

Vorheriger Artikel A resource recommendation method based on dynamic cluster analysis of application characteristics

Nächster Artikel Quantum virtual machine: power and performance management in virtualized web servers clusters

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

CUDA (compute unified device architecture) is a technology created by NVIDIA which comprises a parallel compute platform (CUDA-enabled graphics processing units) as well as an application programming interface (API) and a compiler.

In addition to the GRID K1 GPU, NVIDIA has also brought to market the GRID K2 model, which features 1536 CUDA cores per GPU and 4 GB of memory. However, this amount of resources per GPU is still noticeable smaller than the ones available in current NVIDIA Tesla K20 and K40 GPUs, featuring, respectively, 2496 and 2880 CUDA cores and 5GB and 12GB of memory. Therefore, using the GRID K2 device for providing acceleration to scientific applications instead of providing desktop virtualization would deliver a significantly lower performance than current mainstream GPUs used in HPC servers, such as the K20 or K40 GPUs.

KVMGT is the open source implementation of Intel’s GPU Virtualization Technology for KVM VMs.

In order to interact with the virtualized GPU, some kind of interface is required so that the application can access the virtual device. This interface could be placed at different levels. For instance, it could be placed at the driver level. However, GPU drivers usually employ low-level protocols which, additionally, are proprietary and strictly closed by GPU vendors. Therefore, a higher-level boundary must be used. This is why the GPU API is commonly selected for placing the virtualization boundary, given that these APIs are public.

The native domain refers to a scenario where virtualization is not used, that is, a real computer is leveraged. On the other hand, the virtual domain refers to the virtual machine.

In this work, we will refer to main memory as host memory or just host, while GPU memory will be referred as device memory or simply device, according to the well-established usage defined in the CUDA ecosystem.

Kernel-Based Virtual Machine, KVM. http://www.linux-kvm.org (2015). Accessed 19 Oct 2015

Xen Project. http://www.xenproject.org/ (2015). Accessed 19 Oct 2015

VMware Virtualization. http://www.vmware.com/ (2015). Accessed 19 Oct 2015

Oracle VM VirtualBox. http://www.virtualbox.org/ (2015). Accessed 19 Oct 2015

Semnanian, A., Pham, J., Englert, B., Wu, X.: Virtualization technology and its impact on computer hardware architecture. In: Proceedings of the Information Technology: New Generations, ITNG, pp. 719–724 (2011)

Felter, W., Ferreira, A., Rajamony, R., Rubio, J.: An updated performance comparison of virtual machines and linux containers. In: IBM Research Report (2014)

Zhang, J., Lu, X., Arnold, M., Panda, D.: MVAPICH2 over OpenStack with SR-IOV: an efficient approach to build HPC Clouds. In: Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid, pp. 71–80 (2015)

Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red Fox: an execution environment for relational query processing on GPUs. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO (2014)

Playne, D.P., Hawick, K.A.: Data parallel three-dimensional Cahn-Hilliard field equation simulation on GPUs with CUDA. In: Proceedings of the Parallel and Distributed Processing Techniques and Applications, PDPTA, pp. 104–110 (2009)

10.

Yamazaki, I., Dong, T., Solcà, R., Tomov, S., Dongarra, J., Schulthess, T.: Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems. Concurr. Comput.: Pract. Exp. 26(16), 2652–2666 (2014)CrossRef

11.

Luo, D.Y.: Canny edge detection on NVIDIA CUDA. In: Proceedings of the Computer Vision and Pattern Recognition Workshops, CVPR Workshops, pp. 1–8 (2008)

12.

Surkov, V.: Parallel option pricing with Fourier space time-stepping method on graphics processing units. Parallel Comput. 36(7), 372–380 (2010)MathSciNetMATHCrossRef

13.

Agarwal, P.K., Hampton, S., Poznanovic, J., Ramanthan, A., Alam, S.R., Crozier, P.S.: Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures. Concurr. Comput.: Pract. Exp. 25(10), 1356–1375 (2013)CrossRef

14.

Luo, G.H., Huang, S.K., Chang, Y.S., Yuan, S.M.: A parallel bees algorithm implementation on GPU. J. Syst. Arch. 60(3), 271–279 (2014)CrossRef

15.

NVIDIA GRID Technology. http://www.nvidia.com/object/grid-technology.html (2015). Accessed 19 Oct 2015

16.

Song, J., et al: KVMGT: a full GPU virtualization solution. In: KVM Forum (2014)

17.

AMD Multiuser GPU, Hardware-Based Virtualized Solution. http://www.amd.com/Documents/Multiuser-GPU-Datasheet.pdf (2015). Accessed 19 Oct 2015

18.

V-GPU: GPU Virtualization. https://github.com/zillians/platform_manifest_vgpu (2015). Accessed 19 Oct 2015

19.

Oikawa, M., Kawai, A., Nomura, K., Yasuoka, K., Yoshikawa, K., Narumi, T.: DS-CUDA: a middleware to use many GPUs in the cloud environment. In: Proceedings of the SC Companion: High Performance Computing, Networking Storage and Analysis, SCC, pp. 1207–1214 (2012)

20.

Reaño, C., Silla, F., Shainer, G., Schultz, S.: Local and remote GPUs perform similar with EDR 100G InfiniBand. In: Proceedings of the Industrial Track of the 16th International Middleware Conference, ACM, Middleware Industry ’15, pp. 4:1–4:7 (2015)

21.

Reaño, C., Silla, F., Duato, J.: Enhancing the rCUDA remote GPU virtualization framework: from a prototype to a production solution. In: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, IEEE Press, CCGrid ’17, pp. 695–698 (2017)

22.

Shi, L., Chen, H., Sun, J.: vCUDA: GPU accelerated high performance computing in virtual machines. In: Proceedings of the IEEE Parallel and Distributed Processing Symposium, IPDPS, pp. 1–11 (2009)

23.

Liang, T.Y., Chang, Y.W.: GridCuda: A grid-enabled CUDA programming toolkit. In: Proceedings of the IEEE Advanced Information Networking and Applications Workshops, WAINA, pp. 141–146 (2011)

24.

Giunta, G., Montella, R., Agrillo, G., Coviello, G.: A GPGPU transparent virtualization component for high performance computing clouds. In: Proceedings of the Euro-Par Parallel Processing, Euro-Par, pp. 379–391 (2010)

25.

Gupta, V., Gavrilovska, A., Schwan, K., Kharche, H., Tolia, N., Talwar, V., Ranganathan, P. GViM: GPU-accelerated virtual machines. In: Proceedings of the ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt, pp. 17–24 (2009)

26.

Merritt, A.M., Gupta, V., Verma, A., Gavrilovska, A., Schwan, K.: Shadowfax: scaling in heterogeneous cluster systems via GPGPU assemblies. In: Proceedings of the International Workshop on Virtualization Technologies in Distributed Computing, VTDC, pp. 3–10 (2011)

27.

Shadowfax II—Scalable Implementation of GPGPU Assemblies. http://keeneland.gatech.edu/software/keeneland/kidron (2015). Accessed 19 Oct 2015

28.

Walters, J.P., Younge, A.J., Kang, D.I., Yao, K.T., Kang, M., Crago, S.P., Fox, G.C.: GPU-passthrough performance: a comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL applications. In: Proceedings of the IEEE International Conference on Cloud Computing, CLOUD (2014)

29.

Yang, C.T., Wang, H.Y., Ou, W.S., Liu, Y.T., Hsu, C.H.: On implementation of GPU virtualization using PCI pass-through. In: Proceedings of the IEEE Cloud Computing Technology and Science, CloudCom, pp. 711–716 (2012)

30.

Jo, H., Jeong, J., Lee, M., Choi, D.H.: Exploiting GPUs in virtual machine for BioCloud. BioMed Res. Int. 2013, 11 (2013). https://doi.org/10.1155/2013/939460 CrossRef

31.

NVIDIA: CUDA C Programming Guide 7.5. http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf (2015a). Accessed 19 Oct 2015

32.

NVIDIA: CUDA Runtime API Reference Manual 7.5. http://docs.nvidia.com/cuda/pdf/CUDA_Runtime_API.pdf (2015b). Accessed 19 Oct 2015

33.

NVIDIA: The NVIDIA GPU Computing SDK Version 5.5 (2013)

34.

iperf3: A TCP, UDP, and SCTP Network Bandwidth Measurement Tool. https://github.com/esnet/iperf (2015). Accessed 19 Oct 2015

35.

Reaño, C., Silla, F.: Reducing the performance gap of remote GPU virtualization with InfiniBand Connect-IB. In: 2016 IEEE Symposium on Computers and Communication (ISCC), pp. 920–925 (2016)

36.

Mellanox: Connect-IB Single and Dual QSFP+ Port PCI Express Gen3 x16 Adapter Card User Manual. http://www.mellanox.com/related-docs/user_manuals/Connect-IB_Single_and_Dual_QSFP+_Port_PCI_Express_Gen3_%20x16_Adapter_Card_User_Manual.pdf (2014a). Accessed 19 Oct 2015

37.

Mellanox: ConnectX-3 VPI Single and Dual QSFP+ Port Adapter Card User Manual 1.7. http://www.mellanox.com/related-docs/user_manuals/ConnectX-3_VPI_Single_and_Dual_QSFP_Port_Adapter_Card_User_Manual.pdf (2013). Accessed 19 Oct 2015

38.

Pérez, F., Reaño, C., Silla, F.: Providing CUDA acceleration to KVM virtual machines in InfiniBand clusters with rCUDA. In: 16th International Conference Distributed Applications and Interoperable Systems (DAIS), pp. 82–95. Springer International Publishing (2016)

39.

Mellanox: Mellanox OFED for Linux User Manual. http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.3-1.0.1.pdf (2014b). Accessed 19 Oct 2015

40.

Reaño, C., Mayo, R., Quintana-Ortí, E., Silla, F., Duato, J., Peña, A.: Influence of InfiniBand FDR on the performance of remote GPU virtualization. In: Proceedings of the IEEE International Conference on Cluster Computing, CLUSTER, pp. 1–8 (2013)

41.

Laboratories, S.N.: LAMMPS Molecular Dynamics Simulator. http://lammps.sandia.gov/ (2013). Accessed 19 Oct 2015

42.

Liu, Y., Schmidt, B., Liu, W., Maskell, D.L.: CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognit. Lett. 31(14), 2170–2177 (2010)CrossRef

43.

Liu, Y., Wirawan, A., Schmidt, B.: CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinformat. 14(1), 1–10 (2013)CrossRef

44.

Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2011)CrossRef

45.

NVIDIA: NVIDIA Popular GPU-Accelerated Applications Catalog. http://www.nvidia.com/content/gpu-applications/PDF/GPU-apps-catalog-mar2015.pdf (2015c). Accessed 19 Oct 2015

46.

Liu, Y. CUDA-MEME. https://sites.google.com/site/yongchaosoftware/mcuda-meme (2014). Accessed 19 Oct 2015

47.

Polak, A.: Counting triangles in large graphs on GPU. In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 740–746 (2016)

48.

Prades, J., Silla, F.: Turning GPUs into floating devices over the cluster: the Beauty of GPU Migration. In: Proceedings of the 6th Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA) (2017)

Titel: On the effect of using rCUDA to provide CUDA acceleration to Xen virtual machines
verfasst von: Javier Prades
Carlos Reaño
Federico Silla
Publikationsdatum: 08.09.2018
Verlag: Springer US
Erschienen in: Cluster Computing / Ausgabe 1/2019
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-018-2845-0

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 1/2019

A design and development of an intelligent jammer and jamming detection methodologies using machine learning approach

A survey of privacy preserving schemes in IoE enabled Smart Grid Advanced Metering Infrastructure

RFTL: improving performance of selective caching-based page-level FTL through replication

Towards Sustainable High-Performance Transaction Processing in Cloud-based DBMS

Hybrid gradient descent cuckoo search (HGDCS) algorithm for resource scheduling in IaaS cloud computing environment

A fog based load forecasting strategy for smart grids using big electrical data

Premium Partner