ABSTRACT
Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters that include GPUs are still in their infancy. Further, GPU software stacks (e.g., the CUDA driver and runtime) currently provide very limited support for concurrency.
In this paper, we propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing, dynamic upgrade and downgrade of GPUs, and is resilient to their failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme allows up to a 28% and a 50% performance improvement over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.
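The central idea in the abstract is a memory manager that gives applications a virtual memory abstraction so the runtime, not the programmer, decides which physical GPU backs each allocation. The following is a minimal, hypothetical sketch of that idea (all class and method names are illustrative, not from the paper): applications receive opaque virtual handles, and the runtime binds or rebinds them to devices later, which is what enables dynamic load balancing, GPU upgrade/downgrade, and failure recovery.

```python
# Hypothetical sketch of a virtual-memory-style GPU memory manager.
# Applications never hold raw device pointers; they hold virtual handles
# whose binding to a physical GPU is deferred and can be changed.

import itertools

class VirtualGpuMemoryManager:
    def __init__(self, devices):
        self.devices = set(devices)      # ids of GPUs currently available
        self._next = itertools.count(1)  # virtual handle generator
        self.table = {}                  # handle -> [size, bound device or None]

    def malloc(self, size):
        """Return a virtual handle; no physical device is chosen yet."""
        handle = next(self._next)
        self.table[handle] = [size, None]
        return handle

    def bind(self, handle, device):
        """Scheduler binds (or rebinds) a virtual allocation to a device."""
        assert device in self.devices
        self.table[handle][1] = device

    def migrate(self, src_device, dst_device):
        """Rebind all allocations on src to dst (e.g., on GPU removal/failure)."""
        for entry in self.table.values():
            if entry[1] == src_device:
                entry[1] = dst_device

    def device_of(self, handle):
        return self.table[handle][1]

# Usage: allocation happens before any GPU is chosen; the runtime binds later.
mm = VirtualGpuMemoryManager(devices=[0, 1])
h = mm.malloc(1 << 20)
assert mm.device_of(h) is None   # dynamic, not programmer-defined, binding
mm.bind(h, 0)
mm.migrate(0, 1)                 # simulate load balancing or GPU downgrade
assert mm.device_of(h) == 1
```

Because bindings live in a runtime-owned table rather than in application pointers, the same mechanism covers all three features the abstract lists: rebinding for load balancing, adding or removing devices, and moving state off a failed GPU.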
Title: A virtual memory based runtime to support multi-tenancy in clusters with GPUs
Recommendations
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing
Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still ...
A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs
HPDC '13: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
In the last few years, thanks to their computational power, their progressively increasing programmability and their wide adoption in both the research community and in industry, GPUs have become part of HPC clusters (for example, the US Titan and ...