ABSTRACT
Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters that include GPUs are still in their infancy. Further, GPU software stacks (e.g., the CUDA driver and runtime) currently provide very limited support for concurrency.
In this paper, we propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing, dynamic upgrade and downgrade of GPUs, and is resilient to their failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme allows up to a 28% and a 50% performance improvement over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.
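The central idea in the abstract is a memory manager that gives applications a virtual memory abstraction so the runtime, not the programmer, decides which physical GPU backs each allocation. The following is a minimal, hypothetical sketch of that idea (all class and method names are illustrative, not from the paper): applications receive opaque virtual handles, and the runtime binds or rebinds them to devices later, which is what enables dynamic load balancing, GPU upgrade/downgrade, and failure recovery.

```python
# Hypothetical sketch of a virtual-memory-style GPU memory manager.
# Applications never hold raw device pointers; they hold virtual handles
# whose binding to a physical GPU is deferred and can be changed.

import itertools

class VirtualGpuMemoryManager:
    def __init__(self, devices):
        self.devices = set(devices)      # ids of GPUs currently available
        self._next = itertools.count(1)  # virtual handle generator
        self.table = {}                  # handle -> [size, bound device or None]

    def malloc(self, size):
        """Return a virtual handle; no physical device is chosen yet."""
        handle = next(self._next)
        self.table[handle] = [size, None]
        return handle

    def bind(self, handle, device):
        """Scheduler binds (or rebinds) a virtual allocation to a device."""
        assert device in self.devices
        self.table[handle][1] = device

    def migrate(self, src_device, dst_device):
        """Rebind all allocations on src to dst (e.g., on GPU removal/failure)."""
        for entry in self.table.values():
            if entry[1] == src_device:
                entry[1] = dst_device

    def device_of(self, handle):
        return self.table[handle][1]

# Usage: allocation happens before any GPU is chosen; the runtime binds later.
mm = VirtualGpuMemoryManager(devices=[0, 1])
h = mm.malloc(1 << 20)
assert mm.device_of(h) is None   # dynamic, not programmer-defined, binding
mm.bind(h, 0)
mm.migrate(0, 1)                 # simulate load balancing or GPU downgrade
assert mm.device_of(h) == 1
```

Because bindings live in a runtime-owned table rather than in application pointers, the same mechanism covers all three features the abstract lists: rebinding for load balancing, adding or removing devices, and moving state off a failed GPU.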
Title: A virtual memory based runtime to support multi-tenancy in clusters with GPUs
Recommendations
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
HPDC '11: Proceedings of the 20th international symposium on High performance distributed computing
Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still ...
A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs
HPDC '13: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
In the last few years, thanks to their computational power, their progressively increasing programmability and their wide adoption in both the research community and in industry, GPUs have become part of HPC clusters (for example, the US Titan and ...