ABSTRACT
In the last few years, thanks to their computational power, their progressively increasing programmability, and their wide adoption in both the research community and industry, GPUs have become part of HPC clusters (for example, the US Titan and Stampede and the Chinese Tianhe-1A supercomputers). As a result, widely used open-source cluster resource managers (e.g. SLURM and TORQUE) have recently been extended with GPU support capabilities. These systems, however, provide simple scheduling mechanisms that often result in resource underutilization and, consequently, in suboptimal performance. In this paper, we propose a runtime system that can be integrated with existing cluster resource managers to enable a more efficient use of heterogeneous clusters with GPUs. Unlike previous work, we focus on multi-process GPU applications that include synchronization (for example, hybrid MPI-CUDA applications). We discuss the limitations and inefficiencies of existing scheduling and resource sharing schemes in the presence of synchronization. We show that preemption is an effective mechanism to allow efficient scheduling of hybrid MPI-CUDA applications. We validate our runtime on a variety of benchmark programs with different computation and communication patterns.
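The abstract's central claim, that run-to-completion GPU sharing penalizes synchronizing applications and that preemption recovers the lost time, can be illustrated with a toy single-GPU model. This is a hypothetical sketch for intuition only, not the paper's runtime: application A submits one long kernel, application B submits short kernels each followed by an MPI-style synchronization with a remote peer (modeled as free), and the `slice_len` parameter and workload sizes are assumptions.

```python
def finish_times(slice_len=None):
    """Toy model of one GPU shared by two applications.

    App A enqueues a single 10-unit kernel; app B enqueues three 1-unit
    kernels, each followed by an MPI-style synchronization with a peer
    process on another node (modeled here as taking zero time).

    slice_len=None -> run-to-completion (no preemption)
    slice_len=1    -> round-robin with 1-unit preemption slices
    Returns the completion time of each application.
    """
    remaining = {"A": 10, "B": 3}   # GPU work units left per application
    queue = ["A", "B"]              # A was enqueued first
    t, done = 0, {}
    while queue:
        app = queue.pop(0)
        run = remaining[app] if slice_len is None else min(slice_len, remaining[app])
        t += run
        remaining[app] -= run
        if remaining[app] == 0:
            done[app] = t           # B's remote peers stop idling here
        else:
            queue.append(app)       # preempted: back of the queue
    return done
```

Without preemption, B (and the peer processes blocked on its synchronizations) waits behind A's entire kernel and finishes at t=13; with 1-unit slices, B finishes at t=6 while A still finishes at t=13. The synchronization-bound application's remote peers idle far less, at no cost to A's completion time.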
A preemption-based runtime to efficiently schedule multi-process applications on heterogeneous clusters with GPUs
HPDC '13: Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing