DOI: 10.1145/2287076.2287090
HPDC '12 Conference Proceedings · Research article

A virtual memory based runtime to support multi-tenancy in clusters with GPUs

Published: 18 June 2012

ABSTRACT

Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters that include GPUs are still in their infancy. Further, GPU software stacks (e.g., the CUDA driver and runtime) currently provide very limited support for concurrency.

In this paper, we propose a runtime system that provides abstraction and sharing of GPUs while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing and dynamic upgrade and downgrade of GPUs, and is resilient to GPU failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme allows up to a 28% and a 50% performance improvement over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.
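To make the idea of a virtual memory abstraction with deferred GPU binding concrete, the sketch below shows one way it could look from the application side: allocations return virtual handles backed by host memory, and physical device memory is allocated and populated only when a scheduler binds the job to a specific GPU. This is a minimal illustration under assumed names (vgpu_malloc, vgpu_write, vgpu_bind, VBuffer), not the runtime described in the paper.

```cpp
// Minimal sketch (not the paper's implementation): a virtual-memory-style
// buffer table with deferred GPU binding. All vgpu_* names are hypothetical.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

struct VBuffer {
    size_t size;
    void  *host;   // host backing store, always valid
    void  *dev;    // device pointer, nullptr until the job is bound to a GPU
};

static std::vector<VBuffer> g_table;   // per-application buffer table

// Allocate a virtual buffer; no GPU is touched at this point.
int vgpu_malloc(size_t size) {
    VBuffer b{size, std::malloc(size), nullptr};
    g_table.push_back(b);
    return static_cast<int>(g_table.size()) - 1;   // handle = table index
}

// Writes go to the host copy; they are flushed to the device on bind.
void vgpu_write(int h, const void *src, size_t n) {
    std::memcpy(g_table[h].host, src, n);
}

// Bind all of this job's virtual buffers to the GPU chosen by a scheduler:
// allocate device memory lazily and copy the host contents over.
cudaError_t vgpu_bind(int device) {
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;
    for (auto &b : g_table) {
        if (b.dev == nullptr) {
            err = cudaMalloc(&b.dev, b.size);
            if (err != cudaSuccess) return err;
        }
        err = cudaMemcpy(b.dev, b.host, b.size, cudaMemcpyHostToDevice);
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}

int main() {
    int h = vgpu_malloc(4096);             // virtual allocation, no GPU yet
    const char msg[] = "hello";
    vgpu_write(h, msg, sizeof(msg));
    // A real runtime would pick the device dynamically; device 0 is used
    // here only to keep the example self-contained.
    if (vgpu_bind(0) != cudaSuccess) {
        std::fprintf(stderr, "no usable GPU; job remains unbound\n");
        return 1;
    }
    std::printf("buffer %d bound to device pointer %p\n", h, g_table[h].dev);
    return 0;
}
```

Under a scheme of this kind, the runtime rather than the programmer decides which physical GPU backs each buffer, which is what makes the dynamic binding, load balancing, and GPU upgrade/downgrade described in the abstract possible.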



Published in

HPDC '12: Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing
June 2012, 308 pages
ISBN: 9781450308052
DOI: 10.1145/2287076
General Chair: Dick Epema
Program Chairs: Thilo Kielmann, Matei Ripeanu

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

HPDC '12 paper acceptance rate: 23 of 143 submissions (16%). Overall acceptance rate: 166 of 966 submissions (17%).
