DOI: 10.1145/2287076.2287091

Interference-driven resource management for GPU-based heterogeneous clusters

Published: 18 June 2012

ABSTRACT

GPU-based clusters are increasingly being deployed in HPC environments to accelerate a variety of scientific applications. Despite their growing popularity, the GPU devices themselves are under-utilized even for many computationally-intensive jobs. This stems from the fact that the typical GPU usage model is one in which a host processor periodically offloads computationally intensive portions of an application to the coprocessor. Since some portions of code cannot be offloaded to the GPU (for example, code performing network communication in MPI applications), this usage model results in periods of time when the GPU is idle. GPUs could be time-shared across jobs to "fill" these idle periods, but unlike CPU resources such as the cache, the effects of sharing the GPU are not well understood. Specifically, two jobs that time-share a single GPU will experience resource contention and interfere with each other. The resulting slow-down could lead to missed job deadlines. Current cluster managers do not support GPU-sharing, but instead dedicate GPUs to a job for the job's lifetime.
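As context for the idle periods described above, the following is a minimal, hypothetical sketch of the typical offload usage model (the kernel, problem size, and reduction are illustrative and not taken from the paper): each MPI rank alternates between a GPU compute phase and a host-only MPI communication phase, and the GPU sits idle during the latter.

```cuda
// Hypothetical sketch of the offload usage model: GPU compute phase,
// then host-side MPI communication during which the GPU is idle.
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void step_kernel(float *field, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] = 0.5f * (field[i] + field[(i + 1) % n]);  // toy stencil
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    const int n = 1 << 20;
    float *d_field;
    float h_boundary[2] = {0.0f, 0.0f};
    cudaMalloc(&d_field, n * sizeof(float));
    cudaMemset(d_field, 0, n * sizeof(float));

    for (int iter = 0; iter < 100; ++iter) {
        // Offloaded phase: the GPU is busy.
        step_kernel<<<(n + 255) / 256, 256>>>(d_field, n);
        cudaMemcpy(h_boundary, d_field, 2 * sizeof(float), cudaMemcpyDeviceToHost);

        // Host-only phase: MPI communication, GPU idle until the next iteration.
        MPI_Allreduce(MPI_IN_PLACE, h_boundary, 2, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        cudaMemcpy(d_field, h_boundary, 2 * sizeof(float), cudaMemcpyHostToDevice);
    }

    cudaFree(d_field);
    MPI_Finalize();
    return 0;
}
```

During the MPI_Allreduce the device has no work queued; this is exactly the idle window that time-sharing the GPU with a second job could fill.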

In this paper, we present a framework to predict and handle interference when two or more jobs time-share GPUs in HPC clusters. Our framework consists of an analysis model, and a dynamic interference detection and response mechanism to detect excessive interference and restart the interfering jobs on different nodes. We implement our framework in Torque, an open-source cluster manager, and using real workloads on an HPC cluster, show that interference-aware two-job colocation (although our method is applicable to colocating more than two jobs) improves GPU utilization by 25%, reduces a job's waiting time in the queue by 39% and improves job latencies by around 20%.
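To make the detection-and-response idea concrete, here is a minimal sketch under our own assumptions (the Job structure, threshold, and job names are hypothetical; this is not the paper's Torque implementation): each colocated job's observed kernel time is compared against its solo-run baseline, and jobs whose slowdown exceeds a tolerance are flagged for restart on a different node.

```cuda
// Minimal, assumed sketch of interference detection for jobs sharing a GPU:
// compare observed kernel time against a solo baseline and flag excessive slowdown.
#include <cstdio>
#include <string>
#include <vector>

struct Job {
    std::string name;
    double solo_kernel_ms;      // kernel time measured when running alone
    double observed_kernel_ms;  // kernel time while time-sharing the GPU
};

// Returns the names of jobs whose slowdown exceeds the tolerance.
std::vector<std::string> detect_interference(const std::vector<Job> &colocated,
                                             double max_slowdown) {
    std::vector<std::string> to_restart;
    for (const Job &j : colocated) {
        double slowdown = j.observed_kernel_ms / j.solo_kernel_ms;
        if (slowdown > max_slowdown)
            to_restart.push_back(j.name);
    }
    return to_restart;
}

int main() {
    // Hypothetical measurements for two jobs sharing one GPU.
    std::vector<Job> sharing_one_gpu = {{"jobA", 10.0, 13.5}, {"jobB", 8.0, 19.0}};
    for (const std::string &name : detect_interference(sharing_one_gpu, 1.5))
        std::printf("restart %s on a different node\n", name.c_str());
    return 0;
}
```

A real cluster manager would gather such timings continuously and trigger the restart through the scheduler rather than printing, but the slowdown test captures the core decision.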


Published in
          HPDC '12: Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
          June 2012
          308 pages
          ISBN:9781450308052
          DOI:10.1145/2287076
          • General Chair:
          • Dick Epema,
          • Program Chairs:
          • Thilo Kielmann,
          • Matei Ripeanu

          Copyright © 2012 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • research-article

          Acceptance Rates

HPDC '12 paper acceptance rate: 23 of 143 submissions, 16%. Overall acceptance rate: 166 of 966 submissions, 17%.
