GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping (VEE '15)

ABSTRACT
Over the last few years, GPUs have been finding their way into cloud computing platforms, allowing users to benefit from the performance of GPUs at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not actually use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels -- which has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available -- to ensure that data is present on the GPU when referenced.
In this paper, we present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU's ability to access system RAM directly to extend the GPU's own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling. Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even with sufficient GPU memory available, our approach executes GPU applications with native performance.
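The core idea — transparently relocating buffers from GPU memory to GPU-accessible system RAM when memory pressure arises, so that applications never need to be descheduled — can be illustrated with a small sketch. The class names and the eviction policy (largest resident buffer first) below are illustrative assumptions for exposition, not GPUswap's actual implementation:

```python
# Illustrative sketch of swapping-under-memory-pressure, in the spirit of
# GPUswap. Names and the largest-first victim policy are assumptions made
# for this example; they do not reproduce the paper's implementation.

class Buffer:
    def __init__(self, name, size):
        self.name = name
        self.size = size
        # "gpu" = resident in GPU memory; "sysram" = relocated to system RAM.
        # In GPUswap's model, the GPU can access data in either location,
        # so relocation never makes a buffer inaccessible.
        self.location = "gpu"

class SwapManager:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.buffers = []

    def gpu_used(self):
        return sum(b.size for b in self.buffers if b.location == "gpu")

    def allocate(self, name, size):
        buf = Buffer(name, size)
        self.buffers.append(buf)
        # Relocate victims to system RAM until the new buffer fits.
        while self.gpu_used() > self.gpu_capacity:
            victims = [b for b in self.buffers
                       if b.location == "gpu" and b is not buf]
            if not victims:
                # Nothing left to evict: the new buffer itself spills over.
                buf.location = "sysram"
                break
            victim = max(victims, key=lambda b: b.size)
            victim.location = "sysram"  # still GPU-accessible, just slower
        return buf
```

For example, with a 100 MB capacity, allocating two 60 MB buffers forces the first one out to system RAM while the second stays resident; the application keeps running either way, which is what removes the need for software scheduling of kernels.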