GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping (VEE '15)

ABSTRACT
Over the last few years, GPUs have been finding their way into cloud computing platforms, allowing users to benefit from the performance of GPUs at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not actually use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels -- which has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available -- to ensure that data is present on the GPU when referenced.
In this paper, we present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU's ability to access system RAM directly to extend the GPU's own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling. Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even with sufficient GPU memory available, our approach executes GPU applications with native performance.
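The core idea — transparently relocating buffers from GPU memory to GPU-accessible system RAM when memory pressure arises, so that applications never need to be descheduled — can be illustrated with a small sketch. The class names and the eviction policy (largest resident buffer first) below are illustrative assumptions for exposition, not GPUswap's actual implementation:

```python
# Illustrative sketch of swapping-under-memory-pressure, in the spirit of
# GPUswap. Names and the largest-first victim policy are assumptions made
# for this example; they do not reproduce the paper's implementation.

class Buffer:
    def __init__(self, name, size):
        self.name = name
        self.size = size
        # "gpu" = resident in GPU memory; "sysram" = relocated to system RAM.
        # In GPUswap's model, the GPU can access data in either location,
        # so relocation never makes a buffer inaccessible.
        self.location = "gpu"

class SwapManager:
    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.buffers = []

    def gpu_used(self):
        return sum(b.size for b in self.buffers if b.location == "gpu")

    def allocate(self, name, size):
        buf = Buffer(name, size)
        self.buffers.append(buf)
        # Relocate victims to system RAM until the new buffer fits.
        while self.gpu_used() > self.gpu_capacity:
            victims = [b for b in self.buffers
                       if b.location == "gpu" and b is not buf]
            if not victims:
                # Nothing left to evict: the new buffer itself spills over.
                buf.location = "sysram"
                break
            victim = max(victims, key=lambda b: b.size)
            victim.location = "sysram"  # still GPU-accessible, just slower
        return buf
```

For example, with a 100 MB capacity, allocating two 60 MB buffers forces the first one out to system RAM while the second stays resident; the application keeps running either way, which is what removes the need for software scheduling of kernels.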