research-article

GPUs as an opportunity for offloading garbage collection

Authors:
Martin Maas (University of California, Berkeley, Berkeley, CA, USA)
Philip Reames (University of California, Berkeley, Berkeley, CA, USA)
Jeffrey Morlan (University of California, Berkeley, Berkeley, CA, USA)
Krste Asanović (University of California, Berkeley, Berkeley, CA, USA)
Anthony D. Joseph (University of California, Berkeley, Berkeley, CA, USA)
John Kubiatowicz (University of California, Berkeley, Berkeley, CA, USA)
ISMM '12: Proceedings of the 2012 international symposium on Memory Management
June 2012
Pages 25–36
https://doi.org/10.1145/2258996.2259002

Published: 15 June 2012

ABSTRACT

GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated CPU/GPU combinations, may create an opportunity to utilize these otherwise unused cycles for offloading traditional systems tasks. Garbage collection appears to be a particularly promising candidate for offloading, due to the popularity of managed languages on consumer devices.

We investigate the challenges for offloading garbage collection to a GPU, by examining the performance trade-offs for the mark phase of a mark & sweep garbage collector. We present a theoretical analysis and an algorithm that demonstrates the feasibility of this approach. We also discuss a number of algorithmic design trade-offs required to leverage the strengths and capabilities of the GPU hardware. Our algorithm has been integrated into the Jikes RVM and we present promising performance results.
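To make the mark phase concrete, the following is a minimal sketch of a frontier-based marking loop, the general shape that data-parallel graph traversals tend to take. It is not the algorithm from the paper, which is not reproduced on this page. The heap representation (a dictionary from object id to the ids it references) and the names `mark_reachable`, `heap`, and `roots` are assumptions made purely for illustration, and the loop runs sequentially on the CPU.

```python
def mark_reachable(heap, roots):
    """Return the set of object ids reachable from the roots.

    `heap` is a hypothetical layout: it maps each object id to the
    list of object ids that the object references.
    """
    marked = set(roots)
    frontier = list(roots)
    while frontier:
        next_frontier = []
        # Each frontier object is independent work, so on data-parallel
        # hardware one thread could scan one object. Here the scan is a
        # plain sequential loop.
        for obj in frontier:
            for child in heap[obj]:
                # On parallel hardware this test-and-set would have to be
                # atomic (or tolerate duplicate work); sequentially it is safe.
                if child not in marked:
                    marked.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return marked


# Tiny example heap: "e" and "f" are not reachable from the root "a".
heap = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
    "e": ["a"],
    "f": ["f"],
}
live = mark_reachable(heap, ["a"])
garbage = set(heap) - live
```

Everything left unmarked (`garbage` above) is what a subsequent sweep phase would reclaim. Processing the heap one frontier at a time exposes parallelism only when frontiers are wide, which is one reason the shape of the object graph matters for this kind of offloading.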

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
  2. Parallel computing methodologies
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Published in
ISMM '12: Proceedings of the 2012 international symposium on Memory Management
June 2012, 152 pages
ISBN: 9781450313506
DOI: 10.1145/2258996
General Chair: Martin Vechev (ETH Zurich)
Program Chair: Kathryn S. McKinley (The University of Texas at Austin and Microsoft Research)

ACM SIGPLAN Notices, Volume 47, Issue 11 (ISMM '12)
November 2012, 136 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2426642

Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
apu
gpu
mark and sweep
parallel garbage collection
simt
Qualifiers
- research-article

Acceptance Rates
Overall Acceptance Rate: 72 of 156 submissions, 46%