research-article

FastCollect: offloading generational garbage collection to integrated GPUs

Authors:
Abhinav

Indian Institute of Technology (BHU), Varanasi

Indian Institute of Technology (BHU), Varanasi
View Profile

,
Rupesh Nasre

Indian Institute of Technology, Madras

Indian Institute of Technology, Madras
View Profile

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded SystemsOctober 2016Article No.: 21Pages 1–10https://doi.org/10.1145/2968455.2968520

Published:01 October 2016Publication History

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Pages 1–10

ABSTRACT

Generational Mark-Sweep Garbage Collection is a widely used garbage collection technique. However, the garbage collector has poor execution efficiency for large programs. Aggressive collection causes execution pauses in the program, while reducing the collection frequency leads to memory wastage. In this work, we develop FastCollect, a parallel version of the generational mark-sweep garbage collector running on a graphics processing unit (GPU). At the core of our parallel implementation lies a parallel Depth First Search using a space-efficient concurrent stack, which we develop for the young and the mature garbage collection. To further improve performance, (i) we reduce thread-divergence and improve load-balancing by devising a distributed work-stealing approach, (ii) we optimize our garbage collection algorithm to reduce the number of atomic instructions, (iii) we exploit the memory hierarchy to design a hybrid stack, and (iv) we extract multiple adjacent objects simultaneously by exploiting vectorized memory accesses. We implemented FastCollect in Java Hotspot VM and evaluated our results by executing DaCapo benchmarks. FastCollect is 4-5x faster than the Parallel Hotspot garbage collector and 42% faster than a previous GPU implementation. In addition, while the existing GPU version requires memory linear in the number of compute units, FastCollect's memory requirement is fixed and low. FastCollect not only improves execution time of garbage collection, but also relieves the CPU for improved user interaction.

References

C. R. Attanasio, D. F. Bacon, A. Cocchi, and S. Smith. A Comparative Evaluation of Parallel Garbage Collector Implementations. In Proceedings of the 14th International Conference on Languages and Compilers for Parallel Computing, LCPC'01, pages 177--192, Berlin, Heidelberg, 2003. Springer-Verlag. ISBN 3-540-04029-3. Google ScholarDigital Library
S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM. ISBN 1-59593-348-4.. Google ScholarDigital Library
H. Boehm. A garbage collector for C and C++. http://www.hboehm.info/gc/, 2015.Google Scholar
T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-sweep Garbage Collector on Large-scale Shared-memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, SC '97, pages 1--14, New York, NY, USA, 1997. ACM. ISBN 0-89791-985-8.. Google ScholarDigital Library
FastCollect Implementation. https://github.com/abhijangda/FastCollect.Google Scholar
C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. Google ScholarDigital Library
L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9.. Google ScholarDigital Library
L. Gidra, G. Thomas, J. Sopena, M. Shapiro, and N. Nguyen. Numagic: A garbage collector for big data on big numa machines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 661--673, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-2835-7.. Google ScholarDigital Library
P. Harish and P. J. Narayanan. Accelerating Large Graph Algorithms on the GPU Using CUDA. In Proceedings of the 14th International Conference on High Performance Computing, HiPC'07, pages 197--208, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-77219-7, 978-3-540-77219-4. Google ScholarDigital Library
S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun. Accelerating cuda graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 267--276, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0119-0. Google ScholarDigital Library
A. S. Jiva and G. R. Frost. GPU Assisted Garbage Collection. http://www.patentlens.net/patentlens/patent/US\_2010\_0082930\_A1/en/, 2010.Google Scholar
R. E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management, chapter 6, pages 117--140. Wiley, Chichester, 1 edition, 7 1996.Google Scholar
H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. In Communications of the ACM, Volume 26 Issue 6, pages 419--429, New York, NY, USA, 1983. ACM.. Google ScholarDigital Library
L. Luo, M. Wong, and W. Hwu. An Effective GPU Implementation of Breadth-first Search. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 52--55, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0002-5.. Google ScholarDigital Library
M. Maas, P. Reames, J. Morlan, K. Asanović, A. D. Joseph, and J. Kubiatowicz. GPUs As an Opportunity for Offloading Garbage Collection. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 25--36, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1350-6.. Google ScholarDigital Library
D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU Graph Traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1160-1.. Google ScholarDigital Library
Oracle. Java SE Hotspot at a Glance. http://www.oracle.com/technetwork/articles/javase/index-jsp-136373.html, 2015.Google Scholar
J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 20(5):229 -- 234, 1985. ISSN 0020-0190..Google ScholarCross Ref
W. Sun and R. Ricci. Augmenting Operating Systems With the GPU. Technical Report, University of Utah, 2010.Google Scholar
G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarDigital Library
D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8.. Google ScholarDigital Library
R. Veldema and M. Philippsen. Iterative Data-parallel Mark&Sweep on a GPU. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 1--10, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarDigital Library

Index Terms

FastCollect: offloading generational garbage collection to integrated GPUs
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

GPUs as an opportunity for offloading garbage collection
ISMM '12

GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated ...
Read More
GPUs as an opportunity for offloading garbage collection
ISMM '12: Proceedings of the 2012 international symposium on Memory Management

GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated ...
Read More
A generational on-the-fly garbage collector for Java
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation

An on-the-fly garbage collector does not stop the program threads to perform the collection. Instead, the collector executes in a separate thread (or process) in parallel to the program. On-the-fly collectors are useful for multi-threaded applications ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
October 2016
187 pages
ISBN:9781450344821
DOI:10.1145/2968455

Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
GPU
SIMT
generational garbage collection
mark and sweep garbage collection
parallel garbage collection
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate52of230submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6
  Total Citations
  View Citations
- 175
  Total Downloads
- Downloads (Last 12 months)6
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

FastCollect: offloading generational garbage collection to integrated GPUs

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

GPUs as an opportunity for offloading garbage collection

GPUs as an opportunity for offloading garbage collection

A generational on-the-fly garbage collector for Java

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

FastCollect: offloading generational garbage collection to integrated GPUs

CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

GPUs as an opportunity for offloading garbage collection

GPUs as an opportunity for offloading garbage collection

A generational on-the-fly garbage collector for Java

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media