ABSTRACT
Generational Mark-Sweep Garbage Collection is a widely used garbage collection technique. However, the garbage collector has poor execution efficiency for large programs. Aggressive collection causes execution pauses in the program, while reducing the collection frequency leads to memory wastage. In this work, we develop FastCollect, a parallel version of the generational mark-sweep garbage collector running on a graphics processing unit (GPU). At the core of our parallel implementation lies a parallel Depth First Search using a space-efficient concurrent stack, which we develop for the young and the mature garbage collection. To further improve performance, (i) we reduce thread-divergence and improve load-balancing by devising a distributed work-stealing approach, (ii) we optimize our garbage collection algorithm to reduce the number of atomic instructions, (iii) we exploit the memory hierarchy to design a hybrid stack, and (iv) we extract multiple adjacent objects simultaneously by exploiting vectorized memory accesses. We implemented FastCollect in Java Hotspot VM and evaluated our results by executing DaCapo benchmarks. FastCollect is 4-5x faster than the Parallel Hotspot garbage collector and 42% faster than a previous GPU implementation. In addition, while the existing GPU version requires memory linear in the number of compute units, FastCollect's memory requirement is fixed and low. FastCollect not only improves execution time of garbage collection, but also relieves the CPU for improved user interaction.
- C. R. Attanasio, D. F. Bacon, A. Cocchi, and S. Smith. A Comparative Evaluation of Parallel Garbage Collector Implementations. In Proceedings of the 14th International Conference on Languages and Compilers for Parallel Computing, LCPC'01, pages 177--192, Berlin, Heidelberg, 2003. Springer-Verlag. ISBN 3-540-04029-3. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM. ISBN 1-59593-348-4.. Google ScholarDigital Library
- H. Boehm. A garbage collector for C and C++. http://www.hboehm.info/gc/, 2015.Google Scholar
- T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-sweep Garbage Collector on Large-scale Shared-memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, SC '97, pages 1--14, New York, NY, USA, 1997. ACM. ISBN 0-89791-985-8.. Google ScholarDigital Library
- FastCollect Implementation. https://github.com/abhijangda/FastCollect.Google Scholar
- C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. Google ScholarDigital Library
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9.. Google ScholarDigital Library
- L. Gidra, G. Thomas, J. Sopena, M. Shapiro, and N. Nguyen. Numagic: A garbage collector for big data on big numa machines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 661--673, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-2835-7.. Google ScholarDigital Library
- P. Harish and P. J. Narayanan. Accelerating Large Graph Algorithms on the GPU Using CUDA. In Proceedings of the 14th International Conference on High Performance Computing, HiPC'07, pages 197--208, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-77219-7, 978-3-540-77219-4. Google ScholarDigital Library
- S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun. Accelerating cuda graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 267--276, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0119-0. Google ScholarDigital Library
- A. S. Jiva and G. R. Frost. GPU Assisted Garbage Collection. http://www.patentlens.net/patentlens/patent/US\_2010\_0082930\_A1/en/, 2010.Google Scholar
- R. E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management, chapter 6, pages 117--140. Wiley, Chichester, 1 edition, 7 1996.Google Scholar
- H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. In Communications of the ACM, Volume 26 Issue 6, pages 419--429, New York, NY, USA, 1983. ACM.. Google ScholarDigital Library
- L. Luo, M. Wong, and W. Hwu. An Effective GPU Implementation of Breadth-first Search. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 52--55, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0002-5.. Google ScholarDigital Library
- M. Maas, P. Reames, J. Morlan, K. Asanović, A. D. Joseph, and J. Kubiatowicz. GPUs As an Opportunity for Offloading Garbage Collection. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 25--36, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1350-6.. Google ScholarDigital Library
- D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU Graph Traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1160-1.. Google ScholarDigital Library
- Oracle. Java SE Hotspot at a Glance. http://www.oracle.com/technetwork/articles/javase/index-jsp-136373.html, 2015.Google Scholar
- J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 20(5):229 -- 234, 1985. ISSN 0020-0190..Google ScholarCross Ref
- W. Sun and R. Ricci. Augmenting Operating Systems With the GPU. Technical Report, University of Utah, 2010.Google Scholar
- G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarDigital Library
- D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8.. Google ScholarDigital Library
- R. Veldema and M. Philippsen. Iterative Data-parallel Mark&Sweep on a GPU. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 1--10, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarDigital Library
Index Terms
- FastCollect: offloading generational garbage collection to integrated GPUs
Recommendations
GPUs as an opportunity for offloading garbage collection
ISMM '12GPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated ...
GPUs as an opportunity for offloading garbage collection
ISMM '12: Proceedings of the 2012 international symposium on Memory ManagementGPUs have become part of most commodity systems. Nonetheless, they are often underutilized when not executing graphics-intensive or special-purpose numerical computations, which are rare in consumer workloads. Emerging architectures, such as integrated ...
A generational on-the-fly garbage collector for Java
PLDI '00: Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementationAn on-the-fly garbage collector does not stop the program threads to perform the collection. Instead, the collector executes in a separate thread (or process) in parallel to the program. On-the-fly collectors are useful for multi-threaded applications ...
Comments