skip to main content
10.1145/2968455.2968520acmotherconferencesArticle/Chapter ViewAbstractPublication PagesesweekConference Proceedingsconference-collections
research-article

FastCollect: offloading generational garbage collection to integrated GPUs

Published:01 October 2016Publication History

ABSTRACT

Generational Mark-Sweep Garbage Collection is a widely used garbage collection technique. However, the garbage collector has poor execution efficiency for large programs. Aggressive collection causes execution pauses in the program, while reducing the collection frequency leads to memory wastage. In this work, we develop FastCollect, a parallel version of the generational mark-sweep garbage collector running on a graphics processing unit (GPU). At the core of our parallel implementation lies a parallel Depth First Search using a space-efficient concurrent stack, which we develop for the young and the mature garbage collection. To further improve performance, (i) we reduce thread-divergence and improve load-balancing by devising a distributed work-stealing approach, (ii) we optimize our garbage collection algorithm to reduce the number of atomic instructions, (iii) we exploit the memory hierarchy to design a hybrid stack, and (iv) we extract multiple adjacent objects simultaneously by exploiting vectorized memory accesses. We implemented FastCollect in Java Hotspot VM and evaluated our results by executing DaCapo benchmarks. FastCollect is 4-5x faster than the Parallel Hotspot garbage collector and 42% faster than a previous GPU implementation. In addition, while the existing GPU version requires memory linear in the number of compute units, FastCollect's memory requirement is fixed and low. FastCollect not only improves execution time of garbage collection, but also relieves the CPU for improved user interaction.

References

  1. C. R. Attanasio, D. F. Bacon, A. Cocchi, and S. Smith. A Comparative Evaluation of Parallel Garbage Collector Implementations. In Proceedings of the 14th International Conference on Languages and Compilers for Parallel Computing, LCPC'01, pages 177--192, Berlin, Heidelberg, 2003. Springer-Verlag. ISBN 3-540-04029-3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-oriented Programming Systems, Languages, and Applications, OOPSLA '06, pages 169--190, New York, NY, USA, 2006. ACM. ISBN 1-59593-348-4.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. H. Boehm. A garbage collector for C and C++. http://www.hboehm.info/gc/, 2015.Google ScholarGoogle Scholar
  4. T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-sweep Garbage Collector on Large-scale Shared-memory Machines. In Proceedings of the 1997 ACM/IEEE Conference on Supercomputing, SC '97, pages 1--14, New York, NY, USA, 1997. ACM. ISBN 0-89791-985-8.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. FastCollect Implementation. https://github.com/abhijangda/FastCollect.Google ScholarGoogle Scholar
  6. C. H. Flood, D. Detlefs, N. Shavit, and X. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1, JVM'01, pages 21--21, Berkeley, CA, USA, 2001. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '13, pages 229--240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. Gidra, G. Thomas, J. Sopena, M. Shapiro, and N. Nguyen. Numagic: A garbage collector for big data on big numa machines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 661--673, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-2835-7.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Harish and P. J. Narayanan. Accelerating Large Graph Algorithms on the GPU Using CUDA. In Proceedings of the 14th International Conference on High Performance Computing, HiPC'07, pages 197--208, Berlin, Heidelberg, 2007. Springer-Verlag. ISBN 3-540-77219-7, 978-3-540-77219-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun. Accelerating cuda graph algorithms at maximum warp. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, PPoPP '11, pages 267--276, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0119-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. S. Jiva and G. R. Frost. GPU Assisted Garbage Collection. http://www.patentlens.net/patentlens/patent/US\_2010\_0082930\_A1/en/, 2010.Google ScholarGoogle Scholar
  12. R. E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management, chapter 6, pages 117--140. Wiley, Chichester, 1 edition, 7 1996.Google ScholarGoogle Scholar
  13. H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. In Communications of the ACM, Volume 26 Issue 6, pages 419--429, New York, NY, USA, 1983. ACM.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. Luo, M. Wong, and W. Hwu. An Effective GPU Implementation of Breadth-first Search. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 52--55, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0002-5.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Maas, P. Reames, J. Morlan, K. Asanović, A. D. Joseph, and J. Kubiatowicz. GPUs As an Opportunity for Offloading Garbage Collection. In Proceedings of the 2012 International Symposium on Memory Management, ISMM '12, pages 25--36, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1350-6.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU Graph Traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1160-1.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Oracle. Java SE Hotspot at a Glance. http://www.oracle.com/technetwork/articles/javase/index-jsp-136373.html, 2015.Google ScholarGoogle Scholar
  18. J. H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 20(5):229 -- 234, 1985. ISSN 0020-0190..Google ScholarGoogle ScholarCross RefCross Ref
  19. W. Sun and R. Ricci. Augmenting Operating Systems With the GPU. Technical Report, University of Utah, 2010.Google ScholarGoogle Scholar
  20. G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 79--88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE 1 Proceedings of the first ACM SIGSOFT/SIGPLAN software engineering symposium on Practical software development environments, pages 157--167, New York, NY, USA, 1984. ACM. ISBN 0-89791-131-8.. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Veldema and M. Philippsen. Iterative Data-parallel Mark&Sweep on a GPU. In Proceedings of the International Symposium on Memory Management, ISMM '11, pages 1--10, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0.. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. FastCollect: offloading generational garbage collection to integrated GPUs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        CASES '16: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems
        October 2016
        187 pages
        ISBN:9781450344821
        DOI:10.1145/2968455

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 October 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate52of230submissions,23%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader