skip to main content
10.1145/3092255.3092272acmconferencesArticle/Chapter ViewAbstractPublication PagesismmConference Proceedingsconference-collections
research-article

NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications

Published:18 June 2017Publication History

ABSTRACT

Big Data applications suffer from unpredictable and unacceptably high pause times due to Garbage Collection (GC). This is the case in latency-sensitive applications such as on-line credit-card fraud detection, graph-based computing for analysis on social networks, etc. Such pauses compromise latency requirements of the whole application stack and result from applications' aggressive buffering/caching of data, exposing an ill-suited GC design, which assumes that most objects will die young and does not consider that applications hold large amounts of middle-lived data in memory.

To avoid such pauses, we propose NG2C, a new GC algorithm that combines pretenuring with user-defined dynamic generations. By being able to allocate objects into different generations, NG2C is able to group objects with similar lifetime profiles in the same generation. By allocating objects with similar lifetime profiles close to each other, i.e. in the same generation, we avoid object promotion (copying between generations) and heap fragmentation (which leads to heap compactions) both responsible for most of the duration of HotSpot GC pause times.

NG2C is implemented for the OpenJDK 8 HotSpot Java Virtual Machine, as an extension of the Garbage First GC. We evaluate NG2C using Cassandra, Lucene, and GraphChi with three different GCs: Garbage First (G1), Concurrent Mark Sweep (CMS), and NG2C. Results show that NG2C decreases the worst observable GC pause time by up to 94.8% for Cassandra, 85.0% for Lucene and 96.45% for GraphChi, when compared to current collectors (G1 and CMS). In addition, NG2c has no negative impact on application throughput or memory usage.

References

  1. A. W. Appel. Simple Generational Garbage Collection and Fast Allocation. Softw. Pract. Exper., 19(2):171–183, Feb. 1989. ISSN 0038-0644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. F. Bacon, P. Cheng, and V. T. Rajan. Controlling Fragmentation and Space Consumption in the Metronome, a Real-time Garbage Collector for Java. In Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool for Embedded Systems, LCTES ’03, pages 81–92, New York, NY, USA, 2003. ACM. ISBN 1-58113-647- 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. W. S. Beebee Jr and M. Rinard. An implementation of scoped memory for Real-Time Java. In International Workshop on Embedded Software, pages 289–305. Springer, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. M. Blackburn and K. S. McKinley. Immix: A Mark-region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’08, pages 22–32, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-860-2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. M. Blackburn, S. Singhai, M. Hertz, K. S. McKinely, and J. E. B. Moss. Pretenuring for Java. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’01, pages 342–352, New York, NY, USA, 2001. ACM. ISBN 1-58113-335-9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. M. Blackburn, R. Jones, K. S. McKinley, and J. E. B. Moss. Beltway: Getting Around Garbage Collection Gridlock. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 153–164, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. ISBN 1-58113-463-0.Google ScholarGoogle Scholar
  8. C. Boyapati, A. Salcianu, W. Beebee, Jr., and M. Rinard. Ownership Types for Safe Region-based Memory Management in Real-time Java. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI ’03, pages 324– 337, New York, NY, USA, 2003. ACM. ISBN 1-58113-662-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. Bu, V. Borkar, G. Xu, and M. J. Carey. A Bloat-aware Design for Big Data Applications. In Proceedings of the 2013 International Symposium on Memory Management, ISMM ’13, pages 119–130, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2100-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Caballero, G. Grieco, M. Marron, and A. Nappa. Undangle: Early Detection of Dangling Pointers in Use-after-free and Double-free Vulnerabilities. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pages 133–143, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1454-1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Cheng, R. Harper, and P. Lee. Generational Stack Collection and Profile-driven Pretenuring. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI ’98, pages 162–173, New York, NY, USA, 1998. ACM. ISBN 0-89791-987-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Clifford, H. Payer, M. Stanton, and B. L. Titzer. Memento Mori: Dynamic Allocation-site-based Optimizations. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 105–117, New York, NY, USA, 2015. ACM. ISBN 978-1-4503- 3589-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Cowan, P. Wagle, C. Pu, S. Beattie, and J. Walpole. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In DARPA Information Survivability Conference and Exposition, 2000. DISCEX’00. Proceedings, volume 2, pages 119–129. IEEE, 2000.Google ScholarGoogle Scholar
  14. J. Dean and L. A. Barroso. The Tail at Scale. Commun. ACM, 56(2): 74–80, Feb. 2013. ISSN 0001-0782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Detlefs, C. Flood, S. Heller, and T. Printezis. Garbage-first Garbage Collection. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 37–48, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Doligez and X. Leroy. A Concurrent, Generational Garbage Collector for a Multithreaded Implementation of ML. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’93, pages 113–123, New York, NY, USA, 1993. ACM. ISBN 0-89791-560-7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Gay and A. Aiken. Language Support for Regions. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, PLDI ’01, pages 70–80, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. ISBN 1-58113-414-2.Google ScholarGoogle Scholar
  19. D. Gay and B. Steensgaard. Fast Escape Analysis and Stack Allocation for Object-Based Programs. In Proceedings of the 9th International Conference on Compiler Construction, CC ’00, pages 82–93, London, UK, UK, 2000. Springer-Verlag. ISBN 3-540-67263-X. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 229–240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. I. Gog, J. Giceva, M. Schwarzkopf, K. Vaswani, D. Vytiniotis, G. Ramalingan, D. Murray, S. Hand, and M. Isard. Broom: Sweeping out Garbage Collection from Big Data Systems. In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, HOTOS’15, pages 2–2, Berkeley, CA, USA, 2015. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based Memory Management in Cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 282–293, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Z. Guyer and K. S. McKinley. Finding Your Cronies: Static Analysis for Dynamic Object Colocation. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’04, pages 237– 250, New York, NY, USA, 2004. ACM. ISBN 1-58113-831-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. N. Hallenberg, M. Elsman, and M. Tofte. Combining Region Inference and Garbage Collection. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 141–152, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. T. L. Harris. Dynamic Adaptive Pre-tenuring. In Proceedings of the 2nd International Symposium on Memory Management, ISMM ’00, pages 127–136, New York, NY, USA, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. ACM. ISBN 1-58113-263-8.Google ScholarGoogle Scholar
  27. M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Experience with Safe Manual Memory-management in Cyclone. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 73–84, New York, NY, USA, 2004. ACM. ISBN 1-58113-945- 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. L. Hudson and J. E. B. Moss. Incremental collection of mature objects. In Memory Management, pages 388–403. Springer, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. R. L. Hudson, R. Morrison, J. E. B. Moss, and D. S. Munro. Garbage Collecting the World: One Car at a Time. In Proceedings of the 12th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’97, pages 162– 175, New York, NY, USA, 1997. ACM. ISBN 0-89791-908-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Jones and C. Ryder. Garbage collection should be lifetime aware. Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, pages 182–196, 2006.Google ScholarGoogle Scholar
  31. R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. CRC Press, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. R. E. Jones and C. Ryder. A Study of Java Object Demographics. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 121–130, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Jump, S. M. Blackburn, and K. S. McKinley. Dynamic Object Sampling for Pretenuring. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 152– 162, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Kowshik, D. Dhurjati, and V. Adve. Ensuring Code Safety Without Runtime Checks for Real-time Control Systems. In Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’02, pages 288– 297, New York, NY, USA, 2002. ACM. ISBN 1-58113-575-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a Social Network or a News Media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-799-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale Graph Computation on Just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 31–46, Berkeley, CA, USA, 2012. USENIX Association. ISBN 978-1-931971-96-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., 44(2):35–40, Apr. 2010. ISSN 0163-5980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. P. Lengauer and H. Mössenböck. The Taming of the Shrew: Increasing Performance by Automatic Parameter Tuning for Java Garbage Collectors. In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, ICPE ’14, pages 111–122, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2733-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. H. Lieberman and C. Hewitt. A Real-time Garbage Collector Based on the Lifetimes of Objects. Communications of the ACM, 26(6):419– 429, June 1983. ISSN 0001-0782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. L. Lu, X. Shi, Y. Zhou, X. Zhang, H. Jin, C. Pei, L. He, and Y. Geng. Lifetime-based Memory Management for Distributed Data Processing Systems. Proc. VLDB Endow., 9(12):936–947, Aug. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. ISSN 2150-8097.Google ScholarGoogle Scholar
  42. S. Marion, R. Jones, and C. Ryder. Decrypting the Java Gene Pool. In Proceedings of the 6th International Symposium on Memory Management, ISMM ’07, pages 67–78, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-893-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. S. Marlow, T. Harris, R. P. James, and S. Peyton Jones. Parallel Generational-copying Garbage Collection with a Block-structured Heap. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 11–20, New York, NY, USA, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. ACM. ISBN 978-1-60558-134-7.Google ScholarGoogle Scholar
  45. L. Mastrangelo, L. Ponzanelli, A. Mocci, M. Lanza, M. Hauswirth, and N. Nystrom. Use at Your Own Risk: The Java Unsafe API in the Wild. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 695–710, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. McCandless, E. Hatcher, and O. Gospodnetic. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA, 2010. ISBN 1933988177, 9781933988177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, and G. Xu. FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pages 675–690, New York, NY, USA, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. ACM. ISBN 978-1-4503-2835-7.Google ScholarGoogle Scholar
  49. K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, and O. Mutlu. Yak: A High-performance Big-data-friendly Garbage Collector. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, pages 349– 365, Berkeley, CA, USA, 2016. USENIX Association. ISBN 978- 1-931971-33-1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast Crash Recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 29–41, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0977-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B.-G. Chun. Making Sense of Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pages 293–307, Berkeley, CA, USA, 2015. USENIX Association. ISBN 978-1-931971-218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. F. Pizlo, L. Ziarek, and J. Vitek. Real Time Java on Resourceconstrained Platforms with Fiji VM. In Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems, JTRES ’09, pages 110–119, New York, NY, USA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. ACM. ISBN 978-1-60558-732-5.Google ScholarGoogle Scholar
  54. R. Power and J. Li. Piccolo: Building Fast, Distributed Programs with Partitioned Tables. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pages 293– 306, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, and J. K. Ousterhout. It’s Time for Low Latency. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, HotOS’13, pages 11–11, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. J. Seligmann and S. Grarup. Incremental mature garbage collection using the train algorithm. In European Conference on Object-Oriented Programming, pages 235–252. Springer, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. A. Shinnar, D. Cunningham, V. Saraswat, and B. Herta. M3R: Increased Performance for In-memory Hadoop Jobs. Proc. VLDB Endow., 5(12):1736–1747, Aug. 2012. ISSN 2150-8097. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. C. Stancu, C. Wimmer, S. Brunthaler, P. Larsen, and M. Franz. Safe and Efficient Hybrid Memory Management for Java. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 81–92, New York, NY, USA, 2015. ACM. ISBN 978-1- 4503-3589-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. G. Tene, B. Iyengar, and M. Wolf. C4: The Continuously Concurrent Compacting Collector. In Proceedings of the International Symposium on Memory Management, ISMM ’11, pages 79–88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. D. Ungar. Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, SDE 1, pages 157–167, New York, NY, USA, 1984. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. ISBN 0-89791-131-8.Google ScholarGoogle Scholar
  62. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, pages 2–2, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge & Data Engineering, 27(7):1920–1948, 2015. ISSN 1041-4347. doi:Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    ISMM 2017: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management
    June 2017
    127 pages
    ISBN:9781450350440
    DOI:10.1145/3092255

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 18 June 2017

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate72of156submissions,46%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader