ABSTRACT
Big Data applications suffer from unpredictable and unacceptably high pause times due to Garbage Collection (GC). This is the case in latency-sensitive applications such as on-line credit-card fraud detection, graph-based computing for analysis on social networks, etc. Such pauses compromise latency requirements of the whole application stack and result from applications' aggressive buffering/caching of data, exposing an ill-suited GC design, which assumes that most objects will die young and does not consider that applications hold large amounts of middle-lived data in memory.
To avoid such pauses, we propose NG2C, a new GC algorithm that combines pretenuring with user-defined dynamic generations. By being able to allocate objects into different generations, NG2C is able to group objects with similar lifetime profiles in the same generation. By allocating objects with similar lifetime profiles close to each other, i.e. in the same generation, we avoid object promotion (copying between generations) and heap fragmentation (which leads to heap compactions) both responsible for most of the duration of HotSpot GC pause times.
NG2C is implemented for the OpenJDK 8 HotSpot Java Virtual Machine, as an extension of the Garbage First GC. We evaluate NG2C using Cassandra, Lucene, and GraphChi with three different GCs: Garbage First (G1), Concurrent Mark Sweep (CMS), and NG2C. Results show that NG2C decreases the worst observable GC pause time by up to 94.8% for Cassandra, 85.0% for Lucene and 96.45% for GraphChi, when compared to current collectors (G1 and CMS). In addition, NG2c has no negative impact on application throughput or memory usage.
- A. W. Appel. Simple Generational Garbage Collection and Fast Allocation. Softw. Pract. Exper., 19(2):171–183, Feb. 1989. ISSN 0038-0644. Google ScholarDigital Library
- D. F. Bacon, P. Cheng, and V. T. Rajan. Controlling Fragmentation and Space Consumption in the Metronome, a Real-time Garbage Collector for Java. In Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tool for Embedded Systems, LCTES ’03, pages 81–92, New York, NY, USA, 2003. ACM. ISBN 1-58113-647- 1. Google ScholarDigital Library
- W. S. Beebee Jr and M. Rinard. An implementation of scoped memory for Real-Time Java. In International Workshop on Embedded Software, pages 289–305. Springer, 2001. Google ScholarDigital Library
- S. M. Blackburn and K. S. McKinley. Immix: A Mark-region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’08, pages 22–32, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-860-2. Google ScholarDigital Library
- S. M. Blackburn, S. Singhai, M. Hertz, K. S. McKinely, and J. E. B. Moss. Pretenuring for Java. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’01, pages 342–352, New York, NY, USA, 2001. ACM. ISBN 1-58113-335-9. Google ScholarDigital Library
- S. M. Blackburn, R. Jones, K. S. McKinley, and J. E. B. Moss. Beltway: Getting Around Garbage Collection Gridlock. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 153–164, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- ISBN 1-58113-463-0.Google Scholar
- C. Boyapati, A. Salcianu, W. Beebee, Jr., and M. Rinard. Ownership Types for Safe Region-based Memory Management in Real-time Java. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, PLDI ’03, pages 324– 337, New York, NY, USA, 2003. ACM. ISBN 1-58113-662-5. Google ScholarDigital Library
- Y. Bu, V. Borkar, G. Xu, and M. J. Carey. A Bloat-aware Design for Big Data Applications. In Proceedings of the 2013 International Symposium on Memory Management, ISMM ’13, pages 119–130, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2100-6. Google ScholarDigital Library
- J. Caballero, G. Grieco, M. Marron, and A. Nappa. Undangle: Early Detection of Dangling Pointers in Use-after-free and Double-free Vulnerabilities. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pages 133–143, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1454-1. Google ScholarDigital Library
- P. Cheng, R. Harper, and P. Lee. Generational Stack Collection and Profile-driven Pretenuring. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI ’98, pages 162–173, New York, NY, USA, 1998. ACM. ISBN 0-89791-987-4. Google ScholarDigital Library
- D. Clifford, H. Payer, M. Stanton, and B. L. Titzer. Memento Mori: Dynamic Allocation-site-based Optimizations. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 105–117, New York, NY, USA, 2015. ACM. ISBN 978-1-4503- 3589-8. Google ScholarDigital Library
- C. Cowan, P. Wagle, C. Pu, S. Beattie, and J. Walpole. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In DARPA Information Survivability Conference and Exposition, 2000. DISCEX’00. Proceedings, volume 2, pages 119–129. IEEE, 2000.Google Scholar
- J. Dean and L. A. Barroso. The Tail at Scale. Commun. ACM, 56(2): 74–80, Feb. 2013. ISSN 0001-0782. Google ScholarDigital Library
- D. Detlefs, C. Flood, S. Heller, and T. Printezis. Garbage-first Garbage Collection. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 37–48, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4. Google ScholarDigital Library
- D. Doligez and X. Leroy. A Concurrent, Generational Garbage Collector for a Multithreaded Implementation of ML. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’93, pages 113–123, New York, NY, USA, 1993. ACM. ISBN 0-89791-560-7. Google ScholarDigital Library
- D. Gay and A. Aiken. Language Support for Regions. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, PLDI ’01, pages 70–80, New York, NY, USA, 2001. ACM. Google ScholarDigital Library
- ISBN 1-58113-414-2.Google Scholar
- D. Gay and B. Steensgaard. Fast Escape Analysis and Stack Allocation for Object-Based Programs. In Proceedings of the 9th International Conference on Compiler Construction, CC ’00, pages 82–93, London, UK, UK, 2000. Springer-Verlag. ISBN 3-540-67263-X. Google ScholarDigital Library
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A Study of the Scalability of Stop-the-world Garbage Collectors on Multicores. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 229–240, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1870-9. Google ScholarDigital Library
- I. Gog, J. Giceva, M. Schwarzkopf, K. Vaswani, D. Vytiniotis, G. Ramalingan, D. Murray, S. Hand, and M. Isard. Broom: Sweeping out Garbage Collection from Big Data Systems. In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, HOTOS’15, pages 2–2, Berkeley, CA, USA, 2015. USENIX Association. Google ScholarDigital Library
- D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based Memory Management in Cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 282–293, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0. Google ScholarDigital Library
- S. Z. Guyer and K. S. McKinley. Finding Your Cronies: Static Analysis for Dynamic Object Colocation. In Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’04, pages 237– 250, New York, NY, USA, 2004. ACM. ISBN 1-58113-831-8. Google ScholarDigital Library
- N. Hallenberg, M. Elsman, and M. Tofte. Combining Region Inference and Garbage Collection. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pages 141–152, New York, NY, USA, 2002. ACM. ISBN 1-58113-463-0. Google ScholarDigital Library
- T. L. Harris. Dynamic Adaptive Pre-tenuring. In Proceedings of the 2nd International Symposium on Memory Management, ISMM ’00, pages 127–136, New York, NY, USA, 2000. Google ScholarDigital Library
- ACM. ISBN 1-58113-263-8.Google Scholar
- M. Hicks, G. Morrisett, D. Grossman, and T. Jim. Experience with Safe Manual Memory-management in Cyclone. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 73–84, New York, NY, USA, 2004. ACM. ISBN 1-58113-945- 4. Google ScholarDigital Library
- R. L. Hudson and J. E. B. Moss. Incremental collection of mature objects. In Memory Management, pages 388–403. Springer, 1992. Google ScholarDigital Library
- R. L. Hudson, R. Morrison, J. E. B. Moss, and D. S. Munro. Garbage Collecting the World: One Car at a Time. In Proceedings of the 12th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’97, pages 162– 175, New York, NY, USA, 1997. ACM. ISBN 0-89791-908-4. Google ScholarDigital Library
- R. Jones and C. Ryder. Garbage collection should be lifetime aware. Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, pages 182–196, 2006.Google Scholar
- R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. CRC Press, 2016. Google ScholarDigital Library
- R. E. Jones and C. Ryder. A Study of Java Object Demographics. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 121–130, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-134-7. Google ScholarDigital Library
- M. Jump, S. M. Blackburn, and K. S. McKinley. Dynamic Object Sampling for Pretenuring. In Proceedings of the 4th International Symposium on Memory Management, ISMM ’04, pages 152– 162, New York, NY, USA, 2004. ACM. ISBN 1-58113-945-4. Google ScholarDigital Library
- S. Kowshik, D. Dhurjati, and V. Adve. Ensuring Code Safety Without Runtime Checks for Real-time Control Systems. In Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, CASES ’02, pages 288– 297, New York, NY, USA, 2002. ACM. ISBN 1-58113-575-0. Google ScholarDigital Library
- H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a Social Network or a News Media? In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 591–600, New York, NY, USA, 2010. ACM. ISBN 978-1-60558-799-8. Google ScholarDigital Library
- A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale Graph Computation on Just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 31–46, Berkeley, CA, USA, 2012. USENIX Association. ISBN 978-1-931971-96-6. Google ScholarDigital Library
- A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System. SIGOPS Oper. Syst. Rev., 44(2):35–40, Apr. 2010. ISSN 0163-5980. Google ScholarDigital Library
- P. Lengauer and H. Mössenböck. The Taming of the Shrew: Increasing Performance by Automatic Parameter Tuning for Java Garbage Collectors. In Proceedings of the 5th ACM/SPEC International Conference on Performance Engineering, ICPE ’14, pages 111–122, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2733-6. Google ScholarDigital Library
- H. Lieberman and C. Hewitt. A Real-time Garbage Collector Based on the Lifetimes of Objects. Communications of the ACM, 26(6):419– 429, June 1983. ISSN 0001-0782. Google ScholarDigital Library
- L. Lu, X. Shi, Y. Zhou, X. Zhang, H. Jin, C. Pei, L. He, and Y. Geng. Lifetime-based Memory Management for Distributed Data Processing Systems. Proc. VLDB Endow., 9(12):936–947, Aug. 2016. Google ScholarDigital Library
- ISSN 2150-8097.Google Scholar
- S. Marion, R. Jones, and C. Ryder. Decrypting the Java Gene Pool. In Proceedings of the 6th International Symposium on Memory Management, ISMM ’07, pages 67–78, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-893-0. Google ScholarDigital Library
- S. Marlow, T. Harris, R. P. James, and S. Peyton Jones. Parallel Generational-copying Garbage Collection with a Block-structured Heap. In Proceedings of the 7th International Symposium on Memory Management, ISMM ’08, pages 11–20, New York, NY, USA, 2008. Google ScholarDigital Library
- ACM. ISBN 978-1-60558-134-7.Google Scholar
- L. Mastrangelo, L. Ponzanelli, A. Mocci, M. Lanza, M. Hauswirth, and N. Nystrom. Use at Your Own Risk: The Java Unsafe API in the Wild. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, pages 695–710, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3689-5. Google ScholarDigital Library
- M. McCandless, E. Hatcher, and O. Gospodnetic. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA, 2010. ISBN 1933988177, 9781933988177. Google ScholarDigital Library
- K. Nguyen, K. Wang, Y. Bu, L. Fang, J. Hu, and G. Xu. FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pages 675–690, New York, NY, USA, 2015. Google ScholarDigital Library
- ACM. ISBN 978-1-4503-2835-7.Google Scholar
- K. Nguyen, L. Fang, G. Xu, B. Demsky, S. Lu, S. Alamian, and O. Mutlu. Yak: A High-performance Big-data-friendly Garbage Collector. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, pages 349– 365, Berkeley, CA, USA, 2016. USENIX Association. ISBN 978- 1-931971-33-1. Google ScholarDigital Library
- D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast Crash Recovery in RAMCloud. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 29–41, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0977-6. Google ScholarDigital Library
- K. Ousterhout, R. Rasti, S. Ratnasamy, S. Shenker, and B.-G. Chun. Making Sense of Performance in Data Analytics Frameworks. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation, NSDI’15, pages 293–307, Berkeley, CA, USA, 2015. USENIX Association. ISBN 978-1-931971-218. Google ScholarDigital Library
- F. Pizlo, L. Ziarek, and J. Vitek. Real Time Java on Resourceconstrained Platforms with Fiji VM. In Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems, JTRES ’09, pages 110–119, New York, NY, USA, 2009. Google ScholarDigital Library
- ACM. ISBN 978-1-60558-732-5.Google Scholar
- R. Power and J. Li. Piccolo: Building Fast, Distributed Programs with Partitioned Tables. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pages 293– 306, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarDigital Library
- S. M. Rumble, D. Ongaro, R. Stutsman, M. Rosenblum, and J. K. Ousterhout. It’s Time for Low Latency. In Proceedings of the 13th USENIX Conference on Hot Topics in Operating Systems, HotOS’13, pages 11–11, Berkeley, CA, USA, 2011. USENIX Association. Google ScholarDigital Library
- J. Seligmann and S. Grarup. Incremental mature garbage collection using the train algorithm. In European Conference on Object-Oriented Programming, pages 235–252. Springer, 1995. Google ScholarDigital Library
- A. Shinnar, D. Cunningham, V. Saraswat, and B. Herta. M3R: Increased Performance for In-memory Hadoop Jobs. Proc. VLDB Endow., 5(12):1736–1747, Aug. 2012. ISSN 2150-8097. Google ScholarDigital Library
- C. Stancu, C. Wimmer, S. Brunthaler, P. Larsen, and M. Franz. Safe and Efficient Hybrid Memory Management for Java. In Proceedings of the 2015 International Symposium on Memory Management, ISMM ’15, pages 81–92, New York, NY, USA, 2015. ACM. ISBN 978-1- 4503-3589-8. Google ScholarDigital Library
- G. Tene, B. Iyengar, and M. Wolf. C4: The Continuously Concurrent Compacting Collector. In Proceedings of the International Symposium on Memory Management, ISMM ’11, pages 79–88, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0263-0. Google ScholarDigital Library
- D. Ungar. Generation Scavenging: A Non-disruptive High Performance Storage Reclamation Algorithm. In Proceedings of the First ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, SDE 1, pages 157–167, New York, NY, USA, 1984. ACM. Google ScholarDigital Library
- ISBN 0-89791-131-8.Google Scholar
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. In Proceedings of the 2Nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, pages 10–10, Berkeley, CA, USA, 2010. USENIX Association. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI’12, pages 2–2, Berkeley, CA, USA, 2012. USENIX Association. Google ScholarDigital Library
- H. Zhang, G. Chen, B. C. Ooi, K.-L. Tan, and M. Zhang. In-Memory Big Data Management and Processing: A Survey. IEEE Transactions on Knowledge & Data Engineering, 27(7):1920–1948, 2015. ISSN 1041-4347. doi:Google ScholarCross Ref
Index Terms
- NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications
Recommendations
NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications
ISMM '17Big Data applications suffer from unpredictable and unacceptably high pause times due to Garbage Collection (GC). This is the case in latency-sensitive applications such as on-line credit-card fraud detection, graph-based computing for analysis on ...
Age-based garbage collection
Modern generational garbage collectors look for garbage among the young objects, because they have high mortality; however, these objects include the very youngest objects, which clearly are still live. We introduce new garbage collection algorithms, ...
Age-based garbage collection
OOPSLA '99: Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applicationsModern generational garbage collectors look for garbage among the young objects, because they have high mortality; however, these objects include the very youngest objects, which clearly are still live. We introduce new garbage collection algorithms, ...
Comments