Abstract
On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, ourgc improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).
- T. A. Anderson. Optimizations in a private nursery-based garbage collector. In ISMM '10, pages 21--30. ACM, 2010. Google ScholarDigital Library
- A. W. Appel. Simple generational garbage collection and fast allocation. SP&E, 19(2):171--183, 1989. Google ScholarDigital Library
- A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schupbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP '09, pages 29--44. ACM, 2009. Google ScholarDigital Library
- S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06, pages 169--190. ACM, 2006. Google ScholarDigital Library
- K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. TOCS, 3(1):63--75, 1985. Google ScholarDigital Library
- M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize, B. Lepers, V. Quema, and M. Roth. Traffic management: A holistic approach to memory placement on NUMA systems. In ASPLOS '13, pages 381--394. ACM, 2013. Google ScholarDigital Library
- D. Doligez and X. Leroy. A concurrent, generational garbage collector for a multithreaded implementation of ml. In POPL '93, pages 113--123. ACM, 1993. Google ScholarDigital Library
- Friendster. SNAP: network datasets: Friendster social network. http://snap.stanford.edu/data/com-Friendster.html, 2014.Google Scholar
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. Assessing the scalability of garbage collectors on many cores. In SOSP Workshop on Programming Languages and Operating Systems, PLOS '11, pages 1--5. ACM, 2011. Google ScholarDigital Library
- L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In ASPLOS '13, pages 229--240. ACM, 2013. Google ScholarDigital Library
- H2. H2 database engine. http://www.h2database.com/, 2014.Google Scholar
- M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. Google ScholarDigital Library
- R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. Chapman & Hall/CRC, 1st edition, 2011. Google ScholarDigital Library
- H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. CACM, 26(6):419--429, 1983. Google ScholarDigital Library
- LinuxMemPolicy. What is Linux memory policy? http://www.kernel.org/doc/Documentation/vm/numa_memory_policy.txt, 2014.Google Scholar
- S. Marlow and S. Peyton Jones. Multicore garbage collection with local heaps. In ISMM '11, pages 21--32. ACM, 2011. Google ScholarDigital Library
- Neo4j. Neo4j -- the world's leading graph database. http://www.neo4j.org, 2014.Google Scholar
- E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. In SOSP '09, pages 221--234. ACM, 2009. Google ScholarDigital Library
- T. Ogasawara. NUMA-aware memory manager with dominant-thread-based copying GC. In OOPSLA '09, pages 377--390. ACM, 2009. Google ScholarDigital Library
- K. Sivaramakrishnan, L. Ziarek, and S. Jagannathan. Eliminating read barriers through procrastination and cleanliness. In ISMM '12, pages 49--60. ACM, 2012. Google ScholarDigital Library
- P. Sobalvarro. A lifetime-based garbage collector for LISP systems on general-purpose computers. Technical report, Cam- bridge, MA, USA, 1988. Google ScholarDigital Library
- X. Song, H. Chen, R. Chen, Y. Wang, and B. Zang. A case for scaling applications to many-core with OS clustering. In EuroSys '11, pages 61--76. ACM, 2011. Google ScholarDigital Library
- Spark. Apache Spark-- lightning-fast cluster computing. http://spark.apache.org, 2014.Google Scholar
- SPECjbb2005. SPECjbb2005 home page. http://www.spec.org/jbb2005/, 2014.Google Scholar
- SPECjbb2013. SPECjbb2013 home page. http://www.spec.org/jbb2013/, 2014.Google Scholar
- I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD '12, pages 1222--1230. ACM, 2012. Google ScholarDigital Library
- B. Steensgaard. Thread-specific heaps for multi-threaded programs. In ISMM '00, pages 18--24. ACM, 2000. Google ScholarDigital Library
- M. M. Tikir and J. K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05, pages 108--117. IEEE Computer Society, 2005. Google ScholarDigital Library
- D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE '84, pages 157--167. ACM, 1984. Google ScholarDigital Library
- P. R. Wilson and T. G. Moher. A "card-marking" scheme for controlling intergenerational references in generation-based garbage collection on stock hardware. SIGPLAN Notice, 24 (5):87--92, 1989. Google ScholarDigital Library
- J. Zhou and B. Demsky. Memory management for many-core processors with software configurable locality policies. In ISMM '12, pages 3--14. ACM, 2012. Google ScholarDigital Library
Index Terms
- NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
Recommendations
A study of the scalability of stop-the-world garbage collectors on multicores
ASPLOS '13Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms demonstrate good performance with a small number of cores, but have been shown to degrade badly beyond ...
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating SystemsOn contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...
Comments