research-article

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines

Authors:
Lokesh Gidra

LIP6-UPMC/INRIA, Paris, France

LIP6-UPMC/INRIA, Paris, France
View Profile

,
Gaël Thomas

SAMOVAR-Telecom SudParis, Every, France

SAMOVAR-Telecom SudParis, Every, France
View Profile

,
Julien Sopena

LIP6-UPMC/INRIA, Paris, France

LIP6-UPMC/INRIA, Paris, France
View Profile

,
Marc Shapiro

LIP6-INRIA/UPMC, Paris, France

LIP6-INRIA/UPMC, Paris, France
View Profile

,
Nhan Nguyen

Chalmers university of technology, Gothenburg, Sweden

Chalmers university of technology, Gothenburg, Sweden
View Profile

Authors Info & Claims

ACM SIGARCH Computer Architecture News Volume 43 Issue 1March 2015pp 661–673https://doi.org/10.1145/2786763.2694361

Published:14 March 2015Publication History

ACM SIGARCH Computer Architecture News

Abstract

On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory accesses, saturating the interconnect between memory nodes. We address this problem with NumaGiC, a GC with a mostly-distributed design. In order to maximise memory access locality during collection, a GC thread avoids accessing a different memory node, instead notifying a remote GC thread with a message; nonetheless, NumaGiC avoids the drawbacks of a pure distributed design, which tends to decrease parallelism. We compare NumaGiC with Parallel Scavenge and NAPS on two different ccNUMA architectures running on the Hotspot Java Virtual Machine of OpenJDK 7. On Spark and Neo4j, two industry-strength analytics applications, with heap sizes ranging from 160GB to 350GB, and on SPECjbb2013 and SPECjbb2005, ourgc improves overall performance by up to 45% over NAPS (up to 94% over Parallel Scavenge), and increases the performance of the collector itself by up to 3.6x over NAPS (up to 5.4x over Parallel Scavenge).

References

T. A. Anderson. Optimizations in a private nursery-based garbage collector. In ISMM '10, pages 21--30. ACM, 2010. Google ScholarDigital Library
A. W. Appel. Simple generational garbage collection and fast allocation. SP&E, 19(2):171--183, 1989. Google ScholarDigital Library
A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schupbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP '09, pages 29--44. ACM, 2009. Google ScholarDigital Library
S. M. Blackburn, R. Garner, C. Hoffmann, A. M. Khang, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanovic, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06, pages 169--190. ACM, 2006. Google ScholarDigital Library
K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. TOCS, 3(1):63--75, 1985. Google ScholarDigital Library
M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize, B. Lepers, V. Quema, and M. Roth. Traffic management: A holistic approach to memory placement on NUMA systems. In ASPLOS '13, pages 381--394. ACM, 2013. Google ScholarDigital Library
D. Doligez and X. Leroy. A concurrent, generational garbage collector for a multithreaded implementation of ml. In POPL '93, pages 113--123. ACM, 1993. Google ScholarDigital Library
Friendster. SNAP: network datasets: Friendster social network. http://snap.stanford.edu/data/com-Friendster.html, 2014.Google Scholar
L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. Assessing the scalability of garbage collectors on many cores. In SOSP Workshop on Programming Languages and Operating Systems, PLOS '11, pages 1--5. ACM, 2011. Google ScholarDigital Library
L. Gidra, G. Thomas, J. Sopena, and M. Shapiro. A study of the scalability of stop-the-world garbage collectors on multicores. In ASPLOS '13, pages 229--240. ACM, 2013. Google ScholarDigital Library
H2. H2 database engine. http://www.h2database.com/, 2014.Google Scholar
M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. Google ScholarDigital Library
R. Jones, A. Hosking, and E. Moss. The garbage collection handbook: the art of automatic memory management. Chapman & Hall/CRC, 1st edition, 2011. Google ScholarDigital Library
H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. CACM, 26(6):419--429, 1983. Google ScholarDigital Library
LinuxMemPolicy. What is Linux memory policy? http://www.kernel.org/doc/Documentation/vm/numa_memory_policy.txt, 2014.Google Scholar
S. Marlow and S. Peyton Jones. Multicore garbage collection with local heaps. In ISMM '11, pages 21--32. ACM, 2011. Google ScholarDigital Library
Neo4j. Neo4j -- the world's leading graph database. http://www.neo4j.org, 2014.Google Scholar
E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt. Helios: heterogeneous multiprocessing with satellite kernels. In SOSP '09, pages 221--234. ACM, 2009. Google ScholarDigital Library
T. Ogasawara. NUMA-aware memory manager with dominant-thread-based copying GC. In OOPSLA '09, pages 377--390. ACM, 2009. Google ScholarDigital Library
K. Sivaramakrishnan, L. Ziarek, and S. Jagannathan. Eliminating read barriers through procrastination and cleanliness. In ISMM '12, pages 49--60. ACM, 2012. Google ScholarDigital Library
P. Sobalvarro. A lifetime-based garbage collector for LISP systems on general-purpose computers. Technical report, Cam- bridge, MA, USA, 1988. Google ScholarDigital Library
X. Song, H. Chen, R. Chen, Y. Wang, and B. Zang. A case for scaling applications to many-core with OS clustering. In EuroSys '11, pages 61--76. ACM, 2011. Google ScholarDigital Library
Spark. Apache Spark-- lightning-fast cluster computing. http://spark.apache.org, 2014.Google Scholar
SPECjbb2005. SPECjbb2005 home page. http://www.spec.org/jbb2005/, 2014.Google Scholar
SPECjbb2013. SPECjbb2013 home page. http://www.spec.org/jbb2013/, 2014.Google Scholar
I. Stanton and G. Kliot. Streaming graph partitioning for large distributed graphs. In KDD '12, pages 1222--1230. ACM, 2012. Google ScholarDigital Library
B. Steensgaard. Thread-specific heaps for multi-threaded programs. In ISMM '00, pages 18--24. ACM, 2000. Google ScholarDigital Library
M. M. Tikir and J. K. Hollingsworth. NUMA-aware Java heaps for server applications. In IPDPS '05, pages 108--117. IEEE Computer Society, 2005. Google ScholarDigital Library
D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SDE '84, pages 157--167. ACM, 1984. Google ScholarDigital Library
P. R. Wilson and T. G. Moher. A "card-marking" scheme for controlling intergenerational references in generation-based garbage collection on stock hardware. SIGPLAN Notice, 24 (5):87--92, 1989. Google ScholarDigital Library
J. Zhou and B. Demsky. Memory management for many-core processors with software configurable locality policies. In ISMM '12, pages 3--14. ACM, 2012. Google ScholarDigital Library

Index Terms

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Recommendations

A study of the scalability of stop-the-world garbage collectors on multicores
ASPLOS '13

Large-scale multicore architectures create new challenges for garbage collectors (GCs). In particular, throughput-oriented stop-the-world algorithms demonstrate good performance with a small number of cores, but have been shown to degrade badly beyond ...
Read More
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems

On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...
Read More
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
ASPLOS '15

On contemporary cache-coherent Non-Uniform Memory Access (ccNUMA) architectures, applications with a large memory footprint suffer from the cost of the garbage collector (GC), because, as the GC scans the reference graph, it makes many remote memory ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGARCH Computer Architecture News Volume 43, Issue 1
ASPLOS'15
March 2015
676 pages
ISSN:0163-5964
DOI:10.1145/2786763
Editor:
Doug DeGroot
acm dot org
Issue’s Table of Contents
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2015
720 pages
ISBN:9781450328357
DOI:10.1145/2694344
General Chairs:
Ozcan Ozturk
Bilkent University, Turkey
,
Kemal Ebcioglu
Global Supercomputing, USA
,
Program Chair:
Sandhya Dwarkadas
University of Rochester, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 March 2015
Check for updates
Author Tags
NUMA
garbage collection
multicore
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 57
  Total Citations
  View Citations
- 836
  Total Downloads
- Downloads (Last 12 months)25
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines

ACM SIGARCH Computer Architecture News

Abstract

References

Cited By

Index Terms

Recommendations

A study of the scalability of stop-the-world garbage collectors on multicores

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines

NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines