research-article

RadixVM: scalable address spaces for multithreaded applications

Authors:
Austin T. Clements

MIT CSAIL

MIT CSAIL
View Profile

,
M. Frans Kaashoek

MIT CSAIL

MIT CSAIL
View Profile

,
Nickolai Zeldovich

MIT CSAIL

MIT CSAIL
View Profile

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer SystemsApril 2013Pages 211–224https://doi.org/10.1145/2465351.2465373

Published:15 April 2013Publication History

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

Pages 211–224

ABSTRACT

RadixVM is a new virtual memory system design that enables fully concurrent operations on shared address spaces for multithreaded processes on cache-coherent multicore computers. Today, most operating systems serialize operations such as mmap and munmap, which forces application developers to split their multithreaded applications into multiprocess applications, hoard memory to avoid the overhead of returning it, and so on. RadixVM removes this burden from application developers by ensuring that address space operations on non-overlapping memory regions scale perfectly. It does so by combining three techniques: 1) it organizes metadata in a radix tree instead of a balanced tree to avoid unnecessary cache line movement; 2) it uses a novel memory-efficient distributed reference counting scheme; and 3) it uses a new scheme to target remote TLB shootdowns and to often avoid them altogether. Experiments on an 80 core machine show that RadixVM achieves perfect scalability for non-overlapping regions: if several threads mmap or munmap pages in parallel, they can run completely independently and induce no cache coherence traffic.

References

FreeBSD source code. http://www.freebsd.org/.Google Scholar
J. Appavoo, D. da Silva, O. Krieger, M. Auslander, M. Ostrowski, B. Rosenburg, A. Waterland, R. W. Wisniewski, J. Xenidis, M. Stumm, and L. Soares. Experience distributing objects in an SMMP OS. ACM Transactions on Computer Systems, 25(3), Aug. 2007. Google ScholarDigital Library
A. Baumann, P. Barham, P.-E. Dagand, T. Haris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, Oct. 2009. Google ScholarDigital Library
D. L. Black, R. F. Rashid, D. B. Golub, and C. R. Hill. Translation lookaside buffer consistency: a software approach. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 113--122, Boston, MA, Apr. 1989. Google ScholarDigital Library
S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: An operating system for many cores. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, Dec. 2008. Google ScholarDigital Library
S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, Canada, Oct. 2010. Google ScholarDigital Library
A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Concurrent address spaces using RCU balanced trees. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), London, UK, Mar. 2012. Google ScholarDigital Library
J. Corbet. The search for fast, scalable counters, May 2010. http://lwn.net/Articles/170003/.Google Scholar
R. Cox, M. F. Kaashoek, and R. Morris. Xv6, a simple Unix-like teaching operating system. http://pdos.csail.mit.edu/6.828/2012/xv6.html.Google Scholar
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarDigital Library
J. DeTreville. Experience with concurrent garbage collectors for Modula-2+. Technical Report 64, DEC Systems Research Center, Nov. 1990.Google Scholar
F. Ellen, Y. Lev, V. Luchango, and M. Moir. SNZI: Scalable nonzero indicators. In Proceedings of the 26th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Portland, OR, Aug. 2007. Google ScholarDigital Library
J. Evans. A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada, Apr. 2006.Google Scholar
B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI), pages 87--100, New Orleans, LA, Feb. 1999. Google ScholarDigital Library
S. Ghemawat. TCMalloc: Thread-caching malloc, 2007. http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html.Google Scholar
M. Herlihy and N. Shavit. The art of multiprocessor programming. Morgan Kaufmann, 2008. Google ScholarDigital Library
P. W. Howard and J. Walpole. Relativistic red-black trees. http://web.cecs.pdx.edu/~walpole/papers/ccpe2011.pdf.Google Scholar
P. W. Howard and J. Walpole. Relativistic red-black trees. Technical Report 10-06, Portland State University, Computer Science Department, 2010.Google Scholar
ISO. ISO/IEC 14882:2011(E): Information technology -- Programming languages -- C++. ISO, Geneva, Switzerland, 2011.Google Scholar
O. Krieger, M. Auslander, B. Rosenburg, R. W. Wisniewski, J. Xenidis, D. da Silva, M. Ostrowski, J. Appavoo, M. Butrico, M. Mergen, A. Waterland, and V. Uhlig. K42: Building a complete operating system. In Proceedings of the ACM EuroSys Conference, Leuven, Belgium, Apr. 2006. Google ScholarDigital Library
R. Liu and H. Chen. SSMalloc: A low-latency, locality-conscious memory allocator with stable performance scalability. In Proceedings of the 3rd Asia-Pacific Workshop on Systems, Seoul, South Korea, July 2012. Google ScholarDigital Library
Y. Mao, R. Morris, and M. F. Kaashoek. Optimizing MapReduce for multicore architectures. Technical Report MIT-CSAIL-TR-2010-020, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, May 2010.Google Scholar
P. McKenney. Hierarchical RCU, Nov. 2008. https://lwn.net/Articles/305782/.Google Scholar
Microsoft Corp. Windows research kernel. http://www.microsoft.com/resources/sharedsource/windowsacademic/researchkernelkit.mspx.Google Scholar
S. Schneider, C. D. Antonopoulos, and D. S. Nikolopoulos. Scalable locality-conscious multithreaded memory allocation. In Proceedings of the 2006 ACM SIGPLAN International Symposium on Memory Management, Ottawa, Canada, June 2006. Google ScholarDigital Library
G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. SIGPLAN Notices, 46 (11):79--88, June 2011. Google ScholarDigital Library
L. Torvalds et al. Linux source code. http://www.kernel.org/.Google Scholar
V. Uhlig. The mechanics of in-kernel synchronization for a scalable microkernel. ACM SIGOPS Operating Systems Review, 41(4):49--58, July 2007. Google ScholarDigital Library
L. Wang. Windows 7 memory management, Nov. 2009. http://download.microsoft.com/download/7/E/7/7E7662CF-CBEA-470B-A97E-CE7CE0D98DC2/mmwin7.pptx.Google Scholar
D. Wentzlaff and A. Agarwal. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review, 43(2):76--85, 2009. Google ScholarDigital Library

Index Terms

RadixVM: scalable address spaces for multithreaded applications

Recommendations

Design and Optimization of Large Size and Low Overhead Off-Chip Caches

Large off-chip L3 caches can significantly improve the performance of memory-intensive applications. However, conventional L3 SRAM caches are facing two issues as those applications require increasingly large caches. First, an SRAM cache has a limited ...
Read More
TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs

Translation Lookaside Buffers (TLBs) are critical to overall system performance. Much past research has addressed uniprocessor TLBs, lowering access times and miss rates. However, as Chip MultiProcessors (CMPs) become ubiquitous, TLB design and ...
Read More
SELECTIVE VICTIM CACHING: A METHOD TO IMPROVE THE PERFORMANCE OF DIRECT-MAPPED CACHES
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN:9781450319942
DOI:10.1145/2465351
General Chairs:
Zdenek Hanzálek
Czech Technical University Prague
,
Hermann Härtig
Technische Universität Dresden
,
Program Chairs:
Miguel Castro
Microsoft Research Cambridge
,
M. Frans Kaashoek
MIT
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 April 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
EuroSys '13 Paper Acceptance Rate28of143submissions,20%Overall Acceptance Rate241of1,308submissions,18%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 71
  Total Citations
  View Citations
- 664
  Total Downloads
- Downloads (Last 12 months)74
- Downloads (Last 6 weeks)17
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

RadixVM: scalable address spaces for multithreaded applications

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Design and Optimization of Large Size and Low Overhead Off-Chip Caches

TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs

SELECTIVE VICTIM CACHING: A METHOD TO IMPROVE THE PERFORMANCE OF DIRECT-MAPPED CACHES

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

RadixVM: scalable address spaces for multithreaded applications

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Design and Optimization of Large Size and Low Overhead Off-Chip Caches

TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs

SELECTIVE VICTIM CACHING: A METHOD TO IMPROVE THE PERFORMANCE OF DIRECT-MAPPED CACHES

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media