skip to main content
10.1145/2465351.2465373acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

RadixVM: scalable address spaces for multithreaded applications

Published:15 April 2013Publication History

ABSTRACT

RadixVM is a new virtual memory system design that enables fully concurrent operations on shared address spaces for multithreaded processes on cache-coherent multicore computers. Today, most operating systems serialize operations such as mmap and munmap, which forces application developers to split their multithreaded applications into multiprocess applications, hoard memory to avoid the overhead of returning it, and so on. RadixVM removes this burden from application developers by ensuring that address space operations on non-overlapping memory regions scale perfectly. It does so by combining three techniques: 1) it organizes metadata in a radix tree instead of a balanced tree to avoid unnecessary cache line movement; 2) it uses a novel memory-efficient distributed reference counting scheme; and 3) it uses a new scheme to target remote TLB shootdowns and to often avoid them altogether. Experiments on an 80 core machine show that RadixVM achieves perfect scalability for non-overlapping regions: if several threads mmap or munmap pages in parallel, they can run completely independently and induce no cache coherence traffic.

References

  1. FreeBSD source code. http://www.freebsd.org/.Google ScholarGoogle Scholar
  2. J. Appavoo, D. da Silva, O. Krieger, M. Auslander, M. Ostrowski, B. Rosenburg, A. Waterland, R. W. Wisniewski, J. Xenidis, M. Stumm, and L. Soares. Experience distributing objects in an SMMP OS. ACM Transactions on Computer Systems, 25(3), Aug. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Baumann, P. Barham, P.-E. Dagand, T. Haris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. L. Black, R. F. Rashid, D. B. Golub, and C. R. Hill. Translation lookaside buffer consistency: a software approach. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 113--122, Boston, MA, Apr. 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: An operating system for many cores. In Proceedings of the 8th Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, Dec. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Boyd-Wickizer, A. T. Clements, Y. Mao, A. Pesterev, M. F. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In Proceedings of the 9th Symposium on Operating Systems Design and Implementation (OSDI), Vancouver, Canada, Oct. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Concurrent address spaces using RCU balanced trees. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), London, UK, Mar. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Corbet. The search for fast, scalable counters, May 2010. http://lwn.net/Articles/170003/.Google ScholarGoogle Scholar
  9. R. Cox, M. F. Kaashoek, and R. Morris. Xv6, a simple Unix-like teaching operating system. http://pdos.csail.mit.edu/6.828/2012/xv6.html.Google ScholarGoogle Scholar
  10. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. DeTreville. Experience with concurrent garbage collectors for Modula-2+. Technical Report 64, DEC Systems Research Center, Nov. 1990.Google ScholarGoogle Scholar
  12. F. Ellen, Y. Lev, V. Luchango, and M. Moir. SNZI: Scalable nonzero indicators. In Proceedings of the 26th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, Portland, OR, Aug. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Evans. A scalable concurrent malloc(3) implementation for FreeBSD. In Proceedings of the BSDCan Conference, Ottawa, Canada, Apr. 2006.Google ScholarGoogle Scholar
  14. B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (OSDI), pages 87--100, New Orleans, LA, Feb. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Ghemawat. TCMalloc: Thread-caching malloc, 2007. http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html.Google ScholarGoogle Scholar
  16. M. Herlihy and N. Shavit. The art of multiprocessor programming. Morgan Kaufmann, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. W. Howard and J. Walpole. Relativistic red-black trees. http://web.cecs.pdx.edu/~walpole/papers/ccpe2011.pdf.Google ScholarGoogle Scholar
  18. P. W. Howard and J. Walpole. Relativistic red-black trees. Technical Report 10-06, Portland State University, Computer Science Department, 2010.Google ScholarGoogle Scholar
  19. ISO. ISO/IEC 14882:2011(E): Information technology -- Programming languages -- C++. ISO, Geneva, Switzerland, 2011.Google ScholarGoogle Scholar
  20. O. Krieger, M. Auslander, B. Rosenburg, R. W. Wisniewski, J. Xenidis, D. da Silva, M. Ostrowski, J. Appavoo, M. Butrico, M. Mergen, A. Waterland, and V. Uhlig. K42: Building a complete operating system. In Proceedings of the ACM EuroSys Conference, Leuven, Belgium, Apr. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Liu and H. Chen. SSMalloc: A low-latency, locality-conscious memory allocator with stable performance scalability. In Proceedings of the 3rd Asia-Pacific Workshop on Systems, Seoul, South Korea, July 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Mao, R. Morris, and M. F. Kaashoek. Optimizing MapReduce for multicore architectures. Technical Report MIT-CSAIL-TR-2010-020, Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, May 2010.Google ScholarGoogle Scholar
  23. P. McKenney. Hierarchical RCU, Nov. 2008. https://lwn.net/Articles/305782/.Google ScholarGoogle Scholar
  24. Microsoft Corp. Windows research kernel. http://www.microsoft.com/resources/sharedsource/windowsacademic/researchkernelkit.mspx.Google ScholarGoogle Scholar
  25. S. Schneider, C. D. Antonopoulos, and D. S. Nikolopoulos. Scalable locality-conscious multithreaded memory allocation. In Proceedings of the 2006 ACM SIGPLAN International Symposium on Memory Management, Ottawa, Canada, June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Tene, B. Iyengar, and M. Wolf. C4: The continuously concurrent compacting collector. SIGPLAN Notices, 46 (11):79--88, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. L. Torvalds et al. Linux source code. http://www.kernel.org/.Google ScholarGoogle Scholar
  28. V. Uhlig. The mechanics of in-kernel synchronization for a scalable microkernel. ACM SIGOPS Operating Systems Review, 41(4):49--58, July 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Wang. Windows 7 memory management, Nov. 2009. http://download.microsoft.com/download/7/E/7/7E7662CF-CBEA-470B-A97E-CE7CE0D98DC2/mmwin7.pptx.Google ScholarGoogle Scholar
  30. D. Wentzlaff and A. Agarwal. Factored operating systems (fos): The case for a scalable operating system for multicores. ACM SIGOPS Operating Systems Review, 43(2):76--85, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. RadixVM: scalable address spaces for multithreaded applications

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
                  April 2013
                  401 pages
                  ISBN:9781450319942
                  DOI:10.1145/2465351

                  Copyright © 2013 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 15 April 2013

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • research-article

                  Acceptance Rates

                  EuroSys '13 Paper Acceptance Rate28of143submissions,20%Overall Acceptance Rate241of1,308submissions,18%

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader