ABSTRACT
This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket -- uniform and non-uniform -- to multi-socket -- directory and broadcast-based -- many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.
Supplemental Material
- J. Abellan, J. Fernandez, and M. Acacio. GLocks: Efficient support for highly-contended locks in Many-Core CMPs. IPDPS 2011, pages 893--905. Google ScholarDigital Library
- AMD. Software optimization guide for AMD family 10h and 12h processors. 2011.Google Scholar
- G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. AFIPS 1967 (Spring), pages 483--485. Google ScholarDigital Library
- T. E. Anderson. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE TPDS, 1(1):6--16, 1990. Google ScholarDigital Library
- H. Attiya, A. Bar-Noy, and D. Dolev. Sharing memory robustly in message-passing systems. PODC 1990, pages 363--375. Google ScholarDigital Library
- A. Baumann, P. Barham, P. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. SOSP 2009, pages 29--44. Google ScholarDigital Library
- L. Boguslavsky, K. Harzallah, A. Kreinen, K. Sevcik, and A. Vainshtein. Optimal strategies for spinning and blocking. J. Parallel Distrib. Comput., 21(2):246--254, 1994. Google ScholarDigital Library
- S. Borkar. Design challenges of technology scaling. Micro, IEEE, 19(4):23--29, 1999. Google ScholarDigital Library
- S. Borkar and A. Chien. The future of microprocessors. Communications of the ACM, 54(5):67--77, 2011. Google ScholarDigital Library
- S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: an operating system for many cores. OSDI 2008, pages 43--57. Google ScholarDigital Library
- S. Boyd-Wickizer, A. Clements, Y. Mao, A. Pesterev, M. Kaashoek, R. Morris, and N. Zeldovich. An analysis of Linux scalability to many cores. In OSDI 2010, pages 1--16. Google ScholarDigital Library
- S. Boyd-Wickizer, M. Kaashoek, R. Morris, and N. Zeldovich. Non-scalable locks are dangerous. In Proceedings of the Linux Symposium, 2012.Google Scholar
- P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes. Cache hierarchy and memory subsystem of the AMD Opteron processor. Micro, IEEE, 30(2):16--29, 2010. Google ScholarDigital Library
- D. Dice, V. Marathe, and N. Shavit. Lock cohorting: a general technique for designing numa locks. PPoPP 2012, pages 247--256. Google ScholarDigital Library
- B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm. Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system. OSDI 1999, pages 87--100. Google ScholarDigital Library
- V. Gramoli, R. Guerraoui, and V. Trigonakis. TM2C: a software transactional memory for many-cores. EuroSys 2012, pages 351--364. Google ScholarDigital Library
- D. Hackenberg, D. Molka, and W. Nagel. Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems. MICRO 2009, pages 413--422. Google ScholarDigital Library
- D. Hendler, I. Incze, N. Shavit, and M. Tzafrir. Flat combining and the synchronization-parallelism tradeoff. SPAA 2010, pages 355--364. ACM. Google ScholarDigital Library
- M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems, 13(1):124--149, 1991. Google ScholarDigital Library
- M. Herlihy and N. Shavit. The art of multiprocessor programming, revised first edition. 2012. Google ScholarDigital Library
- M. Hill and M. Marty. Amdahl's law in the multi-core era. Computer, 41(7):33--38, 2008. Google ScholarDigital Library
- Intel. An introduction to the Intel QuickPath interconnect. 2009.Google Scholar
- Intel. Intel 64 and IA-32 architectures software developer's manual. 2013.Google Scholar
- Intel. Transactional Synchronization Extensions Overview. 2013.Google Scholar
- libmemcached. http://libmemcached.org/libMemcached.html.Google Scholar
- J. Lozi, F. David, G. Thomas, J. Lawall, and G. Muller. Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications. USENIX ATC 2012. Google ScholarDigital Library
- V. Luchangco, D. Nussbaum, and N. Shavit. A hierarchical CLH queue lock. ICPP 2006, pages 801--810. Google ScholarDigital Library
- J. Mellor-Crummey and M. Scott. Synchronization without contention. ASPLOS 1991, pages 269--278. Google ScholarDigital Library
- J. Mellor-Crummey and M. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM TOCS, 1991. Google ScholarDigital Library
- Memcached. http://www.memcached.org.Google Scholar
- M. Michael and M. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. PODC 1996, pages 267--275. Google ScholarDigital Library
- S. Microsystems. UltraSPARC T2 supplement to the UltraSPARC architecture. 2007.Google Scholar
- D. Molka, R. Schöne, D. Hackenberg, and M. Müller. Memory performance and SPEC OpenMP scalability on quad-socket x86 64 systems. ICA3PP 2011, pages 170--181. Google ScholarDigital Library
- MonetDB. http://www.monetdb.org/.Google Scholar
- J. Moses, R. Illikkal, L. Zhao, S. Makineni, and D. Newell. Effects of locking and synchronization on future large scale CMP platforms. CAECW 2006.Google Scholar
- U. Nawathe, M. Hassan, K. Yen, A. Kumar, A. Ramachandran, and D. Greenhill. Implementation of an 8-Core, 64-thread, power-efficient SPARC server on a chip. Solid-State Circuits, IEEE Journal of, 43(1):6--20, 2008.Google Scholar
- M. Papamarcos and J. Patel. A low-overhead coherence solution for multiprocessors with private cache memories. ISCA 1984, pages 348--354.Google ScholarDigital Library
- C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating mapreduce for multi-core and multiprocessor systems. HPCA 2007, pages 13--24. Google ScholarDigital Library
- M. Schroeder and M. Burrows. Performance of Firefly RPC. In SOSP 1989, pages 83--90. Google ScholarDigital Library
- M. Scott and W. Scherer. Scalable queue-based spin locks with timeout. PPoPP 2001, pages 44--52. Google ScholarDigital Library
- Tilera tile-gx. http://www.tilera.com/products/processors/TILE-Gx_Family.Google Scholar
- TPC-H. http://www.tpc.org/tpch/.Google Scholar
- C. T. S. Building FIFO and priority-queuing spin locks from atomic swap. Technical report, 1993.Google Scholar
- J. Tseng, H. Yu, S. Nagar, N. Dubey, H. Franke, P. Pattnaik, H. Inoue, and T. Nakatani. Performance studies of commercial workloads on a multi-core system. IISWC 2007, pages 57--65. Google ScholarDigital Library
- D. Wentzlaff and A. Agarwal. Factored operating systems (fos): the case for a scalable operating system for multicores. OSR, 43(2):76--85, 2009. Google ScholarDigital Library
Recommendations
Everything you wanted to know about the running time of Mergesort but were afraid to ask
Although mergesort is an algorithm that is frequently glossed over in textbooks, it provides fertile ground for planting ideas about algorithm analysis in the minds of students. Why can we assume that n is a power of 2? How big is the increase in ...
Everything you always wanted to know about planning (but were afraid to ask)
KI'11: Proceedings of the 34th Annual German conference on Advances in artificial intelligenceDomain-independent planning is one of the long-standing sub-areas of Artificial Intelligence (AI), aiming at approaching human problem-solving flexibility. The area has long had an affinity towards playful illustrative examples, imprinting it on the ...
Something I Always Wanted to Know About Test, But Was Afraid to Ask
ETS '09: Proceedings of the 2009 European Test Symposium
Comments