Abstract
Exploiting the multiprocessors that have recently become ubiquitous requires high-performance and reliable concurrent systems code, for concurrent data structures, operating system kernels, synchronization libraries, compilers, and so on. However, concurrent programming, which is always challenging, is made much more so by two problems. First, real multiprocessors typically do not provide the sequentially consistent memory that is assumed by most work on semantics and verification. Instead, they have relaxed memory models, varying in subtle ways between processor families, in which different hardware threads may have only loosely consistent views of a shared memory. Second, the public vendor architectures, supposedly specifying what programmers can rely on, are often in ambiguous informal prose (a particularly poor medium for loose specifications), leading to widespread confusion.
In this paper we focus on x86 processors. We review several recent Intel and AMD specifications, showing that all contain serious ambiguities, some are arguably too weak to program above, and some are simply unsound with respect to actual hardware. We present a new x86-TSO programmer's model that, to the best of our knowledge, suffers from none of these problems. It is mathematically precise (rigorously defined in HOL4) but can be presented as an intuitive abstract machine which should be widely accessible to working programmers. We illustrate how this can be used to reason about the correctness of a Linux spinlock implementation and describe a general theory of data-race freedom for x86-TSO. This should put x86 multiprocessor system building on a more solid foundation; it should also provide a basis for future work on verification of such systems.
- Linux kernel mailing list, thread "spin_unlock optimization (i386)", 119 messages, Nov. 20--Dec. 7, 1999, http://www.gossamer-threads.com/lists/engine?post=105365;list=linux. Accessed 2009/11/18.Google Scholar
- The SPARC Architecture Manual, V. 8. SPARC International, Inc., 1992. Revision SAV080SI9308. http://www.sparc.org/standards/V8.pdf. Google ScholarDigital Library
- AMD64 Architecture Programmer's Manual (3 vols). Advanced Micro Devices, Sept. 2007. rev. 3.14.Google Scholar
- Intel 64 architecture memory ordering white paper, 2007. Intel Corporation. SKU 318147-001.Google Scholar
- Intel 64 and IA-32 Architectures Software Developer's Manual (5 vols). Intel Corporation, Mar. 2010. rev. 34.Google Scholar
- Adve, S. Gharachorloo, K. Shared memory consistency models: A tutorial. IEEE Comput. 29, 12 (Dec. 1996), 66--76. Google ScholarDigital Library
- Adve, S.V., Boehm, H.-J. Memory models: A case for rethinking parallel languages and hardware. C. ACM, To appear.Google Scholar
- Adve, S.V., Hill, M.D. A unified formalization of four shared-memory models. IEEE Trans. Parallel Distrib. Syst. 4, 6 (1993), 613--624. Google ScholarDigital Library
- Ahamad, M., Neiger, G., Burns, J., Kohli, P., Hutto, P. Causal memory: Definitions, implementation, and programming. Distrib. Comput. 9, 1 (1995), 37--49.Google ScholarDigital Library
- Aspinall, D., Ševčík J. Formalising Java's data race free guarantee. In Proc. TPHOLs, LNCS 4732 (2007), 22--37. Google ScholarDigital Library
- Boehm, H.-J., Adve, S. Foundations of the C++ concurrency memory model. In Proceedings of PLDI (2008). Google ScholarDigital Library
- Boudol, G., Petri, G. Relaxed memory models: An operational approach. In Proceedings of POPL, 2009. Google ScholarDigital Library
- Burckhardt, S., Musuvathi, M. Effective program verification for relaxed memory models. Technical Report MSR-TR-2008-12, Microsoft Research, 2008. Conference version in Proceedings of CAV, LNCS 5123 (2008). Google ScholarDigital Library
- Burckhardt, S., Musuvathi, M., Singh, V. Verifying compiler transformations for concurrent programs, Jan. 2009. Technical report MSR-TR-2008-171.Google Scholar
- Dice, D. Java memory model concerns on Intel and AMD systems. http://blogs.sun.com/dave/entry/java_memory_model_concerns_on, Jan. 2008.Google Scholar
- Friedman, R. Consistency conditions for distributed shared memories. Israel Institute of Technologic, 1994.Google Scholar
- Gharachorloo, K. Memory consistency models for shared-memory multiprocessors. WRL Res. Rep. 95, 9 (1995). Google ScholarDigital Library
- Hangal, S., Vahia, D., Manovit, C., Lu, J.-Y.J., Narayanan, S. TSOtool: A program for verifying memory systems using the memory consistency model. In Proceedings of ISCA (2004), 114--123. Google ScholarDigital Library
- Higham, L., Kawash, J., Verwaal, N. Weak memory consistency models part I: Definitions and comparisons. Technical Report98/612/03, Department of Computer Science, The University of Calgary, January, 1998. Full version of a paper in PDCS 1997.Google Scholar
- The HOL 4 system. http://hol.sourceforge.net/.Google Scholar
- Lamport, L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28, 9 (1979), 690--691. Google ScholarDigital Library
- Loewenstein, P.N., Chaudhry, S., Cypher, R., Manovit, C. Multiprocessor memory model verification. In Proceedings of AFM (Automated Formal Methods) (Aug. 2006). FLoC workshop. http://fm.csl.sri.com/AFM06/.Google Scholar
- Luchango, V.M. Memory consistency models for high-performance distributed computing. PhD thesis, MIT, 2001. Google ScholarDigital Library
- Manson, J., Pugh, W., Adve, S. The Java memory model. In Proceedings of POPL (2005). Google ScholarDigital Library
- Milner, R. Communication and Concurrency. Prentice Hall International, 1989. Google ScholarDigital Library
- Owens, S. Reasoning about the implementation of concurrency abstractions on x86-TSO. In Proceedings of ECOOP (2010). To appear. Google ScholarDigital Library
- Owens, S., Sarkar, S., Sewell, P. A better x86 memory model: x86-TSO. In Proceedings of TPHOLs, LNCS 5674 (2009), 391--407. Full version as Technical Report UCAM-CL-TR-745, Univ. of Cambridge. Google ScholarDigital Library
- Park, S., Dill, D.L. An executable specification and verifier for relaxed memory order. IEEE Trans. Comput. 48, 2 (1999), 227--235. Google ScholarDigital Library
- Roy, A., Zeisset, S., Fleckenstein, C.J., Huang, J.C. Fast and generalized polynomial time memory consistency verification. In CAV (2006), 503--516. Google ScholarDigital Library
- Saraswat, V., Jagadeesan, R., Michael, M., von Praun, C. A theory of memory models. In Proceedings of PPoPP (2007). Google ScholarDigital Library
- Sarkar, S., Sewell, P., Zappa Nardelli, F., Owens, S., Ridge, T., Braibant, T., Myreen, M., Alglave, J. The semantics of x86-CC multiprocessor machine code. In Proceedings of POPL 2009 (Jan. 2009). Google ScholarDigital Library
- Sindhu, P.S., Frailong, J.-M., Cekleov, M. Formal specification of memory models. In Scalable Shared Memory Multiprocessors, Kluwer, 1991, 25--42.Google Scholar
- Ševčík, J., Aspinall, D. On validity of program transformations in the Java memory model. In ECOOP (2008), 27--51. Google ScholarDigital Library
Index Terms
- x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors
Recommendations
Fast RMWs for TSO: semantics and implementation
PLDI '13Read-Modify-Write (RMW) instructions are widely used as the building blocks of a variety of higher level synchronization constructs, including locks, barriers, and lock-free data structures. Unfortunately, they are expensive in architectures such as x86 ...
Fast RMWs for TSO: semantics and implementation
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and ImplementationRead-Modify-Write (RMW) instructions are widely used as the building blocks of a variety of higher level synchronization constructs, including locks, barriers, and lock-free data structures. Unfortunately, they are expensive in architectures such as x86 ...
Comments