Abstract
In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program on such machines consists of many sequential program segments, each executed by a single processor. These segments interact as they access shared variables. Access to memory is asynchronous, and memory accesses are not necessarily executed in the order they were issued. An execution is correct if it is sequentially consistent: It should seem as if all the instructions were executed sequentially, in an order obtained by interleaving the instruction streams of the processors. Sequential consistency can be enforced by delaying each access to shared memory until the previous access of the same processor has terminated. For performance reasons, however, we want to allow several accesses by the same processor to proceed concurrently. Our analysis finds a minimal set of delays that enforces sequential consistency. The analysis extends to interprocessor synchronization constraints and to code where blocks of operations have to execute atomically. We use a conflict graph similar to that used to schedule transactions in distributed databases. Our graph incorporates the order on operations given by the program text, enabling us to do without locks even when database conflict graphs would suggest that locks are necessary. Our work has implications for the design of multiprocessors; it offers new compiler optimization techniques for parallel languages that support shared variables.
- 1 AHO, A., SETHI, R., AND ULLMAN, J. Compilers Principles, Techniques and Tools. Addison- Wesley, Reading, Mass., 1986. Google Scholar
- 2 AHO, A. V., GAREY, M. R., AND ULLMAN, J.D. The transitive reduction of a directed graph. SIAM J. Comput. 1, 2 (Dec. 1972), 131-137.Google Scholar
- 3 ALLEN, J. R., AND KENNEDY, K. Automatic translation of Fortran programs to vector form. COMP TR84-9, Dept. of Computer Science, Rice Univ., Houston, Tex., July 1984.Google Scholar
- 4 BEERI, C., BERNSTEIN, P. A., AND GOODMAN, N. A model for nested transaction systems. Tech. Rep. CS~86-1, Computer Science Dept., Hebrew Univ., Jerusalem, 1986.Google Scholar
- 5 BERNSTEIN, P. A., AND GOODMAN, N. Concurrency control in distributed database systems. ACM Comput. Surv. 13, 2 (June 1981), 185-221. Google Scholar
- 6 BERNSTEIN, P. A., SHIPMAN, D. W., AND ROTHNIE, J. B., JR. Concurrency control in a system for distributed databases (SDD-1). ACM Trans. Database Syst. 5, 1 (Mar. 1980), 18-51. Google Scholar
- 7 COLLIER, W. Principles of architecture for systems of parallel processes. Tech. Rep. TR00.3100, IBM, T. J. Watson Research Center, Yorktown Heights, N.Y., Mar. 1981.Google Scholar
- 8 EDLER, J., GOTTLIEB, A., KRUSKAL, C. K., MCAULIFFE, K. P., RUDOLPH, L., SNIR, M., TELLER, P., AND WILSON, J. Issues related to MIMD, shared-memory computers: The NYU UItracomputer approach. In Proceedings of the 12th International Conference on Computer Architecture (Boston, Mass., June). IEEE Press, New York, 1985, pp. 126-135. Google Scholar
- 9 GOTTLIEB, A., GRISHMAN, a., KRUSKAL, C. K., McAULIFFE, K. P., RUDOLPH, L., AND SNIR, M. The NYU Ultracomputer--Designing a MIMD, shared-memory parallel computer. IEEE Trans. Comput. C-32, 3 (Feb. 1983), 175-189.Google Scholar
- 10 KUNG, H. T., AND PAPADIMITRIOU, C. H. An optimality theory of concurrency control for databases. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (Boston, May 1979). ACM, New York, 1979, pp. 116-126. Google Scholar
- 11 LAMPORT, L. Time, clocks and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558-565. Google Scholar
- 12 LAMPORT, L. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28, 9 (Sept. 1979), 690-691.Google Scholar
- 13 LAMPORT, L. On interprocess communication, parts I and II. Dist. Comput. 1 (Jan. 1986), 77-101.Google Scholar
- 14 LYNCH, N. A., AND FISHER, M.J. Semantics of concurrent computations. Theor. Comput. Sci. 13, i (Jan. 1981), 17-43.Google Scholar
- 15 PADUA, D. A., AND WOLFE, W. J. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (Dec. 1986), 1184-1201. Google Scholar
- 16 PFISTER, G. F., BRANTLEY, W. C., GEORGE, D. A., HARVEY, S. L., KLEINFELDER, W. J., McAUUFFE, K. P., MELTON, E. A., NORTON, V. A., AND WEISS, J. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of the IEEE 1985 International Conference on Parallel Processing (Boston, June 1985). IEEE Press, New York, 1985, pp. 764-771.Google Scholar
Index Terms
- Efficient and correct execution of parallel programs that share memory
Recommendations
Correct Execution of Transactions at Different Isolation Levels
Many transaction processing applications execute at isolation levels lower than SERIALIZABLE in order to increase throughput and reduce response time. However, the resulting schedules might not be serializable and, hence, not necessarily correct. The ...
From lock to correct and efficient software transactional memory
INTERACT-14: Proceedings of the 2010 Workshop on Interaction between Compilers and Computer ArchitectureTransactional memory solves many problems in lock-based parallel programs. Unfortunately, the semantics of transactions are different from those of critical sections defined by locks. The semantic differences make it difficult to correctly port existing ...
Transactional lock-free execution of lock-based programs
Special Issue: Proceedings of the 10th annual conference on Architectural Support for Programming Languages and Operating SystemsThis paper is motivated by the difficulty in writing correct high-performance programs. Writing shared-memory multi-threaded programs imposes a complex trade-off between programming ease and performance, largely due to subtleties in coordinating access ...
Comments