
Efficient and correct execution of parallel programs that share memory

Published: 1 April 1988

Abstract

In this paper we consider an optimization problem that arises in the execution of parallel programs on shared-memory multiple-instruction-stream, multiple-data-stream (MIMD) computers. A program on such machines consists of many sequential program segments, each executed by a single processor. These segments interact as they access shared variables. Access to memory is asynchronous, and memory accesses are not necessarily executed in the order they were issued. An execution is correct if it is sequentially consistent: It should seem as if all the instructions were executed sequentially, in an order obtained by interleaving the instruction streams of the processors. Sequential consistency can be enforced by delaying each access to shared memory until the previous access of the same processor has terminated. For performance reasons, however, we want to allow several accesses by the same processor to proceed concurrently. Our analysis finds a minimal set of delays that enforces sequential consistency. The analysis extends to interprocessor synchronization constraints and to code where blocks of operations have to execute atomically. We use a conflict graph similar to that used to schedule transactions in distributed databases. Our graph incorporates the order on operations given by the program text, enabling us to do without locks even when database conflict graphs would suggest that locks are necessary. Our work has implications for the design of multiprocessors; it offers new compiler optimization techniques for parallel languages that support shared variables.
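To make the delay problem concrete, here is a minimal sketch (our illustration, not code from the paper; the thread and variable names are invented) of the classic store-buffering pattern in C++11 atomics. With unordered (relaxed) accesses, both loads can return 0, an outcome that no sequential interleaving of the two instruction streams permits; delaying each load until the same processor's preceding store has completed restores sequential consistency.

```cpp
// Store-buffering sketch: illustrative only, not code from the paper.
#include <atomic>
#include <thread>
#include <cstdio>

std::atomic<int> x{0}, y{0};
int r1, r2;

void p1() {
    x.store(1, std::memory_order_relaxed);   // access A1
    // Without a delay here, A2 may be performed before A1 completes.
    r1 = y.load(std::memory_order_relaxed);  // access A2
}

void p2() {
    y.store(1, std::memory_order_relaxed);   // access B1
    r2 = x.load(std::memory_order_relaxed);  // access B2
}

int main() {
    std::thread t1(p1), t2(p2);
    t1.join(); t2.join();
    // Under sequential consistency, r1 == 0 && r2 == 0 is impossible:
    // no interleaving of the two streams yields it. With relaxed
    // (unordered) memory accesses it can occur.
    std::printf("r1=%d r2=%d\n", r1, r2);
}
```

Replacing memory_order_relaxed with memory_order_seq_cst on all four accesses inserts store-to-load delays that forbid the anomalous outcome; the paper's contribution is an analysis that identifies which such delays are actually necessary.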




        Reviews

        Edward W. Davis

The title of this paper will attract the attention of many people, but the content will be meaningful to only a select few. The very formal treatment of the subject includes 24 theorems, lemmas, and corollaries and many definitions. The authors provide a good introduction that motivates the work and states the goals. They give an example of the problem they are addressing, which concerns correct access to a shared memory when two program segments in a parallel program update a given variable. The basic goal is “to determine the minimal set of delays that enforce sequential consistency.” This seemingly simple situation leads to much complexity when stated formally. The categories and subject descriptors given by the authors are all related to architectures and processors, but the text is also directed toward compilation. The practicality of their results is brought into question in the conclusion when the authors state that in “an actual implementation of this method . . . one has to find all the minimal cycles in a graph. This requires time exponential in the number of nodes in a general graph.” They comment that the constrained structure of program graphs makes the problem easier. Should you read the paper? Yes, if you are interested in theoretical questions related to the title. The technical content should be good, based on the quality of the journal in which it appeared and the refereeing acknowledged in the paper.
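The cycle condition the reviewer quotes can be illustrated with a deliberately simplified sketch (ours, not the authors' algorithm): nodes are memory accesses, directed edges record each processor's program order, and conflict edges join accesses by different processors to the same variable. A program edge needs an enforcing delay when it lies on a mixed cycle; the crude over-approximation below merely checks, for each program edge (u, v), whether v can reach u through the graph, rather than enumerating minimal cycles as the paper requires.

```cpp
// Simplified delay-placement sketch (illustrative, not the paper's
// exact minimal-cycle analysis): a program edge (u, v) is treated as
// needing a delay if it lies on a cycle, i.e., if v can reach u.
#include <vector>
#include <queue>
#include <utility>
#include <cstdio>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

// BFS reachability: can we get from src to dst in g?
bool reaches(const Graph& g, int src, int dst) {
    std::vector<bool> seen(g.size(), false);
    std::queue<int> q;
    q.push(src); seen[src] = true;
    while (!q.empty()) {
        int n = q.front(); q.pop();
        if (n == dst) return true;
        for (int m : g[n])
            if (!seen[m]) { seen[m] = true; q.push(m); }
    }
    return false;
}

int main() {
    // The store-buffering example above, as a conflict graph.
    // Nodes: 0 = x.store (P1), 1 = y.load (P1),
    //        2 = y.store (P2), 3 = x.load (P2).
    Graph g(4);
    std::vector<std::pair<int, int>> programEdges = {{0, 1}, {2, 3}};
    // Conflict edges are undirected (either execution order is
    // possible), so add both directions.
    auto conflict = [&](int a, int b) { g[a].push_back(b); g[b].push_back(a); };
    conflict(0, 3);  // both access x
    conflict(1, 2);  // both access y
    for (auto [u, v] : programEdges) g[u].push_back(v);

    for (auto [u, v] : programEdges)
        if (reaches(g, v, u))
            std::printf("delay needed between nodes %d -> %d\n", u, v);
    // Both program edges lie on the mixed cycle 0 -> 1 -> 2 -> 3 -> 0,
    // so both delays are reported, matching the fences sketched above.
}
```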


        Published in

          ACM Transactions on Programming Languages and Systems, Volume 10, Issue 2 (April 1988), 154 pages
          ISSN: 0164-0925
          EISSN: 1558-4593
          DOI: 10.1145/42190
          Copyright © 1988 ACM
          Publisher: Association for Computing Machinery, New York, NY, United States
