ABSTRACT
We present compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space. Any type of code motion on explicitly parallel programs requires a new kind of analysis to ensure that operations reordered on one processor cannot be observed by another. The analysis, based on work by Shasha and Snir, checks for cycles among interfering accesses. We improve the accuracy of their analysis by using additional information from post-wait synchronization, barriers, and locks.
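The cycle check can be sketched as a small graph computation. In this toy model (the tuple representation and function names are illustrative, not the paper's actual intermediate form), an access is a `(processor, variable, kind)` tuple; conflict edges connect accesses to the same variable on different processors where at least one is a write, and a program-order edge must become a delay (i.e., must not be reordered) if it lies on a cycle built from program and conflict edges:

```python
from itertools import product

def conflict_edges(accesses):
    """Bidirectional edges between accesses to the same variable on
    different processors, where at least one access is a write."""
    edges = []
    for a, b in product(accesses, repeat=2):
        if (a is not b and a[0] != b[0] and a[1] == b[1]
                and 'w' in (a[2], b[2])):
            edges.append((a, b))
    return edges

def reaches(src, dst, graph):
    # Plain DFS reachability over the combined edge set.
    stack, seen = [src], {src}
    while stack:
        n = stack.pop()
        if n == dst:
            return True
        for m in graph.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False

def delay_edges(program_edges, accesses):
    """Program-order edges lying on a cycle of program-order and
    conflict edges; these pairs must not be reordered."""
    graph = {}
    for u, v in program_edges + conflict_edges(accesses):
        graph.setdefault(u, []).append(v)
    # Edge (u, v) is on a cycle iff v can reach back to u.
    return [(u, v) for (u, v) in program_edges if reaches(v, u, graph)]
```

On the classic example where P0 writes x then reads y while P1 writes y then reads x, both program-order edges lie on such a cycle, so neither pair may be reordered. The paper's refinement using post-wait, barrier, and lock information would correspond to pruning conflict edges between accesses that synchronization already orders; that step is omitted from this sketch.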
We demonstrate the use of this analysis by optimizing remote access on distributed memory machines. The optimizations include message pipelining, to allow multiple outstanding remote memory operations, conversion of two-way to one-way communication, and elimination of communication through data re-use. The performance improvements are as high as 20-35% for programs running on a CM-5 multiprocessor using the Split-C language as a global address layer.
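Message pipelining can be illustrated with a toy latency model (the `Network` class, cycle costs, and the `get`/`sync` names below are hypothetical stand-ins in the spirit of Split-C's split-phase operations, not its actual API). A blocking read pays a full round-trip per access; split-phase gets issue all requests back-to-back, so a batch of reads overlaps its round-trips:

```python
ROUND_TRIP = 100  # simulated cycles per remote round-trip

class Network:
    def __init__(self):
        self.time = 0
        self.pending = []  # (completion time, memory, address) per in-flight get

    def blocking_read(self, mem, addr):
        self.time += ROUND_TRIP        # stall until the reply arrives
        return mem[addr]

    def get(self, mem, addr):
        # Split-phase: initiate the read and return immediately.
        self.pending.append((self.time + ROUND_TRIP, mem, addr))

    def sync(self):
        # Wait until every outstanding get has completed.
        done = max((t for t, _, _ in self.pending), default=self.time)
        self.time = max(self.time, done)
        vals = [mem[addr] for _, mem, addr in self.pending]
        self.pending.clear()
        return vals

mem = {i: i * i for i in range(8)}

net = Network()
serial = [net.blocking_read(mem, i) for i in range(8)]
t_serial = net.time                    # 8 full round-trips: 800 cycles

net = Network()
for i in range(8):
    net.get(mem, i)                    # all 8 requests in flight at once
pipelined = net.sync()
t_pipelined = net.time                 # overlapped: 100 cycles

assert serial == pipelined
```

The two-way to one-way conversion is the analogous transformation on the write side: a blocking write waits for an acknowledgement, while a one-way store only increments a counter of outstanding stores that a later sync drains.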
- 1. S. V. Adve and M. D. Hill. Weak Ordering: A New Definition. In 17th International Symposium on Computer Architecture, April 1990.
- 2. R. Arpaci, D. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick. Empirical Evaluation of the CRAY-T3D: A Compiler Perspective. In International Symposium on Computer Architecture, June 1995.
- 3. H. Berryman, J. Saltz, and J. Scroggs. Execution Time Support for Adaptive Scientific Algorithms on Distributed Memory Multiprocessors. Concurrency: Practice and Experience, June 1991.
- 4. D. Callahan and J. Subhlok. Static Analysis of Low-level Synchronization. In ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, May 1988.
- 5. W. W. Carlson and J. M. Draper. Distributed Data Access in AC. In ACM Symposium on Principles and Practice of Parallel Programming, July 1995.
- 6. D. E. Culler, A. Dusseau, S. C. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K. Yelick. Parallel Programming in Split-C. In Supercomputing '93, Portland, Oregon, November 1993.
- 7. K. Gharachorloo, D. Lenoski, J. Laudon, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In 17th International Symposium on Computer Architecture, 1990.
- 8. D. Grunwald and H. Srinivasan. Data Flow Equations for Explicitly Parallel Programs. In ACM Symposium on Principles and Practice of Parallel Programming, June 1993.
- 9. S. Hiranandani, K. Kennedy, and C.-W. Tseng. Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines. In Proceedings of the 1991 International Conference on Supercomputing, 1991.
- 10. T. E. Jeremiassen and S. J. Eggers. Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations. In ACM Symposium on Principles and Practice of Parallel Programming, July 1995.
- 11. A. Krishnamurthy and K. Yelick. Optimizing Parallel SPMD Programs. In Languages and Compilers for Parallel Computing, 1994.
- 12. L. Lamport. How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, C-28(9), September 1979.
- 13. D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor. In 17th International Symposium on Computer Architecture, May 1990.
- 14. N. K. Madsen. Divergence Preserving Discrete Surface Integral Methods for Maxwell's Curl Equations Using Non-Orthogonal Unstructured Grids. Technical Report 92.04, RIACS, February 1992.
- 15. S. Midkiff and D. Padua. Issues in the Optimization of Parallel Programs. In International Conference on Parallel Processing, Vol. II, 1990.
- 16. S. P. Midkiff, D. Padua, and R. G. Cytron. Compiling Programs with User Parallelism. In Languages and Compilers for Parallel Computing, 1990.
- 17. A. Rogers and K. Pingali. Compiling for Distributed Memory Architectures. IEEE Transactions on Parallel and Distributed Systems, March 1994.
- 18. D. Shasha and M. Snir. Efficient and Correct Execution of Parallel Programs that Share Memory. ACM Transactions on Programming Languages and Systems, 10(2), April 1988.
- 19. J. P. Singh, W. D. Weber, and A. Gupta. SPLASH: Stanford Parallel Applications for Shared Memory. Computer Architecture News, March 1992.
- 20. The SPARC Architecture Manual: Version 8. SPARC International, Inc., 1992.
- 21. T. von Eicken, D. E. Culler, S. C. Goldstein, and K. E. Schauser. Active Messages: A Mechanism for Integrated Communication and Computation. In International Symposium on Computer Architecture, May 1992.
Index Terms
- Optimizing parallel programs with explicit synchronization