Unifying data and control transformations for distributed shared-memory machines

Authors:
Michał Cierniak

Department of Computer Science, University of Rochester, Rochester, NY

Department of Computer Science, University of Rochester, Rochester, NY
View Profile

,
Wei Li

Department of Computer Science, University of Rochester, Rochester, NY

Department of Computer Science, University of Rochester, Rochester, NY
View Profile

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementationJune 1995Pages 205–217https://doi.org/10.1145/207110.207145

Published:01 June 1995Publication History

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Pages 205–217

ABSTRACT

We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy.

Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations simultaneously, offers improvement over optimizations obtained by applying data and control transformations separately.

The experimental results using a set of applications on a parallel machine show that the new optimizations improve performance significantly. These results are further analyzed using locality metrics with instrumentation and simulation.

References

1.J. M. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. Proceedings of ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993. Google ScholarDigital Library
2.V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer. An interactive environment for data partitioning and distribution. In Proc. 5th Distributed Memory Comput. Conf., April 1990.Google ScholarCross Ref
3.D. Bau, i. Kodukula, V. Kotlyar, K. Pingali, and P. StodghilI. Solving alignment using elementary linear algebra. Proceedings of the Seventh Annual Workshop on Languages and Compilers for Parallel Computing, 1994. Google ScholarDigital Library
4.R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521-526, December 1992.Google ScholarDigital Library
5.William J. Bolosky, Robert P. Fitzgerald, and Michael L. Scott. Simple but effective techniques for NUMA memory management. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 19- 31, Litchfield Park, AZ, December 1989. Google ScholarDigital Library
6.William J. Bolosky and Michael L. Scott. False sharing and its effect on shared memory performance. In Proceedings of fhe Fourth Usenix Symposium on Experiences with Distributed and Multiprocessor Systems, pages 57-71, San Diego, CA, September 1993. Also available as MSR-TR-93-1, Microsoft Research Laboratory, September 1993.Google Scholar
7.S. Carr, K. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality, in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994. Google ScholarDigital Library
8.J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. in Proceedings of the Thirteenth ACM Symposium on Operating .Systems Principles, pages 152-164, Pacific Grove, CA, October 1991. Google ScholarDigital Library
9.S. Chatterjee, J. R. Gilbert, R. Schreiber, and S. 71:ng. Automatic array alignment in data-parallel programs. Proceedings of ACM Symposium on Principles of Programming Languages, 20, 1993. Google ScholarDigital Library
10.Alan L. Cox and Robert J. Fowler. The implementation of a coherent memory abstraction on a NUMA multiprocessor: Experiences with PLATINUM. In Proceed~!ngs of the Twelfth ACM Symposium on Operating Systems Principles, pages 32--44, Litchfield Park, AZ, December 1989. Google Scholar
11.M. Dubois, J. Skeppstedt, L. Ricciulli, K. Ramamurthy, and P. Stenstr6m. The detection and elimination of useless misses in multiprocessors. In Proceedings of the International Symposium on Computer Architecture, pages 88-97, San Diego, CA, May 1993. Google ScholarDigital Library
12.S. J. Eggers and T E. Jeremiassen. Eliminating false sharing. In Proc. 1991 International Conference on Parallel Processing, 1991.Google Scholar
13.C. Eisenbeis, W. Jalby, D. Windheiser, and F. Bodin. A strategy for array management in local memory. In Proc. 3th Annual Workshop on Languages and Compilers, August 1990.Google Scholar
14.J. Ferrante, V. Sarkar, and W. Thrash. On estimating and exchange cache effectiveness. In Proceeding.r of the Fourth Workshop on Languages and Compilers for Parallel Computing, August 1991. Google ScholarDigital Library
15.D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing, 5:587-616, 1988. Google ScholarDigital Library
16.M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179-193, 1992. Google ScholarDigital Library
17.J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1990. Google ScholarDigital Library
18.D. Hudak and S. Abraham. Compiler techniques for data partitioning of sequentially iterated parallel loops. In Proc. ACM Int. Conf. Supercomputing, June 19913. Google ScholarDigital Library
19.K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102-118, February 1990. Google ScholarDigital Library
20.J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. Technical report, Yale University, 1989.Google Scholar
21.K. Li and P. Hudak. Memory coherence in shared virtual memory systems. A CM Transactions on Computer Systems, November 1989. Google ScholarDigital Library
22.W. Li and K. Pingali. Access normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 11 (4):353-375, November 1993. Google ScholarDigital Library
23.W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. International Journal of Parallel Programming, 22(2), April 1994. Google ScholarDigital Library
24.Wei Li. Compiler cache optimizations for banded matrix problems. In 9th ACM International Conference on Supercomputing, July 1995. Google ScholarDigital Library
25.D. J. Lilja. Cache coherence in large-scale sharedmemory multiprocessors: Issues and comparisons. Computing Surveys, 25(3):303-338, September 1993. Google ScholarDigital Library
26.A. Nagurney, C. F. Nicholson, and P. M. Bishop. Spatial price equilibrium models with discriminatory ad valorem tariffs: formulation and comparative computation using variational inequalities. In Recent Advances in Spatial Equilibrium Modeling: Methodology and Applications. Springer-Verlag, Heidelberg, 1995. forthcoming.Google Scholar
27.B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. Computer, 24(8):52- 60, August 1991. Google ScholarDigital Library
28.A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989. Google ScholarDigital Library
29.R. P. LaRowe, Jr. and C. S. Ellis. Experimental comparison of memory management policies for NUMA multiprocessors. A CM Transactions on Computer Systems, 9(4):319-363, November 1991. Google ScholarDigital Library
30.J. Ramanujam and P. Sadayappan. Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 2, October 1991. Google ScholarDigital Library
31.Jack E. Veenstra and Robert J. Fowler. Mint: a front end for efficient simulation of shared-memory multiprocessors. Proceedings of the Second International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '94), pages 201-207, January 1994. Google ScholarDigital Library
32.M. Wolf and M. Lam. A data locality optimizing algorithm. In Proc. ACM SIGPLAN 91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991. Google ScholarDigital Library

Index Terms

Unifying data and control transformations for distributed shared-memory machines

Recommendations

Unifying data and control transformations for distributed shared-memory machines

We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We ...
Read More
Unifying Data and Control Transformations for Distributed Shared Memory Machines
Read More
Scalable directory architecture for distributed shared memory chip multiprocessors

Traditional Directory-based cache coherence protocol is far from optimal for large-scale cache coherent shared memory multiprocessors due to the increasing latency to access directories stored in DRAM memory. Instead of keeping directories in main ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
June 1995
335 pages
ISBN:0897916972
DOI:10.1145/207110
Chairman:
David W. Wall
Digital—Western Research Laboratory
ACM SIGPLAN Notices Volume 30, Issue 6
June 1995
327 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/223428
Editors:
Brent T. Hailpern
IBM T. J. Watson Research Center, Yorktown Heoghts, NY
,
Richard L. Wexelblat
IRS IS:AO, Washington, DC
Issue’s Table of Contents
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1995
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
PLDI '95 Paper Acceptance Rate28of105submissions,27%Overall Acceptance Rate406of2,067submissions,20%
More
Upcoming Conference
PLDI '24

Sponsor:

sigplan

ACM SIGPLAN Conference on Programming Language Design and Implementation

June 24 - 28, 2024

Copenhagen , Denmark
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 201
  Total Citations
  View Citations
- 555
  Total Downloads
- Downloads (Last 12 months)46
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Unifying data and control transformations for distributed shared-memory machines

PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Unifying data and control transformations for distributed shared-memory machines

Unifying Data and Control Transformations for Distributed Shared Memory Machines

Scalable directory architecture for distributed shared memory chip multiprocessors