skip to main content
10.1145/207110.207145acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

Unifying data and control transformations for distributed shared-memory machines

Authors Info & Claims
Published:01 June 1995Publication History

ABSTRACT

We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy.

Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations simultaneously, offers improvement over optimizations obtained by applying data and control transformations separately.

The experimental results using a set of applications on a parallel machine show that the new optimizations improve performance significantly. These results are further analyzed using locality metrics with instrumentation and simulation.

References

  1. 1.J. M. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. Proceedings of ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.V. Balasundaram, G. Fox, K. Kennedy, and U. Kremer. An interactive environment for data partitioning and distribution. In Proc. 5th Distributed Memory Comput. Conf., April 1990.Google ScholarGoogle ScholarCross RefCross Ref
  3. 3.D. Bau, i. Kodukula, V. Kotlyar, K. Pingali, and P. StodghilI. Solving alignment using elementary linear algebra. Proceedings of the Seventh Annual Workshop on Languages and Compilers for Parallel Computing, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.R. Bianchini and T. J. LeBlanc. Software caching on cache-coherent multiprocessors. In Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, pages 521-526, December 1992.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.William J. Bolosky, Robert P. Fitzgerald, and Michael L. Scott. Simple but effective techniques for NUMA memory management. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 19- 31, Litchfield Park, AZ, December 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.William J. Bolosky and Michael L. Scott. False sharing and its effect on shared memory performance. In Proceedings of fhe Fourth Usenix Symposium on Experiences with Distributed and Multiprocessor Systems, pages 57-71, San Diego, CA, September 1993. Also available as MSR-TR-93-1, Microsoft Research Laboratory, September 1993.Google ScholarGoogle Scholar
  7. 7.S. Carr, K. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality, in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. in Proceedings of the Thirteenth ACM Symposium on Operating .Systems Principles, pages 152-164, Pacific Grove, CA, October 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.S. Chatterjee, J. R. Gilbert, R. Schreiber, and S. 71:ng. Automatic array alignment in data-parallel programs. Proceedings of ACM Symposium on Principles of Programming Languages, 20, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Alan L. Cox and Robert J. Fowler. The implementation of a coherent memory abstraction on a NUMA multiprocessor: Experiences with PLATINUM. In Proceed~!ngs of the Twelfth ACM Symposium on Operating Systems Principles, pages 32--44, Litchfield Park, AZ, December 1989. Google ScholarGoogle Scholar
  11. 11.M. Dubois, J. Skeppstedt, L. Ricciulli, K. Ramamurthy, and P. Stenstr6m. The detection and elimination of useless misses in multiprocessors. In Proceedings of the International Symposium on Computer Architecture, pages 88-97, San Diego, CA, May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.S. J. Eggers and T E. Jeremiassen. Eliminating false sharing. In Proc. 1991 International Conference on Parallel Processing, 1991.Google ScholarGoogle Scholar
  13. 13.C. Eisenbeis, W. Jalby, D. Windheiser, and F. Bodin. A strategy for array management in local memory. In Proc. 3th Annual Workshop on Languages and Compilers, August 1990.Google ScholarGoogle Scholar
  14. 14.J. Ferrante, V. Sarkar, and W. Thrash. On estimating and exchange cache effectiveness. In Proceeding.r of the Fourth Workshop on Languages and Compilers for Parallel Computing, August 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformations. Journal of Parallel and Distributed Computing, 5:587-616, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179-193, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.D. Hudak and S. Abraham. Compiler techniques for data partitioning of sequentially iterated parallel loops. In Proc. ACM Int. Conf. Supercomputing, June 19913. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.K. Knobe, J. Lukas, and G. Steele. Data optimization: Allocation of arrays to reduce communication on SIMD machines. Journal of Parallel and Distributed Computing, 8:102-118, February 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.J. Li and M. Chen. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. Technical report, Yale University, 1989.Google ScholarGoogle Scholar
  21. 21.K. Li and P. Hudak. Memory coherence in shared virtual memory systems. A CM Transactions on Computer Systems, November 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22.W. Li and K. Pingali. Access normalization: Loop restructuring for NUMA compilers. A CM Transactions on Computer Systems, 11 (4):353-375, November 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. International Journal of Parallel Programming, 22(2), April 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.Wei Li. Compiler cache optimizations for banded matrix problems. In 9th ACM International Conference on Supercomputing, July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.D. J. Lilja. Cache coherence in large-scale sharedmemory multiprocessors: Issues and comparisons. Computing Surveys, 25(3):303-338, September 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.A. Nagurney, C. F. Nicholson, and P. M. Bishop. Spatial price equilibrium models with discriminatory ad valorem tariffs: formulation and comparative computation using variational inequalities. In Recent Advances in Spatial Equilibrium Modeling: Methodology and Applications. Springer-Verlag, Heidelberg, 1995. forthcoming.Google ScholarGoogle Scholar
  27. 27.B. Nitzberg and V. Lo. Distributed shared memory: A survey of issues and algorithms. Computer, 24(8):52- 60, August 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.A. Porterfield. Software Methods for Improvement of Cache Performance on Supercomputer Applications. PhD thesis, Rice University, May 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.R. P. LaRowe, Jr. and C. S. Ellis. Experimental comparison of memory management policies for NUMA multiprocessors. A CM Transactions on Computer Systems, 9(4):319-363, November 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30.J. Ramanujam and P. Sadayappan. Compile-time techniques for data distribution in distributed memory machines. IEEE Transactions on Parallel and Distributed Systems, 2, October 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. 31.Jack E. Veenstra and Robert J. Fowler. Mint: a front end for efficient simulation of shared-memory multiprocessors. Proceedings of the Second International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS '94), pages 201-207, January 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. 32.M. Wolf and M. Lam. A data locality optimizing algorithm. In Proc. ACM SIGPLAN 91 Conference on Programming Language Design and Implementation, pages 30-44, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Unifying data and control transformations for distributed shared-memory machines

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PLDI '95: Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
            June 1995
            335 pages
            ISBN:0897916972
            DOI:10.1145/207110

            Copyright © 1995 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 June 1995

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            PLDI '95 Paper Acceptance Rate28of105submissions,27%Overall Acceptance Rate406of2,067submissions,20%

            Upcoming Conference

            PLDI '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader