DOI: 10.1145/379539.379586
Article

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Published: 18 June 2001

ABSTRACT

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning.
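As a rough illustration of what an affine partition expresses (this example is not taken from the paper; the arrays, bounds, and statement labels are invented), consider two loops that communicate through an intermediate array. Assigning each statement instance a partition given by an affine function of its loop indices can, among other things, encode loop fusion:

```c
#define N 1000

double a[N], b[N], c[N];

/* Original code: two separate loops, S1 and S2. */
void before(void) {
    for (int i = 0; i < N; i++)        /* S1 */
        b[i] = 2.0 * a[i];
    for (int i = 0; i < N; i++)        /* S2 */
        c[i] = b[i] + 1.0;
}

/* Affine partitions S1(i) -> i and S2(i) -> i place both statement
   instances in the same partition, which here corresponds to fusing
   the loops; b[i] is then reused while it is still in cache. */
void after_fusion(void) {
    for (int i = 0; i < N; i++) {
        b[i] = 2.0 * a[i];             /* S1 */
        c[i] = b[i] + 1.0;             /* S2 */
    }
}
```

Other affine maps (e.g. S2(i) -> i + 1) would encode reindexing, scaling, or reordering in the same framework.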

Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable and thereby reduces the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible.
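A minimal hand-written sketch of the two optimizations, assuming made-up array names, sizes, and a hypothetical tile size (the paper's contribution is deriving such transformations automatically, not these particular codes):

```c
#define N 512
#define B 64            /* hypothetical tile (block) size */

double x[N][N], y[N][N], z[N][N];

/* Blocking: the i and j loops are tiled so that a B x B block of z
   stays in the faster memory levels while the k loop streams over
   the corresponding panels of x and y. */
void blocked_matmul(void) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int k = 0; k < N; k++)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        z[i][j] += x[i][k] * y[k][j];
}

double p[N], q[N];

/* Array contraction: once the producer and the consumer of the
   temporary run in the same iteration, the whole array t[] shrinks
   to a single scalar, eliminating its loads, stores, and footprint. */
void contracted(void) {
    for (int i = 0; i < N; i++) {
        double t = 2.0 * p[i];   /* formerly t[i] */
        q[i] = t + 1.0;
    }
}
```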

By bringing the full generality of affine partitioning to bear on the problem, our locality algorithm can find more contractable arrays than previously possible. This paper also generalizes the concept of blocking and shows that affine partitioning allows the benefits of blocking to be realized in arbitrarily nested loops. Experimental results on a number of benchmarks and a complete multigrid application in aeronautics indicate that affine partitioning is effective in practice.
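The blocking referred to here is broader than tiling a single perfectly nested loop. The sketch below, again with invented arrays and a hypothetical block size, conveys the idea for a sequence of loop nests: each block of the intermediate array is produced and consumed before it leaves the cache.

```c
#define M   (1 << 20)
#define BLK 4096        /* hypothetical block size */

double u[M], v[M], w[M];

/* Two full sweeps: every v[i] written in the first nest has long left
   the cache by the time the second nest reads it. */
void separate_sweeps(void) {
    for (int i = 0; i < M; i++) v[i] = 0.5 * u[i];
    for (int i = 0; i < M; i++) w[i] = v[i] + u[i];
}

/* Blocking across the sequence of nests: both sweeps are applied to
   one BLK-sized block at a time, so v[] is reused while resident. */
void blocked_sweeps(void) {
    for (int b = 0; b < M; b += BLK) {
        int hi = (b + BLK < M) ? (b + BLK) : M;
        for (int i = b; i < hi; i++) v[i] = 0.5 * u[i];
        for (int i = b; i < hi; i++) w[i] = v[i] + u[i];
    }
}
```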


Published in

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
June 2001, 142 pages
ISBN: 1581133464
DOI: 10.1145/379539

        Copyright © 2001 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

Overall Acceptance Rate: 230 of 1,014 submissions (23%)
