Article

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Authors:
Amy W. Lim

Computer Systems Laboratory, Stanford University, Stanford, CA

Computer Systems Laboratory, Stanford University, Stanford, CA
View Profile

,
Shih-Wei Liao

Intel Research in Santa Clara, California and Computer Systems Laboratory, Stanford University, Stanford, CA

Intel Research in Santa Clara, California and Computer Systems Laboratory, Stanford University, Stanford, CA
View Profile

,
Monica S. Lam

Computer Systems Laboratory, Stanford University, Stanford, CA

Computer Systems Laboratory, Stanford University, Stanford, CA
View Profile

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programmingJune 2001Pages 103–112https://doi.org/10.1145/379539.379586

Published:18 June 2001Publication History

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

Pages 103–112

ABSTRACT

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning.

Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable and thereby reduces the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible.

By bringing the full generality of affine partitioning to bear on the problem, our locality algorithm can find more contractable arrays than previously possible. This paper also generalizes the concept of blocking and shows that affine partitioning allows the benefits of blocking be realized in arbitrarily nested loops. Experimental results on a number of benchmarks and a complete multigrid application in aeronautics indicates that affine partitioning is effective in practice.

References

1.N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 ACM International Conference on Supercomputing, pages 141-152, May 2000.]] Google ScholarDigital Library
2.N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In Proceedings of Supercomputing 2000, November 2000.]] Google ScholarDigital Library
3.G. Aigner, A. Diwan, D. L. Heine, M. S. Lam, D. L. Moore, B. R. Murphy, and C. Sapuntzakis. An overview of the SUIF2 compiler infrastructure. Technical report, Stanford University, 2000.]]Google Scholar
4.D. Bacon, S. Graham, and O. Sharp. Compiler transformations for high-performance computing. Computing Surveys, 26(4):345-420, December 1994.]] Google ScholarDigital Library
5.D. Bailey and J. Barton. The NAS kernel benchmark program. Technical Report 86711, NASA, 1985.]]Google Scholar
6.S. Carr, K. S. McKinley, and C.-W. Tseng. Compiler optimizations for improving data locality. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 252-262, October 1994.]] Google ScholarDigital Library
7.P. Feautrier. Some efficient solutions to the affine scheduling problem, part I, one dimensional time. International Journal of Parallel Programming, 21(5):313- 348, October 1992.]] Google ScholarDigital Library
8.G. R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, pages 171-181, August 1992.]] Google ScholarDigital Library
9.G. H. Golub and C. F. Van Loan. Matrix Computations - 2nd Edition. The John Hopkins University Press, Baltimore, Maryland, 1989.]] Google ScholarDigital Library
10.A. Jameson. Solution of the Euler equations by amultigrid method. Applied Mathematics and Computation, 13:327-356, 1983.]]Google ScholarDigital Library
11.K. Kennedy and K. S. McKinley. Optimizing for parallelism and data locality. In Proceedings of the 1992 ACM International Conference on Supercomputing, pages 323-334, July 1992.]] Google ScholarDigital Library
12.K. Kennedy and K. S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pages 301-320. Springer-Verlag, August 1993.]] Google ScholarDigital Library
13.I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In Proceedings of the ACM SIG- PLAN '97 Conference on Programming Language Design and Implementation, pages 346-357, June 1997.]] Google ScholarDigital Library
14.E. C. Lewis, C. Lin, and L. Snyder. The implementation and evaluation of fusion and array contraction in array languages. In Proceedings of the ACM SIGPLAN '98 Conference onProgramming Language Design and Implementation, pages 50-59, June 1998.]] Google ScholarDigital Library
15.S.-W. Liao. SUIF Explorer: an Interactive and Interprocedural Paralllelizer. PhD thesis, Stanford University, August 2000. Published as CSL-TR-00-807.]] Google ScholarDigital Library
16.A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, Rhodes, Greece, June 1999.]] Google ScholarDigital Library
17.A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Conference Record of the 24th ACM SIGPLAN- SIGACT Symposium on Principles of Programming Languages, Paris, France, January 1997.]] Google ScholarDigital Library
18.A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445-475, May 1998.]] Google ScholarDigital Library
19.W. Pugh and E. Rosser. Iteration space slicing for locality. In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, August 1999.]] Google ScholarDigital Library
20.V. Sarkar and G. R. Gao. Optimization of array accesses by collective loop transformations. In Proceedings of the 1991 ACM International Conference on Supercomputing, pages 194-204, June 1991.]] Google ScholarDigital Library
21.A. Schrijver. Theory of Linear and Integer Programming. Wiley, Chichester, 1986.]] Google ScholarDigital Library
22.M. E. Wolf. Improving Locality and Parallelism in Nested Loops. PhD thesis, Stanford University, August 1992. Published as CSL-TR-92-538.]] Google ScholarDigital Library
23.M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In Proceedings of the ACM SIGPLAN '91 Conference onProgramming Language Design and Implementation, pages 30-44, June 1991.]] Google ScholarDigital Library
24.M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.]] Google ScholarDigital Library

Index Terms

Blocking and array contraction across arbitrarily nested loops using affine partitioning
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and ...
Read More
Affine and unimodular transformations for non-uniform nested loops
ICCOMP'08: Proceedings of the 12th WSEAS international conference on Computers

Performance improvement in the modern parallel machines needs not only to find sufficient parallelism in a program, but it is also important that we minimize the synchronization and communication overheads in the parallelized program. Parallelizing and ...
Read More
Affine-by-Statement Transformations of Imperfectly Nested Loops
IPPS '96: Proceedings of the 10th International Parallel Processing Symposium

A majority of loop restructuring techniques developed so far assume that loops are perfectly nested. The unimodular approach unifies three individual transformations -- loop interchange, skewing and reversal -- but is still limited to perfect loop ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
June 2001
142 pages
ISBN:1581133464
DOI:10.1145/379539
Chairmen:
Michael Heath
Univ. of Illinois, Illinois, IN
,
Andrew Lumsdaine
Indiana Univ.
ACM SIGPLAN Notices Volume 36, Issue 7
July 2001
143 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/568014
Issue’s Table of Contents
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 June 2001
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate230of1,014submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 97
  Total Citations
  View Citations
- 785
  Total Downloads
- Downloads (Last 12 months)13
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

ABSTRACT

References

Cited By

Index Terms

Recommendations

Blocking and array contraction across arbitrarily nested loops using affine partitioning

Affine and unimodular transformations for non-uniform nested loops

Affine-by-Statement Transformations of Imperfectly Nested Loops