research-article

A practical automatic polyhedral parallelizer and locality optimizer

Authors:
Uday Bondhugula

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

,
Albert Hartono

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

,
J. Ramanujam

Louisiana State University, Baton Rouge, LA, USA

Louisiana State University, Baton Rouge, LA, USA
View Profile

,
P. Sadayappan

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 43 Issue 6June 2008pp 101–113https://doi.org/10.1145/1379022.1375595

Published:07 June 2008Publication History

ACM SIGPLAN Notices

Abstract

We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.

References

PLuTo: A polyhedral automatic parallelizer and locality optimizer for multicores. http://pluto-compiler.sourceforge.net.Google Scholar
N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loops. IJPP, 29(5), Oct. 2001. Google ScholarDigital Library
R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4):491--542, 1987. Google ScholarDigital Library
C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPoPP'91, pages 39--50, 1991. Google ScholarDigital Library
R. Andonov, S. Balev, S. Rajopadhye, and N. Yanev. Optimal semi-oblique tiling. IEEE Trans. Par. & Dist. Sys., 14(9):944--960, 2003. Google ScholarDigital Library
C. Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE PACT, pages 7--16, Sept. 2004. Google ScholarDigital Library
C. Bastoul and P. Feautrier. Improving data locality by chunking. In Intl. Conf. on Compiler Construction (ETAPS CC), pages 320--335, Warsaw, Apr. 2003. Google ScholarDigital Library
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily-nested loop sequences. Technical Report OSU-CISRC-5/07-TR43, The Ohio State University, May 2007.Google Scholar
U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Intl. Conf. on Compiler Construction (ETAPS CC), Apr. 2008. Google ScholarDigital Library
U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.Google Scholar
P. Boulet, A. Darte, T. Risset, and Y. Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17(1):33--51, 1994. Google ScholarDigital Library
P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: From parallelism extraction to code generation. Parallel Computing, 24(3?4):421--444, 1998. Google ScholarDigital Library
CLooG: The Chunky Loop Generator. http://www.cloog.org.Google Scholar
A. Cohen, S. Girbal, D. Parello, M. Sigler, O. Temam, and N. Vasilache. Facilitating the search for compositions of program transformations. In ACM Intl. Conf. on Supercomputing, pages 151--160, June 2005. Google ScholarDigital Library
A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 2000. Google ScholarDigital Library
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.Google ScholarCross Ref
A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Intl. J. Parallel Programming, 25(6):447--496, Dec. 1997. Google ScholarDigital Library
P. Feautrier. Parametric integer programming. RAIRO Recherche Operationnelle, 22(3):243--268, 1988.Google ScholarCross Ref
P. Feautrier. Dataflow analysis of scalar and array references. Intl. J. of Parallel Programming, 20(1):23--53, Feb. 1991.Google ScholarCross Ref
P. Feautrier. Some efficient solutions to the affine scheduling problem: I. one-dimensional time. Intl. J. of Parallel Programming, 21(5):313--348, 1992. Google ScholarDigital Library
P. Feautrier. Some efficient solutions to the affine scheduling problem. part II. multidimensional time. Intl. J. of Parallel Programming, 21(6):389--420, 1992. Google ScholarDigital Library
S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations. Intl. J. of Parallel Programming, 34(3):261--317, June 2006. Google ScholarDigital Library
G. Goumas, M. Athanasaki, and N. Koziris. Code Generation Methods for Tiling Transformations. J. of Information Science and Engineering, 18(5):667--691, Sep. 2002.Google Scholar
M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. University of Passau, 2004. Habilitation thesis.Google Scholar
M. Griebl, C. Lengauer, and S. Wetzel. Code generation in the polytope model. In IEEE PACT, pages 106--111, 1998. Google ScholarDigital Library
E. Hodzic and W. Shang. On time optimal supernode shape. IEEE Trans. Par. & Dist. Sys., 13(12):1220--1233, 2002. Google ScholarDigital Library
K. Hogstedt, L. Carter, and J. Ferrante. Selecting tile shape for minimal execution time. In SPAA, pages 201--211, 1999. Google ScholarDigital Library
F. Irigoin and R. Triolet. Supernode partitioning. In ACM SIGPLAN PoPL, pages 319--329, 1988. Google ScholarDigital Library
S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yellick. Implicit and explicit optimization for stencil computations. In ACM SIGPLAN workshop on Memory Systems Perofmance and Correctness, 2006. Google ScholarDigital Library
W. Kelly andW. Pugh. A unifying framework for iteration reordering transformations. Technical Report CS-TR-3430, Dept. of Computer Science, University of Maryland, College Park, 1995. Google ScholarDigital Library
W. Kelly, W. Pugh, and E. Rosser. Code generation for multiple mappings. In Intl. Symp. on the frontiers of massively parallel computation, pages 332--341, Feb. 1995. Google ScholarDigital Library
D. Kim, L. Renganarayanan, M. Strout, and S. Rajopadhye. Multilevel tiling: ?m? for the price of one. In Supercomputing, 2007. Google ScholarDigital Library
H. LeVerge. A note on chernikova?s algorithm. Technical Report Research report 635, IRISA, Feb. 1992.Google Scholar
W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. Intl. J. of Parallel Programming, 22(2):183--205, 1994. Google ScholarDigital Library
A. Lim, S. Liao, and M. Lam. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In ACM SIGPLAN PPoPP, pages 103--112, 2001. Google ScholarDigital Library
A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In ACM Intl. Conf. on Supercomputing, pages 228--237, 1999. Google ScholarDigital Library
A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445--475, 1998. Google ScholarDigital Library
The LooPo Project - Loop parallelization in the polytope model. http://www.fmi.uni-passau.de/loopo.Google Scholar
B. Norris, A. Hartono, and W. Gropp. Annotations for performance and productivity. 2007. Preprint ANL/MCS-P1392-0107.Google Scholar
R. Penrose. A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society, 51:406--413, 1955.Google ScholarCross Ref
PIP: The Parametric Integer Programming Library. http://www.piplib.org.Google Scholar
PolyLib - A library of polyhedral functions. http://icps.u-strasbg.fr/polylib/.Google Scholar
S. Pop, A. Cohen, C. Bastoul, S. Girbal, P. Jouvelot, G.-A. Silber, and N. Vasilache. GRAPHITE: Loop optimizations based on the polyhedral model for GCC. In Proc. of the 4th GCC Developper?s summit, Ottawa, Canada, June 2006.Google Scholar
L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen. Iterative optimization in the polyhedral model: Part II, multidimensional time. In PLDI?08, Tucson, Arizona, June 2008. Google ScholarDigital Library
L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In ACM CGO, Mar. 2007. Google ScholarDigital Library
W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 8:102--114, Aug. 1992. Google ScholarDigital Library
F. Quilleré, S. V. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. Intl. J. of Parallel Programming, 28(5):469--498, 2000. Google ScholarDigital Library
J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. JPDC, 16(2):108--230, 1992.Google ScholarCross Ref
L. Renganarayana, D. Kim, S. Rajopadhye, and M. M. Strout. Parameterized tiled loops for free. In PLDI, pages 405--414, 2007. Google ScholarDigital Library
R. Schreiber and J. Dongarra. Automatic blocking of nested loops. Technical report, University of Tennessee, Knoxville, TN, Aug. 1990. Google ScholarDigital Library
A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986. Google ScholarDigital Library
Y. Song and Z. Li. New tiling techniques to improve cache temporal locality. In PLDI, pages 215--228, 1999. Google ScholarDigital Library
N. Vasilache. Scalable Program Optimization Techniques in the Polyhedral Model. PhD thesis, Université de Paris-Sud, INRIA, Futurs, Sept. 2007.Google Scholar
N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In Intl. Conf. on Compiler Construction (ETAPS CC), pages 185--201, Mar. 2006. Google ScholarDigital Library
N. Vasilache, C. Bastoul, S. Girbal, and A. Cohen. Violated dependence analysis. In ACM ICS, June 2006. Google ScholarDigital Library
R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 2000.Google Scholar
D. K. Wilde. A library for doing polyhedral operations. Technical Report RR-2157, IRISA, 1993.Google Scholar
M. Wolf and M. S. Lam. A data locality optimizing algorithm. In ACM SIGPLAN PLDI ?91, pages 30--44, 1991. Google ScholarDigital Library
M. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst., 2(4):452--471, 1991. Google ScholarDigital Library
J. Xue. Communication-minimal tiling of uniform dependence loops. JPDC, 42(1):42--59, 1997. Google ScholarDigital Library
J. Xue. Loop tiling for parallelism. Kluwer Academic Publishers, Norwell, MA, USA, 2000. Google ScholarDigital Library
Q. Yi, K. Kennedy, and V. Adve. Transforming complex loop nests for locality. J. of Supercomputing, 27(3):219--264, 2004. Google ScholarDigital Library
K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. A. Padua, K. Pingali, P. Stodghill, and P.Wu. A comparison of empirical and model-driven optimization. In PLDI?03, pages 63--76, 2003. Google ScholarDigital Library

Index Terms

A practical automatic polyhedral parallelizer and locality optimizer
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. ...
Read More
Polyhedral parallel code generation for CUDA
Special Issue on High-Performance Embedded Architectures and Compilers

This article addresses the compilation of a sequential program for parallel execution on a modern GPU. To this end, we present a novel source-to-source compiler called PPCG. PPCG singles out for its ability to accelerate computations from any static ...
Read More
A practical automatic polyhedral parallelizer and locality optimizer
PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation

We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 43, Issue 6
PLDI '08
June 2008
382 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1379022
Issue’s Table of Contents
PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2008
396 pages
ISBN:9781595938602
DOI:10.1145/1375581
General Chair:
Rajiv Gupta
University of California, Riverside, USA
,
Program Chair:
Saman Amarasinghe
Massachusetts Institute of Technology, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 June 2008
Check for updates
Author Tags
affine transformations
automatic parallelization
locality optimization
loop transformations
polyhedral model
tiling
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 868
  Total Citations
  View Citations
- 4,121
  Total Downloads
- Downloads (Last 12 months)340
- Downloads (Last 6 weeks)35
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A practical automatic polyhedral parallelizer and locality optimizer

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

Polyhedral parallel code generation for CUDA

A practical automatic polyhedral parallelizer and locality optimizer