skip to main content
research-article

A practical automatic polyhedral parallelizer and locality optimizer

Published:07 June 2008Publication History
Skip Abstract Section

Abstract

We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model -- far beyond what is possible by current production compilers. Unlike previous works, our approach is an end-to-end fully automatic one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations. The framework has been implemented into a tool to automatically generate OpenMP parallel code from C program sections. Experimental results from the tool show very high speedups for local and parallel execution on multi-cores over state-of-the-art compiler frameworks from the research community as well as the best native production compilers. The system also enables the easy use of powerful empirical/iterative optimization for general arbitrarily nested loop sequences.

References

  1. PLuTo: A polyhedral automatic parallelizer and locality optimizer for multicores. http://pluto-compiler.sourceforge.net.Google ScholarGoogle Scholar
  2. N. Ahmed, N. Mateev, and K. Pingali. Synthesizing transformations for locality enhancement of imperfectly-nested loops. IJPP, 29(5), Oct. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems, 9(4):491--542, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Ancourt and F. Irigoin. Scanning polyhedra with do loops. In PPoPP'91, pages 39--50, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Andonov, S. Balev, S. Rajopadhye, and N. Yanev. Optimal semi-oblique tiling. IEEE Trans. Par. & Dist. Sys., 14(9):944--960, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE PACT, pages 7--16, Sept. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Bastoul and P. Feautrier. Improving data locality by chunking. In Intl. Conf. on Compiler Construction (ETAPS CC), pages 320--335, Warsaw, Apr. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily-nested loop sequences. Technical Report OSU-CISRC-5/07-TR43, The Ohio State University, May 2007.Google ScholarGoogle Scholar
  9. U. Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In Intl. Conf. on Compiler Construction (ETAPS CC), Apr. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. U. Bondhugula, J. Ramanujam, and P. Sadayappan. Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer. Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.Google ScholarGoogle Scholar
  11. P. Boulet, A. Darte, T. Risset, and Y. Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17(1):33--51, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: From parallelism extraction to code generation. Parallel Computing, 24(3?4):421--444, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. CLooG: The Chunky Loop Generator. http://www.cloog.org.Google ScholarGoogle Scholar
  14. A. Cohen, S. Girbal, D. Parello, M. Sigler, O. Temam, and N. Vasilache. Facilitating the search for compositions of program transformations. In ACM Intl. Conf. on Supercomputing, pages 151--160, June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser Boston, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  17. A. Darte and F. Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Intl. J. Parallel Programming, 25(6):447--496, Dec. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Feautrier. Parametric integer programming. RAIRO Recherche Operationnelle, 22(3):243--268, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  19. P. Feautrier. Dataflow analysis of scalar and array references. Intl. J. of Parallel Programming, 20(1):23--53, Feb. 1991.Google ScholarGoogle ScholarCross RefCross Ref
  20. P. Feautrier. Some efficient solutions to the affine scheduling problem: I. one-dimensional time. Intl. J. of Parallel Programming, 21(5):313--348, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Feautrier. Some efficient solutions to the affine scheduling problem. part II. multidimensional time. Intl. J. of Parallel Programming, 21(6):389--420, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations. Intl. J. of Parallel Programming, 34(3):261--317, June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. Goumas, M. Athanasaki, and N. Koziris. Code Generation Methods for Tiling Transformations. J. of Information Science and Engineering, 18(5):667--691, Sep. 2002.Google ScholarGoogle Scholar
  24. M. Griebl. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. University of Passau, 2004. Habilitation thesis.Google ScholarGoogle Scholar
  25. M. Griebl, C. Lengauer, and S. Wetzel. Code generation in the polytope model. In IEEE PACT, pages 106--111, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. E. Hodzic and W. Shang. On time optimal supernode shape. IEEE Trans. Par. & Dist. Sys., 13(12):1220--1233, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. K. Hogstedt, L. Carter, and J. Ferrante. Selecting tile shape for minimal execution time. In SPAA, pages 201--211, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. F. Irigoin and R. Triolet. Supernode partitioning. In ACM SIGPLAN PoPL, pages 319--329, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Kamil, K. Datta, S. Williams, L. Oliker, J. Shalf, and K. Yellick. Implicit and explicit optimization for stencil computations. In ACM SIGPLAN workshop on Memory Systems Perofmance and Correctness, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. W. Kelly andW. Pugh. A unifying framework for iteration reordering transformations. Technical Report CS-TR-3430, Dept. of Computer Science, University of Maryland, College Park, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. W. Kelly, W. Pugh, and E. Rosser. Code generation for multiple mappings. In Intl. Symp. on the frontiers of massively parallel computation, pages 332--341, Feb. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Kim, L. Renganarayanan, M. Strout, and S. Rajopadhye. Multilevel tiling: ?m? for the price of one. In Supercomputing, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. H. LeVerge. A note on chernikova?s algorithm. Technical Report Research report 635, IRISA, Feb. 1992.Google ScholarGoogle Scholar
  34. W. Li and K. Pingali. A singular loop transformation framework based on non-singular matrices. Intl. J. of Parallel Programming, 22(2):183--205, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Lim, S. Liao, and M. Lam. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In ACM SIGPLAN PPoPP, pages 103--112, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. A. W. Lim, G. I. Cheong, and M. S. Lam. An affine partitioning algorithm to maximize parallelism and minimize communication. In ACM Intl. Conf. on Supercomputing, pages 228--237, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine partitions. Parallel Computing, 24(3-4):445--475, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. The LooPo Project - Loop parallelization in the polytope model. http://www.fmi.uni-passau.de/loopo.Google ScholarGoogle Scholar
  39. B. Norris, A. Hartono, and W. Gropp. Annotations for performance and productivity. 2007. Preprint ANL/MCS-P1392-0107.Google ScholarGoogle Scholar
  40. R. Penrose. A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society, 51:406--413, 1955.Google ScholarGoogle ScholarCross RefCross Ref
  41. PIP: The Parametric Integer Programming Library. http://www.piplib.org.Google ScholarGoogle Scholar
  42. PolyLib - A library of polyhedral functions. http://icps.u-strasbg.fr/polylib/.Google ScholarGoogle Scholar
  43. S. Pop, A. Cohen, C. Bastoul, S. Girbal, P. Jouvelot, G.-A. Silber, and N. Vasilache. GRAPHITE: Loop optimizations based on the polyhedral model for GCC. In Proc. of the 4th GCC Developper?s summit, Ottawa, Canada, June 2006.Google ScholarGoogle Scholar
  44. L.-N. Pouchet, C. Bastoul, J. Cavazos, and A. Cohen. Iterative optimization in the polyhedral model: Part II, multidimensional time. In PLDI?08, Tucson, Arizona, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In ACM CGO, Mar. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. W. Pugh. The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM, 8:102--114, Aug. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. F. Quilleré, S. V. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. Intl. J. of Parallel Programming, 28(5):469--498, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. JPDC, 16(2):108--230, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  49. L. Renganarayana, D. Kim, S. Rajopadhye, and M. M. Strout. Parameterized tiled loops for free. In PLDI, pages 405--414, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. R. Schreiber and J. Dongarra. Automatic blocking of nested loops. Technical report, University of Tennessee, Knoxville, TN, Aug. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Y. Song and Z. Li. New tiling techniques to improve cache temporal locality. In PLDI, pages 215--228, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. N. Vasilache. Scalable Program Optimization Techniques in the Polyhedral Model. PhD thesis, Université de Paris-Sud, INRIA, Futurs, Sept. 2007.Google ScholarGoogle Scholar
  54. N. Vasilache, C. Bastoul, and A. Cohen. Polyhedral code generation in the real world. In Intl. Conf. on Compiler Construction (ETAPS CC), pages 185--201, Mar. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. N. Vasilache, C. Bastoul, S. Girbal, and A. Cohen. Violated dependence analysis. In ACM ICS, June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. R. Whaley, A. Petitet, and J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 2000.Google ScholarGoogle Scholar
  57. D. K. Wilde. A library for doing polyhedral operations. Technical Report RR-2157, IRISA, 1993.Google ScholarGoogle Scholar
  58. M. Wolf and M. S. Lam. A data locality optimizing algorithm. In ACM SIGPLAN PLDI ?91, pages 30--44, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. M. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst., 2(4):452--471, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. J. Xue. Communication-minimal tiling of uniform dependence loops. JPDC, 42(1):42--59, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. J. Xue. Loop tiling for parallelism. Kluwer Academic Publishers, Norwell, MA, USA, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Q. Yi, K. Kennedy, and V. Adve. Transforming complex loop nests for locality. J. of Supercomputing, 27(3):219--264, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. K. Yotov, X. Li, G. Ren, M. Cibulskis, G. DeJong, M. Garzaran, D. A. Padua, K. Pingali, P. Stodghill, and P.Wu. A comparison of empirical and model-driven optimization. In PLDI?03, pages 63--76, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A practical automatic polyhedral parallelizer and locality optimizer

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 43, Issue 6
      PLDI '08
      June 2008
      382 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/1379022
      Issue’s Table of Contents
      • cover image ACM Conferences
        PLDI '08: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2008
        396 pages
        ISBN:9781595938602
        DOI:10.1145/1375581
        • General Chair:
        • Rajiv Gupta,
        • Program Chair:
        • Saman Amarasinghe

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 June 2008

      Check for updates

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader