skip to main content
10.1145/231379.231391acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
Article
Free Access

Global communication analysis and optimization

Authors Info & Claims
Published:01 May 1996Publication History

ABSTRACT

Reducing communication cost is crucial to achieving good performance on scalable parallel machines. This paper presents a new compiler algorithm for global analysis and optimization of communication in data-parallel programs. Our algorithm is distinct from existing approaches in that rather than handling loop-nests and array references one by one, it considers all communication in a procedure and their interactions under different placements before making a final decision on the placement of any communication. It exploits the flexibility resulting from this advanced analysis to eliminate redundancy, reduce the number of messages, and reduce contention for cache and communication buffers, all in a unified framework. In contrast, single loop-nest analysis often retains redundant communication, and more aggressive dataflow analysis on array sections can generate too many messages or cache and buffer contention. The algorithm has been implemented in the IBM pHPF compiler for High Performance Fortran. During compilation, the number of messages per processor goes down by as much as a factor of nine for some HPF programs. We present performance results for the IBM SP2 and a network of Sparc workstations (NOW) connected by a Myrinet switch. In many cases, the communication cost is reduced by a factor of two.

References

  1. 1.G. Agarwal, J. Saltz, and R. Das. Interprocedural partial redundancy elimination and its application to distributed memory compilation. In Programming Language Design and Implementation (PLDI), La Jolla, CA, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.F. Allen, M. Burke, P. Charles, R. Cytron, and J. Ferrante. An overview of the ptran analysis system for multiprocessing. Proc. A CM 1987 International Conference on Supercomputing, 1987. Also published in Journal of Parallel and Distributed Computing, Oct., 1988, 5(5) pages 617-640. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.S. P. Amarasinghe and M. S. Lain. Communication optimization and code generation for distributed memory machines. In Programming Language Design and implementation (PLDI), Albuquerque, NM, June 1993. ACM SIGPLAN. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, and S. Ranka. A compilation approach for Fortran 90D/HPF compilers on distributed memory MIMD computers. In Proc. Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon, Aug. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.T. Brandes. ADAPTOR: A compilation system for data-parallel Fortran programs, in C. W. Kessler, editor, Automatic parallelization - new approaches to code generation, data distribution, and performance prediction. Vieweg Advanced Studies in Computer Science, Vieweg, Wiesbaden, Jan. 1994.Google ScholarGoogle Scholar
  6. 6.J.-D. Choi, R. Cytron, and J. Ferrante. On the efficient engineering of ambitious program analysis. IEEE 2"ransact~ons on Software Engineering, 20(2):105-114, Feb. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.C. Click. Global code motion global value numbering. In Programming Language Design and Implementation (PLDI), pages 246-257. ACM SIGPLAN, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.R. Cytron, J. Ferrante, B. Rosen, M. Wegman, and F. Zadeck. Efficiently computing static single assignment form and the control dependence graph. A CM Transactions on Programming Languages and Systems, 13(4):451-490, Oct. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.H. P. F. Forum. High Performance Fortran language specification, version 1.0. Technical Report CRPC- TR92225, Rice University, May 1993.Google ScholarGoogle Scholar
  10. 10.E. Granston and A. Veidenbaum. Detecting redundant accesses to array data. In Prec. $upercomputing '91, pages 854-965, 1991. Google ScholarGoogle Scholar
  11. 11.M. Gupta and P. Banerjee. A methodology for highlevel synthesis of communication on multicomputers. In Prec. 6th A CM International Conference on Supercom. puling, Washington D.C., July 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12.M. Gupta, S. Midkiff, E. Schonberg, V. Seshadri, K. Wang, D. Shields, W.-M. Ching, and T. Ngo. An I-IPF compiler for the IBM SP2. In Prec. Supercomputing '95, San Diego, CA, Dec. 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.M. Gupta and E. Schonberg. Static analysis to reduce synchronization costs in data-parallel programs. In Principles of Programming Languages (POPL), St. Petersburg Beach, FL, Jan. 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.M. Gupta, E. Schonberg, and H. Srinivasan. A unified framework for optimizing communication in dataparallel programs. Technical Report RC 19872(87937) 12/14/94, IBM Research, 1994. To appear in iEEE Transactions on Parallel and Distributed Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.S. Hiranandani, K. Kennedy, and C. Tseng. Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM, 35(8):66-80, Aug. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.K. Keeton, T. Anderson, and D. Patterson. LogP quantified: The case for low-overhead local area networks. In Prec. Hot Interconnects IiI: A Symposium on High Performance Interconnects, Stanford, CA, Aug. 1995.Google ScholarGoogle Scholar
  17. 17.K. Kennedy and N. Nedeljkovic. Combining dependence and data-flow analyses to optimize communication. In international Parallel Processing Symposium. IEEE, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.K. Kennedy and A. Sethi. A constraint-based communication placement framework. Technical Report CRPC-TR95515-S, CRPC, Rice University, 1995.Google ScholarGoogle Scholar
  19. 19.J. Knoop, O. Riithing, and B. Steffen. Lazy code motion. In Programming Language Design and Implementation (PLDI), San Francisco, CA, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.J. Li and M. Chen. Compiling communication-efficient programs for massively parallel machines. IEEE Transactions on Parallel and Distributed Systems, 2(3):361- 376, July 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21.T. Mowry, M. Lain, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 62-73. ACM SiGPLAN, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22.M. O'Boyle and F. Bodin. Compiler reduction of synchronization in shared virtual memory systems. In Prec. 9th A CM International Conference on Supercompuling, Barcelona, Spain~ July 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23.V. Sarkar. The PTRAN parallel programming system. Parallel Functional Programming Languages and Compilers, pages 309-391, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24.M. Snir et al. The communication software and parallel environment of the IBM SP2. IBM Systems Journal, 34(2):205-221, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25.C. Stunkel et al. The SP2 high performance switch. iBM Systems Journal, 34(2):185-204, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26.C.-W. Tseng. Compiler optimizations for eliminating barrier synchronization. In Principles and Practice of Parallel Programming (PPoPP), Santa Barbara, CA, july 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27.R. v Hanxleden and K. Kennedy. Give-n-Take--a balanced code placement framework. In Programming Language Design and Implementation (PLDI), Orlando, FL, June 1994. ACM SIGPLAN. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28.M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. 29.M. Wolfe and U. Banerjee. Data dependence and its application to parallel processing. International Journal of Parallel Programming, 16(2):137-178, Apr. 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30.H. Zima, H. Bast, and M. Gerndt. SUPERB: A tool for semi-automatic MiMD/SIMD parallelization. Parallel Computing, 6:1-18, 1988.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Global communication analysis and optimization

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PLDI '96: Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
            May 1996
            300 pages
            ISBN:0897917952
            DOI:10.1145/231379

            Copyright © 1996 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 1996

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            PLDI '96 Paper Acceptance Rate28of112submissions,25%Overall Acceptance Rate406of2,067submissions,20%

            Upcoming Conference

            PLDI '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader