Abstract
The goal of languages like Fortran D or High Performance Fortran (HPF) is to provide a simple yet efficient machine-independent parallel programming model. After algorithm selection, the choice of data layout is the key intellectual challenge in writing an efficient program in such languages. The performance of a data layout depends on the target compilation system, the target machine, the problem size, and the number of available processors, which makes the choice of a good layout extremely difficult for most users of such languages. If languages such as HPF are to find general acceptance, the need for data layout selection support must be addressed. We believe that the appropriate way to provide this support is through a tool that generates data layout specifications automatically. This article discusses the design and implementation of a data layout selection tool that generates HPF-style data layout specifications automatically. Because layout selection is performed in a tool that is not embedded in the target compiler, and hence will be run only a few times during the tuning phase of an application, the tool can use techniques such as integer programming that may be considered too computationally expensive for inclusion in production compilers. The proposed framework for automatic data layout selection builds and examines search spaces of candidate data layouts, where a candidate layout is an efficient layout for some part of the program. After the search spaces are generated, a single candidate layout is selected for each program part, resulting in a data layout for the entire program. A good overall data layout may require the remapping of arrays between program parts. A performance estimator based on a compiler model, an execution model, and a machine model is needed to predict the execution time of each candidate layout and the cost of possible remappings between candidate data layouts.
In the proposed framework, instances of NP-complete problems are solved during the construction of candidate layout search spaces and during the final selection of candidate layouts from each search space. Rather than resorting to heuristics, the framework capitalizes on state-of-the-art 0-1 integer programming technology to compute optimal solutions to these NP-complete problems. A prototype data layout assistant tool based on our framework has been implemented as part of the D system currently under development at Rice University. The article reports preliminary experimental results, which indicate that the framework is efficient and produces data layouts of high quality.
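The core selection step described above, picking one candidate layout per program part so that execution costs plus inter-part remapping costs are minimal, can be illustrated as a search over a layered candidate graph. The sketch below is not the article's formulation: the cost tables are hypothetical, and it solves only the simple chain-of-phases case by dynamic programming, whereas the framework handles richer, NP-complete variants (e.g., general control flow) with 0-1 integer programming.

```python
def select_layouts(exec_cost, remap_cost):
    """Pick one candidate layout per program phase, minimizing the sum of
    predicted execution costs plus remapping costs between phases.

    exec_cost[p][c]        -- cost of candidate layout c in phase p
    remap_cost[p][c1][c2]  -- cost of remapping from candidate c1 of
                              phase p to candidate c2 of phase p + 1
    Returns (total_cost, chosen_candidate_per_phase).
    """
    n = len(exec_cost)
    # best[c]: cheapest cost of a program prefix ending with candidate c
    # of the current phase; pred records the choices for backtracking.
    best = list(exec_cost[0])
    pred = [[None] * len(exec_cost[0])]
    for p in range(1, n):
        new_best, new_pred = [], []
        for c2, cost2 in enumerate(exec_cost[p]):
            # Cheapest way to arrive at candidate c2, over all
            # predecessor candidates c1 of the previous phase.
            total, c1 = min((best[c1] + remap_cost[p - 1][c1][c2], c1)
                            for c1 in range(len(exec_cost[p - 1])))
            new_best.append(total + cost2)
            new_pred.append(c1)
        best, _ = new_best, pred.append(new_pred)
    # Backtrack the chosen candidate for each phase.
    c = min(range(len(best)), key=best.__getitem__)
    choice = [c]
    for p in range(n - 1, 0, -1):
        c = pred[p][c]
        choice.append(c)
    choice.reverse()
    return min(best), choice
```

In this toy instance, the cheaper candidate of each phase taken in isolation is not optimal overall once remapping costs are charged, which is exactly why the selection problem is global rather than per-phase.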