article

Free Access

Compiler transformations for high-performance computing

Authors:
David F. Bacon

Computer Science Division, University of California, Berkeley, California

Computer Science Division, University of California, Berkeley, California
View Profile

,
Susan L. Graham

Computer Science Division, University of California, Berkeley, California

Computer Science Division, University of California, Berkeley, California
View Profile

,
Oliver J. Sharp

Computer Science Division, University of California, Berkeley, California

Computer Science Division, University of California, Berkeley, California
View Profile

Authors Info & Claims

ACM Computing Surveys Volume 26 Issue 4pp 345–420https://doi.org/10.1145/197405.197406

Published:01 December 1994Publication History

ACM Computing Surveys

Abstract

In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.

This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.

Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.

References

ABELSON, H. AND SUSSMAN, G. J. 1985. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Mass. Google Scholar
ABu-SUFAH, W 1979. improving the performance of virtual memory computers. Ph D., thesis, Tech. Rep. 78-945, Univ. of Illinois at Urbana- Champaign. Google Scholar
ABU-SUFAH, W., KUCK, D. J., AND LAWRIE, I). 1981 On the performance enhancement of paging systems through program analysis and transformations IEEE Trans. Comput. C-30, 5 (May), 341-356.Google Scholar
ACKERSLAXq, W B. 1982. Data flow languages. Computer 15, 2 (Feb.), 15-25.Google Scholar
AHO, A. V., JOHNSON, S C., AND ULLMAN, J. D 1977. Code generation for expressions with common subexpressions. J. ACM 24, 1 (Jan.), 146-160. Google Scholar
Ago, A. V., SF~THI, R., AND ULLMAN, J. D. 1986. Compilers. Prtnc~ples, Techniques, and Tools. Addison-Wesley, Reading, Mass. Google Scholar
AIKEN, A. AND NICOLAU, A. 1988a Optimal loop parallehzation. In Procee&ngs of the SIGPLAN Conference on Programmzng Language Destgn and Implementahon (Atlanta, Ga., June). SIG- PLAN Not. 23, 7 (July), 308-317. Google Scholar
AMEN, A. AND NICOIAU. A. 1988b. Perfect pipelinrag: A new loop parallelization technique. In Proceedmgs of the 2nd European Symposzum on Programming Lecture Notes in Computer Science, vol. 300. Springer-Verlag, Berlin, 221-235. Google Scholar
ALLEN, J. R. 1983. Dependence analysis for subscripted variables and its apphcation to program transformations. Ph.D. thesis, Computer Science Dept., Rice University, Houston, Tex Google Scholar
ALLEN, F. E. 1969. Program optimization. In Annual Rewew ~n Automatic Programming 5. International Tracts in Computer Science and Technology and their Application, vol. 13. Pergamon Press, Oxford, England, 239-307.Google Scholar
ALLEN, F. E. AND COCKE, J. 1971. A catalogue of optimizing transformations. In Design and Opt~m~zat~on of Compilers, R Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1-30.Google Scholar
ALLEN, J. R. AND KENNEDY, K. 1987. Automatic translation of Fortran programs to vector form. ACM Trans. Program Lang. Syst. 9, 4 (Oct.), 491-542. Google Scholar
ALLEN, J. R. AND I~~ENNEDY, K. 1984. Automatic loop interchange. In Proceedings of the SIGPLAN Symposium on Compzler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 233-246. Google Scholar
ALLEN, F. E., BURKE, M., CHARLES, P., CYTRON, R., AND FERRANTE, j. 1988a. An overview of the PTRAN analysis system for multiprocessing. J. Parall. Distrib. Comput. 5, 5 (Oct.), 617-640. Google Scholar
ALLEN, F. E., BURKE, M., CYTRON, R., FERRANTE, J., HSIEH, W., AND SARKAR, V. 1988b. A framework for determining useful parallelism. In Proceedings of the ACM International Conference on Supercomput~ng (St. Malo, France, July). ACM Press, New York, 207-215. Google Scholar
ALLEN, F. E., COCKE, J., AND KENNEDY, K. 1981. Reduction of operator ~trength. In Program Flow Analysis: Theory and Apphcations, S. S. Muchnik and N. D. Jones, Eds., Prentice-Hall, Englewood Cliffs, N.J., 79-101.Google Scholar
ALL~N, J R., KENNEDY, K., PORTERFIELD, C., AND WARREN, J. 1983. Conversion of control dependence to data dependence In Conference Record of the 10th ACM Symposium on Principlos of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 177-189. Google Scholar
ALPERT, D. AND AVNON, D. 1993 Architecture of the Pentium microprocessor IEEE Micro 13, 3 (June), 11-21 Google Scholar
AMERICAN NATIONAL STANDARDS INSTITUTE. 1987. An American National standard, IEEE standard for binary floating-point arithmetic SIGPLAN Not. 22, 2 (Feb.), 9-25.Google Scholar
ANDERSON, J. M. AND LAM, M. S. 1993 Global optimlzations for parallelism and locality on scalable parallel machines. In Proceedings of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 112-125. Google Scholar
ANDERSON, S. AND HUDAK, P 1990 Compilation of Haskell array comprehensions for scientific computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementatwn (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 137-149. Google Scholar
APPEL, A. W. 1992. Compding w~th Continuations. Cambridge University Press, Cambridge, England. Google Scholar
ARDEN, B. W., GALLER, B. A., AND GRAHAM, R. M. 1962. An algorithm for translating Boolean expressions. J. ACM 9, 2 (Apr.), 222-239. Google Scholar
ARWND AND CULLER, D. E. 1986. Dataflow architectures. In Annual Review of Computer Science. Vol. 1. J. F. Traub, B. J. Grosz, B. W. Lampson, and N. J. Nilsson, Eds. Annual Reviews, Palo Alto, Calif, 225-253. Google Scholar
ARWND, KATHAIL, V., AND PINGALI, K. 1980. A dataflow architecture with tagged tokens. Tech. Rep TM-174, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
BACON, D. F., CHOW, J.-H., Ju, D. R, MUTHUKUMAR, 14, AND SARKAR, V. 1994. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In Proceedings of CASCON '94 (Toronto, Ontario, Oct.). Google Scholar
BAILEY, D. H. 1992. On the behavior of cache memories with strided data access. Tech. Rep. RNR- 92-015, NASA Ames Research Center, Moffett Field, Calif., (May).Google Scholar
BAILEY, D. H., BARszcz, E., BARTON, J. T., BROWNING, D. S., CARTER, R. L., DAGUM, L., FATOOHI, R. A., FREDERICKSON, P. 0., LASINSKI, T. A., SCHREIBER, R. S., SIMON, H. D., VENKATAKRISHNAN, V., AND WEERATUNGA, S. K. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63-73.Google Scholar
BALASUNDARAM, V. 1990. A mechanism for keeping useful internal information in parallel programming tools: The Data Access Descriptor. J. Parall. D~stnb. Comput. 9, 2 (June), 154-170. Google Scholar
BALASUNDARAM, V., Fox, G., KENNEDY, K., AND KREMER, V. 1990. An interactive environment for data partitioning and distribution. In Proceedtngs of the 5th D~strzbuted Memory Computer Conference (Charleston, South Carolina, Apr.). IEEE Computer Society Press, Los Alamitos, Calif.Google Scholar
BALASUNDARAM, V., KENNEDY, K., KREMER, U., MCKINLEY, K., AND SUBHLOK, J. 1989. The ParaScope editor: An interactive parallel programming tool. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 540-550. Google Scholar
BALL, J. E. 1979. Predicting the effects of optimization on a procedure body. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Denver, Color., Aug.). SIGPLAN Not. 14, 8, 214-220. Google Scholar
B~NE~EE, U. 1991. Unimodular transformations of double loops. In Advances in Languages and Compilers for Parallel Processing, A. Nicolau, Ed. Research Monographs in Parallel and Distributed Computing, MIT Press, Cambridge, Mass., Chap. 10.Google Scholar
BANERJEE, U. 1988a. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, Mass. Google Scholar
BnNERJEE, U. 1988b. An introduction to a formal theory of dependence analysis. J. Supercomput. 2, 2 (Oct.), 133-149.Google Scholar
BANERJEE, U. 1979. Speedup of ordinary programs, Ph.D. thesis, Tech. Rep. 79-989, Computer Science Dept., Univ. of Illinois at Urbana- Champaign. Google Scholar
BANERJEE~ U., CHEN, S. C., KUCK, D. J., AND TOWLE, R. A. 1979. Time and parallel processor bounds for Fortran-like loops. IEEE Trans. Comput. C-28, 9 (Sept.), 660-670.Google Scholar
BANNING, J. P. 1979. An efficient way to find the side-effects of procedure calls and the aliases of variables. In Conference Record of the 6th ACM Symposium on Principles of Programming Languages (San Antonio, Tex., Jan.). ACM Press, 29-41. Google Scholar
BERNSTEIN, R. 1986. Multiplication by integer constants. Softw. Pract. Exper. 16, 7 (July), 641-652. Google Scholar
BERRY, M., CHEN, D., Koss, P., KUCK, D., Lo, S., PANG, Y., POINTER, L., ROLOFF, R., SAMEH, A., CLEMENTI, E_. CHIN. S. SCHNEIDER. D. FOX. FT., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., GOODRUM, R., AND MARTIN, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Int. J. Supercomput. Appl. 3, 3 (Fall), 5-40.Google Scholar
BLELLOCH, G. 1989. Scans as primitive parallel operations. IEEE Trans. Comput. C-38, 11 (Nov.), 1526-1538. Google Scholar
BROMLEY, M., {-IELLER, S., MCNERNEY, W., AND STEELE, G. L., JR. 1991. Fortran at ten gigaflops: The Connection Machine convolution compiler. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 145-156. Google Scholar
BURKE, M. AND CYTRON, R. 1986. Interprocedural dependence analysis and paratlelization. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLANNot. 21, 7 (July), 162-175. Extended version available as IBM Thomas J. Watson Research Center Tech. Rep. RC 11794. Google Scholar
BURNETT, G. J. AND COFFMAN, E. G., JR. 1970. A study of interleaved memory systems. In Proceedings of the Spring Joint AFIPS Computer Conference. vol. 36. AFIPS, Montvale, N.J., 467-474.Google Scholar
BURSTALL, R. M. AND DARLINGTON, J. 1977. A transformation system for developing recursive programs. J. ACM 24, 1 (Jan.}, 44-67. Google Scholar
BUTLER, M., YEH, T., PATT, Y., ALSUP, M., SCALES, H., AND SHEBANOW, M. 1991. Single instruction stream parallelism is greater than two. In Proceedings of the 18th Annual International Symposium on Computer Architecture (Toronto, Ontario, May). SIGARCH Comput. Arch. News 19, 3, 276-286. Google Scholar
CALLAHAN, D. AND KENNEDY, K. 1988a. Analysis of interprocedural side-effects in a parallel programming environment. J. Parall. Distrib. Comput. 5, 5 (Oct.), 517-550. Google Scholar
CALLAHAN, D. AND KENNEDY, K. 1988b. Compiling programs for distributed-memory multiprocessots. J. Supercomput. 2, 2 (Oct.), 151-169.Google Scholar
CALLAHAN, D., CARR, S., AND KENNEDY, K. 1990. Improving register allocation for subscripted variables. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 53-65. Google Scholar
CAL~, D., COCKE, J., AND KENNEDY, K. 1988. Estimating interlock and improving balance for pipelined architectures. J. Parall. Distrib. Comput. 5, 4 (Aug.), 334-358. Google Scholar
CALLAHAN, D., COOPER, K., KENNEDY, K., AND TORCZON, L. 1986. Interprocedurat constant propagation. In Proceedings of the SIGPLAN Symposium on Compder Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 152-161. Google Scholar
CARR, S. 1993. Memory-hierarchy management. Ph.D. thesis, Rice University, Houston, Tex. Google Scholar
CHAITIN, G. J. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIG- PLAN Symposium on Compiler Construction (Boston, Mass., June). SIGPLAN Not 17, 6, 98-105. Google Scholar
CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COOKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. 1981. Register allocation via coloring. Comput. Lang. 6, 1 (Jan.), 47-57.Google Scholar
CHAMBERS, C. AND UNGAR, D. 1989. Customization: Optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementatmn (Portland, Ore., June). SIGPLAN Not. 24, 7 (July), 146-160. Google Scholar
CHATTERJEE, S., GILBERT, J. R., AND SCHREIBER, R. 1993a. The alignment-distribution graph. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. 234-252. Google Scholar
CHATTERJEE, S., GILBERT, J. R., SCHREIBER, R., AND TENG, S.-H. 1993b. Automatic array alignment in data-parallel programs. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 16-28. Google Scholar
CHEN, S. C. AND KUCK, D. J. 1975. Time and parallel processor bounds for linear recurrence systems. IEEE Trans. Comput. C-24 7 (July), 701-717.Google Scholar
CHoI, J.-D., BURKE, M., AND CARINI, P. 1993. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. in Conference Record of the 20th ACM Symposlum on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 232-245. Google Scholar
Chow, F. C. 1988. Minimizing register usage penalty at procedure calls. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 85-94. Google Scholar
CHOW, F. C. AND HENNESSEY, J. L. 1990. The priority-based coloring approach to register allocation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 501-536. Google Scholar
CLARKE, C. D. AND PEYTON-JONES, S. L. 1985. Strictness analysis--a practical approach. In Functional Programming Languages and Computer Architecture. Lecture Notes in Computer Science, vol. 201. Springer-Verlag, Berlin, 35-49. Google Scholar
CLINGER, W. D. 1990. How to read floating point numbers accurately. In Proceedings of the SIG- PLAN Conference on Programming Language Desiogn and Implementatwn (W~nite Plains, N.Y., June). SIGPLAN Not. 25, 6, 92-101. Google Scholar
COCKE, J. 1970. Global common subexpression elimination. In Proceedings of the ACM Symposium on Compiler Opttmizatwn (July). SIGPLAN Not. 5, 7, 20-24. Google Scholar
COCKE, J. AND MARKSTEIN, P. 1980. Measurement of program improvement algorithms. In Proceedrags of the IFIP Congress (Tokyo, Japan, Oct.). North-Holland, Amsterdam, Netherlands, 221-228. Also available as IBM Tech. Rep. RC 8111, Feb. 1980.Google Scholar
COCKE, J. AND SCHWARTZ, J. T. 1970. Programming languages and their compilers (preliminary notes). 2nd ed. Courant Inst. of Mathematical Sciences, New York University, New York. Google Scholar
COLWELL, R. P., NIX, R. P., O'DONNEL, J. J., PAPWORTH, D. B., AND RODMAN, P. K. 1988. A VLIW architecture for a trace scheduling compiler. IEEE Trans. Comput. C-37, 8 (Aug.), 967-979. Google Scholar
COOPER, K. D. AND KENNEDY, K. 1989. Fast interprocedural alias analysis. In Conference Record of the 16th ACM Symposium on Principles of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 49-59. Google Scholar
COOPER, K. D., HALL, M. W., AND KENNEDY, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.), 105-117.Google Scholar
COOPER, K. D., HALL, M. W., AND TORCZON, L. 1992. Unexpected side effect~ of inline substitution: A case study. ACM Lett. Program. Lang. Syst 1, 1 (Mar.), 22-32. Google Scholar
COOPER, K. D., HALL, M. W., AND TORCZON, L. 1991. An experiment with inline substitution. Softw. Pract. Exper. 21, 6 (June), 581-601 Google Scholar
CRAY RESEARCH 1988 CFT77 Reference Manual. Publication SR-0018-C. Cray Research, Inc., Eagan, Minn.Google Scholar
CYTRON, R. 1986. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the Internatwnal Conference on Parallel Processing (St. Charles, Ill., Aug.). IEEE Computer Society, Washington, D.C., 836-844.Google Scholar
CYTRON, R. AND FERRANTE, J. 1987. What's in a name? -or- the value of renaming for parallelism detection and storage allocation. Proceedings of the Internatwnal Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 19-27.Google Scholar
DANTZIG, G. B. AND EAVES, B. C. 1974. Fourier- Motzkin ehmination and sits dual with application to integer programming, in Combinatorial Programming: Methods and Applications (Versailles, France, Sept.). B. D. Roy, Ed. D. Reidel, Boston, Mass., 93-102.Google Scholar
DENNIS, J. B. 1980 Data flow supercomputers_ Computer 13, 11 (Nov.), 48-56.Google Scholar
DIXIT, K. M. 1992. New CPU benchmarks from SPEC. In Digest of Papers, Spring COMPCON 1992, 37th IEEE Computer Society International Conference (San Francisco, Calif., Feb.). IEEE Computer Society Press, Los Alamitos, Calif., 305-310. Google Scholar
DONGARRA, J. AND HIND, A. R. 1979. Unrolling loops in Fortran. Softw. Pratt. Exper. 9, 3 (Mar.), 219-226.Google Scholar
EGGERS, S. J. AND KATZ, R. H. 1989 Evaluating the performance of four snooping cache coherency protocols. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 2-15. Google Scholar
EIGENMANN, R., HOEFLINGER, J., LI, Z., AND PADUA, D. A. 1991. Experience in the automatic parallelization of four Perfect-benchmark programs. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 589. Springer-Verlag, Berlin, 65-83. Also available as Tech. Rep. 1193, Center for Supercomputing Research and Development. Google Scholar
ELLIS, J. R. 1986. Bulldog: A Compiler for VLIW Architectures. ACM Doctoral Dissertation Award. MIT Press, Cambridge, Mass. Google Scholar
ERSHOV, A. P. 1966. ALPHA--an automatic programming system of high efficiency. J. ACM 13, 1 (Jan.), 17-24. Google Scholar
FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parall. Program. 20, 1 (Feb.), 23-52.Google Scholar
FEAUTRIER, P. 1988. Array expansion, in Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 429-441. Google Scholar
FERRARI, D. 1976. The improvement of program behavior. Computer 9, 11 (Nov.), 39-47.Google Scholar
FISHER, J.A. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput. C-30, 7 (July), 478-490.Google Scholar
FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. C., AND NICOLAU, A. 1984. Parallel processing: A smart compiler and a dumb machine. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 37-47. Google Scholar
FLOATING POINT SYSTEMS. 1979. FPS AP-120B Processor Handbook. Floating Point Systems, Inc., Beaverton, Ore.Google Scholar
FREE SOFTWARE FOUNDATION. 1992. gCC 2.X Reference Manual. Free Software Foundation, Cambridge, Mass.Google Scholar
FREUDENBERGER, S. M., SCHWARTZ, J. T., AND SHARIR, M. 1983. Experience with the SETL optimizer. ACM Trans. Program. Lang. Syst. 5, 1 (Jan.), 26-45. Google Scholar
GANNON, D., JALBY, W., AND GALLIVAN, K. 1988. Strategies for cache and local memory management by global program transformation. J. Parall. Distrib. Comput. 5, 5 (Oct.), 587-616. Google Scholar
GERNDT, M. 1990. Updating distributed variables in local computations. Concurrency Pract. Exper. 2, 3 (Sept.), 171-193. Google Scholar
GIBSON, D. H. AND RAO, G. S. 1992. Design of the IBM System/390 computer family for numerically intensive applications: An overview for engineers and scientists. IBM J. Res. Dev. 36, 4 (July), 695-711. Google Scholar
GIRKAR, M. AND POLYCHRONOPOULOS, C. D. 1988. Compiling issues for supercomputers. In Proceedings of Supercomputing '88 (Orlando, Fla., Nov.). IEEE Computer Society, Washington, D.C., 164-173. Google Scholar
GOFF, G., KENNEDY, K., AND TSENG, C.-W. 1991. Practical dependence testing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 15-29. Google Scholar
GRAHAm, S. L., Lucco, S., AND SHARP, O. J. 1993. Orchestrating interactions among parallel computations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 100-111. Google Scholar
GRANLUND, W. AND KENNER, R. 1992. Eliminating branches using a superoptimizer and the GNU compiler, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 341-352. Google Scholar
GUPTA, M. 1992. Automatic data partitioning on distributed memory multicomputers. Ph.D. thesis, Tech. Rep. UILU-ENG-92-2237, Univ. of Illinois at Urbana-Champaign. Google Scholar
GUPTA, M., MIDKIFF, S., SCHONBERG, E., SWEENEY, P., WANG, K. Y., AND BURKE, M. 1993. PTRAN II: A compiler for High Performance Fortran. In Proceedings of the 4th International Workshop on Compilers for Parallel Computers (Delft, Netherlands, Dec.). 479-493.Google Scholar
HALL, M. W., KENNEDY, K., AND MCKINLEY, K. S. 1991. Interprocedural transformations for parallel code generation. In Proceedings of Supercomputing '91 (Albuquerque, New Mexico, Nov.). IEEE Computer Society Press, os Alamitos, Calif., 424-434. Google Scholar
HARRIS, K. AND HOBBS, S. 1994. VAX Fortran. In Optimization in Compilers, F. E. Allen, R. Cytron, B. K. Rosen, and K. Zadeck, Eds. ACM Press, New York, chap. 16. To be published.Google Scholar
HATFIELD, D. J. AND GERALD, j. 1971. Program restructuring for virtual memory. IBM Syst. J. 10, 3, 168-192.Google Scholar
HENNESSY, J. L. AND PATTERSON, D. A. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, Cali~ Google Scholar
HEWLETT-PACKARD. 1992. PA-RISC 1.1 Architecture and Instruction Manual. 2nd ed. Part Number 09740-90039. Hewlett-Packard, Palo Alto, Ca}i~Google Scholar
HIGH PERFORMANCE FORTRAN FORUM. 1993. High Performance Fortran language specification, version 1.0. Tech. Rep. CRPC-TR92225, Rice University, Houston, Tex.Google Scholar
HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1993. Preliminary experiences with the Fortran D compiler. In Proceedlng~ of Supercomputing '93 (Portland, Ore., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 338-350. Google Scholar
HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1992. Compiling Fortran D for MIMD distributed-memory machines. Commun. ACM 35, 8 (Aug.), 66-80. Google Scholar
HUDAK, P. AND GOLDBERG, B. 1985. Distributed execution of functional programs using serial combinators IEEE Trans. Comput. C-34, 10 (Oct), 881-890. Google Scholar
Hwu, W. W. AND CHANG, P. P. 1989. Achieving high instruction cache performance with an optimizing compiler. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 242-251. Google Scholar
Hwu, W. W., MAHLKE, S. A., CHEN, W. Y., CHANG, P. P., WARTER, N. J., BRINGMANN, R. A., OUELLETFE, R. G., HANK, R. E., KIYOHARA, T., HAAB, G. E., HOLM, J. G., AND LAVERY, D M 1993. The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 1/2 (May), 229-248. Google Scholar
IBM. 1992. Optimization and Tuning Gutde for the XL Fortran and XL C Compilers. 1st ed. Publication SC09-1545-00, IBM, Armonk, N.Y.Google Scholar
IBM. 1991. IBM RISC System~6000 NIC Tuning Guide for Fortran and C Publication GG24- 3611-01, IBM, Armonk, N.Y.Google Scholar
IRIGOIN, F. AND TRIOLET, R. 1988. Supernode partitioning. In Conference Record of the 15th ACM Symposium on Principles of Programming Languages (San Diego, Calif., Jan.). ACM Press, New York, 319-329. Google Scholar
IVERSON, K. E. 1962. A Programming Language. John Wiley and Sons, New York. Google Scholar
KENNEDY, K. AND MCKINLEY, K. S. 1990. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing '90 (New York, N.Y., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 407-416. Google Scholar
KENNEDY, K., MCKINLEY, K. S., AND TSENG, C.-W. 1993. Analysis and transformation in an interactive parallel programming tool. Concurrency Pract. Exper 5, 7 (Oct.), 575-602.Google Scholar
KILDALL, G. 1973. A unified approach to global program optimization In Conference Record of the ACM Symposium on Prmctples of Programmmg Languages (Boston, Mass, Oct.). ACM Press, New York, 194-206. Google Scholar
KNOBE, K. AND NATARAJAN, V. 1993. Automatic data allocation to minimize communication on SIMD machines. J. Supercomput. 7, 4 (Dec.), 387-415. Google Scholar
K_NIOBE, K., LUKAS, J. D., AND STEELE, G. L., JR. 1990. Data optimization: Allocation of arrays to reduce communication on SIMD machines. J. Parall Distrib. Comput. 8, 2 (Feb.), 102-118. Google Scholar
KNOBE, K., LUKAS, J D., AND STEELE, G. L., JR 1988. Massively parallel data optimization. In The 2nd. Symposium on the Frontiers of Massively Parallel Computation (Fairfax, Va., Oct.). IEEE, Piscataway, N.J., 551-558.Google Scholar
KNUTH, D. E. 1971. An empirical study of Fortran programs. Soflw. Pract. Exper. 1, 2 (Apr.-June), 105-133.Google Scholar
KOELBEL, C. 1990. Compiling programs for nonshared memory machines. Ph.D. thesis, Purdue University, West Lafayette, Ind. Google Scholar
KOELBEL, C AND MEHROTRA, P. 1991. Compiling global name-space parallel loops for distributed execution. IEEE Trans. Parall. Distrib. Syst. 2, 4 (Oct.), 440-451. Google Scholar
KRANZ, D., KELSEY, R., REES, J., HUDAK, P., PHILBIN, J., AND ADAMS, N. 1986. ORBIT: An optimizing compiler for Scheme. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Pato Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 219-233. Google Scholar
KUCK, D. J. 1978. The Structure of Computers and Computations Vol. 1. John Wiley and Sons. New York. Google Scholar
KUC~4, D. J. 1977. A survey of parallel machine organization and programming. ACM Comput. Surv. 9, 1 (Mar.), 29-59. Google Scholar
KUCK, D. J. AND STOKES, R. 1982. The Burroughs Scientific Processor (BSP). IEEE Trans. Comput. C-31, 5 (May), 363-376.Google Scholar
KUCK, D. J., KUHN, R. H., PADUA, D., LEASURE, B., AND WOLFE, M. 1981. Dependence graphs and compiler optimizations. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 207-218. Google Scholar
L~, M. S. 1988. Software pipelining: An effective scheduling technique for VLIW machines In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 318-328. Google Scholar
LAM, M. S. AND WmSON, R. P. 1992. Limits of control flow on parallelism. In Proceedings of the 19th Annual International Symposium on Computer Architecture (Gold Coast, Australia, May). SIGARCH Comput. Arch. News 20, 2, 46-57 Google Scholar
LAM, M. S., ROTHBERG, E. E., AND WOLF. M. E. 1991. The cache performance and optimization of blocked algorithms. In Proceedzngs of the 4th Internatwnal Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 63-74. Google Scholar
LAMPORT, L. 1974. The parallel execution of DO loops. Commun. ACM 17, 2 (Feb.), 83-93. Google Scholar
LANDI, W., RYDER, B. G., AND ZHANG, S. 1993. Interprocedural modification side effect analysis with pointer ahasing. In Proceedzngs of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 56-67. Google Scholar
LARUS, J. R. 1993 Loop-level parallelism in numeric and symbolic programs. IEEE Trans. Parall. Dtstrib. Syst 4, 7 (July), 812-826. Google Scholar
LEE, G., KRUSKAL, C. P., AND KUCK, D. J. 1985. An empirical study of automatic restructuring of nonnumerical programs for parallel processors. IEEE Trans. Comput. C-34, 10 (Oct.), 927-933. Google Scholar
LI, Z. 1992. Array privatization for parallel execution of loops. In Proceedings of the A CM International Conference on Supercomput~ng (Washington, D.C., July). ACM Press, New York, 313-322. Google Scholar
LI, J. AND CHEN, M. 1991. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. Parall. Distrib. Syst. 2, 3 (July), 361-376. Google Scholar
LI, J. AND CHEN, M. 1990. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. In The 3rd Symposium on the Frontiers of Massively Parallel Computation, J. Jaja, Ed. IEEE Computer Society Press, Los Alamitos, Calif., 424-433.Google Scholar
LI, Z. AND YEW, P. 1988. Interprocedura} analysis for parallel computing. In Proceedings of the International Conference on Parallel Processing. F. A. Briggs, Ed. Vol. 2. Pennsylvania State University Press, University Park, Pa., 221-228.Google Scholar
LI, Z., YEW, P., AND ZHU, C. 1990. Data dependence analysis on multi-dimensional array references. IEEE Trans. Parall. Distnb. Syst. 1, 1 (Jan.), 26-34, Google Scholar
LOVEMAN, D. B. 1977. Program improvement by source-to-source transformation. J. ACM 1, 24 (Jan.), 121-145. Google Scholar
Lvcco, S. 1992. A dynamic scheduling method for irregular parallel programs. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 200-211. Google Scholar
MACE, M. E. 1987, Memory Storage Patterns in Parallel Processing. Kluwer Academic Publishers, Norwell, Mass. Google Scholar
MACE, M. E. AND WAGNER, R. A. 1985. Globally optimum selection of memory storage patterns. In Proceedings of the International Conference on Parallel Processing, D. DeGrott, Ed. IEEE Computer Society, Washington, D.C., 264-271.Google Scholar
MARKSTEIN, P., MARKSTEIN, V., AND ZADECK, K. 1994. Strength reduction. In Optimizatmn in Compilers. ACM Press, New York, Chap. 9. To be published.Google Scholar
MASSALIN, H. 1987, Superoptimizer: A look at the smallest program. In Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Oct.). SIGPLAN Not. 22, 10, 122-126. Google Scholar
MAYDAN, D. }~., AMARASINGHE, S. P., AND LAM, M. S. 1993. Array data flow analysis and its use in array privatization. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 1-15. Google Scholar
MAYDAN, D. E., HENNESSEY, J. L., AND LAM, M. S. 1991. Efficient and exact data dependence analysis, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 1-14. Google Scholar
MCFARLING, S. 1991. Procedure merging with instruction caches. In Proceedings of the SIG= PLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 71-79. Google Scholar
McGRAw, J. R. 1985. SISAL: Streams and iteration in a single assignment language. Tech. Rep. M-146, Lawrence Livermore National Laboratory, Livermore, Calif.Google Scholar
MCMAHON, F. M. 1986. The Livermore Fortran kernels: A computer test of numerical performance range. Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, Livermore, Calif.Google Scholar
MICHIE, D. 1968. "Memo" functions and machine learning. Nature 218, 19-22.Google Scholar
MILLSTEIN, R. E. AND MUNTZ, C. A. 1975. The IL- LIAC-IV Fortran compiler. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIG- PLAN Not. 10, 3, 1-8. Google Scholar
MIRCHANDANEY, R., SALTZ, J. H., SMITH, R. M., NICOL, D. 1V{., AND CROWLEY, K. 1988. Principles of runtime support for parallel processors. In Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 140-152. Google Scholar
MOREL, E. AND RENVOISE, C. 1979. Global optimization by suppression of partial redundancies. Commun. ACM 22, 2 (Feb.), 96-103. Google Scholar
MUCHNICK, S. S. AND JONES, N., (EDS.) 1981. Program Flow Analysis. Prentice-Hall, Englewood Cliffs, N.J. Google Scholar
MUm~OKA, Y. 1971. Parallelism exposure and exploitation in programs. Ph.D. thesis, Tech. Rep. 71-424, Univ. of Illinois at Urbana-Champaign.Google Scholar
MYERS, E. W. 1981. A precise interprocedural data flow algorithm. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 219-230. Google Scholar
NICOLAU, A. 1988. Loop quantization: A generalized loop unwinding technique. J. Para ll. Distrib. Comput. 5, 5 (Oct.), 568-586. Google Scholar
NICOLAU, A. AND FISHER, J. A. 1984. Measuring the parallelism available for very long instruction word architectures. IEEE Trans. Comput. C-33, 11 (Nov.), 968-976.Google Scholar
NIKHIL, R. S. 1988. ID reference manual, version 88.0. Tech. Rep. 284, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
NOBAYASHI, H. AND EOYANG, C. 1989. A comparison study of automatically vectorizing Fortran compilers. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 820-825. Google Scholar
O'BRIEN, K., HAY, B., MINISH, J., SCHAFFER, H., SCHLOSS, B., SHEPHERD, A., AND ZALESKI, M. 1990. Advanced compiler technology for the RISC System/6000 architecture. In IBM RISC System/6000 Technology. Publication SA23- 2619. IBM Corporation, Mechanlcsburg, PennGoogle Scholar
OED, W. 1992. Cray Y-MP C90: System features and early benchmark results. Parall. Comput. 18, 8 (Aug.), 947-954. Google Scholar
OEHLER, R. R. AND BLASGEN, M. W. 1991. IBM RISC System/6000: Architecture and performance. IEEE Micro 11, 3 (June), 14-24. Google Scholar
PADUA, D. A. AND PETERSEN, P. M. 1992. Evaluation of paralleIizing compilers. Parallel Computing and Transputer Applications, (Barcelona, Spain. Sept.). CIMNE, 1505-1514. Also available as Center for Supercomputing Research and Development Tech Rep. 1173.Google Scholar
PADUA, D. A. AND WOLFE. M. J. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (Dec.), 1184-1201. Google Scholar
PADUA, D. A., KUCK, D. j. AND LAWRIE, D. 1980. High-speed multiprocessors and compilation techniques. IEEE Trans. Comput. C-29, 9 (Sept.), 763-776.Google Scholar
PETERSEN, P. M. AND PADUA, D. A. 1993. Static and dynamic evaluation of data dependence analysis. In Proceedings of the ACM International Conference on Supercomputing (Tokyo, Japan, July}. ACM Press, New York, 107-116. Google Scholar
PETTIS, K. AND HANSEN, R. C. 1990. Profile guided code pomtioning. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 16-27. Google Scholar
POLYCHRONOPOULOS, C. D. 1988. Parallel Programming and Compilers. Kluwer Academic Publishers, Boston, Mass. Google Scholar
POLYCHRONOPOULOS, C. D. 1987a Advanced loop optlmlzations for parallel computers. In Proceed~ngs of the 1st International Conference on Supercomputing. Lecture Notes in Computer Science, voI 297. Springer-Verlag, Berlin, 255-277. Google Scholar
POLYCHRONOPOULOS, C. D. 1987b. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the International Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 235-242.Google Scholar
POLYCHRONOPOULOS, C. D. AND KUCK, D. J. 1987. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput. C-36, 12 (Dec.), 1425-1439. Google Scholar
POLYCHRONOPOULOS, C. D., GIRKAR, M., HAGHIGHAT, M. R., LEE, C. L., LEUNG, B., AND SCHOUTEN, D. 1989. Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. In Proceedings of the International Conference on Parallel Processing. Volume 2. Pennsylvania State University Press, University Park, Pa., 39-48.Google Scholar
PRESBERO, D. L. AND JOHNSON, N. W. 1975 The Paralyzer: IVTRAN's parallelism analyzer and synthesizer. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 9-16. Google Scholar
PUGH, W. 1992. A practical algorithm for exact array dependence analysis Commun. ACM 35, 8 (Aug.). 102-115. Google Scholar
PUGH, W. 1991. Umform techniques for loop optimization. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York. Google Scholar
RAU, B. AND FISHER, J. A. 1993. Instruchon4evel parallel processing: History, overview, and perspective. J. Supercomput. 7, 1/2 (May), 9-50. Google Scholar
RAU, B., YEN, D. W. L., YEN, W., AND TOWLE, R. A. 1989. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. Computer 22, 1 (Jan.), 12-34. Google Scholar
REES, j., CLINGER, W., ET AL. 1986 Revised3 report on the algorithmic language Scheme. SIG- PLAN Not 21, 12 (Dec.), 37-79. Google Scholar
RISEMAN, E. M. AND FOSTER, C. C. 1972. The inhibition of potential parallelism by conditional jumps. IEEE Trans. Comput. C-21~ 12 (Dec.), 1405-1411.Google Scholar
ROGERS, A. M. 1991. Compiling for locality of reference. Ph.D. thesis, Tech. Rep. TR 91-1195, Dept. of Computer Science, Cornell University. Google Scholar
RUSSELL, R. M. 1978. The CRAY-1 computer system. Commun. ACM 21, 1 (Jan.), 63-72. Google Scholar
SABOT, G. AND WHOLEY, S. 1993 CMAX: A Fortran translator for the Connection Machine ~ystem. In Proceedings of the ACM InternatLonal Conference on Supercornput~ng (Tokyo, Japan, July). ACM Press, New York. Google Scholar
SARKAR, V. 1989. Partitioning and Scheduling Parallel Programs for Multiprocessors. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google Scholar
SARKAR, V. AND THEKKATH, R. 1992. A general framework for iteration-reordering transformations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 175-187. Google Scholar
SCHEIFLER, R. W. 1977. An analysis of inline substitution for a structured programming language. Commun. ACM 20, 9 (Sept.). 647-654. Google Scholar
SHEN, Z., LI, Z., AND YEW, P-C. 1990. An empirical study of Fortran programs for parallelizing compilers. IEEE Trans. Parall. Distrib. Syst. 1, 3 (July), 356-364. Google Scholar
SINGH, J. P. AND HENNESSY, J. L. 1991. An empirical investigation of the effectiveness and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessing, (Apr.). 213-240.Google Scholar
SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared memory. SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44. Also available as Stanford Univ. Tech. Rep. CSL-TR-92-526. Google Scholar
SITES, R. L., (ED.) 1992. Alpha Architecture Reference Manual. Digital Press, Bedford, Mass. Google Scholar
SOHI, G. S. AND VAJAPAYEM, S. 1990. Instruction issue logic for high-performance, interruptable, multiple functional unit, pipelined computers. IEEE Trans. Comput. C-39, 3 (Mar.), 349-359. Google Scholar
STEELE, G. L., JR. 1977. Arithmetic shifting considered harmful. SIGPLAN Not. 12, 11 (Nov.), 61-69. Google Scholar
STEELE, G. L., JR. AND WHITE, J. L. 1990. How to print floating-point numbers accurately. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 112-126. Google Scholar
STENSTROM, P. 1990. A survey of cache coherence schemes for multiprocessors. Computer 23, 6 (June), 12-24. Google Scholar
SUN MICROSYSTEMS. 1991. SPARC Architecture Manual, Version 8. Part No. 800-1399-08. Sun Microsystems, Mountain View, Calif.Google Scholar
SZYMANSI4I, T. G. 1978. Assembling code for machines with span-dependent instructions. Cornmum. ACM 21, 4 (Apr.), 300-308. Google Scholar
TANG, P. AND YEW, P. 1990. Dynamic processor self-scheduling for general parallel nested loops. IEEE Trans. Comput. C-39, 7 (July), 919-929. Google Scholar
THINKING MACHINES. 1989. Connection Machine, Model CM-2 Technical Summary. Thinking Machines Corp., Cambridge, Mass.Google Scholar
TJADEN, G. S. AND FLYNN, M. J. 1970. Detection and parallel execution of parallel instructions. IEEE Trans. Comput. C-19, 10 (Oct.), 889-895.Google Scholar
TORRELLAS, J., LAM, H. S., AND HENNESSY, J. L. 1994. False sharing and spatial locality in multiprocessor caches. IEEE Trans. Comput. 43, 6 (June), 651-663. Google Scholar
TOWLE, R. A. 1976. Control and data dependence for program transformations. Ph.D. thesis, Tech. Rep. 76-788, Computer Science Dept., Univ. of Illinois at Urbana-Champaign. Google Scholar
TRIOLET, R., IRIGOIN, F., AND FEAUTRIER, P. 1986. Direct parallelization of call statements. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 176-185. Google Scholar
TSENG, C.-W. 1993. An optimizing Fortran D compiler for MIMD distributed-memory machines. Ph.D. thesis, Tech. Rep. Rice COMP TR93-199, Computer Science Dept., Rice University, Houston, Tex. Google Scholar
TU, P. AND PADUA, D. A. 1993. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. Springer-Verlag, Berlin, 500-521. Google Scholar
VON HANXLEDEN, R. AND KENNEDY, K. 1992. Relaxing SIMD control flow constraints using loop transformations. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 188-199. Google Scholar
WALL, D. W. 1991. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 176-188. Google Scholar
WANG, K. 1991. Intelligent program optimization and parallelization for parallel computers. Ph.D. thesis, Tech. Rep. CSD-TR-91-030, Purdue University, West Lafayette, Ind. Google Scholar
WANG, Ko AND GANNON, D. 1989. Applying AI techniques to program optimization for parallel computers, in Parallel Processing for Supercomputers and Artificial Intelligence, K. Hwang and D. Degroot, Eds. McGraw Hill, New York, Chap. 12.Google Scholar
WEDEL, D. 1975. Fortran for the Texas Instruments ASC system. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 119-132. Google Scholar
WEGMAN, M. N. AND ZADECK, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr.), 181-210. Google Scholar
WEISS, M. 1991. Strip mining on SIMD architectures. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York, 234-243. Google Scholar
WHOLEY, S. 1992. Automatic data mapping for distributed-memory parallel computers. In Proceedings of the ACM International Conference on Supercomputing (Washington, D.C., July). ACM Press, New York, 25-34. Google Scholar
WOLF, M. E. 1992. improving locality and parallelism in nested loops. Ph.D. thesis, Computer Science Dept., Stanford University, Stanford, Calif. Google Scholar
WOLFE, M. J. 1989a. More iteration space tiling. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 655-664. Google Scholar
WOLFE, M. J. 1989b. Optimizing Supercompilers for Supercomputers. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google Scholar
WOLF, M. E. AND LAM, M. S. 1991. A loop transformation theory and an atgonthm to maximize parallelism. IEEE Trans. Parall. D~strzb. Syst. 2, 4 (Oct.), 452-471. Google Scholar
WOLFE, M J AND TSENG, C. 1992. The Power Test for data dependence. 1EEE Trans. Parall. Distrtb. Syst. 3, 5 (Sept), 591-601 Google Scholar
ZIMA, H. P, BAST, H. J, AND GERNDT, M. 1988. SUPERB. A tool for semi-automatic S1MD/MIMD parallelization Parall. Comput 6, 1 (Jan.J, 1-18.Google Scholar

Index Terms

Compiler transformations for high-performance computing
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Concurrent programming languages

Recommendations

PolyMage: Automatic Optimization for Image Processing Pipelines
ASPLOS '15

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each ...
Read More
Compiler Transformations for High-Performance Computing
Read More
PolyMage: Automatic Optimization for Image Processing Pipelines
ASPLOS'15

This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each ...
Read More

Reviews

Reviewer: Max Hailperin

This is a comprehensive survey of high-level optimizing transformations used in compilers for superscalar, vector, and multiprocessor machines. The emphasis is on transformations endemic to these architectures; supporting analyses and more classic transformations also receive some coverage. The authors expect "readers ... to be familiar with modern computer architecture and basic program compilation techniques." The extensive bibliography generally includes both the earliest work on a topic and a sampling of important recent work. Since the recent work is only sampled, a citation index should be consulted. The writing is generally clear, and is supplemented with useful examples and diagrams and a helpful appendix of model architectures. Accessibility is marred by flaws in the diagrams and bibliography. In the diagrams, the dotted lines largely disappeared between the authors' computer and the printed page. In some diagrams, they are gone entirely; in others, an occasional faint dot remains. This unfortunately turns the diagrams into puzzles. The bibliography's "alphabetization" is quite unusual. The presentation seems reasonably authoritative; there are some minor weaknesses in this regard in peripheral topics (e.g., the definition of dependence distance, the description of the Omega test, the citation of Morel and Renvoise's work, and the treatment of tail-recursion optimization). The importance of this survey, however, doesn't rest on its authoritativeness, let alone its authoritativeness regarding peripheral topics. Rather, it is important because of the amount of material that is collected into one easily accessible place. It will play an important educational role for years to come. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Computing Surveys Volume 26, Issue 4
Dec. 1994
117 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/197405
Issue’s Table of Contents

Copyright © 1994 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 December 1994
Published in csur Volume 26, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
compilation
dependence analysis
locality
multiprocessors
optimization
parallelism
superscalar processors
vectorization
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 600
  Total Citations
  View Citations
- 9,570
  Total Downloads
- Downloads (Last 12 months)1,241
- Downloads (Last 6 weeks)239
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Compiler transformations for high-performance computing

ACM Computing Surveys

Abstract

References

Cited By

Index Terms

Recommendations

PolyMage: Automatic Optimization for Image Processing Pipelines

Compiler Transformations for High-Performance Computing

PolyMage: Automatic Optimization for Image Processing Pipelines

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Compiler transformations for high-performance computing

ACM Computing Surveys

Abstract

References

Cited By

Index Terms

Recommendations

PolyMage: Automatic Optimization for Image Processing Pipelines

Compiler Transformations for High-Performance Computing

PolyMage: Automatic Optimization for Image Processing Pipelines

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media