skip to main content
article
Free Access

Compiler transformations for high-performance computing

Authors Info & Claims
Published:01 December 1994Publication History
Skip Abstract Section

Abstract

In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.

This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.

Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.

References

  1. ABELSON, H. AND SUSSMAN, G. J. 1985. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Mass. Google ScholarGoogle Scholar
  2. ABu-SUFAH, W 1979. improving the performance of virtual memory computers. Ph D., thesis, Tech. Rep. 78-945, Univ. of Illinois at Urbana- Champaign. Google ScholarGoogle Scholar
  3. ABU-SUFAH, W., KUCK, D. J., AND LAWRIE, I). 1981 On the performance enhancement of paging systems through program analysis and transformations IEEE Trans. Comput. C-30, 5 (May), 341-356.Google ScholarGoogle Scholar
  4. ACKERSLAXq, W B. 1982. Data flow languages. Computer 15, 2 (Feb.), 15-25.Google ScholarGoogle Scholar
  5. AHO, A. V., JOHNSON, S C., AND ULLMAN, J. D 1977. Code generation for expressions with common subexpressions. J. ACM 24, 1 (Jan.), 146-160. Google ScholarGoogle Scholar
  6. Ago, A. V., SF~THI, R., AND ULLMAN, J. D. 1986. Compilers. Prtnc~ples, Techniques, and Tools. Addison-Wesley, Reading, Mass. Google ScholarGoogle Scholar
  7. AIKEN, A. AND NICOLAU, A. 1988a Optimal loop parallehzation. In Procee&ngs of the SIGPLAN Conference on Programmzng Language Destgn and Implementahon (Atlanta, Ga., June). SIG- PLAN Not. 23, 7 (July), 308-317. Google ScholarGoogle Scholar
  8. AMEN, A. AND NICOIAU. A. 1988b. Perfect pipelinrag: A new loop parallelization technique. In Proceedmgs of the 2nd European Symposzum on Programming Lecture Notes in Computer Science, vol. 300. Springer-Verlag, Berlin, 221-235. Google ScholarGoogle Scholar
  9. ALLEN, J. R. 1983. Dependence analysis for subscripted variables and its apphcation to program transformations. Ph.D. thesis, Computer Science Dept., Rice University, Houston, Tex Google ScholarGoogle Scholar
  10. ALLEN, F. E. 1969. Program optimization. In Annual Rewew ~n Automatic Programming 5. International Tracts in Computer Science and Technology and their Application, vol. 13. Pergamon Press, Oxford, England, 239-307.Google ScholarGoogle Scholar
  11. ALLEN, F. E. AND COCKE, J. 1971. A catalogue of optimizing transformations. In Design and Opt~m~zat~on of Compilers, R Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1-30.Google ScholarGoogle Scholar
  12. ALLEN, J. R. AND KENNEDY, K. 1987. Automatic translation of Fortran programs to vector form. ACM Trans. Program Lang. Syst. 9, 4 (Oct.), 491-542. Google ScholarGoogle Scholar
  13. ALLEN, J. R. AND I~~ENNEDY, K. 1984. Automatic loop interchange. In Proceedings of the SIGPLAN Symposium on Compzler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 233-246. Google ScholarGoogle Scholar
  14. ALLEN, F. E., BURKE, M., CHARLES, P., CYTRON, R., AND FERRANTE, j. 1988a. An overview of the PTRAN analysis system for multiprocessing. J. Parall. Distrib. Comput. 5, 5 (Oct.), 617-640. Google ScholarGoogle Scholar
  15. ALLEN, F. E., BURKE, M., CYTRON, R., FERRANTE, J., HSIEH, W., AND SARKAR, V. 1988b. A framework for determining useful parallelism. In Proceedings of the ACM International Conference on Supercomput~ng (St. Malo, France, July). ACM Press, New York, 207-215. Google ScholarGoogle Scholar
  16. ALLEN, F. E., COCKE, J., AND KENNEDY, K. 1981. Reduction of operator ~trength. In Program Flow Analysis: Theory and Apphcations, S. S. Muchnik and N. D. Jones, Eds., Prentice-Hall, Englewood Cliffs, N.J., 79-101.Google ScholarGoogle Scholar
  17. ALL~N, J R., KENNEDY, K., PORTERFIELD, C., AND WARREN, J. 1983. Conversion of control dependence to data dependence In Conference Record of the 10th ACM Symposium on Principlos of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 177-189. Google ScholarGoogle Scholar
  18. ALPERT, D. AND AVNON, D. 1993 Architecture of the Pentium microprocessor IEEE Micro 13, 3 (June), 11-21 Google ScholarGoogle Scholar
  19. AMERICAN NATIONAL STANDARDS INSTITUTE. 1987. An American National standard, IEEE standard for binary floating-point arithmetic SIGPLAN Not. 22, 2 (Feb.), 9-25.Google ScholarGoogle Scholar
  20. ANDERSON, J. M. AND LAM, M. S. 1993 Global optimlzations for parallelism and locality on scalable parallel machines. In Proceedings of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 112-125. Google ScholarGoogle Scholar
  21. ANDERSON, S. AND HUDAK, P 1990 Compilation of Haskell array comprehensions for scientific computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementatwn (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 137-149. Google ScholarGoogle Scholar
  22. APPEL, A. W. 1992. Compding w~th Continuations. Cambridge University Press, Cambridge, England. Google ScholarGoogle Scholar
  23. ARDEN, B. W., GALLER, B. A., AND GRAHAM, R. M. 1962. An algorithm for translating Boolean expressions. J. ACM 9, 2 (Apr.), 222-239. Google ScholarGoogle Scholar
  24. ARWND AND CULLER, D. E. 1986. Dataflow architectures. In Annual Review of Computer Science. Vol. 1. J. F. Traub, B. J. Grosz, B. W. Lampson, and N. J. Nilsson, Eds. Annual Reviews, Palo Alto, Calif, 225-253. Google ScholarGoogle Scholar
  25. ARWND, KATHAIL, V., AND PINGALI, K. 1980. A dataflow architecture with tagged tokens. Tech. Rep TM-174, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google ScholarGoogle Scholar
  26. BACON, D. F., CHOW, J.-H., Ju, D. R, MUTHUKUMAR, 14, AND SARKAR, V. 1994. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In Proceedings of CASCON '94 (Toronto, Ontario, Oct.). Google ScholarGoogle Scholar
  27. BAILEY, D. H. 1992. On the behavior of cache memories with strided data access. Tech. Rep. RNR- 92-015, NASA Ames Research Center, Moffett Field, Calif., (May).Google ScholarGoogle Scholar
  28. BAILEY, D. H., BARszcz, E., BARTON, J. T., BROWNING, D. S., CARTER, R. L., DAGUM, L., FATOOHI, R. A., FREDERICKSON, P. 0., LASINSKI, T. A., SCHREIBER, R. S., SIMON, H. D., VENKATAKRISHNAN, V., AND WEERATUNGA, S. K. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63-73.Google ScholarGoogle Scholar
  29. BALASUNDARAM, V. 1990. A mechanism for keeping useful internal information in parallel programming tools: The Data Access Descriptor. J. Parall. D~stnb. Comput. 9, 2 (June), 154-170. Google ScholarGoogle Scholar
  30. BALASUNDARAM, V., Fox, G., KENNEDY, K., AND KREMER, V. 1990. An interactive environment for data partitioning and distribution. In Proceedtngs of the 5th D~strzbuted Memory Computer Conference (Charleston, South Carolina, Apr.). IEEE Computer Society Press, Los Alamitos, Calif.Google ScholarGoogle Scholar
  31. BALASUNDARAM, V., KENNEDY, K., KREMER, U., MCKINLEY, K., AND SUBHLOK, J. 1989. The ParaScope editor: An interactive parallel programming tool. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 540-550. Google ScholarGoogle Scholar
  32. BALL, J. E. 1979. Predicting the effects of optimization on a procedure body. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Denver, Color., Aug.). SIGPLAN Not. 14, 8, 214-220. Google ScholarGoogle Scholar
  33. B~NE~EE, U. 1991. Unimodular transformations of double loops. In Advances in Languages and Compilers for Parallel Processing, A. Nicolau, Ed. Research Monographs in Parallel and Distributed Computing, MIT Press, Cambridge, Mass., Chap. 10.Google ScholarGoogle Scholar
  34. BANERJEE, U. 1988a. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, Mass. Google ScholarGoogle Scholar
  35. BnNERJEE, U. 1988b. An introduction to a formal theory of dependence analysis. J. Supercomput. 2, 2 (Oct.), 133-149.Google ScholarGoogle Scholar
  36. BANERJEE, U. 1979. Speedup of ordinary programs, Ph.D. thesis, Tech. Rep. 79-989, Computer Science Dept., Univ. of Illinois at Urbana- Champaign. Google ScholarGoogle Scholar
  37. BANERJEE~ U., CHEN, S. C., KUCK, D. J., AND TOWLE, R. A. 1979. Time and parallel processor bounds for Fortran-like loops. IEEE Trans. Comput. C-28, 9 (Sept.), 660-670.Google ScholarGoogle Scholar
  38. BANNING, J. P. 1979. An efficient way to find the side-effects of procedure calls and the aliases of variables. In Conference Record of the 6th ACM Symposium on Principles of Programming Languages (San Antonio, Tex., Jan.). ACM Press, 29-41. Google ScholarGoogle Scholar
  39. BERNSTEIN, R. 1986. Multiplication by integer constants. Softw. Pract. Exper. 16, 7 (July), 641-652. Google ScholarGoogle Scholar
  40. BERRY, M., CHEN, D., Koss, P., KUCK, D., Lo, S., PANG, Y., POINTER, L., ROLOFF, R., SAMEH, A., CLEMENTI, E_. CHIN. S. SCHNEIDER. D. FOX. FT., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., GOODRUM, R., AND MARTIN, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Int. J. Supercomput. Appl. 3, 3 (Fall), 5-40.Google ScholarGoogle Scholar
  41. BLELLOCH, G. 1989. Scans as primitive parallel operations. IEEE Trans. Comput. C-38, 11 (Nov.), 1526-1538. Google ScholarGoogle Scholar
  42. BROMLEY, M., {-IELLER, S., MCNERNEY, W., AND STEELE, G. L., JR. 1991. Fortran at ten gigaflops: The Connection Machine convolution compiler. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 145-156. Google ScholarGoogle Scholar
  43. BURKE, M. AND CYTRON, R. 1986. Interprocedural dependence analysis and paratlelization. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLANNot. 21, 7 (July), 162-175. Extended version available as IBM Thomas J. Watson Research Center Tech. Rep. RC 11794. Google ScholarGoogle Scholar
  44. BURNETT, G. J. AND COFFMAN, E. G., JR. 1970. A study of interleaved memory systems. In Proceedings of the Spring Joint AFIPS Computer Conference. vol. 36. AFIPS, Montvale, N.J., 467-474.Google ScholarGoogle Scholar
  45. BURSTALL, R. M. AND DARLINGTON, J. 1977. A transformation system for developing recursive programs. J. ACM 24, 1 (Jan.}, 44-67. Google ScholarGoogle Scholar
  46. BUTLER, M., YEH, T., PATT, Y., ALSUP, M., SCALES, H., AND SHEBANOW, M. 1991. Single instruction stream parallelism is greater than two. In Proceedings of the 18th Annual International Symposium on Computer Architecture (Toronto, Ontario, May). SIGARCH Comput. Arch. News 19, 3, 276-286. Google ScholarGoogle Scholar
  47. CALLAHAN, D. AND KENNEDY, K. 1988a. Analysis of interprocedural side-effects in a parallel programming environment. J. Parall. Distrib. Comput. 5, 5 (Oct.), 517-550. Google ScholarGoogle Scholar
  48. CALLAHAN, D. AND KENNEDY, K. 1988b. Compiling programs for distributed-memory multiprocessots. J. Supercomput. 2, 2 (Oct.), 151-169.Google ScholarGoogle Scholar
  49. CALLAHAN, D., CARR, S., AND KENNEDY, K. 1990. Improving register allocation for subscripted variables. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 53-65. Google ScholarGoogle Scholar
  50. CAL~, D., COCKE, J., AND KENNEDY, K. 1988. Estimating interlock and improving balance for pipelined architectures. J. Parall. Distrib. Comput. 5, 4 (Aug.), 334-358. Google ScholarGoogle Scholar
  51. CALLAHAN, D., COOPER, K., KENNEDY, K., AND TORCZON, L. 1986. Interprocedurat constant propagation. In Proceedings of the SIGPLAN Symposium on Compder Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 152-161. Google ScholarGoogle Scholar
  52. CARR, S. 1993. Memory-hierarchy management. Ph.D. thesis, Rice University, Houston, Tex. Google ScholarGoogle Scholar
  53. CHAITIN, G. J. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIG- PLAN Symposium on Compiler Construction (Boston, Mass., June). SIGPLAN Not 17, 6, 98-105. Google ScholarGoogle Scholar
  54. CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COOKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. 1981. Register allocation via coloring. Comput. Lang. 6, 1 (Jan.), 47-57.Google ScholarGoogle Scholar
  55. CHAMBERS, C. AND UNGAR, D. 1989. Customization: Optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementatmn (Portland, Ore., June). SIGPLAN Not. 24, 7 (July), 146-160. Google ScholarGoogle Scholar
  56. CHATTERJEE, S., GILBERT, J. R., AND SCHREIBER, R. 1993a. The alignment-distribution graph. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. 234-252. Google ScholarGoogle Scholar
  57. CHATTERJEE, S., GILBERT, J. R., SCHREIBER, R., AND TENG, S.-H. 1993b. Automatic array alignment in data-parallel programs. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 16-28. Google ScholarGoogle Scholar
  58. CHEN, S. C. AND KUCK, D. J. 1975. Time and parallel processor bounds for linear recurrence systems. IEEE Trans. Comput. C-24 7 (July), 701-717.Google ScholarGoogle Scholar
  59. CHoI, J.-D., BURKE, M., AND CARINI, P. 1993. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. in Conference Record of the 20th ACM Symposlum on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 232-245. Google ScholarGoogle Scholar
  60. Chow, F. C. 1988. Minimizing register usage penalty at procedure calls. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 85-94. Google ScholarGoogle Scholar
  61. CHOW, F. C. AND HENNESSEY, J. L. 1990. The priority-based coloring approach to register allocation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 501-536. Google ScholarGoogle Scholar
  62. CLARKE, C. D. AND PEYTON-JONES, S. L. 1985. Strictness analysis--a practical approach. In Functional Programming Languages and Computer Architecture. Lecture Notes in Computer Science, vol. 201. Springer-Verlag, Berlin, 35-49. Google ScholarGoogle Scholar
  63. CLINGER, W. D. 1990. How to read floating point numbers accurately. In Proceedings of the SIG- PLAN Conference on Programming Language Desiogn and Implementatwn (W~nite Plains, N.Y., June). SIGPLAN Not. 25, 6, 92-101. Google ScholarGoogle Scholar
  64. COCKE, J. 1970. Global common subexpression elimination. In Proceedings of the ACM Symposium on Compiler Opttmizatwn (July). SIGPLAN Not. 5, 7, 20-24. Google ScholarGoogle Scholar
  65. COCKE, J. AND MARKSTEIN, P. 1980. Measurement of program improvement algorithms. In Proceedrags of the IFIP Congress (Tokyo, Japan, Oct.). North-Holland, Amsterdam, Netherlands, 221-228. Also available as IBM Tech. Rep. RC 8111, Feb. 1980.Google ScholarGoogle Scholar
  66. COCKE, J. AND SCHWARTZ, J. T. 1970. Programming languages and their compilers (preliminary notes). 2nd ed. Courant Inst. of Mathematical Sciences, New York University, New York. Google ScholarGoogle Scholar
  67. COLWELL, R. P., NIX, R. P., O'DONNEL, J. J., PAPWORTH, D. B., AND RODMAN, P. K. 1988. A VLIW architecture for a trace scheduling compiler. IEEE Trans. Comput. C-37, 8 (Aug.), 967-979. Google ScholarGoogle Scholar
  68. COOPER, K. D. AND KENNEDY, K. 1989. Fast interprocedural alias analysis. In Conference Record of the 16th ACM Symposium on Principles of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 49-59. Google ScholarGoogle Scholar
  69. COOPER, K. D., HALL, M. W., AND KENNEDY, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.), 105-117.Google ScholarGoogle Scholar
  70. COOPER, K. D., HALL, M. W., AND TORCZON, L. 1992. Unexpected side effect~ of inline substitution: A case study. ACM Lett. Program. Lang. Syst 1, 1 (Mar.), 22-32. Google ScholarGoogle Scholar
  71. COOPER, K. D., HALL, M. W., AND TORCZON, L. 1991. An experiment with inline substitution. Softw. Pract. Exper. 21, 6 (June), 581-601 Google ScholarGoogle Scholar
  72. CRAY RESEARCH 1988 CFT77 Reference Manual. Publication SR-0018-C. Cray Research, Inc., Eagan, Minn.Google ScholarGoogle Scholar
  73. CYTRON, R. 1986. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the Internatwnal Conference on Parallel Processing (St. Charles, Ill., Aug.). IEEE Computer Society, Washington, D.C., 836-844.Google ScholarGoogle Scholar
  74. CYTRON, R. AND FERRANTE, J. 1987. What's in a name? -or- the value of renaming for parallelism detection and storage allocation. Proceedings of the Internatwnal Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 19-27.Google ScholarGoogle Scholar
  75. DANTZIG, G. B. AND EAVES, B. C. 1974. Fourier- Motzkin ehmination and sits dual with application to integer programming, in Combinatorial Programming: Methods and Applications (Versailles, France, Sept.). B. D. Roy, Ed. D. Reidel, Boston, Mass., 93-102.Google ScholarGoogle Scholar
  76. DENNIS, J. B. 1980 Data flow supercomputers_ Computer 13, 11 (Nov.), 48-56.Google ScholarGoogle Scholar
  77. DIXIT, K. M. 1992. New CPU benchmarks from SPEC. In Digest of Papers, Spring COMPCON 1992, 37th IEEE Computer Society International Conference (San Francisco, Calif., Feb.). IEEE Computer Society Press, Los Alamitos, Calif., 305-310. Google ScholarGoogle Scholar
  78. DONGARRA, J. AND HIND, A. R. 1979. Unrolling loops in Fortran. Softw. Pratt. Exper. 9, 3 (Mar.), 219-226.Google ScholarGoogle Scholar
  79. EGGERS, S. J. AND KATZ, R. H. 1989 Evaluating the performance of four snooping cache coherency protocols. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 2-15. Google ScholarGoogle Scholar
  80. EIGENMANN, R., HOEFLINGER, J., LI, Z., AND PADUA, D. A. 1991. Experience in the automatic parallelization of four Perfect-benchmark programs. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 589. Springer-Verlag, Berlin, 65-83. Also available as Tech. Rep. 1193, Center for Supercomputing Research and Development. Google ScholarGoogle Scholar
  81. ELLIS, J. R. 1986. Bulldog: A Compiler for VLIW Architectures. ACM Doctoral Dissertation Award. MIT Press, Cambridge, Mass. Google ScholarGoogle Scholar
  82. ERSHOV, A. P. 1966. ALPHA--an automatic programming system of high efficiency. J. ACM 13, 1 (Jan.), 17-24. Google ScholarGoogle Scholar
  83. FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parall. Program. 20, 1 (Feb.), 23-52.Google ScholarGoogle Scholar
  84. FEAUTRIER, P. 1988. Array expansion, in Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 429-441. Google ScholarGoogle Scholar
  85. FERRARI, D. 1976. The improvement of program behavior. Computer 9, 11 (Nov.), 39-47.Google ScholarGoogle Scholar
  86. FISHER, J.A. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput. C-30, 7 (July), 478-490.Google ScholarGoogle Scholar
  87. FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. C., AND NICOLAU, A. 1984. Parallel processing: A smart compiler and a dumb machine. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 37-47. Google ScholarGoogle Scholar
  88. FLOATING POINT SYSTEMS. 1979. FPS AP-120B Processor Handbook. Floating Point Systems, Inc., Beaverton, Ore.Google ScholarGoogle Scholar
  89. FREE SOFTWARE FOUNDATION. 1992. gCC 2.X Reference Manual. Free Software Foundation, Cambridge, Mass.Google ScholarGoogle Scholar
  90. FREUDENBERGER, S. M., SCHWARTZ, J. T., AND SHARIR, M. 1983. Experience with the SETL optimizer. ACM Trans. Program. Lang. Syst. 5, 1 (Jan.), 26-45. Google ScholarGoogle Scholar
  91. GANNON, D., JALBY, W., AND GALLIVAN, K. 1988. Strategies for cache and local memory management by global program transformation. J. Parall. Distrib. Comput. 5, 5 (Oct.), 587-616. Google ScholarGoogle Scholar
  92. GERNDT, M. 1990. Updating distributed variables in local computations. Concurrency Pract. Exper. 2, 3 (Sept.), 171-193. Google ScholarGoogle Scholar
  93. GIBSON, D. H. AND RAO, G. S. 1992. Design of the IBM System/390 computer family for numerically intensive applications: An overview for engineers and scientists. IBM J. Res. Dev. 36, 4 (July), 695-711. Google ScholarGoogle Scholar
  94. GIRKAR, M. AND POLYCHRONOPOULOS, C. D. 1988. Compiling issues for supercomputers. In Proceedings of Supercomputing '88 (Orlando, Fla., Nov.). IEEE Computer Society, Washington, D.C., 164-173. Google ScholarGoogle Scholar
  95. GOFF, G., KENNEDY, K., AND TSENG, C.-W. 1991. Practical dependence testing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 15-29. Google ScholarGoogle Scholar
  96. GRAHAm, S. L., Lucco, S., AND SHARP, O. J. 1993. Orchestrating interactions among parallel computations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 100-111. Google ScholarGoogle Scholar
  97. GRANLUND, W. AND KENNER, R. 1992. Eliminating branches using a superoptimizer and the GNU compiler, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 341-352. Google ScholarGoogle Scholar
  98. GUPTA, M. 1992. Automatic data partitioning on distributed memory multicomputers. Ph.D. thesis, Tech. Rep. UILU-ENG-92-2237, Univ. of Illinois at Urbana-Champaign. Google ScholarGoogle Scholar
  99. GUPTA, M., MIDKIFF, S., SCHONBERG, E., SWEENEY, P., WANG, K. Y., AND BURKE, M. 1993. PTRAN II: A compiler for High Performance Fortran. In Proceedings of the 4th International Workshop on Compilers for Parallel Computers (Delft, Netherlands, Dec.). 479-493.Google ScholarGoogle Scholar
  100. HALL, M. W., KENNEDY, K., AND MCKINLEY, K. S. 1991. Interprocedural transformations for parallel code generation. In Proceedings of Supercomputing '91 (Albuquerque, New Mexico, Nov.). IEEE Computer Society Press, os Alamitos, Calif., 424-434. Google ScholarGoogle Scholar
  101. HARRIS, K. AND HOBBS, S. 1994. VAX Fortran. In Optimization in Compilers, F. E. Allen, R. Cytron, B. K. Rosen, and K. Zadeck, Eds. ACM Press, New York, chap. 16. To be published.Google ScholarGoogle Scholar
  102. HATFIELD, D. J. AND GERALD, j. 1971. Program restructuring for virtual memory. IBM Syst. J. 10, 3, 168-192.Google ScholarGoogle Scholar
  103. HENNESSY, J. L. AND PATTERSON, D. A. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, Cali~ Google ScholarGoogle Scholar
  104. HEWLETT-PACKARD. 1992. PA-RISC 1.1 Architecture and Instruction Manual. 2nd ed. Part Number 09740-90039. Hewlett-Packard, Palo Alto, Ca}i~Google ScholarGoogle Scholar
  105. HIGH PERFORMANCE FORTRAN FORUM. 1993. High Performance Fortran language specification, version 1.0. Tech. Rep. CRPC-TR92225, Rice University, Houston, Tex.Google ScholarGoogle Scholar
  106. HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1993. Preliminary experiences with the Fortran D compiler. In Proceedlng~ of Supercomputing '93 (Portland, Ore., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 338-350. Google ScholarGoogle Scholar
  107. HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1992. Compiling Fortran D for MIMD distributed-memory machines. Commun. ACM 35, 8 (Aug.), 66-80. Google ScholarGoogle Scholar
  108. HUDAK, P. AND GOLDBERG, B. 1985. Distributed execution of functional programs using serial combinators IEEE Trans. Comput. C-34, 10 (Oct), 881-890. Google ScholarGoogle Scholar
  109. Hwu, W. W. AND CHANG, P. P. 1989. Achieving high instruction cache performance with an optimizing compiler. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 242-251. Google ScholarGoogle Scholar
  110. Hwu, W. W., MAHLKE, S. A., CHEN, W. Y., CHANG, P. P., WARTER, N. J., BRINGMANN, R. A., OUELLETFE, R. G., HANK, R. E., KIYOHARA, T., HAAB, G. E., HOLM, J. G., AND LAVERY, D M 1993. The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 1/2 (May), 229-248. Google ScholarGoogle Scholar
  111. IBM. 1992. Optimization and Tuning Gutde for the XL Fortran and XL C Compilers. 1st ed. Publication SC09-1545-00, IBM, Armonk, N.Y.Google ScholarGoogle Scholar
  112. IBM. 1991. IBM RISC System~6000 NIC Tuning Guide for Fortran and C Publication GG24- 3611-01, IBM, Armonk, N.Y.Google ScholarGoogle Scholar
  113. IRIGOIN, F. AND TRIOLET, R. 1988. Supernode partitioning. In Conference Record of the 15th ACM Symposium on Principles of Programming Languages (San Diego, Calif., Jan.). ACM Press, New York, 319-329. Google ScholarGoogle Scholar
  114. IVERSON, K. E. 1962. A Programming Language. John Wiley and Sons, New York. Google ScholarGoogle Scholar
  115. KENNEDY, K. AND MCKINLEY, K. S. 1990. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing '90 (New York, N.Y., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 407-416. Google ScholarGoogle Scholar
  116. KENNEDY, K., MCKINLEY, K. S., AND TSENG, C.-W. 1993. Analysis and transformation in an interactive parallel programming tool. Concurrency Pract. Exper 5, 7 (Oct.), 575-602.Google ScholarGoogle Scholar
  117. KILDALL, G. 1973. A unified approach to global program optimization In Conference Record of the ACM Symposium on Prmctples of Programmmg Languages (Boston, Mass, Oct.). ACM Press, New York, 194-206. Google ScholarGoogle Scholar
  118. KNOBE, K. AND NATARAJAN, V. 1993. Automatic data allocation to minimize communication on SIMD machines. J. Supercomput. 7, 4 (Dec.), 387-415. Google ScholarGoogle Scholar
  119. K_NIOBE, K., LUKAS, J. D., AND STEELE, G. L., JR. 1990. Data optimization: Allocation of arrays to reduce communication on SIMD machines. J. Parall Distrib. Comput. 8, 2 (Feb.), 102-118. Google ScholarGoogle Scholar
  120. KNOBE, K., LUKAS, J D., AND STEELE, G. L., JR 1988. Massively parallel data optimization. In The 2nd. Symposium on the Frontiers of Massively Parallel Computation (Fairfax, Va., Oct.). IEEE, Piscataway, N.J., 551-558.Google ScholarGoogle Scholar
  121. KNUTH, D. E. 1971. An empirical study of Fortran programs. Soflw. Pract. Exper. 1, 2 (Apr.-June), 105-133.Google ScholarGoogle Scholar
  122. KOELBEL, C. 1990. Compiling programs for nonshared memory machines. Ph.D. thesis, Purdue University, West Lafayette, Ind. Google ScholarGoogle Scholar
  123. KOELBEL, C AND MEHROTRA, P. 1991. Compiling global name-space parallel loops for distributed execution. IEEE Trans. Parall. Distrib. Syst. 2, 4 (Oct.), 440-451. Google ScholarGoogle Scholar
  124. KRANZ, D., KELSEY, R., REES, J., HUDAK, P., PHILBIN, J., AND ADAMS, N. 1986. ORBIT: An optimizing compiler for Scheme. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Pato Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 219-233. Google ScholarGoogle Scholar
  125. KUCK, D. J. 1978. The Structure of Computers and Computations Vol. 1. John Wiley and Sons. New York. Google ScholarGoogle Scholar
  126. KUC~4, D. J. 1977. A survey of parallel machine organization and programming. ACM Comput. Surv. 9, 1 (Mar.), 29-59. Google ScholarGoogle Scholar
  127. KUCK, D. J. AND STOKES, R. 1982. The Burroughs Scientific Processor (BSP). IEEE Trans. Comput. C-31, 5 (May), 363-376.Google ScholarGoogle Scholar
  128. KUCK, D. J., KUHN, R. H., PADUA, D., LEASURE, B., AND WOLFE, M. 1981. Dependence graphs and compiler optimizations. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 207-218. Google ScholarGoogle Scholar
  129. L~, M. S. 1988. Software pipelining: An effective scheduling technique for VLIW machines In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 318-328. Google ScholarGoogle Scholar
  130. LAM, M. S. AND WmSON, R. P. 1992. Limits of control flow on parallelism. In Proceedings of the 19th Annual International Symposium on Computer Architecture (Gold Coast, Australia, May). SIGARCH Comput. Arch. News 20, 2, 46-57 Google ScholarGoogle Scholar
  131. LAM, M. S., ROTHBERG, E. E., AND WOLF. M. E. 1991. The cache performance and optimization of blocked algorithms. In Proceedzngs of the 4th Internatwnal Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 63-74. Google ScholarGoogle Scholar
  132. LAMPORT, L. 1974. The parallel execution of DO loops. Commun. ACM 17, 2 (Feb.), 83-93. Google ScholarGoogle Scholar
  133. LANDI, W., RYDER, B. G., AND ZHANG, S. 1993. Interprocedural modification side effect analysis with pointer ahasing. In Proceedzngs of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 56-67. Google ScholarGoogle Scholar
  134. LARUS, J. R. 1993 Loop-level parallelism in numeric and symbolic programs. IEEE Trans. Parall. Dtstrib. Syst 4, 7 (July), 812-826. Google ScholarGoogle Scholar
  135. LEE, G., KRUSKAL, C. P., AND KUCK, D. J. 1985. An empirical study of automatic restructuring of nonnumerical programs for parallel processors. IEEE Trans. Comput. C-34, 10 (Oct.), 927-933. Google ScholarGoogle Scholar
  136. LI, Z. 1992. Array privatization for parallel execution of loops. In Proceedings of the A CM International Conference on Supercomput~ng (Washington, D.C., July). ACM Press, New York, 313-322. Google ScholarGoogle Scholar
  137. LI, J. AND CHEN, M. 1991. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. Parall. Distrib. Syst. 2, 3 (July), 361-376. Google ScholarGoogle Scholar
  138. LI, J. AND CHEN, M. 1990. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. In The 3rd Symposium on the Frontiers of Massively Parallel Computation, J. Jaja, Ed. IEEE Computer Society Press, Los Alamitos, Calif., 424-433.Google ScholarGoogle Scholar
  139. LI, Z. AND YEW, P. 1988. Interprocedura} analysis for parallel computing. In Proceedings of the International Conference on Parallel Processing. F. A. Briggs, Ed. Vol. 2. Pennsylvania State University Press, University Park, Pa., 221-228.Google ScholarGoogle Scholar
  140. LI, Z., YEW, P., AND ZHU, C. 1990. Data dependence analysis on multi-dimensional array references. IEEE Trans. Parall. Distnb. Syst. 1, 1 (Jan.), 26-34, Google ScholarGoogle Scholar
  141. LOVEMAN, D. B. 1977. Program improvement by source-to-source transformation. J. ACM 1, 24 (Jan.), 121-145. Google ScholarGoogle Scholar
  142. Lvcco, S. 1992. A dynamic scheduling method for irregular parallel programs. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 200-211. Google ScholarGoogle Scholar
  143. MACE, M. E. 1987, Memory Storage Patterns in Parallel Processing. Kluwer Academic Publishers, Norwell, Mass. Google ScholarGoogle Scholar
  144. MACE, M. E. AND WAGNER, R. A. 1985. Globally optimum selection of memory storage patterns. In Proceedings of the International Conference on Parallel Processing, D. DeGrott, Ed. IEEE Computer Society, Washington, D.C., 264-271.Google ScholarGoogle Scholar
  145. MARKSTEIN, P., MARKSTEIN, V., AND ZADECK, K. 1994. Strength reduction. In Optimizatmn in Compilers. ACM Press, New York, Chap. 9. To be published.Google ScholarGoogle Scholar
  146. MASSALIN, H. 1987, Superoptimizer: A look at the smallest program. In Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Oct.). SIGPLAN Not. 22, 10, 122-126. Google ScholarGoogle Scholar
  147. MAYDAN, D. }~., AMARASINGHE, S. P., AND LAM, M. S. 1993. Array data flow analysis and its use in array privatization. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 1-15. Google ScholarGoogle Scholar
  148. MAYDAN, D. E., HENNESSEY, J. L., AND LAM, M. S. 1991. Efficient and exact data dependence analysis, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 1-14. Google ScholarGoogle Scholar
  149. MCFARLING, S. 1991. Procedure merging with instruction caches. In Proceedings of the SIG= PLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 71-79. Google ScholarGoogle Scholar
  150. McGRAw, J. R. 1985. SISAL: Streams and iteration in a single assignment language. Tech. Rep. M-146, Lawrence Livermore National Laboratory, Livermore, Calif.Google ScholarGoogle Scholar
  151. MCMAHON, F. M. 1986. The Livermore Fortran kernels: A computer test of numerical performance range. Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, Livermore, Calif.Google ScholarGoogle Scholar
  152. MICHIE, D. 1968. "Memo" functions and machine learning. Nature 218, 19-22.Google ScholarGoogle Scholar
  153. MILLSTEIN, R. E. AND MUNTZ, C. A. 1975. The IL- LIAC-IV Fortran compiler. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIG- PLAN Not. 10, 3, 1-8. Google ScholarGoogle Scholar
  154. MIRCHANDANEY, R., SALTZ, J. H., SMITH, R. M., NICOL, D. 1V{., AND CROWLEY, K. 1988. Principles of runtime support for parallel processors. In Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 140-152. Google ScholarGoogle Scholar
  155. MOREL, E. AND RENVOISE, C. 1979. Global optimization by suppression of partial redundancies. Commun. ACM 22, 2 (Feb.), 96-103. Google ScholarGoogle Scholar
  156. MUCHNICK, S. S. AND JONES, N., (EDS.) 1981. Program Flow Analysis. Prentice-Hall, Englewood Cliffs, N.J. Google ScholarGoogle Scholar
  157. MUm~OKA, Y. 1971. Parallelism exposure and exploitation in programs. Ph.D. thesis, Tech. Rep. 71-424, Univ. of Illinois at Urbana-Champaign.Google ScholarGoogle Scholar
  158. MYERS, E. W. 1981. A precise interprocedural data flow algorithm. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 219-230. Google ScholarGoogle Scholar
  159. NICOLAU, A. 1988. Loop quantization: A generalized loop unwinding technique. J. Para ll. Distrib. Comput. 5, 5 (Oct.), 568-586. Google ScholarGoogle Scholar
  160. NICOLAU, A. AND FISHER, J. A. 1984. Measuring the parallelism available for very long instruction word architectures. IEEE Trans. Comput. C-33, 11 (Nov.), 968-976.Google ScholarGoogle Scholar
  161. NIKHIL, R. S. 1988. ID reference manual, version 88.0. Tech. Rep. 284, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google ScholarGoogle Scholar
  162. NOBAYASHI, H. AND EOYANG, C. 1989. A comparison study of automatically vectorizing Fortran compilers. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 820-825. Google ScholarGoogle Scholar
  163. O'BRIEN, K., HAY, B., MINISH, J., SCHAFFER, H., SCHLOSS, B., SHEPHERD, A., AND ZALESKI, M. 1990. Advanced compiler technology for the RISC System/6000 architecture. In IBM RISC System/6000 Technology. Publication SA23- 2619. IBM Corporation, Mechanlcsburg, PennGoogle ScholarGoogle Scholar
  164. OED, W. 1992. Cray Y-MP C90: System features and early benchmark results. Parall. Comput. 18, 8 (Aug.), 947-954. Google ScholarGoogle Scholar
  165. OEHLER, R. R. AND BLASGEN, M. W. 1991. IBM RISC System/6000: Architecture and performance. IEEE Micro 11, 3 (June), 14-24. Google ScholarGoogle Scholar
  166. PADUA, D. A. AND PETERSEN, P. M. 1992. Evaluation of paralleIizing compilers. Parallel Computing and Transputer Applications, (Barcelona, Spain. Sept.). CIMNE, 1505-1514. Also available as Center for Supercomputing Research and Development Tech Rep. 1173.Google ScholarGoogle Scholar
  167. PADUA, D. A. AND WOLFE. M. J. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (Dec.), 1184-1201. Google ScholarGoogle Scholar
  168. PADUA, D. A., KUCK, D. j. AND LAWRIE, D. 1980. High-speed multiprocessors and compilation techniques. IEEE Trans. Comput. C-29, 9 (Sept.), 763-776.Google ScholarGoogle Scholar
  169. PETERSEN, P. M. AND PADUA, D. A. 1993. Static and dynamic evaluation of data dependence analysis. In Proceedings of the ACM International Conference on Supercomputing (Tokyo, Japan, July}. ACM Press, New York, 107-116. Google ScholarGoogle Scholar
  170. PETTIS, K. AND HANSEN, R. C. 1990. Profile guided code pomtioning. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 16-27. Google ScholarGoogle Scholar
  171. POLYCHRONOPOULOS, C. D. 1988. Parallel Programming and Compilers. Kluwer Academic Publishers, Boston, Mass. Google ScholarGoogle Scholar
  172. POLYCHRONOPOULOS, C. D. 1987a Advanced loop optlmlzations for parallel computers. In Proceed~ngs of the 1st International Conference on Supercomputing. Lecture Notes in Computer Science, voI 297. Springer-Verlag, Berlin, 255-277. Google ScholarGoogle Scholar
  173. POLYCHRONOPOULOS, C. D. 1987b. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the International Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 235-242.Google ScholarGoogle Scholar
  174. POLYCHRONOPOULOS, C. D. AND KUCK, D. J. 1987. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput. C-36, 12 (Dec.), 1425-1439. Google ScholarGoogle Scholar
  175. POLYCHRONOPOULOS, C. D., GIRKAR, M., HAGHIGHAT, M. R., LEE, C. L., LEUNG, B., AND SCHOUTEN, D. 1989. Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. In Proceedings of the International Conference on Parallel Processing. Volume 2. Pennsylvania State University Press, University Park, Pa., 39-48.Google ScholarGoogle Scholar
  176. PRESBERO, D. L. AND JOHNSON, N. W. 1975 The Paralyzer: IVTRAN's parallelism analyzer and synthesizer. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 9-16. Google ScholarGoogle Scholar
  177. PUGH, W. 1992. A practical algorithm for exact array dependence analysis Commun. ACM 35, 8 (Aug.). 102-115. Google ScholarGoogle Scholar
  178. PUGH, W. 1991. Umform techniques for loop optimization. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York. Google ScholarGoogle Scholar
  179. RAU, B. AND FISHER, J. A. 1993. Instruchon4evel parallel processing: History, overview, and perspective. J. Supercomput. 7, 1/2 (May), 9-50. Google ScholarGoogle Scholar
  180. RAU, B., YEN, D. W. L., YEN, W., AND TOWLE, R. A. 1989. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. Computer 22, 1 (Jan.), 12-34. Google ScholarGoogle Scholar
  181. REES, j., CLINGER, W., ET AL. 1986 Revised3 report on the algorithmic language Scheme. SIG- PLAN Not 21, 12 (Dec.), 37-79. Google ScholarGoogle Scholar
  182. RISEMAN, E. M. AND FOSTER, C. C. 1972. The inhibition of potential parallelism by conditional jumps. IEEE Trans. Comput. C-21~ 12 (Dec.), 1405-1411.Google ScholarGoogle Scholar
  183. ROGERS, A. M. 1991. Compiling for locality of reference. Ph.D. thesis, Tech. Rep. TR 91-1195, Dept. of Computer Science, Cornell University. Google ScholarGoogle Scholar
  184. RUSSELL, R. M. 1978. The CRAY-1 computer system. Commun. ACM 21, 1 (Jan.), 63-72. Google ScholarGoogle Scholar
  185. SABOT, G. AND WHOLEY, S. 1993 CMAX: A Fortran translator for the Connection Machine ~ystem. In Proceedings of the ACM InternatLonal Conference on Supercornput~ng (Tokyo, Japan, July). ACM Press, New York. Google ScholarGoogle Scholar
  186. SARKAR, V. 1989. Partitioning and Scheduling Parallel Programs for Multiprocessors. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google ScholarGoogle Scholar
  187. SARKAR, V. AND THEKKATH, R. 1992. A general framework for iteration-reordering transformations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 175-187. Google ScholarGoogle Scholar
  188. SCHEIFLER, R. W. 1977. An analysis of inline substitution for a structured programming language. Commun. ACM 20, 9 (Sept.). 647-654. Google ScholarGoogle Scholar
  189. SHEN, Z., LI, Z., AND YEW, P-C. 1990. An empirical study of Fortran programs for parallelizing compilers. IEEE Trans. Parall. Distrib. Syst. 1, 3 (July), 356-364. Google ScholarGoogle Scholar
  190. SINGH, J. P. AND HENNESSY, J. L. 1991. An empirical investigation of the effectiveness and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessing, (Apr.). 213-240.Google ScholarGoogle Scholar
  191. SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared memory. SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44. Also available as Stanford Univ. Tech. Rep. CSL-TR-92-526. Google ScholarGoogle Scholar
  192. SITES, R. L., (ED.) 1992. Alpha Architecture Reference Manual. Digital Press, Bedford, Mass. Google ScholarGoogle Scholar
  193. SOHI, G. S. AND VAJAPAYEM, S. 1990. Instruction issue logic for high-performance, interruptable, multiple functional unit, pipelined computers. IEEE Trans. Comput. C-39, 3 (Mar.), 349-359. Google ScholarGoogle Scholar
  194. STEELE, G. L., JR. 1977. Arithmetic shifting considered harmful. SIGPLAN Not. 12, 11 (Nov.), 61-69. Google ScholarGoogle Scholar
  195. STEELE, G. L., JR. AND WHITE, J. L. 1990. How to print floating-point numbers accurately. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 112-126. Google ScholarGoogle Scholar
  196. STENSTROM, P. 1990. A survey of cache coherence schemes for multiprocessors. Computer 23, 6 (June), 12-24. Google ScholarGoogle Scholar
  197. SUN MICROSYSTEMS. 1991. SPARC Architecture Manual, Version 8. Part No. 800-1399-08. Sun Microsystems, Mountain View, Calif.Google ScholarGoogle Scholar
  198. SZYMANSI4I, T. G. 1978. Assembling code for machines with span-dependent instructions. Cornmum. ACM 21, 4 (Apr.), 300-308. Google ScholarGoogle Scholar
  199. TANG, P. AND YEW, P. 1990. Dynamic processor self-scheduling for general parallel nested loops. IEEE Trans. Comput. C-39, 7 (July), 919-929. Google ScholarGoogle Scholar
  200. THINKING MACHINES. 1989. Connection Machine, Model CM-2 Technical Summary. Thinking Machines Corp., Cambridge, Mass.Google ScholarGoogle Scholar
  201. TJADEN, G. S. AND FLYNN, M. J. 1970. Detection and parallel execution of parallel instructions. IEEE Trans. Comput. C-19, 10 (Oct.), 889-895.Google ScholarGoogle Scholar
  202. TORRELLAS, J., LAM, H. S., AND HENNESSY, J. L. 1994. False sharing and spatial locality in multiprocessor caches. IEEE Trans. Comput. 43, 6 (June), 651-663. Google ScholarGoogle Scholar
  203. TOWLE, R. A. 1976. Control and data dependence for program transformations. Ph.D. thesis, Tech. Rep. 76-788, Computer Science Dept., Univ. of Illinois at Urbana-Champaign. Google ScholarGoogle Scholar
  204. TRIOLET, R., IRIGOIN, F., AND FEAUTRIER, P. 1986. Direct parallelization of call statements. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 176-185. Google ScholarGoogle Scholar
  205. TSENG, C.-W. 1993. An optimizing Fortran D compiler for MIMD distributed-memory machines. Ph.D. thesis, Tech. Rep. Rice COMP TR93-199, Computer Science Dept., Rice University, Houston, Tex. Google ScholarGoogle Scholar
  206. TU, P. AND PADUA, D. A. 1993. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. Springer-Verlag, Berlin, 500-521. Google ScholarGoogle Scholar
  207. VON HANXLEDEN, R. AND KENNEDY, K. 1992. Relaxing SIMD control flow constraints using loop transformations. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 188-199. Google ScholarGoogle Scholar
  208. WALL, D. W. 1991. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 176-188. Google ScholarGoogle Scholar
  209. WANG, K. 1991. Intelligent program optimization and parallelization for parallel computers. Ph.D. thesis, Tech. Rep. CSD-TR-91-030, Purdue University, West Lafayette, Ind. Google ScholarGoogle Scholar
  210. WANG, Ko AND GANNON, D. 1989. Applying AI techniques to program optimization for parallel computers, in Parallel Processing for Supercomputers and Artificial Intelligence, K. Hwang and D. Degroot, Eds. McGraw Hill, New York, Chap. 12.Google ScholarGoogle Scholar
  211. WEDEL, D. 1975. Fortran for the Texas Instruments ASC system. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 119-132. Google ScholarGoogle Scholar
  212. WEGMAN, M. N. AND ZADECK, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr.), 181-210. Google ScholarGoogle Scholar
  213. WEISS, M. 1991. Strip mining on SIMD architectures. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York, 234-243. Google ScholarGoogle Scholar
  214. WHOLEY, S. 1992. Automatic data mapping for distributed-memory parallel computers. In Proceedings of the ACM International Conference on Supercomputing (Washington, D.C., July). ACM Press, New York, 25-34. Google ScholarGoogle Scholar
  215. WOLF, M. E. 1992. improving locality and parallelism in nested loops. Ph.D. thesis, Computer Science Dept., Stanford University, Stanford, Calif. Google ScholarGoogle Scholar
  216. WOLFE, M. J. 1989a. More iteration space tiling. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 655-664. Google ScholarGoogle Scholar
  217. WOLFE, M. J. 1989b. Optimizing Supercompilers for Supercomputers. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google ScholarGoogle Scholar
  218. WOLF, M. E. AND LAM, M. S. 1991. A loop transformation theory and an atgonthm to maximize parallelism. IEEE Trans. Parall. D~strzb. Syst. 2, 4 (Oct.), 452-471. Google ScholarGoogle Scholar
  219. WOLFE, M J AND TSENG, C. 1992. The Power Test for data dependence. 1EEE Trans. Parall. Distrtb. Syst. 3, 5 (Sept), 591-601 Google ScholarGoogle Scholar
  220. ZIMA, H. P, BAST, H. J, AND GERNDT, M. 1988. SUPERB. A tool for semi-automatic S1MD/MIMD parallelization Parall. Comput 6, 1 (Jan.J, 1-18.Google ScholarGoogle Scholar

Index Terms

  1. Compiler transformations for high-performance computing

        Recommendations

        Reviews

        Max Hailperin

        This is a comprehensive survey of high-level optimizing transformations used in compilers for superscalar, vector, and multiprocessor machines. The emphasis is on transformations endemic to these architectures; supporting analyses and more classic transformations also receive some coverage. The authors expect "readers ... to be familiar with modern computer architecture and basic program compilation techniques." The extensive bibliography generally includes both the earliest work on a topic and a sampling of important recent work. Since the recent work is only sampled, a citation index should be consulted. The writing is generally clear, and is supplemented with useful examples and diagrams and a helpful appendix of model architectures. Accessibility is marred by flaws in the diagrams and bibliography. In the diagrams, the dotted lines largely disappeared between the authors' computer and the printed page. In some diagrams, they are gone entirely; in others, an occasional faint dot remains. This unfortunately turns the diagrams into puzzles. The bibliography's "alphabetization" is quite unusual. The presentation seems reasonably authoritative; there are some minor weaknesses in this regard in peripheral topics (e.g., the definition of dependence distance, the description of the Omega test, the citation of Morel and Renvoise's work, and the treatment of tail-recursion optimization). The importance of this survey, however, doesn't rest on its authoritativeness, let alone its authoritativeness regarding peripheral topics. Rather, it is important because of the amount of material that is collected into one easily accessible place. It will play an important educational role for years to come. Online Computing Reviews Service

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader