Abstract
In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis.
This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application.
Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.
- ABELSON, H. AND SUSSMAN, G. J. 1985. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Mass. Google Scholar
- ABu-SUFAH, W 1979. improving the performance of virtual memory computers. Ph D., thesis, Tech. Rep. 78-945, Univ. of Illinois at Urbana- Champaign. Google Scholar
- ABU-SUFAH, W., KUCK, D. J., AND LAWRIE, I). 1981 On the performance enhancement of paging systems through program analysis and transformations IEEE Trans. Comput. C-30, 5 (May), 341-356.Google Scholar
- ACKERSLAXq, W B. 1982. Data flow languages. Computer 15, 2 (Feb.), 15-25.Google Scholar
- AHO, A. V., JOHNSON, S C., AND ULLMAN, J. D 1977. Code generation for expressions with common subexpressions. J. ACM 24, 1 (Jan.), 146-160. Google Scholar
- Ago, A. V., SF~THI, R., AND ULLMAN, J. D. 1986. Compilers. Prtnc~ples, Techniques, and Tools. Addison-Wesley, Reading, Mass. Google Scholar
- AIKEN, A. AND NICOLAU, A. 1988a Optimal loop parallehzation. In Procee&ngs of the SIGPLAN Conference on Programmzng Language Destgn and Implementahon (Atlanta, Ga., June). SIG- PLAN Not. 23, 7 (July), 308-317. Google Scholar
- AMEN, A. AND NICOIAU. A. 1988b. Perfect pipelinrag: A new loop parallelization technique. In Proceedmgs of the 2nd European Symposzum on Programming Lecture Notes in Computer Science, vol. 300. Springer-Verlag, Berlin, 221-235. Google Scholar
- ALLEN, J. R. 1983. Dependence analysis for subscripted variables and its apphcation to program transformations. Ph.D. thesis, Computer Science Dept., Rice University, Houston, Tex Google Scholar
- ALLEN, F. E. 1969. Program optimization. In Annual Rewew ~n Automatic Programming 5. International Tracts in Computer Science and Technology and their Application, vol. 13. Pergamon Press, Oxford, England, 239-307.Google Scholar
- ALLEN, F. E. AND COCKE, J. 1971. A catalogue of optimizing transformations. In Design and Opt~m~zat~on of Compilers, R Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1-30.Google Scholar
- ALLEN, J. R. AND KENNEDY, K. 1987. Automatic translation of Fortran programs to vector form. ACM Trans. Program Lang. Syst. 9, 4 (Oct.), 491-542. Google Scholar
- ALLEN, J. R. AND I~~ENNEDY, K. 1984. Automatic loop interchange. In Proceedings of the SIGPLAN Symposium on Compzler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 233-246. Google Scholar
- ALLEN, F. E., BURKE, M., CHARLES, P., CYTRON, R., AND FERRANTE, j. 1988a. An overview of the PTRAN analysis system for multiprocessing. J. Parall. Distrib. Comput. 5, 5 (Oct.), 617-640. Google Scholar
- ALLEN, F. E., BURKE, M., CYTRON, R., FERRANTE, J., HSIEH, W., AND SARKAR, V. 1988b. A framework for determining useful parallelism. In Proceedings of the ACM International Conference on Supercomput~ng (St. Malo, France, July). ACM Press, New York, 207-215. Google Scholar
- ALLEN, F. E., COCKE, J., AND KENNEDY, K. 1981. Reduction of operator ~trength. In Program Flow Analysis: Theory and Apphcations, S. S. Muchnik and N. D. Jones, Eds., Prentice-Hall, Englewood Cliffs, N.J., 79-101.Google Scholar
- ALL~N, J R., KENNEDY, K., PORTERFIELD, C., AND WARREN, J. 1983. Conversion of control dependence to data dependence In Conference Record of the 10th ACM Symposium on Principlos of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 177-189. Google Scholar
- ALPERT, D. AND AVNON, D. 1993 Architecture of the Pentium microprocessor IEEE Micro 13, 3 (June), 11-21 Google Scholar
- AMERICAN NATIONAL STANDARDS INSTITUTE. 1987. An American National standard, IEEE standard for binary floating-point arithmetic SIGPLAN Not. 22, 2 (Feb.), 9-25.Google Scholar
- ANDERSON, J. M. AND LAM, M. S. 1993 Global optimlzations for parallelism and locality on scalable parallel machines. In Proceedings of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 112-125. Google Scholar
- ANDERSON, S. AND HUDAK, P 1990 Compilation of Haskell array comprehensions for scientific computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementatwn (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 137-149. Google Scholar
- APPEL, A. W. 1992. Compding w~th Continuations. Cambridge University Press, Cambridge, England. Google Scholar
- ARDEN, B. W., GALLER, B. A., AND GRAHAM, R. M. 1962. An algorithm for translating Boolean expressions. J. ACM 9, 2 (Apr.), 222-239. Google Scholar
- ARWND AND CULLER, D. E. 1986. Dataflow architectures. In Annual Review of Computer Science. Vol. 1. J. F. Traub, B. J. Grosz, B. W. Lampson, and N. J. Nilsson, Eds. Annual Reviews, Palo Alto, Calif, 225-253. Google Scholar
- ARWND, KATHAIL, V., AND PINGALI, K. 1980. A dataflow architecture with tagged tokens. Tech. Rep TM-174, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
- BACON, D. F., CHOW, J.-H., Ju, D. R, MUTHUKUMAR, 14, AND SARKAR, V. 1994. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In Proceedings of CASCON '94 (Toronto, Ontario, Oct.). Google Scholar
- BAILEY, D. H. 1992. On the behavior of cache memories with strided data access. Tech. Rep. RNR- 92-015, NASA Ames Research Center, Moffett Field, Calif., (May).Google Scholar
- BAILEY, D. H., BARszcz, E., BARTON, J. T., BROWNING, D. S., CARTER, R. L., DAGUM, L., FATOOHI, R. A., FREDERICKSON, P. 0., LASINSKI, T. A., SCHREIBER, R. S., SIMON, H. D., VENKATAKRISHNAN, V., AND WEERATUNGA, S. K. 1991. The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5, 3 (Fall), 63-73.Google Scholar
- BALASUNDARAM, V. 1990. A mechanism for keeping useful internal information in parallel programming tools: The Data Access Descriptor. J. Parall. D~stnb. Comput. 9, 2 (June), 154-170. Google Scholar
- BALASUNDARAM, V., Fox, G., KENNEDY, K., AND KREMER, V. 1990. An interactive environment for data partitioning and distribution. In Proceedtngs of the 5th D~strzbuted Memory Computer Conference (Charleston, South Carolina, Apr.). IEEE Computer Society Press, Los Alamitos, Calif.Google Scholar
- BALASUNDARAM, V., KENNEDY, K., KREMER, U., MCKINLEY, K., AND SUBHLOK, J. 1989. The ParaScope editor: An interactive parallel programming tool. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 540-550. Google Scholar
- BALL, J. E. 1979. Predicting the effects of optimization on a procedure body. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Denver, Color., Aug.). SIGPLAN Not. 14, 8, 214-220. Google Scholar
- B~NE~EE, U. 1991. Unimodular transformations of double loops. In Advances in Languages and Compilers for Parallel Processing, A. Nicolau, Ed. Research Monographs in Parallel and Distributed Computing, MIT Press, Cambridge, Mass., Chap. 10.Google Scholar
- BANERJEE, U. 1988a. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, Mass. Google Scholar
- BnNERJEE, U. 1988b. An introduction to a formal theory of dependence analysis. J. Supercomput. 2, 2 (Oct.), 133-149.Google Scholar
- BANERJEE, U. 1979. Speedup of ordinary programs, Ph.D. thesis, Tech. Rep. 79-989, Computer Science Dept., Univ. of Illinois at Urbana- Champaign. Google Scholar
- BANERJEE~ U., CHEN, S. C., KUCK, D. J., AND TOWLE, R. A. 1979. Time and parallel processor bounds for Fortran-like loops. IEEE Trans. Comput. C-28, 9 (Sept.), 660-670.Google Scholar
- BANNING, J. P. 1979. An efficient way to find the side-effects of procedure calls and the aliases of variables. In Conference Record of the 6th ACM Symposium on Principles of Programming Languages (San Antonio, Tex., Jan.). ACM Press, 29-41. Google Scholar
- BERNSTEIN, R. 1986. Multiplication by integer constants. Softw. Pract. Exper. 16, 7 (July), 641-652. Google Scholar
- BERRY, M., CHEN, D., Koss, P., KUCK, D., Lo, S., PANG, Y., POINTER, L., ROLOFF, R., SAMEH, A., CLEMENTI, E_. CHIN. S. SCHNEIDER. D. FOX. FT., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., GOODRUM, R., AND MARTIN, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. Int. J. Supercomput. Appl. 3, 3 (Fall), 5-40.Google Scholar
- BLELLOCH, G. 1989. Scans as primitive parallel operations. IEEE Trans. Comput. C-38, 11 (Nov.), 1526-1538. Google Scholar
- BROMLEY, M., {-IELLER, S., MCNERNEY, W., AND STEELE, G. L., JR. 1991. Fortran at ten gigaflops: The Connection Machine convolution compiler. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 145-156. Google Scholar
- BURKE, M. AND CYTRON, R. 1986. Interprocedural dependence analysis and paratlelization. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLANNot. 21, 7 (July), 162-175. Extended version available as IBM Thomas J. Watson Research Center Tech. Rep. RC 11794. Google Scholar
- BURNETT, G. J. AND COFFMAN, E. G., JR. 1970. A study of interleaved memory systems. In Proceedings of the Spring Joint AFIPS Computer Conference. vol. 36. AFIPS, Montvale, N.J., 467-474.Google Scholar
- BURSTALL, R. M. AND DARLINGTON, J. 1977. A transformation system for developing recursive programs. J. ACM 24, 1 (Jan.}, 44-67. Google Scholar
- BUTLER, M., YEH, T., PATT, Y., ALSUP, M., SCALES, H., AND SHEBANOW, M. 1991. Single instruction stream parallelism is greater than two. In Proceedings of the 18th Annual International Symposium on Computer Architecture (Toronto, Ontario, May). SIGARCH Comput. Arch. News 19, 3, 276-286. Google Scholar
- CALLAHAN, D. AND KENNEDY, K. 1988a. Analysis of interprocedural side-effects in a parallel programming environment. J. Parall. Distrib. Comput. 5, 5 (Oct.), 517-550. Google Scholar
- CALLAHAN, D. AND KENNEDY, K. 1988b. Compiling programs for distributed-memory multiprocessots. J. Supercomput. 2, 2 (Oct.), 151-169.Google Scholar
- CALLAHAN, D., CARR, S., AND KENNEDY, K. 1990. Improving register allocation for subscripted variables. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 53-65. Google Scholar
- CAL~, D., COCKE, J., AND KENNEDY, K. 1988. Estimating interlock and improving balance for pipelined architectures. J. Parall. Distrib. Comput. 5, 4 (Aug.), 334-358. Google Scholar
- CALLAHAN, D., COOPER, K., KENNEDY, K., AND TORCZON, L. 1986. Interprocedurat constant propagation. In Proceedings of the SIGPLAN Symposium on Compder Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 152-161. Google Scholar
- CARR, S. 1993. Memory-hierarchy management. Ph.D. thesis, Rice University, Houston, Tex. Google Scholar
- CHAITIN, G. J. 1982. Register allocation and spilling via graph coloring. In Proceedings of the SIG- PLAN Symposium on Compiler Construction (Boston, Mass., June). SIGPLAN Not 17, 6, 98-105. Google Scholar
- CHAITIN, G. J., AUSLANDER, M. A., CHANDRA, A. K., COOKE, J., HOPKINS, M. E., AND MARKSTEIN, P. W. 1981. Register allocation via coloring. Comput. Lang. 6, 1 (Jan.), 47-57.Google Scholar
- CHAMBERS, C. AND UNGAR, D. 1989. Customization: Optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementatmn (Portland, Ore., June). SIGPLAN Not. 24, 7 (July), 146-160. Google Scholar
- CHATTERJEE, S., GILBERT, J. R., AND SCHREIBER, R. 1993a. The alignment-distribution graph. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. 234-252. Google Scholar
- CHATTERJEE, S., GILBERT, J. R., SCHREIBER, R., AND TENG, S.-H. 1993b. Automatic array alignment in data-parallel programs. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 16-28. Google Scholar
- CHEN, S. C. AND KUCK, D. J. 1975. Time and parallel processor bounds for linear recurrence systems. IEEE Trans. Comput. C-24 7 (July), 701-717.Google Scholar
- CHoI, J.-D., BURKE, M., AND CARINI, P. 1993. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. in Conference Record of the 20th ACM Symposlum on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 232-245. Google Scholar
- Chow, F. C. 1988. Minimizing register usage penalty at procedure calls. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 85-94. Google Scholar
- CHOW, F. C. AND HENNESSEY, J. L. 1990. The priority-based coloring approach to register allocation. ACM Trans. Program. Lang. Syst. 12, 4 (Oct.), 501-536. Google Scholar
- CLARKE, C. D. AND PEYTON-JONES, S. L. 1985. Strictness analysis--a practical approach. In Functional Programming Languages and Computer Architecture. Lecture Notes in Computer Science, vol. 201. Springer-Verlag, Berlin, 35-49. Google Scholar
- CLINGER, W. D. 1990. How to read floating point numbers accurately. In Proceedings of the SIG- PLAN Conference on Programming Language Desiogn and Implementatwn (W~nite Plains, N.Y., June). SIGPLAN Not. 25, 6, 92-101. Google Scholar
- COCKE, J. 1970. Global common subexpression elimination. In Proceedings of the ACM Symposium on Compiler Opttmizatwn (July). SIGPLAN Not. 5, 7, 20-24. Google Scholar
- COCKE, J. AND MARKSTEIN, P. 1980. Measurement of program improvement algorithms. In Proceedrags of the IFIP Congress (Tokyo, Japan, Oct.). North-Holland, Amsterdam, Netherlands, 221-228. Also available as IBM Tech. Rep. RC 8111, Feb. 1980.Google Scholar
- COCKE, J. AND SCHWARTZ, J. T. 1970. Programming languages and their compilers (preliminary notes). 2nd ed. Courant Inst. of Mathematical Sciences, New York University, New York. Google Scholar
- COLWELL, R. P., NIX, R. P., O'DONNEL, J. J., PAPWORTH, D. B., AND RODMAN, P. K. 1988. A VLIW architecture for a trace scheduling compiler. IEEE Trans. Comput. C-37, 8 (Aug.), 967-979. Google Scholar
- COOPER, K. D. AND KENNEDY, K. 1989. Fast interprocedural alias analysis. In Conference Record of the 16th ACM Symposium on Principles of Programming Languages (Austin, Tex., Jan.). ACM Press, New York, 49-59. Google Scholar
- COOPER, K. D., HALL, M. W., AND KENNEDY, K. 1993. A methodology for procedure cloning. Comput. Lang. 19, 2 (Apr.), 105-117.Google Scholar
- COOPER, K. D., HALL, M. W., AND TORCZON, L. 1992. Unexpected side effect~ of inline substitution: A case study. ACM Lett. Program. Lang. Syst 1, 1 (Mar.), 22-32. Google Scholar
- COOPER, K. D., HALL, M. W., AND TORCZON, L. 1991. An experiment with inline substitution. Softw. Pract. Exper. 21, 6 (June), 581-601 Google Scholar
- CRAY RESEARCH 1988 CFT77 Reference Manual. Publication SR-0018-C. Cray Research, Inc., Eagan, Minn.Google Scholar
- CYTRON, R. 1986. Doacross: Beyond vectorization for multiprocessors. In Proceedings of the Internatwnal Conference on Parallel Processing (St. Charles, Ill., Aug.). IEEE Computer Society, Washington, D.C., 836-844.Google Scholar
- CYTRON, R. AND FERRANTE, J. 1987. What's in a name? -or- the value of renaming for parallelism detection and storage allocation. Proceedings of the Internatwnal Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 19-27.Google Scholar
- DANTZIG, G. B. AND EAVES, B. C. 1974. Fourier- Motzkin ehmination and sits dual with application to integer programming, in Combinatorial Programming: Methods and Applications (Versailles, France, Sept.). B. D. Roy, Ed. D. Reidel, Boston, Mass., 93-102.Google Scholar
- DENNIS, J. B. 1980 Data flow supercomputers_ Computer 13, 11 (Nov.), 48-56.Google Scholar
- DIXIT, K. M. 1992. New CPU benchmarks from SPEC. In Digest of Papers, Spring COMPCON 1992, 37th IEEE Computer Society International Conference (San Francisco, Calif., Feb.). IEEE Computer Society Press, Los Alamitos, Calif., 305-310. Google Scholar
- DONGARRA, J. AND HIND, A. R. 1979. Unrolling loops in Fortran. Softw. Pratt. Exper. 9, 3 (Mar.), 219-226.Google Scholar
- EGGERS, S. J. AND KATZ, R. H. 1989 Evaluating the performance of four snooping cache coherency protocols. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 2-15. Google Scholar
- EIGENMANN, R., HOEFLINGER, J., LI, Z., AND PADUA, D. A. 1991. Experience in the automatic parallelization of four Perfect-benchmark programs. In Proceedings of the 4th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 589. Springer-Verlag, Berlin, 65-83. Also available as Tech. Rep. 1193, Center for Supercomputing Research and Development. Google Scholar
- ELLIS, J. R. 1986. Bulldog: A Compiler for VLIW Architectures. ACM Doctoral Dissertation Award. MIT Press, Cambridge, Mass. Google Scholar
- ERSHOV, A. P. 1966. ALPHA--an automatic programming system of high efficiency. J. ACM 13, 1 (Jan.), 17-24. Google Scholar
- FEAUTRIER, P. 1991. Dataflow analysis of array and scalar references. Int. J. Parall. Program. 20, 1 (Feb.), 23-52.Google Scholar
- FEAUTRIER, P. 1988. Array expansion, in Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 429-441. Google Scholar
- FERRARI, D. 1976. The improvement of program behavior. Computer 9, 11 (Nov.), 39-47.Google Scholar
- FISHER, J.A. 1981. Trace scheduling: A technique for global microcode compaction. IEEE Trans. Comput. C-30, 7 (July), 478-490.Google Scholar
- FISHER, J. A., ELLIS, J. R., RUTTENBERG, J. C., AND NICOLAU, A. 1984. Parallel processing: A smart compiler and a dumb machine. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Montreal, Quebec, June). SIGPLAN Not. 19, 6, 37-47. Google Scholar
- FLOATING POINT SYSTEMS. 1979. FPS AP-120B Processor Handbook. Floating Point Systems, Inc., Beaverton, Ore.Google Scholar
- FREE SOFTWARE FOUNDATION. 1992. gCC 2.X Reference Manual. Free Software Foundation, Cambridge, Mass.Google Scholar
- FREUDENBERGER, S. M., SCHWARTZ, J. T., AND SHARIR, M. 1983. Experience with the SETL optimizer. ACM Trans. Program. Lang. Syst. 5, 1 (Jan.), 26-45. Google Scholar
- GANNON, D., JALBY, W., AND GALLIVAN, K. 1988. Strategies for cache and local memory management by global program transformation. J. Parall. Distrib. Comput. 5, 5 (Oct.), 587-616. Google Scholar
- GERNDT, M. 1990. Updating distributed variables in local computations. Concurrency Pract. Exper. 2, 3 (Sept.), 171-193. Google Scholar
- GIBSON, D. H. AND RAO, G. S. 1992. Design of the IBM System/390 computer family for numerically intensive applications: An overview for engineers and scientists. IBM J. Res. Dev. 36, 4 (July), 695-711. Google Scholar
- GIRKAR, M. AND POLYCHRONOPOULOS, C. D. 1988. Compiling issues for supercomputers. In Proceedings of Supercomputing '88 (Orlando, Fla., Nov.). IEEE Computer Society, Washington, D.C., 164-173. Google Scholar
- GOFF, G., KENNEDY, K., AND TSENG, C.-W. 1991. Practical dependence testing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 15-29. Google Scholar
- GRAHAm, S. L., Lucco, S., AND SHARP, O. J. 1993. Orchestrating interactions among parallel computations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 100-111. Google Scholar
- GRANLUND, W. AND KENNER, R. 1992. Eliminating branches using a superoptimizer and the GNU compiler, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 341-352. Google Scholar
- GUPTA, M. 1992. Automatic data partitioning on distributed memory multicomputers. Ph.D. thesis, Tech. Rep. UILU-ENG-92-2237, Univ. of Illinois at Urbana-Champaign. Google Scholar
- GUPTA, M., MIDKIFF, S., SCHONBERG, E., SWEENEY, P., WANG, K. Y., AND BURKE, M. 1993. PTRAN II: A compiler for High Performance Fortran. In Proceedings of the 4th International Workshop on Compilers for Parallel Computers (Delft, Netherlands, Dec.). 479-493.Google Scholar
- HALL, M. W., KENNEDY, K., AND MCKINLEY, K. S. 1991. Interprocedural transformations for parallel code generation. In Proceedings of Supercomputing '91 (Albuquerque, New Mexico, Nov.). IEEE Computer Society Press, os Alamitos, Calif., 424-434. Google Scholar
- HARRIS, K. AND HOBBS, S. 1994. VAX Fortran. In Optimization in Compilers, F. E. Allen, R. Cytron, B. K. Rosen, and K. Zadeck, Eds. ACM Press, New York, chap. 16. To be published.Google Scholar
- HATFIELD, D. J. AND GERALD, j. 1971. Program restructuring for virtual memory. IBM Syst. J. 10, 3, 168-192.Google Scholar
- HENNESSY, J. L. AND PATTERSON, D. A. 1990. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, Cali~ Google Scholar
- HEWLETT-PACKARD. 1992. PA-RISC 1.1 Architecture and Instruction Manual. 2nd ed. Part Number 09740-90039. Hewlett-Packard, Palo Alto, Ca}i~Google Scholar
- HIGH PERFORMANCE FORTRAN FORUM. 1993. High Performance Fortran language specification, version 1.0. Tech. Rep. CRPC-TR92225, Rice University, Houston, Tex.Google Scholar
- HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1993. Preliminary experiences with the Fortran D compiler. In Proceedlng~ of Supercomputing '93 (Portland, Ore., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 338-350. Google Scholar
- HIRANANDANI, S., KENNEDY, K., AND TSENG, C.-W. 1992. Compiling Fortran D for MIMD distributed-memory machines. Commun. ACM 35, 8 (Aug.), 66-80. Google Scholar
- HUDAK, P. AND GOLDBERG, B. 1985. Distributed execution of functional programs using serial combinators IEEE Trans. Comput. C-34, 10 (Oct), 881-890. Google Scholar
- Hwu, W. W. AND CHANG, P. P. 1989. Achieving high instruction cache performance with an optimizing compiler. In Proceedings of the 16th Annual International Symposium on Computer Architecture (Jerusalem, Israel, May). SIGARCH Comput. Arch. News 17, 3 (June), 242-251. Google Scholar
- Hwu, W. W., MAHLKE, S. A., CHEN, W. Y., CHANG, P. P., WARTER, N. J., BRINGMANN, R. A., OUELLETFE, R. G., HANK, R. E., KIYOHARA, T., HAAB, G. E., HOLM, J. G., AND LAVERY, D M 1993. The superblock: An effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 1/2 (May), 229-248. Google Scholar
- IBM. 1992. Optimization and Tuning Gutde for the XL Fortran and XL C Compilers. 1st ed. Publication SC09-1545-00, IBM, Armonk, N.Y.Google Scholar
- IBM. 1991. IBM RISC System~6000 NIC Tuning Guide for Fortran and C Publication GG24- 3611-01, IBM, Armonk, N.Y.Google Scholar
- IRIGOIN, F. AND TRIOLET, R. 1988. Supernode partitioning. In Conference Record of the 15th ACM Symposium on Principles of Programming Languages (San Diego, Calif., Jan.). ACM Press, New York, 319-329. Google Scholar
- IVERSON, K. E. 1962. A Programming Language. John Wiley and Sons, New York. Google Scholar
- KENNEDY, K. AND MCKINLEY, K. S. 1990. Loop distribution with arbitrary control flow. In Proceedings of Supercomputing '90 (New York, N.Y., Nov.). IEEE Computer Society Press, Los Alamitos, Calif., 407-416. Google Scholar
- KENNEDY, K., MCKINLEY, K. S., AND TSENG, C.-W. 1993. Analysis and transformation in an interactive parallel programming tool. Concurrency Pract. Exper 5, 7 (Oct.), 575-602.Google Scholar
- KILDALL, G. 1973. A unified approach to global program optimization In Conference Record of the ACM Symposium on Prmctples of Programmmg Languages (Boston, Mass, Oct.). ACM Press, New York, 194-206. Google Scholar
- KNOBE, K. AND NATARAJAN, V. 1993. Automatic data allocation to minimize communication on SIMD machines. J. Supercomput. 7, 4 (Dec.), 387-415. Google Scholar
- K_NIOBE, K., LUKAS, J. D., AND STEELE, G. L., JR. 1990. Data optimization: Allocation of arrays to reduce communication on SIMD machines. J. Parall Distrib. Comput. 8, 2 (Feb.), 102-118. Google Scholar
- KNOBE, K., LUKAS, J D., AND STEELE, G. L., JR 1988. Massively parallel data optimization. In The 2nd. Symposium on the Frontiers of Massively Parallel Computation (Fairfax, Va., Oct.). IEEE, Piscataway, N.J., 551-558.Google Scholar
- KNUTH, D. E. 1971. An empirical study of Fortran programs. Soflw. Pract. Exper. 1, 2 (Apr.-June), 105-133.Google Scholar
- KOELBEL, C. 1990. Compiling programs for nonshared memory machines. Ph.D. thesis, Purdue University, West Lafayette, Ind. Google Scholar
- KOELBEL, C AND MEHROTRA, P. 1991. Compiling global name-space parallel loops for distributed execution. IEEE Trans. Parall. Distrib. Syst. 2, 4 (Oct.), 440-451. Google Scholar
- KRANZ, D., KELSEY, R., REES, J., HUDAK, P., PHILBIN, J., AND ADAMS, N. 1986. ORBIT: An optimizing compiler for Scheme. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Pato Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 219-233. Google Scholar
- KUCK, D. J. 1978. The Structure of Computers and Computations Vol. 1. John Wiley and Sons. New York. Google Scholar
- KUC~4, D. J. 1977. A survey of parallel machine organization and programming. ACM Comput. Surv. 9, 1 (Mar.), 29-59. Google Scholar
- KUCK, D. J. AND STOKES, R. 1982. The Burroughs Scientific Processor (BSP). IEEE Trans. Comput. C-31, 5 (May), 363-376.Google Scholar
- KUCK, D. J., KUHN, R. H., PADUA, D., LEASURE, B., AND WOLFE, M. 1981. Dependence graphs and compiler optimizations. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 207-218. Google Scholar
- L~, M. S. 1988. Software pipelining: An effective scheduling technique for VLIW machines In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Atlanta, Ga., June). SIGPLAN Not. 23, 7 (July), 318-328. Google Scholar
- LAM, M. S. AND WmSON, R. P. 1992. Limits of control flow on parallelism. In Proceedings of the 19th Annual International Symposium on Computer Architecture (Gold Coast, Australia, May). SIGARCH Comput. Arch. News 20, 2, 46-57 Google Scholar
- LAM, M. S., ROTHBERG, E. E., AND WOLF. M. E. 1991. The cache performance and optimization of blocked algorithms. In Proceedzngs of the 4th Internatwnal Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 63-74. Google Scholar
- LAMPORT, L. 1974. The parallel execution of DO loops. Commun. ACM 17, 2 (Feb.), 83-93. Google Scholar
- LANDI, W., RYDER, B. G., AND ZHANG, S. 1993. Interprocedural modification side effect analysis with pointer ahasing. In Proceedzngs of the SIGPLAN Conference on Programming Language Deszgn and Implementation (Albuquerque, New Mexico, June). SIGPLAN Not. 28, 6, 56-67. Google Scholar
- LARUS, J. R. 1993 Loop-level parallelism in numeric and symbolic programs. IEEE Trans. Parall. Dtstrib. Syst 4, 7 (July), 812-826. Google Scholar
- LEE, G., KRUSKAL, C. P., AND KUCK, D. J. 1985. An empirical study of automatic restructuring of nonnumerical programs for parallel processors. IEEE Trans. Comput. C-34, 10 (Oct.), 927-933. Google Scholar
- LI, Z. 1992. Array privatization for parallel execution of loops. In Proceedings of the A CM International Conference on Supercomput~ng (Washington, D.C., July). ACM Press, New York, 313-322. Google Scholar
- LI, J. AND CHEN, M. 1991. Compiling communication-efficient programs for massively parallel machines. IEEE Trans. Parall. Distrib. Syst. 2, 3 (July), 361-376. Google Scholar
- LI, J. AND CHEN, M. 1990. Index domain alignment: Minimizing cost of cross-referencing between distributed arrays. In The 3rd Symposium on the Frontiers of Massively Parallel Computation, J. Jaja, Ed. IEEE Computer Society Press, Los Alamitos, Calif., 424-433.Google Scholar
- LI, Z. AND YEW, P. 1988. Interprocedura} analysis for parallel computing. In Proceedings of the International Conference on Parallel Processing. F. A. Briggs, Ed. Vol. 2. Pennsylvania State University Press, University Park, Pa., 221-228.Google Scholar
- LI, Z., YEW, P., AND ZHU, C. 1990. Data dependence analysis on multi-dimensional array references. IEEE Trans. Parall. Distnb. Syst. 1, 1 (Jan.), 26-34, Google Scholar
- LOVEMAN, D. B. 1977. Program improvement by source-to-source transformation. J. ACM 1, 24 (Jan.), 121-145. Google Scholar
- Lvcco, S. 1992. A dynamic scheduling method for irregular parallel programs. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 200-211. Google Scholar
- MACE, M. E. 1987, Memory Storage Patterns in Parallel Processing. Kluwer Academic Publishers, Norwell, Mass. Google Scholar
- MACE, M. E. AND WAGNER, R. A. 1985. Globally optimum selection of memory storage patterns. In Proceedings of the International Conference on Parallel Processing, D. DeGrott, Ed. IEEE Computer Society, Washington, D.C., 264-271.Google Scholar
- MARKSTEIN, P., MARKSTEIN, V., AND ZADECK, K. 1994. Strength reduction. In Optimizatmn in Compilers. ACM Press, New York, Chap. 9. To be published.Google Scholar
- MASSALIN, H. 1987, Superoptimizer: A look at the smallest program. In Proceedings of the 2nd International Conference on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Oct.). SIGPLAN Not. 22, 10, 122-126. Google Scholar
- MAYDAN, D. }~., AMARASINGHE, S. P., AND LAM, M. S. 1993. Array data flow analysis and its use in array privatization. In Conference Record of the 20th ACM Symposium on Principles of Programming Languages (Charleston, S. Carolina, Jan.). ACM Press, New York, 1-15. Google Scholar
- MAYDAN, D. E., HENNESSEY, J. L., AND LAM, M. S. 1991. Efficient and exact data dependence analysis, in Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIG- PLAN Not. 26, 6, 1-14. Google Scholar
- MCFARLING, S. 1991. Procedure merging with instruction caches. In Proceedings of the SIG= PLAN Conference on Programming Language Design and Implementation (Toronto, Ontario, June). SIGPLAN Not. 26, 6, 71-79. Google Scholar
- McGRAw, J. R. 1985. SISAL: Streams and iteration in a single assignment language. Tech. Rep. M-146, Lawrence Livermore National Laboratory, Livermore, Calif.Google Scholar
- MCMAHON, F. M. 1986. The Livermore Fortran kernels: A computer test of numerical performance range. Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, Livermore, Calif.Google Scholar
- MICHIE, D. 1968. "Memo" functions and machine learning. Nature 218, 19-22.Google Scholar
- MILLSTEIN, R. E. AND MUNTZ, C. A. 1975. The IL- LIAC-IV Fortran compiler. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIG- PLAN Not. 10, 3, 1-8. Google Scholar
- MIRCHANDANEY, R., SALTZ, J. H., SMITH, R. M., NICOL, D. 1V{., AND CROWLEY, K. 1988. Principles of runtime support for parallel processors. In Proceedings of the ACM International Conference on Supercomputing (St. Malo, France, July). ACM Press, New York, 140-152. Google Scholar
- MOREL, E. AND RENVOISE, C. 1979. Global optimization by suppression of partial redundancies. Commun. ACM 22, 2 (Feb.), 96-103. Google Scholar
- MUCHNICK, S. S. AND JONES, N., (EDS.) 1981. Program Flow Analysis. Prentice-Hall, Englewood Cliffs, N.J. Google Scholar
- MUm~OKA, Y. 1971. Parallelism exposure and exploitation in programs. Ph.D. thesis, Tech. Rep. 71-424, Univ. of Illinois at Urbana-Champaign.Google Scholar
- MYERS, E. W. 1981. A precise interprocedural data flow algorithm. In Conference Record of the 8th ACM Symposium on Principles of Programming Languages (Williamsburg, Va., Jan.). ACM Press, New York, 219-230. Google Scholar
- NICOLAU, A. 1988. Loop quantization: A generalized loop unwinding technique. J. Para ll. Distrib. Comput. 5, 5 (Oct.), 568-586. Google Scholar
- NICOLAU, A. AND FISHER, J. A. 1984. Measuring the parallelism available for very long instruction word architectures. IEEE Trans. Comput. C-33, 11 (Nov.), 968-976.Google Scholar
- NIKHIL, R. S. 1988. ID reference manual, version 88.0. Tech. Rep. 284, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.Google Scholar
- NOBAYASHI, H. AND EOYANG, C. 1989. A comparison study of automatically vectorizing Fortran compilers. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 820-825. Google Scholar
- O'BRIEN, K., HAY, B., MINISH, J., SCHAFFER, H., SCHLOSS, B., SHEPHERD, A., AND ZALESKI, M. 1990. Advanced compiler technology for the RISC System/6000 architecture. In IBM RISC System/6000 Technology. Publication SA23- 2619. IBM Corporation, Mechanlcsburg, PennGoogle Scholar
- OED, W. 1992. Cray Y-MP C90: System features and early benchmark results. Parall. Comput. 18, 8 (Aug.), 947-954. Google Scholar
- OEHLER, R. R. AND BLASGEN, M. W. 1991. IBM RISC System/6000: Architecture and performance. IEEE Micro 11, 3 (June), 14-24. Google Scholar
- PADUA, D. A. AND PETERSEN, P. M. 1992. Evaluation of paralleIizing compilers. Parallel Computing and Transputer Applications, (Barcelona, Spain. Sept.). CIMNE, 1505-1514. Also available as Center for Supercomputing Research and Development Tech Rep. 1173.Google Scholar
- PADUA, D. A. AND WOLFE. M. J. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (Dec.), 1184-1201. Google Scholar
- PADUA, D. A., KUCK, D. j. AND LAWRIE, D. 1980. High-speed multiprocessors and compilation techniques. IEEE Trans. Comput. C-29, 9 (Sept.), 763-776.Google Scholar
- PETERSEN, P. M. AND PADUA, D. A. 1993. Static and dynamic evaluation of data dependence analysis. In Proceedings of the ACM International Conference on Supercomputing (Tokyo, Japan, July}. ACM Press, New York, 107-116. Google Scholar
- PETTIS, K. AND HANSEN, R. C. 1990. Profile guided code pomtioning. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 16-27. Google Scholar
- POLYCHRONOPOULOS, C. D. 1988. Parallel Programming and Compilers. Kluwer Academic Publishers, Boston, Mass. Google Scholar
- POLYCHRONOPOULOS, C. D. 1987a Advanced loop optlmlzations for parallel computers. In Proceed~ngs of the 1st International Conference on Supercomputing. Lecture Notes in Computer Science, voI 297. Springer-Verlag, Berlin, 255-277. Google Scholar
- POLYCHRONOPOULOS, C. D. 1987b. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the International Conference on Parallel Processing (University Park, Penn., Aug.). Pennsylvania State University Press, University Park, Pa., 235-242.Google Scholar
- POLYCHRONOPOULOS, C. D. AND KUCK, D. J. 1987. Guided self-scheduling: A practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput. C-36, 12 (Dec.), 1425-1439. Google Scholar
- POLYCHRONOPOULOS, C. D., GIRKAR, M., HAGHIGHAT, M. R., LEE, C. L., LEUNG, B., AND SCHOUTEN, D. 1989. Parafrase-2: An environment for parallelizing, partitioning, synchronizing, and scheduling programs on multiprocessors. In Proceedings of the International Conference on Parallel Processing. Volume 2. Pennsylvania State University Press, University Park, Pa., 39-48.Google Scholar
- PRESBERO, D. L. AND JOHNSON, N. W. 1975 The Paralyzer: IVTRAN's parallelism analyzer and synthesizer. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 9-16. Google Scholar
- PUGH, W. 1992. A practical algorithm for exact array dependence analysis Commun. ACM 35, 8 (Aug.). 102-115. Google Scholar
- PUGH, W. 1991. Umform techniques for loop optimization. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York. Google Scholar
- RAU, B. AND FISHER, J. A. 1993. Instruchon4evel parallel processing: History, overview, and perspective. J. Supercomput. 7, 1/2 (May), 9-50. Google Scholar
- RAU, B., YEN, D. W. L., YEN, W., AND TOWLE, R. A. 1989. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs. Computer 22, 1 (Jan.), 12-34. Google Scholar
- REES, j., CLINGER, W., ET AL. 1986 Revised3 report on the algorithmic language Scheme. SIG- PLAN Not 21, 12 (Dec.), 37-79. Google Scholar
- RISEMAN, E. M. AND FOSTER, C. C. 1972. The inhibition of potential parallelism by conditional jumps. IEEE Trans. Comput. C-21~ 12 (Dec.), 1405-1411.Google Scholar
- ROGERS, A. M. 1991. Compiling for locality of reference. Ph.D. thesis, Tech. Rep. TR 91-1195, Dept. of Computer Science, Cornell University. Google Scholar
- RUSSELL, R. M. 1978. The CRAY-1 computer system. Commun. ACM 21, 1 (Jan.), 63-72. Google Scholar
- SABOT, G. AND WHOLEY, S. 1993 CMAX: A Fortran translator for the Connection Machine ~ystem. In Proceedings of the ACM InternatLonal Conference on Supercornput~ng (Tokyo, Japan, July). ACM Press, New York. Google Scholar
- SARKAR, V. 1989. Partitioning and Scheduling Parallel Programs for Multiprocessors. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google Scholar
- SARKAR, V. AND THEKKATH, R. 1992. A general framework for iteration-reordering transformations. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 175-187. Google Scholar
- SCHEIFLER, R. W. 1977. An analysis of inline substitution for a structured programming language. Commun. ACM 20, 9 (Sept.). 647-654. Google Scholar
- SHEN, Z., LI, Z., AND YEW, P-C. 1990. An empirical study of Fortran programs for parallelizing compilers. IEEE Trans. Parall. Distrib. Syst. 1, 3 (July), 356-364. Google Scholar
- SINGH, J. P. AND HENNESSY, J. L. 1991. An empirical investigation of the effectiveness and limitations of automatic parallelization. In Proceedings of the International Symposium on Shared Memory Multiprocessing, (Apr.). 213-240.Google Scholar
- SINGH, J. P., WEBER, W.-D., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared memory. SIGARCH Comput. Arch. News 20, 1 (Mar.), 5-44. Also available as Stanford Univ. Tech. Rep. CSL-TR-92-526. Google Scholar
- SITES, R. L., (ED.) 1992. Alpha Architecture Reference Manual. Digital Press, Bedford, Mass. Google Scholar
- SOHI, G. S. AND VAJAPAYEM, S. 1990. Instruction issue logic for high-performance, interruptable, multiple functional unit, pipelined computers. IEEE Trans. Comput. C-39, 3 (Mar.), 349-359. Google Scholar
- STEELE, G. L., JR. 1977. Arithmetic shifting considered harmful. SIGPLAN Not. 12, 11 (Nov.), 61-69. Google Scholar
- STEELE, G. L., JR. AND WHITE, J. L. 1990. How to print floating-point numbers accurately. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation (White Plains, N.Y., June). SIGPLAN Not. 25, 6, 112-126. Google Scholar
- STENSTROM, P. 1990. A survey of cache coherence schemes for multiprocessors. Computer 23, 6 (June), 12-24. Google Scholar
- SUN MICROSYSTEMS. 1991. SPARC Architecture Manual, Version 8. Part No. 800-1399-08. Sun Microsystems, Mountain View, Calif.Google Scholar
- SZYMANSI4I, T. G. 1978. Assembling code for machines with span-dependent instructions. Cornmum. ACM 21, 4 (Apr.), 300-308. Google Scholar
- TANG, P. AND YEW, P. 1990. Dynamic processor self-scheduling for general parallel nested loops. IEEE Trans. Comput. C-39, 7 (July), 919-929. Google Scholar
- THINKING MACHINES. 1989. Connection Machine, Model CM-2 Technical Summary. Thinking Machines Corp., Cambridge, Mass.Google Scholar
- TJADEN, G. S. AND FLYNN, M. J. 1970. Detection and parallel execution of parallel instructions. IEEE Trans. Comput. C-19, 10 (Oct.), 889-895.Google Scholar
- TORRELLAS, J., LAM, H. S., AND HENNESSY, J. L. 1994. False sharing and spatial locality in multiprocessor caches. IEEE Trans. Comput. 43, 6 (June), 651-663. Google Scholar
- TOWLE, R. A. 1976. Control and data dependence for program transformations. Ph.D. thesis, Tech. Rep. 76-788, Computer Science Dept., Univ. of Illinois at Urbana-Champaign. Google Scholar
- TRIOLET, R., IRIGOIN, F., AND FEAUTRIER, P. 1986. Direct parallelization of call statements. In Proceedings of the SIGPLAN Symposium on Compiler Construction (Palo Alto, Calif., June). SIGPLAN Not. 21, 7 (July), 176-185. Google Scholar
- TSENG, C.-W. 1993. An optimizing Fortran D compiler for MIMD distributed-memory machines. Ph.D. thesis, Tech. Rep. Rice COMP TR93-199, Computer Science Dept., Rice University, Houston, Tex. Google Scholar
- TU, P. AND PADUA, D. A. 1993. Automatic array privatization. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science, vol. 768. Springer-Verlag, Berlin, 500-521. Google Scholar
- VON HANXLEDEN, R. AND KENNEDY, K. 1992. Relaxing SIMD control flow constraints using loop transformations. In Proceedings of the SIG- PLAN Conference on Programming Language Design and Implementation (San Francisco, Calif., June). SIGPLAN Not. 27, 7 (July), 188-199. Google Scholar
- WALL, D. W. 1991. Limits of instruction-level parallelism. In Proceedings of the 4th International Conference on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.). SIGPLAN Not. 26, 4, 176-188. Google Scholar
- WANG, K. 1991. Intelligent program optimization and parallelization for parallel computers. Ph.D. thesis, Tech. Rep. CSD-TR-91-030, Purdue University, West Lafayette, Ind. Google Scholar
- WANG, Ko AND GANNON, D. 1989. Applying AI techniques to program optimization for parallel computers, in Parallel Processing for Supercomputers and Artificial Intelligence, K. Hwang and D. Degroot, Eds. McGraw Hill, New York, Chap. 12.Google Scholar
- WEDEL, D. 1975. Fortran for the Texas Instruments ASC system. In Programming Languages and Compilers for Parallel and Vector Machines (New York, N.Y., Mar.). SIGPLAN Not. 10, 3, 119-132. Google Scholar
- WEGMAN, M. N. AND ZADECK, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr.), 181-210. Google Scholar
- WEISS, M. 1991. Strip mining on SIMD architectures. In Proceedings of the ACM International Conference on Supercomputing (Cologne, Germany, June). ACM Press, New York, 234-243. Google Scholar
- WHOLEY, S. 1992. Automatic data mapping for distributed-memory parallel computers. In Proceedings of the ACM International Conference on Supercomputing (Washington, D.C., July). ACM Press, New York, 25-34. Google Scholar
- WOLF, M. E. 1992. improving locality and parallelism in nested loops. Ph.D. thesis, Computer Science Dept., Stanford University, Stanford, Calif. Google Scholar
- WOLFE, M. J. 1989a. More iteration space tiling. In Proceedings of Supercomputing '89 (Reno, Nev., Nov.). ACM Press, New York, 655-664. Google Scholar
- WOLFE, M. J. 1989b. Optimizing Supercompilers for Supercomputers. Research Monographs in Parallel and Distributed Computing. MIT Press, Cambridge, Mass. Google Scholar
- WOLF, M. E. AND LAM, M. S. 1991. A loop transformation theory and an atgonthm to maximize parallelism. IEEE Trans. Parall. D~strzb. Syst. 2, 4 (Oct.), 452-471. Google Scholar
- WOLFE, M J AND TSENG, C. 1992. The Power Test for data dependence. 1EEE Trans. Parall. Distrtb. Syst. 3, 5 (Sept), 591-601 Google Scholar
- ZIMA, H. P, BAST, H. J, AND GERNDT, M. 1988. SUPERB. A tool for semi-automatic S1MD/MIMD parallelization Parall. Comput 6, 1 (Jan.J, 1-18.Google Scholar
Index Terms
- Compiler transformations for high-performance computing
Recommendations
PolyMage: Automatic Optimization for Image Processing Pipelines
ASPLOS '15This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each ...
PolyMage: Automatic Optimization for Image Processing Pipelines
ASPLOS'15This paper presents the design and implementation of PolyMage, a domain-specific language and compiler for image processing pipelines. An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each ...
Comments