Abstract
This article presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e., generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation system that uses commutativity analysis as its primary analysis technique. We have used this system to automatically parallelize three complete scientific computations: the Barnes-Hut N-body solver, the Water liquid simulation code, and the String seismic simulation code. This article presents performance results for the generated parallel code running on the Stanford DASH machine. These results provide encouraging evidence that commutativity analysis can serve as the basis for a successful parallelizing compiler.
- BANERJEE, W. 1988. Dependence Analysis for Supercomputing. Kluwer Academic Publishers, Boston, Mass.]] Google Scholar
- BANERJEE, W., EIGENMANN, R., NICOLAU, t., AND PADUA, D. 1993. Automatic program parallelization. Proceedings IEEE 81, 2 (Feb.), 211-243.]]Google Scholar
- BARNES, J. AND HUT, P. 1986. A hierarchical O(NlogN) force calculation algorithm. Nature 324, 4 (Dec.), 446-449.]]Google Scholar
- BERNSTEIN, A. J. 1966. Analysis of programs for parallel processing. IEEE Trans. Electron. Comput. 15, 5 (Oct.), 757-763.]]Google Scholar
- BERRY, ~/{., CHEN, D., Koss, P., KUCK, D., Lo, S., PANG, Y., POINTER, L., ROLOFF, R., SAMEH, t., CLEMENTI, E., CHIN, S., SCHNEIDER, D., FOX, G., ~/{ESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., GOODRUM, R., AND MARTIN, J. 1989. The Perfect Club benchmarks: Effective performance evaluation of supercomputers. ICASE Rep. 827, Center for Supercomputing Research and Development, Univ. of Illinois at Urbana-Champaign, Urbana, Ill., May.]]Google Scholar
- BLUME, W. AND EIGENMANN, R. 1992. Performance analysis of parallelizing compilers on the Perfect Benchmarks programs. IEEE Trans. Parallel Distrib. Syst. 3, 6 (Nov.), 643-656.]] Google Scholar
- BLUME, W. AND EIGENMANN, R. 1995. Symbolic range propagation. In Proceedings of the 9th International Parallel Processing Symposium. IEEE Computer Society Press, Los Alamitos, Calif., 357-363.]] Google Scholar
- BODIN, F., BECKMAN, P., GANNON, D., GOTWALS, J., NARAYANA, S., SRINIVAS, S., AND WINNICKA, B. 1994. Sage++: An object-oriented toolkit and class library for building Fortran and C++ structuring tools. In Proceedings of the Object-Oriented Numerics Conference.]]Google Scholar
- CALLAHAN, D. 1991. Recognizing and parallelizing bounded recurrences. In Proceedings of the gth Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin, 169-184.]] Google Scholar
- CARLISLE, M. AND ROGERS, A. 1995. Software caching and computation migration in Olden. In Proceedings of the 5th A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York, 29-38.]] Google Scholar
- CHANDRA, R., GUPTA, A., AND HENNESSY, J. 1993. Data locality and load balancing in COOL. In Proceedings of the gth A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York, 249-259.]] Google Scholar
- CHASE, D., WEGMAN, M., AND ZADEK, F. 1990. Analysis of pointers and structures. In Proceedings of the SIGPLAN '90 Conference on Program Language Design and Implementation. ACM Press, New York, 296-310.]] Google Scholar
- CLARKE, L. AND RICHARDSON, D. 1981. Symbolic evaluation methods for program analysis. In Program Flow Analysis: Theory and Applications, S. Muchnick and N. Jones, Eds. Prentice- Hall, Englewood Cliffs, N.J., 79-101.]]Google Scholar
- DARLINGTON, J. 1972. A semantic approach to automatic program improvement. Ph.D. thesis, Univ. of Edinburgh, Edinburgh, U.K.]]Google Scholar
- DILLON, L. 1987a. Verification of Ada tasking programs using symbolic execution, Part I: Partial Correctness. Tech. Rep. TRCS87-20, Dept. of Computer Science, Univ. of California, Santa Barbara, Calif., Oct.]]Google Scholar
- DILLON, L. 1987b. Verification of Ada tasking programs using symbolic execution, Part II: General Safety Properties. Tech. Rep. TRCS87-21, Dept. of Computer Science, Univ. of California, Santa Barbara, Calif., Oct.]]Google Scholar
- DINIZ, P. AND RINARD, M. 1996. Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs. In Proceedings of the 9th Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin, 285-299.]] Google Scholar
- DOUGLAS, J. AND KEMMERER, R. 1994. Aslantest: A symbolic execution tool for testing Aslan formal specifications. In the 199~ International Symposium on Software Testing and Analysis, ACM Press, New York, 15-27.]] Google Scholar
- EIGENMANN, R., HOEFLINGER, J., LI, Z., AND PADUA, D. 1991. Experience in the automatic parallelization of four Perfect Benchmark programs. In Languages and Compilers for Parallel Computing, gth International Workshop, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, Eds. Springer-Verlag, Berlin.]] Google Scholar
- EMAMI, M., GHIYA, R., AND HENDREN, L. 1994. Context-sensitive interprocedural points-to analysis in the presence of function pointers. In Proceedings of the SIGPLAN '9~ Conference on Program Language Design and Implementation. ACM Press, New York, 242-256.]] Google Scholar
- FISHER, t. AND GHULOUM, t. 1994. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '9~ Conference on Program Language Design and Implementation. ACM Press, New York, 135-144.]] Google Scholar
- GHULOUM, t. AND FISHER, t. 1995. Flattening and parallelizing irregular, recurrent loop nests. In Proceedings of the 5th A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York, 58-67.]] Google Scholar
- GIFFORD, D., JOUVELOT, P., LUCASSEN, J., AND SHELDON, M. 1987. FX-87 Reference Manual. Tech. Rep. MIT/LCS/TR-407, MIT, Cambridge, Mass., Sept.]]Google Scholar
- GRAHAM, S., KESSLER, P., AND McKuSICK, M. 1982. gprof: A call graph execution profiler. In Proceedings of the SIGPLAN '82 Symposium on Compiler Construction. ACM Press, New York.]] Google Scholar
- HALL, M., AMARASINGHE, S., MURPHY, B., LIAO, S., AND LAM, M. 1995. Detecting coarse-grain parallelism using an interprocedural parallelizing compiler. In Proceedings of Supercomputing '95. IEEE Computer Society Press, Los Alamitos, Calif.]] Google Scholar
- HAMMEL, R. AND GIFFORD, D. 1988. FX-87 Performance Measurements: Dataflow Implementation. Tech. Rep. MIT/LCS/TR-421, MIT, Cambridge, Mass., Nov.]] Google Scholar
- HARRIS, J., LAZARATOS, S., AND MICHELENA, R. 1990. Tomographic string inversion. In the 60th Annual International Meeting, Society of Exploration and Geophysics, Extended Abstracts, 82-85.]]Google Scholar
- HENDREN, L., HUMMEL, J., AND NICOLAU, t. 1992. Abstractions for recursive pointer data structures: Improving the analysis and transformation of imperative programs. In Proceedings of the SIGPLAN '92 Conference on Program Language Design and Implementation. ACM Press, New York.]] Google Scholar
- HENDREN, L., HUMMEL, J., AND NICOLAU, t. 1994. Hendren A general data dependence test for dynamic, pointer-based data structures. In Proceedings of the SIGPLAN '9~ Conference on Program Language Design and Implementation. ACM Press, New York, 218-229.]] Google Scholar
- HIRANANDANI, S., KENNEDY, K., AND TSENG, C. 1992. Compiling Fortran D for MIMD distributed-memory machines. Commun. ACM 35, 8 (Aug.), 66-80.]] Google Scholar
- IBARRA, O., DINIZ, P., AND RINARD, M. 1996. On the complexity of commutativity analysis. In Lecture Notes in Computer Science, Vol. 1090. Springer-Verlag, Berlin, 323-332.]] Google Scholar
- KEMMERER, R. AND ECKMANN, S. 1985. UNISEX: A UNix-based Symbolic EXecutor for Pascal. Softw. Pract. Exper. 15, 5 (May), 439-458.]]Google Scholar
- KING, J. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (July), 385-394.]] Google Scholar
- KING, J. 1981. Program reduction using symbolic execution. ACM SIGPLAN Not. 6, 1 (Jan.), 9-14.]] Google Scholar
- KNUTH, D. 1971. An empirical study of FORTRAN programs. Softw. Pract. Exper. 1, 105-133.]]Google Scholar
- LAM, M. AND RINARD, M. 1991. Coarse-grain parallel programming in Jade. In Proceedings of the 3rd A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York, 94-105.]] Google Scholar
- LAMPSON, B. W. AND REDELL, D. D. 1980. Experience with processes and monitors in Mesa. Commun. ACM 23, 2 (Feb.), 105-117.]] Google Scholar
- LANDI, W., RYDER, B., AND ZHANG, S. 1993. Interprocedural modification side effect analysis with pointer aliasing. In Proceedings of the SIGPLAN '93 Conference on Program Language Design and Implementation. ACM Press, New York.]] Google Scholar
- LARUS, J. AND HILFINGER, P. 1988. Detecting conflicts between structure accesses. In Proceedings of the SIGPLAN '88 Conference on Program Language Design and Implementation. ACM Press, New York, 21-34.]] Google Scholar
- LENGAUER, C. AND HEHNER, E. 1982. A methodology for programming with concurrency: An informal presentation. Sci. Comput. Program. 2, 1 (Oct.), 1-18.]]Google Scholar
- LENOSKI, D. 1992. The design and analysis of DASH: A scalable directory-based multiprocessor. Ph.D. thesis, Stanford Univ., Calif.]] Google Scholar
- LUSK, E., OVERBEEK, R., BOYLE, J., BUTLER, R., DISZ, T., GLICKFIELD, B., PATTERSON, J., AND STEVENS, R. 1987. Portable Programs for Parallel Processors. Holt, Rinehart and Winston, Inc.]] Google Scholar
- MOHR, E., KRANZ, D., AND HALSTEAD, R. 1990. Lazy task creation: A technique for increasing the granularity of parallel programs. In Proceedings of the 1990 ACM Conference on Lisp and Functional Programming. ACM Press, New York, 185-197.]] Google Scholar
- PINTER, S. AND PINTER, R. 1991. Program optimization and parallelization using idioms. In Proceedings of the 18th Annual A CM Symposium on the Principles of Programming Languages. ACM Press, New York, 79-92.]] Google Scholar
- P LEVYAK, J., KARAMCHETI, V., AND CHIEN, A. 1993. Analysis of dynamic structures for efficient parallel execution. In Proceedings of the 6th Workshop on Languages and Compilers for Parallel Computing. Springer-Verlag, Berlin.]] Google Scholar
- POLYCHRONOPOULOS, C. AND KUCK, D. 1987. Guided self-scheduling: A practical scheduling scheme for parallel computers. IEEE Trans. Comput. 36, 12 (Dec.), 1425-1439.]] Google Scholar
- PUGH, W. AND WONNACOTT, D. 1992. Eliminating false data dependences using the Omega test. In Proceedings of the SIGPLAN '92 Conference on Program Language Design and Implemenration. ACM Press, New York.]] Google Scholar
- RINARD, M. 1994a. The design, implementation and evaluation of Jade, a portable, implicitly parallel programming language. Ph.D. thesis, Stanford Univ., Calif.]] Google Scholar
- RINARD, M. 1994b. Implicitly synchronized abstract data types: Data structures for modular parallel programming. In Proceedings of the 2nd International Workshop on Massive Parallelism: Hardware, Software and Applications, M. Furnari, Ed. World Scientific Publishing, Singapore, 259-274.]]Google Scholar
- RINARD, M. 1995. Communication optimizations for parallel computing using data access information. In Proceedings of Supercomputing '95. IEEE Computer Society Press, Los Alamitos, Calif.]] Google Scholar
- RINARD, M. AND DINIZ, P. 1996. Commutativity analysis: A technique for automatically parallelizing pointer-based computations. In Proceedings of the 10th International Parallel Processing Symposium. IEEE Computer Society Press, Los Alamitos, Calif., 14-22.]] Google Scholar
- SALMON, J. K. 1990. Parallel hierarchical N-body methods. Ph.D. thesis, California Institute of Technology, Pasadena, Calif.]] Google Scholar
- SCALES, D. AND LAM, M. S. 1994. The design and evaluation of a shared object system for distributed memory machines. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation. USENIX Assoc., Berkeley, Calif.]] Google Scholar
- SINGH, J. 1993. Parallel hierarchical N-body methods and their implications for multiprocessors. Ph.D. thesis, Stanford Univ., Calif.]] Google Scholar
- SINGH, J., WEBER, W., AND GUPTA, A. 1992. SPLASH: Stanford parallel applications for shared memory. Comput. Arch. News 20, 1 (March), 5-44.]] Google Scholar
- STEELE, G. 1990. Making asynchronous parallelism safe for the world. In Proceedings of the 17th Annual A CM Symposium on the Principles of Programming Languages. ACM Press, New York, 218-231.]] Google Scholar
- TSENG, C. 1995. Compiler optimizations for eliminating barrier synchronization. In Proceedings of the 5th A CM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, New York, 144-155.]] Google Scholar
- RSCHLER, G. 1974. Complete redundant expression elimination in flow diagrams. Research Rep. RC 4965, IBM, Yorktown Heights, N.Y., Aug.]]Google Scholar
- WEIHL, W. 1988. Commutativity-based concurrency control for abstract data types. IEEE Trans. Comput. 37, 12 (Dec.), 1488-1505.]] Google Scholar
- WILSON, R. AND LAM, M. 1995. Efficient context-sensitive pointer analysis for C programs. In Proceedings of the SIGPLAN '95 Conference on Program Language Design and Implementation. ACM Press, New York, 1-12.]] Google Scholar
- Woo, S., OHARA, M., TORRIE, E., SINGH, J., AND GUPTA, A. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22th International Symposium on Computer Architecture. ACM Press, New York.]] Google Scholar
- YONEZAWA, A., BRIOT, J.-P., AND SHIBAYAMA, E. 1986. Object oriented concurrent programming in ABCL/1. In Proceedings of the OOPSLA-86 Conference. ACM Press, New York, 258-268.]] Google Scholar
Index Terms
- Commutativity analysis: a new analysis technique for parallelizing compilers
Recommendations
Commutativity analysis: a new analysis framework for parallelizing compilers
This paper presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. ...
Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations
IPPS '96: Proceedings of the 10th International Parallel Processing SymposiumThis paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views computations as composed of operations on objects. It ...
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96: Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementationThis paper presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. ...
Comments