Abstract
The recognition of program constructs that are frequently used by software developers is a powerful mechanism that enables optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for the automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area in compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented. The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and it provides an advanced symbolic analysis framework on which other compiler techniques can run on demand.
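The two-level scheme described above can be illustrated with a small sketch (not XARK's actual implementation): Tarjan's algorithm partitions a statement-level dependence graph into SCCs, and each SCC is then classified by a crude intra-SCC pattern test. The statement encoding (`reads`/`writes` maps) and the classification rules below are hypothetical simplifications chosen only to make the idea concrete.

```python
# Sketch of SCC-based kernel recognition, assuming a toy loop body:
#   S1: i = i + 1        (induction)
#   S2: s = s + a(i)     (scalar reduction)
#   S3: b(i) = s * 2     (plain assignment, no loop-carried dependence)

def tarjan_scc(graph):
    """Tarjan's algorithm: SCCs of a directed graph given as {node: successors}."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            strongconnect(v)
    return sccs

# Hypothetical encoding: which variables each statement reads and writes.
reads = {"S1": {"i"}, "S2": {"s", "a", "i"}, "S3": {"s", "i"}}
writes = {"S1": "i", "S2": "s", "S3": "b"}

# Data-dependence edge Sx -> Sy whenever Sy reads the variable Sx writes.
graph = {x: [y for y in reads if writes[x] in reads[y]] for x in writes}

def classify(scc):
    """Crude intra-SCC test standing in for XARK's real kernel recognition."""
    cyclic = len(scc) > 1 or scc[0] in graph.get(scc[0], ())
    if not cyclic:
        return "trivial"  # no loop-carried dependence
    lhs = writes[scc[0]]
    # Induction if the statement only reads the variable it updates;
    # reduction if extra operands (e.g. an array element) are accumulated.
    return "induction" if reads[scc[0]] == {lhs} else "reduction"

sccs = tarjan_scc(graph)
labels = {scc[0]: classify(scc) for scc in sccs}
print(labels)  # S1 induction, S2 reduction, S3 trivial
```

A second, inter-SCC pass (omitted here) would then inspect the edges between SCCs, e.g. the dependence of S2 on the induction S1 through `a(i)`, to recognize composite kernels such as irregular reductions.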
Finally, extensive experiments demonstrate the effectiveness of XARK on a collection of benchmarks from different application domains. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
Index Terms
- XARK: An extensible framework for automatic recognition of computational kernels