Abstract
The recognition of program constructs that are frequently used by software developers is a powerful mechanism that enables optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for the automatic recognition of computational kernels such as inductions, reductions and array recurrences was an intensive research area in compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented. The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and it provides an advanced symbolic analysis framework on which other compiler techniques can run on demand.
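The two-level scheme described above can be illustrated with a small sketch (not XARK's actual implementation): Tarjan's algorithm partitions a statement-level dependence graph into SCCs, and each SCC is then classified by a crude intra-SCC pattern test. The statement encoding (`reads`/`writes` maps) and the classification rules below are hypothetical simplifications chosen only to make the idea concrete.

```python
# Sketch of SCC-based kernel recognition, assuming a toy loop body:
#   S1: i = i + 1        (induction)
#   S2: s = s + a(i)     (scalar reduction)
#   S3: b(i) = s * 2     (plain assignment, no loop-carried dependence)

def tarjan_scc(graph):
    """Tarjan's algorithm: SCCs of a directed graph given as {node: successors}."""
    index_of, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index_of[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index_of:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index_of:
            strongconnect(v)
    return sccs

# Hypothetical encoding: which variables each statement reads and writes.
reads = {"S1": {"i"}, "S2": {"s", "a", "i"}, "S3": {"s", "i"}}
writes = {"S1": "i", "S2": "s", "S3": "b"}

# Data-dependence edge Sx -> Sy whenever Sy reads the variable Sx writes.
graph = {x: [y for y in reads if writes[x] in reads[y]] for x in writes}

def classify(scc):
    """Crude intra-SCC test standing in for XARK's real kernel recognition."""
    cyclic = len(scc) > 1 or scc[0] in graph.get(scc[0], ())
    if not cyclic:
        return "trivial"  # no loop-carried dependence
    lhs = writes[scc[0]]
    # Induction if the statement only reads the variable it updates;
    # reduction if extra operands (e.g. an array element) are accumulated.
    return "induction" if reads[scc[0]] == {lhs} else "reduction"

sccs = tarjan_scc(graph)
labels = {scc[0]: classify(scc) for scc in sccs}
print(labels)  # S1 induction, S2 reduction, S3 trivial
```

A second, inter-SCC pass (omitted here) would then inspect the edges between SCCs, e.g. the dependence of S2 on the induction S1 through `a(i)`, to recognize composite kernels such as irregular reductions.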
Finally, extensive experiments demonstrate the effectiveness of XARK on a collection of benchmarks from different application domains. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
Index Terms
- XARK: An extensible framework for automatic recognition of computational kernels