Research article
Open Access

XARK: An extensible framework for automatic recognition of computational kernels

Published: 30 October 2008

Abstract

The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions, and array recurrences was an intensive research area in compiler technology during the 1990s. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. In addition, the internals of the recognition algorithms are presented. The design of the algorithms makes it possible to extend the recognition capabilities of XARK to cope with new kernels, and it provides an advanced symbolic analysis framework on which other compiler techniques can run on demand. Finally, extensive experiments showing the effectiveness of XARK for a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection, and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.
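
As a concrete illustration (this example is not taken from the article; the subroutine and variable names are hypothetical), the following Fortran loop contains three kernels of the kind XARK targets: a linear induction, a scalar reduction, and an irregular array update through a subscript array. In the two-level scheme described above, each kernel would correspond to an SCC of the GSA data-dependence graph classified by the intra-SCC analysis, while the inter-SCC analysis would relate them, for instance noting that the update of a combines the induction k with the subscript array ind.

      subroutine kernels(n, ind, x, a, s)
        integer :: n, i, k
        integer :: ind(n)
        real :: x(n), a(*), s
        s = 0.0                       ! reduction variable (running sum)
        k = 0                         ! induction variable
        do i = 1, n
          k = k + 2                   ! linear induction: k advances by a loop-invariant step
          s = s + x(i)                ! scalar reduction: accumulates x(1..n)
          a(ind(i)) = a(ind(i)) + k   ! irregular reduction through the subscript array ind
        end do
      end subroutine kernels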


    • Published in

      ACM Transactions on Programming Languages and Systems, Volume 30, Issue 6
      October 2008
      245 pages
      ISSN: 0164-0925
      EISSN: 1558-4593
      DOI: 10.1145/1391956

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 30 October 2008
      • Accepted: 1 January 2008
      • Revised: 1 July 2007
      • Received: 1 December 2006
      Published in TOPLAS, Volume 30, Issue 6


      Qualifiers

      • research-article
      • Research
      • Refereed
