Top

International Journal of Parallel Programming

Published in:

06-08-2021

Enhancing the Effectiveness of Inlining in Automatic Parallelization

Authors: Jichi Guo, Qing Yi, Kleanthis Psarris

Published in: International Journal of Parallel Programming | Issue 1/2022

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

The emergence of multi-core architectures makes it essential for optimizing compilers to automatically extract parallelism for large scientific applications composed of many subroutines residing in different files. Inlining is a well-known technique which can be used to erase procedural boundaries and enable more aggressive loop parallelization. However, conventional inlining cannot be applied to external libraries where the source code is not available, and when overly applied, it can degrade the effectiveness of compiler optimizations due to excessive code complexity. This paper highlights some obstacles we encountered while applying conventional inlining combined with automatic loop parallelization using the Polaris optimizing compiler and presents a new approach, annotation-based inlining, to effectively overcome these obstacles. Our experimental results show that the annotation-based inlining approach can eliminate negative impact of conventional inlining while enhancing the effectiveness of interprocedural parallelization for a majority of applications from the PERFECT benchmark suite.

previous article A Comparative Survey of Big Data Computing and HPC: From a Parallel Programming Model to a Cluster Architecture

next article Statistical Analysis Based Intrusion Detection System for Ultra-High-Speed Software Defined Network

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architec- tures. Morgan Kaufmann, San Francisco (2001)

Arnold, M., Fink, S., Sarkar, V., Sweeney, P.F.: A comparative study of static and profile-based heuristics for inlining. SIGPLAN Not. 35(7), 52–64 (2000)CrossRef

Ashley, J. M.: The effectiveness of flow analysis for inlining. In: In Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, pp 99–111 (1997).

Ayers, A., Schooler, R., Gottlieb, R.: Aggressive inlining. SIGPLAN Not. 32(5), 134–145 (1997)CrossRef

Balasundaram, V., Kennedy, K.: A technique for summarizing data access and its use in parallelism enhancing transformations. SIGPLAN Not. 24(7), 41–53 (1989)CrossRef

Blume, W., Eigenmann, R.: Performance analysis of parallelizing compilers on the perfect benchmarks programs. IEEE Trans. Parallel Distrib. Syst. 3, 643–656 (1992)CrossRef

Blume, W., Eigenmann, R., Faigin, K., Grout, J., Hoeflinger, J., Padua, D., Petersen, P., Pottenger, W., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: Improving the effectiveness of parallelizing compilers. In: In Languages and Compilers for Parallel Computing, pp. 141–154. Springer-Verlag (1994).

Bondhugula, U., Hartono, A., Ramanujam, J.: A practical automatic polyhedral parallelizer and locality optimizer. In: In PLDI 08: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (2008).

Calder, B., Grunwald, D.: Reducing indirect function call overhead in c++ programs. In: POPL ’94: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 397–408, ACM, New York, NY, USA, (1994).

10.

Cavazos, J., OBoyle, M. F. P.: Automatic tuning of inlining heuristics. In: In ACM/IEEE Conference on Supercomputing, p. 14 (2005).

11.

Chaki, S., Clarke, E., Groce, A.: Modular verification of software components in c. Trans. Softw. Eng. 1(8), 388–402 (2004).

12.

Chang, Y.-S., Lee, H.-J., Park, D.-S., Lee, I.-Y.: Interprocedural transformations for extracting maximum parallelism. In: ADVIS ’02: Proceedings of the Second International Conference on Advances in Information Systems, pp. 415–424, Springer-Verlag, London, UK, (2002).

13.

Cooper, K.D., Hall, M.W., Torczon, L.: Unexpected side effects of inline substitution: a case study. ACM Lett. Program. Lang. Syst. 1(1), 22–32 (1992)CrossRef

14.

Dean, J., Chambers, C.: Towards better inlining decisions using inlining trials. In: In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pp. 273–282 (1994).

15.

Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B.,Stata, R.: Extended static checking for java. In: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pp. 234–245, ACM, New York, NY, USA (2002).

16.

Grant, B., Mock, M., Philipose, M., Chambers, C., Eggers, S.J.: Dyc: an expressive annotation-directed dynamic compiler for c. Theor. Comput. Sci. 248(1–2), 147–199 (2000)CrossRef

17.

Grout, J. R.: Inline expansion for the polaris research compiler. Technical Report, (1995).

18.

Guyer, S.Z., Lin, C.: Broadway: A compiler for exploiting the domain-specific semantics of software libraries. Proc. IEEE Special Issue Prog. Gener. Optim. Adapt. 93(2), 342–357 (2005)

19.

Hall, M., Kennedy, K., Kinley, K. S. M.: Interprocedural transformations for parallel code generation. In: In Proceedings of Supercomputing ’91, pp. 424–434 (1991).

20.

Hall, M. W., Mellor-Crummey, J. M., Carle, A., Rodrguez, R. G.: Fiat: A framework for interprocedural analysis and transformation. In: In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pp. 522–545. Springer-Verlag (1995).

21.

Hank, R. E., Mei, W., Hwu, W., Rau, B. R.: Region-based compilation: An introduction and motivation. In: In Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 158–168 (1995).

22.

Hoeflinger, J.P., Paek, Y., Yi, K.: Unified interprocedural parallelism detection. Int. J. Parallel Program. 29(2), 185–215 (2001)CrossRef

23.

Jagannathan, S., Wright, A.: Flow-directed inlining. In: In Proceedings of the ACM Conference on Programming Language Design and Implementation, pp. 193–205 (1996).

24.

Liu, Y., Zhaoqing, Y. L., Qiao, R., Ching Ju, R. D.: A region- based compilation infrastructure. In: Proceefings of the 7th Workshop on Interaction between Compilers and Computer Architectures, pp. 75–84. IEEE Computer Society Press (2003).

25.

Psarris, K., Kyriakopoulos, K.: The impact of data dependence analysis on compilation and program parallelization. In: ICS ’03: Proceedings of the 17th Annual International Conference on Supercomputing, pp. 205–214, ACM, New York, NY, USA (2003).

26.

Ramaswamy, S., Sapatnekar, S., Banerjee, P.: A framework for exploiting data and functional parallelism on distributed memory multicomputers. IEEE Trans. Parallel Distrib. Syst. 8(11), 1098–1116 (1997). https://doi.org/10.1109/71.642945

27.

Saavedra, R.H., Smith, A.J.: Analysis of benchmark characteristics and benchmark performance prediction. ACM Trans. Comput. Syst. 14, 344–384 (1996)CrossRef

28.

Seidl, H., Steffen, B.: Constraint-based inter-procedural analysis of parallel programs. Nordic J. of Computing 7(4), 375–400 (2000)MathSciNetMATH

29.

Shenoy, U.N., Srikant, Y.N., Bhatkar, V.P.: An automatic parallelization framework for multicomputers. Comput. Lang. 20(3), 135–150 (1994)CrossRef

30.

Shirako, J., Nagasawa, K., Ishizaka, K., Obata, M., Kasahara, H.: Selective inline expansion for improvement of multi grain parallelism. In: Parallel and Distributed Computing and Networks, pp. 476–482 (2004).

31.

Triantafyllis, S., Bridges, M. J., Raman, E., Ottoni, G., August, D. I.: A framework for unrestricted whole-program optimization. In: In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, pp. 61–71 (2006).

32.

Waddell, O., Dybig, R. K.: Fast and effective procedure inlining. In: SAS ’97: Proceedings of the 4th International Symposium on Static Analysis, pp. 35–52, Springer-Verlag, London, UK (1997).

33.

Wolfe, M.J.: High Performance Compilers for Parallel Computing. Addison-Wesley Longman Publishing Co. Inc, Boston, MA, USA (1995)MATH

34.

Wu, P., Midkiff, S. P., Moreira, J. E., Gupta, M.: Improving Java performance through semantic inlining. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing (1999).

35.

Wu, P., Moreira, J. E., Midkiff, S. P., Gupta, M., Padua, D. A.: Semantic inlining - the compiler support for java in technical computing. In: PPSC (1999).

36.

Zhao, P., Amaral, J. N.: To inline or not to inline? enhanced inlining decisions. In: In Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 405–419 (2003).

37.

Zhao, P., Amaral, J.N.: Ablego: a function outlining and partial inlining framework. Softw. Pract. Exper. 37(5), 465–491 (2007)CrossRef

38.

Aleksandar, P., et al.: An optimization-driven incremental inline substitution algorithm for just-in-time compilers.In: 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE (2019).

39.

Häubl, C., Wimmer, C., Mössenböck, H.: Context-sensitive Trace Inlining for Java. Comput. Lang. Syst. Struct. 39(4), 123–141 (2013). https://doi.org/10.1016/j.cl.2013.04.002CrossRef

40.

Cammarota, R., Nicolau, A., Veidenbaum, A.V., Kejariwal, A., Donato, D., Madhugiri, M.: On the determination of inlining vectors for program optimization. In: Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg, pp. 164–183. https://doi.org/10.1007/978-3-642-37051-9_9

41.

Simon, D., Cavazos, J., Wimmer, C., Kulkarni, S.: Automatic construction of inlining heuristics using machine learning. In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’13). IEEE Computer Society, Washington, DC, pp. 1–12. https://doi.org/10.1109/CGO.2013.6495004

42.

Stucki, N., Biboudis, A., Doeraene, S., Odersky, M.: Semantics-preserving inlining for metaprogramming. In: Proceedings of the 11th ACM SIGPLAN International Symposium on Scala (SCALA 2020). ACM, New York, NY, USA, pp. 14–24. https://doi.org/10.1145/3426426.3428486

43.

Norris, H. B., Sadayappan, P.: Annotation-based empirical performance tuning using Orio, In: 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, pp. 1-11 (2009). https://doi.org/10.1109/IPDPS.2009.5161004

44.

Papadopoulos, I., Thomas, N., Fidel, A., Amato, N. M., Rauchwerger, L.: STAPL-RTS: An application driven runtime system. In: Proceedings of the 29th ACM on International Conference on Supercomputing (ICS '15). ACM, New York, NY, USA, pp. 425–434. https://doi.org/10.1145/2751205.2751233

45.

Yi, Q., Wang, Q., Cui, H.: Specializing compiler optimizations through programmable composition for dense matrix computations. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, USA, pp. 596–608. https://doi.org/10.1109/MICRO.2014.14

46.

Nguyen, T., Cicotti, P., Bylaska, E., Quinlan, D., Baden, S.B.: Bamboo: translating MPI applications to a latency-tolerant, data-driven form. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Washington, DC, USA, Article 39, pp. 1–11.

47.

Ali, K., Lhoták, O.: Application-only call graph construction. In: Proceedings of the European Conference on Object-Oriented Programming (ECOOP 2012). Lecture Notes in Computer Science, Vol 7313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31057-7_30

Title: Enhancing the Effectiveness of Inlining in Automatic Parallelization
Authors: Jichi Guo
Qing Yi
Kleanthis Psarris
Publication date: 06-08-2021
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 1/2022
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-021-00722-1

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 1/2022

Lightweight Artificial Intelligence Technology for Health Diagnosis of Agriculture Vehicles: Parallel Evolving Artificial Neural Networks by Genetic Algorithm

Accelerating Computation of Steiner Trees on GPUs

Statistical Analysis Based Intrusion Detection System for Ultra-High-Speed Software Defined Network

A Comparative Survey of Big Data Computing and HPC: From a Parallel Programming Model to a Cluster Architecture

A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures

Premium Partner