Skip to main content
Top
Published in: International Journal of Parallel Programming 1/2022

06-08-2021

Enhancing the Effectiveness of Inlining in Automatic Parallelization

Authors: Jichi Guo, Qing Yi, Kleanthis Psarris

Published in: International Journal of Parallel Programming | Issue 1/2022

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The emergence of multi-core architectures makes it essential for optimizing compilers to automatically extract parallelism for large scientific applications composed of many subroutines residing in different files. Inlining is a well-known technique which can be used to erase procedural boundaries and enable more aggressive loop parallelization. However, conventional inlining cannot be applied to external libraries where the source code is not available, and when overly applied, it can degrade the effectiveness of compiler optimizations due to excessive code complexity. This paper highlights some obstacles we encountered while applying conventional inlining combined with automatic loop parallelization using the Polaris optimizing compiler and presents a new approach, annotation-based inlining, to effectively overcome these obstacles. Our experimental results show that the annotation-based inlining approach can eliminate negative impact of conventional inlining while enhancing the effectiveness of interprocedural parallelization for a majority of applications from the PERFECT benchmark suite.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architec- tures. Morgan Kaufmann, San Francisco (2001) Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architec- tures. Morgan Kaufmann, San Francisco (2001)
2.
go back to reference Arnold, M., Fink, S., Sarkar, V., Sweeney, P.F.: A comparative study of static and profile-based heuristics for inlining. SIGPLAN Not. 35(7), 52–64 (2000)CrossRef Arnold, M., Fink, S., Sarkar, V., Sweeney, P.F.: A comparative study of static and profile-based heuristics for inlining. SIGPLAN Not. 35(7), 52–64 (2000)CrossRef
3.
go back to reference Ashley, J. M.: The effectiveness of flow analysis for inlining. In: In Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, pp 99–111 (1997). Ashley, J. M.: The effectiveness of flow analysis for inlining. In: In Proceedings of the 1997 ACM SIGPLAN International Conference on Functional Programming, pp 99–111 (1997).
4.
go back to reference Ayers, A., Schooler, R., Gottlieb, R.: Aggressive inlining. SIGPLAN Not. 32(5), 134–145 (1997)CrossRef Ayers, A., Schooler, R., Gottlieb, R.: Aggressive inlining. SIGPLAN Not. 32(5), 134–145 (1997)CrossRef
5.
go back to reference Balasundaram, V., Kennedy, K.: A technique for summarizing data access and its use in parallelism enhancing transformations. SIGPLAN Not. 24(7), 41–53 (1989)CrossRef Balasundaram, V., Kennedy, K.: A technique for summarizing data access and its use in parallelism enhancing transformations. SIGPLAN Not. 24(7), 41–53 (1989)CrossRef
6.
go back to reference Blume, W., Eigenmann, R.: Performance analysis of parallelizing compilers on the perfect benchmarks programs. IEEE Trans. Parallel Distrib. Syst. 3, 643–656 (1992)CrossRef Blume, W., Eigenmann, R.: Performance analysis of parallelizing compilers on the perfect benchmarks programs. IEEE Trans. Parallel Distrib. Syst. 3, 643–656 (1992)CrossRef
7.
go back to reference Blume, W., Eigenmann, R., Faigin, K., Grout, J., Hoeflinger, J., Padua, D., Petersen, P., Pottenger, W., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: Improving the effectiveness of parallelizing compilers. In: In Languages and Compilers for Parallel Computing, pp. 141–154. Springer-Verlag (1994). Blume, W., Eigenmann, R., Faigin, K., Grout, J., Hoeflinger, J., Padua, D., Petersen, P., Pottenger, W., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: Improving the effectiveness of parallelizing compilers. In: In Languages and Compilers for Parallel Computing, pp. 141–154. Springer-Verlag (1994).
8.
go back to reference Bondhugula, U., Hartono, A., Ramanujam, J.: A practical automatic polyhedral parallelizer and locality optimizer. In: In PLDI 08: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (2008). Bondhugula, U., Hartono, A., Ramanujam, J.: A practical automatic polyhedral parallelizer and locality optimizer. In: In PLDI 08: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (2008).
9.
go back to reference Calder, B., Grunwald, D.: Reducing indirect function call overhead in c++ programs. In: POPL ’94: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 397–408, ACM, New York, NY, USA, (1994). Calder, B., Grunwald, D.: Reducing indirect function call overhead in c++ programs. In: POPL ’94: Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 397–408, ACM, New York, NY, USA, (1994).
10.
go back to reference Cavazos, J., OBoyle, M. F. P.: Automatic tuning of inlining heuristics. In: In ACM/IEEE Conference on Supercomputing, p. 14 (2005). Cavazos, J., OBoyle, M. F. P.: Automatic tuning of inlining heuristics. In: In ACM/IEEE Conference on Supercomputing, p. 14 (2005).
11.
go back to reference Chaki, S., Clarke, E., Groce, A.: Modular verification of software components in c. Trans. Softw. Eng. 1(8), 388–402 (2004). Chaki, S., Clarke, E., Groce, A.: Modular verification of software components in c. Trans. Softw. Eng. 1(8), 388–402 (2004).
12.
go back to reference Chang, Y.-S., Lee, H.-J., Park, D.-S., Lee, I.-Y.: Interprocedural transformations for extracting maximum parallelism. In: ADVIS ’02: Proceedings of the Second International Conference on Advances in Information Systems, pp. 415–424, Springer-Verlag, London, UK, (2002). Chang, Y.-S., Lee, H.-J., Park, D.-S., Lee, I.-Y.: Interprocedural transformations for extracting maximum parallelism. In: ADVIS ’02: Proceedings of the Second International Conference on Advances in Information Systems, pp. 415–424, Springer-Verlag, London, UK, (2002).
13.
go back to reference Cooper, K.D., Hall, M.W., Torczon, L.: Unexpected side effects of inline substitution: a case study. ACM Lett. Program. Lang. Syst. 1(1), 22–32 (1992)CrossRef Cooper, K.D., Hall, M.W., Torczon, L.: Unexpected side effects of inline substitution: a case study. ACM Lett. Program. Lang. Syst. 1(1), 22–32 (1992)CrossRef
14.
go back to reference Dean, J., Chambers, C.: Towards better inlining decisions using inlining trials. In: In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pp. 273–282 (1994). Dean, J., Chambers, C.: Towards better inlining decisions using inlining trials. In: In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pp. 273–282 (1994).
15.
go back to reference Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B.,Stata, R.: Extended static checking for java. In: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pp. 234–245, ACM, New York, NY, USA (2002). Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B.,Stata, R.: Extended static checking for java. In: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, PLDI ’02, pp. 234–245, ACM, New York, NY, USA (2002).
16.
go back to reference Grant, B., Mock, M., Philipose, M., Chambers, C., Eggers, S.J.: Dyc: an expressive annotation-directed dynamic compiler for c. Theor. Comput. Sci. 248(1–2), 147–199 (2000)CrossRef Grant, B., Mock, M., Philipose, M., Chambers, C., Eggers, S.J.: Dyc: an expressive annotation-directed dynamic compiler for c. Theor. Comput. Sci. 248(1–2), 147–199 (2000)CrossRef
17.
go back to reference Grout, J. R.: Inline expansion for the polaris research compiler. Technical Report, (1995). Grout, J. R.: Inline expansion for the polaris research compiler. Technical Report, (1995).
18.
go back to reference Guyer, S.Z., Lin, C.: Broadway: A compiler for exploiting the domain-specific semantics of software libraries. Proc. IEEE Special Issue Prog. Gener. Optim. Adapt. 93(2), 342–357 (2005) Guyer, S.Z., Lin, C.: Broadway: A compiler for exploiting the domain-specific semantics of software libraries. Proc. IEEE Special Issue Prog. Gener. Optim. Adapt. 93(2), 342–357 (2005)
19.
go back to reference Hall, M., Kennedy, K., Kinley, K. S. M.: Interprocedural transformations for parallel code generation. In: In Proceedings of Supercomputing ’91, pp. 424–434 (1991). Hall, M., Kennedy, K., Kinley, K. S. M.: Interprocedural transformations for parallel code generation. In: In Proceedings of Supercomputing ’91, pp. 424–434 (1991).
20.
go back to reference Hall, M. W., Mellor-Crummey, J. M., Carle, A., Rodrguez, R. G.: Fiat: A framework for interprocedural analysis and transformation. In: In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pp. 522–545. Springer-Verlag (1995). Hall, M. W., Mellor-Crummey, J. M., Carle, A., Rodrguez, R. G.: Fiat: A framework for interprocedural analysis and transformation. In: In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, pp. 522–545. Springer-Verlag (1995).
21.
go back to reference Hank, R. E., Mei, W., Hwu, W., Rau, B. R.: Region-based compilation: An introduction and motivation. In: In Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 158–168 (1995). Hank, R. E., Mei, W., Hwu, W., Rau, B. R.: Region-based compilation: An introduction and motivation. In: In Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 158–168 (1995).
22.
go back to reference Hoeflinger, J.P., Paek, Y., Yi, K.: Unified interprocedural parallelism detection. Int. J. Parallel Program. 29(2), 185–215 (2001)CrossRef Hoeflinger, J.P., Paek, Y., Yi, K.: Unified interprocedural parallelism detection. Int. J. Parallel Program. 29(2), 185–215 (2001)CrossRef
23.
go back to reference Jagannathan, S., Wright, A.: Flow-directed inlining. In: In Proceedings of the ACM Conference on Programming Language Design and Implementation, pp. 193–205 (1996). Jagannathan, S., Wright, A.: Flow-directed inlining. In: In Proceedings of the ACM Conference on Programming Language Design and Implementation, pp. 193–205 (1996).
24.
go back to reference Liu, Y., Zhaoqing, Y. L., Qiao, R., Ching Ju, R. D.: A region- based compilation infrastructure. In: Proceefings of the 7th Workshop on Interaction between Compilers and Computer Architectures, pp. 75–84. IEEE Computer Society Press (2003). Liu, Y., Zhaoqing, Y. L., Qiao, R., Ching Ju, R. D.: A region- based compilation infrastructure. In: Proceefings of the 7th Workshop on Interaction between Compilers and Computer Architectures, pp. 75–84. IEEE Computer Society Press (2003).
25.
go back to reference Psarris, K., Kyriakopoulos, K.: The impact of data dependence analysis on compilation and program parallelization. In: ICS ’03: Proceedings of the 17th Annual International Conference on Supercomputing, pp. 205–214, ACM, New York, NY, USA (2003). Psarris, K., Kyriakopoulos, K.: The impact of data dependence analysis on compilation and program parallelization. In: ICS ’03: Proceedings of the 17th Annual International Conference on Supercomputing, pp. 205–214, ACM, New York, NY, USA (2003).
26.
27.
go back to reference Saavedra, R.H., Smith, A.J.: Analysis of benchmark characteristics and benchmark performance prediction. ACM Trans. Comput. Syst. 14, 344–384 (1996)CrossRef Saavedra, R.H., Smith, A.J.: Analysis of benchmark characteristics and benchmark performance prediction. ACM Trans. Comput. Syst. 14, 344–384 (1996)CrossRef
28.
go back to reference Seidl, H., Steffen, B.: Constraint-based inter-procedural analysis of parallel programs. Nordic J. of Computing 7(4), 375–400 (2000)MathSciNetMATH Seidl, H., Steffen, B.: Constraint-based inter-procedural analysis of parallel programs. Nordic J. of Computing 7(4), 375–400 (2000)MathSciNetMATH
29.
go back to reference Shenoy, U.N., Srikant, Y.N., Bhatkar, V.P.: An automatic parallelization framework for multicomputers. Comput. Lang. 20(3), 135–150 (1994)CrossRef Shenoy, U.N., Srikant, Y.N., Bhatkar, V.P.: An automatic parallelization framework for multicomputers. Comput. Lang. 20(3), 135–150 (1994)CrossRef
30.
go back to reference Shirako, J., Nagasawa, K., Ishizaka, K., Obata, M., Kasahara, H.: Selective inline expansion for improvement of multi grain parallelism. In: Parallel and Distributed Computing and Networks, pp. 476–482 (2004). Shirako, J., Nagasawa, K., Ishizaka, K., Obata, M., Kasahara, H.: Selective inline expansion for improvement of multi grain parallelism. In: Parallel and Distributed Computing and Networks, pp. 476–482 (2004).
31.
go back to reference Triantafyllis, S., Bridges, M. J., Raman, E., Ottoni, G., August, D. I.: A framework for unrestricted whole-program optimization. In: In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, pp. 61–71 (2006). Triantafyllis, S., Bridges, M. J., Raman, E., Ottoni, G., August, D. I.: A framework for unrestricted whole-program optimization. In: In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, pp. 61–71 (2006).
32.
go back to reference Waddell, O., Dybig, R. K.: Fast and effective procedure inlining. In: SAS ’97: Proceedings of the 4th International Symposium on Static Analysis, pp. 35–52, Springer-Verlag, London, UK (1997). Waddell, O., Dybig, R. K.: Fast and effective procedure inlining. In: SAS ’97: Proceedings of the 4th International Symposium on Static Analysis, pp. 35–52, Springer-Verlag, London, UK (1997).
33.
go back to reference Wolfe, M.J.: High Performance Compilers for Parallel Computing. Addison-Wesley Longman Publishing Co. Inc, Boston, MA, USA (1995)MATH Wolfe, M.J.: High Performance Compilers for Parallel Computing. Addison-Wesley Longman Publishing Co. Inc, Boston, MA, USA (1995)MATH
34.
go back to reference Wu, P., Midkiff, S. P., Moreira, J. E., Gupta, M.: Improving Java performance through semantic inlining. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing (1999). Wu, P., Midkiff, S. P., Moreira, J. E., Gupta, M.: Improving Java performance through semantic inlining. In: Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing (1999).
35.
go back to reference Wu, P., Moreira, J. E., Midkiff, S. P., Gupta, M., Padua, D. A.: Semantic inlining - the compiler support for java in technical computing. In: PPSC (1999). Wu, P., Moreira, J. E., Midkiff, S. P., Gupta, M., Padua, D. A.: Semantic inlining - the compiler support for java in technical computing. In: PPSC (1999).
36.
go back to reference Zhao, P., Amaral, J. N.: To inline or not to inline? enhanced inlining decisions. In: In Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 405–419 (2003). Zhao, P., Amaral, J. N.: To inline or not to inline? enhanced inlining decisions. In: In Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 405–419 (2003).
37.
go back to reference Zhao, P., Amaral, J.N.: Ablego: a function outlining and partial inlining framework. Softw. Pract. Exper. 37(5), 465–491 (2007)CrossRef Zhao, P., Amaral, J.N.: Ablego: a function outlining and partial inlining framework. Softw. Pract. Exper. 37(5), 465–491 (2007)CrossRef
38.
go back to reference Aleksandar, P., et al.: An optimization-driven incremental inline substitution algorithm for just-in-time compilers.In: 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE (2019). Aleksandar, P., et al.: An optimization-driven incremental inline substitution algorithm for just-in-time compilers.In: 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE (2019).
40.
go back to reference Cammarota, R., Nicolau, A., Veidenbaum, A.V., Kejariwal, A., Donato, D., Madhugiri, M.: On the determination of inlining vectors for program optimization. In: Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg, pp. 164–183. https://doi.org/10.1007/978-3-642-37051-9_9 Cammarota, R., Nicolau, A., Veidenbaum, A.V., Kejariwal, A., Donato, D., Madhugiri, M.: On the determination of inlining vectors for program optimization. In: Proceedings of the 22nd International Conference on Compiler Construction (CC’13). Springer-Verlag, Berlin, Heidelberg, pp. 164–183. https://​doi.​org/​10.​1007/​978-3-642-37051-9_​9
41.
go back to reference Simon, D., Cavazos, J., Wimmer, C., Kulkarni, S.: Automatic construction of inlining heuristics using machine learning. In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’13). IEEE Computer Society, Washington, DC, pp. 1–12. https://doi.org/10.1109/CGO.2013.6495004 Simon, D., Cavazos, J., Wimmer, C., Kulkarni, S.: Automatic construction of inlining heuristics using machine learning. In: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO ’13). IEEE Computer Society, Washington, DC, pp. 1–12. https://​doi.​org/​10.​1109/​CGO.​2013.​6495004
42.
44.
go back to reference Papadopoulos, I., Thomas, N., Fidel, A., Amato, N. M., Rauchwerger, L.: STAPL-RTS: An application driven runtime system. In: Proceedings of the 29th ACM on International Conference on Supercomputing (ICS '15). ACM, New York, NY, USA, pp. 425–434. https://doi.org/10.1145/2751205.2751233 Papadopoulos, I., Thomas, N., Fidel, A., Amato, N. M., Rauchwerger, L.: STAPL-RTS: An application driven runtime system. In: Proceedings of the 29th ACM on International Conference on Supercomputing (ICS '15). ACM, New York, NY, USA, pp. 425–434. https://​doi.​org/​10.​1145/​2751205.​2751233
45.
go back to reference Yi, Q., Wang, Q., Cui, H.: Specializing compiler optimizations through programmable composition for dense matrix computations. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, USA, pp. 596–608. https://doi.org/10.1109/MICRO.2014.14 Yi, Q., Wang, Q., Cui, H.: Specializing compiler optimizations through programmable composition for dense matrix computations. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-47). IEEE Computer Society, USA, pp. 596–608. https://​doi.​org/​10.​1109/​MICRO.​2014.​14
46.
go back to reference Nguyen, T., Cicotti, P., Bylaska, E., Quinlan, D., Baden, S.B.: Bamboo: translating MPI applications to a latency-tolerant, data-driven form. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Washington, DC, USA, Article 39, pp. 1–11. Nguyen, T., Cicotti, P., Bylaska, E., Quinlan, D., Baden, S.B.: Bamboo: translating MPI applications to a latency-tolerant, data-driven form. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Washington, DC, USA, Article 39, pp. 1–11.
Metadata
Title
Enhancing the Effectiveness of Inlining in Automatic Parallelization
Authors
Jichi Guo
Qing Yi
Kleanthis Psarris
Publication date
06-08-2021
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 1/2022
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-021-00722-1

Other articles of this Issue 1/2022

International Journal of Parallel Programming 1/2022 Go to the issue

Premium Partner