Published in: International Journal of Parallel Programming 4/2015

1 August 2015

PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers

Authors: Dheya Mustafa, Rudolf Eigenmann

Abstract

This paper describes PETRA, a portable performance evaluation tool for parallelizing compilers and their individual techniques. Automatic parallelization of sequential programs, combined with performance tuning, is an important alternative to manual parallelization for exploiting the performance potential of today's multicores. Given the renewed interest in automatic parallelization, this paper aims at a comprehensive evaluation that identifies strengths and weaknesses in the underlying techniques. The findings allow engineers to make informed decisions about which techniques to include in industrial products and point researchers to potential improvements. We present an experimental methodology and a fully automated implementation for comprehensively evaluating the effectiveness of parallelizing compilers and their underlying optimization techniques. The methodology is the first to (1) include automatic tuning, (2) measure the performance contributions of individual techniques at multiple optimization levels, and (3) quantify the interactions of compiler optimizations. The results will also help close the gap between research and industrial compilers; the latter still lag far behind. We applied the proposed methodology, using PETRA, to five modern parallelizing compilers and their tuning capabilities, illustrating several use cases and applications of the evaluation tool. We report speedups, parallel coverage, and the number of parallel loops, using the NAS Benchmarks as the program suite. We found the parallelizers to be reasonably successful in about half of the given science and engineering programs. Another important finding is that some techniques substitute for each other. Furthermore, we found that automatic tuning can yield significant additional performance and sometimes matches or outperforms hand-parallelized programs.
Advanced versions of some of the techniques identified as most successful in previous generations of compilers remain among the most important today, while other techniques have grown significantly in impact. Finally, we analyze the specific reasons for the measured performance and the potential for improving automatic parallelization.
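
The abstract names speedup, parallel coverage, per-technique performance contributions, and optimization interactions as the quantities being measured. The Python sketch below shows one plausible way to compute them from timing measurements. All function names and formulas are assumptions based on common practice in compiler-evaluation studies (for example, estimating a technique's contribution by disabling only that technique while keeping all others enabled); they are not PETRA's actual implementation, and the timings in the demo are made up.

```python
# Hypothetical sketch: common timing-based metrics for evaluating a
# parallelizing compiler. Names and formulas are illustrative assumptions,
# not PETRA's actual implementation.


def speedup(t_serial: float, t_parallel: float) -> float:
    """Serial execution time divided by the time of the parallelized program."""
    return t_serial / t_parallel


def parallel_coverage(t_in_parallel_loops: float, t_serial: float) -> float:
    """Fraction of serial execution time spent in loops the compiler parallelized."""
    return t_in_parallel_loops / t_serial


def amdahl_bound(coverage: float, cores: int) -> float:
    """Upper bound on speedup for a given coverage and core count (Amdahl's law)."""
    return 1.0 / ((1.0 - coverage) + coverage / cores)


def contribution(t_all_on: float, t_one_off: float) -> float:
    """Contribution of one technique, in percent: the slowdown observed when
    only that technique is disabled and all others stay enabled."""
    return (t_one_off - t_all_on) / t_all_on * 100.0


def interaction(t_all_on: float, t_a_off: float, t_b_off: float,
                t_both_off: float) -> float:
    """Pairwise interaction of techniques A and B, in percentage points:
    the slowdown from disabling both minus the sum of the individual
    slowdowns. Positive: each technique largely covers for the other
    (substitution). Negative: one technique mostly pays off only when the
    other is enabled."""
    both = contribution(t_all_on, t_both_off)
    a = contribution(t_all_on, t_a_off)
    b = contribution(t_all_on, t_b_off)
    return both - (a + b)


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only (not measured data).
    print(f"speedup:        {speedup(100.0, 25.0):.2f}")
    print(f"coverage:       {parallel_coverage(90.0, 100.0):.2f}")
    print(f"Amdahl bound:   {amdahl_bound(0.9, 8):.2f}")
    print(f"interaction AB: {interaction(10.0, 10.5, 10.5, 20.0):.1f}")
```

With the made-up timings in the demo, disabling A or B alone costs 5% each, while disabling both costs 100%, giving an interaction of +90 percentage points. This is the pattern one would expect when two techniques substitute for each other, as the abstract reports for some techniques. The Amdahl bound shows why coverage matters: with 90% coverage, no number of cores can push the speedup past 10.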

Metadata
Title
PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers
Authors
Dheya Mustafa
Rudolf Eigenmann
Publication date
1 August 2015
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4/2015
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-014-0307-8