Published in: Software Quality Journal 3/2018

16.05.2017

A multi-aspect online tuning framework for HPC applications

Authors: Michael Gerndt, Siegfried Benkner, Eduardo César, Carmen Navarrete, Enes Bajrovic, Jiri Dokulil, Carla Guillén, Robert Mijakovic, Anna Sikora


Abstract

Developing software applications for high-performance computing (HPC) requires careful optimizations targeting a myriad of increasingly complex, highly interrelated software, hardware and system components. The demands placed on minimizing energy consumption on extreme-scale HPC systems and the associated shift towards heterogeneous architectures add yet another level of complexity to program development and optimization. As a result, the software optimization process is often seen as daunting, cumbersome and time-consuming by software developers wishing to fully exploit HPC resources. To address these challenges, we have developed the Periscope Tuning Framework (PTF), an online automatic integrated tuning framework that combines both performance analysis and performance tuning with respect to the myriad of tuning parameters available to today’s software developer on modern HPC systems. This work introduces the architecture, tuning model and main infrastructure components of PTF as well as the main tuning plugins of PTF and their evaluation.


Footnotes
2. The software environment on SuperMUC comprised the Intel Compiler 14, Parallel Environment 1.3, and OS SLE11 SP3. Details on SuperMUC can be found at: https://www.lrz.de/services/compute/supermuc/systemdescription
3. Due to the large number of flags, exhaustive search was not used; it would have required over 27000 experiments.
4. Governors are processor policies that control changes to the CPU frequency.
Metadata
Title: A multi-aspect online tuning framework for HPC applications
Authors: Michael Gerndt, Siegfried Benkner, Eduardo César, Carmen Navarrete, Enes Bajrovic, Jiri Dokulil, Carla Guillén, Robert Mijakovic, Anna Sikora
Publication date: 16.05.2017
Publisher: Springer US
Published in: Software Quality Journal, Issue 3/2018
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI: https://doi.org/10.1007/s11219-017-9370-x
