2017 | OriginalPaper | Book chapter

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Authors: Julian Hammer, Jan Eitzinger, Georg Hager, Gerhard Wellein

Published in: Tools for High Performance Computing 2016

Publisher: Springer International Publishing

Abstract

Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describe the single-core performance and the saturation behavior on a multicore chip.

We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis makes it possible to predict optimal spatial blocking factors for loop nests. Together with the models, it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil, we demonstrate the usefulness and predictive power of the Kerncraft tool.
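The three modeling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not Kerncraft's implementation or API: all function names and input numbers are hypothetical, the ECM formula assumes the non-overlapping data-transfer variant commonly used for Intel cores, and the layer condition uses a rule-of-thumb safety factor of one half of the cache size.

```python
import math

def roofline_performance(peak_flops, mem_bandwidth, intensity):
    """Roofline: performance is capped by the compute peak (flop/s) or by
    memory bandwidth (byte/s) times arithmetic intensity (flop/byte)."""
    return min(peak_flops, mem_bandwidth * intensity)

def ecm_cycles(t_ol, t_nol, transfer_cycles):
    """ECM single-core prediction in cycles per unit of work.
    Assumption: only t_ol overlaps with data transfers; t_nol and all
    inter-level transfer times add up."""
    return max(t_ol, t_nol + sum(transfer_cycles))

def saturation_cores(t_ecm, t_mem):
    """Number of cores at which the memory interface saturates."""
    return math.ceil(t_ecm / t_mem)

def layer_condition_holds(radius, n_inner, n_middle, cache_bytes,
                          elem_bytes=8, safety=0.5):
    """3D layer condition for a stencil of the given radius: 2*radius+1
    layers of n_inner*n_middle elements must fit into the usable cache.
    The safety factor of 0.5 is an assumed rule of thumb."""
    layers = 2 * radius + 1
    return layers * n_inner * n_middle * elem_bytes <= safety * cache_bytes

# Hypothetical machine and kernel numbers, for illustration only.
perf = roofline_performance(20e9, 40e9, 0.125)
t_ecm = ecm_cycles(8, 4, [4, 4, 10])
n_sat = saturation_cores(t_ecm, 10)
# A 25-point long-range stencil has radius 4 (3 * 2 * 4 + 1 = 25 points).
fits_l1 = layer_condition_holds(4, 100, 100, 32 * 1024)
fits_l3 = layer_condition_holds(4, 100, 100, 20 * 1024 * 1024)
```

If the layer condition fails for a given cache level, reducing `n_inner` or `n_middle` through spatial blocking until it holds is the kind of block-size estimate the abstract refers to.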

Footnotes
3
Kerncraft currently only supports Intel Xeon and Core architectures, but pycachesim has been developed with other architectures in mind.
 
Metadata
Title
Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels
Authors
Julian Hammer
Jan Eitzinger
Georg Hager
Gerhard Wellein
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-56702-0_1
