
2016 | OriginalPaper | Book chapter

Analysis of Intel’s Haswell Microarchitecture Using the ECM Model and Microbenchmarks

Written by: Johannes Hofmann, Dietmar Fey, Jan Eitzinger, Georg Hager, Gerhard Wellein

Published in: Architecture of Computing Systems – ARCS 2016

Publisher: Springer International Publishing


Abstract

This paper presents an in-depth analysis of Intel’s Haswell microarchitecture for streaming loop kernels. Among the new features examined are the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, enhancements such as new and improved execution units, and improvements throughout the memory hierarchy. The Execution-Cache-Memory diagnostic performance model is used together with a generic set of microbenchmarks to quantify the efficiency of the microarchitecture. The microbenchmarks are chosen so that they can serve as a blueprint for other streaming loop kernels.


Footnotes
1
Normally, with two AVX mul ports available, \(T_\mathrm{OL}\) should be 1 c. However, the frontend can only retire 4 \(\mu\)ops/c; this, along with the fact that stores count as 2 \(\mu\)ops, means that if both multiplications were paired with the first store, there would not be enough full AGUs to retire the second store and the remaining AVX load instructions in the same cycle.
 
Metadata
Title
Analysis of Intel’s Haswell Microarchitecture Using the ECM Model and Microbenchmarks
Written by
Johannes Hofmann
Dietmar Fey
Jan Eitzinger
Georg Hager
Gerhard Wellein
Copyright year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-30695-7_16
