2016 | Original Paper | Book Chapter

Analysis of Intel’s Haswell Microarchitecture Using the ECM Model and Microbenchmarks

Authors: Johannes Hofmann, Dietmar Fey, Jan Eitzinger, Georg Hager, Gerhard Wellein

Published in: Architecture of Computing Systems – ARCS 2016

Publisher: Springer International Publishing

Abstract

This paper presents an in-depth analysis of Intel’s Haswell microarchitecture for streaming loop kernels. Among the new features examined are the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, core enhancements such as new and improved execution units, and improvements throughout the memory hierarchy. The Execution-Cache-Memory diagnostic performance model is used together with a generic set of microbenchmarks to quantify the efficiency of the microarchitecture. The microbenchmarks are chosen so that they can serve as a blueprint for other streaming loop kernels.

Normally, with two AVX multiply ports available, \(T_\mathrm{OL}\) should be 1 c. However, the frontend can only retire 4 \(\mu\)ops/c. Together with the fact that stores count as 2 \(\mu\)ops, this means that if both multiplications were paired with the first store, there would not be enough full AGUs to retire the second store and the remaining AVX load instructions in the same cycle.
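The \(\mu\)op budget part of this argument can be illustrated with a minimal Python sketch. The 4 \(\mu\)ops/c limit and the 2-\(\mu\)op cost of a store are taken from the note above. The 1-\(\mu\)op cost of loads and multiplications, the instruction mix (two AVX loads, two AVX multiplications, two AVX stores), and the names `UOPS`, `ISSUE_WIDTH`, and `min_cycles_for_mix` are assumptions made for illustration only. The sketch ignores the AGU constraint and therefore gives only a lower bound.

```python
import math

# From the note above: at most 4 uops per cycle, and a store counts as 2 uops.
ISSUE_WIDTH = 4

# Assumption for illustration: loads and multiplications cost 1 uop each.
UOPS = {"load": 1, "mul": 1, "store": 2}


def min_cycles_for_mix(mix):
    """Lower bound on cycles needed for an instruction mix,
    based only on the uop budget (port and AGU limits are ignored)."""
    total_uops = sum(UOPS[kind] * count for kind, count in mix.items())
    return math.ceil(total_uops / ISSUE_WIDTH)


# Hypothetical mix: 2 AVX loads, 2 AVX multiplications, 2 AVX stores.
mix = {"load": 2, "mul": 2, "store": 2}
print(min_cycles_for_mix(mix))  # 8 uops / 4 uops per cycle -> 2
```

Under these assumptions, the two multiplications alone would fit into one cycle, but the complete mix needs at least two, which is consistent with \(T_\mathrm{OL}\) exceeding 1 c.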

Title: Analysis of Intel’s Haswell Microarchitecture Using the ECM Model and Microbenchmarks
Authors: Johannes Hofmann, Dietmar Fey, Jan Eitzinger, Georg Hager, Gerhard Wellein
Publisher: Springer International Publishing
Book: Architecture of Computing Systems – ARCS 2016
Print ISBN: 978-3-319-30694-0
Electronic ISBN: 978-3-319-30695-7
Copyright year: 2016
DOI: https://doi.org/10.1007/978-3-319-30695-7_16
