Published in: The Journal of Supercomputing 1/2014

01.10.2014

Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications

Authors: Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu, Zhenghua Wang

Abstract

This paper compares the microarchitectural performance of two representative Computational Fluid Dynamics (CFD) applications on the Intel Knights Corner (KNC) coprocessor, a product of the Intel Many Integrated Core (MIC) architecture, and on the Intel Sandy Bridge (SNB) processor. A measurement method based on the Performance Monitoring Unit (PMU) is used, together with a two-phase measurement method and several precautions that minimize measurement errors and instabilities. The results show that the CFD applications are sensitive to architectural factors. Their single-thread performance and efficiency on KNC are much lower than on SNB. Branch prediction and memory access are the two primary factors behind the performance difference. The applications' low computational intensity and inefficient use of vector instructions are two additional factors. To run the CFD applications more efficiently, the MIC architecture needs to improve its branch prediction mechanism and memory hierarchy. Fine-tuning the application codes is also crucial, and it is laborious.
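The link between low computational intensity and limited performance can be illustrated with a simple roofline-style bound. The following is a minimal sketch, not the measurement method of the paper: the function names and every number in it (operation counts, peak rate, memory bandwidth) are hypothetical values chosen for illustration, not figures for KNC or SNB.

```python
def computational_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline-style upper bound: the lower of the compute and memory ceilings."""
    return min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    # Hypothetical kernel: 2e9 floating-point operations over 8e9 bytes of traffic.
    intensity = computational_intensity(flops=2.0e9, bytes_moved=8.0e9)

    # Hypothetical machine: 100 GFLOP/s peak compute, 40 GB/s memory bandwidth.
    bound = attainable_gflops(intensity, peak_gflops=100.0, bandwidth_gbs=40.0)

    print(f"intensity = {intensity} FLOP/byte, bound = {bound} GFLOP/s")
```

With these assumed values the kernel is memory-bound: the bandwidth ceiling, not the peak compute rate, caps its performance, which is why a low-intensity code gains little from additional arithmetic throughput alone.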

Metadata
Title
Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications
Authors
Yonggang Che
Lilun Zhang
Yongxian Wang
Chuanfu Xu
Wei Liu
Zhenghua Wang
Publication date
01.10.2014
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1245-3
