The Journal of Supercomputing

Published: 1 October 2014

Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications

Authors: Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu, Zhenghua Wang

Published in: The Journal of Supercomputing | Issue 1/2014

Abstract

This paper compares the microarchitectural performance of two representative Computational Fluid Dynamics (CFD) applications on the Intel Knights Corner (KNC) coprocessor, a product of the Intel Many Integrated Core (MIC) architecture, and on the Intel Sandy Bridge (SNB) processor. A Performance Monitoring Unit (PMU)-based measurement method is used, together with a two-phase measurement method and several precautions to minimize measurement errors and instabilities. The results show that the CFD applications are sensitive to architectural factors. Their single-thread performance and efficiency on KNC are much lower than on SNB. Branch prediction and memory access are the two primary factors behind the performance difference. The applications' low computational intensity and inefficient use of vector instructions are two additional factors. To run the CFD applications more efficiently, the MIC architecture needs to improve its branch prediction mechanism and memory hierarchy. Fine-tuning of the application codes is also crucial, and it is hard work.

Title: Microarchitectural performance comparison of Intel Knights Corner and Intel Sandy Bridge with CFD applications
Authors: Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu, Zhenghua Wang
Publication date: 1 October 2014
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 1/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1245-3
