2013 | OriginalPaper | Book chapter

3. Efficiency, Energy Efficiency and Programming of Accelerated HPC Servers: Highlights of PRACE Studies

Written by: Lennart Johnsson

Published in: GPU Solutions to Multi-scale Problems in Science and Engineering

Publisher: Springer Berlin Heidelberg

Abstract

During the last few years, the convergence in architecture for High-Performance Computing (HPC) systems that took place over more than a decade has been replaced by a divergence. The divergence is driven by the quest for performance, cost-performance and, in the last few years, also lower energy consumption, the cost of which over the lifetime of a system has in many cases come to exceed the cost of the HPC system itself. Mass-market specialized processors, such as the Cell Broadband Engine (CBE) and graphics processors, have received particular attention, the latter especially after hardware support for double-precision floating-point arithmetic was introduced about three years ago. The recent support for Error Correcting Code (ECC) memory and the significantly enhanced double-precision performance in the current generation of Graphics Processing Units (GPUs) have further solidified the interest in GPUs for HPC. To assess the issues involved in potentially deploying clusters whose nodes combine commodity microprocessors with some type of specialized processor for enhanced performance, enhanced energy efficiency, or both, for science and engineering workloads, PRACE, the Partnership for Advanced Computing in Europe, undertook a study that included three types of accelerators (the CBE, GPUs and ClearSpeed) and tools for their programming. The study focused on assessing performance, efficiency and power efficiency for double-precision arithmetic, as well as programmer productivity. Four kernels (matrix multiplication, sparse matrix-vector multiplication, FFT and random number generation) were used for the assessment, together with High-Performance Linpack (HPL) and a few application codes. We report here on the results from the kernels and HPL for GPU and ClearSpeed accelerated systems. On sparse matrix-vector multiplication, the GPU performed significantly better than the CPU, which was surprising, whereas the ClearSpeed accelerator performed surprisingly poorly. For matrix multiplication, HPL and FFT, the ClearSpeed accelerator was by far the most energy-efficient device.

Metadata
Title
Efficiency, Energy Efficiency and Programming of Accelerated HPC Servers: Highlights of PRACE Studies
Written by
Lennart Johnsson
Copyright year
2013
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-16405-7_3