2013 | OriginalPaper | Chapter

4. GRAPE and GRAPE-DR

Author : Junichiro Makino

Published in: GPU Solutions to Multi-scale Problems in Science and Engineering

Publisher: Springer Berlin Heidelberg

Abstract

We describe the architecture and performance of GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction). It operates as an accelerator attached to general-purpose PCs or x86-based servers. Each GRAPE-DR processor chip has 512 cores operating at a clock frequency of 400 MHz. The peak speed of a processor chip is 410 Gflops (single precision) or 205 Gflops (double precision). A GRAPE-DR board consists of four GRAPE-DR chips, each with its own local memory of 256 MB. Thus, a GRAPE-DR board has a theoretical peak speed of 1.64 Tflops in single precision and 0.82 Tflops in double precision. Its power consumption is around 200 W. The application area of GRAPE-DR covers particle-based simulations such as astrophysical many-body simulations and molecular-dynamics simulations, quantum chemistry calculations, various applications that require dense matrix operations, and many other compute-intensive applications. The architecture of GRAPE-DR is in many ways similar to that of modern GPUs, since their evolutionary tracks are rather similar. GPUs have evolved from specialized hardwired logic for specific operations into more general-purpose computing engines, in order to meet the need to perform complex shading and other operations. The predecessor of GRAPE-DR is GRAPE (GRAvity PipE), a specialized pipeline processor for gravitational $N$-body simulations. We have changed the architecture to extend the range of applications. There are two main differences between GRAPE-DR and GPGPUs. One is the transistor and power efficiency. With 90 nm technology and 400M transistors, we have integrated 512 processor cores and achieved a speed of 400 Gflops at a 400 MHz clock and 50 W. A Fermi processor from NVIDIA integrates 448 processors with 3B transistors and achieves a speed of 1.03 Tflops at 1.15 GHz and 247 W. Thus, Fermi achieves about 2.5 times higher speed than GRAPE-DR, with 2.9 times higher clock, 8 times more transistors, and 5 times more power consumption.
The other is the external memory bandwidth. GPUs typically have a memory bandwidth of around 100 GB/s, while our GRAPE-DR card, with 4 chips, has only 16 GB/s. Thus, the range of applications is somewhat limited, but for suitable applications, the performance and performance per watt of GRAPE-DR are quite good. The single-card performance on the HPL benchmark is 480 Gflops for a matrix size of 48 k, and 37 Tflops for 81 cards.
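
The peak and efficiency figures quoted in the abstract follow from simple arithmetic, which the short Python sketch below reproduces. It assumes (the abstract does not state this) that each core completes two single-precision or one double-precision floating-point operation per cycle; this is consistent with 512 cores at 400 MHz giving about 410 and 205 Gflops. All function and variable names are illustrative. HPL efficiency is measured against the double-precision peak, since HPL is a double-precision benchmark.

```python
# Figures quoted in the abstract.
CORES_PER_CHIP = 512
CLOCK_GHZ = 0.4
CHIPS_PER_BOARD = 4

# Assumption, not stated in the abstract: 2 SP flops or 1 DP flop
# per core per cycle.
SP_FLOPS_PER_CYCLE = 2
DP_FLOPS_PER_CYCLE = 1


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflops: cores x clock (GHz) x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


chip_sp = peak_gflops(CORES_PER_CHIP, CLOCK_GHZ, SP_FLOPS_PER_CYCLE)
chip_dp = peak_gflops(CORES_PER_CHIP, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
board_sp = CHIPS_PER_BOARD * chip_sp
board_dp = CHIPS_PER_BOARD * chip_dp

# Fermi versus one GRAPE-DR chip, using the numbers in the abstract.
fermi_vs_grape_dr = {
    "speed": 1030.0 / 400.0,       # Gflops
    "clock": 1.15 / 0.4,           # GHz
    "transistors": 3000.0 / 400.0, # millions of transistors
    "power": 247.0 / 50.0,         # W
}

# Single-precision Gflops per watt for each chip.
grape_dr_gflops_per_watt = 400.0 / 50.0
fermi_gflops_per_watt = 1030.0 / 247.0

# HPL efficiency: measured Gflops divided by double-precision peak.
hpl_efficiency_1_card = 480.0 / board_dp
hpl_efficiency_81_cards = 37000.0 / (81 * board_dp)

if __name__ == "__main__":
    print(f"chip peak:  {chip_sp:.1f} SP / {chip_dp:.1f} DP Gflops")
    print(f"board peak: {board_sp:.1f} SP / {board_dp:.1f} DP Gflops")
    for name, ratio in fermi_vs_grape_dr.items():
        print(f"Fermi / GRAPE-DR {name}: {ratio:.2f}")
    print(f"Gflops/W: GRAPE-DR {grape_dr_gflops_per_watt:.1f}, "
          f"Fermi {fermi_gflops_per_watt:.1f}")
    print(f"HPL efficiency: {hpl_efficiency_1_card:.1%} (1 card), "
          f"{hpl_efficiency_81_cards:.1%} (81 cards)")
```

The unrounded ratios are close to, but not exactly, the rounded values in the abstract. Under the stated assumption, the HPL results correspond to somewhat more than half of the double-precision peak, both for one card and for 81 cards.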

Title: GRAPE and GRAPE-DR
Author: Junichiro Makino
Publisher: Springer Berlin Heidelberg
Book: GPU Solutions to Multi-scale Problems in Science and Engineering
Print ISBN: 978-3-642-16404-0
Electronic ISBN: 978-3-642-16405-7

Copyright Year: 2013
DOI: https://doi.org/10.1007/978-3-642-16405-7_4