An (almost) direct deployment of the Fast Multipole Method on the Cell processor

Written by: Pierre Fortin, Jean-Luc Lamotte

Published in: The Journal of Supercomputing | Issue 3/2013 (1 September 2013)

Abstract

This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) in order to directly and efficiently offload the most time-consuming operators of both the far-field and near-field computations onto the heterogeneous cores of the Cell. We detail the difficulties that had to be resolved first, and we finally obtain a deployment in both single and double precision, which scales linearly on several Cell blades and which is able to handle both uniform and non-uniform distributions of particles. We also present our performance results and comparisons with multicore CPUs, as well as the limitations of our deployment on the Cell processor.
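For readers unfamiliar with the near-field part mentioned in the abstract, the following is a minimal sketch of a direct pairwise summation between a set of target particles and a set of source particles. It is not taken from the paper or from the FMB code: the function name `near_field_potentials`, the data layout, and the choice of a 1/r kernel are assumptions made purely for illustration.

```python
import math

def near_field_potentials(targets, sources):
    """Direct pairwise sum: potential at each target due to all sources.

    targets: list of (x, y, z) positions
    sources: list of ((x, y, z), charge) pairs
    The 1/r kernel is an assumption chosen for illustration.
    """
    potentials = []
    for tx, ty, tz in targets:
        phi = 0.0
        for (sx, sy, sz), q in sources:
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:  # skip self-interaction when a target is also a source
                phi += q / r
        potentials.append(phi)
    return potentials

# Two targets and two sources (hypothetical values).
targets = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sources = [((0.0, 0.0, 2.0), 1.0), ((0.0, 3.0, 0.0), 3.0)]
phi = near_field_potentials(targets, sources)
```

The double loop costs a number of operations proportional to the product of the target and source counts, which is why the Fast Multipole Method restricts such direct sums to neighbouring cells and treats distant interactions through expansions.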

Title: An (almost) direct deployment of the Fast Multipole Method on the Cell processor
Authors: Pierre Fortin, Jean-Luc Lamotte
Publication date: 1 September 2013
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 3/2013
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-013-0877-z
