Published in: The Journal of Supercomputing 3/2013

01.09.2013

An (almost) direct deployment of the Fast Multipole Method on the Cell processor

Written by: Pierre Fortin, Jean-Luc Lamotte

Abstract

This paper presents the first deployment of the Fast Multipole Method on the Cell processor (PowerXCell 8i). We rely on the matrix formulation with BLAS routines of the FMB code (Fast Multipole with BLAS) to offload, directly and efficiently, the most time-consuming operators of both the far-field and near-field computations onto the heterogeneous cores of the Cell. We detail the difficulties that first had to be solved, and we obtain a deployment in single and double precision that scales linearly on several Cell blades and handles both uniform and non-uniform particle distributions. We also present our performance results and comparisons with multicore CPUs, as well as the limitations of our deployment on the Cell processor.
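
The abstract does not spell out the matrix formulation, so the following is only a minimal sketch of the general idea behind using BLAS in this setting, under stated assumptions: when the same dense operator matrix is applied to many expansion vectors, the vectors can be stacked as columns so that one matrix-matrix product replaces many matrix-vector products. The names `matvec`, `matmat`, and `batch_apply`, and the tiny integer matrices, are hypothetical and are not taken from the FMB code.

```python
# Illustrative sketch only: function names and data are hypothetical,
# not taken from the FMB code described in the paper.

def matvec(T, v):
    """Apply a dense operator T (list of rows) to one expansion vector v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]

def matmat(T, V):
    """Multiply T (m x n) by V (n x k); V stores the expansion vectors as columns."""
    cols = list(zip(*V))  # the k columns of V, each of length n
    return [[sum(t * x for t, x in zip(row, col)) for col in cols] for row in T]

def batch_apply(T, expansions):
    """Apply T to k expansion vectors with a single matrix-matrix product."""
    V = [list(row) for row in zip(*expansions)]  # n x k, column j is expansions[j]
    R = matmat(T, V)                             # m x k, column j is T applied to expansions[j]
    return [list(col) for col in zip(*R)]        # back to k result vectors of length m

if __name__ == "__main__":
    T = [[1, 2], [3, 4], [5, 6]]                 # one shared operator matrix
    expansions = [[1, 0], [0, 1], [1, 1]]        # three input vectors
    one_by_one = [matvec(T, e) for e in expansions]
    print(batch_apply(T, expansions) == one_by_one)
```

In an optimized code, `matmat` would correspond to a level-3 BLAS GEMM call, which typically makes better use of the hardware than repeated level-2 calls; whether and how the paper batches its operators this way is an assumption of this sketch.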

Metadata
Title
An (almost) direct deployment of the Fast Multipole Method on the Cell processor
Written by
Pierre Fortin
Jean-Luc Lamotte
Publication date
01.09.2013
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 3/2013
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-0877-z
