Published in: The Journal of Supercomputing 2/2017

18.06.2016

Hierarchical redesign of classic MPI reduction algorithms

Authors: Khalid Hasanov, Alexey Lastovetsky

Abstract

Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in the 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in state-of-the-art MPI implementations. Hierarchical topology-oblivious transformation of existing communication algorithms has recently been proposed as a promising new approach to the optimization of MPI collective communication algorithms and MPI-based applications. This approach has been successfully applied to the most popular parallel matrix multiplication algorithm, SUMMA, and to state-of-the-art MPI broadcast algorithms, demonstrating significant multifold performance gains, especially on large-scale HPC systems. In this paper, we apply this approach to the optimization of the MPI Reduce and Allreduce operations. Theoretical analysis and experimental results on a cluster of the Grid'5000 platform are presented.
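The core of the hierarchical transformation is to split the p processes into g groups, reduce within each group to a local leader, and then reduce across the g leaders to the root, so that any existing reduction algorithm can be reused at both levels. The sketch below is an illustrative simulation of that two-level idea on a plain list of values; it is not the authors' implementation, and the function name and grouping scheme are assumptions for exposition only.

```python
# Illustrative two-level hierarchical reduction (not the paper's code):
# p ranks are split into groups of roughly p/g members; each group first
# reduces its values to the group leader (lowest rank in the group), then
# the g leader values are reduced to the global root.
def hierarchical_reduce(values, g, op=lambda a, b: a + b):
    p = len(values)
    group_size = (p + g - 1) // g  # ceil(p / g) ranks per group

    # Phase 1: intra-group reduction to each group's leader.
    leaders = []
    for start in range(0, p, group_size):
        acc = values[start]
        for v in values[start + 1:start + group_size]:
            acc = op(acc, v)
        leaders.append(acc)

    # Phase 2: inter-group reduction over the (at most g) leaders.
    result = leaders[0]
    for v in leaders[1:]:
        result = op(result, v)
    return result
```

For an associative operation the result matches a flat reduction, e.g. `hierarchical_reduce(list(range(16)), 4)` equals `sum(range(16))`; in an actual MPI setting each phase would run an existing `MPI_Reduce` algorithm over a sub-communicator, which is where the performance gains of the transformation come from.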


Metadata
Title
Hierarchical redesign of classic MPI reduction algorithms
Authors
Khalid Hasanov
Alexey Lastovetsky
Publication date
18.06.2016
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1779-7
