Published in: The Journal of Supercomputing, Issue 9/2020

6 January 2020

Performance drop at executing communication-intensive parallel algorithms

Authors: José A. Moríñigo, Pablo García-Muller, Antonio J. Rubio-Montero, Antonio Gómez-Iglesias, Norbert Meyer, Rafael Mayo-García

Abstract

This work summarizes the results of a set of runs on three supercomputers with fat-tree networks: Stampede at TACC (USA), Helios at IFERC (Japan) and Eagle at PSNC (Poland). Three MPI-based, communication-intensive scientific applications compiled for CPUs were run in weak-scaling tests: the molecular dynamics solver LAMMPS; the finite-element mini-kernel miniFE of NERSC (USA); and the three-dimensional fast Fourier transform mini-kernel bigFFT of LLNL (USA). The experiments are designed to measure how sensitive these applications are to markedly different task-location patterns, and how this affects cluster performance. The weak-scaling tests stress the effect of how the MPI tasks are mapped onto the nodes (concentrated vs. distributed patterns). Results reveal that highly distributed task patterns can lead to much longer execution times at scale, once several hundred or thousands of MPI tasks are involved. This characterization can help users make subsequent runs more efficient. Researchers may also use these experiments to improve their scalability simulators. In addition, the results are useful from a cluster-administration standpoint, since task mapping affects cluster throughput.
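To make the concentrated vs. distributed distinction concrete, the Python sketch below (an illustration only, not the authors' tooling or method) places MPI ranks on nodes in block order with a chosen number of tasks per node. It then counts how many communicating rank pairs end up on different nodes for two idealized patterns: a nearest-neighbour ring and an all-to-all exchange, the latter loosely resembling the global transposes typical of distributed 3D FFTs. The function names, the job size and the two communication patterns are hypothetical simplifications.

```python
from itertools import combinations


def block_mapping(n_tasks: int, tasks_per_node: int) -> list[int]:
    """Return the node index of each MPI rank under block placement.

    Ranks fill nodes in order: ranks 0..tasks_per_node-1 go to node 0,
    the next tasks_per_node ranks to node 1, and so on. A large
    tasks_per_node gives a concentrated pattern (few, full nodes); a small
    one gives a distributed pattern (many, sparsely used nodes).
    """
    if n_tasks <= 0 or tasks_per_node <= 0:
        raise ValueError("n_tasks and tasks_per_node must be positive")
    return [rank // tasks_per_node for rank in range(n_tasks)]


def internode_fraction_ring(mapping: list[int]) -> float:
    """Fraction of ring links (rank r -> r+1, wrapping around) that cross nodes."""
    n = len(mapping)
    crossing = sum(1 for r in range(n) if mapping[r] != mapping[(r + 1) % n])
    return crossing / n


def internode_fraction_alltoall(mapping: list[int]) -> float:
    """Fraction of distinct rank pairs in an all-to-all exchange that cross nodes."""
    if len(mapping) < 2:
        return 0.0  # a single rank exchanges nothing
    pairs = list(combinations(range(len(mapping)), 2))
    crossing = sum(1 for a, b in pairs if mapping[a] != mapping[b])
    return crossing / len(pairs)


if __name__ == "__main__":
    n_tasks = 64  # hypothetical job size
    for tpn in (16, 4, 1):  # hypothetical tasks-per-node settings
        m = block_mapping(n_tasks, tpn)
        print(
            f"{tpn:>2} tasks/node on {max(m) + 1:>2} nodes: "
            f"ring {internode_fraction_ring(m):.3f}, "
            f"all-to-all {internode_fraction_alltoall(m):.3f}"
        )
```

With 64 ranks, going from 16 tasks per node to 1 raises the inter-node share of ring links from 4 of 64 to all 64, and the inter-node share of all-to-all pairs from 16/21 to 1. The count ignores link contention, switch hops in the fat tree and memory-bandwidth sharing inside a node, all of which matter in practice; it only suggests why task placement can change the load placed on the network, and does not reproduce the paper's measured results.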

Metadata
Title: Performance drop at executing communication-intensive parallel algorithms
Authors: José A. Moríñigo, Pablo García-Muller, Antonio J. Rubio-Montero, Antonio Gómez-Iglesias, Norbert Meyer, Rafael Mayo-García
Publication date: 6 January 2020
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 9/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-019-03142-8
