Published in: The Journal of Supercomputing 2/2021

30-05-2020

Applying the swept rule for solving explicit partial differential equations on heterogeneous computing systems

Authors: Daniel J. Magee, Anthony S. Walker, Kyle E. Niemeyer

Abstract

Applications that exploit the architectural details of high-performance computing (HPC) systems have become increasingly valuable in academia and industry over the past two decades. The most important hardware development of the last decade in HPC has been the general-purpose graphics processing unit (GPGPU), a class of massively parallel devices that now contributes the majority of computational power in the top 500 supercomputers. As these systems grow, small costs such as latency, due to the fixed cost of memory accesses and communication, accumulate in a large simulation and become a significant barrier to performance. The swept time-space decomposition rule is a communication-avoiding technique for time-stepping stencil update formulas that attempts to reduce latency costs. This work extends the swept rule by targeting heterogeneous CPU/GPU architectures representing current and future HPC systems. We compare our approach to a naive decomposition scheme with two test equations using an MPI+CUDA pattern on 40 processes over two nodes containing one GPU. The swept rule produces a speedup factor of 1.9 to 23 for the heat equation and of 1.1 to 2.0 for the Euler equations, using the same processors and work distribution, and with the best possible configurations. These results show the potential effectiveness of the swept rule for different equations and numerical schemes on massively parallel compute systems that incur substantial latency costs.
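
The communication-avoiding idea in the abstract can be illustrated with a small sketch. It is not the authors' MPI+CUDA implementation. It assumes a 1D heat equation, a three-point explicit (FTCS) stencil, and a periodic grid, and it covers only the first stage of a swept-style decomposition: each block advances a shrinking "triangle" of points through several time steps using only its own data. The function names (`ftcs_step`, `classic`, `up_triangle`) and the parameters `alpha`, `n_total`, and `n_block` are hypothetical and chosen for illustration. The check at the end confirms that the values a block computes without any neighbour data match those of a step-by-step global update, which in a distributed run would need a halo exchange at every step.

```python
import math


def ftcs_step(u, alpha):
    """One explicit (FTCS) heat-equation step on a periodic 1D grid."""
    n = len(u)
    return [u[i] + alpha * (u[(i - 1) % n] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]


def classic(u0, alpha, steps):
    """Reference: advance the whole grid one step at a time.

    In a distributed run, every step would require a halo exchange.
    """
    u = list(u0)
    for _ in range(steps):
        u = ftcs_step(u, alpha)
    return u


def up_triangle(block, alpha):
    """Advance one block as far as possible using only its own data.

    Each time level loses one point per side, because the 3-point stencil
    needs a neighbour the block does not own. levels[k] holds the values
    at time step k for block-local indices k .. n-1-k.
    """
    levels = [list(block)]
    while len(levels[-1]) > 2:
        prev = levels[-1]
        levels.append([prev[j] + alpha * (prev[j - 1] - 2.0 * prev[j] + prev[j + 1])
                       for j in range(1, len(prev) - 1)])
    return levels


# Illustrative parameters (assumptions, not taken from the paper).
n_total, n_block, alpha = 16, 8, 0.25
u0 = [math.sin(2.0 * math.pi * i / n_total) for i in range(n_total)]

for start in range(0, n_total, n_block):
    levels = up_triangle(u0[start:start + n_block], alpha)
    for k, level in enumerate(levels):
        # Compare against the global solution at the same time level.
        ref = classic(u0, alpha, k)[start + k:start + n_block - k]
        assert all(abs(a - b) < 1e-12 for a, b in zip(level, ref))
```

In this sketch a block of 8 points advances several time levels before it needs any neighbour data, whereas the step-by-step update would communicate at every level. Filling the gaps between triangles, which a complete swept scheme must also do, is deliberately left out.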

Footnotes
1
Alhubail and Wang [3] use the term “straight” decomposition where we use Classic.
 
2
Here we use “block” and “domain” interchangeably to represent a domain of dependence; the term comes from the GPU/CUDA construct representing a collection of threads.
 
Literature
4.
Alhubail M, Wang Q (2017) Improving the strong parallel scalability of CFD schemes via the swept domain decomposition rule. In: 55th AIAA Aerospace Sciences Meeting, Grapevine, Texas, American Institute of Aeronautics and Astronautics, January 2017. https://doi.org/10.2514/6.2017-1218
5.
Alhubail MM, Wang Q, Williams J (2016) The swept rule for breaking the latency barrier in time advancing two-dimensional PDEs. arXiv:1602.07558 [cs.NA]
7.
Ballard G, Demmel J, Holtz O, Schwartz O (2011) Minimizing communication in numerical linear algebra. SIAM J Matrix Anal Appl 32(3):866–901
9.
Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J, Yelick K (2008) Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, SC ’08, IEEE Press, Piscataway, NJ, USA, pp 4:1–4:12
10.
Demmel J, Hoemmen M, Mohiyuddin M, Yelick K (2008) Avoiding communication in sparse matrix computations. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, IEEE, pp 1–12
11.
Emmett M, Minion M (2012) Toward an efficient parallel in time method for partial differential equations. Commun Appl Math Comput Sci 7(1):105–132
13.
Falgout RD, Friedhoff S, Kolev TV, MacLachlan SP, Schroder JB (2014) Parallel time integration with multigrid. SIAM J Sci Comput 36(6):C635–C661
15.
Gander MJ, Güttel S (2013) Paraexp: a parallel integrator for linear initial-value problems. SIAM J Sci Comput 35(2):C123–C142
16.
Gander MJ, Neumuller M (2016) Analysis of a new space-time parallel multigrid algorithm for parabolic problems. SIAM J Sci Comput 38(4):A2173–A2208
17.
Huerta YA, Swartz B, Lilja DJ (2017) Determining work partitioning on closely coupled heterogeneous computing systems using statistical design of experiments. In: 2017 IEEE International Symposium on Workload Characterization (IISWC), October 2017, pp 118–119. https://doi.org/10.1109/IISWC.2017.8167766
19.
Khabou A, Demmel JW, Grigori L, Gu M (2013) LU factorization with panel rank revealing pivoting and its communication avoiding version. SIAM J Matrix Anal Appl 34(3):1401–1429
20.
Lions J-L, Maday Y, Turinici G (2001) Résolution d'EDP par un schéma en temps «pararéel». Proc Acad Sci Ser I Math 332(7):661–668
26.
Mills RT, Rupp K, Adams M, Brown J, Isaac T, Knepley M, Smith B, Zhang H (2017) Software strategy and experiences with manycore processor support in PETSc. In: SIAM Pacific Northwest Regional Conference, October 2017
27.
Minion ML, Speck R, Bolten M, Emmett M, Ruprecht D (2015) Interweaving PFASST and parallel multigrid. SIAM J Sci Comput 37(5):S244–S263
28.
Solomonik E, Ballard G, Demmel J, Hoefler T (2017) A communication-avoiding parallel algorithm for the symmetric eigenvalue problem. In: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, pp 111–121
Metadata
Title
Applying the swept rule for solving explicit partial differential equations on heterogeneous computing systems
Authors
Daniel J. Magee
Anthony S. Walker
Kyle E. Niemeyer
Publication date
30-05-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03340-9
