Skip to main content

2016 | OriginalPaper | Buchkapitel

Multigrid at Scale?

verfasst von : Mark Ainsworth, Christian Glusa

Erschienen in: Numerical Mathematics and Advanced Applications ENUMATH 2015

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The reduced reliability of next generation exascale systems means that the resiliency properties of a numerical algorithm will become an important factor in both the choice of algorithm, and in its analysis. The multigrid algorithm is the workhorse for the distributed solution of linear systems but little is known about its resiliency properties and convergence behavior in a fault-prone environment. In the current work, we propose a probabilistic model for the effect of faults involving random diagonal matrices. We summarize results of the theoretical analysis of the model for the rate of convergence of fault-prone multigrid methods which show that the standard multigrid method will not be resilient. Finally, we present a modification of the standard multigrid algorithm that will be resilient.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat M. Ainsworth, C. Glusa, Is the multigrid method fault-tolerant? The Two Grid Case (Submitted) M. Ainsworth, C. Glusa, Is the multigrid method fault-tolerant? The Two Grid Case (Submitted)
2.
Zurück zum Zitat M. Ainsworth, C. Glusa, Is the multigrid method fault-tolerant? The Multi Grid Case (In preparation) M. Ainsworth, C. Glusa, Is the multigrid method fault-tolerant? The Multi Grid Case (In preparation)
3.
Zurück zum Zitat A. Avižienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1 (1), 11–33 (2004)CrossRef A. Avižienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput. 1 (1), 11–33 (2004)CrossRef
4.
Zurück zum Zitat P. Bougerol, J. Lacroix, Products of Random Matrices with Applications to Schrödinger Operators. Progress in Probability and Statistics, vol. 8 (Birkhäuser Boston Inc., Boston, 1985) P. Bougerol, J. Lacroix, Products of Random Matrices with Applications to Schrödinger Operators. Progress in Probability and Statistics, vol. 8 (Birkhäuser Boston Inc., Boston, 1985)
5.
Zurück zum Zitat J.H. Bramble, Multigrid Methods, vol. 294 (Longman Scientific & Technical, Harlow, 1993) J.H. Bramble, Multigrid Methods, vol. 294 (Longman Scientific & Technical, Harlow, 1993)
6.
Zurück zum Zitat F. Cappello, Fault tolerance in petascale/exascale systems: current knowledge, challenges and research opportunities. Int. J. High Perform. Comput. Appl. 23 (3), 212–226 (2009)CrossRef F. Cappello, Fault tolerance in petascale/exascale systems: current knowledge, challenges and research opportunities. Int. J. High Perform. Comput. Appl. 23 (3), 212–226 (2009)CrossRef
7.
Zurück zum Zitat F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, M. Snir, Toward exascale resilience. Int. J. High Perform. Comput. Appl. 23, 374–388 (2009)CrossRef F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, M. Snir, Toward exascale resilience. Int. J. High Perform. Comput. Appl. 23, 374–388 (2009)CrossRef
8.
Zurück zum Zitat F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, Toward exascale resilience: 2014 update. Supercomput. Front. Innov. 1 (1), 5–28 (2014) F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, M. Snir, Toward exascale resilience: 2014 update. Supercomput. Front. Innov. 1 (1), 5–28 (2014)
9.
Zurück zum Zitat M. Casas, B.R. de Supinski, G. Bronevetsky, M. Schulz, Fault Resilience of the Algebraic Multi-grid Solver (ICS’12) (ACM, New York, 2012), pp. 91–100 M. Casas, B.R. de Supinski, G. Bronevetsky, M. Schulz, Fault Resilience of the Algebraic Multi-grid Solver (ICS’12) (ACM, New York, 2012), pp. 91–100
10.
Zurück zum Zitat A. Crisanti, G. Paladin, A. Vulpiani, Products of Random Matrices (Springer, Berlin/Heidelberg, 1993)CrossRefMATH A. Crisanti, G. Paladin, A. Vulpiani, Products of Random Matrices (Springer, Berlin/Heidelberg, 1993)CrossRefMATH
11.
Zurück zum Zitat T. Cui, J. Xu, C.-S. Zhang, An Error-Resilient Redundant Subspace Correction Method, ArXiv e-prints (2013) T. Cui, J. Xu, C.-S. Zhang, An Error-Resilient Redundant Subspace Correction Method, ArXiv e-prints (2013)
12.
Zurück zum Zitat J. Elliott, F. Mueller, M. Stoyanov, C.G. Webster, Quantifying the impact of single bit flips on floating point arithmetic, Technical report ORNL/TM-2013/282, Oak Ridge National Laboratory, 2013CrossRef J. Elliott, F. Mueller, M. Stoyanov, C.G. Webster, Quantifying the impact of single bit flips on floating point arithmetic, Technical report ORNL/TM-2013/282, Oak Ridge National Laboratory, 2013CrossRef
13.
Zurück zum Zitat M. Embree, L.N. Trefethen, Growth and decay of random Fibonacci sequences. Proc.: Math. Phys. Eng. Sci. 455 (1987), 2471–2485 (1999) (English) M. Embree, L.N. Trefethen, Growth and decay of random Fibonacci sequences. Proc.: Math. Phys. Eng. Sci. 455 (1987), 2471–2485 (1999) (English)
15.
Zurück zum Zitat W. Hackbusch, Multi-grid Methods and Applications, vol. 4 (Springer, Berlin, 1985)MATH W. Hackbusch, Multi-grid Methods and Applications, vol. 4 (Springer, Berlin, 1985)MATH
16.
Zurück zum Zitat W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations. Applied Mathematical Sciences, vol. 95 (Springer, New York, 1994). Translated and revised from the 1991 German original W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations. Applied Mathematical Sciences, vol. 95 (Springer, New York, 1994). Translated and revised from the 1991 German original
17.
Zurück zum Zitat T. Herault, Y. Robert, Fault-Tolerance Techniques for High-Performance Computing (Springer, Cham, 2015)CrossRefMATH T. Herault, Y. Robert, Fault-Tolerance Techniques for High-Performance Computing (Springer, Cham, 2015)CrossRefMATH
18.
Zurück zum Zitat K.-H. Huang, J. Abraham, Algorithm-based fault tolerance for matrix operations. IEEE Trans. Comput. 100 (6), 518–528 (1984)CrossRefMATH K.-H. Huang, J. Abraham, Algorithm-based fault tolerance for matrix operations. IEEE Trans. Comput. 100 (6), 518–528 (1984)CrossRefMATH
19.
Zurück zum Zitat M. Huber, B. Gmeiner, U. Rüde, B. Wohlmuth, Resilience for multigrid software at the extreme scale, arXiv preprint arXiv:1506.06185 (2015) M. Huber, B. Gmeiner, U. Rüde, B. Wohlmuth, Resilience for multigrid software at the extreme scale, arXiv preprint arXiv:1506.06185 (2015)
20.
Zurück zum Zitat R. Mainieri, Zeta function for the Lyapunov exponent of a product of random matrices. Phys. Rev. Lett. 68, 1965–1968 (1992)CrossRefMATH R. Mainieri, Zeta function for the Lyapunov exponent of a product of random matrices. Phys. Rev. Lett. 68, 1965–1968 (1992)CrossRefMATH
21.
Zurück zum Zitat S.F. McCormick, W.L. Briggs, V.E. Henson, A Multigrid Tutorial (SIAM, Philadelphia, 2000)MATH S.F. McCormick, W.L. Briggs, V.E. Henson, A Multigrid Tutorial (SIAM, Philadelphia, 2000)MATH
22.
Zurück zum Zitat M. Shantharam, S. Srinivasmurthy, P. Raghavan, Characterizing the Impact of Soft Errors on Iterative Methods in Scientific Computing (ICS’11) (ACM, New York, 2011), pp. 152–161 M. Shantharam, S. Srinivasmurthy, P. Raghavan, Characterizing the Impact of Soft Errors on Iterative Methods in Scientific Computing (ICS’11) (ACM, New York, 2011), pp. 152–161
23.
Zurück zum Zitat J. Sloan, R. Kumar, G. Bronevetsky, Algorithmic approaches to low overhead fault detection for sparse linear algebra, in 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Boston (IEEE, 2012), pp. 1–12 J. Sloan, R. Kumar, G. Bronevetsky, Algorithmic approaches to low overhead fault detection for sparse linear algebra, in 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Boston (IEEE, 2012), pp. 1–12
24.
Zurück zum Zitat M. Snir, R.W. Wisniewski, J.A. Abraham, S.V. Adve, S. Bagchi, P. Balaji, J. Belak, P. Bose, F. Cappello, B. Carlson, et al., Addressing failures in exascale computing. Int. J. High Perform. Comput. Appl. 28 (2), 129–173 (2014)CrossRef M. Snir, R.W. Wisniewski, J.A. Abraham, S.V. Adve, S. Bagchi, P. Balaji, J. Belak, P. Bose, F. Cappello, B. Carlson, et al., Addressing failures in exascale computing. Int. J. High Perform. Comput. Appl. 28 (2), 129–173 (2014)CrossRef
25.
Zurück zum Zitat M. Stoyanov, C. Webster, Numerical analysis of fixed point algorithms in the presence of hardware faults. SIAM J. Sci. Comput. 37 (5), C532–C553 (2015)MathSciNetCrossRefMATH M. Stoyanov, C. Webster, Numerical analysis of fixed point algorithms in the presence of hardware faults. SIAM J. Sci. Comput. 37 (5), C532–C553 (2015)MathSciNetCrossRefMATH
26.
Zurück zum Zitat U. Trottenberg, C.W. Oosterlee, A. Schüller, Multigrid (Academic Press Inc., San Diego, 2001). With contributions by A. Brandt, P. Oswald and K. Stüben U. Trottenberg, C.W. Oosterlee, A. Schüller, Multigrid (Academic Press Inc., San Diego, 2001). With contributions by A. Brandt, P. Oswald and K. Stüben
27.
Zurück zum Zitat J.N. Tsitsiklis, V.D. Blondel, The Lyapunov exponent and joint spectral radius of pairs of matrices are hard-when not impossible-to compute and to approximate. Math. Control Signals Syst. 10 (1), 31–40 (1997) (English)MathSciNetCrossRefMATH J.N. Tsitsiklis, V.D. Blondel, The Lyapunov exponent and joint spectral radius of pairs of matrices are hard-when not impossible-to compute and to approximate. Math. Control Signals Syst. 10 (1), 31–40 (1997) (English)MathSciNetCrossRefMATH
Metadaten
Titel
Multigrid at Scale?
verfasst von
Mark Ainsworth
Christian Glusa
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-39929-4_24