
2018 | OriginalPaper | Chapter

EXAHD: An Exa-Scalable Two-Level Sparse Grid Approach for Higher-Dimensional Problems in Plasma Physics and Beyond

Authors: Mario Heene, Alfredo Parra Hinojosa, Michael Obersteiner, Hans-Joachim Bungartz, Dirk Pflüger

Published in: High Performance Computing in Science and Engineering ’17

Publisher: Springer International Publishing


Abstract

Within the current reporting period (04/2016–04/2017) of our HLRS project, we have developed a scalable implementation of the fault-tolerant combination technique. Fault tolerance is one of the key topics in ongoing research on algorithms for future exascale systems. Our algorithms handle both hard and soft faults and enable the efficient, massively parallel solution of high-dimensional PDEs without the need for checkpointing or process replication. The research project EXAHD is part of the DFG priority program “Software for Exascale Computing” (SPPEXA). The project’s target application is the large-scale simulation of plasma turbulence with the code GENE. This report combines parts of three publications.


Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-68394-2_31
