1 Introduction and Motivation
A uniform discretization of the mantle at a resolution of, for instance, 1 km would result in meshes with nearly a trillion elements, which is far beyond the capacity of the largest available supercomputers.
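The element count quoted above can be checked with a rough back-of-the-envelope estimate. The sketch below is only illustrative: it assumes commonly quoted values for the mean Earth radius (6371 km) and the core-mantle boundary radius (3480 km), neither of which is stated in the text, and it idealizes the mesh as cube-shaped cells with an edge length equal to the resolution.

```python
import math

# Assumed radii in km (commonly quoted values, not taken from the text).
R_SURFACE_KM = 6371.0  # mean Earth radius
R_CMB_KM = 3480.0      # radius of the core-mantle boundary


def mantle_volume_km3(r_outer=R_SURFACE_KM, r_inner=R_CMB_KM):
    """Volume of the spherical shell between r_inner and r_outer in km^3."""
    return 4.0 / 3.0 * math.pi * (r_outer**3 - r_inner**3)


def uniform_cell_count(resolution_km):
    """Estimated number of cube-shaped cells of the given edge length in the shell."""
    return mantle_volume_km3() / resolution_km**3


n_cells = uniform_cell_count(1.0)
print(f"cells at 1 km resolution: {n_cells:.2e}")
```

Under these assumptions the estimate lands just below 10¹² cells, consistent with "nearly a trillion elements". Because the count scales with the inverse cube of the resolution, halving the cell size multiplies it by eight.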
2 Basic Ideas and Concepts
Symbol | Meaning |
---|---|
R₀ | Earth’s radius |
ρ₀ | Reference density |
g | Gravitational acceleration |
η | Dynamic viscosity |
α | Thermal expansivity |
κ | Heat conductivity |
ΔT | Temperature difference between core-mantle boundary and surface |
3 Summary of Project Results
3.1 Efficiency of Solvers and Software
3.2 Reducing Complexity in Models and Algorithms
3.3 Stokes Solvers and Performance
3.3.1 Multigrid Approaches for the Stokes System
System | SuperMUC Phase 1 (Thin nodes) | SuperMUC Phase 2 | Juqueen | Hazel Hen |
---|---|---|---|---|
Operation | 2012–2018 | 2015–present | 2012–2018 | 2015–present |
# nodes | 9216 | 3072 | 28,672 | 7712 |
CPU | Intel Sandy Bridge | Intel Haswell | IBM | Intel Haswell |
CPU model | E5-2580 | E5-2697v3 | PowerPC A2 | E5-2680v3 |
Cores per CPU | 8 | 14 | 16 | 12 |
CPU frequency (GHz) | 2.7 | 2.6 | 1.6 | 2.5 |
# total cores | 147,456 | 80,016 | 458,752 | 185,088 |
Interconnect | Infiniband FDR10 | Infiniband FDR14 | 5D Torus | Aries |
Total memory (TByte) | 288 | 194 | 448 | 987 |
Linpack (PFlop/s) | 2.897 | 2.814 | 5.0 | 7.42 |
Nodes | Threads | DoFs | Iter | Time [s] | Time [s] w/o coarse grid solver |
---|---|---|---|---|---|
5 | 80 | 2.7 × 10⁹ | 10 | 685.88 | 678.77 |
40 | 640 | 2.1 × 10¹⁰ | 10 | 703.69 | 686.24 |
320 | 5120 | 1.2 × 10¹¹ | 10 | 741.86 | 709.88 |
2560 | 40,960 | 1.7 × 10¹² | 9 | 720.24 | 671.63 |
20,480 | 327,680 | 1.1 × 10¹³ | 9 | 776.09 | 681.91 |
3.3.2 Smoothers for Indefinite Systems
DoFs | Iter (ν = 4) | Time [s] (ν = 4) | Iter (ν = 6) | Time [s] (ν = 6) | Iter (ν = 8) | Time [s] (ν = 8) |
---|---|---|---|---|---|---|
1.4 × 10³ | 13 | 0.10 | 9 | 0.07 | 7 | 0.06 |
1.4 × 10⁴ | 12 | 0.21 | 9 | 0.18 | 7 | 0.15 |
1.2 × 10⁵ | 12 | 0.61 | 8 | 0.51 | 6 | 0.44 |
1.0 × 10⁶ | 11 | 2.44 | 8 | 2.33 | 6 | 2.16 |
8.2 × 10⁶ | 11 | 14.54 | 8 | 14.58 | 6 | 14.03 |
6.6 × 10⁷ | 11 | 102.66 | 7 | 92.09 | 6 | 101.90 |
5.3 × 10⁸ | 10 | 700.38 | 7 | 693.75 | 6 | 769.34 |
3.3.3 Multigrid Coarse Grid Solvers
np | DoF | Red. | T[s] | Coarse | Par. eff |
---|---|---|---|---|---|
30 | 8.3 × 10⁷ | 1 | 16.284 | 0.043 | 1.00 |
120 | 3.3 × 10⁸ | 1 | 16.426 | 0.050 | 0.99 |
960 | 2.6 × 10⁹ | 1 | 17.084 | 0.171 | 0.95 |
7680 | 2.4 × 10¹⁰ | 1 | 17.310 | 0.382 | 0.94 |
61,440 | 1.7 × 10¹¹ | 8 | 17.704 | 0.877 | 0.92 |
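
The "Par. eff" column of the table above can be reproduced from the run times alone. The following sketch assumes the usual weak-scaling definition, namely the run time on the smallest configuration divided by the run time on the larger one. This definition is not spelled out in the table itself, but it matches the listed values.

```python
# (number of processes np, total time T[s]) copied from the table above.
runs = [
    (30, 16.284),
    (120, 16.426),
    (960, 17.084),
    (7680, 17.310),
    (61440, 17.704),
]


def weak_scaling_efficiency(measurements):
    """Return (np, efficiency) pairs, with efficiency = T_smallest / T_np.

    Assumes the first entry is the reference (smallest) run.
    """
    reference_time = measurements[0][1]
    return [(n, reference_time / t) for n, t in measurements]


efficiencies = weak_scaling_efficiency(runs)
for n, eff in efficiencies:
    print(f"{n:>6d} processes: parallel efficiency {eff:.2f}")
```

With ideal weak scaling the run time would stay constant as the problem size and process count grow together, so the efficiency would remain at 1.00.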
Proc. | DoF (fine) | Iter | Time [s] (total) | Time [s] (fine) | BLR ε | Time [s] (coarse) | Time [s] (ana. & fac.) | Par. eff. |
---|---|---|---|---|---|---|---|---|
1920 | 2.10 × 10¹⁰ | 15 | 78.1 | 77.9 | 10⁻³ | 0.03 | 2.7 | 1.00 |
15,360 | 4.30 × 10¹⁰ | 13 | 88.9 | 86.8 | 10⁻³ | 0.22 | 25.0 | 0.93 |
43,200 | 1.70 × 10¹¹ | 14 | 95.5 | 87.0 | 10⁻⁸ | 0.59 | 111.6 | 0.82 |
3.4 Multi-Level Monte Carlo
Level | LeSyHom time [s] | DyLeSyHom time [s] | Ratio |
---|---|---|---|
0 | 500 | 460 | 0.92 |
1 | 1512 | 1347 | 0.89 |
2 | 5885 | 5596 | 0.95 |
Total | 7897 | 7403 | 0.94 |
3.5 Inverse Problem and Adjoint Computations
3.5.1 Twin Experiment
3.6 Matrix-Free Algorithms
3.6.1 Matrix-Free Approaches Based on Surrogate Polynomials
Level | e_IFEM | e_LSQP (q = 2) | ρ_IFEM / ρ_LSQP (q = 2) | e_LSQP (q = 3) | ρ_IFEM / ρ_LSQP (q = 3) |
---|---|---|---|---|---|
ℓ = 1 | 7.50 × 10⁻⁵ | 7.50 × 10⁻⁵ | 1.00 | 7.50 × 10⁻⁵ | 1.00 |
ℓ = 2 | 1.86 × 10⁻⁵ | 1.86 × 10⁻⁵ | 1.00 | 1.86 × 10⁻⁵ | 1.00 |
ℓ = 3 | 4.64 × 10⁻⁶ | 4.67 × 10⁻⁶ | 1.00 | 4.64 × 10⁻⁶ | 1.00 |
ℓ = 4 | 1.16 × 10⁻⁶ | 1.41 × 10⁻⁶ | 1.00 | 1.16 × 10⁻⁶ | 1.00 |
ℓ = 5 | 2.89 × 10⁻⁷ | 9.87 × 10⁻⁷ | 1.00 | 3.10 × 10⁻⁷ | 1.00 |
Islands | Cores | DoFs | Global resolution | # UMG V-cycles | Time UMG cycle w/ c.g. | Time UMG cycle w/o c.g. | Parallel efficiency | Time residual |
---|---|---|---|---|---|---|---|---|
1 | 5580 | 1.3 × 10¹¹ | 3.4 km | 7 (50/150) | 192 s | 164 s | 1.00 / 1.00 | 11.9 s |
2 | 12,000 | 2.7 × 10¹¹ | 2.8 km | 10 (100/150) | 213 s | 169 s | 0.90 / 0.97 | 12.1 s |
4 | 21,600 | 4.8 × 10¹¹ | 2.3 km | 7 (50/250) | 210 s | 172 s | 0.92 / 0.96 | 12.7 s |
8 | 47,250 | 1.1 × 10¹² | 1.7 km | 8 (50/350) | 230 s | 173 s | 0.83 / 0.95 | 12.8 s |
3.6.2 A Stencil Scaling Approach for Accelerating Matrix-Free Finite Element Implementations
DoFs | Nodal integration: Error | Nodal integration: eoc | Nodal integration: ρ | Nodal integration: tts [s] | Scale Vol+Face: Error | Scale Vol+Face: eoc | Scale Vol+Face: ρ | Scale Vol+Face: tts [s] | Rel. tts |
---|---|---|---|---|---|---|---|---|---|
4.7 × 10⁶ | 2.43 × 10⁻⁴ | – | 0.522 | 2.5 | 2.38 × 10⁻⁴ | – | 0.522 | 2.0 | 0.80 |
3.8 × 10⁷ | 6.00 × 10⁻⁵ | 2.02 | 0.536 | 4.2 | 5.86 × 10⁻⁵ | 2.02 | 0.536 | 2.6 | 0.61 |
3.1 × 10⁸ | 1.49 × 10⁻⁵ | 2.01 | 0.539 | 12.0 | 1.46 × 10⁻⁵ | 2.01 | 0.539 | 4.5 | 0.37 |
2.5 × 10⁹ | 3.72 × 10⁻⁶ | 2.00 | 0.538 | 53.9 | 3.63 × 10⁻⁶ | 2.00 | 0.538 | 15.3 | 0.28 |
2.0 × 10¹⁰ | 9.28 × 10⁻⁷ | 2.00 | 0.536 | 307.2 | 9.06 × 10⁻⁷ | 2.00 | 0.536 | 88.9 | 0.29 |
1.6 × 10¹¹ | 2.32 × 10⁻⁷ | 2.00 | 0.534 | 1822.2 | 2.26 × 10⁻⁷ | 2.00 | 0.534 | 589.6 | 0.32 |
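
The "eoc" (estimated order of convergence) entries in the table above can be recomputed from the errors on consecutive refinement levels. The sketch below assumes that the mesh width is halved from one level to the next. The table does not state this, but it is consistent with the DoF count growing by roughly a factor of eight per level in three dimensions. Only the nodal integration errors are used; since they are printed to three significant digits, recomputed values can differ from tabulated ones in the last digit.

```python
import math

# Nodal integration errors per refinement level, copied from the table above.
errors_nodal = [2.43e-4, 6.00e-5, 1.49e-5, 3.72e-6, 9.28e-7, 2.32e-7]


def estimated_order_of_convergence(errors, refinement_factor=2.0):
    """Return log(e_coarse / e_fine) / log(refinement_factor) for consecutive levels.

    refinement_factor is the assumed ratio of mesh widths between two levels.
    """
    return [
        math.log(coarse / fine) / math.log(refinement_factor)
        for coarse, fine in zip(errors, errors[1:])
    ]


eoc_nodal = estimated_order_of_convergence(errors_nodal)
for level, order in enumerate(eoc_nodal, start=2):
    print(f"level {level}: eoc = {order:.2f}")
```

An order close to 2 means that the error drops by about a factor of four each time the mesh width is halved.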
3.6.3 Stencil Scaling for Vector-Valued PDEs with Applications to Generalized Newtonian Fluids
Parameter | η₀ | η∞ | κ | r |
---|---|---|---|---|
Value | 140.764 | 1.0 | 212.2 | −0.325 |
Viscosity (physical) | Viscosity (unphysical) | Velocity (physical) | Velocity (unphysical) |
---|---|---|---|
8.88 × 10⁻⁴ | 1.07 × 10⁻¹ | 5.65 × 10⁻⁵ | 3.19 × 10⁻² |
DoFs | Nodal integration tts [s] | Physical scaling tts [s] | Relative tts |
---|---|---|---|
4.69 × 10⁶ | 309.10 | 364.97 | 1.18 |
3.82 × 10⁷ | 361.90 | 412.10 | 1.14 |
3.08 × 10⁸ | 895.18 | 719.55 | 0.80 |
2.47 × 10⁹ | 3227.45 | 2626.13 | 0.81 |