1 Introduction and Motivation
2 Background
2.1 Discontinuous Galerkin Methods (DG)
2.2 Smoothness-Increasing Accuracy-Conserving (SIAC) Filtering
3 Theoretical Kernel Scaling
4 Kernel Scaling Results
4.1 Efficient Computing Using GPUs
| Test case | CPU wall-clock time (m=1.0) | GPU wall-clock time (m=1.0) | GPU wall-clock time (m=2.0) | GPU wall-clock time (m=3.0) |
|---|---|---|---|---|
| P2-20² | 5815.7 | 529.136 | 436.728 | 623.297 |
| P2-40² | 23811.6 | 743.865 | 889.147 | 1440.91 |
| P2-80² | 98103.7 | 1749.58 | 3141.44 | 5361.63 |
| P3-20² | 26518.4 | 924.4 | 1438.82 | 2399.03 |
| P3-40² | 109256.2 | 2164.72 | 3937.15 | 6752.97 |
| P3-80² | 446857.9 | 7447.4 | 15399.6 | 26756.2 |
| P4-40² | 343438.2 | 6904.21 | 14178 | 25136.2 |
| P4-80² | 1396076.3 | 26744 | 56831.8 | 101341 |
| P4-160² | 5723912.9 | 106045 | 227221 | 404746 |
- There is a fixed overhead associated with running a process on the GPU. At lower mesh resolutions this overhead gives a false impression of how computation time scales. The overhead is hardware- and implementation-dependent. In this case, the trend becomes clearer for computations that take at least five seconds to complete on the GPU.
- The majority of the computation time was spent on memory access. As the footprint of the kernel increases, more neighboring data must be accessed to compute the post-processed solution at a given evaluation point. Judicious memory layout patterns were key to achieving significantly improved performance on the GPU.
- As the kernel spacing increases, the GPU wall-clock time increases. This is a consequence of the wider kernel footprint induced by the larger scaling factor. Because more of the elements surrounding an element are needed to generate the post-processed solution for that element, the number of floating-point operations increases, and with it the total wall-clock time.
- The kernel spacing also dictates how much data from the region surrounding an element is needed for the post-processing. This increased memory usage decreases the computational efficiency per core, since more loads and stores are required to carry out the computation on a GPU core.
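A rough way to read the timing table is to compute the CPU-to-GPU speedup at m=1 and the growth in GPU time from one mesh to the next finer one. The sketch below does this with the tabulated values; the dictionary layout and the function names `speedup` and `growth` are illustrative and not taken from the original implementation. Since each refinement (for example 20² to 40²) quadruples the number of elements, a growth factor near 4 suggests that the fixed overhead no longer dominates.

```python
# Wall-clock times copied from the timing table (units as reported there).
# Key: (polynomial degree, elements per direction).
# Value: (CPU m=1, GPU m=1, GPU m=2, GPU m=3).
TIMINGS = {
    (2, 20): (5815.7, 529.136, 436.728, 623.297),
    (2, 40): (23811.6, 743.865, 889.147, 1440.91),
    (2, 80): (98103.7, 1749.58, 3141.44, 5361.63),
    (3, 20): (26518.4, 924.4, 1438.82, 2399.03),
    (3, 40): (109256.2, 2164.72, 3937.15, 6752.97),
    (3, 80): (446857.9, 7447.4, 15399.6, 26756.2),
    (4, 40): (343438.2, 6904.21, 14178.0, 25136.2),
    (4, 80): (1396076.3, 26744.0, 56831.8, 101341.0),
    (4, 160): (5723912.9, 106045.0, 227221.0, 404746.0),
}


def speedup(case):
    """CPU time divided by GPU time, both at m=1."""
    cpu, gpu_m1 = TIMINGS[case][:2]
    return cpu / gpu_m1


def growth(coarse, fine, column):
    """Ratio of the times in one column between a fine and a coarse mesh."""
    return TIMINGS[fine][column] / TIMINGS[coarse][column]


for degree, n in TIMINGS:
    line = f"P{degree}-{n}^2: speedup {speedup((degree, n)):5.1f}x"
    finer = (degree, 2 * n)
    if finer in TIMINGS:
        # Column 1 is the GPU time at m=1.
        line += f", GPU growth to {2 * n}^2: {growth((degree, n), finer, 1):4.2f}"
    print(line)
```

With these numbers the m=1 GPU time grows by a factor of only about 1.4 from P2-20² to P2-40², but by about 3.97 from P4-80² to P4-160², which is consistent with the remark on fixed overhead above.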
4.2 Numerical Results
4.2.1 Uniform Quadrilateral Mesh
| Mesh | m=0.5: L2 error | m=0.5: Order | m=0.75: L2 error | m=0.75: Order | m=1: L2 error | m=1: Order | m=2: L2 error | m=2: Order | m=3: L2 error | m=3: Order |
|---|---|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | | | |
| 20² | 2.95e–05 | – | 2.41e–06 | – | 4.48e–06 | – | 2.69e–04 | – | x | – |
| 40² | 3.75e–06 | 2.98 | 2.53e–07 | 3.25 | 7.09e–08 | 5.98 | 4.45e–06 | 5.92 | 4.96e–05 | – |
| 80² | 4.7e–07 | 2.99 | 3.06e–08 | 3.05 | 1.11e–09 | 5.99 | 7.06e–08 | 5.98 | 7.99e–07 | 5.95 |
| 160² | 5.88e–08 | 3 | 3.79e–09 | 3.01 | 1.74e–11 | 6 | 1.11e–09 | 5.99 | 1.26e–08 | 5.99 |
| ℙ3 | | | | | | | | | | |
| 20² | 1.99e–07 | – | 1.32e–08 | – | 1.38e–07 | – | 3.22e–05 | – | x | – |
| 40² | 1.27e–08 | 3.97 | 1.55e–10 | 6.4 | 5.49e–10 | 7.97 | 1.38e–07 | 7.87 | 3.4e–06 | – |
| 80² | 7.97e–10 | 3.99 | 8.05e–12 | 4.27 | 2.16e–12 | 7.99 | 5.49e–10 | 7.97 | 1.4e–08 | 7.93 |
| 160² | 4.99e–11 | 4 | 4.52e–13 | 4.15 | 9.1e–15 | 7.89 | 2.16e–12 | 7.99 | 5.52e–11 | 7.98 |
| ℙ4 | | | | | | | | | | |
| 40² | 3.36e–11 | – | 1.11e–12 | – | 4.41e–12 | – | 4.38e–09 | – | 2.4e–07 | – |
| 80² | 1.06e–12 | 4.99 | 3.41e–14 | 5.02 | 3.19e–15 | 10.4 | 4.41e–12 | 9.96 | 2.51e–10 | 9.9 |
| 160² | 3.37e–14 | 4.97 | 2.69e–15 | 3.67 | 2.38e–15 | 0.421 | 3.79e–15 | 10.2 | 2.48e–13 | 9.98 |
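The "Order" columns in these tables are consistent with the usual observed convergence order between two successive meshes, log(e_coarse / e_fine) / log(2), since each refinement halves the mesh spacing. The following is a minimal sketch of that calculation using the ℙ2, m=0.5 errors from the table above; the function name `observed_order` is illustrative, and the formula is assumed to be what was used rather than confirmed by the text.

```python
import math


def observed_order(err_coarse, err_fine, refinement=2.0):
    """Observed convergence order between two meshes whose spacing
    differs by the factor `refinement` (2 when the mesh is halved)."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


# L2 errors for P2 with m=0.5 on the 20^2, 40^2, 80^2 and 160^2 meshes.
errors = [2.95e-05, 3.75e-06, 4.7e-07, 5.88e-08]

# One order per pair of successive meshes.
orders = [observed_order(coarse, fine) for coarse, fine in zip(errors, errors[1:])]
for n, order in zip((40, 80, 160), orders):
    print(f"{n}^2: order {order:.2f}")
```

Because the tabulated errors are rounded to three significant digits, the recomputed orders can differ from the tabulated ones in the last digit. The same formula also gives very small orders once both errors are near 1e–15, as in the ℙ4 rows.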
4.2.2 Quadrilateral Cross Mesh
| Mesh | m=0.25: L2 error | m=0.25: Order | m=0.5: L2 error | m=0.5: Order | m=1: L2 error | m=1: Order | m=1.5: L2 error | m=1.5: Order | m=2: L2 error | m=2: Order |
|---|---|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | | | |
| 20² | 4.28e–04 | – | 1.5e–04 | – | 3.19e–04 | – | x | – | x | – |
| 40² | 4.96e–05 | 3.11 | 1.83e–05 | 3.03 | 6.04e–06 | 5.72 | 5.11e–05 | – | 2.71e–04 | – |
| 80² | 6.04e–06 | 3.04 | 2.34e–06 | 2.97 | 1.28e–07 | 5.56 | 8.52e–07 | 5.91 | 4.5e–06 | 5.91 |
| 160² | 7.49e–07 | 3.01 | 2.95e–07 | 2.98 | 4.74e–09 | 4.75 | 1.77e–08 | 5.59 | 7.21e–08 | 5.96 |
| ℙ3 | | | | | | | | | | |
| 20² | 4.06e–06 | – | 1.91e–06 | – | 3.23e–05 | – | x | – | x | – |
| 40² | 1.6e–07 | 4.66 | 1.24e–07 | 3.95 | 1.38e–07 | 7.87 | 3.4e–06 | – | 3.22e–05 | – |
| 80² | 6.98e–09 | 4.52 | 7.92e–09 | 3.96 | 5.72e–10 | 7.92 | 1.4e–08 | 7.93 | 1.38e–07 | 7.87 |
| 160² | 3.56e–10 | 4.3 | 5e–10 | 3.99 | 3.75e–12 | 7.25 | 5.56e–11 | 7.97 | 5.5e–10 | 7.97 |
| ℙ4 | | | | | | | | | | |
| 40² | 7.74e–09 | – | 6.61e–10 | – | 4.38e–09 | – | 2.4e–07 | – | x | – |
| 80² | 2.39e–10 | 5.02 | 2.19e–11 | 4.91 | 7.87e–12 | 9.12 | 2.51e–10 | 9.9 | 4.38e–09 | – |
| 160² | 7.45e–12 | 5.01 | 7.6e–13 | 4.85 | 3.71e–13 | 4.4 | 4.23e–13 | 9.21 | 4.38e–12 | 9.96 |
4.2.3 Structured Triangular Mesh
| m | ℙ2 | ℙ3 | ℙ4 |
|---|---|---|---|
| 0.5 | 3.5/N | 5/N | 7.5/N |
| 1 | 7/N | 10/N | 13/N |
| 2 | 14/N | 20/N | 26/N |
| 3 | 21/N | 30/N | 39/N |
| 4 | 28/N | 40/N | 52/N |
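All but one of the entries in the table above match c/N with c = (3k+1)m for polynomial degree k. This is the support width of a kernel built from 2k+1 B-splines of order k+1 on a mesh with spacing h = 1/N, scaled by m. Reading the table as kernel footprint widths is an assumption here, since its caption is not part of this excerpt. The exception is the ℙ4 entry at m=0.5, which the table lists as 7.5/N while this formula gives 6.5/N. A minimal sketch, with the hypothetical helper `footprint_coefficient`:

```python
def footprint_coefficient(degree, m):
    """Coefficient c in the assumed footprint width c/N,
    with c = (3*degree + 1) * m."""
    return (3 * degree + 1) * m


# Reproduce the table layout: one row per scaling m, one column per degree.
for m in (0.5, 1, 2, 3, 4):
    row = [f"{footprint_coefficient(k, m):g}/N" for k in (2, 3, 4)]
    print(f"m={m:g}: " + ", ".join(row))
```

Under this reading, doubling m doubles the footprint width in each direction, which is in line with the growth in GPU wall-clock time with m reported in Section 4.1.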
| Mesh | m=0.5: L2 error | m=0.5: Order | m=1: L2 error | m=1: Order | m=2: L2 error | m=2: Order | m=3: L2 error | m=3: Order | m=4: L2 error | m=4: Order |
|---|---|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | | | |
| 20² | 1.01e–04 | – | 4.65e–06 | – | 2.69e–04 | – | x | – | x | – |
| 40² | 1.25e–05 | 3.01 | 7.50e–08 | 5.95 | 4.46e–06 | 5.92 | 4.96e–05 | – | 2.69e–04 | – |
| 80² | 1.57e–06 | 3.00 | 1.26e–09 | 5.89 | 7.06e–08 | 5.98 | 7.99e–07 | 5.96 | 4.45e–06 | 5.92 |
| 160² | 2.28e–07 | 2.78 | 1.05e–09 | 0.259 | 2.14e–09 | 5.04 | 1.36e–08 | 5.87 | 7.16e–08 | 5.96 |
| ℙ3 | | | | | | | | | | |
| 20² | 1.49e–06 | – | 1.38e–07 | – | 2.21e–04 | – | x | – | x | – |
| 40² | 9.15e–08 | 4.02 | 5.50e–10 | 7.97 | 1.38e–07 | 10.65 | 3.40e–06 | – | 4.53e–05 | – |
| 80² | 5.70e–09 | 4.01 | 2.16e–12 | 7.99 | 5.49e–10 | 7.97 | 1.40e–08 | 7.93 | 1.38e–07 | 8.36 |
| 160² | 3.52e–10 | 4.02 | 1.64e–13 | 3.72 | 2.2e–12 | 7.97 | 5.52e–11 | 7.98 | 5.49e–10 | 7.97 |
| ℙ4 | | | | | | | | | | |
| 40² | 7.26e–10 | – | 4.41e–12 | – | 4.38e–09 | – | 3.47e–07 | – | x | – |
| 80² | 2.24e–11 | 5.02 | 1.75e–14 | 7.98 | 4.41e–12 | 9.96 | 2.51e–10 | 10.43 | 4.38e–09 | – |
| 160² | 5.81e–13 | 5.27 | 2.38e–13 | −3.77 | 2.37e–13 | 4.22 | 3.53e–13 | 9.47 | 4.42e–12 | 9.95 |
4.2.4 Union-Jack Mesh
| Mesh | m=0.5: L2 error | m=0.5: Order | m=1: L2 error | m=1: Order | m=1.5: L2 error | m=1.5: Order | m=2: L2 error | m=2: Order |
|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | |
| 20² | 2.88e–05 | – | 2.87e–04 | – | 1.98e–03 | – | 7.92e–02 | – |
| 40² | 2.39e–06 | 3.59 | 5.01e–06 | 5.84 | 5.01e–05 | 5.31 | 2.70e–04 | 8.19 |
| 80² | 2.92e–07 | 3.03 | 8.81e–08 | 5.83 | 8.17e–07 | 5.94 | 4.47e–06 | 5.92 |
| 160² | 3.66e–08 | 3 | 1.65e–09 | 5.74 | 1.31e–08 | 5.96 | 7.11e–08 | 5.97 |
| ℙ3 | | | | | | | | |
| 20² | 2.39e–07 | – | 2.23e–04 | – | 8.64e–02 | – | 6.90e–02 | – |
| 40² | 9.97e–09 | 4.59 | 1.38e–07 | 10.66 | 3.40e–06 | 14.63 | 4.54e–05 | 13.89 |
| 80² | 5.89e–10 | 4.09 | 5.51e–10 | 7.97 | 1.39e–08 | 7.93 | 1.38e–07 | 8.37 |
| 160² | 3.65e–11 | 4.01 | 2.2e–12 | 7.97 | 5.52e–11 | 7.98 | 5.5e–10 | 7.97 |
| ℙ4 | | | | | | | | |
| 40² | 1.84e–09 | – | 4.77e–09 | – | 3.49e–07 | – | 2.72e–03 | – |
| 80² | 3.11e–12 | 9.21 | 5.34e–12 | 9.80 | 2.51e–10 | 10.44 | 4.38e–09 | 19.24 |
| 160² | 2.35e–13 | 3.73 | 2.34e–13 | 4.52 | 3.51e–13 | 9.48 | 4.42e–12 | 9.95 |
4.2.5 Chevron Mesh
| Mesh | m=0.25: L2 error | m=0.25: Order | m=0.5: L2 error | m=0.5: Order | m=1: L2 error | m=1: Order | m=1.5: L2 error | m=1.5: Order | m=2: L2 error | m=2: Order |
|---|---|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | | | |
| 20² | 7.22e–05 | – | 4.06e–05 | – | 3.05e–04 | – | x | – | x | – |
| 40² | 7.42e–06 | 3.28 | 1.46e–06 | 4.79 | 5.57e–06 | 5.77 | 5.07e–05 | – | 2.71e–04 | – |
| 80² | 9.05e–07 | 3.04 | 1.09e–07 | 3.74 | 1.06e–07 | 5.72 | 8.34e–07 | 5.93 | 4.49e–06 | 5.91 |
| 160² | 1.13e–07 | 3.0 | 1.29e–08 | 3.09 | 2.2e–09 | 5.58 | 1.37e–08 | 5.93 | 7.17e–08 | 5.97 |
| ℙ3 | | | | | | | | | | |
| 20² | 1.03e–06 | – | 1.96e–07 | – | 2.23e–04 | – | x | – | x | – |
| 40² | 6.35e–08 | 4.02 | 2.75e–09 | 6.16 | 1.38e–07 | 10.66 | 3.40e–06 | – | 4.54e–05 | – |
| 80² | 3.97e–09 | 3.99 | 2.82e–10 | 3.29 | 6.00e–10 | 7.85 | 1.40e–08 | 7.93 | 1.38e–07 | 8.37 |
| 160² | 2.48e–10 | 4.0 | 1.17e–11 | 4.59 | 6.27e–12 | 6.58 | 5.56e–11 | 7.97 | 5.5e–10 | 7.97 |
| ℙ4 | | | | | | | | | | |
| 40² | 5.32e–10 | – | 9.49e–11 | – | 4.38e–09 | – | 3.48e–07 | – | x | – |
| 80² | 1.64e–11 | 5.02 | 6.01e–12 | 3.98 | 7.46e–12 | 9.20 | 2.51e–10 | 10.44 | 4.38e–09 | – |
| 160² | 5.97e–13 | 4.78 | 4.67e–13 | 3.68 | 4.67e–13 | 4.0 | 5.38e–13 | 8.87 | 4.45e–12 | 9.94 |
4.2.6 Hexahedral Mesh
| Test case | Original DG error | Order | m=0.5: L2 error | m=0.5: Order | m=1: L2 error | m=1: Order | m=1.5: L2 error | m=1.5: Order |
|---|---|---|---|---|---|---|---|---|
| ℙ2 | | | | | | | | |
| 20² | 1.82e–04 | – | 4.22e–05 | – | 6.71e–06 | – | 7.44e–05 | – |
| 40² | 2.28e–05 | 2.99 | 5.36e–06 | 2.97 | 1.06e–07 | 5.98 | 1.21e–06 | 5.94 |
| ℙ3 | | | | | | | | |
| 20² | 3.17e–06 | – | 1.57e–07 | – | 2.06e–07 | – | 5.09e–06 | – |
| 40² | 1.98e–07 | 3.99 | 1.00e–08 | 3.97 | 8.24e–10 | 7.97 | 2.13e–08 | 7.90 |
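One way to compare the columns of the hexahedral table is the ratio of the original DG error to the filtered error for each scaling m; a ratio above 1 means the filter reduced the error. The sketch below computes these ratios from the tabulated values. The data layout and the function names `reduction_factors` and `best_scaling` are illustrative, not part of the original work.

```python
# Key: (polynomial degree, leading number of the mesh label).
# Value: (original DG error, {m: filtered L2 error}), copied from the table.
HEX_ERRORS = {
    (2, 20): (1.82e-04, {0.5: 4.22e-05, 1.0: 6.71e-06, 1.5: 7.44e-05}),
    (2, 40): (2.28e-05, {0.5: 5.36e-06, 1.0: 1.06e-07, 1.5: 1.21e-06}),
    (3, 20): (3.17e-06, {0.5: 1.57e-07, 1.0: 2.06e-07, 1.5: 5.09e-06}),
    (3, 40): (1.98e-07, {0.5: 1.00e-08, 1.0: 8.24e-10, 1.5: 2.13e-08}),
}


def reduction_factors(case):
    """DG error divided by the filtered error, for each scaling m."""
    dg_error, filtered = HEX_ERRORS[case]
    return {m: dg_error / err for m, err in filtered.items()}


def best_scaling(case):
    """Scaling m with the smallest filtered error among the tabulated ones."""
    _, filtered = HEX_ERRORS[case]
    return min(filtered, key=filtered.get)


for case in HEX_ERRORS:
    factors = ", ".join(f"m={m}: {r:.3g}" for m, r in reduction_factors(case).items())
    print(f"P{case[0]}, mesh {case[1]}: best m={best_scaling(case)} ({factors})")
```

For the tabulated cases, m=1 gives the smallest error except for ℙ3 on the coarser mesh, where m=0.5 is smallest and m=1.5 yields an error larger than the unfiltered DG error.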