1 Introduction
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\pm \Delta \tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|---|
\(^*1\) | 1092139.0 | 0.0 | 0.0 | 0.000 |
4 | 243000.0 | 14550.0 | 2215.3 | 0.060 |
8 | 115000.0 | 5293.8 | 1557.3 | 0.046 |
16 | 75300.0 | 6988.1 | 2260.3 | 0.093 |
32 | 35200.0 | 4111.6 | 1184.4 | 0.117 |
48 | 27500.0 | 6202.5 | 1735.2 | 0.226 |
64 | 19300.0 | 3571.2 | 928.4 | 0.185 |
96 | 13725.0 | 3525.9 | 2313.6 | 0.257 |
128 | 11300.0 | 3400.8 | 668.8 | 0.301 |
160 | 9210.0 | 2963.2 | 434.2 | 0.322 |
192 | 6970.0 | 1749.6 | 360.2 | 0.251 |
256 | 5670.0 | 1738.2 | 309.8 | 0.307 |
320 | 4680.0 | 1455.4 | 294.4 | 0.311 |
384 | 3920.0 | 1204.7 | 231.6 | 0.307 |
640 | 2900.0 | 1213.8 | 185.0 | 0.419 |
768 | 2450.0 | 995.3 | 170.0 | 0.406 |
896 | 2390.0 | 1112.4 | 172.0 | 0.465 |
1024 | 2090.0 | 960.0 | 145.2 | 0.459 |
1296 | 1670.0 | 707.5 | 140.3 | 0.424 |
1520 | 1870.3 | 1084.4 | 122.5 | 0.580 |
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\pm \Delta \tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|---|
1 | 5317.0 | 0.0 | 0.0 | 0.000 |
2 | 2690.0 | 39.5 | 15.3 | 0.015 |
4 | 1380.0 | 50.7 | 19.1 | 0.037 |
8 | 730.0 | 43.5 | 10.4 | 0.060 |
16 | 388.0 | 35.6 | 1.4 | 0.092 |
32 | 251.0 | 68.0 | 55.3 | 0.271 |
48 | 146.0 | 26.6 | 17.6 | 0.182 |
64 | 134.0 | 42.9 | 28.3 | 0.320 |
80 | 105.0 | 30.4 | 15.9 | 0.290 |
96 | 121.0 | 52.9 | 12.1 | 0.437 |
112 | 99.1 | 44.6 | 16.5 | 0.451 |
128 | 78.7 | 30.8 | 14.3 | 0.391 |
160 | 66.6 | 27.3 | 8.3 | 0.410 |
192 | 98.4 | 61.3 | 6.0 | 0.623 |
224 | 54.0 | 25.4 | 8.3 | 0.470 |
256 | 53.7 | 27.9 | 7.5 | 0.519 |
288 | 116.0 | 76.0 | 10.5 | 0.655 |
320 | 63.1 | 40.2 | 5.1 | 0.638 |
336 | 45.6 | 23.8 | 3.5 | 0.522 |
384 | 47.0 | 27.6 | 3.3 | 0.588 |
416 | 99.4 | 80.6 | 4.1 | 0.811 |
448 | 44.6 | 27.0 | 5.1 | 0.606 |
512 | 88.1 | 73.5 | 3.5 | 0.834 |
592 | 43.7 | 29.7 | 1.7 | 0.679 |
688 | 42.0 | 29.4 | 1.7 | 0.699 |
n | \(t_n\) (s) | | \(\tau _n^{MPI}\) (s) | | \(\pm \Delta \tau _n^{MPI}\) (s) | | \(\frac{\tau _n^{MPI}}{t_n}\) | |
---|---|---|---|---|---|---|---|---|
1 | 4602.0 | [4623.0] | 0.0 | [ 0.0] | 0.0 | [0.0] | 0.000 | [0.000] |
2 | 2350.0 | [2362.3] | 40.6 | [ 33.1] | 9.0 | [–] | 0.017 | [0.014] |
4 | 1230.0 | [1245.5] | 51.8 | [ 59.8] | 23.3 | [–] | 0.042 | [0.048] |
8 | 633.1 | [ 634.4] | 35.5 | [ 34.3] | 17.7 | [–] | 0.056 | [0.054] |
12 | 484.1 | [ 410.9] | 46.4 | [ 41.5] | 16.6 | [–] | 0.096 | [0.101] |
16 | 383.0 | [ 378.3] | 51.1 | [ 43.5] | 9.0 | [–] | 0.133 | [0.115] |
24 | 296.0 | [ 244.6] | 69.9 | [ 50.1] | 31.4 | [–] | 0.236 | [0.205] |
32 | 280.0 | [ 281.2] | 101.6 | [100.4] | 28.9 | [–] | 0.363 | [0.357] |
40 | 207.0 | [ 183.5] | 66.1 | [ 59.3] | 22.8 | [–] | 0.319 | [0.323] |
48 | 233.0 | [ 202.9] | 104.7 | [ 79.5] | 30.8 | [–] | 0.449 | [0.392] |
64 | 200.0 | [ 168.0] | 99.0 | [ 71.6] | 28.0 | [–] | 0.495 | [0.426] |
80 | 167.0 | [ 149.1] | 84.2 | [ 69.0] | 21.6 | [–] | 0.504 | [0.463] |
96 | 139.0 | [ 136.7] | 69.9 | [ 67.1] | 23.9 | [–] | 0.503 | [0.491] |
112 | 142.0 | [ 123.4] | 79.4 | [ 62.9] | 21.8 | [–] | 0.559 | [0.510] |
128 | 125.0 | [ 116.4] | 70.5 | [ 62.2] | 20.7 | [–] | 0.564 | [0.534] |
160 | 116.0 | [ 103.7] | 69.4 | [ 58.2] | 17.3 | [–] | 0.598 | [0.561] |
192 | 113.0 | [ 94.9] | 72.4 | [ 55.6] | 17.1 | [–] | 0.641 | [0.586] |
224 | 112.0 | [ 91.4] | 75.6 | [ 55.9] | 16.5 | [–] | 0.675 | [0.612] |
256 | 127.0 | [ 92.1] | 90.9 | [ 59.6] | 14.1 | [–] | 0.716 | [0.647] |
288 | 132.0 | [ 90.6] | 98.2 | [ 60.7] | 12.8 | [–] | 0.744 | [0.670] |
320 | 128.0 | [ 96.5] | 95.2 | [ 66.9] | 10.5 | [–] | 0.744 | [0.693] |
352 | 146.0 | [ 111.3] | 112.7 | [ 82.8] | 7.9 | [–] | 0.772 | [0.744] |
416 | 131.0 | [ 117.0] | 101.9 | [ 89.6] | 7.1 | [–] | 0.778 | [0.766] |
512 | 124.0 | [ 102.1] | 99.3 | [ 79.7] | 6.8 | [–] | 0.801 | [0.781] |
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\pm \Delta \tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|---|
1 | 2124.0 | 0.0 | 0.0 | 0.000 |
2 | 1290.0 | 172.0 | 0.0 | 0.133 |
4 | 554.0 | 181.7 | 5.8 | 0.328 |
8 | 356.0 | 196.6 | 8.2 | 0.552 |
16 | 287.0 | 212.9 | 6.3 | 0.742 |
32 | 239.0 | 203.2 | 3.7 | 0.850 |
64 | 220.0 | 201.9 | 2.7 | 0.918 |
128 | 215.0 | 206.1 | 1.6 | 0.959 |
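The last column of each table is the ratio of the overhead time \(\tau _n^{MPI}\) to the total run time \(t_n\). As a minimal sketch, the following Python snippet recomputes that column for the table directly above. The function name `overhead_fraction` and the variable names are illustrative choices, not taken from the paper.

```python
# Rows copied from the table above: (n, t_n in s, tau_n^MPI in s).
rows = [
    (1, 2124.0, 0.0),
    (2, 1290.0, 172.0),
    (4, 554.0, 181.7),
    (8, 356.0, 196.6),
    (16, 287.0, 212.9),
    (32, 239.0, 203.2),
    (64, 220.0, 201.9),
    (128, 215.0, 206.1),
]


def overhead_fraction(t_n, tau_n):
    """Return tau_n / t_n, the overhead time as a fraction of run time."""
    if t_n <= 0:
        raise ValueError("t_n must be positive")
    return tau_n / t_n


# Round to three decimals to match the precision used in the tables.
fractions = {n: round(overhead_fraction(t, tau), 3) for n, t, tau in rows}

for n, frac in fractions.items():
    print(f"n={n:4d}  tau/t = {frac:.3f}")
```

For this table the overhead stays near 200 s from \(n=2\) onwards while \(t_n\) keeps shrinking, which is why the ratio climbs towards 1.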
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|
1 | 5375.4 | 0.0 | 0.000 |
2 | 2664.9 | 48.0 | 0.018 |
4 | 1424.9 | 42.8 | 0.030 |
8 | 889.8 | 42.7 | 0.048 |
16 | 452.9 | 30.8 | 0.068 |
32 | 261.1 | 46.2 | 0.177 |
48 | 208.0 | 52.4 | 0.252 |
64 | 159.7 | 48.7 | 0.305 |
80 | 145.0 | 51.5 | 0.355 |
96 | 126.3 | 44.7 | 0.354 |
112 | 164.0 | 88.1 | 0.537 |
128 | 107.4 | 47.3 | 0.440 |
160 | 111.1 | 55.4 | 0.499 |
192 | 109.3 | 60.1 | 0.550 |
224 | 103.4 | 60.6 | 0.586 |
256 | 104.8 | 66.5 | 0.635 |
512 | 82.3 | 59.2 | 0.719 |
768 | 86.3 | 69.6 | 0.806 |
1024 | 79.2 | 65.0 | 0.821 |
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|
1 | 5531.3 | 0.0 | 0.000 |
2 | 3085.6 | 216.0 | 0.070 |
4 | 1784.2 | 269.4 | 0.151 |
8 | 1281.7 | 362.7 | 0.283 |
16 | 793.6 | 388.9 | 0.490 |
32 | 356.3 | 170.0 | 0.477 |
48 | 294.3 | 165.4 | 0.562 |
64 | 229.2 | 132.7 | 0.579 |
80 | 203.7 | 123.0 | 0.604 |
96 | 194.5 | 125.1 | 0.643 |
112 | 186.3 | 126.7 | 0.680 |
128 | 156.7 | 100.8 | 0.643 |
160 | 155.9 | 110.2 | 0.707 |
192 | 157.1 | 119.1 | 0.758 |
224 | 153.5 | 120.0 | 0.782 |
256 | 166.8 | 136.4 | 0.818 |
288 | 151.6 | 123.5 | 0.815 |
320 | 155.8 | 129.5 | 0.831 |
352 | 154.3 | 129.8 | 0.841 |
416 | 164.8 | 142.2 | 0.863 |
512 | 161.6 | 141.1 | 0.873 |
768 | 185.8 | 168.5 | 0.907 |
1024 | 228.3 | 212.3 | 0.930 |
n | \(t_n\) (s) | \(\tau _n^{MPI}\) (s) | \(\frac{\tau _n^{MPI}}{t_n}\) |
---|---|---|---|
1 | 4501.0 | 0.0 | 0.000 |
2 | 2432.6 | 73.0 | 0.030 |
4 | 1298.9 | 85.7 | 0.066 |
8 | 702.7 | 97.0 | 0.138 |
16 | 409.7 | 100.8 | 0.246 |
32 | 290.9 | 128.9 | 0.443 |
48 | 261.8 | 147.9 | 0.565 |
64 | 246.4 | 157.4 | 0.639 |
80 | 332.9 | 257.0 | 0.772 |
96 | 294.9 | 226.8 | 0.769 |
112 | 319.0 | 260.3 | 0.816 |
128 | 300.8 | 247.9 | 0.824 |
160 | 356.2 | 309.2 | 0.868 |
192 | 360.8 | 319.3 | 0.885 |
224 | 335.5 | 295.2 | 0.880 |
256 | 308.5 | 276.1 | 0.895 |
288 | 318.7 | 288.1 | 0.904 |
320 | 330.9 | 303.8 | 0.918 |
352 | 389.0 | 362.9 | 0.933 |
416 | 362.8 | 338.5 | 0.933 |
512 | 355.6 | 333.6 | 0.938 |
768 | 396.9 | 378.2 | 0.953 |
1024 | 423.5 | 402.3 | 0.950 |
2 Basic model
2.1 Generalization
- time to interchange data
- time to synchronize individual parallel tasks
- extra computing time due to code sections arising only in the parallel algorithm
- computing time penalties due to load balancing issues
- computing time penalties due to inhomogeneous conditions between individual components of the parallel machine [1]
2.2 Solving for \(\tau _n\)
3 Results
3.1 HPC systems used
3.2 Parallel overhead determined from run-time records
3.3 Fitting with GNUPLOT
- define an explicit value for \(f_s\) (either known or guessed)
- graphically check the quality of the fit and aim at asymptotic standard errors in the range of 10–30%
- make sure that \(c>b\) and try to have \(b+\Delta b \approx c\), where \(\Delta b\) is the reported asymptotic standard error
- incrementally decrease \(f_s\) and repeat the above steps until an optimal fit is obtained
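The numerical criteria in the steps above can be collected into a small acceptance check that is applied after each fit at a fixed \(f_s\). The Python sketch below is only an illustration. The function name `fit_acceptable` is hypothetical. Applying the 10–30% window to both \(b\) and \(c\), and using a 10% tolerance for \(b+\Delta b \approx c\), are assumptions that the text does not spell out. The graphical check of the fit quality still has to be done by eye.

```python
def fit_acceptable(b, db, c, dc, rel_err_range=(0.10, 0.30), closeness=0.10):
    """Check the numerical fit criteria for one fit at fixed f_s.

    b, c      : fitted parameters reported by the fit
    db, dc    : their asymptotic standard errors
    rel_err_range : target window for the relative standard errors
                    (assumed here to apply to both b and c)
    closeness : assumed relative tolerance for "b + db is close to c"
    """
    if b == 0 or c == 0:
        return False  # relative errors are undefined for zero parameters

    lo, hi = rel_err_range
    rel_b = abs(db / b)
    rel_c = abs(dc / c)

    errors_ok = lo <= rel_b <= hi and lo <= rel_c <= hi
    ordered = c > b
    near = abs((b + db) - c) <= closeness * abs(c)
    return errors_ok and ordered and near


# Hypothetical numbers, purely for illustration (not results from the paper).
accepted = fit_acceptable(b=1.0, db=0.2, c=1.25, dc=0.25)
rejected = fit_acceptable(b=2.0, db=0.2, c=1.5, dc=0.3)  # violates c > b
```

In practice one would run the fit, read \(b\), \(\Delta b\), \(c\) and \(\Delta c\) from its report, call the check, and lower \(f_s\) for the next round if it fails.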