1 Introduction
1.1 Literature Review
1.2 Our Contribution
-
firstly, we propose a dynamic graph model for solving the OBST problem and show that each instance of this problem corresponds to a one-to-all shortest path problem in this graph. So like in [15], any solution of a one-to-all shortest path problem can be used on our dynamic graph to accelerate the sequential or parallel resolution of OBST problem;
-
secondly, we proposed a BSP/CGM parallel algorithm based on our dynamic graph to solve the OBST problem from the corresponding one-to-all shortest path in this graph. It uses our new technique of irregular partitioning of this dynamic graph to try to bring a solution to the contradictory objectives of the minimization of the communication time and the load balancing of the processors in this type of graph. This new technique assures that the blocks of the first diagonals (or the first steps) are of large sizes to minimize the number of communications, but this size decreases along the diagonals to increase the number of blocks in these diagonals and allows processors to stay active as long as possible. This promotes load balancing and thus minimizes the overall computation time of processors. This also reduces their latency, which is the largest part of the overall communication time.
2 Optimal Binary Search Tree Problem
2.1 Classical Sequential Algorithm: Computing Optimal Solution in \({\mathcal {O}}(n^3)\)
-
a key which lies between \(k_{j}\) and \(k_{j+1}\) if \(0<j<n\) ;
-
a key greater than \(k_n\) if \(j=n\) ;
-
a key less than \(k_1\) if \(j=0\).
2.2 Knuth-Acceleration-Based Sequential Algorithm in \({\mathcal {O}}(n^2)\)
3 Dynamic Graph Model for OBST Problem
3.1 Fundamental Concepts
3.2 Description of the Model
4 CGM Algorithm for OBST Problem
4.1 Task Graph Partitioning
4.2 Block Dependency
-
\(LUE_{ij}\) stands for leftmost upper entry of Bk(i, j);
-
\(RUE_{ij}\) stands for rightmost upper entry of Bk(i, j);
-
\(LLE_{ij}\) stands for leftmost lower entry of Bk(i, j);
-
\(RLE_{ij}\) stands for rightmost lower entry of Bk(i, j).
-
\(LUE_{ij}\) = \(Tree(i, j - g(n, p, k) + 1)\);
-
\(RUE_{ij}\) = Tree(i, j);
-
\(LLE_{ij}\) = \(Tree(i + g(n, p, k) - 1, j - g(n, p, k) + 1)\);
-
\(RLE_{ij}\) = \(Tree(i + g(n, p, k) - 1, j)\).
-
\(A = Tree(i, Cut(RUE_{i,j-g(n,p,k)}) - 1)\)
-
\(B = Tree(i, Cut(i + g(n,p,k), j) - 1)\)
-
\(C = Tree(i + g(n,p,k) - 1, Cut(RUE_{i,j-g(n,p,k)}) - 1)\)
-
\(D = Tree(i + g(n,p,k) - 1, Cut(i + g(n,p,k), j) - 1)\)
-
\(E = Tree(Cut(RUE_{i,j-g(n,p,k)}) + 1, j - g(n,p,k) + 1)\)
-
\(F = Tree(Cut(RUE_{i,j-g(n,p,k)}) + 1, j)\)
-
\(G = Tree(Cut(i + g(n,p,k), j) + 1, j - g(n,p,k) + 1)\)
-
\(H = Tree(Cut(i + g(n,p,k), j) + 1, j)\)
4.3 Mapping Blocks Onto Processors
4.4 CGM Algorithm
5 Experimental Results
-
each node is an Intel Xeon Processor E5-2680 V4 (35M Cache, 2.40 GHz);
-
2 login nodes which are not intended to make the calculations but to give the job submission environment;
-
1 NFS server (85 TB) in 10 Gbits;
-
a distributed file system BeeGFS (300 TB) in 100 Gbits.
-
n is the problem size (number of data), with values in the set \(\{511, 1023, 2047, 4095, 8191, 16{,}383, 32{,}767\}\);
-
p is the number of processors, with values in the set \(\{1, 2, 5, 8, 25, 28, 32, 114, 120, 128\}\);
-
k is the number of fragmentations of blocks performed, with values in the set \(\{0, 1, 2\}\).
p
|
n
| ||||||
---|---|---|---|---|---|---|---|
511 | 1027 | 2047 | 4095 | 8191 | 16,383 | 32,767 | |
\(k = 0\)
| |||||||
1 | 0.23 | 1.05 | 6.11 | 39.67 | 258.79 | 2088.16 | 17,240.77 |
2 | 0.19 | 1.01 | 5.70 | 36.80 | 257.51 | 2054.92 | 15,728.34 |
8 | 0.15 | 0.77 | 4.32 | 27.47 | 190.33 | 1458.98 | 10,785 |
32 | 0.09 | 0.46 | 2.58 | 16.27 | 112.56 | 832.55 | 6365.66 |
128 | 0.05 | 0.25 | 1.39 | 8.76 | 60.21 | 443.98 | 3384.43 |
\(k = 1\)
| |||||||
2 | 0.17 | 0.88 | 4.87 | 30.47 | 208.72 | 1549.23 | 12444.43 |
8 | 0.13 | 0.64 | 3.43 | 20.69 | 138.59 | 1040.62 | 7851.66 |
32 | 0.07 | 0.37 | 1.97 | 11.86 | 78.51 | 566.24 | 4258.75 |
128 | 0.04 | 0.21 | 1.09 | 6.39 | 41.72 | 296.80 | 2222.54 |
\(k = 2\)
| |||||||
2 | 0.17 | 0.89 | 4.67 | 28.30 | 190.92 | 1392.15 | 10529.33 |
8 | 0.12 | 0.59 | 3.12 | 18.50 | 123.45 | 905.05 | 6798.14 |
32 | 0.07 | 0.34 | 1.78 | 10.40 | 67.79 | 483.59 | 3591.39 |
128 | 0.04 | 0.19 | 0.98 | 5.62 | 35.91 | 251.51 | 1870.06 |
p
|
n
| ||||||
---|---|---|---|---|---|---|---|
511 | 1027 | 2047 | 4095 | 8191 | 16,383 | 32,767 | |
\(k = 0\)
| |||||||
2 | 1.20 | 1.04 | 1.07 | 1.07 | 1 | 1.01 | 1.09 |
8 | 1.50 | 1.36 | 1.41 | 1.44 | 1.35 | 1.43 | 1.59 |
32 | 2.53 | 2.27 | 2.36 | 2.43 | 2.29 | 2.50 | 2.70 |
128 | 4.55 | 4.14 | 4.36 | 4.52 | 4.29 | 4.70 | 5.09 |
\(k = 1\)
| |||||||
2 | 1.31 | 1.19 | 1.25 | 1.30 | 1.23 | 1.34 | 1.38 |
8 | 1.75 | 1.64 | 1.77 | 1.91 | 1.86 | 2 | 2.19 |
32 | 2.98 | 2.80 | 3.09 | 3.34 | 3.29 | 3.68 | 4.04 |
128 | 5.02 | 4.94 | 5.60 | 6.20 | 6.19 | 7.03 | 7.75 |
\(k = 2\)
| |||||||
2 | 1.34 | 1.18 | 1.30 | 1.40 | 1.35 | 1.49 | 1.63 |
8 | 1.89 | 1.77 | 1.95 | 2.14 | 2.09 | 2.30 | 2.53 |
32 | 3.16 | 3.06 | 3.43 | 3.81 | 3.81 | 4.31 | 4.80 |
128 | 5.23 | 5.32 | 6.21 | 7.05 | 7.19 | 8.30 | 9.21 |
p
|
n
| ||||||
---|---|---|---|---|---|---|---|
511 | 1027 | 2047 | 4095 | 8191 | 16,383 | 32,767 | |
\(k = 0\)
| |||||||
2 | 60.29 | 52.30 | 53.61 | 53.89 | 50.19 | 50.80 | 54.80 |
8 | 18.78 | 17.05 | 17.64 | 18.04 | 16.97 | 17.89 | 19.98 |
32 | 7.92 | 7.10 | 7.39 | 7.61 | 7.17 | 7.83 | 8.46 |
128 | 3.56 | 3.23 | 3.41 | 3.53 | 3.35 | 3.67 | 3.97 |
\(k = 1\)
| |||||||
2 | 65.87 | 59.80 | 62.69 | 65.08 | 61.92 | 67.39 | 69.27 |
8 | 21.99 | 20.54 | 22.21 | 23.96 | 23.31 | 25.08 | 27.44 |
32 | 9.31 | 8.75 | 9.67 | 10.44 | 10.28 | 11.52 | 12.65 |
128 | 3.92 | 3.86 | 4.38 | 4.84 | 4.84 | 5.49 | 6.06 |
\(k = 2\)
| |||||||
2 | 67.05 | 59.10 | 65.30 | 70.07 | 67.69 | 74.99 | 81.87 |
8 | 23.68 | 22.16 | 24.41 | 26.80 | 26.17 | 28.84 | 31.70 |
32 | 9.88 | 9.57 | 10.72 | 11.91 | 11.91 | 13.49 | 15 |
128 | 4.09 | 4.16 | 4.85 | 5.51 | 5.62 | 6.48 | 7.20 |