Introduction
Related work
Methods/experimental
Technological infrastructure
Dataset and queries
Test scenarios
Results
Multiple partitioning
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-P | SS | SS-P | SS | SS-P | SS | SS-P | SS | SS-P | SS | SS-P | |
Q1.1
| 25 |
21
| 31 |
22
| 44 |
25
| 5 |
2
| 13 |
4
| 36 |
8
|
Q1.2 |
24
| 27 |
29
| 33 |
42
| 54 |
5
| 7 |
13
| 18 |
34
| 48 |
Q1.3
| 24 |
21
| 29 |
22
| 43 |
26
| 4 |
2
| 13 |
4
| 35 |
8
|
Q2.1
| 32 |
30
| 47 |
45
| 531 |
153
| 8 |
4
| 19 |
8
| 59 |
23
|
Q2.2
| 31 |
28
| 46 |
39
| 531 |
152
| 7 |
4
| 18 |
6
| 51 |
17
|
Q2.3
| 30 |
27
| 44 |
41
| 531 |
147
| 7 |
3
| 17 |
6
| 49 |
15
|
Q3.1
| 35 |
26
| 59 |
34
| 651 |
162
| 8 |
4
| 29 |
9
| 81 |
27
|
Q3.2
| 30 | 30 |
45
| 52 | 677 |
570
|
6
| 7 |
17
| 19 |
51
| 52 |
Q3.3
|
33
| 37 | 219 |
75
| 665 |
578
|
5
| 7 |
15
| 16 |
43
| 48 |
Q3.4 |
34
| 36 |
222
| 223 | 675 |
618
|
6
| 8 |
15
| 20 |
43
| 56 |
Q4.1
| 38 |
33
| 86 |
70
| 226 |
205
| 13 |
6
| 43 |
15
| 119 |
40
|
Q4.2
| 49 |
30
| 70 |
58
| 141 |
91
| 9 |
4
| 26 |
9
| 69 |
20
|
Q4.3
| 34 |
29
| 54 |
44
| 116 |
70
| 8 |
5
| 23 |
14
| 63 |
36
|
Total | 420 | 375 | 982 | 760 | 4874 | 2849 | 92 | 63 | 262 | 149 | 733 | 399 |
Diff. | − 11% | − 23% | − 42% | − 32% | − 43% | − 46% |
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
DT | DT-P | DT | DT-P | DT | DT-P | DT | DT-P | DT | DT-P | DT | DT-P | |
Q1.1
| 24 |
20
| 29 |
21
| 51 |
29
| 5 |
2
| 13 |
3
| 37 |
8
|
Q1.2 |
24
| 26 |
29
| 36 |
45
| 80 | 5 |
2
| 14 |
5
| 38 |
16
|
Q1.3
| 23 |
21
| 30 |
21
| 45 |
30
| 5 |
2
| 14 |
3
| 39 |
8
|
Q2.1
| 25 |
21
| 36 |
23
| 79 |
30
| 4 |
3
| 10 |
5
| 36 |
16
|
Q2.2
| 36 |
24
| 73 |
32
| 161 |
50
| 4 |
3
| 10 |
6
| 32 |
16
|
Q2.3
| 25 |
21
| 35 |
22
| 62 |
29
| 4 |
3
| 10 |
5
| 29 |
17
|
Q3.1
| 28 |
21
| 40 |
23
| 98 |
31
| 5 |
2
| 12 |
3
| 33 |
11
|
Q3.2
| 28 |
25
| 41 |
29
| 93 |
60
| 5 |
4
| 12 |
5
|
29
| 32 |
Q3.3
| 25 | 25 | 38 |
29
| 59 |
62
|
4
| 5 | 9 |
6
|
27
| 44 |
Q3.4 |
25
| 28 |
38
| 40 |
72
| 108 |
5
| 7 |
12
| 13 |
33
| 81 |
Q4.1
| 27 |
22
| 41 |
24
| 103 |
34
| 6 |
3
| 14 |
6
| 42 |
20
|
Q4.2
| 29 |
17
| 42 |
21
| 107 |
25
| 6 |
2
| 14 |
4
| 49 |
12
|
Q4.3
| 29 |
21
| 42 |
24
| 114 |
34
| 5 |
4
| 12 |
7
| 49 |
19
|
Total | 349 | 292 | 516 | 346 | 1090 | 602 | 63 | 43 | 155 | 71 | 472 | 299 |
Diff. | − 16% | − 33% | − 45% | − 32% | − 54% | − 37% |
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
SS | SS-P | SS | SS-P | SS | SS-P | SS | SS-P | |
Q1.1 |
25
| 32 |
31
| 41 |
5
| 11 |
13
| 36 |
Q1.2 |
24
| 31 |
29
| 44 |
5
| 11 |
13
| 37 |
Q1.3 |
24
| 30 |
29
| 43 |
4
| 11 |
13
| 35 |
Q2.1
| 32 |
30
| 47 |
46
| 8 |
4
| 19 |
10
|
Q2.2
| 31 |
29
| 46 |
43
| 7 |
4
| 18 |
10
|
Q2.3
| 30 | 30 |
44
| 45 | 7 |
4
| 17 |
9
|
Q3.1
| 35 |
28
| 59 |
38
| 8 |
4
| 29 |
11
|
Q3.2
| 30 |
24
| 45 |
28
| 6 |
2
| 17 |
4
|
Q3.3
| 33 |
27
| 219 |
35
| 5 |
2
| 15 |
3
|
Q3.4
| 34 |
26
| 222 |
33
| 6 |
2
| 15 |
3
|
Q4.1
| 38 |
37
| 86 |
79
| 13 |
6
| 43 |
16
|
Q4.2
| 49 |
42
| 70 |
63
| 9 |
5
| 26 |
14
|
Q4.3
| 34 |
30
| 54 |
45
| 8 |
3
| 23 |
5
|
Total | 420 | 396 | 982 | 582 | 92 | 70 | 262 | 193 |
Diff. | − 6% | − 41% | − 24% | − 26% |
Bucketing
Data model | SF | Table size (MB) | HDFS block (128 MB) | At least 1 GB |
---|---|---|---|---|
SS | 30 | 5088 |
\( \frac{{5088 {\text{MB}}}}{{128 {\text{MB}}}} \cong 40\, \text{buckets} \)
|
\( \frac{{5088 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 5\varvec{ }\, \text{buckets} \)
|
100 | 16,533 |
\( \frac{{16533 {\text{MB}}}}{{128 {\text{MB}}}} \cong 129\, \text{buckets} \)
|
\( \frac{{16533 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 16\varvec{ }\,\text{buckets} \)
| |
300 | 49,700 |
\( \frac{{49700 {\text{MB}}}}{{128 {\text{MB}}}} \cong 388\, \text{buckets} \)
|
\( \frac{{49700 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 49\varvec{ }\,\text{buckets} \)
| |
DT | 30 | 14,650 |
\( \frac{{14650 {\text{MB}}}}{{128 {\text{MB}}}} \cong 114\, \text{buckets} \)
|
\( \frac{{14650 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 14\varvec{ }\,\text{buckets} \)
|
100 | 46,800 |
\( \frac{{46800 {\text{MB}}}}{{128 {\text{MB}}}} \cong 366\, \text{buckets} \)
|
\( \frac{{46800 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 46\varvec{ }\,\text{buckets} \)
|
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | |
Q1.1
| 25 |
23
| 31 |
29
|
5
| 7 |
13
| 14 |
Q1.2 | 24 |
23
|
29
| 30 |
5
| 7 | 13 | 13 |
Q1.3
| 24 |
23
|
29
| 30 |
4
| 6 | 13 | 13 |
Q2.1
|
32
| 33 |
47
| 59 |
8
| 11 |
19
| 26 |
Q2.2
|
31
| 32 |
46
| 51 |
7
| 11 |
18
| 23 |
Q2.3
| 30 | 30 |
44
| 54 |
7
| 10 |
17
| 22 |
Q3.1
| 35 | 35 |
59
| 64 |
8
| 12 |
29
| 30 |
Q3.2
| 30 | 30 |
45
| 46 |
6
| 8 |
17
| 19 |
Q3.3
|
33
| 34 |
219
| 224 |
5
| 8 |
15
| 18 |
Q3.4 | 34 |
32
|
222
| 225 |
6
| 7 |
15
| 18 |
Q4.1
|
38
| 39 |
86
| 100 |
13
| 19 |
43
| 47 |
Q4.2
|
49
| 50 | 70 | 70 |
9
| 14 |
26
| 33 |
Q4.3
|
34
| 35 |
54
| 65 |
8
| 13 |
23
| 29 |
Total | 420 | 421 | 982 | 1047 | 92 | 133 | 262 | 305 |
Diff. | 0% | 7% | 44% | 16% |
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
DT | DT-B | DT | DT-B | DT | DT-B | DT | DT-B | |
Q1.1
| 24 |
23
|
29
| 31 | 5 | 5 |
13
| 15 |
Q1.2 | 24 |
23
|
29
| 30 | 5 |
6
|
14
| 15 |
Q1.3
| 23 | 23 | 30 | 30 | 5 | 5 |
14
| 15 |
Q2.1
|
25
| 26 |
36
| 42 |
4
| 6 |
10
| 14 |
Q2.2
| 36 |
35
| 73 |
69
|
4
| 5 |
10
| 12 |
Q2.3
| 25 |
23
| 35 |
34
| 4 | 4 | 10 | 10 |
Q3.1
| 28 |
27
|
40
| 45 | 5 | 5 |
12
| 13 |
Q3.2
| 28 |
27
|
41
| 44 | 5 | 5 | 12 | 12 |
Q3.3
| 25 |
24
| 38 |
35
| 4 | 4 |
9
| 10 |
Q3.4 | 25 | 25 |
38
| 42 | 5 | 5 | 12 | 12 |
Q4.1
| 27 | 27 |
41
| 44 |
6
| 7 |
14
| 16 |
Q4.2
| 29 | 29 |
42
| 46 | 6 | 6 |
14
| 17 |
Q4.3
| 29 | 29 |
42
| 47 |
5
| 6 |
12
| 17 |
Total | 349 | 342 | 516 | 539 | 63 | 71 | 155 | 178 |
Diff. | − 2% | 5% | 14% | 15% |
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
DT | DT-B | DT | DT-B | DT | DT-B | DT | DT-B | |
Q1.1 | 24 |
18
| 29 |
21
| 5 |
3
| 13 |
8
|
Q1.2 | 24 |
19
| 29 |
21
| 5 |
3
| 14 |
9
|
Q1.3 | 23 |
18
| 30 |
22
| 5 |
3
| 14 |
8
|
Q2.1
| 25 |
18
| 36 |
20
| 4 |
2
| 10 |
4
|
Q2.2
| 36 |
18
| 73 |
20
| 4 |
2
| 10 |
3
|
Q2.3
| 25 |
18
| 35 |
16
| 4 |
2
| 10 |
4
|
Q3.1
| 28 |
26
| 40 |
39
| 5 | 5 |
12
| 14 |
Q3.2
| 28 |
25
| 41 |
40
| 5 | 5 |
12
| 13 |
Q3.3
| 25 |
23
| 38 |
32
| 4 | 4 |
9
| 11 |
Q3.4
|
25
| 26 |
38
| 39 | 5 | 5 |
12
| 13 |
Q4.1
| 27 |
22
| 41 |
30
| 6 |
3
| 14 |
9
|
Q4.2
| 29 |
20
| 42 |
23
| 6 |
2
| 14 |
5
|
Q4.3
| 29 |
14
| 42 |
15
| 5 |
2
| 12 |
4
|
Total | 349 | 265 | 516 | 337 | 63 | 41 | 155 | 103 |
Diff. | − 24% | − 35% | − 35% | − 34% |
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | |
Q1.1 | 25 |
22
| 31 |
29
|
44
| 46 |
5
| 6 |
13
| 16 | 36 | 36 |
Q1.2 | 24 |
23
| 29 | 29 |
42
| 44 |
5
| 7 |
13
| 16 |
34
| 36 |
Q1.3 | 24 |
23
|
29
| 30 |
43
| 45 |
4
| 7 |
13
| 14 | 35 |
34
|
Q2.1 | 32 |
31
|
47
| 53 | 531 |
110
|
8
| 11 |
19
| 25 |
59
| 62 |
Q2.2 | 31 |
29
|
46
| 66 |
531
| 611 |
7
| 9 |
18
| 22 |
51
| 56 |
Q2.3 | 30 |
29
|
44
| 49 | 531 |
101
|
7
| 9 |
17
| 22 |
49
| 53 |
Q3.1 | 35 |
33
|
59
| 68 | 651 |
137
|
8
| 11 |
29
| 35 |
81
| 83 |
Q3.2 | 30 |
28
|
45
| 52 | 677 |
92
|
6
| 8 |
17
| 23 |
51
| 53 |
Q3.3 | 33 | 33 | 219 |
45
| 665 |
80
|
5
| 7 |
15
| 17 |
43
| 44 |
Q3.4 | 34 |
30
| 222 |
44
| 675 |
78
| 6 | 6 |
15
| 19 | 43 | 43 |
Q4.1 |
38
| 39 |
86
| 88 |
226
| 237 |
13
| 17 |
43
| 51 |
119
| 127 |
Q4.2 | 49 | 49 | 70 |
65
| 141 |
119
|
9
| 11 |
26
| 32 |
69
| 75 |
Q4.3 |
34
| 35 |
54
| 57 | 116 |
103
|
8
| 10 |
23
| 28 |
63
| 67 |
Total | 420 | 404 | 982 | 676 | 4874 | 1803 | 92 | 120 | 262 | 321 | 733 | 768 |
Diff. | − 4% | − 31% | − 63% | 30% | 22% | 5% |
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | SS | SS-B | |
Q1.1 | 25 |
23
| 31 |
29
|
44
| 45 | 5 | 5 |
13
| 14 | 36 |
35
|
Q1.2 | 24 | 24 |
29
| 30 |
42
| 45 |
5
| 6 | 13 |
12
| 34 | 34 |
Q1.3 | 24 | 24 |
29
| 30 |
43
| 44 |
4
| 5 | 13 |
12
|
35
| 36 |
Q2.1 |
32
| 33 |
47
| 59 |
531
| 702 |
8
| 11 |
19
| 27 |
59
| 82 |
Q2.2 | 31 | 31 |
46
| 51 |
531
| 681 |
7
| 9 |
18
| 23 |
51
| 67 |
Q2.3 |
30
| 31 |
44
| 54 |
531
| 699 |
7
| 9 |
17
| 22 |
49
| 62 |
Q3.1 | 35 |
34
|
59
| 64 |
651
| 684 |
8
| 11 |
29
| 30 |
81
| 88 |
Q3.2 | 30 | 30 |
45
| 46 |
677
| 688 |
6
| 7 |
17
| 20 |
51
| 57 |
Q3.3 | 33 | 33 |
219
| 224 |
665
| 702 |
5
| 7 |
15
| 17 |
43
| 52 |
Q3.4 | 34 |
32
|
222
| 225 |
675
| 870 |
6
| 7 |
15
| 16 |
43
| 52 |
Q4.1
|
38
| 39 |
86
| 100 |
226
| 256 |
13
| 18 |
43
| 49 |
119
| 142 |
Q4.2
|
49
| 50 | 70 | 70 |
141
| 155 |
9
| 14 |
26
| 33 |
69
| 90 |
Q4.3
|
34
| 37 |
54
| 65 |
116
| 141 |
8
| 12 |
23
| 29 |
63
| 77 |
Total | 420 | 420 | 982 | 1047 | 4874 | 5712 | 92 | 121 | 262 | 305 | 733 | 876 |
Diff. | 0% | 7% | 17% | 32% | 16% | 19% |
Combination of partitioning and bucketing
Data model | Scenario | SF | Partition size | HDFS block (128 MB) | At least 1 GB |
---|---|---|---|---|---|
SS | P = Od_Year B = Orderkey | 30 | 828 MB |
\( \frac{{828 {\text{MB}}}}{{128 {\text{MB}}}} \cong 6\varvec{ }\,buckets \)
| – |
100 | 2844 MB | \( \frac{{2844 {\text{MB}}}}{{128 {\text{MB}}}} \cong 22\) buckets |
\( \frac{{2844 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 3\varvec{ }\,buckets \)
| ||
300 | 8670 MB | \( \frac{{8670 {\text{MB}}}}{{128 {\text{MB}}}} \cong 68\) buckets |
\( \frac{{8670 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 9\varvec{ }\,buckets \)
| ||
P = S_Region B = Suppkey | 30 | 879 MB |
\( \frac{{879 {\text{MB}}}}{{128 {\text{MB}}}} \cong 7\varvec{ }\,buckets \)
| – | |
100 | 3306 MB | \( \frac{{3306 {\text{MB}}}}{{128 {\text{MB}}}} \cong 26\) buckets |
\( \frac{{3306 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 3\varvec{ }\,buckets \)
| ||
300 | 9830 MB | \( \frac{{9830 {\text{MB}}}}{{128 {\text{MB}}}} \cong 77\) buckets |
\( \frac{{9830 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 9\varvec{ }\,buckets \)
| ||
DT | P = Od_Year B = P_Brand | 30 | 828 MB |
\( \frac{{828 {\text{MB}}}}{{128 {\text{MB}}}} \cong 6\varvec{ }\,buckets \)
| – |
100 | 2844 MB | \( \frac{{2844 {\text{MB}}}}{{128 {\text{MB}}}} \cong 22\) buckets |
\( \frac{{2844 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 3\varvec{ }\,buckets \)
| ||
P = Od_Year S_Region B = P_Brand | 30 | 240 MB |
\( \frac{{240 {\text{MB}}}}{{128 {\text{MB}}}} \cong 2\varvec{ }\,buckets \)
| – | |
100 | 1815 MB | \( \frac{{1815 {\text{MB}}}}{{128 {\text{MB}}}} \cong 14\) buckets |
\( \frac{{1815 {\text{MB}}}}{{1024 {\text{MB}}}} \cong 2\varvec{ }\,buckets \)
|
Star schema
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | |
Q1.1
| 25 |
16
| 31 |
21
| 44 |
26
| 5 |
2
| 13 |
4
| 36 |
7
|
Q1.2 | 24 |
23
|
29
| 32 | 42 | 42 |
5
| 6 | 13 | 13 |
34
| 44 |
Q1.3
| 24 |
18
| 29 |
21
| 43 |
25
| 4 |
2
| 13 |
4
| 35 |
8
|
Q2.1 |
32
| 33 |
47
| 60 |
531
| 682 |
8
| 11 |
19
| 26 |
59
| 98 |
Q2.2 |
31
| 32 |
46
| 53 |
531
| 677 |
7
| 10 |
18
| 23 |
51
| 76 |
Q2.3 | 30 | 30 |
44
| 52 |
531
| 670 |
7
| 9 |
17
| 22 |
49
| 74 |
Q3.1
| 35 |
31
| 59 |
56
|
651
| 667 |
8
| 10 | 29 | 29 |
81
| 100 |
Q3.2
| 30 |
28
|
45
| 50 | 677 |
634
|
6
| 7 |
17
| 19 |
51
| 63 |
Q3.3
| 33 | 33 | 219 |
78
| 665 |
648
|
5
| 6 |
15
| 16 |
43
| 53 |
Q3.4 | 34 |
31
|
222
| 228 | 675 |
674
|
6
| 7 |
15
| 19 |
43
| 59 |
Q4.1 |
38
| 39 |
86
| 102 |
226
| 253 |
13
| 17 |
43
| 50 |
119
| 164 |
Q4.2
| 49 |
35
| 70 |
63
| 141 |
91
| 9 |
7
| 26 |
18
| 69 |
48
|
Q4.3
| 34 |
28
| 54 |
50
| 116 |
77
| 8 |
6
| 23 |
14
| 63 |
38
|
Total | 420 | 378 | 982 | 865 | 4874 | 5166 | 92 | 100 | 262 | 256 | 733 | 835 |
Diff | − 10% | − 12% | 6% | 8% | − 2% | 14% |
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | |
Q1.1 | 25 |
22
|
31
| 32 |
44
| 48 |
5
| 6 |
13
| 24 | 36 | 36 |
Q1.2 | 24 |
23
| 29 | 29 |
42
| 47 |
5
| 6 |
13
| 23 |
34
| 37 |
Q1.3 | 24 |
23
|
29
| 30 |
43
| 46 |
4
| 6 |
13
| 22 |
35
| 37 |
Q2.1
| 32 |
25
| 47 |
39
| 531 |
44
| 8 |
4
| 19 |
10
| 59 |
19
|
Q2.2
| 31 |
21
| 46 |
39
| 531 |
143
|
7
|
4
| 18 |
9
| 51 |
14
|
Q2.3
| 30 |
21
| 44 |
38
| 531 |
42
|
7
|
4
| 17 |
9
| 49 |
13
|
Q3.1
| 35 |
23
| 59 |
37
| 651 |
67
| 8 |
4
| 29 |
15
| 81 |
28
|
Q3.2 | 30 |
29
|
45
| 46 | 677 |
96
|
6
| 7 |
17
| 33 |
51
| 52 |
Q3.3 | 33 | 33 | 219 | 219 | 665 |
77
|
5
| 7 |
15
| 29 |
43
| 46 |
Q3.4 | 34 |
32
| 222 |
220
| 675 |
75
|
6
| 7 |
15
| 29 |
43
| 45 |
Q4.1
| 38 |
30
| 86 |
61
| 226 |
118
| 13 |
7
| 43 |
22
| 119 |
36
|
Q4.2
| 49 |
34
| 70 |
58
| 141 |
67
| 9 |
5
| 26 |
17
| 69 |
26
|
Q4.3 | 34 | 34 |
54
| 60 | 116 |
110
|
8
| 11 |
23
| 44 | 63 | 63 |
Total | 420 |
349
| 982 |
908
| 4874 |
982
| 92 |
77
|
262
| 285 | 733 |
452
|
Diff | − 17% | − 8% | − 80% | − 16% | 9% | − 38% |
SF = 30 | SF = 100 | SF = 300 | SF = 30 | SF = 100 | SF = 300 | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||||||
SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | SS | SS-PB | |
Q1.1
| 25 |
19
| 31 |
22
| 44 |
27
| 5 |
3
| 13 |
5
| 36 |
12
|
Q1.2 |
24
| 25 |
29
| 32 |
42
| 51 |
5
| 8 |
13
| 21 |
34
| 65 |
Q1.3
| 24 |
19
| 29 |
22
| 43 |
25
| 4 |
2
| 13 |
5
| 35 |
12
|
Q2.1
| 32 |
28
| 47 |
43
| 531 |
50
| 8 |
5
| 19 |
12
| 59 |
37
|
Q2.2
| 31 |
26
| 46 |
41
| 531 |
160
| 7 |
5
| 18 |
10
| 51 |
27
|
Q2.3
| 30 |
26
| 44 |
41
| 531 |
45
| 7 |
4
| 17 |
9
| 49 |
26
|
Q3.1
| 35 |
25
| 59 |
36
| 651 |
66
| 8 |
5
| 29 |
14
| 81 |
44
|
Q3.2
| 30 | 30 |
45
| 50 | 677 |
92
|
6
| 9 |
17
| 31 |
51
| 100 |
Q3.3
|
33
| 36 | 219 |
78
| 665 |
78
|
5
| 9 |
15
| 25 |
43
| 86 |
Q3.4 | 34 |
33
|
222
| 226 | 675 |
81
|
6
| 10 |
15
| 30 |
43
| 91 |
Q4.1
| 38 |
33
| 86 |
70
| 226 |
127
| 13 |
9
| 43 |
24
| 119 |
65
|
Q4.2
| 49 |
30
| 70 |
57
| 141 |
60
| 9 |
4
| 26 |
13
| 69 |
25
|
Q4.3
| 34 |
31
| 54 |
47
| 116 |
72
| 8 |
7
| 23 |
21
| 63 |
60
|
Total | 420 | 362 | 982 | 765 | 4874 | 933 | 92 | 81 | 262 | 220 | 733 | 650 |
DIF | − 14% | − 22% | − 81% | − 12% | − 16% | − 11% |
Denormalized table
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
DT | DT-PB | DT | DT-PB | DT | DT-PB | DT | DT-PB | |
Q1.1
| 24 |
19
| 29 |
21
| 5 |
2
| 13 |
3
|
Q1.2
| 24 |
21
| 29 |
22
| 5 |
2
| 14 |
5
|
Q1.3
| 23 |
20
| 30 |
21
| 5 |
2
| 14 |
4
|
Q2.1 |
25
| 26 |
36
| 40 |
4
| 5 |
10
| 14 |
Q2.2 | 36 | 36 | 73 |
68
| 4 | 4 |
10
| 11 |
Q2.3 | 25 |
23
| 35 |
32
| 4 | 4 | 10 |
9
|
Q3.1
| 28 |
25
| 40 | 40 | 5 | 5 |
12
| 13 |
Q3.2
| 28 |
26
| 41 |
39
| 5 | 5 |
12
| 13 |
Q3.3
| 25 |
22
| 38 |
31
| 4 | 4 |
9
| 10 |
Q3.4 | 25 | 25 |
38
| 39 | 5 |
4
| 12 |
10
|
Q4.1 | 27 | 27 |
41
| 43 | 6 | 6 |
14
| 17 |
Q4.2
| 29 |
22
| 42 |
26
| 6 |
3
| 14 |
5
|
Q4.3
| 29 |
21
| 42 |
27
| 5 |
3
| 12 |
5
|
Total | 349 | 312 | 516 | 449 | 63 | 47 | 155 | 119 |
Diff | − 10% | − 13% | − 24% | − 23% |
SF = 30 | SF = 100 | SF = 30 | SF = 100 | |||||
---|---|---|---|---|---|---|---|---|
HIVE | PRESTO | |||||||
DT | DT-PB | DT | DT-PB | DT | DT-PB | DT | DT-PB | |
Q1.1
| 24 |
19
| 29 |
23
| 5 |
2
| 13 |
5
|
Q1.2 | 24 |
22
|
29
| 44 | 5 |
3
| 14 |
8
|
Q1.3
| 23 |
18
| 30 |
25
| 5 |
3
| 14 |
6
|
Q2.1
| 25 |
20
| 36 |
25
| 4 |
2
| 10 |
6
|
Q2.2
| 36 |
22
| 73 |
35
| 4 |
2
| 10 |
5
|
Q2.3
| 25 |
19
| 35 |
24
| 4 |
2
| 10 |
4
|
Q3.1
| 28 |
19
| 40 |
27
| 5 |
2
| 12 |
6
|
Q3.2
| 28 |
23
|
41
| 47 | 5 |
3
| 12 |
11
|
Q3.3
| 25 |
23
|
38
| 45 | 4 |
3
|
9
| 12 |
Q3.4 | 25 | 25 |
38
| 51 | 5 |
4
|
12
| 13 |
Q4.1
| 27 |
20
| 41 |
28
| 6 |
3
| 14 |
7
|
Q4.2
| 29 |
15
| 42 |
22
| 6 |
2
| 14 |
3
|
Q4.3
| 29 |
21
| 42 |
29
| 5 |
2
| 12 |
5
|
Total | 349 | 265 | 516 | 424 | 63 | 33 | 155 | 90 |
Diff | − 24% | − 18% | − 47% | − 42% |
CPU usage
Synopsis
Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage |
---|---|---|---|---|
Multiple partitioning | SS-P DT-P | “Od_Year”, “S_Region” | Yes | Yes |
SS-P | “S_Region”, “S_Nation”, “S_City” | Yes | NA | |
Bucketing | SS-B DT-B | “Orderkey” | No | No |
DT-B | “Od_Year”, “P_Brand” | Yes | Yes | |
SS-B | “Suppkey” | Yes (Hive) No (Presto) | No (SF = 100) Yes (Hive, SF = 300) | |
SS-B | “Orderdate”, “Custkey”, “Suppkey”, “Partkey” | No | No | |
Partitioning and bucketing | SS-PB | “Od_Year”, “Orderkey” | No | NA |
SS-PB | “S_Region”, “Suppkey” | Yes | NA | |
SS-PB | “Od_Year”, “S_Region”, “Suppkey” | Yes | Yes | |
DT-PB | “Od_Year”, “P_Brand” | Yes | NA | |
DT-PB | “Od_Year”, S_Region”, “Suppkey”. | Yes | Yes |
Discussion
Main insights
SF | Data model | Without data organization strategies | Multiple partitioning | |
---|---|---|---|---|
Od_Year, S_Region
|
S_Region, S_Nation, S_City
| |||
30 | SS | 92 s | 63 s | 70 s |
− 32% | − 24% | |||
DT | 63 s | 43 s | – | |
− 32% | – | |||
100 | SS | 262 s | 149 s | 193 s |
− 43% | − 26% | |||
DT | 155 s | 71 s | – | |
− 54% | – | |||
300 | SS | 733 s | 399 s | – |
− 46% | – | |||
DT | 472 s | 299 s | – | |
− 37% | – |
SF | Data model | Without data organization strategies | Bucketing | |||
---|---|---|---|---|---|---|
Orderkey
|
Od_Year (sorted by P_Brand)
|
Suppkey
|
Orderdate, Custkey, Suppkey, Partkey
| |||
30 | SS | 92 s | 133 s | – | 120 s | 121 s |
44% | – | 30% | 32% | |||
DT | 63 s | 71 s | 41 s | – | – | |
14% | − 35% | – | – | |||
100 | SS | 262 s | 305 s | – | 321 s | 305 s |
16% | – | 22% | 16% | |||
DT | 155 s | 178 s | 103 s | – | – | |
15% | − 34% | – | – | |||
300 | SS | 733 s | – | – | 768 s | 876 s |
– | – | 5% | 19% | |||
DT | 472 s | – | – | – | – | |
– | – | – | – |
SF | Data model | Without data organization strategies | Partitioning (P) and bucketing (B) | |||
---|---|---|---|---|---|---|
P = Od_Year B = Orderkey | P = S_Region B = Suppkey | P = Od_Year B = P_Brand | P = Od_Year, S_Region B = Suppkey | |||
30 | SS | 92 s | 100 s | 77 s | – | 81 s |
8% | − 16% | – | − 12% | |||
DT | 63 s | 46 s | – | 47 s | 33 s | |
− 26% | – | − 24% | − 47% | |||
100 | SS | 262 s | 256 s | 285 s | – | 220 s |
− 2% |
9%
|
–
| − 16% | |||
DT | 155 s | 129 s | – | 119 s | 90 s | |
− 17% | – | − 23% | − 42% | |||
300 | SS | 733 s | 835 s | 452 s | – | 650 s |
14% | − 38% | – | − 11% | |||
DT | 472 s | – | – | – | – | |
– | – | – | – |
SF | Partitioning | Bucketing | Partitioning and bucketing | Configuration (best scenario) | |||
---|---|---|---|---|---|---|---|
SS | DT | SS | DT | SS | DT | ||
30 | 41 s | Bucketing by “Od_Year” (Sorted by “P_Brand”) | |||||
100 | 71 s | Multiple Partitioning by “Od_Year” and “S_Region” | |||||
300 | 299 s | Multiple Partitioning by “Od_Year” and “S_Region” |
Data model | Attributes | SF | Tool | |||
---|---|---|---|---|---|---|
Time (s) | Increase along SF | |||||
Hive | Presto | Hive | Presto | |||
SS | None | 30 | 420 | 92 | ||
100 | 982 | 262 | 2.34 | 2.85 | ||
300 | 4874 | 733 | 4.96 | 2.80 | ||
SS-P | Od_Year + S_Region | 30 | 375 | 63 | ||
100 | 760 | 149 | 2.03 | 2.37 | ||
300 | 2849 | 399 | 3.75 | 2.68 | ||
SS-B | Orderdate + Custkey + Suppkey + Partkey | 30 | 420 | 121 | ||
100 | 1047 | 305 | 2.49 | 2.52 | ||
300 | 5712 | 876 | 5.46 | 2.87 | ||
Suppkey | 30 | 404 | 120 | |||
100 | 676 | 321 | 1.67 | 2.68 | ||
300 | 1803 | 768 | 2.67 | 2.39 | ||
SS-PB | Od_Year + Orderkey | 30 | 378 | 100 | ||
100 | 865 | 256 | 2.29 | 2.56 | ||
300 | 5166 | 835 | 5.97 | 3.26 | ||
Od_Year + S_Region+ Suppkey | 30 | 362 | 81 | |||
100 | 765 | 220 | 2.11 | 2.72 | ||
300 | 933 | 650 | 1.22 | 2.95 | ||
S_Region + Suppkey | 30 | 349 | 77 | |||
100 | 908 | 285 | 2.60 | 3.70 | ||
300 | 982 | 452 | 1.08 | 1.59 | ||
DT | None | 30 | 349 | 63 | ||
100 | 516 | 155 | 1.48 | 2.46 | ||
300 | 1090 | 472 | 2.11 | 3.05 | ||
DT-P | Od_Year + S_Region | 30 | 292 | 43 | ||
100 | 346 | 71 | 1.18 | 1.65 | ||
300 | 602 | 299 | 1.74 | 4.21 |
Data model | Attributes | SF | Tool | |||
---|---|---|---|---|---|---|
Time (s) | Increase along SF | |||||
Hive | Presto | Hive | Presto | |||
SS | None | 30 | 420 | 92 | ||
300 | 4874 | 733 | 11.60 | 7.97 | ||
SS-P | Od_Year + S_Region | 30 | 375 | 63 | ||
300 | 2849 | 399 | 7.60 | 6.33 | ||
SS-B | Orderdate + Custkey + Suppkey + Partkey | 30 | 420 | 121 | ||
300 | 5712 | 876 | 13.60 | 7.24 | ||
Suppkey | 30 | 404 | 120 | |||
300 | 1803 | 768 | 4.46 | 6.40 | ||
SS-PB | Od_Year + Orderkey | 30 | 378 | 100 | ||
300 | 5166 | 835 | 13.67 | 8.35 | ||
Od_Year + S_Region+ Suppkey | 30 | 362 | 81 | |||
300 | 933 | 650 | 2.58 | 8.02 | ||
S_Region + Suppkey | 30 | 349 | 77 | |||
300 | 982 | 452 | 2.81 | 5.87 | ||
DT | None | 30 | 349 | 63 | ||
300 | 1090 | 472 | 3.12 | 7.49 | ||
DT-P | Od_Year + S_Region | 30 | 292 | 43 | ||
300 | 602 | 299 | 2.06 | 6.95 |
Guidelines for practitioners
Data organization strategy | Data model | Attributes | Decrease in processing time | Decrease in CPU usage | Role of the attributes |
---|---|---|---|---|---|
Multiple partitioning | SS-P DT-P | “Od_Year” “S_Region” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |
SS-P | “S_Region” “S_Nation” “S_City” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses | |
Bucketing | SS-B DT-B | “Orderkey” | No | No | Attribute not used in the “where” conditions nor used for “group by” or “order by” |
DT-B | “Od_Year” “P_Brand” | Yes | Yes | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses | |
SS-B | “Suppkey” | Yes (Hive) No (Presto) | No (SF = 100) Yes (Hive, SF = 300) | Attribute not used in the “where” conditions nor used for “group by” or “order by”. Attribute used for joining tables | |
SS-B | “Orderdate” “Custkey” “Suppkey” “Partkey” | No | No | Attributes not used in the “where” conditions nor used for “group by” or “order by”. Attributes used for joining tables | |
Partitioning and bucketing | SS-PB | “Od_Year” “Orderkey” | No | NA | Only “Od_Year” is used in the “where” conditions, and in the “group by” and “order by” clauses |
SS-PB | “S_Region” “Suppkey” | Yes | NA | Only “S_Region” is used in the “where” conditions. “Suppkey” is used for joining tables | |
SS-PB DT-PB | “Od_Year” “S_Region” “Suppkey” | Yes | Yes | “Od_Year” and “S_Region” are used in the “where” conditions, and “Od_Year” is also used in the “group by” and “order by” clauses. “Suppkey” is used for joining tables in the SS-PB scenario | |
DT-PB | “Od_Year” “P_Brand” | Yes | NA | Attributes are used as filters in the “where” conditions, and in the “group by” and “order by” clauses |