Introduction
- We provide domain-specific metrics that are most relevant to a streaming platform running on a high-performance computing (HPC) architecture, because existing methodologies address only big data processing and distributed database management frameworks.
- We characterize the performance behavior of a streaming platform running on a high-performance computing architecture.
- We adapt a state-of-the-art automated performance-tuning module, originally built for distributed database management systems, to work with a distributed streaming platform.
- We propose a novel framework, running on top of a streaming platform, that uses linear least squares with L2 regularization (ridge regression) to predict a plausible performance for the stream of each individual topology.
- To validate and evaluate the proposed framework, we implemented it on an emerging stream processing system, Apache Heron.
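Linear least squares with L2 regularization is ridge regression. The pure-Python sketch below illustrates only the technique and is not the framework's implementation. It centers the data so the intercept is not penalized, then solves the normal equations \((X_c^\top X_c + \lambda I)\,w = X_c^\top y_c\). The function names are hypothetical, and the features and target stand in for whatever topology measurements the framework actually uses.

```python
def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is nonsingular.
    """
    n = len(a)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(X, y, lam=1.0):
    """Return (weights, intercept) minimizing ||y - Xw - b||^2 + lam * ||w||^2.

    Requires lam > 0, or lam == 0 with linearly independent centered features.
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    # Centering removes the intercept from the penalized problem.
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc^T Xc + lam * I) w = Xc^T yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = _solve(A, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def ridge_predict(X, w, b):
    """Apply the fitted linear model to each row of X."""
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
```

Consider noise-free data generated by \(y = 2x_1 + 3x_2 + 1\) with linearly independent centered features. On such data, `ridge_fit(X, y, lam=0.0)` recovers \(w = (2, 3)\) and \(b = 1\). Increasing `lam` shrinks the weight vector toward zero, which trades a little bias for lower variance on noisy measurements.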
Background
- Fields grouping: Tuples that share the same value in the grouping field(s) are always sent to the same instance of the receiving processing component.
- Global grouping: All tuples are sent to a single instance, the one with the lowest instance ID.
- Shuffle grouping: Tuples are distributed randomly across the receiving instances so that each instance receives an approximately equal share.
- None grouping: Currently behaves the same as shuffle grouping.
- All grouping: Every tuple is replicated to all instances of the receiving processing component.
- Custom grouping: Tuples are distributed to the receiving instances according to user-defined logic.
- Sliding window: Tuples in a stream are grouped into windows that may overlap, defined either by a time duration or by a count of tuples.
- Tumbling window: Tuples in a stream are grouped into non-overlapping windows, defined either by a time duration or by a count of tuples.
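The grouping and windowing semantics listed above can be illustrated with a small Python sketch. All function and class names here (`fields_grouping`, `ShuffleGrouping`, `count_windows`, and so on) are hypothetical and are not part of Heron's API. Receiving instances are modeled simply as a list of integer IDs. The shuffle grouping shows one way to combine randomness with an even load: every instance receives exactly one tuple per round.

```python
import random
import zlib


# Each grouping takes a tuple plus the list of receiving instance IDs
# and returns the list of instance IDs that should receive the tuple.

def fields_grouping(tup, field, instances):
    """Tuples with the same value in `field` always map to the same instance."""
    # crc32 is deterministic across processes, unlike Python's salted hash() for str.
    h = zlib.crc32(str(tup[field]).encode("utf-8"))
    return [instances[h % len(instances)]]


def global_grouping(tup, instances):
    """Every tuple goes to the single instance with the lowest ID."""
    return [min(instances)]


class ShuffleGrouping:
    """Random order, but each instance receives exactly one tuple per round."""

    def __init__(self, instances, seed=None):
        self._instances = list(instances)
        self._rng = random.Random(seed)
        self._pending = []

    def choose(self, tup):
        if not self._pending:  # start a new round in a fresh random order
            self._pending = self._instances[:]
            self._rng.shuffle(self._pending)
        return [self._pending.pop()]


NoneGrouping = ShuffleGrouping  # none grouping currently behaves like shuffle


def all_grouping(tup, instances):
    """Every tuple is replicated to every instance."""
    return list(instances)


def custom_grouping(tup, instances, chooser):
    """Delegate the routing decision to a user-supplied function."""
    return chooser(tup, instances)


def count_windows(stream, length, slide):
    """Count-based windows: slide < length overlaps (sliding); slide == length tumbles."""
    return [stream[i:i + length]
            for i in range(0, len(stream) - length + 1, slide)]


def time_windows(events, length, slide):
    """Time-based windows over (timestamp, value) pairs sorted by timestamp.

    Each window covers [start, start + length); assumes slide > 0.
    slide == length gives tumbling windows, slide < length gives sliding ones.
    """
    if not events:
        return []
    start, last = events[0][0], events[-1][0]
    windows = []
    while start <= last:
        windows.append([v for t, v in events if start <= t < start + length])
        start += slide
    return windows
```

For example, `count_windows(list(range(6)), 3, 1)` produces four overlapping windows, `[[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]`. With `slide=3` it produces the tumbling windows `[[0, 1, 2], [3, 4, 5]]`.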
Design and implementation of the proposed framework
Overview
Performance metrics classification
Memory metrics
n-Verticals metrics
Communication metrics
Computation metrics
Scheduler metrics
Data streaming performance prediction model
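For reference, the three penalized linear models compared below can be written in one common form. The notation and the unscaled squared-error loss are assumptions here; libraries often rescale the loss, for example by \(1/(2n)\). The symbols are the feature matrix \(X\), target vector \(y\), weights \(w\), intercept \(b\), and penalty weights \(\lambda_1, \lambda_2 \ge 0\):

\[
\begin{aligned}
\text{Lasso:}\quad & \min_{w,\,b}\; \|y - Xw - b\mathbf{1}\|_2^2 + \lambda_1 \|w\|_1 \\
\text{Ridge:}\quad & \min_{w,\,b}\; \|y - Xw - b\mathbf{1}\|_2^2 + \lambda_2 \|w\|_2^2 \\
\text{Elastic net:}\quad & \min_{w,\,b}\; \|y - Xw - b\mathbf{1}\|_2^2 + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2
\end{aligned}
\]

The two SVR variants instead fit a linear function under an \(\epsilon\)-insensitive loss. \(\nu\)-SVR replaces the fixed \(\epsilon\) with a parameter \(\nu \in (0, 1]\) that controls the fraction of support vectors.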
Regression model | Cluster | MSE | RMSE | MAE | R\(^2\) |
---|---|---|---|---|---|
Lasso regression | C-I | 68.9297 | 8.0056 | 5.9667 | 0.9807 |
Lasso regression | C-II | 58.4858 | 7.5675 | 5.3386 | 0.963 |
Lasso regression | C-III | 42.481 | 6.518 | 4.649 | 0.99 |
Ridge regression | C-I | 66.7715 | 8.1146 | 5.8574 | 0.9813 |
Ridge regression | C-II | 55.2465 | 7.3488 | 5.2273 | 0.9663 |
Ridge regression | C-III | 40.433 | 6.359 | 4.546 | 0.991 |
Elastic net regression | C-I | 69.1028 | 8.0056 | 5.9757 | 0.9805 |
Elastic net regression | C-II | 60.7171 | 7.6966 | 5.3623 | 0.9618 |
Elastic net regression | C-III | 42.578 | 6.525 | 4.693 | 0.99 |
\(\epsilon\)-SVR linear kernel | C-I | 316.88 | 8.1301 | 13.7762 | 0.9113 |
\(\epsilon\)-SVR linear kernel | C-II | 298.6252 | 16.7423 | 13.3822 | 0.813 |
\(\epsilon\)-SVR linear kernel | C-III | 126.669 | 11.255 | 9.044 | 0.971 |
\(\nu\)-SVR linear kernel | C-I | 227.4365 | 17.0608 | 11.7427 | 0.9331 |
\(\nu\)-SVR linear kernel | C-II | 214.9223 | 14.2851 | 10.9677 | 0.8661 |
\(\nu\)-SVR linear kernel | C-III | 132.333 | 11.502 | 9.289 | 0.969 |
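The error metrics in the table can be computed from a model's predictions as in the sketch below; the function name is hypothetical. MSE and MAE are the mean squared and mean absolute errors. For a single evaluation, RMSE is \(\sqrt{\text{MSE}}\), and \(R^2 = 1 - \mathrm{SS}_{\text{res}}/\mathrm{SS}_{\text{tot}}\). When scores are averaged over several runs or folds, the averaged RMSE generally differs from the square root of the averaged MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for one set of predictions.

    Assumes y_true is not constant (otherwise SS_tot is zero and R^2 is undefined).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

For example, `regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])` gives an MSE of 0.375 and an MAE of 0.5.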