Introduction
- Missing data labels. Most monitoring data do not contain labels that can be used directly to train a machine learning-based model, and labeling time-series data is often manual and time-consuming.
- Data noise. Monitoring data collected from a distributed network often contain noise, which can significantly degrade the performance of anomaly detection methods and increase the false-positive rate.
- We characterize four typical base detection methods on different datasets, and the results show that none of them performs well across detection accuracy, robustness, and prediction.
- Building on the base detection methods, we propose the ELBD framework, which includes three classic linear ensemble methods (maximum, average, and weighted average) and a deep ensemble method.
- We propose \(ARP\_score\) to evaluate detection performance in terms of accuracy, robustness, and multi-step prediction.
- We evaluated the methods in the ELBD framework on different datasets, and the results show that the deep ensemble method achieves the highest \(ARP\_score\) of 5.1821.
Related works
Machine learning-based anomaly detection methods
| Type | Method | Description |
|---|---|---|
| Density-based | LOF [6] | Local Outlier Factor |
| | COF [14] | Connectivity-based Outlier Factor |
| | LOCI [15] | Local Correlation Integral |
| Distance-based | KNN [16] | K-Nearest Neighbors |
| | LDOF [17] | Local Distance-based Outlier Factor |
| Kernel-based | OCSVM [18] | One-Class Support Vector Machines |
| | RSVM [19] | Robust Support Vector Machines |
| Tree-based | IForest [20] | Isolation Forest |
| Deep learning | AutoEncoder [21] | Fully connected AutoEncoder |
| | VAE [22] | Variational AutoEncoder |
Ensemble learning
Base performance anomaly detection methods
Problem definition
Feature extraction
Base detection methods
Experiments and results
Dataset
| Resource Metrics | Description |
|---|---|
| CPU related | Per core and overall load, usage, idle time, I/O wait time, hard and soft interrupt counts, context switch count, etc. |
| Memory related | Free, cached, active, inactive, dirty memory, etc. |
| Disk related | Disk space used, IOps, I/O usage, read/write rate, etc. |
| Network related | Receive/transmit network traffic, etc. |
| Dataset | Number of samples | Number of features | Number of extracted features | Anomaly fraction (%) |
|---|---|---|---|---|
| DApp monitoring data | 3237 | 229 | 15 | 28.14 |
| SMD data | 28479 | 38 | 5 | 9.46 |
| Vichalana data | 45486 | 13 | 6 | 6.45 |
Experimental settings
Evaluation indicators
Experimental results
| Detection methods | DApp monitoring data: F1 score | DApp monitoring data: Time (s) | SMD data: F1 score | SMD data: Time (s) | Vichalana data: F1 score | Vichalana data: Time (s) |
|---|---|---|---|---|---|---|
| IForest | 0.791 | 0.318±0.0121 | 0.7515 | 1.278±0.0195 | 0.658 | 1.9814±0.0704 |
| KNN | 0.8033 | 0.0246±0.0021 | 0.5713 | 0.311±0.0047 | 0.5519 | 0.7758±0.0693 |
| LOF | 0.5143 | 0.0439±0.0015 | 0.5468 | 0.5379±0.0108 | 0.5128 | 1.4684±0.1229 |
| OCSVM | 0.737 | 0.3054±0.0076 | 0.6047 | 23.9234±0.8924 | 0.6778 | 190.118±10.5769 |
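The four base detectors above are standard off-the-shelf algorithms. As a minimal sketch of how such a comparison can be run — using scikit-learn on synthetic data, which is an assumption, not necessarily the paper's pipeline — three of the detectors are available directly, and a KNN-style detector can be approximated by the mean distance to the k nearest neighbors:

```python
# Sketch (not the paper's exact pipeline): scoring base detectors with
# scikit-learn on synthetic data with injected anomalies.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors
from sklearn.svm import OneClassSVM
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 5)),      # normal samples
               rng.uniform(-8, 8, (20, 5))])    # injected anomalies
y = np.array([0] * 200 + [1] * 20)              # 1 = anomaly
frac = 20 / 220                                  # assumed known contamination

detectors = {
    "IForest": IsolationForest(contamination=frac, random_state=0),
    "LOF": LocalOutlierFactor(contamination=frac),
    "OCSVM": OneClassSVM(nu=frac),
}
scores = {}
for name, det in detectors.items():
    pred = det.fit_predict(X)                    # sklearn marks anomalies as -1
    scores[name] = f1_score(y, (pred == -1).astype(int))

# KNN-style detector: flag the top `frac` fraction by mean 5-NN distance
nn = NearestNeighbors(n_neighbors=5).fit(X)
dist, _ = nn.kneighbors(X)
mean_dist = dist.mean(axis=1)
thresh = np.quantile(mean_dist, 1 - frac)
scores["KNN"] = f1_score(y, (mean_dist > thresh).astype(int))
```

On real monitoring data the contamination fraction is usually unknown, which is one motivation for combining detectors rather than tuning each threshold individually.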
Ensemble learning-based detection framework
Basic idea
Linear ensemble methods
| Index | IForest | KNN | LOF | OCSVM | Max | Avg | Weighted Avg |
|---|---|---|---|---|---|---|---|
| 1 | -0.41 | -0.23 | 0.14 | -0.88 | 0.14 | -0.35 | -0.49 |
| 2 | -0.18 | -0.03 | 0.63 | -0.86 | 0.63 | -0.11 | -0.33 |
| 3 | 2.29 | 5.14 | 1.07 | 0.62 | 5.14 | 2.28 | 2.76 |
| 4 | 2.36 | 4.56 | 0.86 | 0.11 | 4.56 | 1.97 | 2.42 |
| 5 | 1.99 | 1.5 | -0.3 | -0.19 | 1.99 | 0.75 | 1.14 |
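The three linear combiners can be sketched directly on the standardized base scores from the table above; the Max and Avg columns are reproducible, while the weights below for the weighted average are illustrative placeholders, since the learned weights are not listed in the table:

```python
# Sketch of the three linear ensemble combiners over standardized base
# detector scores (rows 1-5 of the table above). The weighted-average
# weights are placeholders, not the paper's learned values.
import numpy as np

# columns: IForest, KNN, LOF, OCSVM
S = np.array([
    [-0.41, -0.23,  0.14, -0.88],
    [-0.18, -0.03,  0.63, -0.86],
    [ 2.29,  5.14,  1.07,  0.62],
    [ 2.36,  4.56,  0.86,  0.11],
    [ 1.99,  1.50, -0.30, -0.19],
])

ens_max = S.max(axis=1)             # maximum combiner
ens_avg = S.mean(axis=1)            # average combiner
w = np.array([0.3, 0.4, 0.1, 0.2])  # placeholder weights, sum to 1
ens_wavg = S @ w                    # weighted-average combiner
```

The maximum combiner is the most sensitive (a single alarming detector triggers an alarm), while averaging suppresses disagreement among detectors.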
The deep ensemble method
Experiments and results
Experimental settings
- Performance of methods in the ELBD framework. To evaluate the improvement in detection accuracy and algorithm robustness, we compare the methods in the ELBD framework with the best-performing base detection method. Experimental results can be seen in E1.
- Multi-step prediction of the deep ensemble method. For the deep ensemble method, we evaluate its multi-step prediction ability, which can be seen in E2.
Experimental results
| Dataset | IForest | KNN | LOF | OCSVM | Ensemble_max | Ensemble_avg | Ensemble_w_avg | Deep_ensemble |
|---|---|---|---|---|---|---|---|---|
| DApp monitoring data | 4 | 3 | 8 | 7 | 6 | 5 | 2 | 1 |
| SMD | 2 | 7 | 8 | 6 | 3 | 5 | 4 | 1 |
| Vichalana data | 5 | 7 | 8 | 4 | 2 | 3 | 6 | 1 |
| Average rank | 3.7 | 5.7 | 8 | 5.7 | 3.7 | 4.3 | 4 | 1 |
| Robustness score | 0.6143 | 0.3286 | 0 | 0.3286 | 0.6143 | 0.5286 | 0.5714 | 1 |
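The robustness scores in the table above are consistent with linearly rescaling the average rank \(r\) over the \(m = 8\) methods to \([0, 1]\) via \((m - r) / (m - 1)\), so the best possible average rank (1) maps to 1 and the worst (8) maps to 0. This is an inferred formula, not one quoted from the text; a quick check against the table, using the rounded average ranks as printed:

```python
# Hedged reconstruction: robustness score as a linear rescaling of the
# (rounded) average rank r over m = 8 methods: (m - r) / (m - 1).
avg_rank = {
    "IForest": 3.7,
    "KNN": 5.7,
    "LOF": 8.0,
    "OCSVM": 5.7,
    "Ensemble_max": 3.7,
    "Ensemble_avg": 4.3,
    "Ensemble_w_avg": 4.0,
    "Deep_ensemble": 1.0,
}
m = 8
robustness = {k: (m - r) / (m - 1) for k, r in avg_rank.items()}
```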
| Challenge | Indicator | IForest | KNN | LOF | OCSVM | Ensemble_max | Ensemble_avg | Ensemble_w_avg | Deep_ensemble |
|---|---|---|---|---|---|---|---|---|---|
| Detection accuracy | F1 score | 0.7335 | 0.6422 | 0.5246 | 0.6732 | 0.7453 | 0.7188 | 0.7169 | 0.8324 |
| Algorithm robustness | Robustness score | 0.6143 | 0.3286 | 0 | 0.3286 | 0.6143 | 0.5286 | 0.5714 | 1 |
| Multi-step prediction | Prediction score | - | - | - | - | - | - | - | 3.3497 |
| | ARP_score | 1.3478 | 0.9708 | 0.5246 | 1.0018 | 1.3596 | 1.2474 | 1.2883 | 5.1821 |
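The \(ARP\_score\) row is consistent with a simple unweighted sum of the three indicator rows (with the prediction score treated as 0 where it does not apply); this reading is inferred from the table, and a minimal check reproduces every column:

```python
# Hedged reconstruction: ARP_score as F1 score + robustness score
# + prediction score (0 for methods without multi-step prediction).
f1 = {"IForest": 0.7335, "KNN": 0.6422, "LOF": 0.5246, "OCSVM": 0.6732,
      "Ensemble_max": 0.7453, "Ensemble_avg": 0.7188,
      "Ensemble_w_avg": 0.7169, "Deep_ensemble": 0.8324}
rob = {"IForest": 0.6143, "KNN": 0.3286, "LOF": 0.0, "OCSVM": 0.3286,
       "Ensemble_max": 0.6143, "Ensemble_avg": 0.5286,
       "Ensemble_w_avg": 0.5714, "Deep_ensemble": 1.0}
pred = {"Deep_ensemble": 3.3497}  # only the deep ensemble supports prediction

arp = {k: round(f1[k] + rob[k] + pred.get(k, 0.0), 4) for k in f1}
```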