1 Introduction
2 Related work
Reference | Considered Scenario | AI based | Detection or Mitigation Strategy |
---|---|---|---|
Cassavia et al. (2022) | Covert channels in IoT nodes | \(\checkmark \) | Ensemble of sparse autoencoders |
Elsadig and Gafar (2022) | IoT covert communications are not addressed for detection | \(\checkmark \) | Survey with many detection mechanisms, e.g., SVM, LSTM, and KNN |
Guarascio et al. (2022) | Covert channels in IoT nodes | \(\checkmark \) | Sparse autoencoders |
Frolova et al. (2021) | General network-wide covert communications | - | Traffic manipulation and engineering to disrupt or prevent hidden transmissions |
Thakkar and Lohiya (2021) | Covert channels are not considered. Attack models for IoT nodes are trojans, DoS, etc. | \(\checkmark \) | Survey covering techniques like ML/DL, k-Nearest Neighbors, Random Forest, and Ensemble Learning |
Nowakowski et al. (2021) | Covert channels and threats cloaking data in TTL of IPv4, as well as in various parts of TCP and HTTP traffic | \(\checkmark \) | The Driverless AI platform is used to prevent feature engineering and spot anomalous flows from a threat-specific raw data representation |
Vaccari et al. (2020) | Not focusing on hidden communications but on general attack templates | - | A dataset to perform realistic experiments is provided |
Alcaraz et al. (2019) | Timing channels in SCADA IoT applications | - | By-design elimination or impairment via additional delays |
Noor and Hassan (2019) | Generic hidden communications for IoT nodes via the MQTT protocol | - | Use of different state-of-the-art and attack-dependent metrics, including the “compressibility” (see, Cabuk et al. (2004)) |
Ho (2019) | Not related to hiding data in network traffic, but in data produced by sensors | - | By-design elimination or “constraints” in the timing behaviors of data managed by IoT nodes |
3 Attack scenario and threat modelling
3.1 Attack scenario
The malware can encode the bits 1 and 0 by increasing or decreasing the observed TTL by a suitable threshold or by exploiting the most popular values as “high” and “low” signals. The attacker should then design a proper information hiding mechanism by taking into account the “clean” traffic conditions and select accordingly how bits are encoded. To this aim, the threat actor is expected to perform a reconnaissance campaign to gather traffic information (e.g., by fingerprinting IoT devices) or deploy a packet sniffing routine for monitoring local network conversations (Mazurczyk and Caviglione, 2021; Zorawski et al., 2023). To have a general setting, we consider the two use cases depicted in Fig. 1b. In more detail:
- Case 1 (Naive Encoding Scheme): the malware encodes the bits 1 and 0 by selecting a single TTL value for each bit;
- Case 2 (Advanced Threat Scheme): the malware encodes the bits 1 and 0 by randomly selecting a single TTL value from three alternatives for each bit.
In Case 1, the bit 0 is encoded by using a TTL value equal to 64, whereas the bit 1 is encoded by using 100. Differently, a more advanced scheme tries to reduce the footprint left in the traffic, e.g., in terms of anomalous distributions or possible signatures. Hence, at each bit sent, the malware can slightly adapt its encoding scheme by selecting a different TTL value (see Case 2 in Fig. 1b). As a result, the bit 0 is encoded by using a TTL value among 44, 56, and 57, whereas the bit 1 is encoded by using a TTL value among 210, 223, and 224.
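The two encoding schemes above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function and constant names are our own, and the decoder simply splits the TTL range at 100, which works for both value sets described in the text.

```python
import random

# Case 1 (naive): one fixed TTL value per bit, as described in the text.
NAIVE_TTL = {0: 64, 1: 100}

# Case 2 (advanced): each bit maps to one of three TTL values,
# picked at random for every transmitted bit.
ADVANCED_TTL = {0: [44, 56, 57], 1: [210, 223, 224]}

def encode_naive(bits):
    """Encode a bit sequence as a list of TTL values (Case 1)."""
    return [NAIVE_TTL[b] for b in bits]

def encode_advanced(bits, rng=random):
    """Encode a bit sequence as a list of TTL values (Case 2)."""
    return [rng.choice(ADVANCED_TTL[b]) for b in bits]

def decode(ttls):
    """Recover the covert bits: all '1' TTLs are >= 100, all '0' TTLs are below."""
    return [1 if t >= 100 else 0 for t in ttls]
```

Note how Case 2 spreads each bit over three values, so no single TTL dominates the observed distribution as strongly as in Case 1.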
3.2 Datasets for modelling the covert channel
4 Deep ensemble learning scheme
4.1 Detection through a single autoencoder
4.2 Learning and combining different detectors
5 Experimental investigation
5.1 Datasets, parameters and evaluation metrics
5.1.1 Datasets and evaluation protocol
5.1.2 Parameters and competitors
- the Deep Autoencoder (from now on referred to as DAE), depicted in Fig. 5a. The encoder is instantiated with two fully-connected dense layers with 9 (input size) and 8 neurons, respectively. The third layer is the latent space and it is instantiated as a dense layer with 4 neurons. The decoder expands the latent representation (still of size 4) through the same (but inverted) 2-layer sequence as the encoder;
- the “unsparse” version of our architecture (from now on referred to as U-Net), depicted in Fig. 5b. Both its encoder and decoder parts are instantiated with two fully-connected dense layers with 9 (input size) and 8 neurons, respectively. The middle (third) layer still consists of 4 neurons;
- the Sparse Autoencoder (from now on referred to as Sparse AE), depicted in Fig. 5c. It features a single-hidden-layer shallow architecture with 32 neurons.
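To make the layer sizes concrete, the following NumPy sketch instantiates the 9-8-4-8-9 DAE baseline as a plain forward pass. The layer widths come from the description above; the weight initialization and the choice of ReLU on hidden layers are assumptions for illustration, and no training is shown.

```python
import numpy as np

# 9-8-4-8-9 fully-connected autoencoder (DAE baseline, Fig. 5a).
rng = np.random.default_rng(0)
sizes = [9, 8, 4, 8, 9]  # input -> encoder -> latent -> decoder -> output
weights = [rng.standard_normal((a, b)) * 0.1
           for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    """Run one flow-feature vector through the (untrained) autoencoder."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:   # keep the output layer linear
            h = np.maximum(h, 0.0) # ReLU on hidden layers (assumed)
    return h

def reconstruction_error(x):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - forward(x)) ** 2))
```

The anomaly score of a flow is its reconstruction error: a model trained only on clean traffic should reconstruct covert-channel flows poorly.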
5.1.3 Evaluation metrics
- Accuracy: defined as the fraction of cases correctly classified, i.e., \(\frac{TP + TN}{TP + FP + FN + TN}\);
- Precision and Recall: metrics used to estimate the detection capability of a system, since they measure the ability to identify attacks and to avoid false alarms. Specifically, Precision is defined as \(\frac{TP}{TP + FP}\), while Recall as \(\frac{TP}{TP + FN}\);
- F-Measure: summarizes the model performance as the harmonic mean of Precision and Recall. The F-Measure is particularly beneficial when the class distribution is imbalanced, as it represents the best trade-off between Recall and Precision. Hence, it is often preferred over other metrics when one searches for a single criterion assessing the goodness of a classification approach.
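The four metrics above follow directly from the confusion-matrix counts; a minimal helper (names are ours) makes the definitions explicit:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute Accuracy, Precision, Recall, and F-Measure from counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-Measure: harmonic mean of Precision and Recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure
```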
5.2 Numerical results
Ensemble Size | Strategy | Detection Threshold | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|---|---|
3 | median | \(90^{th}\) perc. | 0.894 | 0.771 | 0.979 | 0.863 |
3 | median | \(95^{th}\) perc. | 0.947 | 0.902 | 0.948 | 0.924 |
3 | median | \(99^{th}\) perc. | 0.955 | 0.950 | 0.915 | 0.932 |
3 | max | \(90^{th}\) perc. | 0.894 | 0.772 | 0.974 | 0.861 |
3 | max | \(95^{th}\) perc. | 0.946 | 0.901 | 0.945 | 0.922 |
3 | max | \(99^{th}\) perc. | 0.948 | 0.941 | 0.902 | 0.921 |
3 | avg | \(90^{th}\) perc. | 0.892 | 0.767 | 0.980 | 0.860 |
3 | avg | \(95^{th}\) perc. | 0.947 | 0.901 | 0.947 | 0.924 |
3 | avg | \(99^{th}\) perc. | 0.952 | 0.946 | 0.910 | 0.928 |
5 | median | \(90^{th}\) perc. | 0.890 | 0.764 | 0.977 | 0.858 |
5 | median | \(95^{th}\) perc. | 0.933 | 0.863 | 0.954 | 0.906 |
5 | median | \(99^{th}\) perc. | 0.952 | 0.944 | 0.911 | 0.927 |
5 | max | \(90^{th}\) perc. | 0.893 | 0.767 | 0.981 | 0.861 |
5 | max | \(95^{th}\) perc. | 0.939 | 0.878 | 0.953 | 0.914 |
5 | max | \(99^{th}\) perc. | 0.941 | 0.944 | 0.878 | 0.910 |
5 | avg | \(90^{th}\) perc. | 0.891 | 0.765 | 0.981 | 0.859 |
5 | avg | \(95^{th}\) perc. | 0.941 | 0.884 | 0.951 | 0.916 |
5 | avg | \(99^{th}\) perc. | 0.951 | 0.942 | 0.912 | 0.927 |
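The detection thresholds and combination strategies reported in the table can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the threshold is set as a percentile of the reconstruction errors observed on clean traffic, and the ensemble combines the per-model errors by median, max, or average before comparing against it; the function names are ours.

```python
import numpy as np

def fit_threshold(clean_errors, percentile):
    """Set the detection threshold as a percentile (e.g., 90, 95, 99)
    of the reconstruction errors measured on clean traffic."""
    return np.percentile(clean_errors, percentile)

def combine(errors_per_model, strategy):
    """Combine the per-model reconstruction errors for one flow."""
    ops = {"median": np.median, "max": np.max, "avg": np.mean}
    return ops[strategy](errors_per_model)

def detect(errors_per_model, threshold, strategy="median"):
    """Flag a flow as carrying a covert channel when the combined
    reconstruction error exceeds the threshold."""
    return combine(errors_per_model, strategy) > threshold
```

Raising the percentile trades Recall for Precision, which matches the trend visible across the \(90^{th}\), \(95^{th}\), and \(99^{th}\) rows of the table.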
5.2.1 Comparing different ensemble schemes and sizes
Ensemble Size | Strategy | Detection Threshold | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|---|---|
3 | median | \(90^{th}\) perc. | 0.811 | 0.715 | 0.732 | 0.724 |
3 | median | \(95^{th}\) perc. | 0.728 | 0.735 | 0.308 | 0.434 |
3 | max | \(90^{th}\) perc. | 0.779 | 0.687 | 0.640 | 0.663 |
3 | max | \(95^{th}\) perc. | 0.727 | 0.735 | 0.307 | 0.433 |
3 | avg | \(90^{th}\) perc. | 0.803 | 0.706 | 0.720 | 0.713 |
3 | avg | \(95^{th}\) perc. | 0.731 | 0.742 | 0.315 | 0.442 |
5 | median | \(90^{th}\) perc. | 0.786 | 0.688 | 0.671 | 0.680 |
5 | median | \(95^{th}\) perc. | 0.734 | 0.703 | 0.374 | 0.488 |
5 | max | \(90^{th}\) perc. | 0.788 | 0.692 | 0.675 | 0.683 |
5 | max | \(95^{th}\) perc. | 0.728 | 0.707 | 0.336 | 0.455 |
5 | avg | \(90^{th}\) perc. | 0.801 | 0.702 | 0.717 | 0.710 |
5 | avg | \(95^{th}\) perc. | 0.732 | 0.723 | 0.338 | 0.461 |
Model Type | Detection Threshold | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|---|
Sparse U-Net | \(90^{th}\) perc. | 0.882 | 0.743 | **0.993** | 0.850 |
Sparse U-Net | \(95^{th}\) perc. | 0.921 | 0.822 | 0.976 | 0.893 |
Sparse U-Net | \(99^{th}\) perc. | **0.936** | 0.942 | 0.865 | **0.902** |
DAE | \(90^{th}\) perc. | 0.869 | 0.724 | **0.993** | 0.837 |
DAE | \(95^{th}\) perc. | 0.910 | 0.801 | 0.975 | 0.880 |
DAE | \(99^{th}\) perc. | 0.905 | **0.962** | 0.750 | 0.843 |
Sparse AE | \(90^{th}\) perc. | 0.875 | 0.737 | 0.979 | 0.841 |
Sparse AE | \(95^{th}\) perc. | 0.901 | 0.795 | 0.951 | 0.866 |
Sparse AE | \(99^{th}\) perc. | 0.902 | 0.922 | 0.778 | 0.844 |
U-Net | \(90^{th}\) perc. | 0.875 | 0.736 | 0.982 | 0.841 |
U-Net | \(95^{th}\) perc. | 0.907 | 0.799 | 0.968 | 0.876 |
U-Net | \(99^{th}\) perc. | 0.819 | 0.853 | 0.563 | 0.678 |
Ensemble (k=3) | \(90^{th}\) perc. | 0.894 | 0.771 | 0.979 | 0.863 |
Ensemble (k=3) | \(95^{th}\) perc. | 0.947 | 0.902 | 0.948 | 0.924 |
Ensemble (k=3) | \(99^{th}\) perc. | **0.955** | 0.950 | 0.915 | **0.932** |
5.2.2 Comparing different base models
Training Perc. | Strategy | Detection Threshold | Accuracy | Precision | Recall | F-Measure |
---|---|---|---|---|---|---|
\(25\%\) | median | \(90^{th}\) perc. | 0.866 | 0.718 | 0.995 | 0.834 |
\(25\%\) | median | \(95^{th}\) perc. | 0.905 | 0.789 | 0.979 | 0.874 |
\(25\%\) | median | \(99^{th}\) perc. | 0.942 | 0.924 | 0.904 | 0.914 |
\(25\%\) | max | \(90^{th}\) perc. | 0.873 | 0.729 | 0.997 | 0.842 |
\(25\%\) | max | \(95^{th}\) perc. | 0.905 | 0.789 | 0.982 | 0.875 |
\(25\%\) | max | \(99^{th}\) perc. | 0.923 | 0.870 | 0.908 | 0.888 |
\(25\%\) | mean | \(90^{th}\) perc. | 0.869 | 0.722 | 0.997 | 0.838 |
\(25\%\) | mean | \(95^{th}\) perc. | 0.904 | 0.786 | 0.982 | 0.873 |
\(25\%\) | mean | \(99^{th}\) perc. | 0.928 | 0.886 | 0.902 | 0.894 |
Average performances 25\(\%\) | | | 0.902 | 0.801 | 0.961 | 0.870 |
\(50\%\) | median | \(90^{th}\) perc. | 0.849 | 0.691 | 0.998 | 0.817 |
\(50\%\) | median | \(95^{th}\) perc. | 0.903 | 0.781 | 0.990 | 0.873 |
\(50\%\) | median | \(99^{th}\) perc. | 0.943 | 0.901 | 0.934 | 0.917 |
\(50\%\) | max | \(90^{th}\) perc. | 0.862 | 0.710 | 0.998 | 0.830 |
\(50\%\) | max | \(95^{th}\) perc. | 0.913 | 0.803 | 0.985 | 0.885 |
\(50\%\) | max | \(99^{th}\) perc. | 0.944 | 0.904 | 0.932 | 0.918 |
\(50\%\) | mean | \(90^{th}\) perc. | 0.853 | 0.697 | 0.998 | 0.821 |
\(50\%\) | mean | \(95^{th}\) perc. | 0.903 | 0.783 | 0.988 | 0.873 |
\(50\%\) | mean | \(99^{th}\) perc. | 0.942 | 0.897 | 0.935 | 0.916 |
Average performances 50\(\%\) | | | 0.901 | 0.796 | 0.973 | 0.872 |
\(100\%\) | median | \(90^{th}\) perc. | 0.894 | 0.771 | 0.979 | 0.863 |
\(100\%\) | median | \(95^{th}\) perc. | 0.947 | 0.902 | 0.948 | 0.924 |
\(100\%\) | median | \(99^{th}\) perc. | 0.955 | 0.950 | 0.915 | 0.932 |
\(100\%\) | max | \(90^{th}\) perc. | 0.894 | 0.772 | 0.974 | 0.861 |
\(100\%\) | max | \(95^{th}\) perc. | 0.946 | 0.901 | 0.945 | 0.922 |
\(100\%\) | max | \(99^{th}\) perc. | 0.948 | 0.941 | 0.902 | 0.921 |
\(100\%\) | mean | \(90^{th}\) perc. | 0.892 | 0.767 | 0.980 | 0.860 |
\(100\%\) | mean | \(95^{th}\) perc. | 0.947 | 0.901 | 0.947 | 0.924 |
\(100\%\) | mean | \(99^{th}\) perc. | 0.952 | 0.946 | 0.910 | 0.928 |
Average performances 100\(\%\) | | | 0.930 | 0.872 | 0.944 | 0.904 |