Introduction
Related work
Machine learning-based intrusion detection in WSNs
Deep learning in WSN security
Clustering techniques for anomaly detection
Feature reduction methods in WSN security
Comparative studies and benchmarking
Challenges and open issues
Summary and positioning
-
Identifying cyberattacks in wireless sensor networks using a hybrid feature-reduction technique combined with machine learning has not been investigated.
-
The DLFFNN methodology has not been combined with the SMOTE-based ENN method.
-
The K-Means Clustering-based Information Gain (KMC-IG) technique has not been employed to extract the best features from datasets such as UNSW-NB15, NSL-KDD, and CICIDS2017.
Knowledge and background
Research methodology
-
The DFNN Classification Mode takes as input features extracted from network traffic data in the context of Wireless Sensor Networks (WSNs).
-
Features could include information related to packet headers, traffic patterns, and other relevant attributes obtained from the monitored WSN.
-
The input features undergo a feature reduction process. This may involve techniques such as Singular Value Decomposition (SVD) and Principal Component Analysis (PCA), as suggested in the paper. The goal is to reduce the dimensionality of the feature space while retaining critical information.
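As a concrete illustration of the SVD/PCA-style reduction step, the sketch below projects a feature matrix onto its top principal components via the SVD of the centered data. This is an illustrative NumPy version, not the authors' implementation; the matrix sizes and the helper name `pca_reduce` are assumptions:

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal
    components using the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # scores in the reduced space

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))               # toy "traffic feature" matrix
X_red = pca_reduce(X, 10)
print(X_red.shape)                           # (200, 10)
```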
-
A K-Means Clustering Model enhanced with Information Gain (KMC-IG) is applied to further refine and cluster the reduced features. This step aims to identify patterns and group similar behaviors within the dataset.
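One way to read the KMC-IG step is: cluster the reduced samples with k-means, then score each feature by the information gain of its (discretized) values with respect to the cluster labels, keeping the top-scoring features. The compact NumPy version below is an interpretation of that pipeline, not the authors' code; the bin count and cluster count are illustrative:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)                 # assign to nearest center
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def info_gain(feature, labels, bins=10):
    """IG of the discretized feature w.r.t. the cluster labels."""
    binned = np.digitize(feature, np.histogram_bin_edges(feature, bins))
    ig = entropy(labels)
    for b in np.unique(binned):
        mask = binned == b
        ig -= mask.mean() * entropy(labels[mask])
    return ig

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
clusters = kmeans(X, k=3)
scores = np.array([info_gain(X[:, j], clusters) for j in range(X.shape[1])])
top = np.argsort(scores)[::-1][:4]           # keep the 4 most informative features
```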
-
The proposed Synthetic Minority Over-sampling Technique (SMOTE) is introduced, likely during or after the clustering stage, to address class imbalance in the dataset. This technique generates synthetic instances of minority-class samples to balance the distribution.
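A minimal sketch of SMOTE-style oversampling is shown below: each synthetic sample is an interpolation between a minority sample and one of its k nearest minority neighbours. The ENN cleaning step that the paper pairs with SMOTE (removing samples whose class disagrees with their neighbours) is omitted for brevity, and the sample counts are illustrative:

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating each
    chosen sample toward one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the sample itself
        j = rng.choice(nn)
        lam = rng.random()                   # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(2)
X_min = rng.normal(size=(20, 5))             # scarce attack class
X_syn = smote(X_min, n_new=80)               # balance it up to 100 samples
```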
-
The core of the proposed method is the Deep Feed-Forward Neural Network (DFNN), a neural network architecture designed specifically for intrusion detection and classification in WSNs.
-
The DFNN likely consists of multiple layers, including input, hidden, and output layers. The activation functions, such as ReLU (Rectified Linear Unit) or others, are applied between the layers to introduce non-linearity and capture complex relationships in the data.
-
The performance of the DFNN is evaluated using standard metrics such as accuracy, precision, recall, and F-measure. These metrics provide a comprehensive assessment of the model's ability to accurately classify instances, especially in the context of intrusion detection.
-
Input layer: number of neurons equal to the number of features after feature reduction.
-
Hidden layers: multiple hidden layers with varying numbers of neurons. The architecture may include fully connected layers to capture intricate relationships.
-
Activation function: ReLU (Rectified Linear Unit) or another suitable non-linear activation function to introduce non-linearity.
-
Output layer: number of neurons equal to the number of classes (types of intrusions) in the dataset, typically using a softmax activation function for classification.
-
Loss function: cross-entropy loss, commonly used for classification tasks.
-
Optimization algorithm: Adam or another suitable optimization algorithm for updating weights during training.
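Taken together, the architecture bullets above describe a standard feed-forward classifier: ReLU hidden layers, a softmax output over the intrusion classes, and cross-entropy loss. The NumPy sketch below shows the forward pass and loss only; the layer sizes are illustrative (16 reduced features, 5 classes, as in NSL-KDD), and the Adam update step is omitted:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

rng = np.random.default_rng(0)
n_features, n_classes = 16, 5                # e.g. reduced NSL-KDD, 5 classes
sizes = [n_features, 64, 32, n_classes]      # two hidden layers (illustrative)
W = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]

def forward(X):
    h = X
    for Wi, bi in zip(W[:-1], b[:-1]):
        h = relu(h @ Wi + bi)                # hidden layers with ReLU
    return softmax(h @ W[-1] + b[-1])        # class probabilities

X = rng.normal(size=(8, n_features))
y = rng.integers(0, n_classes, size=8)
probs = forward(X)
loss = cross_entropy(probs, y)
```

In training, the weights `W` and biases `b` would be updated by backpropagating the gradient of this loss with an optimizer such as Adam.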
-
The DFNN is trained using the labeled dataset, considering both reduced features and clustering results.
-
Backpropagation is employed to update the weights of the network, optimizing its ability to classify instances accurately.
-
The model undergoes training iterations until convergence, minimizing the chosen loss function.
-
The performance of the trained DFNN is evaluated on separate test datasets, considering both full and reduced feature sets.
-
Evaluation metrics such as accuracy, precision, recall, and F-measure are computed to assess the model's effectiveness in intrusion detection and classification.
Proposed architecture workflow and algorithms
Data pre-processing stage
Encoding features based on labels
Feature normalization using logarithmic technique
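The exact logarithmic transform used in this stage is not reproduced here; a common variant, sketched below as an assumption, compresses heavy-tailed traffic counters with `log1p` and then min-max scales each feature to [0, 1]:

```python
import numpy as np

def log_normalize(X):
    """log(1 + |x|) compresses heavy-tailed counters (and handles zeros),
    then each feature is min-max scaled to [0, 1]."""
    Xl = np.log1p(np.abs(X))
    span = Xl.max(axis=0) - Xl.min(axis=0)
    span[span == 0] = 1.0                    # constant columns stay at 0
    return (Xl - Xl.min(axis=0)) / span

X = np.array([[0.0, 1.0], [9.0, 3.0], [99.0, 7.0]])
Xn = log_normalize(X)
```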
Data splitting stage
Feature extraction and selection (FES) using KMC-IG
KMC-IG-based FES
Data balancing using SMOTE and ENN stage
Training and validation stage
-
The primary goal of any machine learning model, including neural networks, is to generalize well to unseen data. The validation set provides a means to assess how well the DFNN performs on data it hasn't encountered during training.
-
During the training process, hyperparameters like learning rate, batch size, or the number of hidden layers are optimized to enhance the model's performance. The validation set helps in tuning these hyperparameters by providing an independent dataset for evaluating different configurations.
-
Overfitting occurs when a model learns the training data too well, capturing noise and specificities that do not generalize. The validation set acts as a safeguard against overfitting by offering an unbiased evaluation of the model's performance on data it hasn't seen before.
-
The validation set is often used in conjunction with early stopping. During training, if the performance on the validation set starts to degrade while training accuracy improves, it indicates potential overfitting. Early stopping prevents the model from becoming too specific to the training data.
-
In scenarios where multiple models or architectures are being considered, the validation set aids in comparing their performance. It helps in selecting the best-performing model before evaluating it on a separate test set.
-
The validation set ensures that the model is not inadvertently learning patterns specific to the test set during training. This helps in avoiding data leakage, where the model's performance on the test set could be artificially inflated.
-
As the model evolves through iterative development, the validation set allows for fine-tuning. Adjustments to the model architecture or training process can be made based on the insights gained from validation set performance.
-
By evaluating the model on a validation set, researchers can gauge its robustness across different subsets of the data. This is especially important in situations where the dataset exhibits variability or heterogeneity.
-
Including a validation set adds a level of rigor to the model evaluation process. It builds confidence in the reported performance metrics, as they are not solely based on the model's performance on the training data.
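The early-stopping behaviour described above can be sketched as a small helper that watches the validation loss and stops once it fails to improve for a fixed number of epochs; the `patience` value here is an illustrative choice, not one taken from the paper:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index of the best validation loss, stopping as
    soon as the loss fails to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:             # degradation streak: stop
                return best_epoch
    return best_epoch

# validation loss improves, then degrades -> best model is from epoch 3
stop = early_stopping([0.9, 0.7, 0.6, 0.55, 0.6, 0.62, 0.65])
print(stop)  # 3
```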
DFNN
Evaluation stage
-
CN (Correct Negative): The instances that are truly negative and are correctly identified as negative.
-
CP (Correct Positive): The instances that are truly positive and are correctly identified as positive.
-
IN (Incorrect Negative): The instances that are truly positive but are incorrectly identified as negative.
-
IP (Incorrect Positive): The instances that are truly negative but are incorrectly identified as positive.
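The four counts above map directly onto the standard evaluation metrics reported later. A small helper (with hypothetical example counts) makes the formulas explicit:

```python
def metrics(cp, cn, ip, in_):
    """Accuracy, precision, recall and F-measure from the four
    confusion-matrix counts: CP, CN, IP, IN as defined above."""
    accuracy = (cp + cn) / (cp + cn + ip + in_)
    precision = cp / (cp + ip)               # of predicted positives, correct
    recall = cp / (cp + in_)                 # of true positives, found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# hypothetical counts, for illustration only
acc, prec, rec, f1 = metrics(cp=90, cn=85, ip=5, in_=10)
```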
Experiments and results
Datasets description and modelling
Name of dataset | Number of reduced features |
---|---|
NSL-KDD | 16 Features |
CICIDS2017 | 39 Features |
UNSW-NB 15 | 13 Features |
-
a) NSL-KDD dataset
Set of 16 NSL-KDD reduced features | |||
---|---|---|---|
srv_rerror_rate | dst_host_count | dst_host_srv_count | dst_host_same_srv_rate |
dst_host_srv_rerror_rate | dst_host_srv_serror_rate | serror_rate | srv_serror_rate |
logged_in | rerror_rate | same_srv_rate | count |
dst_host_rerror_rate | protocol_type | dst_host_serror_rate | flag |
Attacks types | NSL-KDD-full feature set | NSL-KDD-reduced and Balanced Feature Set | ||||
---|---|---|---|---|---|---|
Training | Validation | Testing | Training | Validation | Testing |
N = Normal | 54,108 | 10,998 | 10,998 | 12,987 | 10,998 | 10,998 |
D = DoS | 41,415 | 8111 | 7009 | 8989 | 8111 | 7009 |
P = Probe | 9855 | 3221 | 3221 | 4255 | 3221 | 3221 |
R = R2L | 3617 | 632 | 632 | 1729 | 632 | 632 |
U = U2R | 94 | 16 | 16 | 67 | 16 | 16 |
Total | 109,089 | 22,978 | 21,876 | 28,027 | 22,978 | 21,876 |
-
b) UNSW-NB15 dataset
Set of 13 UNSW-NB15 reduced features | |
---|---|
ct_dst_ltm | dttl |
stcpb | dwin |
is_sm_ips_ports | sinpkt |
dmean | ct_state_ttl |
dloss | proto |
dtcpb | ct_src_dport_ltm |
swin |
Attack type | UNSW-NB 15-full feature set | UNSW-NB 15-reduced feature set | ||||
---|---|---|---|---|---|---|
Training | Validation | Testing | Training | Validation | Testing | |
N = Normal | 66,211 | 14,893 | 14,893 | 46,632 | 14,893 | 14,893 |
F = Fuzzers | 17,853 | 4748 | 4748 | 12,992 | 4748 | 4748 |
A = Analysis | 1985 | 513 | 513 | 1423 | 513 | 513 |
B = Backdoors | 1740 | 458 | 458 | 1252 | 458 | 458 |
D = DoS | 11,558 | 2564 | 2564 | 8124 | 2564 | 2564 |
E = Exploits | 31,279 | 7588 | 7588 | 21,928 | 7588 | 7588 |
G = Generic | 42,321 | 8942 | 8942 | 29,058 | 8942 | 8942 |
R = Reconnaissance | 9811 | 2387 | 2387 | 7463 | 2387 | 2387 |
SC = Shell Code | 1103 | 238 | 238 | 810 | 238 | 238 |
W = Worms | 133 | 37 | 37 | 96 | 37 | 37 |
Total | 183,994 | 42,368 | 42,368 | 129,778 | 42,368 | 42,368 |
-
c) CICIDS2017 dataset
Set of 39 CICIDS2017 reduced features | ||||
---|---|---|---|---|
URG_Flag_Count | Fwd_Packet_Length_Min | Bwd_Packet_Length_Max | Bwd_Packet_Length_Mean | FIN_Flag_Count |
Idle_Std | Init_Win_bytes_backward | Down/Up_Ratio | Packet_Length_Mean | Idle_Max |
Idle_Mean | Fwd_IAT_Std | Min_Packet_Length | Flow_IAT_Mean | Max_Packet_Length |
Bwd_Packet_Length_Std | Fwd_IAT_Mean | Average_Packet_Size | Fwd_PSH_Flags | Fwd_IAT_Total |
Flow_IAT_Max | Flow_IAT_Std | Fwd_IAT_Max | Fwd_Packet_Length_Mean | Destination Port |
Packet_Length_Std | Avg_Fwd_Segment_Size | Fwd_Packet_Length_Max | ACK_Flag_Count | Packet_Length_Variance |
Idle_Min | PSH_Flag_Count | Flow Duration | Bwd_IAT_Max | Avg_Bwd_Segment_Size |
Bwd_Packet_Length_Min | Flow_Packets/s | SYN_Flag_Count | Bwd_IAT_Std |
Attack type | CICIDS2017-full feature set | CICIDS2017-reduced feature set | ||||
---|---|---|---|---|---|---|
Training | Validation | Testing | Training | Validation | Testing | |
Normal | 44,238 | 1025 | 1025 | 25,241 | 1025 | 1025 |
Bot | 1487 | 324 | 324 | 882 | 324 | 324 |
Brute Force | 1666 | 233 | 233 | 637 | 233 | 233 |
DDoS | 50,122 | 9330 | 9330 | 23,899 | 9330 | 9330 |
DoS Golden-Eye | 7326 | 1655 | 1655 | 4636 | 1655 | 1655 |
DoS Hulk | 7836 | 1669 | 1599 | 4222 | 1669 | 1599 |
FTP patator | 6320 | 1250 | 1250 | 4002 | 1250 | 1250 |
Heart Bleed | 9 | 3 | 4 | 3 | 3 | 4 |
Infiltration | 28 | 6 | 7 | 16 | 6 | 7 |
PortScan | 43,315 | 9066 | 9066 | 24,215 | 9066 | 9066 |
SQL | 17 | 4 | 4 | 9 | 4 | 4 |
SSH Patator | 4216 | 913 | 913 | 2450 | 913 | 913 |
XSS | 615 | 96 | 96 | 263 | 96 | 96 |
DoS SlowHttpTest | 4115 | 916 | 916 | 2215 | 916 | 916 |
DoS Slowloris | 4156 | 916 | 869 | 2359 | 916 | 869 |
Total | 175,466 | 27,406 | 27,291 | 95,049 | 27,406 | 27,291 |
Binary classification
Phase | Feature set | |||||
---|---|---|---|---|---|---|
Full feature set | Reduced feature set | |||||
Class | Normal | Anomalous | Class | Normal | Anomalous | |
Training | N | 58,008 | 82 | N | 16,609 | 75 |
A | 101 | 46,908 | A | 46 | 12,097 | |
Validation | N | 13,662 | 93 | N | 14,773 | 93 |
A | 61 | 9687 | A | 61 | 9687 | |
Testing | N | 12,578 | 55 | N | 12,578 | 55 |
A | 95 | 11,794 | A | 95 | 11,794 |
Phases | Feature sets | |||||
---|---|---|---|---|---|---|
Full feature set | Reduced feature set | |||||
Class | Normal | Anomalous | Class | Normal | Anomalous | |
Training | N | 100,085 | 101 | N | 82,152 | 87 |
A | 178 | 81,143 | A | 54 | 46,106 | |
Validation | N | 29,873 | 63 | N | 29,873 | 63 |
A | 92 | 9854 | A | 92 | 9854 | |
Testing | N | 27,233 | 87 | N | 27,233 | 87 |
A | 62 | 13,513 | A | 62 | 13,513 |
Phase | Feature set | |||||
---|---|---|---|---|---|---|
Full feature set | Reduced feature set | |||||
Class | Normal | Anomalous | Class | Normal | Anomalous | |
Training | N | 94,645 | 79 | N | 63,959 | 69 |
A | 52 | 66,532 | A | 46 | 28,086 | |
Validation | N | 25,853 | 54 | N | 25,853 | 54 |
A | 92 | 9759 | A | 92 | 9759 | |
Testing | N | 24,213 | 41 | N | 24,213 | 41 |
A | 71 | 11,423 | A | 71 | 11,423 |
Phases | Feature set | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Full feature set | Reduced feature set | |||||||||||
Class | N | D | P | R | U | Class | N | D | P | R | U | |
Training | N | 69,293 | 0 | 0 | 0 | 1 | N | 8867 | 0 | 0 | 0 | 1 |
D | 13 | 2814 | 3 | 9 | 6 | D | 0 | 5454 | 3 | 1 | 1 | |
P | 0 | 0 | 1745 | 1 | 0 | P | 0 | 0 | 2988 | 1 | 0 | |
R | 0 | 0 | 0 | 2038 | 0 | R | 3 | 0 | 0 | 4239 | 0 | |
U | 6 | 4 | 0 | 1 | 6628 | U | 0 | 4 | 0 | 1 | 4038 | |
Validation | N | 1044 | 1 | 0 | 0 | 1 | N | 1044 | 1 | 0 | 0 | 1 |
D | 13 | 2822 | 2 | 6 | 0 | D | 13 | 2822 | 2 | 6 | 0 | |
P | 0 | 0 | 5044 | 1 | 0 | P | 0 | 0 | 5044 | 1 | 0 | |
R | 0 | 0 | 0 | 3513 | 0 | R | 0 | 0 | 0 | 3513 | 0 | |
U | 4 | 1 | 0 | 1 | 872 | U | 4 | 1 | 0 | 1 | 872 | |
Testing | N | 9964 | 4 | 0 | 0 | 1 | N | 9964 | 4 | 0 | 0 | 1 |
D | 14 | 3879 | 3 | 1 | 0 | D | 14 | 3879 | 3 | 1 | 0 | |
P | 0 | 0 | 3589 | 1 | 0 | P | 0 | 0 | 3589 | 1 | 0 | |
R | 3 | 0 | 0 | 4247 | 3 | R | 3 | 0 | 0 | 4247 | 3 | |
U | 0 | 1 | 0 | 1 | 866 | U | 0 | 1 | 0 | 1 | 866 |
Phase | Feature | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Class | N | F | A | B | D | E | G | R | SC | W | Class | N | F | A | B | D | E | G | R | SC | W | |
Training | N | 62,456 | 0 | 4 | 0 | 1 | 5 | 0 | 0 | 1 | 0 | N | 43,453 | 0 | 0 | 0 | 1 | 0 | 0 | 6 | 1 | 0 |
F | 1 | 44,532 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | F | 1 | 24,526 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 5 | |
A | 0 | 3 | 25,321 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | A | 0 | 0 | 19,834 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | |
B | 0 | 9 | 1 | 13,145 | 0 | 4 | 0 | 1 | 0 | 0 | B | 0 | 1 | 1 | 17,739 | 0 | 4 | 0 | 1 | 0 | 0 | |
D | 0 | 0 | 4 | 0 | 12,442 | 6 | 3 | 3 | 0 | 0 | D | 0 | 0 | 4 | 0 | 7349 | 0 | 3 | 0 | 4 | 0 | |
E | 7 | 9 | 0 | 3 | 8 | 9122 | 0 | 0 | 0 | 0 | E | 5 | 3 | 0 | 3 | 0 | 4673 | 0 | 0 | 0 | 3 | |
G | 0 | 0 | 0 | 1 | 0 | 0 | 7823 | 0 | 0 | 0 | G | 0 | 0 | 0 | 1 | 0 | 0 | 3865 | 5 | 0 | 0 | |
R | 0 | 5 | 0 | 0 | 0 | 3 | 1 | 6632 | 0 | 1 | R | 0 | 4 | 0 | 0 | 7 | 3 | 1 | 4423 | 3 | 1 | |
SC | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 3074 | 0 | SC | 0 | 0 | 3 | 0 | 0 | 7 | 3 | 1 | 3074 | 0 | |
W | 3 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 4 | 1845 | W | 1 | 0 | 3 | 0 | 4 | 4 | 1 | 1 | 4 | 2345 | |
Validation | N | 13,432 | 0 | 5 | 0 | 1 | 3 | 0 | 0 | 1 | 0 | N | 13,432 | 0 | 5 | 0 | 1 | 3 | 3 | 0 | 0 | 1 |
F | 1 | 44,532 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | F | 1 | 44,532 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | |
A | 0 | 3 | 25,321 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | A | 0 | 3 | 25,321 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | |
B | 0 | 9 | 1 | 13,145 | 0 | 4 | 0 | 1 | 0 | 0 | B | 0 | 9 | 1 | 13,145 | 0 | 4 | 4 | 0 | 1 | 0 | |
D | 0 | 0 | 4 | 0 | 12,442 | 6 | 3 | 3 | 0 | 0 | D | 0 | 0 | 4 | 0 | 12,442 | 6 | 6 | 3 | 3 | 0 | |
E | 7 | 9 | 0 | 3 | 8 | 9122 | 0 | 0 | 0 | 0 | E | 7 | 9 | 0 | 3 | 8 | 9122 | 0 | 0 | 0 | 0 | |
G | 0 | 0 | 0 | 1 | 0 | 0 | 7823 | 0 | 0 | 0 | G | 0 | 0 | 0 | 1 | 0 | 0 | 7823 | 0 | 0 | 0 | |
R | 0 | 5 | 0 | 0 | 0 | 3 | 1 | 6632 | 0 | 1 | R | 0 | 5 | 0 | 0 | 0 | 3 | 1 | 6632 | 0 | 1 | |
SC | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 3074 | 0 | SC | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 3074 | 0 | |
W | 3 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 4 | 3642 | W | 3 | 0 | 1 | 1 | 0 | 4 | 0 | 0 | 4 | 3642 | |
Testing | N | 11,432 | 4 | 0 | 1 | 1 | 1 | 0 | 3 | 1 | 1 | N | 11,432 | 4 | 0 | 1 | 1 | 1 | 0 | 3 | 1 | 1 |
F | 4 | 14,562 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | F | 4 | 14,562 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | |
A | 0 | 0 | 6843 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | A | 0 | 0 | 6843 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | |
B | 1 | 0 | 1 | 13,145 | 0 | 4 | 0 | 1 | 0 | 0 | B | 1 | 0 | 1 | 13,145 | 0 | 4 | 0 | 1 | 0 | 0 | |
D | 0 | 0 | 4 | 0 | 12,442 | 6 | 3 | 3 | 0 | 0 | D | 0 | 0 | 4 | 0 | 12,442 | 6 | 3 | 3 | 0 | 0 | |
E | 0 | 1 | 0 | 3 | 8 | 1267 | 0 | 0 | 0 | 0 | E | 0 | 1 | 0 | 3 | 8 | 1267 | 0 | 0 | 0 | 0 | |
G | 5 | 0 | 0 | 1 | 0 | 0 | 445 | 0 | 0 | 0 | G | 5 | 0 | 0 | 1 | 0 | 0 | 445 | 0 | 0 | 0 | |
R | 0 | 1 | 0 | 0 | 0 | 3 | 1 | 665 | 0 | 1 | R | 0 | 1 | 0 | 0 | 0 | 3 | 1 | 665 | 0 | 1 | |
SC | 0 | 1 | 0 | 0 | 6 | 0 | 6 | 0 | 1457 | 0 | SC | 0 | 1 | 0 | 0 | 6 | 0 | 6 | 0 | 1457 | 0 | |
W | 1 | 0 | 0 | 3 | 0 | 4 | 0 | 0 | 4 | 994 | W | 1 | 0 | 0 | 3 | 0 | 4 | 0 | 0 | 4 | 994 |
Phases | Feature | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Full feature set | Reduced feature set | |||||||||||||||||||||||||||||||
Class | N | B | BF | DD | DGE | DH | DSHT | DS | FB | HB | I | PS | S | SP | X | Class | N | B | BF | DD | DGE | DH | DSHT | DS | FB | HB | I | PS | S | SP | X | |
Training | N | 54,082 | 3 | 0 | 5 | 2 | 0 | 4 | 0 | 1 | 0 | 1 | 4 | 0 | 4 | 3 | N | 15,736 | 2 | 0 | 4 | 2 | 0 | 3 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 2 |
B | 1 | 13,687 | 1 | 9 | 1 | 0 | 0 | 1 | 0 | 0 | 6 | 5 | 0 | 0 | 0 | B | 1 | 12,576 | 1 | 8 | 1 | 0 | 0 | 1 | 0 | 0 | 5 | 6 | 0 | 0 | 0 | |
BF | 0 | 3 | 33,567 | 3 | 0 | 5 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | BF | 0 | 2 | 2769 | 2 | 0 | 4 | 0 | 5 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |
DD | 0 | 7 | 3 | 17,026 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 3 | 5 | 0 | 9 | DD | 0 | 6 | 2 | 16,020 | 3 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 3 | 0 | 7 | |
DGE | 5 | 0 | 0 | 4 | 12,311 | 6 | 10 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | DGE | 5 | 0 | 0 | 4 | 11,201 | 6 | 8 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | |
DH | 4 | 0 | 0 | 0 | 11 | 8461 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | DH | 4 | 0 | 0 | 0 | 11 | 7150 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | |
DSHT | 13 | 0 | 9 | 0 | 0 | 1 | 5438 | 0 | 16 | 1 | 7 | 0 | 0 | 0 | 0 | DSHT | 9 | 0 | 8 | 0 | 0 | 1 | 4327 | 0 | 11 | 1 | 6 | 0 | 0 | 0 | 0 | |
DS | 0 | 3 | 0 | 0 | 0 | 3 | 0 | 4221 | 3 | 0 | 0 | 0 | 7 | 0 | 5 | DS | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 3110 | 2 | 0 | 0 | 0 | 6 | 0 | 3 | |
FP | 0 | 6 | 0 | 10 | 7 | 7 | 0 | 9 | 9923 | 0 | 1 | 8 | 8 | 0 | 0 | FP | 0 | 5 | 0 | 9 | 6 | 6 | 0 | 6 | 8812 | 0 | 1 | 6 | 6 | 0 | 0 | |
HB | 0 | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 1246 | 0 | 3 | 1 | 0 | 1 | HB | 0 | 1 | 0 | 0 | 0 | 8 | 0 | 1 | 2 | 980 | 0 | 2 | 1 | 0 | 1 | |
I | 4 | 0 | 5 | 0 | 0 | 0 | 6 | 0 | 5 | 0 | 1289 | 0 | 0 | 1 | 3 | I | 3 | 0 | 4 | 0 | 0 | 0 | 6 | 0 | 4 | 0 | 1152 | 0 | 0 | 1 | 2 | |
PS | 4 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 3 | 2695 | 3 | 0 | 3 | PS | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 3 | 1132 | 3 | 0 | 2 | |
S | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 0 | 5 | 2111 | 4 | 0 | S | 1 | 0 | 1 | 0 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 5 | 1753 | 4 | 0 | |
SP | 5 | 0 | 3 | 0 | 0 | 0 | 8 | 5 | 0 | 3 | 5 | 6 | 0 | 98 | 0 | SP | 4 | 0 | 3 | 0 | 0 | 0 | 9 | 6 | 0 | 3 | 6 | 5 | 0 | 1236 | 0 | |
X | 6 | 0 | 3 | 0 | 0 | 4 | 0 | 14 | 3 | 0 | 8 | 0 | 3 | 1 | 876 | X | 5 | 0 | 2 | 0 | 0 | 3 | 0 | 9 | 3 | 0 | 7 | 0 | 3 | 1 | 76 | |
Validation | N | 14,073 | 5 | 0 | 1 | 1 | 0 | 1 | 0 | 5 | 7 | 0 | 0 | 1 | 0 | 5 | N | 14,073 | 5 | 0 | 1 | 1 | 0 | 1 | 0 | 5 | 7 | 0 | 0 | 1 | 0 | 5 |
B | 7 | 6687 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | B | 7 | 6687 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | |
BF | 0 | 3 | 1831 | 0 | 0 | 5 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | BF | 0 | 3 | 1831 | 0 | 0 | 5 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |
DD | 0 | 7 | 3 | 1695 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 3 | 5 | 0 | 9 | DD | 0 | 7 | 3 | 1695 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 3 | 5 | 0 | 9 | |
DGE | 5 | 0 | 0 | 4 | 141 | 6 | 10 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | DGE | 5 | 0 | 0 | 4 | 141 | 6 | 10 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | |
DH | 4 | 0 | 0 | 0 | 11 | 1399 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | DH | 4 | 0 | 0 | 0 | 11 | 1399 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | |
DSHT | 8 | 0 | 7 | 0 | 0 | 1 | 13,988 | 0 | 16 | 1 | 7 | 0 | 0 | 0 | 0 | DSHT | 8 | 0 | 7 | 0 | 0 | 1 | 13,988 | 0 | 16 | 1 | 7 | 0 | 0 | 0 | 0 | |
DS | 0 | 3 | 0 | 0 | 0 | 3 | 0 | 2043 | 3 | 0 | 0 | 0 | 7 | 0 | 5 | DS | 0 | 3 | 0 | 0 | 0 | 3 | 0 | 2043 | 3 | 0 | 0 | 0 | 7 | 0 | 5 | |
FP | 0 | 0 | 0 | 8 | 0 | 6 | 0 | 9 | 1823 | 0 | 1 | 8 | 8 | 0 | 0 | FP | 0 | 0 | 0 | 8 | 0 | 6 | 0 | 9 | 1823 | 0 | 1 | 8 | 8 | 0 | 0 | |
HB | 0 | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 305 | 0 | 3 | 1 | 0 | 1 | HB | 0 | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 305 | 0 | 3 | 1 | 0 | 1 | |
I | 0 | 0 | 5 | 0 | 6 | 0 | 4 | 0 | 5 | 0 | 623 | 0 | 0 | 1 | 3 | I | 0 | 0 | 5 | 0 | 6 | 0 | 4 | 0 | 5 | 0 | 623 | 0 | 0 | 1 | 3 | |
PS | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 3 | 874 | 3 | 0 | 3 | PS | 3 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 3 | 874 | 3 | 0 | 3 | |
S | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 7 | 2257 | 0 | 5 | S | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 7 | 2257 | 0 | 5 | |
SP | 0 | 5 | 3 | 0 | 8 | 0 | 0 | 3 | 1 | 0 | 0 | 6 | 0 | 108 | 0 | SP | 0 | 5 | 3 | 0 | 8 | 0 | 0 | 3 | 1 | 0 | 0 | 6 | 0 | 108 | 0 | |
X | 11 | 0 | 1 | 0 | 0 | 4 | 4 | 1 | 5 | 1 | 0 | 6 | 9 | 0 | 1967 | X | 11 | 0 | 1 | 0 | 0 | 4 | 4 | 1 | 5 | 1 | 0 | 6 | 9 | 0 | 1967 | |
Testing | N | 11,789 | 1 | 0 | 5 | 0 | 2 | 0 | 3 | 1 | 0 | 1 | 4 | 0 | 4 | 3 | N | 11,789 | 1 | 0 | 5 | 0 | 2 | 0 | 3 | 1 | 0 | 1 | 4 | 0 | 4 | 3 |
B | 2 | 9632 | 3 | 0 | 2 | 0 | 6 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | B | 2 | 9632 | 3 | 0 | 2 | 0 | 6 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | |
BF | 0 | 3 | 2289 | 3 | 0 | 5 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | BF | 0 | 3 | 2289 | 3 | 0 | 5 | 0 | 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |
DD | 0 | 7 | 3 | 16,956 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 3 | 5 | 0 | 9 | DD | 0 | 7 | 3 | 16,956 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 3 | 5 | 0 | 9 | |
DGE | 5 | 0 | 0 | 4 | 1306 | 6 | 10 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | DGE | 5 | 0 | 0 | 4 | 1306 | 6 | 10 | 0 | 8 | 0 | 0 | 8 | 0 | 10 | 0 | |
DH | 4 | 0 | 0 | 0 | 11 | 825 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | DH | 4 | 0 | 0 | 0 | 11 | 825 | 0 | 0 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | |
DSHT | 13 | 0 | 9 | 0 | 0 | 1 | 439 | 0 | 16 | 1 | 7 | 0 | 0 | 0 | 0 | DSHT | 13 | 0 | 9 | 0 | 0 | 1 | 439 | 0 | 16 | 1 | 7 | 0 | 0 | 0 | 0 | |
DS | 0 | 3 | 0 | 0 | 0 | 3 | 0 | 1985 | 3 | 0 | 0 | 0 | 7 | 0 | 5 | DS | 0 | 3 | 0 | 0 | 0 | 3 | 0 | 1985 | 3 | 0 | 0 | 0 | 7 | 0 | 5 | |
FP | 0 | 6 | 0 | 10 | 7 | 7 | 0 | 9 | 432 | 0 | 1 | 8 | 8 | 0 | 0 | FP | 0 | 6 | 0 | 10 | 7 | 7 | 0 | 9 | 432 | 0 | 1 | 8 | 8 | 0 | 0 | |
HB | 0 | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 1543 | 0 | 3 | 1 | 0 | 1 | HB | 0 | 1 | 0 | 0 | 0 | 9 | 0 | 1 | 2 | 1543 | 0 | 3 | 1 | 0 | 1 | |
I | 4 | 0 | 5 | 0 | 0 | 0 | 6 | 0 | 5 | 0 | 129 | 0 | 0 | 1 | 3 | I | 4 | 0 | 5 | 0 | 0 | 0 | 6 | 0 | 5 | 0 | 129 | 0 | 0 | 1 | 3 | |
PS | 4 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 3 | 1189 | 3 | 0 | 3 | PS | 4 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 4 | 0 | 3 | 1189 | 3 | 0 | 3 | |
S | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 0 | 5 | 108 | 4 | 0 | S | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 0 | 5 | 108 | 4 | 0 | |
SP | 0 | 5 | 3 | 0 | 0 | 0 | 4 | 5 | 1 | 3 | 0 | 2 | 0 | 84 | 0 | SP | 0 | 5 | 3 | 0 | 0 | 0 | 4 | 5 | 1 | 3 | 0 | 2 | 0 | 84 | 0 | |
X | 4 | 1 | 1 | 1 | 5 | 4 | 2 | 1 | 1 | 0 | 5 | 1 | 3 | 1 | 652 | X | 4 | 1 | 1 | 1 | 5 | 4 | 2 | 1 | 1 | 0 | 5 | 1 | 3 | 1 | 652 |
Algorithms | NSL-KDD dataset | |||||||
---|---|---|---|---|---|---|---|---|
Original feature set | Reduced feature set | |||||||
Accuracy | Precision | Recall | F-measure | Accuracy | Precision | Recall | F-measure | |
NB | 79.8 | 78.6 | 77.2 | 78 | 87.8 | 85.2 | 83.6 | 84.8 |
CNN | 95.2 | 93.7 | 92.7 | 92.7 | 97.2 | 96.2 | 95.2 | 93.6 |
SVM | 82.1 | 79.5 | 79 | 78.2 | 83.7 | 83.4 | 83.2 | 82.4 |
ANN | 93.2 | 92.3 | 92 | 93.1 | 94.6 | 94 | 93.6 | 95 |
Proposed | 98.9 | 95.8 | 95.1 | 96.8 | 99.7 | 99.8 | 97.8 | 98.8 |
Algorithms | CICIDS2017 dataset | |||||||
---|---|---|---|---|---|---|---|---|
Original feature set | Reduced feature set | |||||||
Accuracy | Precision | Recall | F-measure | Accuracy | Precision | Recall | F-measure | |
NB | 79.2 | 77.7 | 77.2 | 80 | 97.8 | 82.1 | 81.6 | 81.8 |
CNN | 94.7 | 93.7 | 92.7 | 92.7 | 97.2 | 96.2 | 95.2 | 93.7 |
SVM | 80.5 | 79.5 | 79 | 78.2 | 84.7 | 83.4 | 83.2 | 82.3 |
ANN | 93.1 | 92.1 | 92 | 93.1 | 95 | 94 | 93.6 | 95 |
Proposed | 97.8 | 96.8 | 95.1 | 96.8 | 99.8 | 98.7 | 97.7 | 98.7 |
Algorithms | UNSW-NB15 dataset | |||||||
---|---|---|---|---|---|---|---|---|
Original feature set | Reduced feature set | |||||||
Accuracy | Precision | Recall | F-measure | Accuracy | Precision | Recall | F-measure | |
NB | 75.7 | 74.1 | 74.7 | 76.7 | 80.6 | 78.6 | 79.6 | 81.6 |
CNN | 91.7 | 91.1 | 89.8 | 91.7 | 96.2 | 94 | 95.2 | 97.2 |
SVM | 76.5 | 74.8 | 76.5 | 78.5 | 81.7 | 80.1 | 80.7 | 82.7 |
ANN | 89.1 | 89.5 | 89.1 | 91.1 | 94 | 93.3 | 92.7 | 95 |
Proposed | 96.2 | 95.7 | 94.4 | 96.2 | 99.1 | 98.7 | 98.4 | 99.6 |
Multi-class classification
Discussion with comparison
Graphical representations general results
Results and discussion
High detection accuracy across datasets
Effective feature reduction techniques
Balanced trade-off between precision and recall
Benchmarking and comparative analysis
Generalizability and adaptability
Efficiency and early detection
Practical implications
Future directions
Contributions to the field
Limitations and caveats
-
The proposed method may employ a more effective feature representation or extraction technique compared to traditional algorithms. If the features used by the model better capture the underlying patterns in the data, it can lead to improved performance.
-
Neural networks, including CNNs and ANNs, are capable of capturing complex and non-linear relationships in data. If the problem at hand involves intricate patterns or dependencies, a deep learning approach may have an advantage over linear models like SVM and Naive Bayes.
-
Traditional algorithms, including SVM and NB, may struggle with imbalanced datasets. If the dataset used for evaluation is imbalanced, the proposed method might incorporate techniques to handle this imbalance, giving it an edge in performance.
-
The proposed method could be a hybrid model that combines the strengths of multiple algorithms. Hybrid models are designed to leverage the advantages of different techniques, potentially resulting in improved performance over individual models.
-
If the proposed method employs techniques like Synthetic Minority Over-sampling Technique (SMOTE) or other data augmentation methods, it can enhance the model's ability to generalize and detect minority classes, which may be challenging for traditional algorithms.
-
The architecture of the proposed model, especially in the case of CNNs or ANNs, might be designed to capture specific domain knowledge or features that are critical for intrusion detection. This tailored architecture can contribute to better performance.
-
The proposed method could use ensemble learning, combining multiple models to make predictions. Ensemble methods often lead to more robust and accurate results compared to individual models.
-
If the proposed method is designed with a deep understanding of the domain and specific characteristics of intrusion detection, it may be better suited to handle the nuances of the problem compared to more generic algorithms.