Introduction
-
This study discusses the significance of the intrusion detection framework (IDF) in a multi-cloud IoT environment and presents state-of-the-art works on machine learning and deep learning techniques.
-
Integer-Grading Normalization (I-GN), a simple pre-processing technique, is proposed to represent the collected data in a unique graded form. This ensures the fairness of the preserved data for different purposes.
-
The Opposition-Based Learning (OBL) Rat Inspired Optimizer (RIO) is a novel feature selection technique that extracts significant features by exploring and exploiting the local search process. The rat-inspired optimiser analyses the entire data to select the fittest features.
-
2D-Array-based Convolutional Neural Network (2D-ACNN) is employed to classify the attack classes. The proposed model resolves the overfitting issue by incorporating filtering layers for regularisation.
-
The designed framework is implemented and tested on the combined NF-UQ-NIDS dataset, yielding better detection accuracy than the prior method.
-
“Related surveys” (Part 2) discusses IDS models that use machine learning and deep learning classifiers.
-
“Proposed framework” (Part 3) discusses the proposed design of the IDF.
-
“Experimental results and discussion” (Part 4) presents the implementation setup and results.
-
“Conclusion” (Part 5) summarises the findings of the proposed work.
Related surveys
Machine learning-oriented IDFs
Deep learning-based IDF
-
Data inconvenience: Intrusion detection is a sensitive topic in which the security and privacy of organisations and users are closely involved. Hence, synthetic datasets are used in the interim to demonstrate the efficiency of the designed techniques.
-
Running time: This covers both training and testing on the data samples. In some cases, high-complexity models must be designed to reach optimal solutions.
-
Number of parameters: Parameter setting defines the consecutive steps needed to accomplish the goals. There are two types: learnable parameters, which are determined during training, and hyperparameters, which are set manually before training begins. Both must therefore be addressed for efficient detection and prediction.
-
Feature representation: Machine learning models derive a feature vector from the raw data, which reduces the issues of data overlap, overfitting and underfitting.
-
Interpretability: Data mining techniques such as decision trees and Bayesian networks are highly interpretable, whereas estimating the convergence speed of a solution, which is affected by data skewness, remains a challenging task. Thus, the use of machine learning gives a scalable convergence speed.
-
Class imbalance: This is a common problem. To some extent, machine learning models resolve it through single-label and multi-label classification designs.
Proposed framework
System model
Design of intelligent intrusion detection framework (IDF)
Data pre-processing phase
-
A ➔ Specified element of the data;
-
d ➔ Count of digits in element A;
-
P ➔ First digit of the element A;
-
\({GN}_{I}\) ➔ Normalized value ranging between 0 and 1.
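The I-GN formula itself is not reproduced in this section, so the following is only a minimal sketch inferred from the worked source-port examples tabulated in the experimental results (for instance 80 → 0.800, 49160 → 0.491, 3456 → 0.345). It assumes that the element A is divided by 10 raised to its digit count d and truncated to three decimals. The function name `integer_grading_normalize` is hypothetical, and the first-digit symbol P is not needed under this assumption.

```python
def integer_grading_normalize(a: int) -> float:
    """Map a non-negative integer A to [0, 1) using its digit count d.

    Assumption: GN_I = A / 10**d, truncated (not rounded) to 3 decimals,
    as inferred from the tabulated examples rather than from a stated formula.
    """
    if a < 0:
        raise ValueError("expected a non-negative integer")
    if a == 0:
        return 0.0
    d = len(str(a))  # d: count of digits in element A
    # Integer arithmetic performs the truncation exactly, avoiding
    # floating-point rounding surprises.
    thousandths = (a * 1000) // (10 ** d)
    return thousandths / 1000


ports = [80, 49160, 3456, 0, 365, 21]
print([integer_grading_normalize(p) for p in ports])
# -> [0.8, 0.491, 0.345, 0.0, 0.365, 0.21]
```

Under this reading, ports of different lengths share one scale: 80 and 8000 both map to 0.8, which is the grading behaviour the examples suggest.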
Feature selection phase
-
\({r}_{i-max}\) ➔ Upper bound of the ith variable;
-
\({r}_{i-min}\) ➔ Lower bound of the ith variable;
-
N ➔ Total number of agents used.
-
rand ➔ Random number in the range [1, 5];
-
N ➔ Random number in the range [0, 2];
-
j ➔ Current iteration;
-
maxiter ➔ Maximum number of iterations allowed for the task.
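The symbols listed above match the usual ingredients of opposition-based initialisation and of the standard rat swarm update rule. The sketch below is written under that assumption and is not taken from the paper's own equations, which are not reproduced here. All function names are hypothetical. The opposite of a position is taken as lower bound plus upper bound minus the position. The two coefficients follow the standard rat swarm formulation: one decays linearly from a random value in [1, 5] to zero over the iterations, and the other is drawn from [0, 2).

```python
import random


def opposite_position(position, r_min, r_max):
    """Opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - x for x, lo, hi in zip(position, r_min, r_max)]


def init_population(n_agents, r_min, r_max, rng):
    """Draw n_agents random positions and pair each with its opposite."""
    population = []
    for _ in range(n_agents):
        pos = [lo + rng.random() * (hi - lo) for lo, hi in zip(r_min, r_max)]
        population.append((pos, opposite_position(pos, r_min, r_max)))
    return population


def chase_coefficients(j, maxiter, rng):
    """Coefficients of the standard rat swarm update at iteration j (assumed)."""
    r = rng.uniform(1.0, 5.0)      # random number from [1, 5]
    a = r - j * (r / maxiter)      # decays linearly to 0 as j reaches maxiter
    c = 2.0 * rng.random()         # random number in [0, 2)
    return a, c


def update_position(rat, best_rat, a, c):
    """Move a rat relative to the best rat: |best - (a*x + c*(best - x))|."""
    return [abs(b - (a * x + c * (b - x))) for x, b in zip(rat, best_rat)]


rng = random.Random(0)
pairs = init_population(n_agents=3, r_min=[0.0, 0.0], r_max=[1.0, 1.0], rng=rng)
a, c = chase_coefficients(j=1, maxiter=10, rng=rng)
print(update_position(pairs[0][0], pairs[1][0], a, c))
```

In a full optimiser, each position and its opposite would both be scored and the fitter of the two kept. Clamping updated positions back into the bounds is omitted here for brevity.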
Classification phase
Experimental results and discussion
Datasets | Volume of data | Volume of Training data | Volume of testing data |
---|---|---|---|
NF-BoTIoT | 600100 | 480080 | 120020 |
NF-ToNIoT | 1379274 | 1103419 | 275855 |
NF-UNSWNB15 | 1623118 | 1298494 | 324624 |
NF-CSE-CIC-IDS2018 | 8392401 | 6713920 | 1678481 |
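The training and testing volumes in the table above are consistent with an 80/20 split in which the training size is rounded down. That ratio is inferred from the numbers rather than stated in the text, and the helper name `split_80_20` is hypothetical. A short check of the arithmetic:

```python
def split_80_20(total: int) -> tuple[int, int]:
    """Return (train, test) sizes for an 80/20 split, rounding train down."""
    train = (total * 8) // 10  # integer arithmetic: floor of 0.8 * total
    return train, total - train


# (total, reported train, reported test), in the order of the table rows
rows = [
    (600100, 480080, 120020),
    (1379274, 1103419, 275855),
    (1623118, 1298494, 324624),
    (8392401, 6713920, 1678481),
]

for total, train, test in rows:
    assert split_80_20(total) == (train, test)
print("all four rows match an 80/20 split")
```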
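Datasets | Attack classes | Count |
---|---|---|
NF-BoTIoT | Benign | 13859 |
NF-BoTIoT | DDoS | 56844 |
NF-BoTIoT | DoS | 56833 |
NF-BoTIoT | Reconnaissance | 470655 |
NF-BoTIoT | Theft | 1909 |
NF-ToNIoT | Benign | 270279 |
NF-ToNIoT | Backdoor | 17247 |
NF-ToNIoT | DDoS | 326345 |
NF-ToNIoT | DoS | 17717 |
NF-ToNIoT | Injection | 468539 |
NF-ToNIoT | MITM | 1295 |
NF-ToNIoT | Password | 156299 |
NF-ToNIoT | Ransomware | 142 |
NF-ToNIoT | XSS | 99944 |
NF-ToNIoT | Scanning | 21467 |
NF-UNSWNB15 | Analysis | 1995 |
NF-UNSWNB15 | Backdoor | 1782 |
NF-UNSWNB15 | Benign | 1550712 |
NF-UNSWNB15 | DoS | 5051 |
NF-UNSWNB15 | Exploits | 24736 |
NF-UNSWNB15 | Fuzzers | 19463 |
NF-UNSWNB15 | Reconnaissance | 12291 |
NF-UNSWNB15 | Shellcode | 1365 |
NF-UNSWNB15 | Worms | 153 |
NF-UNSWNB15 | Generic | 5570 |
NF-CSE-CIC-IDS2018 | Benign | 7373198 |
NF-CSE-CIC-IDS2018 | Bot | 15683 |
NF-CSE-CIC-IDS2018 | Brute force web | 2613 |
NF-CSE-CIC-IDS2018 | Brute force XSS | 1745 |
NF-CSE-CIC-IDS2018 | DDoS attack HOIC | 230 |
NF-CSE-CIC-IDS2018 | DDoS attack LOIC UDP | 1667 |
NF-CSE-CIC-IDS2018 | DDoS attacks LOIC HTTP | 378199 |
NF-CSE-CIC-IDS2018 | DDoS attacks Golden Eye | 32850 |
NF-CSE-CIC-IDS2018 | DDoS attacks hulk | 108136 |
NF-CSE-CIC-IDS2018 | DDoS attacks slow HTTP test | 105550 |
NF-CSE-CIC-IDS2018 | DoS attacks slowloris | 22825 |
NF-CSE-CIC-IDS2018 | FTP bruteforce | 193360 |
NF-CSE-CIC-IDS2018 | Infiltration | 62072 |
NF-CSE-CIC-IDS2018 | SQL injection | 36 |
NF-CSE-CIC-IDS2018 | SSH bruteforce | 94237 |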
Target part of the intrusion | Origin of the intrusion |
---|---|
Application layer | Bruteforce; XSS; SQL injection; Fuzzers; DoS flood; DoS slowloris |
Network layer | Bruteforce; DoS based amplification; DoS synflood; Unsolicited traffic; Backdoor |
Original data (L4 SRC port as features) | Proposed I-GN technique |
---|---|
80 | 0.800 |
49160 | 0.491 |
3456 | 0.345 |
80 | 0.800 |
80 | 0.800 |
0 | 0 |
365 | 0.365 |
80 | 0.800 |
80 | 0.800 |
50850 | 0.508 |
Original data (L4 SRC port as features) | Proposed I-GN technique |
---|---|
63318 | 0.633 |
57442 | 0.574 |
57452 | 0.574 |
138 | 0.138 |
51989 | 0.519 |
53927 | 0.539 |
60453 | 0.604 |
49866 | 0.498 |
36125 | 0.361 |
0 | 0 |
Original data (L4 SRC port as features) | Proposed I-GN technique |
---|---|
62073 | 0.620 |
32284 | 0.322 |
21 | 0.210 |
23800 | 0.238 |
63062 | 0.630 |
57349 | 0.573 |
41660 | 0.416 |
29259 | 0.292 |
1813 | 0.181 |
20139 | 0.201 |
Original data (L4 SRC port as features) | Proposed I-GN technique |
---|---|
51128 | 0.511 |
443 | 0.443 |
12262 | 0.122 |
61023 | 0.610 |
443 | 0.443 |
55252 | 0.552 |
443 | 0.443 |
63445 | 0.634 |
49248 | 0.492 |
51109 | 0.511 |
Parameters | Representation |
---|---|
No. of rats (Nr) | Number of candidate solutions |
Position of each rat | A solution, i.e. a subset of chosen features |
Best rat | The solution with the optimal fitness value |
Opposite position of the rat | Change towards the best rat |
Fitness function | Evaluated using TPR, FPR and the number of features |
maxiter | Maximum number of iterations |
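The table states that the fitness function is evaluated using TPR, FPR and the number of features, but the exact combination is not given in this section. One plausible form is a weighted sum that rewards a high TPR, a low FPR and a compact feature subset. The sketch below assumes that form. The weights `w_tpr`, `w_fpr` and `w_size` and the function name `fitness` are illustrative assumptions, not values from the paper.

```python
def fitness(tpr, fpr, n_selected, n_total, w_tpr=0.5, w_fpr=0.3, w_size=0.2):
    """Score a candidate feature subset (higher is better).

    Assumed form: weighted sum of TPR, (1 - FPR) and subset compactness.
    The weights are hypothetical placeholders.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be in 1..n_total")
    compactness = 1.0 - n_selected / n_total  # fewer features -> larger value
    return w_tpr * tpr + w_fpr * (1.0 - fpr) + w_size * compactness


# Two illustrative candidates with equal detection rates:
# the one using fewer features receives the higher score.
print(fitness(0.90, 0.05, n_selected=9, n_total=12))
print(fitness(0.90, 0.05, n_selected=12, n_total=12))
```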
Features | Chosen features |
---|---|
IPV4 SRC ADDR; IPV4 DST ADDR; L4 SRC PORT; L4 DST PORT; L7 PROTO; PROTOCOL; IN BYTES; OUT BYTES; TCP FLAGS; IN PKTS; OUT PKTS; FLOW DURATION MILLISECONDS | L4 SRC PORT; L4 DST PORT; PROTOCOL; TCP FLAGS; L7 PROTO; IN BYTES; OUT BYTES; IN PKTS; OUT PKTS |
Datasets | TPR ± STDV | FPR ± STDV | Accuracy ± STDV | F-measure ± STDV |
---|---|---|---|---|
NF-BoTIoT | 0.8012 ± (0.011) | 0.056 ± (0.007) | 0.859 ± (0.004) | 0.845 ± (0.006) |
NF-ToNIoT | 0.856 ± (0.014) | 0.085 ± (0.000) | 0.893 ± (0.008) | 0.884 ± (0.014) |
NF-UNSWNB15 | 0.707 ± (0.012) | 0.057 ± (0.003) | 0.652 ± (0.010) | 0.795 ± (0.005) |
NF-CSE-CIC-IDS2018 | 0.613 ± (0.002) | 0.037 ± (0.003) | 0.805 ± (0.010) | 0.761 ± (0.002) |
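For reference, the four reported measures can be computed from confusion-matrix counts using their standard definitions, and the ± STDV columns suggest averaging over repeated runs. The sketch below assumes a binary (attack versus benign) view and uses the sample standard deviation. Neither choice is confirmed by the text, and the counts are purely illustrative.

```python
from statistics import mean, stdev


def detection_metrics(tp, fp, tn, fn):
    """Return TPR, FPR, accuracy and F-measure from binary confusion counts.

    Assumes at least one actual positive, one actual negative and one
    predicted positive, so that no denominator is zero.
    """
    tpr = tp / (tp + fn)                        # true positive rate (recall)
    fpr = fp / (fp + tn)                        # false positive rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "Accuracy": accuracy, "F-measure": f_measure}


def summarise(runs):
    """Mean and sample standard deviation of each metric over repeated runs."""
    return {
        name: (mean([r[name] for r in runs]), stdev([r[name] for r in runs]))
        for name in runs[0]
    }


# Illustrative counts from three hypothetical runs (not data from the paper).
runs = [
    detection_metrics(tp=80, fp=10, tn=90, fn=20),
    detection_metrics(tp=82, fp=9, tn=91, fn=18),
    detection_metrics(tp=79, fp=11, tn=89, fn=21),
]
for name, (avg, sd) in summarise(runs).items():
    print(f"{name}: {avg:.3f} ± ({sd:.3f})")
```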
Performance (%) | Proposed (NF-UQ-NIDS dataset) | LeNet network model (NSL-KDD dataset) [38] |
---|---|---|
Accuracy | 95.20 | 97.29 |
False positive rate | 2.5 | 6.5 |
Detection rate | 97.24 | 98.55 |