Introduction
Related work
Approach | Working idea | Dataset | Detection rate | False alarm |
---|---|---|---|---|
Unsupervised anomaly detection system [7] | Automatically tunes and optimizes parameter values without pre-defining them | Kyoto University honeypot | – | – |
Multiclass SVM [8] | Attributes are optimized using k-fold cross-validation. The technique can reduce the false-negative rate of the IDS | Self | – | – |
OC-SVM (one-class SVM) [9] | Multistage OC-SVM with feature extraction for detecting unknown attacks. The second-stage classifier has a poor detection rate for unknown attacks | Kyoto | 80.00 | 20.94 |
IG-ABC-SVM (information gain, artificial bee colony) [10] | Combines IG feature selection with an SVM classifier in an IDS model. Experiments use only two swarm intelligence algorithms | NSL-KDD | 98.53 | 0.03 |
SbSVM [11] | Autonomous labeling algorithm for normal traffic (when the class distribution is not imbalanced). Not evaluated for the real-time case | DARPA | 99 | 5.5 |
RS-ISVM (reserved set incremental SVM) [12] | Incremental SVM training algorithm combined with a modified kernel function (U-RBF). Prediction of attacks, specifically U2R and R2L, may be unreliable, but the oscillation problem is solved | KDD Cup 1999 | 89.17 | 4.9 |
SVM-GA [13] | Hybrid model combining GA and SVM | KDD Cup 1999 | 98.33 | 0.50 |
Genetic principal component [14] | Subset selection using GA and PCA | KDD Cup 1999 | 99.96 | 0.49 |
SVM and NN [15] | Hybrid process. Best performance in terms of training time, but time-consuming and hard to trigger | DARPA | 99.87 | – |
N-KPCA-GA-SVM (kernel principal component analysis, genetic algorithm, SVM) [16] | Hybrid of KPCA, SVM, and GA. Faster convergence, higher predictive accuracy, and better generalization, but a complex structure and latency in real-time applications | KDD Cup 1999 | 96.37 | 0.95 |
CSV-ISVM (candidate support vector incremental SVM) [17] | Incremental SVM based on candidate support vectors; improves detection rate and false alarm rate | KDD Cup 1999 | 90.14 | 2.31 |
Hybrid approach of K-Medoids, SVM and Naïve Bayes [18] | Hybrid learning approach combining K-Medoids clustering, SVM-based feature selection, and a Naïve Bayes classifier | KDD Cup 1999 | 90.1 | 6.36 |
Distance sum-based SVM [19] | Hybrid learning method named distance sum-based support vector machine (DSSVM) | KDD Cup 1999 | – | – |
SVM and GA [20] | Hybrid method consisting of GA and SVM for an intrusion detection system | KDD Cup 1999 | 0.98 | 0.017 |
PCA and SVM [21] | Hybrid model integrating principal component analysis (PCA) and SVM | NSL-KDD | 0.9655 | – |
SVM with GA [5] | FWP-SVM-genetic algorithm (feature selection, weight, and parameter optimization of support vector machine based on the genetic algorithm) | KDD Cup 1999 | 96.61 | 3.39 |
SVM for anomaly detection in smart city wireless networks [22] | Compares SVM and isolation forests for detecting anomalies in a laboratory that reproduces a real smart city use case with heterogeneous devices, algorithms, protocols, and network configurations | Smart city WSNs | 85 | 5%-10% |
SVM with RBF and PSO [23] | SVM with a radial basis function (RBF) kernel, with a Particle Swarm Optimization (PSO) algorithm used to optimize the SVM parameters | NSL-KDD | 99.8 | 0.9 |
SVM and GSA [24] | Hybrid classification algorithm (GSVM) based on the Gravitational Search Algorithm (GSA) and SVM. It improves the accuracy of the SVM classifier by searching for the best subset of kernel parameter values | KDD Cup 1999 | 97.5 | 0.03 |
SVM based on wavelet transform (WT) [25] | SVM classifier based on the wavelet transform (WT) | UNSW-NB15 | 95.92 | – |
Method
Proposed algorithm: long short-term memory (LSTM)
Experiment setup
Data
KDD99
Statistics about KDD training set
Class | Number of samples | Number of distinct samples | Possible reduction percentage |
---|---|---|---|
Attacks | 3,925,650 | 262,178 | 93.32% |
Normal | 972,781 | 812,814 | 16.44% |
Total | 4,898,431 | 1,074,992 | 78.05% |
Statistics about KDD testing set
Class | Number of samples | Number of distinct samples | Possible reduction percentage |
---|---|---|---|
Attacks | 250,436 | 29,378 | 88.26% |
Normal | 60,591 | 47,911 | 20.92% |
Total | 311,027 | 77,289 | 75.15% |
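The "possible reduction percentage" column in the two KDD tables can be reproduced from the sample counts as the share of records that are duplicates, that is, (samples - distinct samples) / samples. The following is a minimal sketch of that arithmetic; the function name `reduction_percentage` is illustrative and does not come from the original work, and the counts are copied from the training-set table.

```python
def reduction_percentage(total: int, distinct: int) -> float:
    """Return the share of records, in percent, that are duplicates."""
    if total <= 0 or not 0 <= distinct <= total:
        raise ValueError("expected 0 <= distinct <= total and total > 0")
    return 100.0 * (total - distinct) / total


# (number of samples, number of distinct samples) from the KDD training set table.
kdd_train = {
    "Attacks": (3_925_650, 262_178),
    "Normal": (972_781, 812_814),
    "Total": (4_898_431, 1_074_992),
}

for label, (total, distinct) in kdd_train.items():
    print(f"{label}: {reduction_percentage(total, distinct):.2f}%")
```

The testing-set figures can be checked the same way. The last printed digit may differ slightly from a tabulated value depending on whether the original figures were rounded or truncated.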
NSL-KDD
NSL-KDD advantages
- It does not contain redundant samples in the training set, which removes the bias of classifiers toward the most frequent records.
- It does not contain duplicate records in the test set, so evaluation is not skewed in favor of methods that score better on frequent records.
- The samples of each attack type keep the same percentage that the type has in KDD.
NSL statistics
Type of attacks available in NSL-KDD
Attack type | Attack name |
---|---|
DoS | Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm |
R2L | Guess_password, Ftp_write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named |
U2R | Buffer_overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps |
Probe | Satan, IPsweep, Nmap, Portsweep, Mscan, Saint |
Normal | Normal |
Unknown | Unknown |
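When preparing labels for a classifier, the attack-type table can be expressed as a lookup from attack name to category. The sketch below is one possible encoding, not the authors' preprocessing code. The names `ATTACK_CATEGORIES`, `NAME_TO_CATEGORY`, and `categorize` are hypothetical, and the attack names use the spelling of the table, which may differ from the label strings in the distributed dataset files.

```python
# Attack names grouped by category, as listed in the table above.
ATTACK_CATEGORIES = {
    "DoS": ["Back", "Land", "Neptune", "Pod", "Smurf", "Teardrop", "Mailbomb",
            "Processtable", "Udpstorm", "Apache2", "Worm"],
    "R2L": ["Guess_password", "Ftp_write", "Imap", "Phf", "Multihop",
            "Warezmaster", "Xlock", "Xsnoop", "Snmpguess", "Snmpgetattack",
            "Httptunnel", "Sendmail", "Named"],
    "U2R": ["Buffer_overflow", "Loadmodule", "Rootkit", "Perl", "Sqlattack",
            "Xterm", "Ps"],
    "Probe": ["Satan", "IPsweep", "Nmap", "Portsweep", "Mscan", "Saint"],
}

# Inverted, case-insensitive index: attack name -> category.
NAME_TO_CATEGORY = {
    name.lower(): category
    for category, names in ATTACK_CATEGORIES.items()
    for name in names
}


def categorize(label: str) -> str:
    """Map a raw label to DoS, R2L, U2R, Probe, Normal, or Unknown."""
    key = label.strip().lower()
    if key == "normal":
        return "Normal"
    # Any name that is not in the table falls back to "Unknown".
    return NAME_TO_CATEGORY.get(key, "Unknown")


print(categorize("Neptune"), categorize("rootkit"), categorize("new_attack"))
```

Falling back to "Unknown" mirrors the last row of the table and keeps the mapping total, so unseen attack names in a test set do not raise an error.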
Results and discussion
Methodology | False positive (FP) | False negative (FN) | Accuracy |
---|---|---|---|
Deep learning (LSTM) | 0.01 | 0.03 | 0.9676 |
SVM | 0.1 | 0.03 | 0.87 |
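The source does not state how the FP, FN, and accuracy columns are defined. Assuming the usual confusion-matrix definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP), accuracy = (TP + TN) / all samples), they could be computed as in the sketch below. The class name `ConfusionCounts` and the example counts are hypothetical and are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion matrix, with 'attack' as the positive class."""
    tp: int  # attacks flagged as attacks
    fp: int  # normal traffic flagged as attacks
    tn: int  # normal traffic passed as normal
    fn: int  # attacks passed as normal

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        return self.fn / (self.fn + self.tp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts chosen only to illustrate the arithmetic.
counts = ConfusionCounts(tp=90, fp=20, tn=180, fn=10)
print(f"FP rate:  {counts.false_positive_rate():.2f}")
print(f"FN rate:  {counts.false_negative_rate():.2f}")
print(f"Accuracy: {counts.accuracy():.2f}")
```

Under these definitions the FP and FN rates are normalized by different class sizes, so the accuracy cannot be derived from the two rates alone unless the class balance of the test set is also known.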