1 Introduction
1.1 Related work
1.2 Contributions
- We propose five new features: the entropy rate of IP source flow, the entropy rate of flow, the entropy of packet size, the entropy rate of packet size, and the number of ICMP destination unreachable packets. Because the value of each feature changes in the same direction under all five types of DDoS attacks, the proposed features are effective for DDoS attack detection.
- We theoretically analyze the improvements of the proposed features over existing features and evaluate their effectiveness on real DDoS attack datasets.
- Using the five features, our proposed framework outperforms existing methods in detecting all five types of DDoS attacks as well as mixed DDoS attacks. The improvement in detection accuracy over existing methods ranges from 21% to 53%.
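As a rough illustration of two of the proposed features, the sketch below computes the entropy of packet size and the number of ICMP destination unreachable packets for one time window. It assumes the standard Shannon entropy over the empirical packet-size distribution, which may differ from the exact definitions in Section 2.2, and it does not cover the entropy-rate features. The packet representation (a dict with hypothetical `size` and `icmp_type` keys) and the function names are illustrative assumptions; ICMP type 3 is the destination unreachable message.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1 / p) over all distinct values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def window_features(packets):
    """Compute two illustrative features for one window of packets.

    `packets` is a list of dicts with a hypothetical "size" key (bytes) and an
    optional hypothetical "icmp_type" key (int). This layout is an assumption.
    """
    sizes = [p["size"] for p in packets]
    # ICMP type 3 is "destination unreachable".
    unreachable = sum(1 for p in packets if p.get("icmp_type") == 3)
    return {
        "packet_size_entropy": shannon_entropy(sizes),
        "icmp_dest_unreachable": unreachable,
    }


if __name__ == "__main__":
    window = [
        {"size": 64},
        {"size": 64},
        {"size": 1500},
        {"size": 1500, "icmp_type": 3},
    ]
    print(window_features(window))
```

A window in which every packet has the same size yields an entropy of 0 bits, while a window split evenly between two sizes yields 1 bit.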
2 Proposed framework
2.1 Raw dataset splitting
2.2 The proposed feature extraction
2.2.1 Entropy rate of IP source flow
2.2.2 Entropy rate of flow
2.2.3 Entropy of packet size and entropy rate of packet size
2.2.4 Number of ICMP destination unreachable packets
2.3 Feature preprocessing
2.3.1 Feature normalization
2.3.2 Pearson correlation coefficient
2.4 Classification tasks
3 Feature performance analysis
3.1 Feature extraction
3.1.1 Packet features
3.1.2 Flow features
Feature No. | Feature |
---|---|
F1 | Entropy rate of IP source flow |
F2 | Entropy rate of flow |
F3 | Entropy of packet size |
F4 | Entropy rate of packet size |
F5 | Number of ICMP destination unreachable packets |
F6 | Number of packets |
F7 | Entropy of packet type |
F8 | Maximum of packet size |
F9 | Variance of packet size |
F10 | Number of ICMP packets |
F11 | Ratio of ICMP packets |
F12 | Number of HTTP packets |
F13 | Ratio of HTTP packets |
F14 | Number of DNS packets |
F15 | Ratio of DNS packets |
F16 | Number of SYN packets |
F17 | Ratio of SYN packets |
F18 | Number of ACK packets |
F19 | Ratio of ACK packets |
F20 | Number of flows |
F21 | Entropy of flow |
F22 | Entropy of IP source flow |
F23 | Entropy of IP destination flow |
F24 | Entropy of source port |
F25 | Entropy of destination port |
F26 | Mean inbound to outbound traffic ratio |
F27 | Entropy of inbound to outbound ratio |
3.2 Feature ranking
3.2.1 Datasets
- ISCX: This dataset is provided by the Information Security Centre of Excellence (ISCX) at the University of New Brunswick, which generates benchmark datasets for intrusion detection that resemble real-world traffic [6]. The dataset consists of normal and anomalous traffic. We choose the Monday capture, which contains benign background traffic resembling normal human activity. This dataset is also used in [11].
- SYN flooding (SYN): This dataset is from Impact [4]. It was collected by the University of Southern California Information Sciences Institute and contains a SYN flooding attack.
- DNS amplification (DNS): This dataset also comes from Impact [4] and contains a DNS amplification attack.
- Spoofing (Spoof.): This dataset is from CAIDA [5]. The attack modifies the source IP addresses of packets to conceal the identity of the attackers and the compromised machines.
Category | Dataset | Duration (s) | Packets | Packet rate (packets/s) | Average bit rate |
---|---|---|---|---|---|
Normal | ISCX | 29,135.87 | 11,709,971 | 401 | 2.864 Mbps |
Normal | LLS | 6,616.45 | 347,987 | 56.4 | 78 kbps |
Attack | SYN | 300 | 3,368,576 | 11,228.6 | 66 Mbps |
Attack | DNS | 300 | 9,319,873 | 31,064.8 | 39 Mbps |
Attack | LowRate | 300 | 166,448 | 554.8 | 998 kbps |
Attack | Pulsing | 300 | 37,116 | 299.9 | 130 kbps |
Attack | Spoofing | 300 | 543,957 | 1,813.2 | 992 kbps |
3.2.2 Performance ranking
Feature | ISCX: SYN | ISCX: DNS | ISCX: Low | ISCX: Puls. | ISCX: Spoof. | LLS: SYN | LLS: DNS | LLS: Low | LLS: Puls. | LLS: Spoof. | Mean | Rank |
---|---|---|---|---|---|---|---|---|---|---|---|---|
F1 | 0.95 | 0.99 | 0.84 | 0.64 | 0.93 | 0.93 | 0.93 | 0.93 | 0.81 | 0.93 | 0.88 | 4 |
F2 | 0.95 | 0.99 | 0.87 | 0.59 | 0.88 | 0.94 | 0.94 | 0.93 | 0.78 | 0.93 | 0.88 | 5 |
F3 | 0.99 | 1.0 | 0.87 | 0.62 | 0.95 | 0.99 | 1.0 | 0.99 | 0.92 | 0.99 | 0.93 | 1 |
F4 | 0.99 | 0.99 | 0.85 | 0.58 | 0.94 | 0.96 | 0.96 | 0.96 | 0.86 | 0.96 | 0.90 | 3 |
F5 | 0.87 | 0.50 | 1.0 | 1.0 | 1.0 | 0.88 | 0.50 | 1.0 | 1.0 | 1.0 | 0.87 | 6 |
F6 | 0.99 | 1.0 | 0.86 | 0.59 | 0.95 | 0.99 | 1.0 | 0.99 | 0.90 | 0.99 | 0.92 | 2 |
F7 | 0.86 | 0.89 | 0.79 | 0.62 | 0.53 | 0.62 | 0.63 | 0.90 | 0.83 | 0.66 | 0.73 | 15 |
F8 | 0.50 | 0.51 | 0.56 | 0.58 | 0.60 | 0.87 | 1.0 | 0.71 | 0.69 | 0.71 | 0.67 | 20 |
F9 | 0.79 | 0.52 | 0.57 | 0.61 | 0.76 | 0.99 | 0.81 | 0.70 | 0.67 | 0.59 | 0.70 | 18 |
F10 | 0.87 | 0.50 | 1.0 | 1.0 | 1.0 | 0.88 | 0.50 | 1.0 | 1.0 | 1.0 | 0.87 | 7 |
F11 | 0.87 | 0.50 | 0.99 | 0.99 | 0.99 | 0.88 | 0.50 | 0.99 | 0.99 | 0.99 | 0.86 | 8 |
F12 | 0.99 | 0.53 | 0.53 | 0.53 | 0.53 | 1.0 | 0.50 | 0.50 | 0.50 | 0.50 | 0.61 | 22 |
F13 | 0.96 | 0.77 | 0.67 | 0.56 | 0.73 | 0.99 | 0.52 | 0.52 | 0.51 | 0.52 | 0.67 | 19 |
F14 | 0.91 | 0.55 | 0.55 | 0.55 | 0.55 | 0.99 | 0.83 | 0.83 | 0.83 | 0.83 | 0.74 | 14 |
F15 | 0.64 | 0.74 | 0.66 | 0.59 | 0.71 | 0.53 | 0.53 | 0.58 | 0.73 | 0.53 | 0.62 | 21 |
F16 | 1.0 | 0.60 | 0.79 | 0.66 | 0.60 | 1.0 | 0.50 | 0.87 | 0.74 | 0.50 | 0.72 | 16 |
F17 | 0.76 | 0.79 | 0.63 | 0.66 | 0.77 | 0.88 | 0.56 | 0.75 | 0.71 | 0.56 | 0.70 | 17 |
F18 | 0.99 | 0.59 | 0.62 | 0.58 | 0.74 | 0.99 | 0.70 | 0.86 | 0.85 | 0.97 | 0.78 | 13 |
F19 | 0.65 | 0.99 | 0.93 | 0.61 | 0.98 | 0.55 | 0.93 | 0.86 | 0.69 | 0.92 | 0.81 | 11 |
F20 | 1.0 | 0.51 | 0.70 | 0.60 | 1.0 | 0.99 | 0.80 | 0.99 | 0.98 | 0.99 | 0.85 | 9 |
F21 | 0.99 | 0.55 | 0.60 | 0.51 | 1.0 | 0.99 | 0.80 | 0.97 | 0.91 | 0.99 | 0.83 | 10 |
F22 | 1.0 | 0.51 | 0.57 | 0.65 | 0.78 | 0.99 | 0.81 | 0.98 | 0.65 | 0.99 | 0.79 | 12 |
4 Experimental results
4.1 Experiment settings
4.2 Evaluation metric
4.3 Classification results in different attack scenarios
4.3.1 Specific type of attacks
Attack | Normal | DT | DL | KNN | LR | RF | SVM |
---|---|---|---|---|---|---|---|
SYN | ISCX | 0.99 | 0.76 | 0.92 | 0.97 | 0.98 | 0.98 |
SYN | LLS | 0.99 | 0.87 | 0.98 | 0.99 | 0.99 | 0.99 |
DNS | ISCX | 1.0 | 0.49 | 1.0 | 1.0 | 1.0 | 1.0 |
DNS | LLS | 0.99 | 0.48 | 0.99 | 0.99 | 0.99 | 0.99 |
LowRate | ISCX | 0.99 | 0.99 | 0.99 | 1.0 | 0.99 | 0.99 |
LowRate | LLS | 0.99 | 1.0 | 1.0 | 0.99 | 1.0 | 1.0 |
Pulsing | ISCX | 1.0 | 0.99 | 1.0 | 1.0 | 1.0 | 0.99 |
Pulsing | LLS | 0.99 | 1.0 | 0.99 | 1.0 | 1.0 | 1.0 |
Spoofing | ISCX | 1.0 | 0.98 | 1.0 | 1.0 | 1.0 | 0.99 |
Spoofing | LLS | 0.99 | 0.98 | 1.0 | 0.99 | 1.0 | 0.99 |
4.3.2 Mixed attacks
Normal | DT | DL | KNN | LR | RF | SVM |
---|---|---|---|---|---|---|
ISCX | 1.0 | 0.92 | 1.0 | 1.0 | 1.0 | 1.0 |
LLS | 0.99 | 0.98 | 1.0 | 1.0 | 1.0 | 1.0 |
Category | Dataset | Binary label | Binary ratio (%), per category | Multi-class label | Multi-class ratio (%) |
---|---|---|---|---|---|
Normal | ISCX | 0 | 95.92 | 0 | 79.16 |
Normal | LLS | 0 | | 1 | 16.75 |
Attack | SYN | 1 | 4.07 | 2 | 0.81 |
Attack | DNS | 1 | | 3 | 0.81 |
Attack | LowRate | 1 | | 4 | 0.81 |
Attack | Pulsing | 1 | | 5 | 0.81 |
Attack | Spoofing | 1 | | 6 | 0.81 |
Normal | DT | DL | KNN | LR | RF | SVM |
---|---|---|---|---|---|---|
ISCX | 0.97 | 0.62 | 0.93 | 0.65 | 0.98 | 0.68 |
LLS | 0.98 | 0.67 | 0.94 | 0.82 | 0.98 | 0.75 |
4.4 Comparison with existing methods
- RADAR [50]: This method detects SYN flooding by analyzing the ratio of SYN to ACK packets. The threshold of the SYN/ACK ratio is set to 0.5, as indicated by the authors.
- Umbrella [35]: This method identifies malicious traffic by analyzing the packet loss rate and the number of packets. The threshold of the packet loss rate is set to 5%, as indicated by the authors.
- GE [47]: This method measures the generalized entropy distance between legitimate and attack traffic. The order of the generalized entropy, \(\alpha\), is set to 5, as indicated by the authors.
- Entropy [49]: This method measures the distance between the entropy of IP source flow in legitimate traffic and in attack traffic.
- SAFETY [28]: This method measures the normalized entropy distance between legitimate and attack traffic.
- MLP [37]: This method selects three features (the number of packets, the entropy of IP source flow, and the average inter-arrival time) and chooses the best-performing of four methods, MLP (i.e., DL).
- SKM-HFS [23]: This method selects 6 of 9 features using a hybrid feature selection method and uses a semi-supervised K-means algorithm to detect attacks.
- Fuzzy [43]: This method selects 8 of 23 features based on the chi-square test and chooses the best-performing of six methods, fuzzy c-means.
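To make the entropy-based baselines above more concrete, the following is a minimal sketch of a generalized entropy distance between two traffic windows. It assumes that "generalized entropy of order \(\alpha\)" refers to the Rényi entropy and that the distance is the absolute difference between the entropies of the two windows; the cited methods may define either quantity differently. The function names and the use of source IP strings as input are illustrative assumptions.

```python
import math
from collections import Counter


def probabilities(values):
    """Empirical probability of each distinct value in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def renyi_entropy(values, alpha):
    """Renyi entropy (in bits) of order `alpha`; alpha == 1 gives Shannon entropy."""
    probs = probabilities(values)
    if not probs:
        return 0.0
    if alpha == 1:
        return sum(p * math.log2(1 / p) for p in probs)
    power_sum = sum(p ** alpha for p in probs)
    # Equivalent to log2(power_sum) / (1 - alpha).
    return math.log2(1 / power_sum) / (alpha - 1)


def entropy_distance(legitimate, observed, alpha=5):
    """Absolute difference in Renyi entropy between two windows (an assumed distance)."""
    return abs(renyi_entropy(legitimate, alpha) - renyi_entropy(observed, alpha))


if __name__ == "__main__":
    # Source IPs spread evenly over four hosts versus concentrated on one host.
    spread = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    concentrated = ["10.0.0.9"] * 4
    print(entropy_distance(spread, concentrated, alpha=5))
```

Setting `alpha=1` reduces the measure to the Shannon entropy, which corresponds to the plain entropy of IP source flow when the inputs are source addresses.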
4.4.1 Detection performance on specific type of attacks
Method | ISCX: SYN | ISCX: DNS | ISCX: Low | ISCX: Puls. | ISCX: Spoof. | LLS: SYN | LLS: DNS | LLS: Low | LLS: Puls. | LLS: Spoof. |
---|---|---|---|---|---|---|---|---|---|---|
RADAR | 0.49 | 0.49 | 0.51 | 0.49 | 0.49 | 0.47 | 0.47 | 0.49 | 0.47 | 0.47 |
Umbrella | 0.49 | 0.49 | 0.49 | 0.49 | 0.48 | 0.48 | 0.48 | 0.48 | 0.48 | 0.48 |
GE | *0.96 | 0.50 | *0.50 | 0.66 | *0.50 | *0.98 | *0.57 | *0.71 | *0.55 | *0.53 |
Entropy | *0.92 | *0.50 | 0.50 | *0.51 | *0.99 | *0.99 | *0.57 | *0.77 | *0.70 | *0.99 |
SAFETY | *0.50 | 0.54 | 0.55 | *0.52 | *0.91 | *0.71 | 0.66 | 0.60 | *0.53 | *1.0 |
MLP | 0.49 | 0.49 | 0.49 | 0.49 | 0.49 | 0.48 | 0.48 | 0.48 | 0.48 | 0.48 |
SKM-HFS | 0.27 | 0.40 | 0.40 | 0.40 | 0.27 | 0.99 | 0.33 | 0.30 | 0.41 | 0.99 |
Fuzzy | 0.52 | 0.63 | 0.49 | 0.47 | 0.47 | 0.55 | 0.94 | 0.52 | 0.43 | 0.42 |
Our | 0.98 | 1.0 | 0.99 | 1.0 | 1.0 | 0.99 | 0.99 | 1.0 | 1.0 | 1.0 |
4.4.2 Detection performance on mixed attacks
Normal | RADAR | Umbrella | GE | Entropy | SAFETY | MLP | SKM-HFS | Fuzzy | Our |
---|---|---|---|---|---|---|---|---|---|
ISCX | 0.49 | 0.49 | *0.98 | *0.99 | *0.50 | 0.49 | 0.27 | 0.56 | 1.0 |
LLS | 0.47 | 0.48 | *0.98 | *0.99 | *0.74 | 0.48 | 0.45 | 0.62 | 1.0 |
4.4.3 Running time
Method | ISCX: Avg. | ISCX: Total | LLS: Avg. | LLS: Total |
---|---|---|---|---|
RADAR | 1.0e-06 | 0.014 | 1.2e-06 | 0.0038 |
Umbrella | 7.6e-07 | 0.011 | 8.4e-07 | 0.0027 |
GE | 2.4e-05 | 0.3532 | 1.6e-05 | 0.0517 |
Entropy | 1.3e-05 | 0.1913 | 3.5e-05 | 0.1131 |
SAFETY | 1.8e-05 | 0.2649 | 1.4e-05 | 0.0452 |
MLP | 7.7e-07 | 0.0113 | 9.3e-07 | 0.0030 |
SKM-HFS | 4.4e-07 | 0.0065 | 9.2e-07 | 0.0029 |
Fuzzy | 2.7e-07 | 0.0038 | 3.3e-07 | 0.0009 |
Our | 5.2e-07 | 0.0074 | 1.1e-06 | 0.0035 |