Introduction
Related works
Methodology
Dataset
No. | Type of classes | Amount | No. | Type of sub-classes | Amount |
---|---|---|---|---|---|
1 | Normal | 300,000 | 1 | Normal | 300,000 |
2 | Attack | 161,043 | 2 | Backdoor | 20,000 |
 | | | 3 | ddos | 20,000 |
 | | | 4 | dos | 20,000 |
 | | | 5 | Injection | 20,000 |
 | | | 6 | Password | 20,000 |
 | | | 7 | Ransomware | 20,000 |
 | | | 8 | Scanning | 20,000 |
 | | | 9 | xss | 20,000 |
 | | | 10 | mitm | 1,043 |
No. | Feature | Description | No. | Feature | Description |
---|---|---|---|---|---|
1 | ts | Timestamp of connection between flow identifiers | 24 | dns_rejected | DNS rejection, where the DNS queries are rejected by the server |
2 | src_ip | Source IP addresses which originate endpoints’ IP addresses | 25 | ssl_version | SSL version which is offered by the server |
3 | Src_port | Source ports which Originate endpoint’s TCP/UDP ports | 26 | ssl_cipher | SSL cipher suite which the server chose |
4 | Dst_ip | Destination IP addresses which respond to endpoint’s IP addresses | 27 | ssl_resumed | SSL flag indicates the session that can be used to initiate new connections, where T refers to the SSL connection is initiated |
5 | Dst_port | Destination ports which respond to endpoint’s TCP/UDP ports | 28 | ssl_established | SSL flag indicates establishing connections between two parties, where T refers to establishing the connection |
6 | proto | Transport layer protocols of flow connections | 29 | ssl_subject | Subject of the X.509 cert offered by the server |
7 | Service | Dynamically detected protocols, such as DNS, HTTP and SSL | 30 | ssl_issuer | Trusted owner/originator of SSL and digital certificate (certificate authority) |
8 | Duration | The time of the packet connections, which is estimated by subtracting ‘time of the last packet seen’ and ‘time of the first packet seen’ | 31 | http_trans_depth | Pipelined depth into the HTTP connection |
9 | src_bytes | Source bytes which are originated from payload bytes of TCP sequence number | 32 | http_method | HTTP request methods such as GET, POST and HEAD |
10 | dst_bytes | Destination bytes which are responded payload bytes from TCP sequence numbers | 33 | http_uri | URIs used in the HTTP request |
11 | conn_state | Various connection states, such as S0 (connection without reply), S1 (connection established), and REJ (connection attempt rejected) | 34 | http_version | The HTTP versions utilized such as V1.1 |
12 | missed_bytes | Number of missing bytes in content gaps | 35 | http_request_body_len | Actual uncompressed content sizes of the data transferred from the HTTP client |
13 | src_pkts | Number of original packets which is estimated from source systems | 36 | http_response_body_len | Actual uncompressed content sizes of the data transferred from the HTTP server |
14 | src_ip_bytes | Number of original IP bytes which is the total length of IP header field of source systems | 37 | http_status_code | Status codes returned by the HTTP server |
15 | dst_pkts | Number of destination packets which is estimated from destination systems | 38 | http_user_agent | Values of the UserAgent header in the HTTP protocol |
16 | dst_ip_bytes | Number of destination IP bytes which is the total length of IP header field of destination systems | 39 | http_orig_mime_types | Ordered vectors of mime types from source system in the HTTP protocol |
17 | dns_query | Domain name subjects of the DNS queries | 40 | http_resp_mime_types | Ordered vectors of mime types from destination system in the HTTP protocol |
18 | dns_qclass | Values which specify the DNS query classes | 41 | weird_name | Names of anomalies/violations related to protocols that happened |
19 | dns_qtype | Value which specifies the DNS query types | 42 | weird_addl | Additional information is associated to protocol anomalies/violations |
20 | dns_rcode | Response code values in the DNS responses | 43 | weird_notice | It indicates if the violation/anomaly was turned into a notice |
21 | dns_AA | Authoritative answers of DNS, where T denotes server is authoritative for query | 44 | Label | Tags normal and attack records, where 0 indicates normal and 1 indicates attacks |
22 | dns_RD | Recursion desired of DNS, where T denotes request recursive lookup of query | 45 | Type | Tags attack categories, such as DoS, DDoS and backdoor attacks, as well as normal records |
23 | dns_RA | Recursion available of DNS, where T denotes server supports recursive queries | | | |
Phase 1 data preprocessing
Feature elimination
Missing value handling
Duplicates removal
Non-numerical features encoding
Data splitting
Normalization
Phase 2 feature reduction
Feature selection
Feature extraction
Phase 3 attack classification
Decision tree (DT)
Random forest (RF)
k-Nearest neighbors (kNN)
Naive Bayes (NB)
Multi-layer perceptron (MLP)
Experimental setup and analysis
Experimental setup
Hardware | Description |
---|---|
Computing platform | Google Colab |
Processor | 2-core Xeon 2.2 GHz |
RAM | 16 GB |
Disk usage | 100 GB |
Software | Description |
---|---|
Operating system | Linux |
Machine | x86_64 |
Python | 3.10.12 |
Other packages | pandas, NumPy, scikit-learn, Matplotlib, SciPy, scikit-plot, and time |
Performance evaluation
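The results tables report accuracy, precision, recall, F1-score and the Matthews correlation coefficient (MCC). As a minimal sketch of how these quantities follow from a binary confusion matrix, the helper below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the experiments. Any averaging across classes that the reported figures may involve is not covered.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts (attack = positive)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```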
Hyperparameter settings of classifiers
Machine learning model | Hyperparameter settings |
---|---|
DT | criterion: gini, splitter: best, max_depth: None, random_state: 42 |
RF | n_estimators: 100, criterion: gini, max_depth: 5, random_state: 42 |
kNN | n_neighbors: 3 |
NB | Default parameters |
MLP | hidden_layer_sizes: 100, activation: relu, alpha: 0.0001, batch_size: auto, learning_rate: constant, max_iter: 200 |
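The settings in the table map directly onto scikit-learn estimator arguments. The sketch below shows one way they could be instantiated. Two points are assumptions: the table does not name the Naive Bayes variant, so `GaussianNB` is used as a stand-in, and `hidden_layer_sizes: 100` is read as a single hidden layer of 100 units.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "DT": DecisionTreeClassifier(
        criterion="gini", splitter="best", max_depth=None, random_state=42
    ),
    "RF": RandomForestClassifier(
        n_estimators=100, criterion="gini", max_depth=5, random_state=42
    ),
    "kNN": KNeighborsClassifier(n_neighbors=3),
    # Assumption: Gaussian variant with default parameters.
    "NB": GaussianNB(),
    # Assumption: one hidden layer with 100 units.
    "MLP": MLPClassifier(
        hidden_layer_sizes=(100,), activation="relu", alpha=0.0001,
        batch_size="auto", learning_rate="constant", max_iter=200,
    ),
}

for name, model in models.items():
    print(name, type(model).__name__)
```

Each estimator can then be trained with `model.fit(X_train, y_train)` and evaluated with `model.predict(X_test)` on the reduced feature sets.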
Features selected based on correlation thresholds
Correlation threshold | Number of features | Selected features |
---|---|---|
[− 0.01 0.01] | 9 | ‘conn_state_RSTRH’, ‘conn_state_S3’, ‘proto_icmp’, ‘conn_state_SHR’, ‘conn_state_S1’, ‘conn_state_SH’, ‘service_http’, ‘conn_state_S0’, ‘conn_state_REJ’ |
[− 0.015 0.015] | 22 | ‘conn_state_RSTO’, ‘service_smb’, ‘dns_qclass’, ‘conn_state_SF’, ‘dns_AA_T’, ‘service_smb;gssapi’, ‘service_ftp’, ‘service_dhcp’, ‘conn_state_RSTOS0’, ‘service_dce_rpc’, ‘service_gssapi’, ‘conn_state_S2’, ‘dns_RA_T’, ‘conn_state_RSTRH’, ‘conn_state_S3’, ‘proto_icmp’, ‘conn_state_SHR’, ‘conn_state_S1’, ‘conn_state_SH’, ‘service_http’, ‘conn_state_S0’, ‘conn_state_REJ’ |
[− 0.02 0.02] | 33 | ‘ssl_established_F’, ‘missed_bytes’, ‘dns_RD_F’, ‘dns_rejected_T’, ‘dns_RD_T’, ‘conn_state_RSTR’, ‘dns_qtype’, ‘http_method_HEAD’, ‘proto_udp’, ‘conn_state_RSTO’, ‘service_smb’, ‘dns_qclass’, ‘conn_state_SF’, ‘dns_AA_T’, ‘service_smb;gssapi’, ‘service_ftp’, ‘service_dhcp’, ‘conn_state_RSTOS0’, ‘service_dce_rpc’, ‘service_gssapi’, ‘conn_state_S2’, ‘dns_RA_T’, ‘conn_state_RSTRH’, ‘conn_state_S3’, ‘proto_icmp’, ‘conn_state_SHR’, ‘conn_state_S1’, ‘conn_state_SH’, ‘service_http’, ‘conn_state_S0’, ‘conn_state_REJ’, ‘conn_state_OTH’, ‘proto_tcp’ |
[− 0.03 0.03] | 47 | ‘dns_query’, ‘service_dns’, ‘src_bytes’, ‘dns_RA_F’, ‘dst_bytes’, ‘dns_AA_F’, ‘service_ssl’, ‘ssl_resumed_F’, ‘http_response_body_len’, ‘dns_rejected_F’, ‘dns_rcode’, ‘ssl_established_F’, ‘missed_bytes’, ‘dns_RD_F’, ‘dns_rejected_T’, ‘dns_RD_T’, ‘conn_state_RSTR’, ‘dns_qtype’, ‘http_method_HEAD’, ‘proto_udp’, ‘conn_state_RSTO’, ‘service_smb’, ‘dns_qclass’, ‘conn_state_SF’, ‘dns_AA_T’, ‘service_smb;gssapi’, ‘service_ftp’, ‘service_dhcp’, ‘conn_state_RSTOS0’, ‘service_dce_rpc’, ‘service_gssapi’, ‘conn_state_S2’, ‘dns_RA_T’, ‘conn_state_RSTRH’, ‘conn_state_S3’, ‘proto_icmp’, ‘conn_state_SHR’, ‘conn_state_S1’, ‘conn_state_SH’, ‘service_http’, ‘conn_state_S0’, ‘conn_state_REJ’, ‘conn_state_OTH’, ‘proto_tcp’, ‘dns_AA_NA’, ‘dns_rejected_NA’, ‘dns_RD_NA’, ‘dns_RA_NA’, ‘service_NA’ |
N/A | 77 | All the features of the dataset transformed by pre-processing stage |
Features extracted based on PCA
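A minimal sketch of PCA-based feature extraction with scikit-learn follows. It projects a 77-feature matrix onto 9, 22, 33 and 47 components, the feature counts that appear in the results. The random input data and the sample sizes are placeholders, not the dataset used in the experiments.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data standing in for the 77 preprocessed features.
rng = np.random.default_rng(42)
X_train = rng.random((200, 77))
X_test = rng.random((50, 77))

reduced_shapes = {}
for k in (9, 22, 33, 47):
    pca = PCA(n_components=k, random_state=42)
    # Fit the projection on training data only, then reuse it for test data.
    Z_train = pca.fit_transform(X_train)
    Z_test = pca.transform(X_test)
    reduced_shapes[k] = (Z_train.shape, Z_test.shape)
    print(k, Z_train.shape, f"explained variance: {pca.explained_variance_ratio_.sum():.3f}")
```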
Result and analysis
Binary classification
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 80.73 | 79.50 | 77.74 | 78.44 | 0.5721 | 8.38 | 0.22 | 12.11 |
RF | 79.07 | 78.53 | 74.55 | 75.73 | 0.5293 | 8.38 | 7.56 | 801.32 |
kNN | 77.44 | 82.90 | 69.38 | 70.66 | 0.5050 | 8.38 | 1.09 | 709,996.65 |
NB | 78.99 | 82.82 | 71.94 | 73.58 | 0.5366 | 8.38 | 0.11 | 12.50 |
MLP | 80.73 | 79.50 | 77.74 | 78.44 | 0.5721 | 8.38 | 37.3 | 136.17 |
Feature extraction | ||||||||
DT | 86.54 | 85.12 | 86.33 | 85.62 | 0.7128 | 5.27 | 1.44 | 8.10 |
RF | 86.45 | 85.02 | 86.30 | 85.54 | 0.7127 | 5.27 | 18.61 | 838.21 |
kNN | 71.00 | 76.33 | 76.79 | 70.99 | 0.3409 | 5.27 | 1.55 | 10,172.81 |
NB | 83.35 | 81.76 | 82.68 | 82.15 | 0.6443 | 5.27 | 0.13 | 18.52 |
MLP | 86.30 | 84.85 | 86.26 | 85.41 | 0.7111 | 5.27 | 81.58 | 139.20 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 81.27 | 80.00 | 78.55 | 79.15 | 0.5853 | 7.82 | 0.46 | 11.85 |
RF | 77.72 | 84.92 | 69.30 | 70.57 | 0.5192 | 7.82 | 8.99 | 776.08 |
kNN | 78.65 | 85.26 | 70.66 | 72.19 | 0.5398 | 7.82 | 0.07 | 196,772.36 |
NB | 78.34 | 85.02 | 70.24 | 71.69 | 0.5324 | 7.82 | 0.17 | 28.58 |
MLP | 81.27 | 80.00 | 78.55 | 79.15 | 0.5853 | 7.82 | 56.56 | 174.12 |
Feature extraction | ||||||||
DT | 85.94 | 84.49 | 85.55 | 84.94 | 0.7119 | 4.92 | 1.84 | 12.71 |
RF | 86.54 | 85.11 | 86.37 | 85.63 | 0.7147 | 4.92 | 26.44 | 631 |
kNN | 64.29 | 62.85 | 63.80 | 62.82 | 0.7287 | 4.92 | 0.05 | 193,070.46 |
NB | 84.77 | 83.26 | 84.75 | 83.83 | 0.6799 | 4.92 | 0.19 | 37.25 |
MLP | 86.53 | 85.11 | 86.42 | 85.64 | 0.7151 | 4.92 | 128.43 | 478.01 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 86.40 | 84.96 | 86.19 | 85.47 | 0.7114 | 8.28 | 0.64 | 17.69 |
RF | 85.90 | 84.45 | 86.17 | 85.07 | 0.7059 | 8.28 | 11.74 | 848.32 |
kNN | 83.75 | 86.96 | 78.30 | 80.30 | 0.6469 | 8.28 | 0.13 | 231,367.82 |
NB | 79.92 | 85.77 | 72.51 | 74.33 | 0.5675 | 8.28 | 0.27 | 40.52 |
MLP | 86.45 | 85.01 | 86.29 | 85.54 | 0.7129 | 8.28 | 75.38 | 184.73 |
Feature extraction | ||||||||
DT | 86.83 | 85.42 | 86.59 | 85.91 | 0.7201 | 6.13 | 3.13 | 11.58 |
RF | 86.58 | 85.15 | 86.40 | 85.67 | 0.7154 | 6.13 | 38.67 | 657.02 |
kNN | 89.10 | 87.78 | 89.28 | 88.39 | 0.7669 | 6.13 | 0.06 | 227,237.45 |
NB | 83.37 | 83.56 | 79.55 | 80.89 | 0.6299 | 6.13 | 0.27 | 45.47 |
MLP | 86.54 | 85.11 | 86.35 | 85.62 | 0.7151 | 6.13 | 45.43 | 84.14 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 84.23 | 83.44 | 81.68 | 82.40 | 0.6509 | 5.47 | 0.86 | 30.00 |
RF | 86.23 | 84.82 | 85.76 | 85.23 | 0.7057 | 5.47 | 15.96 | 524.71 |
kNN | 82.82 | 82.28 | 79.54 | 80.55 | 0.6176 | 5.47 | 0.09 | 148,293.32 |
NB | 81.20 | 84.15 | 75.15 | 77.02 | 0.5861 | 5.47 | 0.28 | 44.95 |
MLP | 86.52 | 85.09 | 86.34 | 85.61 | 0.7142 | 5.47 | 67.58 | 72.34 |
Feature extraction | ||||||||
DT | 83.81 | 82.92 | 85.61 | 83.26 | 0.6848 | 5.01 | 6.50 | 10.47 |
RF | 86.94 | 85.54 | 86.72 | 86.04 | 0.7225 | 5.01 | 35.72 | 569.59 |
kNN | 86.76 | 85.34 | 87.16 | 86.00 | 0.7129 | 5.01 | 0.05 | 147,798.15 |
NB | 69.74 | 70.52 | 59.96 | 58.85 | 0.2859 | 5.01 | 0.21 | 43.87 |
MLP | 86.59 | 85.16 | 86.39 | 85.67 | 0.7152 | 5.01 | 59.03 | 105.07 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 78.28 | 76.59 | 75.24 | 75.79 | 0.5128 | 0 | 1.65 | 24.21 |
RF | 88.22 | 86.99 | 89.56 | 87.69 | 0.7651 | 0 | 12.94 | 553.13 |
kNN | 80.55 | 80.74 | 83.44 | 80.19 | 0.6413 | 0 | 0.09 | 188,417.64 |
NB | 59.57 | 71.04 | 67.75 | 59.20 | 0.3865 | 0 | 0.36 | 55.02 |
MLP | 86.58 | 85.15 | 86.38 | 85.66 | 0.7153 | 0 | 70.78 | 83.49 |
Feature extraction | ||||||||
DT | 74.68 | 73.37 | 75.10 | 73.64 | 0.4845 | 3.98 | 10.01 | 12.25 |
RF | 87.04 | 85.65 | 86.78 | 86.14 | 0.7243 | 3.98 | 47.68 | 579.27 |
kNN | 80.56 | 80.75 | 83.45 | 80.19 | 0.6414 | 3.98 | 0.08 | 186,251.17 |
NB | 79.76 | 81.24 | 73.97 | 75.58 | 0.5473 | 3.98 | 0.28 | 63.89 |
MLP | 86.59 | 85.16 | 86.39 | 85.67 | 0.7153 | 3.98 | 86.44 | 152.35 |
Multiclass classification
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 72.65 | 40.81 | 33.33 | 29.39 | 0.4553 | 8.38 | 0.75 | 11.00 |
RF | 71.42 | 33.75 | 27.65 | 24.06 | 0.3606 | 8.38 | 10.00 | 826.79 |
kNN | 70.11 | 33.06 | 26.29 | 22.29 | 0.5271 | 8.38 | 4.26 | 807,363.15 |
NB | 19.22 | 24.14 | 34.37 | 19.2 | 0.2498 | 8.38 | 0.88 | 50.40 |
MLP | 72.65 | 40.81 | 33.33 | 29.39 | 0.4553 | 8.38 | 72.16 | 230.48 |
Feature extraction | ||||||||
DT | 77.04 | 57.75 | 45.29 | 40.66 | 0.5768 | 5.27 | 1.40 | 12.13 |
RF | 76.25 | 42.42 | 43.33 | 36.83 | 0.5563 | 5.27 | 20.99 | 1317.2 |
kNN | 41.74 | 43.36 | 21.24 | 15.26 | 0.1835 | 5.27 | 1.53 | 10,193.23 |
NB | 52.43 | 30.69 | 50.32 | 32.64 | 0.4104 | 5.27 | 0.81 | 72.35 |
MLP | 76.90 | 53.66 | 45.11 | 39.77 | 0.5692 | 5.27 | 129.93 | 239.49 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 73.52 | 59.02 | 35.47 | 32.53 | 0.4781 | 7.82 | 1.60 | 19.29 |
RF | 69.24 | 28.63 | 21.32 | 16.82 | 0.3446 | 7.82 | 10.19 | 790.10 |
kNN | 72.55 | 60.57 | 30.39 | 28.82 | 0.4451 | 7.82 | 0.64 | 192,447.54 |
NB | 19.36 | 34.90 | 41.03 | 17.70 | 0.2758 | 7.82 | 1.58 | 99.98 |
MLP | 73.52 | 59.02 | 35.47 | 32.53 | 0.4781 | 7.82 | 107.55 | 179.01 |
Feature extraction | ||||||||
DT | 77.25 | 72.18 | 48.18 | 45.00 | 0.5840 | 4.92 | 3.47 | 12.93 |
RF | 77.14 | 61.21 | 45.24 | 40.45 | 0.5716 | 4.92 | 25.02 | 922.89 |
kNN | 51.86 | 41.04 | 31.09 | 29.50 | 0.5809 | 4.92 | 1.23 | 200,249.55 |
NB | 55.84 | 37.60 | 63.32 | 41.57 | 0.4618 | 4.92 | 1.92 | 169.60 |
MLP | 77.43 | 66.47 | 46.00 | 41.50 | 0.5793 | 4.92 | 180.86 | 135.30 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 77.23 | 77.23 | 45.69 | 41.00 | 0.5750 | 8.28 | 1.20 | 15.34 |
RF | 70.21 | 37.68 | 23.57 | 20.42 | 0.3751 | 8.28 | 26.77 | 1558.13 |
kNN | 76.23 | 65.60 | 40.58 | 37.63 | 0.5440 | 8.28 | 0.69 | 225,211.54 |
NB | 32.64 | 29.45 | 45.45 | 23.91 | 0.2758 | 8.28 | 0.96 | 157.66 |
MLP | 77.28 | 66.66 | 45.83 | 41.26 | 0.5762 | 8.28 | 158.29 | 198.80 |
Feature extraction | ||||||||
DT | 77.62 | 67.34 | 48.67 | 45.53 | 0.5831 | 6.13 | 5.18 | 21.81 |
RF | 77.33 | 61.54 | 45.64 | 41.19 | 0.5753 | 6.13 | 30.58 | 901.71 |
kNN | 66.34 | 46.62 | 33.70 | 30.94 | 0.4532 | 6.13 | 0.66 | 227,584.21 |
NB | 44.71 | 42.86 | 57.43 | 39.01 | 0.3895 | 6.13 | 1.67 | 233.32 |
MLP | 77.45 | 68.01 | 46.00 | 41.50 | 0.5792 | 6.13 | 213.14 | 194.79 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 77.25 | 72.18 | 48.18 | 45.00 | 0.5524 | 5.47 | 3.47 | 12.93 |
RF | 77.14 | 61.21 | 45.24 | 40.45 | 0.3606 | 5.47 | 25.02 | 922.89 |
kNN | 51.86 | 41.04 | 31.09 | 29.50 | 0.5271 | 5.47 | 1.23 | 200,249.55 |
NB | 55.84 | 37.60 | 63.32 | 41.57 | 0.2498 | 5.47 | 1.92 | 169.6 |
MLP | 77.43 | 66.47 | 46.00 | 41.50 | 0.5638 | 5.47 | 180.86 | 135.3 |
Feature extraction | ||||||||
DT | 65.35 | 50.12 | 37.58 | 32.36 | 0.4156 | 5.01 | 7.85 | 12.35 |
RF | 76.65 | 52.36 | 44.17 | 39.43 | 0.5626 | 5.01 | 35.90 | 719.71 |
kNN | 67.51 | 55.02 | 39.23 | 33.82 | 0.4694 | 5.01 | 0.44 | 143,359 |
NB | 32.40 | 36.20 | 56.10 | 31.93 | 0.3050 | 5.01 | 0.64 | 210.78 |
MLP | 77.46 | 68.08 | 46.02 | 41.51 | 0.5795 | 5.01 | 90.35 | 103.62 |
Models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | MCC | FS (s) | Training (s) | Inference (ms) |
---|---|---|---|---|---|---|---|---|
Feature selection | ||||||||
DT | 51.29 | 33.16 | 25.02 | 16.00 | 0.2450 | 0 | 1.94 | 16.79 |
RF | 69.38 | 29.90 | 21.68 | 19.06 | 0.3507 | 0 | 13.46 | 758.22 |
kNN | 57.24 | 48.56 | 34.46 | 25.69 | 0.3720 | 0 | 0.48 | 187,612.41 |
NB | 20.14 | 34.66 | 41.28 | 19.76 | 0.1973 | 0 | 0.79 | 257.37 |
MLP | 77.47 | 67.96 | 46.03 | 41.56 | 0.5796 | 0 | 151.02 | 117.19 |
Feature extraction | ||||||||
DT | 54.98 | 38.23 | 34.68 | 22.60 | 0.2982 | 3.98 | 10.01 | 16.80 |
RF | 76.84 | 62.80 | 44.55 | 39.72 | 0.5661 | 3.98 | 48.97 | 801.26 |
kNN | 57.25 | 45.09 | 34.42 | 25.47 | 0.3718 | 3.98 | 0.49 | 188,883.26 |
NB | 22.50 | 34.92 | 42.80 | 22.52 | 0.2204 | 3.98 | 0.76 | 322.33 |
MLP | 77.46 | 68.01 | 46.05 | 41.53 | 0.5796 | 3.98 | 116.71 | 118.47 |
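Class | Feature selection, 9 features | Feature selection, 22 features | Feature selection, 33 features | Feature selection, 47 features | Feature selection, 77 features | Feature extraction, 9 features | Feature extraction, 22 features | Feature extraction, 33 features | Feature extraction, 47 features | Feature extraction, 77 features |
---|---|---|---|---|---|---|---|---|---|---|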
Backdoor | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.61 | 0.62 | 0.62 | 0.61 | 0.61 |
ddos | 0.71 | 0.71 | 0.85 | 0.86 | 0.86 | 0.85 | 0.85 | 0.87 | 0.86 | 0.86 |
dos | 0.00 | 0.08 | 0.08 | 0.09 | 0.09 | 0.00 | 0.18 | 0.17 | 0.09 | 0.09 |
Injection | 0.00 | 0.06 | 0.03 | 0.08 | 0.06 | 0.02 | 0.06 | 0.06 | 0.06 | 0.06 |
mitm | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.29 | 0.28 | 0.00 | 0.00 |
Normal | 0.86 | 0.86 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 | 0.89 |
Password | 0.00 | 0.10 | 0.10 | 0.10 | 0.10 | 0.09 | 0.10 | 0.10 | 0.10 | 0.10 |
Ransomware | 0.13 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 |
Scanning | 0.00 | 0.00 | 0.69 | 0.70 | 0.70 | 0.70 | 0.72 | 0.72 | 0.70 | 0.70 |
xss | 0.62 | 0.67 | 0.68 | 0.68 | 0.68 | 0.67 | 0.63 | 0.70 | 0.68 | 0.68 |
Average | 0.29 | 0.33 | 0.41 | 0.45 | 0.42 | 0.41 | 0.45 | 0.46 | 0.42 | 0.42 |
Result verification statistically
No. | Data | T-statistic | P-values | Significant |
---|---|---|---|---|
1 | Accuracy of binary and multiclass classification using FE, comparing 9 and 22 features with 33 and 47 features | −9.3139 | 0.0026 | * |
2 | Accuracy of binary and multiclass classification using FS, comparing 33 and 47 features with 9 and 22 features | −7.0103 | 0.0006 | * |
3 | Feature reduction time between FS and FE for each feature scheme | 3.5833 | 0.0372 | * |
4 | Model training time between FS and FE for each feature scheme | −2.2707 | 0.0324 | * |
5 | Model inference time between FS and FE for each feature scheme (except for kNN model) | −3.4921 | 0.0251 | * |
6 | DT runtime (including the run time of both model building and inference) compared to other models for both FS and FE | −3.6216 | 0.0152 | * |
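The type of test behind these statistics is not stated in this section. As a hedged illustration, a paired two-sided t-test on the FS (s) values from the first four binary-classification tables (8.38, 7.82, 8.28, 5.47 s for feature selection and 5.27, 4.92, 6.13, 5.01 s for feature extraction) appears consistent with row 3. The sketch below uses only the standard library. The helper names are hypothetical, and the p-value helper relies on the closed-form Student t CDF that holds only for 3 degrees of freedom.

```python
from math import atan, pi, sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def two_sided_p_df3(t):
    """Two-sided p-value from the closed-form Student t CDF (valid for df = 3 only)."""
    x = abs(t) / sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + atan(x)) / pi
    return 2 * (1 - cdf)


# Feature reduction times in seconds, taken from the first four binary tables.
fs_seconds = [8.38, 7.82, 8.28, 5.47]
fe_seconds = [5.27, 4.92, 6.13, 5.01]

t_stat, dof = paired_t(fs_seconds, fe_seconds)
p_value = two_sided_p_df3(t_stat)
print(f"t = {t_stat:.4f}, df = {dof}, p = {p_value:.4f}")
```

For other sample sizes, a general routine such as `scipy.stats.ttest_rel` would replace the df = 3 shortcut.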
No. | Content | FS | FE |
---|---|---|---|
1 | Higher accuracy when no. of features is small, such as 9 and 22 | ✓ | |
2 | Higher accuracy when no. of features gets large, such as 33 and 47 | ✓ | |
3 | Lower feature reduction time | ✓ | |
4 | Lower model training time | ✓ | |
5 | Lower inference time | ✓ | |
6 | DT is the preferred classifier in terms of runtime | ✓ | ✓ |
7 | DT is the preferred classifier in terms of performance for multiclass classification with a small or moderate number of features, such as 9, 22, and 33 | ✓ | ✓ |
8 | MLP is the preferred classifier in terms of performance for multiclass classification with more features, such as 47 and 77 | ✓ | ✓ |
9 | Less sensitive to the number of selected/extracted features | ✓ | |
10 | Less sensitive to various machine learning models | ✓ | |
11 | Detection performance degrades when number of features is too large | ✓ | |
12 | Detection performance increases when more informative features are added | ✓ | |
13 | Detect more diverse attack types when using the same classifier | ✓ | |
14 | F1-score of attack class is much lower than that of normal class (binary) | ✓ | ✓ |
15 | F1-score of attack class degrades when number of features increases (binary) | ✓ | |
16 | Higher F1-score in detecting DDoS, normal, scanning and XSS classes | ✓ | |
17 | Higher F1-score in detecting injection classes | ✓ | |
18 | More potential to improve performance when the number of features is small | ✓ | |
19 | More potential to improve performance when the number of features is large | ✓ |