Introduction
Classifiers
Logistic regression
Decision trees and random forest
Naive Bayes
Artificial Neural Network
Related work
Research methodology
Dataset
Feature selection
Genetic algorithm feature selection
- Population: Evolutionary algorithm (EA) approaches maintain a sample of possible solutions called the population.
- Fitness: A solution within the population is called an individual. Each individual is characterized by a gene representation and a fitness measure.
- Variation: Individuals evolve through mutations inspired by biological gene evolution.
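The three ingredients above (population, fitness, variation) can be sketched as a small genetic algorithm over feature masks. This is a minimal illustration under stated assumptions, not the configuration used in this work: the function name `ga_feature_selection`, the tournament selection, one-point crossover, elitism, all hyperparameter values, and the toy fitness function are hypothetical. In practice the fitness would more plausibly be the validation score of a classifier trained on the selected attributes.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple genetic algorithm."""
    rng = random.Random(seed)
    # Population: a sample of candidate solutions, each a 0/1 mask over features.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    # Fitness: every individual is scored by the caller-supplied function.
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism (assumption): keep the current best
        while len(next_gen) < pop_size:
            # Tournament selection (assumption): best of 3 random individuals.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover (assumption) combines the two parents.
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            # Variation: each gene flips with a small mutation probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


# Hypothetical toy fitness: reward three "useful" features, penalize extras.
USEFUL = {0, 3, 4}


def toy_fitness(mask):
    hits = sum(mask[i] for i in USEFUL)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


best_mask = ga_feature_selection(8, toy_fitness)
selected = [i for i, gene in enumerate(best_mask) if gene]
print(selected)
```

Because the best individual is carried over unchanged, the best fitness never decreases from one generation to the next. A fixed seed makes the run repeatable.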
Attribute vector | Vector length | Attribute list |
---|---|---|
\(v_1\) | 18 | V1, V5, V7, V8, V11, V13, V14, V15, V16, V17, V18, V19, V20, V21, V22, V23, V24, Amount |
\(v_2\) | 9 | V1, V6, V13, V16, V17, V22, V23, V28, Amount |
\(v_3\) | 13 | V2, V11, V12, V13, V15, V16, V17, V18, V20, V21, V24, V26, Amount |
\(v_4\) | 9 | V2, V7, V10, V13, V15, V17, V19, V28, Amount |
\(v_5\) | 13 | Time, V1, V7, V8, V9, V11, V12, V14, V15, V22, V27, V28, Amount |
Fraud detection framework
Performance metrics
- True positive (TP): attacks/intrusions that are accurately flagged as attacks.
- True negative (TN): normal traffic patterns/traces that are successfully categorized as normal.
- False positive (FP): legitimate network traces that are incorrectly labeled as intrusive.
- False negative (FN): attacks/intrusions that are incorrectly classified as non-intrusive.

From these counts, the accuracy (AC), recall (RC), precision (PR), and F1-score are computed as:

$$AC = \frac{TN+TP}{TP+TN+FP+FN} \tag{5}$$

$$RC = \frac{TP}{FN+TP} \tag{6}$$

$$PR = \frac{TP}{FP+TP} \tag{7}$$

$$F1_{score} = 2\,\frac{PR \cdot RC}{PR + RC} \tag{8}$$
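Equations (5) to (8) translate directly into code. The sketch below is illustrative only: the function name `classification_metrics` and the confusion-matrix counts are hypothetical and are not taken from the experiments reported here. A ratio with a zero denominator is returned as 0.0, which is a convention chosen for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute AC, RC, PR and F1 from confusion-matrix counts (Eqs. 5 to 8)."""
    def safe_div(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    accuracy = safe_div(tn + tp, tp + tn + fp + fn)               # Eq. (5)
    recall = safe_div(tp, fn + tp)                                # Eq. (6)
    precision = safe_div(tp, fp + tp)                             # Eq. (7)
    f1 = safe_div(2 * precision * recall, precision + recall)     # Eq. (8)
    return {"AC": accuracy, "RC": recall, "PR": precision, "F1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=80, tn=880, fp=20, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

When the positive class is rare, TN dominates the numerator of Eq. (5), so accuracy can stay high while recall or precision is poor. Recall, precision, and F1 are therefore the more informative columns in results of this kind.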
Experiments
Experimental configuration
Results and discussions
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.94 % | 76.99 % | 89.69 % | 82.85 % |
DT | 99.92 % | 75.22 % | 75.22 % | 75.22 % |
ANN | 99.94 % | 77.87 % | 84.61 % | 81.10 % |
NB | 98.13 % | 84.95 % | 6.83 % | 12.65 % |
LR | 99.91 % | 57.52 % | 82.27 % | 67.70 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.93 % | 76.10 % | 82.69 % | 79.26 % |
DT | 99.87 % | 68.14 % | 60.62 % | 64.16 % |
ANN | 99.91 % | 66.37 % | 76.53 % | 71.09 % |
NB | 98.65 % | 77.87 % | 8.59 % | 15.47 % |
LR | 99.89 % | 47.78 % | 79.41 % | 59.66 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.94 % | 75.22 % | 85.85 % | 80.18 % |
DT | 99.90 % | 76.10 % | 68.80 % | 72.26 % |
ANN | 99.91 % | 67.25 % | 77.55 % | 72.03 % |
NB | 98.81 % | 81.41 % | 10.07 % | 17.93 % |
LR | 99.90 % | 53.09 % | 80.00 % | 63.82 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.94 % | 77.87 % | 83.80 % | 80.73 % |
DT | 99.91 % | 76.10 % | 72.26 % | 74.13 % |
ANN | 99.91 % | 61.06 % | 81.17 % | 69.69 % |
NB | 98.48 % | 81.41 % | 7.97 % | 14.53 % |
LR | 99.89 % | 46.90 % | 77.94 % | 58.56 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.98 % | 72.56 % | 95.34 % | 82.41 % |
DT | 99.89 % | 72.56 % | 65.07 % | 68.61 % |
ANN | 99.08 % | 77.87 % | 12.27 % | 21.20 % |
NB | 99.44 % | 57.52 % | 15.85 % | 24.85 % |
LR | 99.77 % | 46.90 % | 34.64 % | 39.84 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 87.95 % | 77.87 % | 92.63 % | 84.61 % |
DT | 96.91 % | 76.10 % | 71.07 % | 73.50 % |
ANN | 97.80 % | 74.33 % | 42.85 % | 54.36 % |
NB | 80.31 % | 64.60 % | 13.95 % | 22.95 % |
LR | 93.88 % | 60.17 % | 62.96 % | 61.53 % |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 83.78 % | 79.64 % | 92.78 % | 85.71 % |
DT | 89.91 % | 79.64 % | 68.70 % | 73.77 % |
ANN | 88.93 % | 78.76 % | 82.40 % | 80.54 % |
NB | 78.14 % | 83.18 % | 6.73 % | 12.46 % |
LR | 79.91 % | 59.29 % | 81.70 % | 68.71 % |
Model | Accuracy |
---|---|
LR [13] | 97.70 % |
DT [13] | 95.50 % |
SVM [13] | 97.50 % |
NB [14] | 99.23 % |
KNN [16] | 97.69 % |
LR [16] | 54.86 % |
DT [4] | 97.08 % |
LR [17] | 97.18 % |
IF [16] | 58.83 % |
GA-ANN [17] | 81.82 % |
GA-DT [17] | 81.97 % |
GA-RF [17] | 77.95 % |
GA-RF (Proposed \(v_5\)) | 99.98 % |
GA-DT (Proposed \(v_1\)) | 99.92 % |
GA-LR (Proposed \(v_1\)) | 99.91 % |
GA-NB (Proposed \(v_5\)) | 99.44 % |
Experiments on synthetic dataset
Attribute vector | Vector length | Attribute list |
---|---|---|
GA selected feature space, \(v_0\) | 7 | Card, Year, Month, Day, Amount, Zip, MCC |
Model | Accuracy | Recall | Precision | F1-Score |
---|---|---|---|---|
RF | 99.95 % | 99.82 % | 99.92 % | 99.82 % |
DT | 100 % | 99.71 % | 99.51 % | 99.61 % |
ANN | 100 % | 72.09 % | 84.31 % | 77.72 % |
NB | 99.10 % | 96.29 % | 84.47 % | 41.52 % |
LR | 99.96 % | 99.12 % | 80.68 % | 88.95 % |