1 Introduction
2 Methods
2.1 SVM for imbalanced data
2.2 The issue of determining penalty costs
2.3 Our approach
| | Predicted positive | Predicted negative |
---|---|---|
Actual positive | TP (True positive) | FN (False negative) |
Actual negative | FP (False positive) | TN (True negative) |
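
The experiments report \(\mathrm{TP}_{\mathrm{rate}}\), \(\mathrm{TN}_{\mathrm{rate}}\), accuracy (Acc), and GMean, all of which derive from the four cells of this matrix. The Python sketch below assumes the standard definitions, with GMean taken as the geometric mean of the two class-wise rates; the function name `imbalance_metrics` is hypothetical.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Class-wise rates and summary scores from the four confusion-matrix cells.

    Assumes at least one actual positive and one actual negative example,
    otherwise a rate is undefined (division by zero).
    """
    tp_rate = tp / (tp + fn)  # share of actual positives predicted positive
    tn_rate = tn / (tn + fp)  # share of actual negatives predicted negative
    acc = (tp + tn) / (tp + fn + fp + tn)
    gmean = math.sqrt(tp_rate * tn_rate)  # geometric mean of the two rates
    return {"TP_rate": tp_rate, "TN_rate": tn_rate, "Acc": acc, "GMean": gmean}


# A classifier that labels all 100 examples of a 10/90 split as negative.
print(imbalance_metrics(tp=0, fn=10, fp=0, tn=90))
```

In the usage example the trivial classifier reaches Acc = 0.9 but GMean = 0, which illustrates why GMean is commonly preferred over accuracy when classes are imbalanced.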
3 Experiments
- Experiment 1: the presented method is evaluated on 44 benchmark datasets with imbalance ratios ranging from 1.82 to 128.87.
- Experiment 2: the presented approach is applied to a real-life decision problem, predicting the repayment of short-term loans.
3.1 Experiment 1: benchmark datasets
3.1.1 Description
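In the following table, \(\#\mathrm{In}\) is the number of instances, \(\#\mathrm{At}\) the number of attributes, \(\%\mathrm{P}\) and \(\%\mathrm{N}\) the percentages of positive (minority) and negative (majority) examples, and \(\mathrm{Imb_{rate}} = \%\mathrm{N} / \%\mathrm{P}\) the imbalance ratio.

ID | Dataset | \(\#\mathrm{In}\) | \(\#\mathrm{At}\) | \(\%\mathrm{P}\) | \(\%\mathrm{N}\) | \(\mathrm{Imb_{rate}}\) |
---|---|---|---|---|---|---|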
1 | Glass1 | 214 | 9 | 35.51 | 64.49 | 1.82 |
2 | Ecoli0vs1 | 220 | 7 | 35.00 | 65.00 | 1.86 |
3 | Wisconsin | 683 | 9 | 35.00 | 65.00 | 1.86 |
4 | Pima | 768 | 8 | 34.84 | 66.16 | 1.90 |
5 | Iris0 | 150 | 4 | 33.33 | 66.67 | 2.00 |
6 | Glass0 | 214 | 9 | 32.71 | 67.29 | 2.06 |
7 | Yeast1 | 1,484 | 8 | 28.91 | 71.09 | 2.46 |
8 | Vehicle1 | 846 | 18 | 28.37 | 71.63 | 2.52 |
9 | Vehicle2 | 846 | 18 | 28.37 | 71.63 | 2.52 |
10 | Vehicle3 | 846 | 18 | 28.37 | 71.63 | 2.52 |
11 | Haberman | 306 | 3 | 27.42 | 73.58 | 2.68 |
12 | Glass0123vs456 | 214 | 9 | 23.83 | 76.17 | 3.19 |
13 | Vehicle0 | 846 | 18 | 23.64 | 76.36 | 3.23 |
14 | Ecoli1 | 336 | 7 | 22.92 | 77.08 | 3.36 |
15 | New-thyroid2 | 215 | 5 | 16.89 | 83.11 | 4.92 |
16 | New-thyroid1 | 215 | 5 | 16.28 | 83.72 | 5.14 |
17 | Ecoli2 | 336 | 7 | 15.48 | 84.52 | 5.46 |
18 | Segment0 | 2,308 | 19 | 14.26 | 85.74 | 6.01 |
19 | Glass6 | 214 | 9 | 13.55 | 86.45 | 6.38 |
20 | Yeast3 | 1,484 | 8 | 10.98 | 89.02 | 8.11 |
21 | Ecoli3 | 336 | 7 | 10.88 | 89.77 | 8.77 |
22 | Page-blocks0 | 5,472 | 10 | 10.23 | 89.77 | 8.77 |
23 | Yeast2vs4 | 514 | 8 | 9.92 | 90.08 | 9.08 |
24 | Yeast05679vs4 | 528 | 8 | 9.66 | 90.34 | 9.35 |
25 | Vowel0 | 988 | 13 | 9.01 | 90.99 | 10.10 |
26 | Glass016vs2 | 192 | 9 | 8.89 | 91.11 | 10.29 |
27 | Glass2 | 214 | 9 | 8.78 | 91.22 | 10.39 |
28 | Ecoli4 | 336 | 7 | 6.74 | 93.26 | 13.84 |
29 | Yeast1vs7 | 459 | 8 | 6.72 | 93.28 | 13.87 |
30 | Shuttle0vs4 | 1,829 | 9 | 6.72 | 93.28 | 13.87 |
31 | Glass4 | 214 | 9 | 6.07 | 93.93 | 15.47 |
32 | Page-blocks13 | 472 | 10 | 5.93 | 94.07 | 15.85 |
33 | Abalone9vs18 | 731 | 8 | 5.65 | 94.25 | 16.68 |
34 | Glass016vs5 | 184 | 9 | 4.89 | 95.11 | 19.44 |
35 | Shuttle2vs4 | 129 | 9 | 4.65 | 95.35 | 20.50 |
36 | Yeast1458vs7 | 693 | 8 | 4.33 | 95.67 | 22.10 |
37 | Glass5 | 214 | 9 | 4.20 | 95.80 | 22.81 |
38 | Yeast2vs8 | 482 | 8 | 4.15 | 95.85 | 23.10 |
39 | Yeast4 | 1,484 | 8 | 3.43 | 96.57 | 28.41 |
40 | Yeast1289vs7 | 947 | 8 | 3.17 | 96.83 | 30.56 |
41 | Yeast5 | 1,484 | 8 | 2.96 | 97.04 | 32.78 |
42 | Ecoli0137vs26 | 281 | 7 | 2.49 | 97.51 | 39.15 |
43 | Yeast6 | 1,484 | 8 | 2.49 | 97.51 | 39.15 |
44 | Abalone9 | 4,174 | 8 | 0.77 | 99.23 | 128.87 |
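
The last three columns follow from the class counts: \(\%\mathrm{P}\) and \(\%\mathrm{N}\) are the shares of positive and negative examples, and \(\mathrm{Imb_{rate}}\) is their ratio. A minimal Python check, where `dataset_summary` is a hypothetical helper and the counts 76 and 138 are those implied by the Glass1 row (214 instances, 35.51% positive):

```python
def dataset_summary(n_positive, n_negative):
    """Return (%P, %N, Imb_rate), each rounded to two decimals as in the table.

    Imb_rate is the number of negative (majority) examples per positive
    (minority) example, i.e. %N / %P.
    """
    n = n_positive + n_negative
    pct_p = 100.0 * n_positive / n
    pct_n = 100.0 * n_negative / n
    imb_rate = n_negative / n_positive
    return round(pct_p, 2), round(pct_n, 2), round(imb_rate, 2)


# Glass1: 76 positive and 138 negative examples (214 in total).
print(dataset_summary(76, 138))
```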
3.1.2 Methods
- SVM (SVM): SVM trained using sequential minimal optimization (SMO).
- SVM + SMOTE (SSVM): SVM trained on data oversampled with SMOTE.
- SMOTEBoostSVM (SBSVM): boosted SVM that uses SMOTE to generate artificial samples before constructing each of the base classifiers.
- AdaCost (AdaC): cost-sensitive ensemble classifier in which misclassifying a minority-class example costs more than misclassifying a majority-class example (Fan et al. 1999).
- SMOTEBoost (SBO): modified AdaBoost algorithm in which each base classifier is trained on data augmented with SMOTE synthetic samples (Chawla et al. 2003).
- RUSBoost (RUS): variant of the SMOTEBoost approach that applies random undersampling of the majority class, instead of SMOTE, in each boosting iteration (Seiffert et al. 2010).
- SMOTEBagging (SB): bagging method that uses SMOTE to oversample the dataset before constructing each of the base classifiers (Wang and Yao 2009).
- UnderBagging (UB): bagging method that randomly undersamples the dataset before constructing each of the base classifiers (Tao et al. 2006).
- BoostingSVM-IB (BSI): boosted SVM trained with the cost-sensitive approach presented in Zięba et al. (2014).
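
Several of the compared methods (SSVM, SBSVM, SBO, SB) rely on SMOTE, which creates synthetic minority examples by interpolating between a minority example and one of its nearest minority-class neighbours. The sketch below is a simplified variant, not the configuration used in the experiments: it assumes numeric features and Euclidean distance, picks the base example at random rather than iterating over every minority example, and the function name `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic points from a list of minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority example and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority examples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_synthetic=3, k=2, seed=0))
```

Because every synthetic point is a convex combination of two real minority examples, oversampling fills in the minority region instead of duplicating existing points, which is the property the SMOTE-based ensembles above exploit.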
3.1.3 Methodology
3.1.4 Results and discussion
ID | IB | SVM | SSVM | SBSVM | CSVM | AdaC | SBO | RUS | SB | UB | BSI | BSIA |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1.82 | 0.0000 | 0.5567 | 0.6932 | 0.7140 | 0.7893 | **0.8008** | 0.7824 | 0.7518 | 0.7649 | 0.7416 | 0.7179 |
2 | 1.86 | **0.9869** | 0.9835 | 0.8327 | 0.9700 | 0.9695 | 0.9695 | 0.9765 | 0.9835 | 0.9800 | 0.9835 | 0.9800 |
3 | 1.86 | 0.9686 | **0.9758** | 0.9570 | 0.9463 | 0.9724 | 0.9633 | 0.9590 | 0.9641 | 0.9628 | 0.9728 | 0.9688 |
4 | 1.90 | 0.6963 | 0.7534 | 0.7436 | 0.7321 | 0.7159 | 0.7439 | 0.7331 | **0.7609** | 0.7602 | 0.7456 | 0.7456 |
5 | 2.00 | **1.0000** | **1.0000** | **1.0000** | **1.0000** | 0.9899 | 0.9899 | 0.9899 | 0.9798 | 0.9899 | **1.0000** | **1.0000** |
6 | 2.06 | 0.4807 | 0.7069 | 0.7481 | 0.7742 | 0.8150 | 0.8150 | **0.8557** | 0.8269 | 0.8292 | 0.7782 | 0.7994 |
7 | 2.46 | 0.4522 | 0.7057 | 0.7033 | 0.7163 | 0.6460 | 0.7070 | 0.7059 | **0.7294** | 0.7225 | 0.7245 | 0.7274 |
8 | 2.52 | 0.5409 | 0.7899 | 0.8266 | 0.8299 | 0.7953 | 0.7438 | 0.7404 | 0.7710 | 0.7758 | 0.8413 | **0.8451** |
9 | 2.52 | 0.9376 | 0.9503 | **0.9837** | 0.9744 | 0.9813 | 0.9774 | 0.9758 | 0.9701 | 0.9595 | 0.9807 | 0.9752 |
10 | 2.52 | 0.3914 | 0.7668 | 0.8173 | **0.8214** | 0.7668 | 0.7388 | 0.7747 | 0.7555 | 0.7898 | 0.8204 | 0.8206 |
11 | 2.68 | 0.0000 | 0.5529 | 0.6199 | 0.6241 | 0.5598 | 0.6302 | 0.6258 | 0.6559 | 0.6620 | 0.6421 | **0.6640** |
12 | 3.19 | 0.8828 | 0.8940 | 0.8925 | 0.8925 | 0.9231 | 0.9028 | 0.9101 | 0.9231 | 0.9054 | 0.9141 | **0.9337** |
13 | 3.23 | 0.9504 | 0.9646 | 0.9652 | **0.9779** | 0.9765 | 0.9633 | 0.9601 | 0.9638 | 0.9525 | 0.9714 | 0.9739 |
14 | 3.36 | 0.8277 | 0.8973 | 0.8894 | 0.8798 | 0.8912 | 0.8776 | **0.9115** | 0.9035 | 0.9035 | 0.9015 | 0.8953 |
15 | 4.92 | 0.7928 | 0.9888 | 0.9710 | 0.9774 | 0.9574 | 0.9690 | 0.9547 | 0.9663 | 0.9494 | 0.9801 | **0.9972** |
16 | 5.14 | 0.7746 | 0.9860 | 0.9801 | 0.9944 | 0.9464 | 0.9832 | 0.9774 | 0.9746 | 0.9663 | 0.9916 | **0.9972** |
17 | 5.46 | 0.7719 | 0.9108 | 0.9238 | 0.9188 | 0.8815 | 0.9035 | 0.8835 | 0.8801 | 0.8947 | 0.9221 | **0.9270** |
18 | 6.01 | 0.9906 | 0.9934 | 0.9944 | 0.9947 | 0.9824 | 0.9959 | 0.9914 | 0.9929 | 0.9891 | **0.9985** | 0.9954 |
19 | 6.38 | 0.8440 | 0.8948 | 0.8686 | 0.8882 | 0.8873 | 0.8347 | 0.9130 | **0.9209** | 0.8969 | 0.8857 | 0.8711 |
20 | 8.11 | 0.7653 | 0.9177 | 0.8977 | 0.9068 | 0.8918 | 0.8932 | 0.9162 | **0.9413** | 0.9311 | 0.9191 | 0.9249 |
21 | 8.77 | 0.4106 | 0.8938 | 0.8673 | 0.8377 | 0.8215 | 0.8151 | 0.8713 | 0.8687 | 0.8902 | 0.8897 | **0.8946** |
22 | 8.77 | 0.6547 | 0.9539 | 0.9625 | 0.9604 | **0.9977** | 0.9966 | 0.9703 | 0.9898 | 0.9703 | 0.9775 | 0.9944 |
23 | 9.08 | 0.7402 | 0.8941 | 0.8712 | 0.8826 | 0.9195 | 0.8770 | 0.9131 | 0.9021 | **0.9536** | 0.8920 | 0.8961 |
24 | 9.35 | 0.0000 | 0.7948 | 0.7507 | 0.7424 | 0.7810 | 0.7726 | **0.8444** | 0.7973 | 0.7907 | 0.7907 | 0.7958 |
25 | 10.10 | 0.9713 | 0.9882 | **1.0000** | **1.0000** | 0.9702 | 0.9911 | 0.9577 | 0.9861 | 0.9477 | **1.0000** | 0.9978 |
26 | 10.29 | 0.0000 | 0.5615 | 0.5751 | 0.6193 | 0.5561 | 0.6059 | 0.5980 | 0.6600 | 0.7331 | **0.7674** | 0.7492 |
27 | 10.39 | 0.0000 | 0.5710 | 0.5757 | 0.7795 | 0.7187 | 0.7689 | 0.7043 | **0.8355** | 0.7697 | 0.8123 | 0.8127 |
28 | 13.84 | 0.8062 | 0.9244 | 0.8802 | 0.8859 | 0.9274 | 0.8802 | 0.9259 | 0.9290 | 0.8866 | 0.9259 | **0.9336** |
29 | 13.87 | 0.9959 | 0.9959 | 0.9956 | 0.9956 | 0.9997 | **1.0000** | **1.0000** | 0.9997 | **1.0000** | 0.9959 | 0.9997 |
30 | 13.87 | 0.0000 | 0.7511 | 0.5432 | 0.6906 | 0.7011 | 0.6325 | 0.7351 | 0.6522 | 0.7454 | **0.7939** | 0.7738 |
31 | 15.47 | 0.3922 | 0.9067 | 0.8216 | 0.8661 | 0.8810 | 0.9192 | 0.9267 | 0.8801 | 0.8572 | 0.9292 | **0.9463** |
32 | 15.85 | 0.7015 | 0.9057 | 0.9016 | 0.9344 | 0.7967 | 0.9343 | 0.9499 | 0.9563 | **0.9599** | 0.9337 | 0.9371 |
33 | 16.68 | 0.0000 | 0.8706 | 0.7206 | 0.8603 | 0.6904 | 0.7831 | 0.7847 | 0.7796 | 0.7731 | **0.8989** | 0.8960 |
34 | 19.44 | 0.0000 | 0.9502 | 0.8743 | 0.8118 | 0.8641 | 0.9292 | **0.9885** | 0.8537 | 0.9411 | 0.9827 | 0.9769 |
35 | 20.50 | 0.9092 | 0.9959 | 0.9129 | 0.9129 | 0.9129 | **1.0000** | **1.0000** | **1.0000** | **1.0000** | 0.9129 | **1.0000** |
36 | 22.10 | 0.0000 | 0.6382 | 0.6662 | 0.5734 | 0.4208 | 0.4384 | 0.6190 | 0.5460 | 0.6424 | 0.6636 | **0.6686** |
37 | 22.81 | 0.0000 | 0.9422 | 0.7435 | 0.8125 | 0.9728 | 0.9828 | 0.8667 | 0.9195 | 0.9474 | **0.9902** | 0.9753 |
38 | 23.10 | 0.7408 | 0.7670 | 0.7408 | 0.6102 | 0.4984 | 0.7368 | 0.7705 | **0.7975** | 0.7623 | 0.7957 | 0.7966 |
39 | 28.41 | 0.0000 | 0.8125 | 0.6196 | 0.7731 | 0.6954 | 0.6600 | 0.8217 | 0.7474 | **0.8477** | 0.8141 | 0.8222 |
40 | 30.56 | 0.0000 | 0.6973 | 0.1820 | 0.6256 | 0.5771 | 0.5945 | 0.7453 | 0.5809 | 0.7149 | 0.7326 | **0.7455** |
41 | 32.78 | 0.2132 | 0.9661 | 0.8463 | 0.9401 | 0.8754 | 0.9090 | 0.9600 | 0.9630 | 0.9575 | 0.9477 | **0.9699** |
42 | 39.15 | 0.8421 | 0.8755 | **0.9665** | 0.7462 | 0.8153 | 0.8296 | 0.8121 | 0.8312 | 0.7539 | 0.8390 | 0.8966 |
43 | 39.15 | 0.0000 | 0.8763 | 0.7132 | 0.8640 | 0.6782 | 0.8019 | 0.8374 | 0.8245 | 0.8698 | 0.8887 | **0.9009** |
44 | 128.87 | 0.0000 | 0.6842 | 0.1759 | 0.6120 | 0.1753 | 0.1759 | 0.6847 | 0.3867 | 0.6904 | 0.7658 | **0.7807** |
AV | | 0.5098 | 0.8501 | 0.8003 | 0.8379 | 0.8088 | 0.8281 | 0.8596 | 0.8478 | 0.8634 | 0.8785 | **0.8845** |

Methods | p value | FWER | Hypothesis (\(\alpha = 0.05\)) |
---|---|---|---|
BSIA versus SVM | 0.0000 | 0.0050 | rejected in favor of BSIA |
BSIA versus SSVM | 0.0000 | 0.0056 | rejected in favor of BSIA |
BSIA versus SBSVM | 0.0000 | 0.0063 | rejected in favor of BSIA |
BSIA versus CSVM | 0.0000 | 0.0071 | rejected in favor of BSIA |
BSIA versus AdaC | 0.0000 | 0.0083 | rejected in favor of BSIA |
BSIA versus SBO | 0.0000 | 0.0100 | rejected in favor of BSIA |
BSIA versus UB | 0.0006 | 0.0125 | rejected in favor of BSIA |
BSIA versus RUS | 0.0008 | 0.0167 | rejected in favor of BSIA |
BSIA versus SB | 0.0008 | 0.0250 | rejected in favor of BSIA |
BSIA versus BSI | 0.0255 | 0.0500 | rejected in favor of BSIA |
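
The thresholds in the FWER column coincide with Holm's step-down procedure for \(m = 10\) comparisons at \(\alpha = 0.05\): the \(i\)-th smallest p value is compared with \(\alpha/(m - i + 1)\), i.e. \(0.05/10 = 0.0050\), \(0.05/9 \approx 0.0056\), up to \(0.05/1 = 0.0500\). The sketch below assumes that reading (the correction is not named explicitly here, and the test producing the p values is not reproduced); the function name `holm` is hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure for m simultaneous comparisons.

    Returns (index, p, threshold, rejected) tuples in ascending order of p.
    The i-th smallest p value (1-based) is compared with alpha / (m - i + 1);
    once one hypothesis is retained, all remaining ones are retained too.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    results = []
    rejecting = True
    for rank, idx in enumerate(order):  # rank is 0-based here
        threshold = alpha / (m - rank)
        rejecting = rejecting and p_values[idx] <= threshold
        results.append((idx, p_values[idx], threshold, rejecting))
    return results


# p values of the BSIA comparisons, in the order listed in the table.
comparisons = ["SVM", "SSVM", "SBSVM", "CSVM", "AdaC", "SBO", "UB", "RUS", "SB", "BSI"]
p_values = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0006, 0.0008, 0.0008, 0.0255]

for idx, p, threshold, rejected in holm(p_values):
    verdict = "rejected" if rejected else "retained"
    print(f"BSIA versus {comparisons[idx]}: p={p:.4f}, threshold={threshold:.4f}, {verdict}")
```

With these inputs every null hypothesis is rejected, matching the last column; the largest p value (0.0255, BSIA versus BSI) is compared with the unadjusted \(\alpha = 0.05\) only because all nine smaller p values passed their stricter thresholds first.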
3.2 Experiment 2: short-term loan repayment prediction
3.2.1 Description
3.2.2 Results and discussion
Method | \(\mathrm{TP}_{\mathrm{rate}}\) | \(\mathrm{TN}_{\mathrm{rate}}\) | Acc | GMean |
---|---|---|---|---|
UB | 0.6383 | 0.5930 | 0.5986 | 0.6153 |
RUS | 0.4468 | 0.7393 | 0.7033 | 0.5747 |
SSVM | **0.6596** | 0.5592 | 0.5716 | 0.6073 |
BSI | 0.5957 | 0.6448 | 0.6387 | 0.6198 |
BSIA | 0.6312 | 0.6388 | 0.6379 | **0.6350** |
JRip | 0.0000 | **1.0000** | **0.8770** | 0.0000 |
J48 | 0.0000 | **1.0000** | **0.8770** | 0.0000 |
JRip + BSIA | 0.6028 | 0.6537 | 0.6475 | 0.6277 |
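
The table also shows why accuracy alone is misleading here. JRip and J48 predict every loan as negative (\(\mathrm{TP}_{\mathrm{rate}} = 0\), \(\mathrm{TN}_{\mathrm{rate}} = 1\)), so their accuracy of 0.8770 equals the share of negative examples in the evaluation data, while their GMean is 0. The short Python check below assumes GMean \(= \sqrt{\mathrm{TP}_{\mathrm{rate}} \cdot \mathrm{TN}_{\mathrm{rate}}}\) and a positive-class share of 0.123 inferred from the JRip row; the helper names `gmean` and `accuracy` are hypothetical.

```python
import math


def gmean(tp_rate, tn_rate):
    """Geometric mean of the two class-wise rates."""
    return math.sqrt(tp_rate * tn_rate)


def accuracy(tp_rate, tn_rate, positive_share):
    """Accuracy as the prevalence-weighted average of the two class-wise rates."""
    return positive_share * tp_rate + (1.0 - positive_share) * tn_rate


# Assumption: positive share inferred from the JRip row (Acc = 1 - positive share).
POSITIVE_SHARE = 1.0 - 0.8770

rows = {
    "UB": (0.6383, 0.5930),
    "RUS": (0.4468, 0.7393),
    "BSIA": (0.6312, 0.6388),
    "JRip": (0.0000, 1.0000),
}

for name, (tpr, tnr) in rows.items():
    print(f"{name:5s} Acc={accuracy(tpr, tnr, POSITIVE_SHARE):.4f} GMean={gmean(tpr, tnr):.4f}")
```

The recomputed values agree with the table up to rounding of the published rates, which suggests that a majority-class predictor can top the Acc column while being useless for identifying the minority class; GMean penalizes that behavior directly.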