Introduction
Related work
Our work
- To better evaluate the classification quality of features in terms of separability, we define a new separability degree (SD) by integrating the coincidence degree and the dependency degree for fuzzy neighborhood rough sets (FNRS), and fuse it with FNRS to define a new fuzzy neighborhood entropy. We then propose the concepts of fuzzy neighborhood joint entropy, fuzzy neighborhood conditional entropy, and fuzzy neighborhood mutual information, and explore and prove their related properties.
- To measure features in online streaming feature selection from both the algebraic and information views, we propose fuzzy neighborhood symmetric uncertainty. We then present a series of uncertainty measures, namely the significance, the fuzzy neighborhood interaction gain, and the contrast ratio, and derive and prove the related theorems. Furthermore, we construct an online group streaming decision system that retains features with strong approximation ability and removes redundant features as features dynamically flow into the feature space.
- Based on this, we design a new online group streaming feature selection algorithm, named FNE-OGSFS. First, the significance is used for intra-group feature selection. Second, online interaction analysis is performed on the feature groups flowing into the feature space, based on the fuzzy neighborhood interaction gain and the contrast ratio. Finally, redundant features are removed using the Lasso model. Experimental results on thirteen real-world datasets of different types confirm that FNE-OGSFS can effectively select the optimal feature subset.
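
The three stages named in the last contribution can be pictured as a simple control loop over arriving feature groups. The following is only a minimal sketch of that control flow, not the FNE-OGSFS algorithm itself: the function name `select_from_stream`, the three callables, the "significance greater than zero" rule, and the toy scores are all hypothetical placeholders, since the actual measures and thresholds are defined in the body of the paper.

```python
from typing import Callable, Iterable, List, Sequence

Feature = str


def select_from_stream(
    groups: Iterable[Sequence[Feature]],
    significance: Callable[[Feature, List[Feature]], float],
    interacts: Callable[[Feature, List[Feature]], bool],
    prune_redundant: Callable[[List[Feature]], List[Feature]],
) -> List[Feature]:
    """Process feature groups one at a time and return the selected subset."""
    selected: List[Feature] = []
    for group in groups:
        # Stage 1 (intra-group selection): keep features of the arriving group
        # whose significance w.r.t. the current subset is positive.
        # The "> 0" rule is an assumption made for this sketch.
        candidates = [f for f in group if significance(f, selected) > 0.0]
        # Stage 2 (interaction analysis): keep candidates that a placeholder
        # predicate judges to interact usefully with the current subset.
        candidates = [f for f in candidates if interacts(f, selected)]
        # Stage 3 (redundancy analysis): prune the merged subset. The paper
        # uses a Lasso model here; this sketch accepts any pruning callable.
        selected = prune_redundant(selected + candidates)
    return selected


# Toy usage with made-up scores and a pruning rule that keeps two features.
toy_scores = {"a": 0.4, "b": 0.0, "c": 0.3, "d": 0.3}
result = select_from_stream(
    groups=[["a", "b"], ["c", "d"]],
    significance=lambda f, current: toy_scores[f],
    interacts=lambda f, current: True,
    prune_redundant=lambda subset: list(dict.fromkeys(subset))[:2],
)
print(result)
```

Passing the measures in as callables keeps the loop independent of how the significance, interaction gain, contrast ratio, and Lasso step are actually computed.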
Preliminaries
Fuzzy neighborhood rough sets
Coincidence degree
Fuzzy neighborhood entropy-based uncertainty measures
Online group streaming feature selection approach
Problem formalization
Our new algorithm
Online intra-group selection
Online interaction analysis
Online redundancy analysis
Time complexity
Experimental results
Experiment setup
No. | Datasets | Samples | Features | Classes |
---|---|---|---|---|
1 | Sonar | 208 | 60 | 2 |
2 | Wpbc | 198 | 34 | 2 |
3 | Ionosphere | 351 | 33 | 2 |
4 | Wdbc | 569 | 31 | 2 |
5 | COLON | 62 | 2000 | 2 |
6 | DLBCL | 77 | 7129 | 2 |
7 | LEUKEMIA | 72 | 7129 | 2 |
8 | LYMPHOMA | 62 | 4026 | 3 |
9 | SRBCT | 83 | 2308 | 4 |
10 | Lung Cancer | 203 | 12600 | 5 |
11 | Ovarian Cancer | 253 | 15154 | 2 |
12 | MADELON | 2600 | 500 | 2 |
13 | ARCENE | 100 | 10000 | 2 |
The parameter analysis of the FNE-OGSFS
Comparison with other algorithms
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 0.8478 | 0.8091 | 0.744 | 0.7689 | 0.6487 | 0.8419 | 0.6981 | 0.7312 | 0.8102 |
Wpbc | 0.7912 | 0.723 | 0.6522 | 0.7323 | 0.7041 | 0.6457 | 0.6613 | 0.6986 | 0.7127 |
Ionosphere | 0.9019 | 0.8977 | 0.6254 | 0.8994 | 0.9115 | 0.8906 | 0.8342 | 0.8011 | 0.8311 |
Wdbc | 0.9274 | 0.9579 | 0.9013 | 0.939 | 0.8696 | 0.9137 | 0.942 | 0.8912 | 0.9162 |
COLON | 0.8734 | 0.7905 | 0.6966 | 0.657 | 0.6572 | 0.7105 | 0.7214 | 0.7365 | 0.8014 |
DLBCL | 0.9267 | 0.8298 | 0.7306 | 0.8121 | 0.7527 | 0.7816 | 0.828 | 0.8901 | 0.9069 |
LEUKEMIA | 0.9417 | 0.729 | 0.834 | 0.6332 | 0.5519 | 0.8595 | 0.8639 | 0.8895 | 0.8944 |
LYMPHOMA | 0.9703 | 0.9085 | 0.7963 | 0.858 | 0.6852 | 0.8651 | 0.911 | 0.9039 | 0.8991 |
SRBCT | 0.8495 | 0.6273 | 0.8013 | 0.8005 | 0.7585 | 0.7944 | 0.7367 | 0.7121 | 0.738 |
Lung Cancer | 0.8699 | 0.8173 | 0.8197 | 0.6779 | 0.6375 | 0.8621 | 0.7132 | 0.8196 | 0.8203 |
Ovarian Cancer | 0.9746 | 0.9652 | 0.9742 | 0.9791 | 0.6078 | 0.9166 | 0.9044 | 0.9162 | 0.9718 |
ARCENE | 0.7479 | 0.733 | 0.5967 | 0.5848 | 0.5635 | 0.628 | 0.6649 | 0.6895 | 0.6124 |
MADELON | 0.548 | 0.5976 | 0.509 | 0.5836 | 0.521 | 0.511 | 0.503 | 0.5091 | 0.5132 |
W/T/L | 9/0/4 | 2/0/11 | 0/0/13 | 1/0/12 | 1/0/12 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 |
Average | 0.8593 | 0.7989 | 0.7447 | 0.7635 | 0.6822 | 0.7862 | 0.7672 | 0.7837 | 0.8021 |
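
The W/T/L row in the table above appears to count, per algorithm, the datasets on which it alone attains the highest accuracy (win), shares the highest accuracy (tie), or falls below it (loss). This reading is an assumption, although it is consistent with the tabulated counts summing to thirteen. A small sketch of that counting rule, applied to three rows copied from the table (the helper name `win_tie_loss` is hypothetical):

```python
ALGORITHMS = [
    "FNE-OGSFS", "OGSFS-FI", "Group-SAOLA", "Alpha-investing", "SFS-FI",
    "OFS-A3M", "FNRS", "FNCE", "FNPME-FS",
]

# Three rows copied from the accuracy table above, in column order.
ROWS = {
    "Sonar": [0.8478, 0.8091, 0.744, 0.7689, 0.6487, 0.8419, 0.6981, 0.7312, 0.8102],
    "Ionosphere": [0.9019, 0.8977, 0.6254, 0.8994, 0.9115, 0.8906, 0.8342, 0.8011, 0.8311],
    "Wdbc": [0.9274, 0.9579, 0.9013, 0.939, 0.8696, 0.9137, 0.942, 0.8912, 0.9162],
}


def win_tie_loss(rows):
    """Return {algorithm: (wins, ties, losses)} under the assumed rule."""
    counts = {name: [0, 0, 0] for name in ALGORITHMS}
    for scores in rows.values():
        best = max(scores)
        shared = scores.count(best) > 1  # more than one algorithm hits the best
        for name, score in zip(ALGORITHMS, scores):
            if score < best:
                counts[name][2] += 1  # loss: below the best accuracy
            elif shared:
                counts[name][1] += 1  # tie: best accuracy is shared
            else:
                counts[name][0] += 1  # win: sole best accuracy
    return {name: tuple(c) for name, c in counts.items()}


wtl = win_tie_loss(ROWS)
print(wtl["FNE-OGSFS"], wtl["OGSFS-FI"], wtl["SFS-FI"])
```

On these three rows, FNE-OGSFS, OGSFS-FI, and SFS-FI each win once; running the same function over all thirteen rows would give the full W/T/L row under this assumed rule.
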
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 0.74 | 0.7626 | 0.707 | 0.7168 | 0.5778 | 0.7256 | 0.6337 | 0.5413 | 0.6891 |
Wpbc | 0.7691 | 0.7687 | 0.7426 | 0.7631 | 0.7658 | 0.7538 | 0.7262 | 0.7174 | 0.7672 |
Ionosphere | 0.922 | 0.8926 | 0.1226 | 0.1083 | 0.8922 | 0.1103 | 0.8791 | 0.853 | 0.8932 |
Wdbc | 0.9385 | 0.9634 | 0.9026 | 0.9372 | 0.8752 | 0.9342 | 0.891 | 0.6316 | 0.8897 |
COLON | 0.9138 | 0.8095 | 0.674 | 0.6926 | 0.6759 | 0.7899 | 0.8203 | 0.6869 | 0.8292 |
DLBCL | 0.9201 | 0.8608 | 0.8115 | 0.8543 | 0.7589 | 0.8183 | 0.8177 | 0.7473 | 0.9136 |
LEUKEMIA | 0.9746 | 0.7896 | 0.7563 | 0.8281 | 0.6798 | 0.8329 | 0.896 | 0.6653 | 0.938 |
LYMPHOMA | 0.9679 | 0.8939 | 0.8263 | 0.755 | 0.7105 | 0.8279 | 0.9036 | 0.8642 | 0.9021 |
SRBCT | 0.9526 | 0.8267 | 0.5989 | 0.8201 | 0.7899 | 0.8575 | 0.7991 | 0.7633 | 0.8629 |
Lung Cancer | 0.8955 | 0.788 | 0.7244 | 0.8557 | 0.6897 | 0.8524 | 0.746 | 0.7061 | 0.8542 |
Ovarian Cancer | 0.9877 | 0.9141 | 0.921 | 0.9927 | 0.6463 | 0.9579 | 0.9152 | 0.9637 | 0.9536 |
ARCENE | 0.69 | 0.6553 | 0.6181 | 0.5928 | 0.587 | 0.5714 | 0.5531 | 0.513 | 0.6455 |
MADELON | 0.6133 | 0.6091 | 0.5473 | 0.607 | 0.4849 | 0.4795 | 0.5063 | 0.5064 | 0.5389 |
W/T/L | 10/0/3 | 2/0/11 | 0/0/13 | 1/0/12 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 |
Average | 0.8681 | 0.8103 | 0.6887 | 0.7326 | 0.7026 | 0.7317 | 0.7759 | 0.7046 | 0.8213 |
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 0.7632 | 0.7389 | 0.684 | 0.7078 | 0.6039 | 0.6673 | 0.7233 | 0.7192 | 0.7338 |
Wpbc | 0.7742 | 0.7152 | 0.7105 | 0.7421 | 0.7616 | 0.7623 | 0.719 | 0.7039 | 0.7093 |
Ionosphere | 0.9113 | 0.9027 | 0.5196 | 0.8624 | 0.8984 | 0.8766 | 0.9037 | 0.8632 | 0.8932 |
Wdbc | 0.904 | 0.9223 | 0.786 | 0.8539 | 0.8668 | 0.8821 | 0.8075 | 0.8395 | 0.858 |
COLON | 0.8481 | 0.7238 | 0.6215 | 0.6594 | 0.6405 | 0.5965 | 0.7112 | 0.7181 | 0.8127 |
DLBCL | 0.8577 | 0.6477 | 0.6011 | 0.6096 | 0.7799 | 0.7006 | 0.8019 | 0.7956 | 0.8131 |
LEUKEMIA | 0.9011 | 0.7336 | 0.6973 | 0.7347 | 0.6591 | 0.8173 | 0.8346 | 0.8198 | 0.8571 |
LYMPHOMA | 0.9223 | 0.8681 | 0.7139 | 0.8251 | 0.721 | 0.7981 | 0.8199 | 0.8366 | 0.9037 |
SRBCT | 0.742 | 0.5983 | 0.5732 | 0.7176 | 0.5732 | 0.6898 | 0.5422 | 0.5981 | 0.6392 |
Lung Cancer | 0.7925 | 0.6508 | 0.5892 | 0.5432 | 0.6107 | 0.7433 | 0.7972 | 0.7211 | 0.806 |
Ovarian Cancer | 0.956 | 0.9298 | 0.8971 | 0.9333 | 0.6903 | 0.9468 | 0.8261 | 0.8973 | 0.9318 |
ARCENE | 0.6874 | 0.625 | 0.5357 | 0.6191 | 0.4961 | 0.565 | 0.488 | 0.5144 | 0.5742 |
MADELON | 0.6131 | 0.6048 | 0.5011 | 0.587 | 0.5194 | 0.5075 | 0.4931 | 0.5239 | 0.594 |
W/T/L | 11/0/2 | 1/0/12 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 1/0/12 |
Average | 0.821 | 0.7432 | 0.6485 | 0.7227 | 0.6785 | 0.7349 | 0.7283 | 0.7347 | 0.7789 |
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 0.7253 | 0.7283 | 0.6018 | 0.7192 | 0.5715 | 0.6918 | 0.6946 | 0.6139 | 0.7046 |
Wpbc | 0.7492 | 0.7012 | 0.6735 | 0.7012 | 0.692 | 0.7288 | 0.6633 | 0.6722 | 0.7481 |
Ionosphere | 0.9168 | 0.9053 | 0.6362 | 0.8897 | 0.8938 | 0.8947 | 0.8413 | 0.9075 | 0.8912 |
Wdbc | 0.8895 | 0.9266 | 0.7692 | 0.8817 | 0.8488 | 0.8374 | 0.916 | 0.8403 | 0.927 |
COLON | 0.8338 | 0.7327 | 0.6385 | 0.6897 | 0.6153 | 0.7452 | 0.7655 | 0.7962 | 0.7836 |
DLBCL | 0.8275 | 0.798 | 0.6373 | 0.7993 | 0.7157 | 0.7163 | 0.8132 | 0.8199 | 0.8081 |
LEUKEMIA | 0.8439 | 0.7352 | 0.6721 | 0.7978 | 0.6051 | 0.7527 | 0.719 | 0.7894 | 0.8205 |
LYMPHOMA | 0.9513 | 0.8267 | 0.7276 | 0.6884 | 0.6895 | 0.7363 | 0.8239 | 0.9316 | 0.9023 |
SRBCT | 0.7753 | 0.5108 | 0.5949 | 0.7069 | 0.7184 | 0.7192 | 0.6348 | 0.5672 | 0.6911 |
Lung Cancer | 0.7007 | 0.6477 | 0.6805 | 0.5091 | 0.5845 | 0.6853 | 0.6099 | 0.615 | 0.7265 |
Ovarian Cancer | 0.9538 | 0.9075 | 0.9122 | 0.9379 | 0.6504 | 0.8844 | 0.8502 | 0.9134 | 0.9027 |
ARCENE | 0.7267 | 0.6845 | 0.5321 | 0.526 | 0.6492 | 0.5507 | 0.5116 | 0.5268 | 0.5912 |
MADELON | 0.5863 | 0.5977 | 0.466 | 0.5949 | 0.5155 | 0.5005 | 0.5047 | 0.5014 | 0.573 |
W/T/L | 9/0/4 | 2/0/11 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 0/0/13 | 2/0/11 |
Average | 0.8062 | 0.7463 | 0.6571 | 0.7263 | 0.6731 | 0.7264 | 0.7191 | 0.7303 | 0.7746 |
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 8.8 | 12.4 | 1 | 15.4 | 1 | 21.8 | 1.8 | 1.2 | 10.2 |
Wpbc | 1.6 | 2.2 | 2.6 | 8 | 1 | 1 | 9 | 11.4 | 8.6 |
Ionosphere | 5 | 3.4 | 1 | 16.2 | 2 | 8.8 | 4.2 | 3.8 | 4 |
Wdbc | 6.6 | 11.6 | 10.2 | 17.2 | 1 | 16 | 12.6 | 10.2 | 8.2 |
COLON | 5.8 | 6.2 | 1 | 2.2 | 1 | 31.6 | 3 | 7.2 | 11.6 |
DLBCL | 5.2 | 11.2 | 8.6 | 6.2 | 1.4 | 29.2 | 21.4 | 10.6 | 8.6 |
LEUKEMIA | 8.6 | 10 | 2.2 | 7.6 | 1.4 | 23.4 | 46.6 | 12.2 | 10 |
LYMPHOMA | 7.4 | 10.6 | 6.2 | 20.6 | 2.6 | 30.6 | 32.6 | 21.4 | 14.6 |
SRBCT | 24.8 | 6 | 6.6 | 18.8 | 72 | 13.2 | 38.6 | 19.2 | 15.2 |
Lung Cancer | 14.2 | 11.8 | 12.2 | 30.6 | 1 | 40.2 | 84.2 | 38.2 | 50.6 |
Ovarian Cancer | 6.4 | 47.8 | 21 | 51.4 | 1 | 7.6 | 62.4 | 26.6 | 17.8 |
ARCENE | 1.4 | 22.6 | 10.4 | 3 | 1 | 30.8 | 4.8 | 1.8 | 23.2 |
MADELON | 4.6 | 9.8 | 2.6 | 6.8 | 1 | 2 | 1 | 2.6 | 10.2 |
Average | 7.7 | 12.7 | 6.6 | 15.7 | 6.7 | 19.7 | 24.8 | 12.8 | 14.8 |
Datasets | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS |
---|---|---|---|---|---|---|---|---|---|
Sonar | 0.3386 | 0.0237 | 0.0091 | 0.0024 | 0.0015 | 0.1673 | 0.3966 | 0.8133 | 0.4657 |
Wpbc | 0.159 | 0.0296 | 0.0125 | 0.0011 | 0.0017 | 0.0439 | 0.2032 | 0.3906 | 0.2546 |
Ionosphere | 0.3941 | 0.052 | 0.0166 | 0.0153 | 0.0012 | 0.1938 | 0.3619 | 1.2767 | 0.4993 |
Wdbc | 0.7986 | 0.5163 | 0.0089 | 0.0084 | 0.0011 | 0.6186 | 0.7101 | 2.8849 | 0.9722 |
COLON | 3.5324 | 0.2213 | 0.1689 | 0.0606 | 0.0467 | 0.9483 | 5.228 | 11.7071 | 7.2214 |
DLBCL | 9.4213 | 0.6451 | 0.6014 | 0.3338 | 0.1957 | 5.2565 | 8.3612 | 9.7326 | 9.9175 |
LEUKEMIA | 11.0722 | 0.524 | 0.7435 | 0.4950 | 0.2844 | 6.4142 | 34.5264 | 26.5811 | 30.7039 |
LYMPHOMA | 9.1881 | 0.2439 | 0.6631 | 0.4456 | 0.1241 | 15.2325 | 12.3147 | 14.275 | 18.3367 |
SRBCT | 11.5211 | 0.5226 | 0.2154 | 0.0941 | 0.9237 | 3.0298 | 15.039 | 15.7192 | 16.9631 |
Lung Cancer | 184.463 | 0.6432 | 3.4635 | 1.871 | 0.7521 | 65.0008 | 255.4301 | 271.3386 | 307.2651 |
Ovarian Cancer | 144.7541 | 2.8841 | 4.7695 | 3.4376 | 1.0179 | 66.4503 | 202.4256 | 236.786 | 210.9788 |
ARCENE | 36.81 | 5.1314 | 2.0133 | 0.9418 | 0.5137 | 21.5113 | 40.2291 | 52.3619 | 57.1783 |
MADELON | 94.5696 | 21.0849 | 0.0862 | 0.0975 | 0.0574 | 146.9587 | 71.3425 | 79.0516 | 89.4967 |
Average | 39.0017 | 2.5017 | 0.9825 | 0.6003 | 0.3016 | 25.5251 | 49.736 | 55.6091 | 57.7118 |
Mean rankings of the nine algorithms under each classifier, with the statistics \(\chi_F^2\) and \(F_F\)
Classifiers | FNE-OGSFS | OGSFS-FI | Group-SAOLA | Alpha-investing | SFS-FI | OFS-A3M | FNRS | FNCE | FNPME-FS | \(\chi_F^2\) | \(F_F\) |
---|---|---|---|---|---|---|---|---|---|---|---|
KNN | 1.54 | 4 | 6.46 | 5 | 7.23 | 5.31 | 5.69 | 5.62 | 4.15 | 37.71 | 6.83 |
SVM | 1.23 | 3.46 | 6.46 | 4.62 | 7.38 | 5.23 | 5.69 | 7.46 | 3.46 | 57.85 | 15.04 |
NB | 1.23 | 4 | 7.85 | 5.38 | 6.69 | 5.15 | 5.62 | 5.62 | 3.46 | 51.13 | 11.61 |
CART | 1.54 | 4.15 | 7.23 | 5.23 | 6.69 | 5.31 | 6.08 | 4.92 | 3.46 | 35.22 | 6.14 |
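
The two statistics in the last columns can be recomputed from the mean rankings. The sketch below assumes that \(\chi_F^2\) is the standard Friedman statistic and \(F_F\) its Iman-Davenport F correction, with \(N = 13\) datasets and \(k = 9\) algorithms; the function name `friedman_statistics` is a hypothetical helper, not something taken from the paper.

```python
def friedman_statistics(mean_ranks, n_datasets):
    """Return (chi2_F, F_F) computed from the mean rank of each algorithm.

    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    F_F    = (N - 1) * chi2_F / (N(k - 1) - chi2_F)
    """
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    chi2 = 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
    f_stat = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, f_stat


# Mean rankings copied from the KNN and SVM rows of the table above.
KNN_RANKS = [1.54, 4, 6.46, 5, 7.23, 5.31, 5.69, 5.62, 4.15]
SVM_RANKS = [1.23, 3.46, 6.46, 4.62, 7.38, 5.23, 5.69, 7.46, 3.46]

knn_chi2, knn_f = friedman_statistics(KNN_RANKS, n_datasets=13)
svm_chi2, svm_f = friedman_statistics(SVM_RANKS, n_datasets=13)
print(f"KNN: {knn_chi2:.2f} {knn_f:.2f}")  # about 37.71 and 6.83
print(f"SVM: {svm_chi2:.2f} {svm_f:.2f}")
```

Under these assumptions the KNN and SVM rows are reproduced up to the rounding of the tabulated ranks, which suggests the table was computed with these formulas.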