1 Introduction
- Proposal of new methods for the automatic selection of locally specialized one-class classifiers, based on determining a number of mutually complementary competence areas.
- Complete step-by-step guidance on constructing an efficient one-class ensemble based on soft object space partitioning, which does not require any parameters to be tuned manually.
- Presentation of extensive experiments that evaluate the usefulness of the proposed methods for selecting the number of competence areas in both one-class and multi-class scenarios.
2 One-class classification
2.1 Classification in the absence of counterexamples
2.2 One-class classifiers for multi-class problems
3 One-class clustering-based ensemble
- Boundary-based approaches (such as WOCSVM) were shown to generalize better than clustering-based (reconstruction) OCC [32], but they are highly sensitive to atypical and complex data distributions. A hybrid method that uses both approaches can therefore combine their advantages while reducing their drawbacks.
- Since each classifier is trained only on a reduced chunk of the data, its computational complexity is lower than that of a single-model approach. Training on a reduced chunk also lowers the probability of overtraining the one-class learner. Additionally, the individual classifiers can easily be deployed in a distributed environment, leading to a significant decrease in execution time.
- Using chunks of data as the classifier input reduces the influence of a negative effect known as the empty sphere, that is, an area covered by the decision boundary in which no objects from the training set are located [18].
- A boundary classifier trained on a more compact data partition usually has fewer support vectors.
- By combining fuzzy clustering with a weighting scheme, we obtain a good estimate of the weights assigned to training objects in reduced time.
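
The link between fuzzy clustering and object weights mentioned in the last point can be illustrated with a minimal sketch. It assumes the standard fuzzy c-means membership update with fuzzifier `m` and assumes that the membership of an object in a cluster is used directly as its training weight for that cluster's classifier. The function name `fuzzy_memberships` and the toy data are hypothetical and are not taken from this work.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point j in cluster i.

    Standard fuzzy c-means update (assumed here):
        u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
    """
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        dists = [math.dist(x, v) for v in centers]
        coinciding = [i for i, d in enumerate(dists) if d == 0.0]
        if coinciding:
            # The point lies exactly on one or more centers: split full
            # membership among them to avoid division by zero.
            for i in coinciding:
                u[i][j] = 1.0 / len(coinciding)
            continue
        for i, d_i in enumerate(dists):
            u[i][j] = 1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
    return u

# Toy 2-D data with two assumed cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships(points, centers)
# Row i of u can serve as the per-object weights for the classifier of cluster i.
```

Because every column of `u` sums to one, each object contributes to all competence areas, but mostly to the one whose center is nearest.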
4 Automatic selection of competence areas
4.1 Indexes based on membership values
4.1.1 Partition coefficient
4.1.2 Partition entropy
4.1.3 Modified partition coefficient
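
As an illustration of the three membership-based indexes of Sect. 4.1, the sketch below uses their standard textbook definitions, which are assumed here to match the ones used in this work. The membership matrix `u` is indexed as `u[i][j]` (cluster i, object j), and the helper `select_number_of_areas` is a hypothetical name for picking the candidate number of clusters with the highest modified partition coefficient.

```python
import math

def partition_coefficient(u):
    """PC = (1/N) * sum_i sum_j u_ij^2; higher means a crisper partition."""
    n = len(u[0])
    return sum(v * v for row in u for v in row) / n

def partition_entropy(u):
    """PE = -(1/N) * sum_i sum_j u_ij * ln(u_ij); lower means a crisper partition."""
    n = len(u[0])
    return -sum(v * math.log(v) for row in u for v in row if v > 0.0) / n

def modified_partition_coefficient(u):
    """MPC = 1 - C/(C-1) * (1 - PC); rescales PC to [0, 1] for C >= 2 clusters."""
    c = len(u)
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(u))

def select_number_of_areas(candidates):
    """candidates maps a cluster count C to its membership matrix; pick the C with the best MPC."""
    return max(candidates, key=lambda c: modified_partition_coefficient(candidates[c]))

crisp = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(partition_coefficient(crisp), partition_coefficient(uniform))
```

The rescaling in MPC removes the tendency of PC to decrease as the number of clusters grows, which matters when the index is used to compare partitions with different numbers of competence areas.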
4.2 Indexes based on membership values and dataset
4.2.1 I index
4.2.2 Cluster validity measure
4.2.3 Fukuyama–Sugeno index
4.2.4 Fuzzy hyper volume
4.2.5 Average partition density
4.2.6 Xie–Beni index
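
Indexes in this group use both the membership values and the data. As one example, the sketch below computes the Xie–Beni index in its commonly cited form, the fuzzy within-cluster scatter divided by the number of objects times the smallest squared distance between two cluster centers. This form, the fuzzifier `m`, and the function name `xie_beni` are assumptions made for illustration; lower values indicate compact, well-separated clusters.

```python
import math

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index (assumed standard form); requires at least two centers.

    XB = sum_i sum_j u_ij^m * ||x_j - v_i||^2 / (N * min_{i != k} ||v_i - v_k||^2)
    """
    compactness = sum(
        (u[i][j] ** m) * math.dist(x, v) ** 2
        for i, v in enumerate(centers)
        for j, x in enumerate(points)
    )
    separation = min(
        math.dist(a, b) ** 2
        for idx, a in enumerate(centers)
        for b in centers[idx + 1:]
    )
    return compactness / (len(points) * separation)

# Toy 1-D data: two tight groups far apart give a small index value.
points = [(0.0,), (1.0,), (4.0,), (5.0,)]
centers = [(0.5,), (4.5,)]
u = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
print(xie_beni(points, centers, u))
```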
4.3 Statistical indexes
4.3.1 Akaike information criterion
5 Experimental investigations
- Evaluation on one-class problems, where we train OCClustE on data from a single class without access to counterexamples.
- Evaluation on multi-class problems, where we decompose a data set into \(M\) separate one-class tasks, train OCClustE on each of them, and then reconstruct the original decision using an ECOC combiner.
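
The decomposition in the second scenario can be sketched as follows. This is a simplified illustration under stated assumptions: the class `CentroidOneClass` is a hypothetical stand-in for OCClustE (it only measures distance to the class centroid), and the final decision uses a plain maximum-support rule in place of the ECOC combiner used in the experiments.

```python
import math
from collections import defaultdict

def decompose(samples, labels):
    """Split a multi-class data set into one group of target objects per class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return dict(by_class)

class CentroidOneClass:
    """Stand-in one-class model (NOT OCClustE): support decays with distance to the centroid."""

    def fit(self, target_objects):
        n = len(target_objects)
        dim = len(target_objects[0])
        self.centroid = tuple(sum(x[d] for x in target_objects) / n for d in range(dim))
        return self

    def support(self, x):
        return math.exp(-math.dist(x, self.centroid))

def train_one_class_per_class(samples, labels):
    """Train one model per class, each seeing only objects of its own class."""
    return {label: CentroidOneClass().fit(objs)
            for label, objs in decompose(samples, labels).items()}

def predict(models, x):
    """Maximum-support rule (a simplification of the ECOC combiner)."""
    return max(models, key=lambda label: models[label].support(x))

samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
models = train_one_class_per_class(samples, labels)
print(predict(models, (0.2, 0.4)), predict(models, (5.5, 5.1)))
```

Each model never sees objects of other classes during training, which is what distinguishes this decomposition from one-versus-all binary schemes.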
5.1 Data sets
No. | Name | Objects | Features | Classes |
---|---|---|---|---|
1. | Breast-cancer | 286 (85) | 9 | 2 |
2. | Breast-Wisconsin | 699 (241) | 9 | 2 |
3. | Colic | 368 (191) | 22 | 2 |
4. | Diabetes | 768 (268) | 8 | 2 |
5. | Heart-statlog | 270 (120) | 13 | 2 |
6. | Hepatitis | 155 (32) | 19 | 2 |
7. | Ionosphere | 351 (124) | 34 | 2 |
8. | Sonar | 208 (97) | 60 | 2 |
9. | Voting records | 435 (168) | 16 | 2 |
10. | CYP2C19 isoform | 837 (181) | 242 | 2 |
11. | Autos | 159 | 25 | 6 |
12. | Car | 1728 | 6 | 4 |
13. | Cleveland | 297 | 13 | 5 |
14. | Dermatology | 366 | 33 | 6 |
15. | Ecoli | 336 | 7 | 8 |
16. | Flare | 1389 | 10 | 6 |
17. | Lymphography | 148 | 18 | 4 |
18. | Segment | 2310 | 19 | 7 |
19. | Vehicle | 846 | 18 | 4 |
20. | Yeast | 1484 | 8 | 10 |
5.2 Set-up
- For simultaneous training/testing and pairwise comparison, we use the \(5\times 2\) combined CV F-test [1]. It repeats two-fold cross-validation five times, so that in each fold the training and testing sets are of equal size. The test is conducted in an all-versus-all manner.
- For assessing the ranks of classifiers over all examined benchmarks, we use the Friedman ranking test [8]. It checks whether the assigned ranks differ significantly from assigning an average rank to each classifier.
- We use the Shaffer post-hoc test [12] to find out which of the tested methods differ significantly in an \(n \times n\) comparison. The post-hoc procedure is based on a specific value of the significance level \(\alpha \). Additionally, the obtained p values should be examined to check how different the pairs of algorithms are.
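
To make the first test concrete, the following minimal sketch computes the statistic of the \(5\times 2\) combined CV F-test from the per-fold differences in error rate between two classifiers. The function name `combined_5x2cv_f` and the example numbers are hypothetical. Under the null hypothesis the statistic is approximately F-distributed with 10 and 5 degrees of freedom, so values above roughly 4.74 would be significant at \(\alpha = 0.05\).

```python
def combined_5x2cv_f(differences):
    """Combined 5x2 CV F statistic.

    differences: five (p1, p2) pairs, where p1 and p2 are the differences in
    error rate between the two compared classifiers on the two folds of one
    replication of two-fold cross-validation.
    """
    if len(differences) != 5:
        raise ValueError("expected exactly five replications")
    # Numerator: sum of all ten squared differences.
    numerator = sum(p1 * p1 + p2 * p2 for p1, p2 in differences)
    # Denominator: twice the sum of the per-replication variance estimates.
    variance_sum = 0.0
    for p1, p2 in differences:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("all replications have zero variance; statistic undefined")
    return numerator / (2.0 * variance_sum)

# Made-up differences for illustration only.
example = [(0.02, 0.04)] * 5
print(combined_5x2cv_f(example))
```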
5.3 Results and discussion
5.3.1 Experiments with one-class classification
Dataset | PC | PE | MPC | I | CVM | FS | FHV | APD | XBI | AIC |
---|---|---|---|---|---|---|---|---|---|---|
Breast-cancer | 5 | 4 | 5 | 3 | 6 | 4 | 3 | 5 | 5 | 3 |
Breast-Wisconsin | 7 | 6 | 7 | 5 | 3 | 4 | 5 | 3 | 6 | 5 |
Colic | 4 | 4 | 4 | 2 | 4 | 3 | 2 | 3 | 3 | 2 |
Diabetes | 5 | 7 | 6 | 7 | 5 | 6 | 7 | 5 | 5 | 7 |
Heart-statlog | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
Hepatitis | 2 | 3 | 2 | 3 | 4 | 4 | 3 | 2 | 2 | 3 |
Ionosphere | 3 | 4 | 3 | 5 | 7 | 7 | 3 | 4 | 3 | 5 |
Sonar | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 |
Voting records | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
CYP2C19 isoform | 9 | 10 | 7 | 11 | 7 | 10 | 11 | 9 | 9 | 12 |
Dataset | PC\(^{1}\) | PE\(^{2}\) | MPC\(^{3}\) | I\(^{4}\) | CVM\(^{5}\) | FS\(^{6}\) | FHV\(^{7}\) | APD\(^{8}\) | XBI\(^{9}\) | AIC\(^{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
Breast-cancer | 61.28\(^{5}\) | 63.79\(^{1,3,5,8,9}\) | 61.28\(^{5}\) | 65.18\(^{1,2,3,5,6,8,9}\) | 57.49\(^{\text{--}}\) | 63.79\(^{1,3,5,8,9}\) | 65.18\(^{1,2,3,5,6,8,9}\) | 61.28\(^{5}\) | 61.28\(^{5}\) | 65.18\(^{1,2,3,5,6,8,9}\) |
Breast-Wisconsin | 88.93\(^{\text{--}}\) | 91.45\(^{1,3,5,6,8}\) | 88.93\(^{\text{--}}\) | 92.18\(^{1,3,5,6,8}\) | 88.25\(^{\text{--}}\) | 89.76\(^{1,3,5,8}\) | 92.18\(^{1,3,5,6,8}\) | 88.25\(^{\text{--}}\) | 91.45\(^{1,3,5,6,8}\) | 92.18\(^{1,3,5,6,8}\) |
Colic | 78.03\(^{6,8,9}\) | 78.03\(^{6,8,9}\) | 78.03\(^{6,8,9}\) | 80.72\(^{1,2,3,5,6,8,9}\) | 78.03\(^{6,8,9}\) | 75.69\(^{\text{--}}\) | 80.72\(^{1,2,3,5,6,8,9}\) | 75.69\(^{\text{--}}\) | 75.69\(^{\text{--}}\) | 80.72\(^{1,2,3,5,6,8,9}\) |
Diabetes | 55.39\(^{\text{--}}\) | 62.05\(^{1,3,5,6,7,8,9}\) | 59.16\(^{1,5,8,9}\) | 62.05\(^{1,3,5,6,7,8,9}\) | 55.39\(^{\text{--}}\) | 59.16\(^{1,5,8,9}\) | 62.05\(^{1,3,5,6,7,8,9}\) | 55.39\(^{\text{--}}\) | 55.39\(^{\text{--}}\) | 62.05\(^{1,3,5,6,7,8,9}\) |
Heart-statlog | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) | 87.11\(^{\text{--}}\) |
Hepatitis | 56.78\(^{\text{--}}\) | 60.46\(^{1,3,5,6,8,9}\) | 56.78\(^{\text{--}}\) | 60.46\(^{1,3,5,6,8,9}\) | 58.12\(^{1,3,8,9}\) | 58.12\(^{1,3,8,9}\) | 60.46\(^{1,3,5,6,8,9}\) | 56.78\(^{\text{--}}\) | 56.78\(^{\text{--}}\) | 60.46\(^{1,3,5,6,8,9}\) |
Ionosphere | 78.64\(^{5,6}\) | 80.63\(^{1,3,5,6,7,9}\) | 78.64\(^{5,6}\) | 80.92\(^{1,3,5,6,7,9}\) | 72.07\(^{\text{--}}\) | 72.07\(^{\text{--}}\) | 78.64\(^{5,6}\) | 80.63\(^{1,3,5,6,7,9}\) | 78.64\(^{5,6}\) | 80.92\(^{1,3,5,6,7,9}\) |
Sonar | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 92.12\(^{\text{--}}\) | 93.56\(^{ALL}\) |
Voting records | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) | 89.64\(^{\text{--}}\) |
CYP2C19 isoform | 75.62\(^{3,5}\) | 80.09\(^{1,3,5,8,9}\) | 73.18\(^{\text{--}}\) | 80.98\(^{1,3,5,8,9}\) | 73.18\(^{\text{--}}\) | 80.09\(^{1,3,5,8,9}\) | 80.98\(^{1,3,5,8,9}\) | 75.62\(^{3,5}\) | 75.62\(^{3,5}\) | 83.01\(^{ALL}\) |
Avg. rank | 9.20 | 4.10 | 8.60 | 2.70 | 6.80 | 5.80 | 2.80 | 7.30 | 5.40 | 2.30 |
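
As an illustration of how average ranks and the Friedman statistic are typically obtained from a table of per-dataset scores, consider the sketch below. It assumes that rank 1 goes to the highest score and that tied methods share the average of the ranks they span; this is one common convention and is not guaranteed to reproduce the "Avg. rank" row above. The function names and the toy scores are hypothetical.

```python
def rank_row(scores):
    """Rank the scores of one dataset: 1 = best (highest), ties get the average rank."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # average of ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks

def average_ranks(table):
    """Average rank of each method (column) over all datasets (rows)."""
    per_row = [rank_row(row) for row in table]
    return [sum(r[c] for r in per_row) / len(table) for c in range(len(table[0]))]

def friedman_statistic(table):
    """Friedman chi-square: 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)."""
    n, k = len(table), len(table[0])
    avg = average_ranks(table)
    return 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)

# Toy scores for three methods on two datasets (not taken from the table above).
toy = [[0.9, 0.8, 0.7], [0.9, 0.8, 0.7]]
print(average_ranks(toy), friedman_statistic(toy))
```

With many ties, as in several rows of the accuracy table, the tie-handling rule has a visible effect on the resulting average ranks, so it should be stated alongside the results.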
5.3.2 Experiments with multi-class decomposition
Dataset | PC | PE | MPC | I | CVM | FS | FHV | APD | XBI | AIC |
---|---|---|---|---|---|---|---|---|---|---|
Autos | 2 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 |
Car | 7 | 8 | 7 | 9 | 7 | 8 | 9 | 7 | 7 | 9 |
Cleveland | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Dermatology | 2 | 3 | 2 | 3 | 2 | 2 | 3 | 2 | 2 | 3 |
Ecoli | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Flare | 10 | 8 | 10 | 6 | 8 | 9 | 6 | 10 | 8 | 6 |
Lymphography | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
Segment | 10 | 5 | 10 | 5 | 8 | 8 | 5 | 10 | 10 | 5 |
Vehicle | 7 | 5 | 7 | 5 | 8 | 8 | 5 | 7 | 7 | 4 |
Yeast | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Dataset | PC\(^{1}\) | PE\(^{2}\) | MPC\(^{3}\) | I\(^{4}\) | CVM\(^{5}\) | FS\(^{6}\) | FHV\(^{7}\) | APD\(^{8}\) | XBI\(^{9}\) | AIC\(^{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
Autos | 63.98\(^{\text{--}}\) | 68.12\(^{1,3,5,6,8,9}\) | 63.98\(^{\text{--}}\) | 68.12\(^{1,3,5,6,8,9}\) | 63.98\(^{\text{--}}\) | 63.98\(^{\text{--}}\) | 68.12\(^{1,3,5,6,8,9}\) | 63.98\(^{\text{--}}\) | 63.98\(^{\text{--}}\) | 68.12\(^{1,3,5,6,8,9}\) |
Car | 85.18\(^{\text{--}}\) | 87.82\(^{1,3,5,8,9}\) | 85.18\(^{\text{--}}\) | 90.06\(^{1,2,3,5,6,8,9}\) | 85.18\(^{\text{--}}\) | 87.82\(^{1,3,5,8,9}\) | 90.06\(^{1,2,3,5,6,8,9}\) | 85.18\(^{\text{--}}\) | 85.18\(^{\text{--}}\) | 90.06\(^{1,2,3,5,6,8,9}\) |
Cleveland | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) | 60.01\(^{\text{--}}\) |
Dermatology | 91.07\(^{\text{--}}\) | 94.52\(^{1,3,5,6,8,9}\) | 91.07\(^{\text{--}}\) | 94.52\(^{1,3,5,6,8,9}\) | 91.07\(^{\text{--}}\) | 91.07\(^{\text{--}}\) | 94.52\(^{1,3,5,6,8,9}\) | 91.07\(^{\text{--}}\) | 91.07\(^{\text{--}}\) | 94.52\(^{1,3,5,6,8,9}\) |
Ecoli | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) | 80.02\(^{\text{--}}\) |
Flare | 60.86\(^{\text{--}}\) | 69.02\(^{1,3,6,8}\) | 60.86\(^{\text{--}}\) | 73.28\(^{1,2,3,5,6,8,9}\) | 69.02\(^{1,3,6,8}\) | 60.86\(^{\text{--}}\) | 73.28\(^{1,2,3,5,6,8,9}\) | 60.86\(^{\text{--}}\) | 69.02\(^{1,3,6,8}\) | 73.28\(^{1,2,3,5,6,8,9}\) |
Lymphography | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 75.36\(^{\text{--}}\) | 79.73\(^{ALL}\) |
Segment | 79.17\(^{\text{--}}\) | 92.18\(^{1,3,5,6,8,9}\) | 79.17\(^{\text{--}}\) | 92.18\(^{1,3,5,6,8,9}\) | 84.82\(^{1,3,5,6,8,9}\) | 84.82\(^{1,3,8,9}\) | 92.18\(^{1,3,8,9}\) | 79.17\(^{\text{--}}\) | 79.17\(^{\text{--}}\) | 92.18\(^{1,3,5,6,8,9}\) |
Vehicle | 63.87\(^{\text{--}}\) | 68.02\(^{1,3,5,6,8,9}\) | 63.87\(^{\text{--}}\) | 68.02\(^{1,3,5,6,8,9}\) | 65.06\(^{1,3,8,9}\) | 65.06\(^{1,3,8,9}\) | 68.02\(^{1,3,5,6,8,9}\) | 63.87\(^{\text{--}}\) | 63.87\(^{\text{--}}\) | 69.94\(^{ALL}\) |
Yeast | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) | 60.76\(^{\text{--}}\) |
Avg. rank | 8.80 | 4.50 | 8.10 | 2.80 | 6.00 | 5.40 | 3.00 | 7.90 | 6.60 | 2.20 |
5.3.3 Discussion of results
Hypothesis | p value | Hypothesis | p value | Hypothesis | p value |
---|---|---|---|---|---|
I vs PC | +(0.0087) | FHV vs PC | +(0.0113) | AIC vs PC | +(0.0064) |
I vs PE | +(0.0376) | FHV vs PE | +(0.0407) | AIC vs PE | +(0.0302) |
I vs MPC | +(0.0162) | FHV vs MPC | +(0.0202) | AIC vs MPC | +(0.0126) |
I vs CVM | +(0.0230) | FHV vs CVM | +(0.0308) | AIC vs CVM | +(0.0184) |
I vs FS | +(0.0302) | FHV vs FS | +(0.0394) | AIC vs FS | +(0.0258) |
I vs FHV | =(0.1026) | FHV vs APD | +(0.0122) | AIC vs APD | +(0.0076) |
I vs APD | +(0.0108) | FHV vs XBI | +(0.0371) | AIC vs XBI | +(0.0259) |
I vs XBI | +(0.0339) | FHV vs AIC | −(0.0465) | – | – |
I vs AIC | −(0.0488) | – | – | – | – |