1 Introduction
- We propose an SGD-based solution for finding the domain of novelty and perform a rigorous convergence analysis for it. By contrast, the works of [1, 2, 11, 15, 23] used a Sequential Minimal Optimization (SMO)-based approach [17] to find the domain of novelty, which has over-quadratic computational complexity and requires loading the entire Gram matrix into main memory.
- We propose a new clustering assignment strategy that reduces the clustering assignment for the N samples of the entire training set to the same task for M equilibrium points, where M is usually very small compared with N.
- Compared with the conference version [16], this paper presents a more rigorous convergence analysis with full proofs and explanations. In addition, it introduces a new strategy for clustering assignment. Regarding the experiments, it compares against more baselines and reports more experimental results.
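To give a sense of why holding the entire Gram matrix in main memory becomes a bottleneck, the following rough sketch estimates its size for a given number of samples N. It assumes dense storage with 8-byte floating-point entries, which is an assumption and not something stated in this paper; the function name `gram_matrix_bytes` is hypothetical.

```python
def gram_matrix_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for a dense n_samples x n_samples Gram matrix.

    Assumes every kernel value is stored (no symmetry or sparsity tricks)
    and that each entry takes bytes_per_entry bytes (8 = float64).
    """
    return n_samples * n_samples * bytes_per_entry


# Sample counts of the smallest and largest datasets used in the experiments.
for name, n in [("Iris", 150), ("Shuttle", 43_500)]:
    gib = gram_matrix_bytes(n) / 2**30
    print(f"{name}: N = {n}, dense Gram matrix ~ {gib:.4f} GiB")
```

Under these assumptions the memory grows quadratically in N, so the largest dataset considered here would already need on the order of 14 GiB for the Gram matrix alone.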
2 Stochastic gradient descent large margin one-class support vector machine
2.1 Large margin one-class support vector machine
2.2 SGD-based Solution in the primal form
2.3 Convergence analysis
3 Clustering assignment
4 Experiments
4.1 Visual experiment
4.2 Experiment on real datasets
4.2.1 Clustering validity index
Datasets | Size | Dimension | #Classes |
---|---|---|---|
Aggregation | 788 | 2 | 7 |
Breast cancer | 699 | 9 | 2 |
Compound | 399 | 2 | 6 |
D31 | 3100 | 2 | 31 |
Flame | 240 | 2 | 2 |
Glass | 214 | 9 | 7 |
Iris | 150 | 4 | 3 |
Jain | 373 | 2 | 2 |
Pathbased | 300 | 2 | 3 |
R15 | 600 | 2 | 15 |
Spiral | 312 | 2 | 3 |
Abalone | 4177 | 8 | 28 |
Car | 1728 | 6 | 4 |
Musk | 6598 | 198 | 2 |
Shuttle | 43,500 | 9 | 5 |
Datasets | Purity (SVC) | Purity (SGD) | Purity (FSVC) | Rand index (SVC) | Rand index (SGD) | Rand index (FSVC) | NMI (SVC) | NMI (SGD) | NMI (FSVC) |
---|---|---|---|---|---|---|---|---|---|
Aggregation | 1.00 | 1.00 | 0.22 | 1.00 | 1.00 | 0.22 | 0.69 | 0.75 | 0.60 |
Breast cancer | 0.98 | 0.99 | 0.99 | 0.82 | 0.85 | 0.81 | 0.22 | 0.55 | 0.45 |
Compound | 0.66 | 0.62 | 0.13 | 0.92 | 0.88 | 0.25 | 0.51 | 0.81 | 0.45 |
Flame | 0.86 | 0.87 | 0.03 | 0.75 | 0.76 | 0.03 | 0.55 | 0.51 | 0.05 |
Glass | 0.5 | 0.71 | 0.65 | 0.77 | 0.91 | 0.54 | 0.60 | 0.44 | 0.53 |
Iris | 1.00 | 1.00 | 0.68 | 0.97 | 0.96 | 0.69 | 0.63 | 0.75 | 0.71 |
Jain | 0.37 | 0.46 | 0.69 | 0.7 | 0.71 | 0.77 | 0.53 | 0.31 | 1.00 |
Pathbased | 0.6 | 0.5 | 1.00 | 0.81 | 0.94 | 1.00 | 0.48 | 0.43 | 0.12 |
R15 | 0.88 | 0.9 | 0.37 | 0.74 | 0.71 | 0.37 | 0.67 | 0.77 | 0.77 |
Spiral | 0.09 | 0.33 | 0.53 | 0.15 | 0.94 | 0.75 | 0.52 | 0.34 | 0.16 |
D31 | 0.94 | 0.99 | 0.42 | 0.88 | 0.81 | 0.54 | 0.45 | 0.50 | 0.38 |
Abalone | 0.22 | 0.44 | 0.03 | 0.43 | 0.86 | 0.12 | 0.22 | 0.34 | 0.07 |
Car | 0.94 | 0.95 | 0.70 | 0.46 | 0.46 | 0.54 | 0.32 | 0.32 | 0.24 |
Musk | 0.87 | 0.68 | 0.88 | 0.26 | 0.28 | 0.26 | 0.21 | 0.16 | 0.23 |
Shuttle | 0.06 | 0.05 | 0.06 | 0.84 | 0.83 | 0.75 | 0.26 | 0.41 | 0.50 |
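As an illustration of two of the validity indices reported above, the sketch below computes purity and the Rand index from a ground-truth labeling and a cluster labeling. It follows the common textbook definitions, which may differ in detail from the exact variants used in the experiments; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of sample pairs on which the two labelings agree.

    A pair agrees when it is grouped together in both labelings or
    separated in both. This loop is quadratic in the number of samples.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy labels for illustration only (not taken from the experiments).
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(purity(truth, found), rand_index(truth, found))
```

Both indices lie in [0, 1], with larger values indicating closer agreement with the ground-truth classes.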
4.2.2 Baselines
- Fast support vector clustering (FSVC) [8]: an equilibrium-based approach for clustering assignment.
Datasets | Compactness (SVC) | Compactness (SGD) | Compactness (FSVC) | DB index (SVC) | DB index (SGD) | DB index (FSVC) |
---|---|---|---|---|---|---|
Aggregation | 0.29 | 0.29 | 2.84 | 0.68 | 0.67 | 0.63 |
Breast cancer | 1.26 | 0.68 | 0.71 | 1.58 | 1.38 | 0.53 |
Compound | 0.5 | 0.21 | 2.43 | 2.45 | 0.86 | 0.67 |
Flame | 0.58 | 0.44 | 2.28 | 1.3 | 0.65 | 3.56 |
Glass | 0.72 | 0.68 | 1.85 | 0.53 | 0.56 | 0.93 |
Iris | 0.98 | 0.25 | 0.99 | 1.95 | 1.17 | 0.77 |
Jain | 0.96 | 0.36 | 1.16 | 1.23 | 1.08 | 0.71 |
Pathbased | 0.18 | 0.3 | 1.04 | 0.36 | 0.73 | 1.07 |
R15 | 0.61 | 0.13 | 1.84 | 2.96 | 1.42 | 1.37 |
Spiral | 2 | 0.17 | 0.18 | 1.41 | 0.98 | 0.36 |
D31 | 1.41 | 0.26 | 1.78 | 2.33 | 1.35 | 1.21 |
Abalone | 3.88 | 0.40 | 4.97 | 3.78 | 3.91 | 1.29 |
Car | 0.75 | 0.74 | 14.68 | 1.76 | 1.76 | 1.57 |
Musk | 9.89 | 30.05 | 20.00 | 2.27 | 2.83 | 0.01 |
Shuttle | 0.50 | 0.46 | 0.26 | 1.86 | 1.84 | 1.32 |
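For readers unfamiliar with the DB index, the following minimal sketch computes the Davies-Bouldin index in its standard form: for each cluster, take the worst ratio of summed within-cluster scatter to centroid distance against any other cluster, then average over clusters. It assumes Euclidean distance and at least two clusters with distinct centroids; whether the experiments use exactly this variant is an assumption, and the function names and toy data are hypothetical.

```python
import math


def centroid(rows):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def davies_bouldin(points, labels):
    """Standard Davies-Bouldin index (lower means better-separated clusters)."""
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)

    centers = {k: centroid(rows) for k, rows in groups.items()}
    # Scatter: mean Euclidean distance from a cluster's points to its centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in rows) / len(rows)
        for k, rows in groups.items()
    }

    keys = list(groups)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(worst)


# Toy data for illustration only: two tight, well-separated clusters.
pts = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
lab = [0, 0, 1, 1]
print(davies_bouldin(pts, lab))
```

Under this definition a smaller value indicates clusters that are compact relative to the distance between them.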
Datasets | Training time (SVC) | Training time (SGD) | Training time (FSVC) | Clustering time (SVC) | Clustering time (SGD) | Clustering time (FSVC) |
---|---|---|---|---|---|---|
Aggregation | 0.05 | 0.03 | 0.05 | 31.42 | 2.83 | 7.51 |
Breast cancer | 0.18 | 0.02 | 0.05 | 19.80 | 2.14 | 22.86 |
Compound | 0.03 | 0.02 | 0.10 | 6.82 | 1.17 | 7.24 |
Flame | 0.02 | 0.02 | 15.16 | 1.81 | 0.67 | 4.31 |
Glass | 0.03 | 0.03 | 0.02 | 2.30 | 0.53 | 10.67 |
Iris | 0.02 | 0.02 | 0.04 | 1.03 | 0.34 | 4.33 |
Jain | 0.02 | 0.02 | 0.03 | 5.80 | 0.81 | 4.59 |
Pathbased | 0.02 | 0.02 | 0.05 | 4.02 | 0.54 | 4.22 |
R15 | 0.02 | 0.02 | 0.02 | 4.14 | 3.68 | 10.43 |
Spiral | 0.02 | 0.03 | 0.02 | 1.60 | 0.99 | 7.78 |
D31 | 0.17 | 0.09 | 0.09 | 467.72 | 6.56 | 33.08 |
Abalone | 2.26 | 0.81 | 10.94 | 653.65 | 26.58 | 242.97 |
Car | 5.62 | 0.64 | 8.15 | 67.66 | 7.05 | 84.47 |
Musk | 55.93 | 5.79 | 58.49 | 602.09 | 432.58 | 510.25 |
Shuttle | 10.03 | 0.46 | 68.43 | 1,972.61 | 925 | 1,125.46 |