Introduction
Related works
Background knowledge
Locally linear embedding (LLE)
-
Find the \(l\) nearest neighbors of each sample \({x}_{i}\) with D dimensions.
-
Reconstruct each sample as a weighted linear combination of its \(l\) nearest neighbors, obtaining the optimal weights \({W}_{ij}\) by minimizing the following cost. In other words, a weighted neighborhood graph is constructed in this step, and \({W}_{ij}=0\) if \({x}_{j}\) is not a neighbor of \({x}_{i}\) [33]:$$\varepsilon \left(W\right)={\sum }_{i}{\left|{\overrightarrow{x}}_{i}-\sum_{j}^{l}{W}_{ij}{\overrightarrow{x}}_{j}\right|}^{2},\qquad {\sum }_{j}{W}_{ij}=1$$(1)where \(\left\{{x}_{j}\right\}\) are the \(l\) nearest neighbors of \({x}_{i}\).
-
Given the optimal weights \({W}_{ij}\), the new embedded vectors \({y}_{i}\) with d dimensions (d < D) are obtained by minimizing the following equation [32]:$$\Phi \left(y\right)={\sum }_{i}{\left|{\overrightarrow{y}}_{i}-\sum_{j}^{l}{W}_{ij}{\overrightarrow{y}}_{j}\right|}^{2}$$(2)where \(\left\{{y}_{j}\right\}\) are the \(l\) nearest neighbors of \({y}_{i}\).
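The three LLE steps above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name, the regularization constant, and the brute-force neighbor search are all assumptions made for brevity.

```python
import numpy as np

def lle(X, l=8, d=2, reg=1e-3):
    """Minimal LLE sketch (Eqs. 1-2): find l neighbors per sample,
    solve for reconstruction weights W whose rows sum to 1, then
    embed via the bottom eigenvectors of M = (I - W)^T (I - W)."""
    n = X.shape[0]
    # pairwise squared distances -> l nearest neighbors (excluding self)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :l]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors shifted to x_i
        G = Z @ Z.T                           # local Gram matrix
        G += reg * np.trace(G) * np.eye(l)    # regularize for stability
        w = np.linalg.solve(G, np.ones(l))
        W[i, nbrs[i]] = w / w.sum()           # enforce sum_j W_ij = 1 (Eq. 1)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)   # minimizing Eq. 2 = eigenproblem
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                   # skip the constant eigenvector
```

Production code would typically rely on an optimized library routine with a proper neighbor index rather than the dense distance matrix used here.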
Geodesic distance
Latent Space Model (LSM)
-
Initialization The latent space \({Z}_{N\times Q}\) is initialized randomly from a normal distribution with zero mean and unit variance:$${z}_{i}\sim N\left(0,I\right),\quad i=1,\dots ,N$$(5)
-
Inference In this step, the Maximum a Posteriori (MAP) method is used to estimate the hidden space Z under the model. Equation (6) estimates the latent variable \(Z\) by MAP, where X denotes the input network and Z the latent space:$${Z}_{MAP}=\underset{Z}{\mathrm{arg\,max}}\,\mathrm{log}\,p\left(X|Z\right)$$(6)$$=\underset{Z}{\mathrm{arg\,max}}\sum_{i=1}^{N}\mathrm{log}\,p\left({x}_{i}|Z\right)$$$$=\underset{Z}{\mathrm{arg\,min}}\left(-\frac{1}{N}\sum_{i=1}^{N}\mathrm{log}\,p\left({x}_{i}|Z\right)\right)$$To estimate the hidden variable Z by MAP, gradient-descent optimization is run for a fixed number of iterations until it converges to a local optimum, using the gradient [37]:$${\nabla }_{Z}\,\mathrm{log}\,p\left(X,Z\right)$$(7)
-
Estimate connected edges This step is optional and is needed only when analyzing the latent space or estimating connectivity patterns. Since the probability of an edge \({Y}_{ij}\) between two nodes \({x}_{i}\) and \({x}_{j}\) in the network depends on the Euclidean distance between their latent positions \({z}_{i}\) and \({z}_{j}\), the distance \(\left|{z}_{i}-{z}_{j}\right|\) is calculated first. The edge probability \({Y}_{ij}\) is then computed from a Poisson probability distribution:$${Y}_{ij}=\mathrm{Poisson}\left(\frac{1}{\left|{z}_{i}-{z}_{j}\right|}\right)$$(8)
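The LSM steps above (Eqs. 5-7) can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's code: the function name, learning rate, and the explicit Poisson-likelihood gradient are assumptions made for the sketch.

```python
import numpy as np

def lsm_map(X, Q=2, iters=100, lr=0.01, seed=0):
    """Sketch of LSM MAP estimation: initialize latent positions from
    N(0, I) (Eq. 5) and run gradient ascent on the Poisson log-likelihood
    (Eqs. 6-7), where the rate for edge (i, j) is 1/|z_i - z_j| (Eq. 8)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    Z = rng.standard_normal((N, Q))                   # Eq. (5): z ~ N(0, I)
    for _ in range(iters):
        diff = Z[:, None, :] - Z[None, :, :]          # pairwise z_i - z_j
        dist = np.linalg.norm(diff, axis=-1) + 1e-9   # |z_i - z_j|
        np.fill_diagonal(dist, np.inf)                # ignore self-pairs
        # per-pair log-likelihood: x_ij*log(1/d) - 1/d; derivative w.r.t. d
        dll_ddist = -X / dist + 1.0 / dist**2
        grad = (dll_ddist / dist)[:, :, None] * diff  # chain rule to z_i
        Z += lr * grad.sum(axis=1)                    # gradient-ascent step
    return Z
```

Connected nodes (x_ij = 1) are pulled together while unconnected pairs are pushed apart, so the recovered positions reflect the network's connectivity pattern.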
The proposed method
Experimental results and discussion
Data sets
Dataset | Measure Type | # Records | # Dimensions | # Classes |
---|---|---|---|---|
Iris | Euclidean | 150 | 4 | 3 |
Glass | Euclidean | 214 | 9 | 7 |
Ionosphere | Euclidean | 351 | 34 | 2 |
Monk | Euclidean | 122 | 6 | 2 |
Breast Cancer | Euclidean | 569 | 30 | 2 |
Wine | Euclidean | 178 | 13 | 3 |
USPS_nE | Two-way tangent | 250 | 256 | 10 |
USPS_E | Euclidean | 500 | 256 | 10 |
Digit5 | Euclidean | 200 | 64 | 5 |
MNIST | Euclidean | 70,000 | 784 | 10 |
Performance measurement
Evaluation scenarios
Dataset | LLE (k_LLE) | LSM (lsm_dimension) | LSM (lsm_iteration) | K-fold cross validation | Prototype numbers (k_neighbor_manifold)
---|---|---|---|---|---
Iris | 55 | 75 | 100 | 10 | 1,20,75,100,149 | ||
Glass | 10 | 107 | 100 | 10 | 1,10,60,140,213 | ||
Ionosphere | 35 | 200 | 100 | 10 | 1,20,50,100,350 | ||
Monk | 10 | 120 | 100 | 10 | 1,20,60,90,121 | ||
Breast Cancer | 35 | 500 | 100 | 10 | 1,10,20,50,100 | ||
Wine | 15 | 140 | 100 | 10 | 1,10,20,50,80 | ||
USPS_nE | 30 | 125 | 100 | 10 | 1,20,40,60,249 | ||
USPS_E | 10 | 420 | 100 | 10 | 1,20,100,120,150 | ||
Digit5 | 20 | 160 | 100 | 10 | 1,10,20,100,150 | ||
MNIST | 100 | 2000 | 100 | 10 | 1,1000,5000,10,000 |
Results
Evaluating the whole proposed framework
Classifier | #Prototype = 1 | #Prototype = 20 | #Prototype = 75 | #Prototype = 100 | #Prototype = 149
---|---|---|---|---|---
1-nn | 96.00(± 1.2) | 95.33(± 0.8) | 96.00(± 0.2) | 96.00(± 0.8) | 96.00(± 0.7) |
3-nn | 94.00(± 2.1) | 95.33(± 1.1) | 94.00(± 1.2) | 96.67(± 1.6) | 96.00(± 1.4) |
5-nn | 95.33(± 2.3) | 95.33(± 1.3) | 93.33(± 2.4) | 96.67(± 1.4) | 95.33(± 1.2) |
7-nn | 91.33(± 3.3) | 96.00(± 1.1) | 92.67(± 3.2) | 96.00(± 1.9) | 97.33(± 0.5) |
11-nn | 94.00(± 2.5) | 96.00(± 0.9) | 94.67(± 2.1) | 96.67(± 1.3) | 96.67(± 1.2) |
linSVM | 94.67(± 1.1) | 98.00(± 0.2) | 95.33(± 2.3) | 95.33(± 2.3) | 95.33(± 2.3) |
polySVM(2) | 94.67(± 1.3) | 97.33(± 0.8) | 95.33(± 1.2) | 94.67(± 3.1) | 95.33(± 1.2) |
polySVM(3) | 94.67(± 2.3) | 97.33(± 1.5) | 95.33(± 1.4) | 94.00(± 3.2) | 94.67(± 1.3) |
Average | 94.33(± 1.38) | 96.33(± 1.07) | 94.58(± 1.15) | 95.75(± 1.0) | 95.83(± 0.85) |
Classifier | #Prototype = 1 | #Prototype = 20 | #Prototype = 75 | #Prototype = 100 | #Prototype = 149
---|---|---|---|---|---
1-nn | 0.10 | 0.09 | 0.09 | 0.09 | 0.11 |
3-nn | 0.10 | 0.09 | 0.11 | 0.09 | 0.11 |
5-nn | 0.11 | 0.09 | 0.10 | 0.09 | 0.13 |
7-nn | 0.10 | 0.09 | 0.09 | 0.09 | 0.09 |
11-nn | 0.12 | 0.09 | 0.09 | 0.08 | 0.09 |
linSVM | 0.01 | 0.01 | 0.00 | 0.01 | 0.01 |
polySVM(2) | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |
polySVM(3) | 0.01 | 0.00 | 0.00 | 0.01 | 0.01 |
Average | 0.10 | 0.09 | 0.09 | 0.09 | 0.11 |
Comparing with the basic dissimilarity spaces
Dataset | Proposed Algorithm | DS_All | DS_Random | DS_RandomC | Dataset |
---|---|---|---|---|---|
Iris | |||||
Best (20,linSVM) | 98.00(± 0.2) | 96.00(± 2.6) | 96.67(± 2.7) | 96.67(± 2.7) | 97.33(± 0.05) |
Average | 96.33(± 1.7) | 95.92(± 2.7) | 96.17(± 2.8) | 95.83(± 3.2) | 95.92(± 0.04) |
Glass | |||||
Best(60,1-NN) | 77.57(± 2.1) | 73.83(± 6.2) | 72.90(± 6.1) | 72.43(± 5.3) | 69.16(± 5.8) |
Average | 71.61(± 2.4) | 70.09(± 5.6) | 69.68(± 5.8) | 70.15(± 5.6) | 67.00(± 7.5) |
Ionosphere | |||||
Best(50,5-NN) | 95.73(± 1.1) | 92.59(± 2.8) | 92.59(± 2.2) | 91.74(± 2.6) | 84.33(± 3.6) |
Average | 93.38(± 1.5) | 93.36(± 2.1) | 93.02(± 2.4) | 92.45(± 3.6) | 82.59(± 3.8) |
Monk | |||||
Best(60,polySVM2) | 92.62(± 0.4) | 90.98(± 1.2) | 90.98(± 1.2) | 90.98(± 1.3) | 90.80(± 1.3) |
Average | 89.55(± 0.9) | 89.14(± 0.4) | 87.86(± 1.1) | 89.14(± 0.3) | 82.07(± 0.8) |
BreastCancer | |||||
Best(50,11-NN) | 94.20(± 0.8) | 92.27(± 3.2) | 91.56(± 3.9) | 92.27(± 2.8) | 96.49(± 0.5) |
Average | 91.83(± 0.7) | 91.55(± 0.4) | 87.81(± 1.8) | 89.26(± 1.2) | 89.98(± 1.5) |
Wine | |||||
Best(10,linSVM) | 74.72(± 1.2) | 80.34(± 2.1) | 73.03(± 3.5) | 74.72(± 4.2) | 96.07(± 0.5) |
Average | 74.37(± 1.8) | 72.89(± 3.2) | 65.17(± 6.1) | 67.98(± 4.5) | 96.35(± 0.4) |
USPS_nE | |||||
Best(60,linSVM) | 96.80(± 0.2) | 94.40(± 0.01) | 88.00(± 2.3) | 87.60(± 3.1) | 87.60(± 2.4) |
Average | 93.55(± 1.5) | 72.60(± 3.2) | 70.50(± 3.5) | 70.45(± 3.1) | 82.55(± 2.4) |
USPS_E | |||||
Best(120,linSVM) | 91.40(± 1.0) | 89.80(± 0.8) | 90.40(± 0.4) | 89.00(± 0.7) | 90.40(± 0.7) |
Average | 88.03(± 0.1) | 88.05(± 0.9) | 87.05(± 0.7) | 86.10(± 1.4) | 85.80(± 0.4) |
Digit5 | |||||
Best(10,linSVM) | 99.50(± 0.1) | 99.50(± 0.07) | 97.50(± 0.7) | 96.50(± 0.8) | 99.50(± 0.00) |
Average | 99.19(± 0.1) | 99.25(± 0.9) | 96.50(± 1.1) | 96.75(± 0.7) | 98.69(± 0.4) |
MNIST | |||||
Best(5000,1-NN) | 94.05(± 3.5) | 96.67(± 0.8) | 95.67(± 0.4) | 83.33(± 2.1) | 97.30(± 0.2) |
Average | 90.46(± 4.6) | 94.21(± 0.9) | 94.25(± 0.5) | 82.50(± 2.3) | 93.29(± 1.01) |
Dataset | Proposed Algorithm | DS_All | DS_Random | DS_RandomC | Dataset |
---|---|---|---|---|---|
Iris | |||||
Best (20,linSVM) | 0.98(± 0.03) | 0.95(0.03) | 0.96(0.02) | 0.96(0.02) | 0.93(± 0.03) |
Average | 0.95(± 0.03) | 0.88(± 0.04) | 0.88(± 0.03) | 0.88(± 0.03) | 0.94(± 0.02) |
Glass | |||||
Best(60,1-NN) | 0.56(± 0.32) | 0.56(± 0.3) | 0.56(± 0.3) | 0.54(± 0.3) | 0.53(± 0.2) |
Average | 0.50(± 0.3) | 0.49(± 0.3) | 0.46(± 0.3) | 0.46(± 0.3) | 0.45(± 0.3) |
Ionosphere | |||||
Best(50,5-NN) | 0.94(± 0.03) | 0.91(± 0.03) | 0.91(± 0.03) | 0.90(± 0.04) | 0.82(± 0.1) |
Average | 0.92(± 0.03) | 0.92(± 0.03) | 0.91(± 0.03) | 0.90(± 0.03) | 0.83(± 0.09) |
Monk | |||||
Best(60,polySVM2) | 0.89(± 0.00) | 0.88(± 0.00) | 0.85(± 0.01) | 0.93(± 0.00) | 0.92(± 0.00) |
Average | 0.87(± 0.01) | 0.88(± 0.00) | 0.88(± 0.00) | 0.89(± 0.00) | 0.81(± 0.00) |
BreastCancer | |||||
Best(50,11-NN) | 0.91(± 0.03) | 0.91(± 0.03) | 0.90(± 0.03) | 0.90(± 0.03) | 0.90(± 0.03) |
Average | 0.8(± 0.08) | 0.82(± 0.08) | 0.85(± 0.07) | 0.83(± 0.07) | 0.86(± 0.06) |
Wine | |||||
Best(10,linSVM) | 0.69(± 0.17) | 0.80(± 0.05) | 0.70(± 0.13) | 0.69(± 0.12) | 0.96(± 0.02) |
Average | 0.6(± 0.2) | 0.62(± 0.17) | 0.63(± 0.20) | 0.61(± 0.20) | 0.81(± 0.02) |
USPS_nE | |||||
Best(60,linSVM) | 0.89(± 0.17) | 0.85(± 0.2) | 0.75(± 0.2) | 0.78(± 0.2) | 0.79(± 0.19) |
Average | 0.83(± 0.2) | 0.85(± 0.19) | 0.80(± 0.24) | 0.81(± 0.19) | 0.76(± 0.18) |
USPS_E | |||||
Best(120,linSVM) | 0.86(± 0.09) | 0.89(± 0.06) | 0.85(± 0.09) | 0.83(± 0.09) | 0.85(± 0.09) |
Average | 0.78(± 0.11) | 0.73(± 0.08) | 0.81(± 0.11) | 0.80(± 0.12) | 0.83(± 0.08) |
Digit5 | |||||
Best(10,linSVM) | 0.98(± 0.01) | 0.98(± 0.01) | 0.95(± 0.01) | 0.94(± 0.05) | 0.98(± 0.01) |
Average | 0.98(± 0.01) | 0.92(± 0.02) | 0.95(± 0.02) | 0.95(± 0.03) | 0.98(± 0.01) |
MNIST | |||||
Best(5000,1-NN) | 0.85(± 0.02) | 0.90(± 0.02) | 0.89(± 0.03) | 0.92(± 0.02) | 0.98(± 0.00) |
Average | 0.83(± 0.02) | 0.91(± 0.01) | 0.90(± 0.02) | 0.91(± 0.01) | 0.97(± 0.01) |
Evaluating the LSM phase
Dataset | Proposed Algorithm: Accuracy(± std) | Proposed Algorithm: \({f}_{1}\) | Without LSM: Accuracy(± std) | Without LSM: \({f}_{1}\)
---|---|---|---|---
Iris | ||||
Best (20,linSVM) | 98.00(± 0.2) | 0.98(± 0.03) | 98.00(± 0.0) | 0.96(± 0.03) |
Average | 96.33(± 1.7) | 0.95(± 0.03) | 96.67(± 0.1) | 0.96(± 0.03) |
Glass | ||||
Best(60,1-NN) | 77.57(± 2.1) | 0.56(± 0.32) | 76.64(± 1.5) | 0.50(± 0.30) |
Average | 71.61(± 2.4) | 0.50(± 0.3) | 71.85(± 2.5) | 0.42(± 0.60) |
Ionosphere | ||||
Best(50,5-NN) | 95.73(± 1.1) | 0.94(± 0.03) | 91.45(± 1.3) | 0.94(± 0.07) |
Average | 93.38(± 1.5) | 0.92(± 0.03) | 93.16(± 0.2) | 0.83(± 0.07) |
Monk | ||||
Best(60,polySVM2) | 92.62(± 0.4) | 0.89(± 0.00) | 90.98(± 1.1) | 0.87(± 0.00) |
Average | 89.55(± 0.9) | 0.87(± 0.01) | 88.52(± 0.7) | 0.86(± 0.01) |
BreastCancer | ||||
Best(50,11-NN) | 94.20(± 0.8) | 0.91(± 0.03) | 94.20(± 0.1) | 0.91(± 0.03) |
Average | 91.83(± 0.7) | 0.8(± 0.08) | 91.40(± 0.6) | 0.73(± 0.09) |
Wine | ||||
Best(10,linSVM) | 74.72(± 1.2) | 0.69(± 0.17) | 75.28(± 0.5) | 0.68(± 0.18) |
Average | 74.37(± 1.8) | 0.6(± 0.2) | 73.53(± 1.1) | 0.5(± 0.2) |
USPS_nE | ||||
Best(60,linSVM) | 96.80(± 0.2) | 0.89(± 0.17) | 94.40(± 0.9) | 0.90(± 0.11) |
Average | 93.55(± 1.5) | 0.83(± 0.2) | 74.10(± 2.3) | 0.85(± 0.17) |
USPS_E | ||||
Best(120,linSVM) | 91.40(± 1.0) | 0.86(± 0.09) | 90.60(± 0.3) | 0.80(± 0.12) |
Average | 88.03(± 0.1) | 0.78(± 0.11) | 87.25(± 0.2) | 0.79(± 0.14) |
Digit5 | ||||
Best(10,linSVM) | 99.50(± 0.1) | 0.98(± 0.01) | 99.50(± 0.0) | 0.97(± 0.01) |
Average | 99.19(± 0.1) | 0.98(± 0.01) | 99.06(± 0.1) | 0.98(± 0.01) |
MNIST | ||||
Best(5000,1-NN) | 94.05(± 3.5) | 0.85(± 0.02) | 88.88(± 5.03) | 0.75(± 0.03) |
Average | 90.46(± 4.6) | 0.83(± 0.02) | 85.16(± 4.7) | 0.73(± 0.02) |
Dataset | Proposed Algorithm | Proposed Algorithm without LSM |
---|---|---|
Iris | ||
Best (20,linSVM) | 0.01 | 0.05 |
Average | 0.06 | 0.07 |
Glass | ||
Best(60,1-NN) | 0.10 | 0.13 |
Average | 0.10 | 0.13 |
Ionosphere | ||
Best(50,5-NN) | 0.15 | 0.17 |
Average | 0.13 | 0.17 |
Monk | ||
Best(60,polySVM2) | 0.02 | 0.09 |
Average | 0.08 | 0.08 |
BreastCancer | ||
Best(50,11-NN) | 0.40 | 0.45 |
Average | 0.47 | 0.53 |
Wine | ||
Best(10,linSVM) | 0.10 | 0.15 |
Average | 0.11 | 0.10 |
USPS_nE | ||
Best(60,linSVM) | 0.08 | 0.15 |
Average | 0.10 | 0.18 |
USPS_E | ||
Best(120,linSVM) | 0.88 | 1.09 |
Average | 0.51 | 0.60 |
Digit5 | ||
Best(10,linSVM) | 0.06 | 0.08 |
Average | 0.09 | 0.10 |
MNIST | ||
Best(5000,1-NN) | 10,000 | 12,000 |
Average | 11,200 | 13,300 |
Comparing with the recent approaches
Methods | Iris | Ionosphere | USPS |
---|---|---|---|
Proposed method | 98(± 0.2)(linSVM) | 95.73(± 1.1)(5-NN) | 96.80(± 1.0)(linSVM) |
Calvo et al. [10] | – | – | 86.70(1-NN) |
Simovici et al. [46] | 94.00(7-NN) | 61.00(3-NN) | – |
Wang et al. [24] | – | – | 85.67(linSVM) |
Conclusions
-
Creating the prototype set P dynamically and non-uniformly across samples, so that each sample need not be compared with every prototype in P but only with its k nearest neighbors on the manifold, before the dissimilarity space is created. This substantially reduces the computational complexity.
-
Compared with the variant without LSM, the complete proposed framework requires less classification runtime, which indicates the positive effect of the LSM model in reducing the complexity of the augmented dissimilarity space.
-
The proposed framework achieves higher accuracy than recent comparable approaches.
-
In most scenarios, the linear SVM classifier achieves the highest accuracy on the proposed representation.
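The neighbor-restricted dissimilarity space described in the first conclusion can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's procedure: it uses plain Euclidean distance and a brute-force search, whereas the paper selects neighbors on the manifold.

```python
import numpy as np

def knn_dissimilarity_space(X, prototypes, k=5):
    """Toy sketch: each row of the dissimilarity matrix keeps distances
    only to the sample's k nearest prototypes; all other entries stay 0.
    (Euclidean distance here; the paper uses neighbors on the manifold.)"""
    n, p = X.shape[0], prototypes.shape[0]
    D = np.zeros((n, p))
    for i, x in enumerate(X):
        dists = np.linalg.norm(prototypes - x, axis=1)  # to every prototype
        nn = np.argsort(dists)[:k]                      # k nearest prototypes
        D[i, nn] = dists[nn]                            # sparse row: k entries
    return D
```

Each sample's representation is thus a sparse vector with at most k nonzero dissimilarities, which is what keeps the downstream classification cheap.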