1 Introduction
2 Related research
3 EM algorithm for Gaussian mixture models
-
E-Step: Given the set of mixture parameters \(\Theta ^{(j-1)}\) from the previous iteration, for each \(m=1,\ldots ,K\) and \(i=1,\ldots ,N\), the posterior probability that a feature vector \({\mathbf{x}}^i\) was generated from the \(m\)-th component is calculated as:$$\begin{aligned} h_m^{(j)}({\mathbf{x}}^i)=\frac{\alpha _m^{(j-1)} p_m({\mathbf{x}}^i|\theta _m^{(j-1)})}{\sum _{k=1}^K \alpha _k^{(j-1)} p_k({\mathbf{x}}^i|\theta _k^{(j-1)})}, \end{aligned}$$(8)where \(\theta _m^{(j-1)}\) and \(\theta _k^{(j-1)}\) denote parameters of components \(m\) and \(k\), in the iteration \(j-1\), respectively.
-
M-Step: Given the posterior probabilities \(h_m^{(j)}({\mathbf{x}}^i)\), the set of parameters \(\Theta ^{(j)}\) is calculated as:$$\begin{aligned} \alpha _m^{(j)}=\frac{1}{N}\sum _{i=1}^{N}h_m^{(j)}({\mathbf{x}}^i) \end{aligned}$$(9)$$\begin{aligned} {\mathbf{\mu }}_m^{(j)}=\frac{\sum _{i=1}^{N}h_m^{(j)}({\mathbf{x}}^i)*{\mathbf{x}}^i}{\sum _{i=1}^{N}h_m^{(j)}({\mathbf{x}}^i)} \end{aligned}$$(10)$$\begin{aligned} \Sigma _m^{(j)}=\frac{\sum _{i=1}^{N}h_m^{(j)}({\mathbf{x}}^i)({\mathbf{x}}^i-{\mathbf{\mu }}_m^{(j)})({\mathbf{x}}^i-{\mathbf{\mu }}_m^{(j)})^T}{\sum _{i=1}^{N}h_m^{(j)}({\mathbf{x}}^i)}. \end{aligned}$$(11)
4 Description of the proposed rnd-maxmin method
4.1 Component means
4.2 Component covariances
-
The relation of the largest eigenvalue to the smallest eigenvalue cannot be greater than 10.
-
The sum of all eigenvalues should be equal \(\frac{1}{10 d K}\mathrm {tr}(\Sigma )\), where \(\mathrm {tr}\) is the trace of a matrix and \(\Sigma\) is the covariance matrix of the learning set \(X\).
4.3 Influence of the proposed method on computational complexity of the MREM algorithm
5 Experimental results
5.1 The algorithms
5.2 Hardware and software
5.3 Synthetic datasets
Method |
\(\bar{\omega }\)
| 0.001 | 0.01 | 0.05 | ||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\check{\omega }\)
| 0.05 | 0.1 | 0.15 | 0.25 | 0.4 | 0.6 | 0.45 | 0.6 | 0.75 | |
rnd-maxmin
|
\(\epsilon _A\)
|
1.16
|
1.72
|
1.13
|
2.71
|
2.96
|
2.74
|
10.72
| 10.80 | 10.91 |
\(p_\epsilon\)
| 5.027e-05 | 0.0001269 | 3.312e-05 | 0.004019 | 0.001762 | 0.1156 | 0.4064 | 0.3956 | 0.8545 | |
\(L\)
|
15010.5
|
18642.2
|
21519
|
8404.22
|
8916.42
|
10697.9
|
1891.88
|
2231.61
|
2644.72
| |
\(p_{L}\)
| 1.911e-09 | 1.425e-09 | 9.802e-09 | 0.0002352 | 1.323e-06 | 3.454e-05 | 0.475 | 0.4064 | 0.2202 | |
rnd-kmeans
|
\(\epsilon _A\)
| 6.04 | 8.32 | 7.96 | 7.77 | 6.83 | 7.08 | 15.01 | 13.48 | 13.80 |
\(L\)
| 14881.6 | 18376.1 | 21231.3 | 8382.19 | 8893.78 | 10669.4 | 1890.34 | 2229.95 | 2643.59 | |
rnd-nearest
|
\(\epsilon _A\)
| 5.17 | 6.47 | 7.31 | 7.09 | 6.15 | 5.94 | 11.71 | 12.07 |
10.41
|
\(L\)
| 14780 | 18277.5 | 21037.4 | 8352.84 | 8870.62 | 10646.3 | 1888.96 | 2229.25 | 2641.7 | |
rnd-spherical
|
\(\epsilon _A\)
| 2.65 | 3.78 | 3.34 | 4.05 | 4.27 | 3.59 | 12.66 |
10.02
| 10.54 |
\(L\)
| 14869.8 | 18427.7 | 21298.9 | 8386.96 | 8893.28 | 10671 | 1891.29 | 2230.63 | 2643.49 | |
Optimal ARI | 0.9886 | 0.9863 | 0.9879 | 0.8959 | 0.8860 | 0.8976 | 0.6870 | 0.6489 | 0.6591 |
Method |
\(\bar{\omega }\)
| 0.001 | 0.01 | 0.05 | ||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\check{\omega }\)
| 0.02 | 0.03 | 0.15 | 0.12 | 0.45 | 0.8 | 0.25 | 0.5 | 0.75 | |
rnd-maxmin
|
\(\epsilon _A\)
|
0.25
|
0.37
|
0.46
|
1.52
|
2.69
|
18.66
|
9.75
|
9.16
|
17.94
|
\(p_\epsilon\)
| 4.341e-07 | 1.394e-07 | 8.949e-09 | 0.002767 | 0.04782 | 1.513e-09 | 0.186 | 0.09116 | 0.0001604 | |
\(L\)
|
14586.5
|
16150.1
|
26070.9
|
3877.54
|
6143.61
|
793.217
|
−6186.06
| −5296.48 |
−9373.68
| |
\(p_{L}\)
| 2.683e-07 | 2.515e-08 | 2.035e-09 | 8.581e-06 | 0.206 | 9.347e-10 | 0.6022 | 0.8849 | 1.631e-08 | |
rnd-kmeans
|
\(\epsilon _A\)
| 3.92 | 3.55 | 5.03 | 5.13 | 5.75 | 14.06 | 12.14 | 12.35 | 16.98 |
\(L\)
| 14498.9 | 16070.7 | 25914.3 | 3851.96 | 6132.82 | 1089.46 | −6187.72 | −5298.09 | −9348.99 | |
rnd-nearest
|
\(\epsilon _A\)
| 2.32 | 3.28 | 3.57 | 2.56 | 4.31 | 10.22 | 10.96 | 10.01 |
14.24
|
\(L\)
| 14482.2 | 16026.7 | 25867.6 | 3857.99 | 6131.11 |
1237.43
| −6193.37 | −5299.95 | −9340.61 | |
rnd-spherical
|
\(\epsilon _A\)
| 1.84 | 2.42 | 3.12 | 2.36 | 3.37 |
9.88
| 10.40 | 9.83 | 14.41 |
\(L\)
| 14504.2 | 16051.9 | 25902.8 | 3859.84 | 6142.69 | 1208.65 | −6190.5 |
−5296.36
|
−9340.41
| |
Optimal ARI | 0.9872 | 0.9870 | 0.9881 | 0.8929 | 0.8980 | 0.9108 | 0.6622 | 0.6734 | 0.6659 |
Method |
\(\bar{\omega }\)
| 0.001 | 0.01 | 0.05 | ||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\check{\omega }\)
| 0.02 | 0.03 | 0.15 | 0.12 | 0.45 | 0.8 | 0.25 | 0.5 | 0.75 | |
rnd-maxmin
|
\(\epsilon _A\)
|
0.24
|
0.79
|
1.20
|
2.23
|
2.75
|
16.80
|
13.08
|
12.99
|
24.91
|
\(p_\epsilon\)
| 1.52e-08 | 3.005e-07 | 0.0001484 | 2.813e-07 | 5.459e-05 | 2.064e-07 | 0.02052 | 9.622e-05 | 4.328e-06 | |
\(L\)
|
8784.88
|
14524.5
|
23539.2
|
−11192.3
|
−10122.5
|
−16748.2
| −30004.1 |
−29665.3
|
−37536.3
| |
\(p_{L}\)
| 1.909e-09 | 4.005e-09 | 0.0009293 | 1.631e-08 | 5.928e-07 | 9.347e-10 | 0.7998 | 0.3667 | 1.705e-09 | |
rnd-kmeans
|
\(\epsilon _A\)
| 3.98 | 4.99 | 4.74 | 7.58 | 6.18 | 11.48 | 17.04 | 17.51 | 21.94 |
\(L\)
| 8670.12 | 14395.2 | 23484.2 | −11244 | −10156.9 |
−16379.2
|
−30003.1
| −29668.6 |
−37458
| |
rnd-nearest
|
\(\epsilon _A\)
| 2.04 | 3.20 | 2.98 | 4.69 | 3.98 | 10.44 | 16.00 | 15.97 |
19.81
|
\(L\)
| 8645.98 | 14380.6 | 23435 | −11252.3 | −10176.8 | −16469.9 | −30055.6 | −29722.6 | −37513.3 | |
rnd-spherical
|
\(\epsilon _A\)
| 2.28 | 3.17 | 3.47 | 4.23 | 3.91 |
9.30
| 14.04 | 14.94 | 20.60 |
\(L\)
| 8664.25 | 14387.3 | 23428.8 | −11232.5 | −10156 | −16422.1 | −30027.1 | −29691.3 | −37488 | |
Optimal ARI | 0.9879 | 0.9875 | 0.9905 | 0.9069 | 0.9001 | 0.9082 | 0.6750 | 0.6721 | 0.6885 |
-
rnd-maxmin dominates the others with respect to ARI and log likelihood when the average overlap between components is very low (\(\bar{\omega }=0.001\)) regardless of the value of \(d\) and \(\check{\omega }\).
-
For situations with low average overlap (\(\bar{\omega }=0.01\)), our method also dominates the others if the maximum overlap between component is not very high. However, the experiments identified the clear weakness of our approach: it usually yields results worse than the other methods (and sometimes by a large margin) if the maximum overlap between components is very high. We conjecture, that in these cases our strategy of locating component means far from already initialized components fails to place two means in close vicinity.
-
A similar trend is repeated for \(\bar{\omega }=0.05\).
-
The performance trends with respect to ARI and log likelihood are not identical (although they are very similar for \(\bar{\omega }=0.001\)).
5.4 Real datasets
Method |
\(\bar{\omega }=0.001\)
|
\(\bar{\omega }=0.01\)
|
\(\bar{\omega }=0.05\)
|
---|---|---|---|
rnd-maxmin
| 9/9 | 6/6 | 2/0 |
rnd-kmeans
| 0/0 | 0/1 | 0/1 |
rnd-nearest
| 0/0 | 0/1 | 0/0 |
rnd-spherical
| 0/0 | 0/0 | 0/0 |
5.4.1 Pendigits dataset
Method |
\(\log p(X|\Theta )\)
|
\(p_\mathrm {L}\)
| ARI |
\(p_\mathrm {ARI}\)
|
---|---|---|---|---|
rnd-spherical
| −419829 ± 142 | 4.35e–11 | 0.6214 ± 0.011 | 1.97e–4 |
rnd-nearest
| −419849 ± 171 | 6.38e–10 | 0.6204 ± 0.012 | 3.35e–3 |
rnd-kmeans
| −420267 ± 117 | 4.47e–18 | 0.6149 ± 0.007 | 2.65e–14 |
rnd-maxmin
| −419687 ± 33.1 | – | 0.6254 ± 0.005 | – |
5.4.2 Brodatz dataset
Method |
\(\log p(X|\Theta )\)
|
\(p_\mathrm {L}\)
| ARI |
\(p_\mathrm {ARI}\)
|
---|---|---|---|---|
rnd-spherical
|
\(227429 \pm 471\)
|
\(1.21\text {e-}4\)
|
\(0.8634 \pm 0.026\)
|
\(1.26\text {e-}4\)
|
rnd-nearest
|
\(227181 \pm 587\)
|
\(3.97\text {e-}7\)
|
\(0.8564 \pm 0.028\)
|
\(2.66\text {e-}7\)
|
rnd-kmeans
|
\(225001 \pm 209\)
|
\(6.9\text {e-}18\)
|
\(0.7966 \pm 0.011\)
|
\(9.34\text {e-}18\)
|
rnd-maxmin
|
\(227766 \pm 384\)
| – |
\(0.8794 \pm 0.018\)
| – |
5.4.3 Thyroid dataset
Method |
\(\log p(X|\Theta )\)
| ARI |
---|---|---|
rnd-spherical
|
\(-2238.39 \pm 0\)
|
\(0.8629 \pm 0\)
|
rnd-nearest
|
\(-2238.39 \pm 0\)
|
\(0.8629 \pm 0\)
|
rnd-kmeans
|
\(-2238.39 \pm 0\)
|
\(0.8629 \pm 0\)
|
rnd-maxmin
|
\(-2238.39 \pm 0\)
|
\(0.8629 \pm 0\)
|
5.4.4 Iris dataset
Method |
\(\log p(X|\Theta )\)
| ARI |
---|---|---|
rnd-spherical
|
\(-180.186 \pm 1\text {e}-4\)
|
\(0.9039 \pm 0\)
|
rnd-nearest
|
\(-180.186 \pm 1\text {e}-4\)
|
\(0.9039 \pm 0\)
|
rnd-kmeans
|
\(-180.186 \pm 0\)
|
\(0.9039 \pm 0\)
|
rnd-maxmin
|
\(-180.186 \pm 1\text {e}-4\)
|
\(0.9039 \pm 0\)
|