1 Introduction
- We propose a greedy optimization method for finding MAP solutions within the Perturb-and-MAP (PM) approach to learning RBMs.
- We empirically compare the PM approach with Contrastive Divergence (CD) in both unsupervised and supervised settings. Because [26] does not report any experimental results, this is the first empirical evaluation of the PM approach for learning RBMs.
2 Background
2.1 Restricted Boltzmann Machine
2.2 Perturb-and-MAP Approach
3 Learning RBM Using PM
3.1 Perturb Step
3.2 MAP Step
3.2.1 Perturb and Coordinate Descent (P&CD) Learning
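As a rough illustration of the perturb-then-optimize idea behind this heading, the sketch below perturbs the biases of a binary RBM with logistic noise (the difference of two Gumbel draws) and then runs block coordinate descent on the perturbed energy. It is a minimal sketch under assumptions, not the authors' implementation: the energy form \(E(v,h) = -b^\top v - c^\top h - v^\top W h\), the choice to perturb only the biases, and every name (`perturb`, `map_coordinate_descent`, `pm_sample`, `max_sweeps`) are hypothetical.

```python
import math
import random


def logistic_noise(rng):
    """Sample the difference of two independent Gumbel(0, 1) variables.

    That difference follows a standard logistic distribution, so it can be
    drawn directly as logit(u) with u uniform on (0, 1).
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep the logit finite
    return math.log(u) - math.log(1.0 - u)


def perturb(biases, rng):
    """Perturb step (assumed form): add independent noise to each bias."""
    return [x + logistic_noise(rng) for x in biases]


def energy(v, h, W, b, c):
    """Binary RBM energy E(v, h) = -b.v - c.h - v^T W h (assumed form)."""
    pair = sum(v[i] * W[i][j] * h[j]
               for i in range(len(v)) for j in range(len(h)))
    unary = sum(bi * vi for bi, vi in zip(b, v)) + sum(cj * hj for cj, hj in zip(c, h))
    return -unary - pair


def map_coordinate_descent(v, W, b, c, max_sweeps=10):
    """MAP step (assumed form): alternate exact minimisation over h and v."""
    n_v, n_h = len(b), len(c)
    v, h = list(v), [0] * n_h
    for _ in range(max_sweeps):
        # With v fixed, each hidden unit is switched on iff its input is positive.
        new_h = [1 if c[j] + sum(v[i] * W[i][j] for i in range(n_v)) > 0 else 0
                 for j in range(n_h)]
        # With h fixed, the same rule applies to each visible unit.
        new_v = [1 if b[i] + sum(W[i][j] * new_h[j] for j in range(n_h)) > 0 else 0
                 for i in range(n_v)]
        if new_v == v and new_h == h:  # fixed point reached
            break
        v, h = new_v, new_h
    return v, h


def pm_sample(v0, W, b, c, rng, max_sweeps=10):
    """Draw one approximate PM sample, starting the descent from v0."""
    return map_coordinate_descent(v0, W, perturb(b, rng), perturb(c, rng), max_sweeps)
```

Each half-step minimizes the perturbed energy exactly over one layer, so the energy never increases from one sweep to the next. Whether `max_sweeps` corresponds to the \(K\) reported in the experiments is not established by this sketch.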
3.2.2 Perturb and Greedy Energy Optimization (P&GEO) Learning
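To make the notion of a greedy MAP search concrete, here is a generic single-flip local search on a binary RBM energy: at every step it flips the one unit whose flip lowers the energy most, and it stops when no flip helps. This is an assumption-laden sketch and may differ from the authors' P&GEO procedure. The energy form \(E(v,h) = -b^\top v - c^\top h - v^\top W h\) and the names `flip_deltas` and `greedy_energy_minimize` are hypothetical, and the biases `b`, `c` are taken to be already perturbed.

```python
def energy(v, h, W, b, c):
    """Binary RBM energy E(v, h) = -b.v - c.h - v^T W h (assumed form)."""
    pair = sum(v[i] * W[i][j] * h[j]
               for i in range(len(v)) for j in range(len(h)))
    unary = sum(bi * vi for bi, vi in zip(b, v)) + sum(cj * hj for cj, hj in zip(c, h))
    return -unary - pair


def flip_deltas(v, h, W, b, c):
    """Energy change caused by flipping each single unit, visible units first."""
    n_v, n_h = len(v), len(h)
    deltas = []
    for i in range(n_v):
        field = b[i] + sum(W[i][j] * h[j] for j in range(n_h))
        # Flipping 1 -> 0 changes the energy by +field, 0 -> 1 by -field.
        deltas.append(field * (2 * v[i] - 1))
    for j in range(n_h):
        field = c[j] + sum(v[i] * W[i][j] for i in range(n_v))
        deltas.append(field * (2 * h[j] - 1))
    return deltas


def greedy_energy_minimize(v, h, W, b, c, max_flips=1000):
    """Repeatedly apply the single flip that lowers the energy the most."""
    v, h = list(v), list(h)
    for _ in range(max_flips):
        deltas = flip_deltas(v, h, W, b, c)
        k = min(range(len(deltas)), key=deltas.__getitem__)
        if deltas[k] >= -1e-12:  # no flip lowers the energy: local minimum
            break
        if k < len(v):
            v[k] = 1 - v[k]
        else:
            h[k - len(v)] = 1 - h[k - len(v)]
    return v, h
```

Because every accepted flip strictly lowers the energy and the state space is finite, the loop terminates at a configuration that no single flip can improve; `max_flips` is only a safety cap.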
4 Experiments
4.1 Toy Problem
Method | Toy \(p=0.01\): Avg. LL | Toy \(p=0.01\): Avg. Std | Toy \(p=0.01\): Avg. Iters | Toy \(p=0.1\): Avg. LL | Toy \(p=0.1\): Avg. Std | Toy \(p=0.1\): Avg. Iters |
---|---|---|---|---|---|---|
CD-1 | \(-\,4.45\) | 1.218 | 48 | \(-\,6.63\) | 0.011 | 79 |
P&GEO-1 | \(-\,3.70\) | 0.697 | 59 | \(-\,6.86\) | 0.221 | 87 |
P&CD-1 | \(-\,2.99\) | 0.418 | 48 | \(-\,6.81\) | 0.233 | 71 |
CD-5 | \(-\,2.43\) | 0.053 | 67 | \(-\,6.61\) | 0.008 | 85 |
P&GEO-5 | \(-\,2.67\) | 0.359 | 42 | \(-\,6.62\) | 0.018 | 91 |
P&CD-5 | \(-\,2.47\) | 0.355 | 54 | \(-\,6.62\) | 0.003 | 86 |
CD-10 | \(-\,2.56\) | 0.093 | 80 | \(-\,6.62\) | 0.005 | 93 |
P&GEO-10 | \(-\,2.36\) | 0.089 | 59 | \(-\,6.62\) | 0.053 | 133 |
P&CD-10 | \(-\,2.78\) | 0.173 | 105 | \(-\,6.62\) | 0.004 | 98 |
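For a model as small as the toy problem, an average log-likelihood like the one tabulated above can in principle be computed exactly by enumerating all visible configurations. The sketch below shows one way to do that for a binary RBM. It is illustrative only, since this section does not state how the reported values were obtained: the energy form \(E(v,h) = -b^\top v - c^\top h - v^\top W h\) and all function names are assumptions.

```python
import itertools
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v, W, b, c):
    """F(v) = -b.v - sum_j softplus(c_j + sum_i v_i W_ij), hidden units summed out."""
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(
        softplus(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    )
    return -visible_term - hidden_term


def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(W, b, c):
    """Exact log Z by enumerating all 2^n visible states (small n only)."""
    states = itertools.product((0, 1), repeat=len(b))
    return log_sum_exp([-free_energy(v, W, b, c) for v in states])


def average_log_likelihood(data, W, b, c):
    """Mean of log p(v) = -F(v) - log Z over the data set."""
    log_z = log_partition(W, b, c)
    return sum(-free_energy(v, W, b, c) - log_z for v in data) / len(data)


# With all parameters at zero, every one of the 2^3 visible states is equally
# likely, so the average log-likelihood equals -3 ln 2 (about -2.079).
W0 = [[0.0, 0.0] for _ in range(3)]
print(average_log_likelihood([(1, 0, 1), (0, 0, 0)], W0, [0.0] * 3, [0.0] * 2))
```

The enumeration costs \(2^{n}\) free-energy evaluations for \(n\) visible units, which is manageable for a \(4\times 4\) input but not for the image datasets, where an estimate is needed instead.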
4.2 Image Datasets
4.2.1 Unsupervised Evaluation
Method | LETTERS | MNIST | OMNIGLOT | FREY FACE | HCR |
---|---|---|---|---|---|
CD-1 | \(-\,38.63\pm 0.18\) | \(-\,108.10\pm 0.08\) | \(-\,126.02\pm 0.21\) | \(-\,119.54\pm 1.79\) | \(-\,160.04\pm 1.09\) |
CD-5 | \(-\,37.11\pm 0.06\) | \(-\,92.30\pm 0.31\) | \(-\,117.02\pm 1.27\) | \(-\,112.07\pm 0.88\) | \(-\,124.88\pm 1.48\) |
CD-10 | \(-\,36.94\pm 0.05\) | \(-\,88.94\pm 0.09\) | \(-\,114.76\pm 0.36\) | \(-\,111.33\pm 1.99\) | \(-\,114.57\pm 0.74\) |
P&GEO-1 | \(-\,39.17\pm 0.25\) | \(-\,103.46\pm 1.80\) | \(-\,126.00\pm 1.66\) | \(-\,113.38\pm 1.40\) | \(-\,122.11\pm 0.60\) |
P&GEO-5 | \(-\,37.63\pm 0.01\) | \(-\,88.38\pm 1.48\) | \(-\,112.08\pm 0.82\) | \(-\,108.49\pm 0.24\) | \(-\,114.69\pm 3.52\) |
P&GEO-10 | \(-\,37.64\pm 0.13\) | \(-\,95.31\pm 1.30\) | \(-\,110.24\pm 0.76\) | \(-\,108.48\pm 0.64\) | \(-\,120.40\pm 0.91\) |
P&CD-1 | \(-\,39.11\pm 0.12\) | \(-\,102.28\pm 0.70\) | \(-\,127.12\pm 2.28\) | \(-\,113.95\pm 0.70\) | \(-\,119.66\pm 0.49\) |
P&CD-5 | \(-\,37.60\pm 0.19\) | \(-\,90.79\pm 0.77\) | \(-\,110.19\pm 0.21\) | \(-\,108.65\pm 0.69\) | \(-\,114.65\pm 1.39\) |
P&CD-10 | \(-\,37.66\pm 0.10\) | \(-\,90.59\pm 0.51\) | \(-\,110.20\pm 0.34\) | \(-\,108.17\pm 0.70\) | \(-\,114.98\pm 0.31\) |
Dataset | CD, \(K=1\) | CD, \(K=5\) | CD, \(K=10\) | P&GEO, \(K=1\) | P&GEO, \(K=5\) | P&GEO, \(K=10\) | P&CD, \(K=1\) | P&CD, \(K=5\) | P&CD, \(K=10\) |
---|---|---|---|---|---|---|---|---|---|
MNIST (\(28\times 28\)) | 2.2 | 3.6 | 5.6 | 2.5 | 3.9 | 5.8 | 2.6 | 3.8 | 5.4 |
FREY FACE (\(20\times 28\)) | 1.1 | 1.7 | 2.6 | 1.6 | 2.0 | 2.7 | 1.5 | 2.0 | 2.6 |
LETTERS (\(16\times 10\)) | 0.9 | 1.2 | 1.6 | 0.9 | 1.2 | 1.5 | 1.0 | 1.2 | 1.5 |
TOY (\(4\times 4\)) | 0.4 | 0.5 | 0.7 | 0.5 | 0.6 | 0.7 | 0.5 | 0.6 | 0.7 |
4.2.2 Supervised Evaluation
scikit-learn package. The experiment was repeated 3 times and the best results are reported.

Method | MNIST: AvgPrec | MNIST: ClassAcc | MNIST: NMI | OMNIGLOT: AvgPrec | OMNIGLOT: ClassAcc | OMNIGLOT: NMI |
---|---|---|---|---|---|---|
CD | 0.476 | 0.969 | 0.921 | 0.052 | 0.203 | 0.794 |
P&GEO | 0.481 | 0.969 | 0.923 | 0.052 | 0.205 | 0.796 |
P&CD | 0.483 | 0.969 | 0.922 | 0.051 | 0.201 | 0.794 |
4.3 Text Dataset
4.3.1 Unsupervised Evaluation
Method | \(K=1\) | \(K=5\) | \(K=10\) |
---|---|---|---|
CD | \(-\,13.72\pm 0.02\) | \(-\,13.73\pm 0.03\) | \(-\,13.70\pm 0.02\) |
P&GEO | \(-\,18.19\pm 0.19\) | \(-\,13.70\pm 0.01\) | \(-\,13.72\pm 0.02\) |
P&CD | \(-\,18.38\pm 0.18\) | \(-\,13.71\pm 0.02\) | \(-\,13.72\pm 0.02\) |
4.3.2 Supervised Evaluation
Method | AvgPrec | ClassAcc | NMI |
---|---|---|---|
CD | 0.497 | 0.799 | 0.485 |
P&GEO | 0.503 | 0.801 | 0.490 |
P&CD | 0.502 | 0.801 | 0.491 |