ABSTRACT
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
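As a rough illustration of the idea described in the abstract, the sketch below shows Persistent Contrastive Divergence training of a binary RBM in NumPy: instead of restarting the negative-phase Gibbs chain at the data (as standard CD-k does), a set of "fantasy" chains is kept alive across parameter updates so that the negative statistics approximate samples from the model distribution. This is a minimal sketch; the class layout, learning rate, number of chains, and Gibbs steps are illustrative assumptions, not the paper's exact settings.

```python
# Minimal Persistent Contrastive Divergence (PCD) sketch for a binary RBM.
# Hyperparameters and structure are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, n_chains=100):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        # Persistent "fantasy" particles: kept across updates instead of
        # being re-initialized from the data (the key difference from CD-k).
        self.v_chains = rng.integers(0, 2, size=(n_chains, n_visible)).astype(float)

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (rng.random(p.shape) < p).astype(float)

    def pcd_update(self, v_data, lr=0.01, k=1):
        # Positive phase: hidden probabilities driven by the training data.
        ph_data, _ = self.sample_h(v_data)
        # Negative phase: advance the persistent chains by k Gibbs steps,
        # starting from wherever they were left after the previous update.
        v_model = self.v_chains
        for _ in range(k):
            _, h_model = self.sample_h(v_model)
            _, v_model = self.sample_v(h_model)
        self.v_chains = v_model
        ph_model, _ = self.sample_h(v_model)
        # Approximate likelihood gradient: data statistics minus model
        # statistics estimated from the persistent chains.
        self.W += lr * (v_data.T @ ph_data / len(v_data)
                        - v_model.T @ ph_model / len(v_model))
        self.b_v += lr * (v_data.mean(0) - v_model.mean(0))
        self.b_h += lr * (ph_data.mean(0) - ph_model.mean(0))

# Toy usage: fit a tiny RBM to random binary "data" in mini-batches.
rbm = RBM(n_visible=20, n_hidden=10)
data = rng.integers(0, 2, size=(500, 20)).astype(float)
for epoch in range(10):
    for i in range(0, len(data), 100):
        rbm.pcd_update(data[i:i + 100])
```

Because the weights change only slightly between updates, the persistent chains stay close to equilibrium under the current model, which is why their samples can stand in for samples from "almost exactly the model distribution."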