DOI: 10.1145/1390156.1390290

Training restricted Boltzmann machines using approximations to the likelihood gradient

Published: 5 July 2008

ABSTRACT

A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
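
The full text is not reproduced on this page, but the abstract's core idea admits a compact illustration. Below is a minimal NumPy sketch of Persistent Contrastive Divergence for a binary-binary RBM, written from the standard description of the technique rather than from the paper's own code; every name and hyperparameter here (train_pcd, n_chains, k, lr, and so on) is an illustrative assumption, not a value taken from the paper. The only change relative to ordinary CD-k is that the negative-phase Gibbs chains persist across parameter updates instead of being restarted at the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, c):
    # P(h_j = 1 | v) for a binary RBM with weights W and hidden biases c.
    p = sigmoid(v @ W + c)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_v_given_h(h, W, b):
    # P(v_i = 1 | h) with visible biases b.
    p = sigmoid(h @ W.T + b)
    return p, (rng.random(p.shape) < p).astype(float)

def train_pcd(data, n_hidden=64, n_chains=100, k=1, lr=0.01,
              epochs=10, batch=100):
    n, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)
    c = np.zeros(n_hidden)
    # Persistent "fantasy" particles: initialized once and never reset to
    # the data -- this is the only difference from standard CD-k.
    v_model = (rng.random((n_chains, n_visible)) < 0.5).astype(float)
    for _ in range(epochs):
        for i in range(0, n, batch):
            v_data = data[i:i + batch]
            # Positive phase: hidden expectations with visibles clamped
            # to the training data.
            p_h_data, _ = sample_h_given_v(v_data, W, c)
            # Negative phase: continue the persistent chains for k Gibbs
            # steps, so the samples track the current model distribution.
            for _ in range(k):
                _, h_model = sample_h_given_v(v_model, W, c)
                _, v_model = sample_v_given_h(h_model, W, b)
            p_h_model, _ = sample_h_given_v(v_model, W, c)
            # Stochastic approximation to the likelihood gradient:
            # data-driven term minus model-driven term.
            W += lr * (v_data.T @ p_h_data / len(v_data)
                       - v_model.T @ p_h_model / n_chains)
            b += lr * (v_data.mean(axis=0) - v_model.mean(axis=0))
            c += lr * (p_h_data.mean(axis=0) - p_h_model.mean(axis=0))
    return W, b, c
```

Because each parameter update is small, the persistent chains stay close to equilibrium under the slowly changing model, which is why their samples come from "almost exactly the model distribution," as the abstract puts it.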


Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156

                  Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
