ABSTRACT
A new algorithm for training Restricted Boltzmann Machines is introduced. The algorithm, named Persistent Contrastive Divergence, is different from the standard Contrastive Divergence algorithms in that it aims to draw samples from almost exactly the model distribution. It is compared to some standard Contrastive Divergence and Pseudo-Likelihood algorithms on the tasks of modeling and classifying various types of data. The Persistent Contrastive Divergence algorithm outperforms the other algorithms, and is equally fast and simple.
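As a rough illustration of the idea described in the abstract, the sketch below shows Persistent Contrastive Divergence training of a binary RBM in NumPy: instead of restarting the negative-phase Gibbs chain at the data (as standard CD-k does), a set of "fantasy" chains is kept alive across parameter updates so that the negative statistics approximate samples from the model distribution. This is a minimal sketch; the class layout, learning rate, number of chains, and Gibbs steps are illustrative assumptions, not the paper's exact settings.

```python
# Minimal Persistent Contrastive Divergence (PCD) sketch for a binary RBM.
# Hyperparameters and structure are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, n_chains=100):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        # Persistent "fantasy" particles: kept across updates instead of
        # being re-initialized from the data (the key difference from CD-k).
        self.v_chains = rng.integers(0, 2, size=(n_chains, n_visible)).astype(float)

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (rng.random(p.shape) < p).astype(float)

    def pcd_update(self, v_data, lr=0.01, k=1):
        # Positive phase: hidden probabilities driven by the training data.
        ph_data, _ = self.sample_h(v_data)
        # Negative phase: advance the persistent chains by k Gibbs steps,
        # starting from wherever they were left after the previous update.
        v_model = self.v_chains
        for _ in range(k):
            _, h_model = self.sample_h(v_model)
            _, v_model = self.sample_v(h_model)
        self.v_chains = v_model
        ph_model, _ = self.sample_h(v_model)
        # Approximate likelihood gradient: data statistics minus model
        # statistics estimated from the persistent chains.
        self.W += lr * (v_data.T @ ph_data / len(v_data)
                        - v_model.T @ ph_model / len(v_model))
        self.b_v += lr * (v_data.mean(0) - v_model.mean(0))
        self.b_h += lr * (ph_data.mean(0) - ph_model.mean(0))

# Toy usage: fit a tiny RBM to random binary "data" in mini-batches.
rbm = RBM(n_visible=20, n_hidden=10)
data = rng.integers(0, 2, size=(500, 20)).astype(float)
for epoch in range(10):
    for i in range(0, len(data), 100):
        rbm.pcd_update(data[i:i + 100])
```

Because the weights change only slightly between updates, the persistent chains stay close to equilibrium under the current model, which is why their samples can stand in for samples from "almost exactly the model distribution."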