ABSTRACT
Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information-theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.
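The training principle described above — corrupt the input, then train the autoencoder to reconstruct the *clean* input from the corrupted version — can be sketched as a minimal NumPy denoising autoencoder with tied weights. This is an illustrative sketch only: the dimensions, learning rate, and corruption level `v` are arbitrary choices, and it uses squared-error reconstruction for simplicity where the paper's experiments use a cross-entropy objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, v, rng):
    # Masking noise: zero out a random fraction v of the input components.
    return x * (rng.random(x.shape) >= v)

# Toy binary data standing in for a pattern-classification input.
X = (rng.random((100, 20)) < 0.3).astype(float)

n_vis, n_hid = 20, 10
W = 0.1 * rng.standard_normal((n_hid, n_vis))  # tied encoder/decoder weights
b = np.zeros(n_hid)                            # encoder bias
c = np.zeros(n_vis)                            # decoder bias

def reconstruction_loss(X):
    H = sigmoid(X @ W.T + b)   # hidden representation of the clean inputs
    Z = sigmoid(H @ W + c)     # reconstruction
    return np.mean(np.sum((X - Z) ** 2, axis=1))

lr, v = 0.1, 0.25              # learning rate and corruption level (arbitrary)
loss_before = reconstruction_loss(X)
for epoch in range(50):
    for x in X:
        xt = corrupt(x, v, rng)       # corrupt the input ...
        h = sigmoid(W @ xt + b)       # ... encode the corrupted version ...
        z = sigmoid(W.T @ h + c)      # ... and reconstruct
        # Backprop of ||x - z||^2: the target is the *clean* x, not xt.
        d2 = 2.0 * (z - x) * z * (1.0 - z)
        d1 = (W @ d2) * h * (1.0 - h)
        W -= lr * (np.outer(h, d2) + np.outer(d1, xt))
        c -= lr * d2
        b -= lr * d1
loss_after = reconstruction_loss(X)
```

After training, the hidden representation `h` of a clean input can serve as the input to the next layer; stacking such layers and fine-tuning the whole network is the deep-architecture initialization the abstract refers to.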