Research article
DOI: 10.1145/1390156.1390294

Extracting and composing robust features with denoising autoencoders

Published: 05 July 2008

ABSTRACT

Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.
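
To make the training principle concrete, here is a minimal sketch of a single denoising autoencoder layer, assuming the common choices of masking noise (a random fraction of input components set to zero), sigmoid units, tied decoder weights, and a cross-entropy reconstruction loss. The class name, dimensions, and hyperparameters below are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """One denoising autoencoder layer with tied encoder/decoder weights
    (an illustrative sketch, not the paper's exact setup)."""

    def __init__(self, n_visible, n_hidden):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.b_vis = np.zeros(n_visible)

    def corrupt(self, x, corruption):
        # Masking noise: independently zero out a fraction of input components.
        return x * (rng.random(x.shape) > corruption)

    def train_step(self, x, corruption=0.25, lr=0.1):
        x_tilde = self.corrupt(x, corruption)        # corrupted input
        h = sigmoid(x_tilde @ self.W + self.b_hid)   # learned representation
        z = sigmoid(h @ self.W.T + self.b_vis)       # reconstruction
        # Key point: the reconstruction is scored against the CLEAN input x,
        # not the corrupted x_tilde the network actually saw.
        eps = 1e-10
        loss = -np.mean(np.sum(x * np.log(z + eps)
                               + (1.0 - x) * np.log(1.0 - z + eps), axis=1))
        # Manual backprop; for sigmoid + cross-entropy the output delta is z - x.
        dz = (z - x) / x.shape[0]
        dh = (dz @ self.W) * h * (1.0 - h)
        self.W -= lr * (x_tilde.T @ dh + dz.T @ h)   # tied weights: sum both paths
        self.b_vis -= lr * dz.sum(axis=0)
        self.b_hid -= lr * dh.sum(axis=0)
        return loss

# Toy usage on random inputs in [0, 1].
dae = DenoisingAutoencoder(n_visible=64, n_hidden=32)
x = rng.random((128, 64))
for step in range(200):
    loss = dae.train_step(x)
```

The essential point, visible in `train_step`, is that the loss compares the reconstruction against the clean input even though the network only ever sees the corrupted version, which forces the hidden representation to capture structure that survives partial corruption. Stacking then amounts to training a second such layer on the hidden codes produced by the first.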


Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
Copyright © 2008 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
