
01-07-2016 | Original Article

Learning a good representation with unsymmetrical auto-encoder

Authors: Yanan Sun, Hua Mao, Quan Guo, Zhang Yi

Published in: Neural Computing and Applications | Issue 5/2016


Abstract

Auto-encoders play a fundamental role in unsupervised feature learning and in learning the initial parameters of deep architectures for supervised tasks. For given input samples, a good representation should be robust in two respects: (1) it is invariant to small variations of the samples, and (2) it can be reconstructed by the decoder with minimal error. Traditional auto-encoders with different regularization terms have symmetrical numbers of encoder and decoder layers, and sometimes symmetrical (tied) parameters as well. We investigate the relation between the numbers of encoder and decoder layers and propose an unsymmetrical structure, the unsymmetrical auto-encoder (UAE), to learn more effective features. We present empirical results of feature learning with the UAE and with state-of-the-art auto-encoders on classification tasks across a range of datasets. We also analyze the gradient vanishing problem mathematically and suggest an appropriate number of layers for UAEs with a logistic activation function. In our experiments, UAEs outperformed the other auto-encoders under the same configuration.
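To make the architecture concrete, the sketch below implements an auto-encoder whose encoder is deeper than its decoder, trained by plain backpropagation on squared reconstruction error. All specifics here are our own illustrative assumptions (the [784, 500, 200, 100, 784] layer sizes, the 3-encoder/1-decoder split, the learning rate, vanilla gradient descent); this is a minimal sketch, not the authors' exact UAE configuration.

```python
# Minimal unsymmetrical auto-encoder (UAE) sketch in NumPy.
# Assumptions (not from the paper): layer sizes [784, 500, 200, 100, 784],
# i.e. three encoding layers and one decoding layer, logistic activations
# throughout, and vanilla gradient descent on squared reconstruction error.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class UnsymmetricalAutoEncoder:
    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and bias vector per layer transition.
        self.W = [rng.normal(0.0, 0.1, (m, n))
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(n) for n in sizes[1:]]

    def forward(self, x):
        # Returns the activations of every layer, input included.
        acts = [x]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(acts[-1] @ W + b))
        return acts

    def train_step(self, x, lr=0.1):
        acts = self.forward(x)
        # Error signal at the output layer for squared reconstruction loss.
        delta = (acts[-1] - x) * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(self.W))):
            grad_W = acts[i].T @ delta / len(x)
            grad_b = delta.mean(axis=0)
            if i > 0:
                # Backpropagate through the logistic nonlinearity; since
                # sigma'(z) = a(1 - a) <= 1/4, each extra layer can shrink
                # the gradient -- the vanishing-gradient issue the paper
                # analyzes when suggesting how many layers a UAE should use.
                delta = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * grad_b
        return float(np.mean((acts[-1] - x) ** 2))

# Toy usage: random vectors stand in for MNIST-style inputs.
if __name__ == "__main__":
    x = np.random.default_rng(1).random((64, 784))
    uae = UnsymmetricalAutoEncoder([784, 500, 200, 100, 784])
    for _ in range(20):
        loss = uae.train_step(x)
    print(f"reconstruction MSE: {loss:.4f}")
```

The gradient vanishing remark in the abstract rests on the standard bound on the logistic derivative; the inequality below is that textbook argument, not the paper's own analysis:

$$
\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \le \frac{1}{4},
\qquad
\left\| \frac{\partial E}{\partial W_1} \right\|
\lesssim \prod_{i=2}^{k} \frac{\|W_i\|}{4},
$$

so with moderate weight norms the gradient reaching the first layer decays roughly geometrically in the depth k, which is why bounding the number of logistic layers matters.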


Footnotes
1. The MNIST datasets for these problems are available at http://www.iro.umontreal.ca/~lisa/icml2007.
2. We used two GPU models: NVIDIA GTX 750 Ti and GTX 780.
Metadata
Title
Learning a good representation with unsymmetrical auto-encoder
Authors
Yanan Sun
Hua Mao
Quan Guo
Zhang Yi
Publication date
01-07-2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 5/2016
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-015-1939-3
