Published in: Neural Computing and Applications 4/2016

01.05.2016 | Original Article

Efficient and robust deep learning with Correntropy-induced loss function

Authors: Liangjun Chen, Hua Qu, Jihong Zhao, Badong Chen, Jose C. Principe

Abstract

Deep learning systems use hierarchical models to learn high-level features from low-level ones, and the field has made great progress in recent years. The robustness of learning systems with deep architectures, however, is rarely studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost function in deep learning, is rather sensitive to outliers (or impulsive noise). Robust methods are needed to improve learning performance and to suppress the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). As a nonlinear similarity measure, CLF is robust to outliers and can approximate different norms of the data (from \(l_0\) to \(l_2\)). Essentially, CLF is an MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty, both the reconstruction loss and the sparsity penalty in CSAE are built with CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The excellent and robust performance of the proposed model is confirmed by simulation experiments on the MNIST benchmark dataset.
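For context, correntropy and the induced loss take the following standard form in the information-theoretic learning literature; this is a sketch of the general definitions, and the exact scaling and per-dimension treatment used in CSAE are those given in the paper body. With a Gaussian kernel \(\kappa_\sigma(e) = \exp(-e^2/(2\sigma^2))\) of width \(\sigma\), the correntropy between \(X\) and \(Y\) and its empirical estimate from \(N\) samples are

\[ V_\sigma(X, Y) = \mathbb{E}\left[\kappa_\sigma(X - Y)\right], \qquad \hat{V}_\sigma = \frac{1}{N}\sum_{i=1}^{N} \exp\left(-\frac{(x_i - y_i)^2}{2\sigma^2}\right), \]

and maximizing correntropy is equivalent to minimizing the correntropy-induced loss

\[ J_{\mathrm{CLF}} = \frac{1}{N}\sum_{i=1}^{N}\left[1 - \exp\left(-\frac{(x_i - y_i)^2}{2\sigma^2}\right)\right]. \]

Because the Gaussian kernel saturates, each sample contributes at most 1 to \(J_{\mathrm{CLF}}\), so a single large error cannot dominate the objective the way it can under the MSE; the kernel width \(\sigma\) controls the transition between an \(l_0\)-like behavior (small \(\sigma\)) and a scaled MSE (large \(\sigma\)).

As an illustration only, not the authors' code, a minimal NumPy sketch of a CLF reconstruction loss and its gradient with respect to the auto-encoder output follows; sigma is a hypothetical kernel-width hyperparameter.

import numpy as np

def clf_loss(x, x_hat, sigma=1.0):
    """Correntropy-induced loss between inputs x and reconstructions x_hat."""
    e = x - x_hat
    k = np.exp(-(e ** 2) / (2.0 * sigma ** 2))  # Gaussian kernel values in (0, 1]
    return np.mean(1.0 - k)  # each element contributes at most 1

def clf_grad_wrt_xhat(x, x_hat, sigma=1.0):
    """Gradient of clf_loss with respect to x_hat, for back-propagation."""
    e = x - x_hat
    k = np.exp(-(e ** 2) / (2.0 * sigma ** 2))
    # d/dx_hat [1 - exp(-e^2/(2 sigma^2))] = -(e / sigma^2) * exp(-e^2/(2 sigma^2))
    return -(e / sigma ** 2) * k / e.size

# Toy comparison: one large reconstruction error dominates the MSE
# but makes only a bounded contribution to the CLF.
x = np.zeros(100)
x_hat = np.zeros(100)
x_hat[0] = 50.0
print("MSE:", np.mean((x - x_hat) ** 2))  # 25.0, driven entirely by the outlier
print("CLF:", clf_loss(x, x_hat))         # about 0.01, the outlier's capped share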

Metadata
Title
Efficient and robust deep learning with Correntropy-induced loss function
Authors
Liangjun Chen
Hua Qu
Jihong Zhao
Badong Chen
Jose C. Principe
Publication date
01.05.2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2016
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-015-1916-x
