Published in: Neural Computing and Applications 4/2016

01.05.2016 | Original Article

Efficient and robust deep learning with Correntropy-induced loss function

Authors: Liangjun Chen, Hua Qu, Jihong Zhao, Badong Chen, Jose C. Principe

Abstract

Deep learning systems use hierarchical models to learn high-level features from low-level ones, and the field has made great progress in recent years. The robustness of learning systems with deep architectures, however, is rarely studied and needs further investigation. In particular, the mean square error (MSE), a commonly used optimization cost function in deep learning, is rather sensitive to outliers (or impulsive noise). Robust methods are needed to improve learning performance and to suppress the harmful influence of outliers, which are pervasive in real-world data. In this paper, we propose an efficient and robust deep learning model based on stacked auto-encoders and the Correntropy-induced loss function (CLF), called CLF-based stacked auto-encoders (CSAE). As a nonlinear similarity measure, CLF is robust to outliers and can approximate different norms of the data (from \(l_0\) to \(l_2\)). Essentially, CLF is an MSE computed in a reproducing kernel Hilbert space. Unlike conventional stacked auto-encoders, which generally use the MSE as the reconstruction loss and the KL divergence as the sparsity penalty, both the reconstruction loss and the sparsity penalty in CSAE are built with CLF. The fine-tuning procedure in CSAE is also based on CLF, which further enhances learning performance. The excellent and robust performance of the proposed model is confirmed by simulation experiments on the MNIST benchmark dataset.
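For context, correntropy and the induced loss take the following standard form in the information-theoretic learning literature; this is a sketch of the general definitions, and the exact scaling and per-dimension treatment used in CSAE are those given in the paper body. With a Gaussian kernel \(\kappa_\sigma(e) = \exp(-e^2/(2\sigma^2))\) of width \(\sigma\), the correntropy between \(X\) and \(Y\) and its empirical estimate from \(N\) samples are

\[ V_\sigma(X, Y) = \mathbb{E}\left[\kappa_\sigma(X - Y)\right], \qquad \hat{V}_\sigma = \frac{1}{N}\sum_{i=1}^{N} \exp\left(-\frac{(x_i - y_i)^2}{2\sigma^2}\right), \]

and maximizing correntropy is equivalent to minimizing the correntropy-induced loss

\[ J_{\mathrm{CLF}} = \frac{1}{N}\sum_{i=1}^{N}\left[1 - \exp\left(-\frac{(x_i - y_i)^2}{2\sigma^2}\right)\right]. \]

Because the Gaussian kernel saturates, each sample contributes at most 1 to \(J_{\mathrm{CLF}}\), so a single large error cannot dominate the objective the way it can under the MSE; the kernel width \(\sigma\) controls the transition between an \(l_0\)-like behavior (small \(\sigma\)) and a scaled MSE (large \(\sigma\)).

As an illustration only, not the authors' code, a minimal NumPy sketch of a CLF reconstruction loss and its gradient with respect to the auto-encoder output follows; sigma is a hypothetical kernel-width hyperparameter.

import numpy as np

def clf_loss(x, x_hat, sigma=1.0):
    """Correntropy-induced loss between inputs x and reconstructions x_hat."""
    e = x - x_hat
    k = np.exp(-(e ** 2) / (2.0 * sigma ** 2))  # Gaussian kernel values in (0, 1]
    return np.mean(1.0 - k)  # each element contributes at most 1

def clf_grad_wrt_xhat(x, x_hat, sigma=1.0):
    """Gradient of clf_loss with respect to x_hat, for back-propagation."""
    e = x - x_hat
    k = np.exp(-(e ** 2) / (2.0 * sigma ** 2))
    # d/dx_hat [1 - exp(-e^2/(2 sigma^2))] = -(e / sigma^2) * exp(-e^2/(2 sigma^2))
    return -(e / sigma ** 2) * k / e.size

# Toy comparison: one large reconstruction error dominates the MSE
# but makes only a bounded contribution to the CLF.
x = np.zeros(100)
x_hat = np.zeros(100)
x_hat[0] = 50.0
print("MSE:", np.mean((x - x_hat) ** 2))  # 25.0, driven entirely by the outlier
print("CLF:", clf_loss(x, x_hat))         # about 0.01, the outlier's capped share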

Metadata
Title
Efficient and robust deep learning with Correntropy-induced loss function
Authors
Liangjun Chen
Hua Qu
Jihong Zhao
Badong Chen
Jose C. Principe
Publication date
01.05.2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 4/2016
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-015-1916-x
