
21-11-2017

A New Conjugate Gradient Method with Smoothing \(L_{1/2} \) Regularization Based on a Modified Secant Equation for Training Neural Networks

Authors: Wenyu Li, Yan Liu, Jie Yang, Wei Wu

Published in: Neural Processing Letters | Issue 2/2018


Abstract

This paper proposes a new conjugate gradient method with smoothing \(L_{1/2}\) regularization, based on a modified secant equation, for training neural networks. A descent search direction is generated by selecting an adaptive learning rate that satisfies the strong Wolfe conditions. Two adaptive parameters are introduced so that the new training method possesses both the quasi-Newton property and the sufficient descent property. Numerical experiments on five benchmark classification problems from the UCI repository show that, compared with other conjugate gradient training algorithms, the new algorithm has roughly the same or better learning capacity, together with significantly better generalization capacity and network sparsity. Under mild assumptions, a global convergence result for the proposed training method is also proved.
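The abstract does not reproduce the paper's formulas, so the following is only a minimal illustrative sketch of the ingredients it names: a training loss augmented with a smoothed \(L_{1/2}\) penalty, a conjugate gradient direction update, and a strong-Wolfe line search standing in for the adaptive learning-rate selection. The piecewise-polynomial smoothing of \(|w|\), the PR+ choice of the conjugacy parameter, SciPy's line_search, and all parameter values are assumptions made for illustration; the paper's modified secant equation and its two adaptive parameters are not implemented here.

```python
# Illustrative sketch only, not the paper's method: a generic CG training loop
# on a loss with a smoothed L_{1/2} penalty and a standard strong-Wolfe search.
import numpy as np
from scipy.optimize import line_search

def smooth_abs(w, a=0.05):
    """One common piecewise-polynomial smoothing of |w| near zero (assumed form)."""
    inner = -w**4 / (8 * a**3) + 3 * w**2 / (4 * a) + 3 * a / 8
    return np.where(np.abs(w) >= a, np.abs(w), inner)

def smooth_abs_grad(w, a=0.05):
    """Derivative of the smoothing above; matches sign(w) for |w| >= a."""
    inner = -w**3 / (2 * a**3) + 3 * w / (2 * a)
    return np.where(np.abs(w) >= a, np.sign(w), inner)

def penalized_loss(w, loss, lam=1e-3, a=0.05):
    # Smoothed L_{1/2} penalty: lam * sum_i smooth_abs(w_i)^(1/2)
    return loss(w) + lam * np.sum(smooth_abs(w, a) ** 0.5)

def penalized_grad(w, grad, lam=1e-3, a=0.05):
    s = smooth_abs(w, a)                       # bounded below by 3a/8 > 0
    return grad(w) + lam * 0.5 * s ** (-0.5) * smooth_abs_grad(w, a)

def cg_train(w, loss, grad, lam=1e-3, iters=200):
    f = lambda x: penalized_loss(x, loss, lam)
    g = lambda x: penalized_grad(x, grad, lam)
    gk = g(w)
    d = -gk                                    # initial steepest-descent direction
    for _ in range(iters):
        # Strong-Wolfe line search (stand-in for the paper's adaptive rate rule).
        alpha = line_search(f, g, w, d, gfk=gk)[0]
        if alpha is None:
            alpha = 1e-3                       # fall back to a small fixed step
        w_new = w + alpha * d
        g_new = g(w_new)
        beta = max(g_new @ (g_new - gk) / (gk @ gk), 0.0)  # PR+ beta (illustrative)
        d = -g_new + beta * d
        w, gk = w_new, g_new
    return w
```

Because the polynomial branch of the smoothing is bounded away from zero, the penalty stays differentiable at the origin, which is what makes the gradient-based conjugate gradient update well defined while still driving small weights toward zero and producing the sparsity discussed in the abstract.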


Metadata
Title
A New Conjugate Gradient Method with Smoothing \(L_{1/2}\) Regularization Based on a Modified Secant Equation for Training Neural Networks
Authors
Wenyu Li
Yan Liu
Jie Yang
Wei Wu
Publication date
21-11-2017
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2018
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-017-9737-9
