
21-11-2017

A New Conjugate Gradient Method with Smoothing \(L_{1/2} \) Regularization Based on a Modified Secant Equation for Training Neural Networks

Authors: Wenyu Li, Yan Liu, Jie Yang, Wei Wu

Published in: Neural Processing Letters | Issue 2/2018


Abstract

This paper proposes a new conjugate gradient method with smoothing \(L_{1/2}\) regularization, based on a modified secant equation, for training neural networks. A descent search direction is generated by selecting an adaptive learning rate that satisfies the strong Wolfe conditions. Two adaptive parameters are introduced so that the new training method possesses both the quasi-Newton property and the sufficient descent property. Numerical experiments on five benchmark classification problems from the UCI repository show that, compared with other conjugate gradient training algorithms, the new algorithm has roughly the same or better learning capacity, together with significantly better generalization capacity and network sparsity. Under mild assumptions, a global convergence result for the proposed training method is also proved.
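The abstract does not reproduce the paper's formulas, so the following is only a minimal illustrative sketch of the ingredients it names: a training loss augmented with a smoothed \(L_{1/2}\) penalty, a conjugate gradient direction update, and a strong-Wolfe line search standing in for the adaptive learning-rate selection. The piecewise-polynomial smoothing of \(|w|\), the PR+ choice of the conjugacy parameter, SciPy's line_search, and all parameter values are assumptions made for illustration; the paper's modified secant equation and its two adaptive parameters are not implemented here.

```python
# Illustrative sketch only, not the paper's method: a generic CG training loop
# on a loss with a smoothed L_{1/2} penalty and a standard strong-Wolfe search.
import numpy as np
from scipy.optimize import line_search

def smooth_abs(w, a=0.05):
    """One common piecewise-polynomial smoothing of |w| near zero (assumed form)."""
    inner = -w**4 / (8 * a**3) + 3 * w**2 / (4 * a) + 3 * a / 8
    return np.where(np.abs(w) >= a, np.abs(w), inner)

def smooth_abs_grad(w, a=0.05):
    """Derivative of the smoothing above; matches sign(w) for |w| >= a."""
    inner = -w**3 / (2 * a**3) + 3 * w / (2 * a)
    return np.where(np.abs(w) >= a, np.sign(w), inner)

def penalized_loss(w, loss, lam=1e-3, a=0.05):
    # Smoothed L_{1/2} penalty: lam * sum_i smooth_abs(w_i)^(1/2)
    return loss(w) + lam * np.sum(smooth_abs(w, a) ** 0.5)

def penalized_grad(w, grad, lam=1e-3, a=0.05):
    s = smooth_abs(w, a)                       # bounded below by 3a/8 > 0
    return grad(w) + lam * 0.5 * s ** (-0.5) * smooth_abs_grad(w, a)

def cg_train(w, loss, grad, lam=1e-3, iters=200):
    f = lambda x: penalized_loss(x, loss, lam)
    g = lambda x: penalized_grad(x, grad, lam)
    gk = g(w)
    d = -gk                                    # initial steepest-descent direction
    for _ in range(iters):
        # Strong-Wolfe line search (stand-in for the paper's adaptive rate rule).
        alpha = line_search(f, g, w, d, gfk=gk)[0]
        if alpha is None:
            alpha = 1e-3                       # fall back to a small fixed step
        w_new = w + alpha * d
        g_new = g(w_new)
        beta = max(g_new @ (g_new - gk) / (gk @ gk), 0.0)  # PR+ beta (illustrative)
        d = -g_new + beta * d
        w, gk = w_new, g_new
    return w
```

Because the polynomial branch of the smoothing is bounded away from zero, the penalty stays differentiable at the origin, which is what makes the gradient-based conjugate gradient update well defined while still driving small weights toward zero and producing the sparsity discussed in the abstract.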


Metadata
Title
A New Conjugate Gradient Method with Smoothing \(L_{1/2}\) Regularization Based on a Modified Secant Equation for Training Neural Networks
Authors
Wenyu Li
Yan Liu
Jie Yang
Wei Wu
Publication date
21-11-2017
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2018
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-017-9737-9
