Published in: Soft Computing 2/2021

05.09.2020 | Methodologies and Application

An improved optimization technique using Deep Neural Networks for digit recognition

Authors: T. Senthil, C. Rajan, J. Deepika


Abstract

In the world of information retrieval, recognizing handwritten digits is an interesting application of machine learning (deep learning). Although the field is mature, recognizing digits with an effective soft-computing-based optimization remains a challenging task, and training such a system on larger data often fails because of the computation and storage it demands. In this paper, a recurrent deep neural network with a hybrid mini-batch and stochastic Hessian-free optimization (MBSHF) is proposed for accurate and faster convergence of predictions. A second-order approximation is used to solve the local quadratic model efficiently, a step whose cost depends heavily on computation and storage. The proposed technique also uses an iterative minimization algorithm for faster convergence from a random initialization, even though a large number of additional parameters is involved. As a solution, a convex approximation of the MBSHF optimization is formulated, and its performance on the standard MNIST dataset is discussed. A recurrent deep neural network of up to 20 layers is successfully trained using the proposed MBSHF optimization, yielding better computation and storage behavior. The results are compared with other standard optimization techniques: mini-batch stochastic gradient descent (MBSGD), stochastic gradient descent (SGD), stochastic Hessian-free optimization (SHF), Hessian-free optimization (HF) and nonlinear conjugate gradient (NCG). On a testing sample of 50,000 images, the proposed technique produced recognition accuracy that was on average 12.2% higher than MBSGD, 27.2% higher than SHF, 35.4% higher than HF, 40.2% higher than NCG and 32% higher than SGD.
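
To make the core idea concrete, the following is a minimal, illustrative sketch of one mini-batch stochastic Hessian-free step in the spirit of MBSHF, shown on plain logistic regression in NumPy rather than the paper's recurrent network. All names (loss_grad, hessian_vector_product, conjugate_gradient, mbshf_step) and the finite-difference Hessian-vector product are illustrative assumptions, not the authors' implementation; the damping term stands in for the convex approximation of the local quadratic model.

```python
# Minimal sketch of one mini-batch stochastic Hessian-free (MBSHF-style) step,
# illustrated on logistic regression with NumPy. Names and the finite-difference
# Hessian-vector product are illustrative assumptions, not the paper's code.
import numpy as np

def loss_grad(w, X, y):
    """Logistic loss and gradient on a mini-batch."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def hessian_vector_product(w, X, y, v, eps=1e-5):
    """Approximate H @ v by central finite differences of the gradient
    (a cheap stand-in for an exact Hessian- or Gauss-Newton-vector product)."""
    _, g_plus = loss_grad(w + eps * v, X, y)
    _, g_minus = loss_grad(w - eps * v, X, y)
    return (g_plus - g_minus) / (2 * eps)

def conjugate_gradient(hvp, b, max_iters=20, tol=1e-6, damping=1e-2):
    """Solve (H + damping*I) d = b with CG; the damping keeps the local
    quadratic model positive definite (a convex approximation)."""
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hp = hvp(p) + damping * p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def mbshf_step(w, X_batch, y_batch, lr=1.0):
    """One second-order update: build the quadratic model on the mini-batch,
    solve for the Newton-like direction with CG, then take the step."""
    _, g = loss_grad(w, X_batch, y_batch)
    hvp = lambda v: hessian_vector_product(w, X_batch, y_batch, v)
    direction = conjugate_gradient(hvp, -g)
    return w + lr * direction

# Toy usage on random data standing in for flattened 28x28 MNIST digits.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 784))
y = (rng.random(256) > 0.5).astype(float)
w = np.zeros(784)
for start in range(0, len(X), 64):          # mini-batches of 64
    w = mbshf_step(w, X[start:start + 64], y[start:start + 64])
```

In practice, Hessian-free methods replace the finite-difference product above with an exact Hessian- or Gauss-Newton-vector product computed by automatic differentiation, which is cheaper per CG iteration and more numerically stable.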


Metadata
Title
An improved optimization technique using Deep Neural Networks for digit recognition
Authors
T. Senthil
C. Rajan
J. Deepika
Publication date
05.09.2020
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 2/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05262-3
