Published in: Soft Computing 2/2021

05.09.2020 | Methodologies and Application

An improved optimization technique using Deep Neural Networks for digit recognition

Authors: T. Senthil, C. Rajan, J. Deepika


Abstract

In the world of information retrieval, recognizing handwritten digits is an interesting application of machine learning (deep learning). Although the field is mature, recognizing digits with an effective soft-computing-based optimization remains a challenging task, and training such a system on larger data often fails because of the computation and storage it demands. In this paper, a recurrent deep neural network with a hybrid mini-batch and stochastic Hessian-free optimization (MBSHF) is proposed for accurate and faster convergence of predictions. A second-order approximation is used to solve the local quadratic model efficiently, a step whose cost depends heavily on computation and storage. The proposed technique also uses an iterative minimization algorithm for faster convergence from a random initialization, even though a large number of additional parameters is involved. As a solution, a convex approximation of the MBSHF optimization is formulated, and its performance on the standard MNIST dataset is discussed. A recurrent deep neural network of up to 20 layers is successfully trained using the proposed MBSHF optimization, yielding better computation and storage behavior. The results are compared with other standard optimization techniques: mini-batch stochastic gradient descent (MBSGD), stochastic gradient descent (SGD), stochastic Hessian-free optimization (SHF), Hessian-free optimization (HF) and nonlinear conjugate gradient (NCG). On a testing sample of 50,000 images, the proposed technique produced recognition accuracy that was on average 12.2% higher than MBSGD, 27.2% higher than SHF, 35.4% higher than HF, 40.2% higher than NCG and 32% higher than SGD.
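
To make the core idea concrete, the following is a minimal, illustrative sketch of one mini-batch stochastic Hessian-free step in the spirit of MBSHF, shown on plain logistic regression in NumPy rather than the paper's recurrent network. All names (loss_grad, hessian_vector_product, conjugate_gradient, mbshf_step) and the finite-difference Hessian-vector product are illustrative assumptions, not the authors' implementation; the damping term stands in for the convex approximation of the local quadratic model.

```python
# Minimal sketch of one mini-batch stochastic Hessian-free (MBSHF-style) step,
# illustrated on logistic regression with NumPy. Names and the finite-difference
# Hessian-vector product are illustrative assumptions, not the paper's code.
import numpy as np

def loss_grad(w, X, y):
    """Logistic loss and gradient on a mini-batch."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def hessian_vector_product(w, X, y, v, eps=1e-5):
    """Approximate H @ v by central finite differences of the gradient
    (a cheap stand-in for an exact Hessian- or Gauss-Newton-vector product)."""
    _, g_plus = loss_grad(w + eps * v, X, y)
    _, g_minus = loss_grad(w - eps * v, X, y)
    return (g_plus - g_minus) / (2 * eps)

def conjugate_gradient(hvp, b, max_iters=20, tol=1e-6, damping=1e-2):
    """Solve (H + damping*I) d = b with CG; the damping keeps the local
    quadratic model positive definite (a convex approximation)."""
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hp = hvp(p) + damping * p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def mbshf_step(w, X_batch, y_batch, lr=1.0):
    """One second-order update: build the quadratic model on the mini-batch,
    solve for the Newton-like direction with CG, then take the step."""
    _, g = loss_grad(w, X_batch, y_batch)
    hvp = lambda v: hessian_vector_product(w, X_batch, y_batch, v)
    direction = conjugate_gradient(hvp, -g)
    return w + lr * direction

# Toy usage on random data standing in for flattened 28x28 MNIST digits.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 784))
y = (rng.random(256) > 0.5).astype(float)
w = np.zeros(784)
for start in range(0, len(X), 64):          # mini-batches of 64
    w = mbshf_step(w, X[start:start + 64], y[start:start + 64])
```

In practice, Hessian-free methods replace the finite-difference product above with an exact Hessian- or Gauss-Newton-vector product computed by automatic differentiation, which is cheaper per CG iteration and more numerically stable.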


Metadata
Title
An improved optimization technique using Deep Neural Networks for digit recognition
Authors
T. Senthil
C. Rajan
J. Deepika
Publication date
05.09.2020
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 2/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-020-05262-3
