Published in: Neural Computing and Applications 6/2019

19-09-2018 | S.I. : EANN 2017

Long-term temporal averaging for stochastic optimization of deep neural networks

Authors: Nikolaos Passalis, Anastasios Tefas

Abstract

Deep learning models can successfully tackle many difficult tasks. However, training deep neural models is not always straightforward due to several well-known issues, such as vanishing and exploding gradients. Furthermore, the stochastic nature of most commonly used optimization techniques inevitably leads to instabilities during training, even when state-of-the-art stochastic optimizers are employed. In this work, we propose an advanced temporal averaging technique that stabilizes the convergence of stochastic optimization for neural network training. Six different datasets and evaluation setups are used to extensively evaluate the proposed method and demonstrate its performance benefits. The more stable convergence of the algorithm also reduces the risk of stopping the training process just after a bad descent step was taken or when the learning rate was not set appropriately.
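The abstract's full text is behind a paywall, so the paper's exact update rule is not reproduced here. As a hedged illustration of the general idea of temporal averaging, the sketch below assumes a Polyak-style exponential moving average of the weight iterates maintained alongside plain SGD; the function name `sgd_with_averaging`, the toy quadratic objective, and all hyperparameter values are illustrative assumptions, not the authors' method.

```python
import random

def sgd_with_averaging(grad, w0, lr=0.1, beta=0.99, steps=2000, seed=0):
    """Plain SGD on noisy gradients while also maintaining an exponential
    moving average of the iterates (a Polyak-style temporal average).

    NOTE: illustrative sketch only -- the paper's actual averaging scheme
    may differ (e.g., in how the averaging weight evolves over time).
    """
    rng = random.Random(seed)
    w = list(w0)          # raw SGD iterate
    w_avg = list(w0)      # temporally averaged iterate
    for _ in range(steps):
        # Simulate a stochastic gradient: true gradient + Gaussian noise.
        g = [gi + rng.gauss(0.0, 0.5) for gi in grad(w)]
        # Raw (noisy) SGD step.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        # Exponential moving average of the weights smooths the trajectory.
        w_avg = [beta * ai + (1.0 - beta) * wi
                 for ai, wi in zip(w_avg, w)]
    return w, w_avg

# Toy quadratic with minimum at w* = (1, -2); its gradient is 2 * (w - w*).
w_star = (1.0, -2.0)
grad = lambda w: [2.0 * (wi - si) for wi, si in zip(w, w_star)]
w_last, w_smooth = sgd_with_averaging(grad, w0=(0.0, 0.0))
```

Under these assumptions, the raw iterate `w_last` keeps bouncing around the minimum due to gradient noise, while the averaged iterate `w_smooth` sits much closer to it, which mirrors the stabilization effect the abstract describes.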


Metadata
Title
Long-term temporal averaging for stochastic optimization of deep neural networks
Authors
Nikolaos Passalis
Anastasios Tefas
Publication date
19-09-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 6/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3712-x
