Published in: Neural Computing and Applications 6/2019

19-09-2018 | S.I. : EANN 2017

Long-term temporal averaging for stochastic optimization of deep neural networks

Authors: Nikolaos Passalis, Anastasios Tefas

Abstract

Deep learning models can successfully tackle many difficult tasks. However, training deep neural models is not always straightforward due to several well-known issues, such as vanishing and exploding gradients. Furthermore, the stochastic nature of most commonly used optimization techniques inevitably leads to instabilities during training, even when state-of-the-art stochastic optimizers are employed. In this work, we propose an advanced temporal averaging technique that stabilizes the convergence of stochastic optimization for neural network training. Six different datasets and evaluation setups are used to extensively evaluate the proposed method and demonstrate its performance benefits. The more stable convergence of the algorithm also reduces the risk of stopping the training process just after a bad descent step was taken or when the learning rate was not set appropriately.
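The abstract's full text is behind a paywall, so the paper's exact update rule is not reproduced here. As a hedged illustration of the general idea of temporal averaging, the sketch below assumes a Polyak-style exponential moving average of the weight iterates maintained alongside plain SGD; the function name `sgd_with_averaging`, the toy quadratic objective, and all hyperparameter values are illustrative assumptions, not the authors' method.

```python
import random

def sgd_with_averaging(grad, w0, lr=0.1, beta=0.99, steps=2000, seed=0):
    """Plain SGD on noisy gradients while also maintaining an exponential
    moving average of the iterates (a Polyak-style temporal average).

    NOTE: illustrative sketch only -- the paper's actual averaging scheme
    may differ (e.g., in how the averaging weight evolves over time).
    """
    rng = random.Random(seed)
    w = list(w0)          # raw SGD iterate
    w_avg = list(w0)      # temporally averaged iterate
    for _ in range(steps):
        # Simulate a stochastic gradient: true gradient + Gaussian noise.
        g = [gi + rng.gauss(0.0, 0.5) for gi in grad(w)]
        # Raw (noisy) SGD step.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        # Exponential moving average of the weights smooths the trajectory.
        w_avg = [beta * ai + (1.0 - beta) * wi
                 for ai, wi in zip(w_avg, w)]
    return w, w_avg

# Toy quadratic with minimum at w* = (1, -2); its gradient is 2 * (w - w*).
w_star = (1.0, -2.0)
grad = lambda w: [2.0 * (wi - si) for wi, si in zip(w, w_star)]
w_last, w_smooth = sgd_with_averaging(grad, w0=(0.0, 0.0))
```

Under these assumptions, the raw iterate `w_last` keeps bouncing around the minimum due to gradient noise, while the averaged iterate `w_smooth` sits much closer to it, which mirrors the stabilization effect the abstract describes.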


Metadata
Title
Long-term temporal averaging for stochastic optimization of deep neural networks
Authors
Nikolaos Passalis
Anastasios Tefas
Publication date
19-09-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 6/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3712-x
