
2020 | OriginalPaper | Chapter

Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-performing Gradient Descent

Authors: Alper Yegenoglu, Kai Krajsek, Sandra Diaz Pier, Michael Herty

Published in: Machine Learning, Optimization, and Data Science

Publisher: Springer International Publishing


Abstract

The successful training of deep neural networks depends on the initialization scheme and the choice of activation function. Poorly chosen parameter settings lead to the well-known problem of exploding or vanishing gradients, which arises when gradient descent with backpropagation is applied. In this setting, the Ensemble Kalman Filter (EnKF) can serve as an alternative optimizer for training neural networks. The EnKF does not require the explicit calculation of gradients or adjoints, and we show that this resolves the exploding and vanishing gradient problem. We analyze different parameter initializations, propose dynamically changing the ensembles, and compare the results to established methods.
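
As a rough illustration of the gradient-free update the abstract describes, below is a minimal sketch of one ensemble Kalman step over a set of flattened network parameter vectors. The names (`forward`, `gamma`) and the scalar observation-noise covariance are illustrative assumptions, not the authors' implementation; the paper's exact variant, including the proposed dynamic ensemble changes, may differ.

```python
import numpy as np

def enkf_step(ensemble, forward, y, gamma=0.01):
    """One ensemble Kalman update (sketch, not the paper's exact variant).

    ensemble: (J, d) array of J flattened network parameter vectors.
    forward:  callable mapping a (J, d) ensemble to (J, k) network outputs.
    y:        (k,) target vector the outputs should match.
    gamma:    scalar noise level, i.e. Gamma = gamma * I (an assumption).
    """
    J = ensemble.shape[0]
    G = forward(ensemble)                 # (J, k) predictions, one per member

    U = ensemble - ensemble.mean(axis=0)  # parameter deviations from the mean
    D = G - G.mean(axis=0)                # prediction deviations from the mean
    C_up = U.T @ D / J                    # (d, k) parameter/output cross-covariance
    C_pp = D.T @ D / J                    # (k, k) output covariance

    # Kalman gain; note no gradients or adjoints of `forward` are needed.
    K = C_up @ np.linalg.inv(C_pp + gamma * np.eye(C_pp.shape[0]))
    return ensemble + (y - G) @ K.T       # nudge every member toward the data
```

Iterating this update drives the ensemble toward parameters that fit the data using only forward evaluations of the network, which is why the vanishing and exploding gradients of backpropagation cannot occur here.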


Footnotes
1. v.1.2.0.
3. Following PyTorch's nomenclature.
Metadata
Title
Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-performing Gradient Descent
Authors
Alper Yegenoglu
Kai Krajsek
Sandra Diaz Pier
Michael Herty
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-64580-9_7
