
2020 | OriginalPaper | Chapter

Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-performing Gradient Descent

Authors: Alper Yegenoglu, Kai Krajsek, Sandra Diaz Pier, Michael Herty

Published in: Machine Learning, Optimization, and Data Science

Publisher: Springer International Publishing


Abstract

The successful training of deep neural networks depends on the initialization scheme and the choice of activation function. Poorly chosen parameter settings lead to the well-known problem of exploding or vanishing gradients, which arises when gradient descent with backpropagation is applied. In this setting, the Ensemble Kalman Filter (EnKF) can serve as an alternative optimizer for training neural networks. The EnKF does not require the explicit calculation of gradients or adjoints, and we show that this resolves the exploding and vanishing gradient problem. We analyze different parameter initializations, propose dynamically changing the ensembles, and compare the results to established methods.
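
As a rough illustration of the gradient-free update the abstract describes, below is a minimal sketch of one ensemble Kalman step over a set of flattened network parameter vectors. The names (`forward`, `gamma`) and the scalar observation-noise covariance are illustrative assumptions, not the authors' implementation; the paper's exact variant, including the proposed dynamic ensemble changes, may differ.

```python
import numpy as np

def enkf_step(ensemble, forward, y, gamma=0.01):
    """One ensemble Kalman update (sketch, not the paper's exact variant).

    ensemble: (J, d) array of J flattened network parameter vectors.
    forward:  callable mapping a (J, d) ensemble to (J, k) network outputs.
    y:        (k,) target vector the outputs should match.
    gamma:    scalar noise level, i.e. Gamma = gamma * I (an assumption).
    """
    J = ensemble.shape[0]
    G = forward(ensemble)                 # (J, k) predictions, one per member

    U = ensemble - ensemble.mean(axis=0)  # parameter deviations from the mean
    D = G - G.mean(axis=0)                # prediction deviations from the mean
    C_up = U.T @ D / J                    # (d, k) parameter/output cross-covariance
    C_pp = D.T @ D / J                    # (k, k) output covariance

    # Kalman gain; note no gradients or adjoints of `forward` are needed.
    K = C_up @ np.linalg.inv(C_pp + gamma * np.eye(C_pp.shape[0]))
    return ensemble + (y - G) @ K.T       # nudge every member toward the data
```

Iterating this update drives the ensemble toward parameters that fit the data using only forward evaluations of the network, which is why the vanishing and exploding gradients of backpropagation cannot occur here.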


Footnotes
1. v.1.2.0.
3. Following PyTorch's nomenclature.
Metadata
Title
Ensemble Kalman Filter Optimizing Deep Neural Networks: An Alternative Approach to Non-performing Gradient Descent
Authors
Alper Yegenoglu
Kai Krajsek
Sandra Diaz Pier
Michael Herty
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-64580-9_7
