
2017 | Original Paper | Book Chapter

Efficient Model Averaging for Deep Neural Networks

Authors: Michael Opitz, Horst Possegger, Horst Bischof

Published in: Computer Vision – ACCV 2016

Publisher: Springer International Publishing

Abstract

Large neural networks trained on small datasets are increasingly prone to overfitting. Traditional machine learning methods can reduce overfitting by employing bagging or boosting to train several diverse models. For large neural networks, however, this is prohibitively expensive. To address this issue, we propose a method to leverage the benefits of ensembles without explicitly training several expensive neural network models. In contrast to Dropout, we encourage diversity among the individual sub-networks directly, by maximizing it with a dedicated loss function: DivLoss. We demonstrate the effectiveness of DivLoss on the challenging CIFAR datasets.
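The abstract does not reproduce the DivLoss formulation itself, so the following is only a minimal sketch of what a diversity-promoting ensemble loss can look like, written in PyTorch and assuming several sub-networks that each emit their own logits for a batch. The negative-correlation form (in the spirit of Liu and Yao's "Ensemble learning via negative correlation", 1999), the function name diversity_loss, and the weight lam are illustrative assumptions, not the paper's definition.

```python
# Hedged sketch only: the chapter's exact DivLoss is not shown on this page.
# This illustrates one plausible reading: a negative-correlation-style penalty
# that trades per-member accuracy against disagreement with the ensemble mean.
import torch
import torch.nn.functional as F

def diversity_loss(member_logits, targets, lam=0.1):
    """member_logits: list of (batch, classes) logit tensors, one per sub-network.
    targets: (batch,) integer class labels.
    lam: weight of the diversity term (hypothetical hyperparameter)."""
    probs = [F.softmax(logits, dim=1) for logits in member_logits]
    ensemble = torch.stack(probs).mean(dim=0)  # model-averaged prediction

    # Standard task loss, averaged over the ensemble members.
    task = torch.stack(
        [F.cross_entropy(logits, targets) for logits in member_logits]
    ).mean()

    # Diversity term: squared deviation of each member from the ensemble
    # mean. Subtracting it rewards members that disagree with the average.
    spread = torch.stack(
        [((p - ensemble) ** 2).sum(dim=1).mean() for p in probs]
    ).mean()

    return task - lam * spread
```

With lam = 0 this reduces to training each member independently on cross-entropy; a larger lam pushes the members' predictions apart, which is the effect the abstract attributes to DivLoss as opposed to the implicit averaging performed by Dropout.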

Metadata
Title
Efficient Model Averaging for Deep Neural Networks
Authors
Michael Opitz
Horst Possegger
Horst Bischof
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-54184-6_13
