Published in: Artificial Intelligence Review, Issue 6/2020

05.12.2019

A survey of regularization strategies for deep models

Authors: Reza Moradi, Reza Berangi, Behrouz Minaei



Abstract

The most critical concern in machine learning is building an algorithm that performs well both on training data and on new data. The no free lunch theorem implies that each task needs a machine learning algorithm tailored to it: a set of strategies and preferences is built into learning machines to tune them for the problem at hand. These strategies and preferences, whose core concern is improving generalization, are collectively known as regularization. Because deep models have a very large number of parameters, a great many regularization methods are available to the deep learning community, and developing more effective strategies has been the subject of significant research effort in recent years. However, it is difficult for practitioners to choose the most suitable strategy for a given problem, because comparative studies of the performance of different strategies are lacking. In this paper, we first present and analyze the most effective regularization methods and their variants in a systematic way. We then present a comparative study in which the test errors and computational costs of these techniques are evaluated in a convolutional neural network on the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html). Finally, the regularization methods are compared in terms of network accuracy, the number of epochs needed to train the network, and the number of operations per input sample, and the results are discussed and interpreted in light of each strategy. The experiments show that weight decay and data augmentation incur little computational overhead and can be used in most applications. Given sufficient computational resources, the Dropout family of methods is a rational choice; given abundant resources, the batch normalization family and ensemble methods are also reasonable strategies.
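The abstract does not specify the network or hyperparameters used in the experiments. As a minimal illustrative sketch, the PyTorch snippet below combines the regularizers the survey compares on CIFAR-10: data augmentation via input transforms, weight decay via the optimizer, and dropout plus batch normalization inside a small CNN. The architecture and all hyperparameter values here are assumptions for illustration, not the authors' experimental setup.

```python
# Hedged sketch: combining the surveyed regularizers on CIFAR-10.
# Architecture and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Data augmentation: random crops and horizontal flips, a common CIFAR-10 recipe.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Small CNN using batch normalization after each convolution and
# dropout before the classifier.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(128 * 8 * 8, 10),
)

# Weight decay (L2 regularization) is applied through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Note how the cheapest strategies the paper recommends (weight decay, augmentation) are one-line additions to the optimizer and the data pipeline, while dropout and batch normalization change the network itself and add per-sample computation.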


Footnotes
1. A learning algorithm that compares new problem instances with instances in the training set.
2. A model in which a graph expresses the conditional dependence structure between random variables.
3. Natural Language Processing.
4. Part-of-speech tagging.
5. Named-entity recognition.
6. Semantic-role labeling.
7. Dense Convolutional Network.
8. Long Short-Term Memory.
Metadata
Title
A survey of regularization strategies for deep models
Authors
Reza Moradi
Reza Berangi
Behrouz Minaei
Publication date
05.12.2019
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 6/2020
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-019-09784-7
