Published in: Artificial Intelligence Review, Issue 6/2020

05.12.2019

A survey of regularization strategies for deep models

Authors: Reza Moradi, Reza Berangi, Behrouz Minaei



Abstract

The most critical concern in machine learning is building an algorithm that performs well both on training data and on new data. The no free lunch theorem implies that each task needs a machine learning algorithm tailored to it: a set of strategies and preferences is built into learning machines to tune them for the problem at hand. These strategies and preferences, whose core concern is improving generalization, are collectively known as regularization. Because deep models have a very large number of parameters, a great many regularization methods are available to the deep learning community, and developing more effective strategies has been the subject of significant research effort in recent years. However, it is difficult for practitioners to choose the most suitable strategy for a given problem, because comparative studies of the performance of different strategies are lacking. In this paper, we first present and analyze the most effective regularization methods and their variants in a systematic way. We then present a comparative study in which the test errors and computational costs of these techniques are evaluated in a convolutional neural network on the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html). Finally, the regularization methods are compared in terms of network accuracy, the number of epochs needed to train the network, and the number of operations per input sample, and the results are discussed and interpreted in light of each strategy. The experiments show that weight decay and data augmentation incur little computational overhead and can be used in most applications. Given sufficient computational resources, the Dropout family of methods is a rational choice; given abundant resources, the batch normalization family and ensemble methods are also reasonable strategies.
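The abstract does not specify the network or hyperparameters used in the experiments. As a minimal illustrative sketch, the PyTorch snippet below combines the regularizers the survey compares on CIFAR-10: data augmentation via input transforms, weight decay via the optimizer, and dropout plus batch normalization inside a small CNN. The architecture and all hyperparameter values here are assumptions for illustration, not the authors' experimental setup.

```python
# Hedged sketch: combining the surveyed regularizers on CIFAR-10.
# Architecture and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Data augmentation: random crops and horizontal flips, a common CIFAR-10 recipe.
train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Small CNN using batch normalization after each convolution and
# dropout before the classifier.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(128 * 8 * 8, 10),
)

# Weight decay (L2 regularization) is applied through the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Note how the cheapest strategies the paper recommends (weight decay, augmentation) are one-line additions to the optimizer and the data pipeline, while dropout and batch normalization change the network itself and add per-sample computation.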


Footnotes
1. A learning algorithm that compares new problem instances with instances in the training set.
2. A model in which a graph expresses the conditional dependence structure between random variables.
3. Natural Language Processing.
4. Part-of-speech tagging.
5. Named-entity recognition.
6. Semantic-role labeling.
7. Dense Convolutional Network.
8. Long Short-Term Memory.
Metadata
Title
A survey of regularization strategies for deep models
Authors
Reza Moradi
Reza Berangi
Behrouz Minaei
Publication date
05.12.2019
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 6/2020
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-019-09784-7
