
2018 | Original Paper | Book Chapter

4. Teaching Deep Learners to Generalize

Author: Charu C. Aggarwal

Published in: Neural Networks and Deep Learning

Publisher: Springer International Publishing


Abstract

Neural networks are powerful learners that have repeatedly proven capable of learning complex functions in many domains. However, the great power of neural networks is also their greatest weakness: unless the learning process is designed with care, they often simply overfit the training data.
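The tension the abstract describes can be made concrete with a small experiment. The following sketch is a minimal, hypothetical illustration rather than code from the chapter: it fits a degree-9 polynomial to ten noisy samples of sin(pi*x), once with no penalty and once with an L2 (weight-decay) penalty, a classical remedy for overfitting. All names and parameter values here are invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 10)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

    degree = 9
    X = np.vander(x, degree + 1)  # polynomial feature matrix (10 x 10)

    def ridge_fit(X, y, lam):
        # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    w_plain = ridge_fit(X, y, lam=0.0)   # interpolates the noise exactly
    w_decay = ridge_fit(X, y, lam=1e-2)  # shrunken weights, smoother fit

    x_test = np.linspace(-1.0, 1.0, 200)
    X_test = np.vander(x_test, degree + 1)
    for name, w in [("no penalty", w_plain), ("weight decay", w_decay)]:
        mse = np.mean((X_test @ w - np.sin(np.pi * x_test)) ** 2)
        print(f"{name:12s} test MSE = {mse:.4f}")

With enough parameters to interpolate every training point, the unpenalized fit chases the noise; the weight-decay fit typically attains the lower test error.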


Footnotes
1
Computational errors can be ignored by requiring that |w_i| be at least 10^-6 in order for w_i to be considered truly non-zero.
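As a small, hedged illustration of this convention (the helper below is hypothetical and not taken from the book), one can count how many weights of a trained network clear the 10^-6 tolerance:

    import numpy as np

    def count_nonzero_weights(w, tol=1e-6):
        # A weight counts as truly non-zero only if its magnitude is at
        # least `tol`, so floating-point residue is not mistaken for an
        # active connection.
        return int(np.sum(np.abs(np.asarray(w)) >= tol))

    weights = [0.31, -2e-9, 0.0, -0.52, 7e-7, 1e-6]
    print(count_nonzero_weights(weights))  # prints 3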
 
Metadata
Title
Teaching Deep Learners to Generalize
Author
Charu C. Aggarwal
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94463-0_4
