
2018 | Original Paper | Book Chapter

4. Teaching Deep Learners to Generalize

Author: Charu C. Aggarwal

Published in: Neural Networks and Deep Learning

Publisher: Springer International Publishing


Abstract

Neural networks are powerful learners that have repeatedly proven capable of learning complex functions in many domains. However, the great power of neural networks is also their greatest weakness: unless the learning process is designed with care, they often simply overfit the training data.
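The tension the abstract describes can be made concrete with a small experiment. The following sketch is a minimal, hypothetical illustration rather than code from the chapter: it fits a degree-9 polynomial to ten noisy samples of sin(pi*x), once with no penalty and once with an L2 (weight-decay) penalty, a classical remedy for overfitting. All names and parameter values here are invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 10)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(10)  # noisy targets

    degree = 9
    X = np.vander(x, degree + 1)  # polynomial feature matrix (10 x 10)

    def ridge_fit(X, y, lam):
        # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    w_plain = ridge_fit(X, y, lam=0.0)   # interpolates the noise exactly
    w_decay = ridge_fit(X, y, lam=1e-2)  # shrunken weights, smoother fit

    x_test = np.linspace(-1.0, 1.0, 200)
    X_test = np.vander(x_test, degree + 1)
    for name, w in [("no penalty", w_plain), ("weight decay", w_decay)]:
        mse = np.mean((X_test @ w - np.sin(np.pi * x_test)) ** 2)
        print(f"{name:12s} test MSE = {mse:.4f}")

With enough parameters to interpolate every training point, the unpenalized fit chases the noise; the weight-decay fit typically attains the lower test error.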


Footnotes
1
Computational errors can be ignored by requiring that |w_i| be at least 10^-6 in order for w_i to be considered truly non-zero.
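As a small, hedged illustration of this convention (the helper below is hypothetical and not taken from the book), one can count how many weights of a trained network clear the 10^-6 tolerance:

    import numpy as np

    def count_nonzero_weights(w, tol=1e-6):
        # A weight counts as truly non-zero only if its magnitude is at
        # least `tol`, so floating-point residue is not mistaken for an
        # active connection.
        return int(np.sum(np.abs(np.asarray(w)) >= tol))

    weights = [0.31, -2e-9, 0.0, -0.52, 7e-7, 1e-6]
    print(count_nonzero_weights(weights))  # prints 3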
 
Metadata
Title
Teaching Deep Learners to Generalize
Author
Charu C. Aggarwal
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94463-0_4
