
2018 | Original Paper | Book Chapter

3. Training Deep Neural Networks

Author: Charu C. Aggarwal

Published in: Neural Networks and Deep Learning

Publisher: Springer International Publishing


Abstract

The procedure for training neural networks with backpropagation was briefly introduced in Chapter 1. This chapter expands on that description in several ways.
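As a concrete reference point for the backpropagation procedure that the chapter builds on, here is a minimal sketch (our illustration, not code from the book; the network size, learning rate, and variable names are arbitrary choices) of one stochastic gradient-descent update for a single-hidden-layer ReLU network with squared loss.

```python
# Minimal sketch: one backpropagation + gradient-descent step for a
# single-hidden-layer ReLU network with squared loss.
import numpy as np

rng = np.random.default_rng(0)
d, hidden_dim, lr = 4, 8, 0.01                               # input size, hidden units, learning rate (illustrative)
W1 = rng.normal(0, 1.0 / np.sqrt(d), (hidden_dim, d))        # fan-in scaled initialization
W2 = rng.normal(0, 1.0 / np.sqrt(hidden_dim), (1, hidden_dim))

x, y = rng.normal(size=(d, 1)), np.array([[1.0]])            # one training pair

# Forward pass: h = relu(W1 x), y_hat = W2 h, loss = 0.5 * (y_hat - y)^2
a1 = W1 @ x
h = np.maximum(a1, 0.0)
y_hat = W2 @ h
loss = 0.5 * float((y_hat - y) ** 2)

# Backward pass: apply the chain rule layer by layer.
delta_out = y_hat - y                          # dL/dy_hat
grad_W2 = delta_out @ h.T                      # dL/dW2
delta_hidden = (W2.T @ delta_out) * (a1 > 0)   # backpropagate through the ReLU
grad_W1 = delta_hidden @ x.T                   # dL/dW1

# Gradient-descent update.
W2 -= lr * grad_W2
W1 -= lr * grad_W1
print(f"loss before update: {loss:.4f}")
```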


Footnotes
1
Although the backpropagation algorithm was popularized by the Rumelhart et al. papers [408, 409], it had been studied earlier in the context of control theory. Crucially, Paul Werbos’s forgotten (and eventually rediscovered) thesis in 1974 discussed how these backpropagation methods could be used in neural networks. This was well before Rumelhart et al.’s papers in 1986, which were nevertheless significant because the style of presentation contributed to a better understanding of why backpropagation might work.
 
2
A different type of manifestation occurs in cases where the parameters in earlier and later layers are shared. In such cases, the effect of an update can be highly unpredictable because of the combined effect of different layers. Such scenarios occur in recurrent neural networks, in which the parameters in later temporal layers are tied to those of earlier temporal layers. In such cases, small changes in the parameters can cause large changes in the loss function in very localized regions without any gradient-based indication in nearby regions. Such topological characteristics of the loss function are referred to as cliffs (cf. Section 3.5.4), and they make the problem harder to optimize because gradient descent tends to either overshoot or undershoot.
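To make the cliff behavior concrete, the short sketch below (our illustration, not code from the book; the linear tied-weight "recurrence" and the clipping threshold are assumed for demonstration) shows how a loss that depends on a tied weight raised to the power of the sequence length changes abruptly near w = 1, and how gradient-norm clipping, as proposed by Pascanu et al. [368], caps the update so that gradient descent does not overshoot the cliff.

```python
# Sketch of the "cliff" effect: with the same weight w applied at every time step,
# the loss depends on w**T, so a tiny change in w near w = 1 produces an enormous
# change in loss and gradient.
import numpy as np

T, target = 50, 0.0                 # number of tied (temporal) layers, illustrative target

def loss_and_grad(w, x=1.0):
    """Loss 0.5 * (w**T * x - target)**2 for a linear 'RNN' with tied weight w."""
    y = (w ** T) * x
    grad = (y - target) * T * (w ** (T - 1)) * x   # d(loss)/dw by the chain rule
    return 0.5 * (y - target) ** 2, grad

for w in (0.95, 1.00, 1.05):
    L, g = loss_and_grad(w)
    print(f"w={w:.2f}  loss={L:.3e}  gradient={g:.3e}")   # loss and gradient jump sharply past w = 1

# A common remedy, gradient-norm clipping (cf. Pascanu et al. [368]), caps the step size.
def clip_gradient(grad, max_norm=1.0):
    norm = abs(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

_, g = loss_and_grad(1.05)
print("clipped gradient:", clip_gradient(g))
```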
 
3
In most of this book, we have worked with \(\overline{W}\) as a row-vector. However, it is notationally convenient here to work with \(\overline{W}\) as a column-vector.
 
References
[7] R. Ahuja, T. Magnanti, and J. Orlin. Network flows: Theory, algorithms, and applications. Prentice Hall, 1993.
[13] J. Ba and R. Caruana. Do deep nets really need to be deep? NIPS Conference, pp. 2654–2662, 2014.
[23] M. Bazaraa, H. Sherali, and C. Shetty. Nonlinear programming: theory and algorithms. John Wiley and Sons, 2013.
[24] S. Becker and Y. LeCun. Improving the convergence of back-propagation learning with second order methods. Proceedings of the 1988 Connectionist Models Summer School, pp. 29–37, 1988.
[36] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl. Algorithms for hyper-parameter optimization. NIPS Conference, pp. 2546–2554, 2011.
[37] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, pp. 281–305, 2012.
[38] J. Bergstra, D. Yamins, and D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. ICML Conference, pp. 115–123, 2013.
[39] D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
[41] C. M. Bishop. Neural networks for pattern recognition. Oxford University Press, 1995.
[42] C. M. Bishop. Bayesian techniques. Chapter 10 in "Neural Networks for Pattern Recognition," pp. 385–439, 1995.
[54] A. Bryson. A gradient method for optimizing multi-stage allocation processes. Harvard University Symposium on Digital Computers and their Applications, 1961.
[55] C. Bucilu, R. Caruana, and A. Niculescu-Mizil. Model compression. ACM KDD Conference, pp. 535–541, 2006.
[66] W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. ICML Conference, pp. 2285–2294, 2015.
[74] A. Coates, B. Huval, T. Wang, D. Wu, A. Ng, and B. Catanzaro. Deep learning with COTS HPC systems. ICML Conference, pp. 1337–1345, 2013.
[88] Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NIPS Conference, pp. 2933–2941, 2014.
[91] J. Dean et al. Large scale distributed deep networks. NIPS Conference, 2012.
[94] M. Denil, B. Shakibi, L. Dinh, M. A. Ranzato, and N. de Freitas. Predicting parameters in deep learning. NIPS Conference, pp. 2148–2156, 2013.
[96] G. Desjardins, K. Simonyan, and R. Pascanu. Natural neural networks. NIPS Conference, pp. 2071–2079, 2015.
[108] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, pp. 2121–2159, 2011.
[140] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, pp. 249–256, 2010.
[141] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. AISTATS, 15(106), 2011.
[146] I. Goodfellow, O. Vinyals, and A. Saxe. Qualitatively characterizing neural network optimization problems. arXiv:1412.6544, 2014. [Also appears in International Conference on Learning Representations, 2015.] https://arxiv.org/abs/1412.6544
[148] I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv:1302.4389, 2013.
[167] R. Hahnloser and H. S. Seung. Permitted and forbidden sets in symmetric threshold-linear networks. NIPS Conference, pp. 217–223, 2001.
[168] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. Horowitz, and W. Dally. EIE: Efficient inference engine for compressed neural network. ACM SIGARCH Computer Architecture News, 44(3), pp. 243–254, 2016.
[169] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural networks. NIPS Conference, pp. 1135–1143, 2015.
[171] M. Hardt, B. Recht, and Y. Singer. Train faster, generalize better: Stability of stochastic gradient descent. ICML Conference, pp. 1225–1234, 2016.
[183] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.
[184] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[189] M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 1952.
[194] G. Hinton. Neural networks for machine learning. Coursera Video, 2012.
[202] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. NIPS Workshop, 2014.
[205] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
[213]
[214] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[217] R. Jacobs. Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4), pp. 295–307, 1988.
[221] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? International Conference on Computer Vision (ICCV), 2009.
[237] H. J. Kelley. Gradient theory of optimal flight paths. ARS Journal, 30(10), pp. 947–954, 1960.
[255] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. NIPS Conference, pp. 1097–1105, 2012.
[273] Q. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Ng. On optimization methods for deep learning. ICML Conference, pp. 265–272, 2011.
[278] Y. LeCun, L. Bottou, G. Orr, and K. Muller. Efficient backprop. In G. Orr and K. Muller (eds.), Neural Networks: Tricks of the Trade, Springer, 1998.
[282] Y. LeCun, J. Denker, and S. Solla. Optimal brain damage. NIPS Conference, pp. 598–605, 1990.
[300] D. Luenberger and Y. Ye. Linear and nonlinear programming. Addison-Wesley, 1984.
[306] D. J. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3), pp. 448–472, 1992.
[313] J. Martens. Deep learning via Hessian-free optimization. ICML Conference, pp. 735–742, 2010.
[314] J. Martens and I. Sutskever. Learning recurrent neural networks with Hessian-free optimization. ICML Conference, pp. 1033–1040, 2011.
[316] J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. ICML Conference, 2015.
[324] T. Mikolov. Statistical language models based on neural networks. Ph.D. thesis, Brno University of Technology, 2012.
[330] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.
[353] Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Mathematics Doklady, 27, pp. 372–376, 1983.
[359] J. Nocedal and S. Wright. Numerical optimization. Springer, 2006.
[362] G. Orr and K.-R. Müller (editors). Neural Networks: Tricks of the Trade. Springer, 1998.
[368] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. ICML Conference, 28, pp. 1310–1318, 2013.
[369] R. Pascanu, T. Mikolov, and Y. Bengio. Understanding the exploding gradient problem. CoRR, abs/1211.5063, 2012.
[376] E. Polak. Computational methods in optimization: a unified approach. Academic Press, 1971.
[380] B. Polyak and A. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4), pp. 838–855, 1992.
[408] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323(6088), pp. 533–536, 1986.
[409] D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by back-propagating errors. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, pp. 318–362, 1986.
[419] T. Salimans and D. Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. NIPS Conference, pp. 901–909, 2016.
[426] A. Saxe, J. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120, 2013.
[429] T. Schaul, S. Zhang, and Y. LeCun. No more pesky learning rates. ICML Conference, pp. 343–351, 2013.
[443] J. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical Report CMU-CS-94-125, Carnegie Mellon University, 1994.
[458] J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. NIPS Conference, pp. 2951–2959, 2013.
[478] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. ICML Conference, pp. 1139–1147, 2013.
[490] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. ACM KDD Conference, pp. 847–855, 2013.
[524] P. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Harvard University, 1974.
[525] P. Werbos. The roots of backpropagation: from ordered derivatives to neural networks and political forecasting (Vol. 1). John Wiley and Sons, 1994.
[532] S. Wieseler and H. Ney. A convergence analysis of log-linear training. NIPS Conference, pp. 657–665, 2011.
[545] H. Yu and B. Wilamowski. Levenberg–Marquardt training. Industrial Electronics Handbook, 5(12), 1, 2011.
Metadata
Title
Training Deep Neural Networks
Author
Charu C. Aggarwal
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-94463-0_3