
2019 | OriginalPaper | Chapter

4. Basics of Deep Learning

Authors: Uday Kamath, John Liu, James Whitaker

Published in: Deep Learning for NLP and Speech Recognition

Publisher: Springer International Publishing

Abstract

One of the most talked-about concepts in machine learning, both in the academic community and in the media, is the evolving field of deep learning. The idea of neural networks, and subsequently deep learning, draws its inspiration from the biological structure of the human brain (or that of any brained creature, for that matter).


Footnotes
1
The universal approximation theorem was initially proved for neural network architectures using the sigmoid activation function, but was subsequently shown to apply to all fully connected networks [Cyb89b, HSW89].
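For reference, the single-hidden-layer, sigmoidal form of the theorem can be stated as: for any continuous function f on a compact set K \subset \mathbb{R}^d and any \varepsilon > 0, there exist a width N and parameters \alpha_i, b_i \in \mathbb{R} and w_i \in \mathbb{R}^d such that F(x) = \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^{\top} x + b_i) satisfies |F(x) - f(x)| < \varepsilon for all x \in K.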
 
2
This output of the encoder is sometimes referred to as the code, encoding or embedding.
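For illustration, a minimal encoder-decoder sketch in PyTorch (hypothetical layer sizes), where the encoder's output is that code/embedding:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 784-dimensional inputs compressed to a 32-dimensional code.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)           # a mini-batch of flattened inputs in [0, 1]
code = encoder(x)                 # the "code" / encoding / embedding
reconstruction = decoder(code)    # decoder maps the code back to input space
```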
 
3
If the task has real-valued inputs between 0 and 1, then Bernoulli cross-entropy is a better choice for the objective function.
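As a minimal sketch of this choice (hypothetical shapes, using PyTorch's nn.BCELoss as the Bernoulli cross-entropy objective):

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction setup: targets are real values in [0, 1],
# so Bernoulli (binary) cross-entropy is used as the objective.
criterion = nn.BCELoss()

targets = torch.rand(16, 784)                        # real-valued targets in [0, 1]
logits = torch.randn(16, 784, requires_grad=True)    # stand-in for model outputs
predictions = torch.sigmoid(logits)                  # squash outputs into (0, 1)

loss = criterion(predictions, targets)
loss.backward()                                      # gradients flow back to the logits
```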
 
4
Note: there are no learned parameters in the noise function presented here.
 
6
Note that PyTorch can still train in mini-batch mode. The view function converts the input tensor into the dimensions [n, 1, 1, 3072], where n is the mini-batch size.
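A minimal sketch of that reshaping (assuming a CIFAR-10-style input of 3 × 32 × 32 = 3072 values per image):

```python
import torch

# A mini-batch of n = 8 images, each 3 x 32 x 32 = 3072 values (assumed shape).
batch = torch.randn(8, 3, 32, 32)

# view() reshapes the tensor without copying data; the leading -1 lets
# PyTorch infer the mini-batch size n from the remaining dimensions.
flat = batch.view(-1, 1, 1, 3072)

print(flat.shape)   # torch.Size([8, 1, 1, 3072])
```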
 
Literature
[Aba+15]
Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015.
[Bis95]
Christopher M Bishop. “Regularization and complexity control in feed-forward networks”. In: (1995).
[BGW18]
Sebastian Bock, Josef Goppold, and Martin Weiß. “An improvement of the convergence proof of the ADAM-Optimizer”. In: arXiv preprint arXiv:1804.10587 (2018).
[Cho+15a]
Anna Choromanska et al. “The loss surfaces of multilayer networks”. In: Artificial Intelligence and Statistics. 2015, pp. 192–204.
[Cyb89b]
George Cybenko. “Approximation by superpositions of a sigmoidal function”. In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[Den+09b]
Jia Deng et al. “Imagenet: A large-scale hierarchical image database”. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE. 2009, pp. 248–255.
[DHS11]
John Duchi, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization”. In: Journal of Machine Learning Research 12.Jul (2011), pp. 2121–2159.
[GBC16a]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[GBC16b]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. “Deep learning (adaptive computation and machine learning series)”. In: Adaptive Computation and Machine Learning series (2016), p. 800.
[Goo+14a]
Ian Goodfellow et al. “Generative adversarial nets”. In: Advances in Neural Information Processing Systems. 2014, pp. 2672–2680.
[GSS14]
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples”. In: arXiv preprint arXiv:1412.6572 (2014).
[GW08]
Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2008.
[Gul+17]
Ishaan Gulrajani et al. “Improved training of Wasserstein GANs”. In: Advances in Neural Information Processing Systems. 2017, pp. 5767–5777.
[HOT06b]
Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets”. In: Neural Computation 18.7 (2006), pp. 1527–1554.
[HS06]
Geoffrey E Hinton and Ruslan R Salakhutdinov. “Reducing the dimensionality of data with neural networks”. In: Science 313.5786 (2006), pp. 504–507.
[HS83]
Geoffrey E Hinton and Terrence J Sejnowski. “Optimal perceptual inference”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Citeseer. 1983, pp. 448–453.
[HSW89]
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. “Multilayer feedforward networks are universal approximators”. In: Neural Networks 2.5 (1989), pp. 359–366.
[IS15]
Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. In: CoRR abs/1502.03167 (2015).
[Iva68]
Aleksey Grigorievitch Ivakhnenko. “The group method of data handling - a rival of the method of stochastic approximation”. In: Soviet Automatic Control 13.3 (1968), pp. 43–55.
[JGP16]
Eric Jang, Shixiang Gu, and Ben Poole. “Categorical reparameterization with gumbel-softmax”. In: arXiv preprint arXiv:1611.01144 (2016).
[Jou+16b]
Armand Joulin et al. “Fasttext.zip: Compressing text classification models”. In: arXiv preprint arXiv:1612.03651 (2016).
[KB14]
Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
[KSH12c]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classification with deep convolutional neural networks”. In: Advances in Neural Information Processing Systems. 2012, pp. 1097–1105.
[LeC+06]
Yann LeCun et al. “A tutorial on energy-based learning”. In: Predicting Structured Data 1.0 (2006).
[Mai+10]
Julien Mairal et al. “Online learning for matrix factorization and sparse coding”. In: Journal of Machine Learning Research 11.Jan (2010), pp. 19–60.
[MB05]
Frederic Morin and Yoshua Bengio. “Hierarchical Probabilistic Neural Network Language Model.” In: Aistats. Vol. 5. Citeseer. 2005, pp. 246–252.
[MK87]
Katta G Murty and Santosh N Kabadi. “Some NP-complete problems in quadratic and nonlinear programming”. In: Mathematical Programming 39.2 (1987), pp. 117–129.
[Pas+17]
Adam Paszke et al. “Automatic differentiation in PyTorch”. In: (2017).
[PSM14]
Jeffrey Pennington, Richard Socher, and Christopher Manning. “Glove: Global vectors for word representation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, pp. 1532–1543.
[RKK18]
Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. “On the convergence of Adam and beyond”. In: (2018).
[Rud17a]
Sebastian Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: CoRR abs/1706.05098 (2017).
[SLA12]
Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical Bayesian optimization of machine learning algorithms”. In: Advances in Neural Information Processing Systems. 2012, pp. 2951–2959.
[Spe80]
Bert Speelpenning. Compiling Fast Partial Derivatives of Functions Given by Algorithms. Tech. rep. Illinois Univ., Urbana (USA). Dept. of Computer Science, 1980.
[Sri+14]
Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting.” In: Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.
[TH12]
Tijmen Tieleman and Geoffrey Hinton. “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude”. In: COURSERA: Neural Networks for Machine Learning 4.2 (2012), pp. 26–31.
[Zei12]
Matthew D. Zeiler. “ADADELTA: An Adaptive Learning Rate Method”. In: CoRR abs/1212.5701 (2012).
[Zha+16]
Chiyuan Zhang et al. “Understanding deep learning requires rethinking generalization”. In: CoRR abs/1611.03530 (2016).
Metadata
Title
Basics of Deep Learning
Authors
Uday Kamath
John Liu
James Whitaker
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-14596-5_4
