
2019 | OriginalPaper | Chapter

4. Basics of Deep Learning

Authors: Uday Kamath, John Liu, James Whitaker

Published in: Deep Learning for NLP and Speech Recognition

Publisher: Springer International Publishing

Abstract

One of the most talked-about concepts in machine learning, both in the academic community and in the media, is the evolving field of deep learning. The idea of neural networks, and subsequently deep learning, draws its inspiration from the biological structure of the human brain (or that of any brained creature, for that matter).


Footnotes
1
The universal approximation theorem was initially proved for neural network architectures using the sigmoid activation function, but was subsequently shown to apply to all fully connected networks [Cyb89b, HSW89].
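For reference, the single-hidden-layer, sigmoidal form of the theorem can be stated as: for any continuous function f on a compact set K \subset \mathbb{R}^d and any \varepsilon > 0, there exist a width N and parameters \alpha_i, b_i \in \mathbb{R} and w_i \in \mathbb{R}^d such that F(x) = \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^{\top} x + b_i) satisfies |F(x) - f(x)| < \varepsilon for all x \in K.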
 
2
This output of the encoder is sometimes referred to as the code, encoding or embedding.
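For illustration, a minimal encoder-decoder sketch in PyTorch (hypothetical layer sizes), where the encoder's output is that code/embedding:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 784-dimensional inputs compressed to a 32-dimensional code.
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)           # a mini-batch of flattened inputs in [0, 1]
code = encoder(x)                 # the "code" / encoding / embedding
reconstruction = decoder(code)    # decoder maps the code back to input space
```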
 
3
If the task has real-valued inputs between 0 and 1, then Bernoulli cross-entropy is a better choice for the objective function.
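As a minimal sketch of this choice (hypothetical shapes, using PyTorch's nn.BCELoss as the Bernoulli cross-entropy objective):

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction setup: targets are real values in [0, 1],
# so Bernoulli (binary) cross-entropy is used as the objective.
criterion = nn.BCELoss()

targets = torch.rand(16, 784)                        # real-valued targets in [0, 1]
logits = torch.randn(16, 784, requires_grad=True)    # stand-in for model outputs
predictions = torch.sigmoid(logits)                  # squash outputs into (0, 1)

loss = criterion(predictions, targets)
loss.backward()                                      # gradients flow back to the logits
```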
 
4
Note: there are no learned parameters in the noise function presented here.
 
6
Note that PyTorch can still train in mini-batch mode. The view function converts the input tensor into the dimensions [n, 1, 1, 3072], where n is the mini-batch size.
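A minimal sketch of that reshaping (assuming a CIFAR-10-style input of 3 × 32 × 32 = 3072 values per image):

```python
import torch

# A mini-batch of n = 8 images, each 3 x 32 x 32 = 3072 values (assumed shape).
batch = torch.randn(8, 3, 32, 32)

# view() reshapes the tensor without copying data; the leading -1 lets
# PyTorch infer the mini-batch size n from the remaining dimensions.
flat = batch.view(-1, 1, 1, 3072)

print(flat.shape)   # torch.Size([8, 1, 1, 3072])
```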
 
Literature
[Aba+15]
Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015.
[Bis95]
Christopher M Bishop. “Regularization and complexity control in feed-forward networks”. In: (1995).
[BGW18]
Sebastian Bock, Josef Goppold, and Martin Weiß. “An improvement of the convergence proof of the ADAM-Optimizer”. In: arXiv preprint arXiv:1804.10587 (2018).
[Cho+15a]
Anna Choromanska et al. “The loss surfaces of multilayer networks”. In: Artificial Intelligence and Statistics. 2015, pp. 192–204.
[Cyb89b]
George Cybenko. “Approximation by superpositions of a sigmoidal function”. In: Mathematics of Control, Signals and Systems 2.4 (1989), pp. 303–314.
[Den+09b]
Jia Deng et al. “Imagenet: A large-scale hierarchical image database”. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE. 2009, pp. 248–255.
[DHS11]
John Duchi, Elad Hazan, and Yoram Singer. “Adaptive subgradient methods for online learning and stochastic optimization”. In: Journal of Machine Learning Research 12.Jul (2011), pp. 2121–2159.
[GBC16a]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[GBC16b]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. “Deep learning (adaptive computation and machine learning series)”. In: Adaptive Computation and Machine Learning series (2016), p. 800.
[Goo+14a]
Ian Goodfellow et al. “Generative adversarial nets”. In: Advances in Neural Information Processing Systems. 2014, pp. 2672–2680.
[GSS14]
Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples”. In: arXiv preprint arXiv:1412.6572 (2014).
[GW08]
Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, 2008.
[Gul+17]
Ishaan Gulrajani et al. “Improved training of Wasserstein GANs”. In: Advances in Neural Information Processing Systems. 2017, pp. 5767–5777.
[HOT06b]
Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets”. In: Neural Computation 18.7 (2006), pp. 1527–1554.
[HS06]
Geoffrey E Hinton and Ruslan R Salakhutdinov. “Reducing the dimensionality of data with neural networks”. In: Science 313.5786 (2006), pp. 504–507.
[HS83]
Geoffrey E Hinton and Terrence J Sejnowski. “Optimal perceptual inference”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Citeseer. 1983, pp. 448–453.
[HSW89]
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. “Multilayer feedforward networks are universal approximators”. In: Neural Networks 2.5 (1989), pp. 359–366.
[IS15]
Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. In: CoRR abs/1502.03167 (2015).
[Iva68]
Aleksey Grigorievitch Ivakhnenko. “The group method of data handling - a rival of the method of stochastic approximation”. In: Soviet Automatic Control 13.3 (1968), pp. 43–55.
[JGP16]
Eric Jang, Shixiang Gu, and Ben Poole. “Categorical reparameterization with gumbel-softmax”. In: arXiv preprint arXiv:1611.01144 (2016).
[Jou+16b]
Armand Joulin et al. “Fasttext.zip: Compressing text classification models”. In: arXiv preprint arXiv:1612.03651 (2016).
[KB14]
Diederik Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In: arXiv preprint arXiv:1412.6980 (2014).
[KSH12c]
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classification with deep convolutional neural networks”. In: Advances in Neural Information Processing Systems. 2012, pp. 1097–1105.
[LeC+06]
Yann LeCun et al. “A tutorial on energy-based learning”. In: Predicting Structured Data 1.0 (2006).
[Mai+10]
Julien Mairal et al. “Online learning for matrix factorization and sparse coding”. In: Journal of Machine Learning Research 11.Jan (2010), pp. 19–60.
[MB05]
Frederic Morin and Yoshua Bengio. “Hierarchical Probabilistic Neural Network Language Model.” In: Aistats. Vol. 5. Citeseer. 2005, pp. 246–252.
[MK87]
Katta G Murty and Santosh N Kabadi. “Some NP-complete problems in quadratic and nonlinear programming”. In: Mathematical Programming 39.2 (1987), pp. 117–129.
[Pas+17]
Adam Paszke et al. “Automatic differentiation in PyTorch”. In: (2017).
[PSM14]
Jeffrey Pennington, Richard Socher, and Christopher Manning. “Glove: Global vectors for word representation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, pp. 1532–1543.
[RKK18]
Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. “On the convergence of Adam and beyond”. In: (2018).
[Rud17a]
Sebastian Ruder. “An Overview of Multi-Task Learning in Deep Neural Networks”. In: CoRR abs/1706.05098 (2017).
[SLA12]
Jasper Snoek, Hugo Larochelle, and Ryan P Adams. “Practical Bayesian optimization of machine learning algorithms”. In: Advances in Neural Information Processing Systems. 2012, pp. 2951–2959.
[Spe80]
Bert Speelpenning. Compiling Fast Partial Derivatives of Functions Given by Algorithms. Tech. rep. Illinois Univ., Urbana (USA). Dept. of Computer Science, 1980.
[Sri+14]
Nitish Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting.” In: Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.
[TH12]
Tijmen Tieleman and Geoffrey Hinton. “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude”. In: COURSERA: Neural Networks for Machine Learning 4.2 (2012), pp. 26–31.
[Zei12]
Matthew D. Zeiler. “ADADELTA: An Adaptive Learning Rate Method”. In: CoRR abs/1212.5701 (2012).
[Zha+16]
Chiyuan Zhang et al. “Understanding deep learning requires rethinking generalization”. In: CoRR abs/1611.03530 (2016).
Metadata
Title
Basics of Deep Learning
Authors
Uday Kamath
John Liu
James Whitaker
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-14596-5_4
