2018 | OriginalPaper | Book chapter

2. Mathematical and Computational Prerequisites

Written by: Sandro Skansi

Published in: Introduction to Deep Learning

Publisher: Springer International Publishing

Abstract

The mathematical part starts with a review of functions, derivatives, vectors and matrices. There, all the prerequisites for understanding gradient descent and calculating gradients by hand are given. The chapter also provides an overview of the basic probability concepts, as deep learning today (as opposed to the historical approach) is mainly perceived as either calculating conditional probabilities or probability distributions. The following section gives a brief overview of logic and Turing machines aimed at better understanding the XOR problem and memory-based architectures. Threshold logic gates are only briefly touched upon and placed in the context of a metatheory for deep learning. The remainder of the chapter is a quick introduction to Python, as this will be the language used in the examples in the book. The introduction to Python presented here is sufficient to understand all code in the book.

Notice that they also have the same number of members or cardinality, namely 2.

The counting starts with 0, and we will use this convention in the whole book.

The traditional definition uses sets to define tuples, tuples to define relations and relations to define functions, but that is an overly logical approach for our needs in the present volume. This definition provides a much wider class of entities to be considered functions.

A function with n arguments is called an n-ary function.

The ReLU or rectified linear unit defined by \(\rho (x) = \max (x,0)\) is an example of a function that is continuous even though it is (usually) defined by cases. We will be using ReLU extensively from Chap. 6 onwards.
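As a minimal sketch of the definition above, ReLU can be written either with `max` or by cases; the two forms agree everywhere, including at 0 where the cases meet. The function names `relu` and `relu_by_cases` are illustrative choices, not taken from the book.

```python
def relu(x):
    """Rectified linear unit, rho(x) = max(x, 0)."""
    return max(x, 0.0)


def relu_by_cases(x):
    """The same function written as a definition by cases."""
    if x > 0:
        return x
    return 0.0


# Both definitions give the same values on a few sample inputs.
samples = [-2.0, -0.5, 0.0, 0.5, 3.5]
values = [relu(x) for x in samples]
values_by_cases = [relu_by_cases(x) for x in samples]
```
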

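Note that in the standard real numbers \(0.999\dots = 1\): the two expressions are different decimal representations of the same number.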

This is especially true in programming, since when we program we need to approximate functions with real numbers by using functions with rational numbers. This approximation also goes a long way in terms of intuition, so it is good to think about this when trying to figure out how a function will behave.
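A small illustration of this approximation, assuming ordinary Python floats (which are binary fractions and therefore rational numbers): sums that are exact in the reals are only approximately equal in floating point, while the standard-library `Fraction` type keeps rational arithmetic exact. The variable names are illustrative.

```python
import math
from fractions import Fraction

# 0.1, 0.2 and 0.3 have no exact binary representation, so each float
# is only a nearby rational approximation of the intended real number.
float_sum = 0.1 + 0.2

# Exact rational arithmetic does not have this problem.
exact_sum = Fraction(1, 10) + Fraction(2, 10)

# Compare floats with a tolerance rather than with ==.
close_enough = math.isclose(float_sum, 0.3)
```
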

The exception is division where the divisor is 0. In this case, the division function is undefined, and therefore the notion of continuity has no meaning at this point.

Rational functions are of the form \(\frac{f(x)}{g(x)}\) where f and g are polynomial functions.

The process of finding derivatives is called ‘differentiation’.

Which is a 0-ary function, i.e. a function that gives the same value regardless of the input.

The chain rule in Lagrange notation is clumsier and lacks the intuitive similarity with fractions: for \(h(x)=f(g(x))\), it reads \(h'(x)=f'(g(x))g'(x)\).

Keep in mind that \(h(x)=g(f(x))=(g\circ f)(x)=g(u)\) with \(u=f(x)\), which means that h is the composition of the functions g and f. It is very important not to mix up compositions of functions like \(f(x)=(3-2x)^5\) with an ordinary function like \(f(x)=3-2x^5\), or with a product like \(f(x)=\sin x \cdot x^5\).
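To make the composition concrete, the sketch below differentiates \((3-2x)^5\) with the chain rule and checks the result against a central-difference approximation. The names `inner`, `outer`, `h_prime` and `numeric_derivative` are illustrative, and the step size `eps` is an arbitrary choice.

```python
def inner(x):
    """Inner function u = 3 - 2x."""
    return 3 - 2 * x


def outer(u):
    """Outer function u^5."""
    return u ** 5


def h(x):
    """The composition h(x) = outer(inner(x)) = (3 - 2x)^5."""
    return outer(inner(x))


def h_prime(x):
    """Chain rule: outer'(u) * inner'(x) = 5u^4 * (-2)."""
    return 5 * inner(x) ** 4 * (-2)


def numeric_derivative(func, x, eps=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


analytic = h_prime(1.0)
numeric = numeric_derivative(h, 1.0)
```

At x = 1 the inner function equals 1, so the chain rule gives 5 · 1 · (−2) = −10, and the numerical estimate should agree up to a small error.
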

These rules are not independent, since both ChainExp and Exp are a consequence of CHAINRULE.

We deliberately avoid talking about fields here since we only use \(\mathbb {R}\), and there is no reason to complicate the exposition.

One for each dimension.

A minimal subset such that a property P holds is a subset (of some larger set) of which we can take no proper subset such that P would still hold.

Matrix subtraction works in exactly the same way, only with subtraction instead of addition.
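A minimal sketch of element-wise matrix subtraction using plain nested lists (no external library assumed). The helper name `mat_sub` and the example matrices are illustrative.

```python
def mat_sub(a, b):
    """Subtract matrix b from matrix a, entry by entry."""
    same_shape = len(a) == len(b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(a, b)
    )
    if not same_shape:
        raise ValueError("matrices must have the same dimensions")
    return [
        [x - y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


A = [[5, 7], [9, 11]]
B = [[1, 2], [3, 4]]
difference = mat_sub(A, B)
```
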

To get the actual f(x) we just need to plug in the minimal x and calculate f(x).

In the case of multiple dimensions, we shall do the same calculation for every pair of \(x_i\) and \(\nabla _i f(\mathbf {x})\).

Note that a function can have many local minima or minimal points, but only one global minimum. Gradient descent can get ‘stuck’ in a local minimum, but our example has only one local minimum which is the actual global minimum.

We stop simply because we consider it to be ‘good enough’—there is no mathematical reason for stopping here.
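The update described in these notes can be sketched as follows. The objective \(f(x_0, x_1) = (x_0-3)^2 + (x_1+1)^2\), the learning rate and the number of steps are assumptions chosen for illustration, not the example from the book; the function has a single minimum at (3, −1).

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeat x_i <- x_i - learning_rate * grad_i f(x) for every coordinate.

    The fixed number of steps is arbitrary: we stop when the result is
    considered good enough, not for any mathematical reason.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


def f(x):
    """Hypothetical objective with one minimum at (3, -1)."""
    return (x[0] - 3) ** 2 + (x[1] + 1) ** 2


def grad_f(x):
    """Gradient of f, one partial derivative per dimension."""
    return [2 * (x[0] - 3), 2 * (x[1] + 1)]


x_min = gradient_descent(grad_f, [0.0, 0.0])
f_min = f(x_min)  # plug the minimal x back in to get the actual f(x)
```
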

This book is available online for free at https://www.probabilitycourse.com/.

Properties are called features in machine learning, while in statistics they are called variables, which can be quite confusing, but it is standard terminology.

Note that the mean is equally useless for describing the first four and the last member taken in isolation.

The sequence can be sorted in ascending or descending order, it does not matter.

This is the ‘official’ name for the mean, median and mode.
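As a small illustration of the three measures, using the standard-library `statistics` module and a made-up sequence with one outlier (the data are an assumption, not the book's example):

```python
import statistics

# Hypothetical sequence: four small values and one outlier.
data = [1, 2, 2, 3, 100]

mean_value = statistics.mean(data)      # pulled towards the outlier
median_value = statistics.median(data)  # middle element of the sorted data
mode_value = statistics.mode(data)      # most frequent element

# The median does not depend on the direction of sorting.
median_descending = statistics.median(sorted(data, reverse=True))
```

Here the mean is 108 / 5 = 21.6, which describes neither the four small values nor the outlier well, while the median and the mode are both 2.
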

Not 5 on one die or the other, but 5 as in when you need to roll a 5 in \(\text {Monopoly}^{\circledR }\) to buy that last street you need to start building houses.

In \(6^2\), the 6 denotes the number of values on each die, and the 2 denotes the number of dice used.
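The counting argument can be checked by enumerating all outcomes of two dice. This is a sketch using only the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                        # the 6 values on each die
outcomes = list(product(faces, repeat=2))  # all ordered pairs: 6^2 of them

# Outcomes where the two dice together show a 5.
favourable = [pair for pair in outcomes if sum(pair) == 5]

p_sum_5 = Fraction(len(favourable), len(outcomes))
```

The favourable pairs are (1, 4), (2, 3), (3, 2) and (4, 1), so the probability is 4/36 = 1/9.
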

What we called here ‘basic probabilities’ are actually called priors in the literature, and we will be referring to them as such in the later chapters.

All machine learning algorithms are estimators.

Note that ideally we would like an estimator to be a perfect predictor of the future in all cases, but this would be equal to having foresight. Scientifically speaking, we have models and we try to make them as accurate as possible, but perfect prediction is simply not on the table.

‘Disjoint’ means \(A\cap B = \emptyset \).

There are others, but they are in disguise.

A version of Bayes’ original manuscript is available at http://www.stat.ucla.edu/history/essay.pdf.

This is not exactly how it behaves, but it is a simplification which is more than enough for our needs.

Examples of text editors are Notepad, Vim, Emacs, Sublime, Notepad++, Atom, Nano, cat and many others. Feel free to experiment and find the one you like most (most are free). You might have heard of so-called IDEs, or Integrated Development Environments. They are basically text editors with additional functions. Some IDEs you might know of are Visual Studio, Eclipse and PyCharm. Unlike text editors, most IDEs are not freely available, but there are free versions and trial versions, so you may experiment with them before buying. Remember, there is nothing essential that an IDE can do and a text editor cannot, but IDEs do offer additional conveniences. My personal preference is to use Vim.

Never call this an ‘if-loop’, since it is simply wrong.

In programming jargon, saying ‘the syntax is the same’ or ‘you can use a similar syntax’ means that you should try to reproduce the same style but with the new values or objects.

Note that even though the name we assign to a library is arbitrary, there are standard abbreviations used in the Python community. Examples are np for Numpy, tf for TensorFlow, pd for Pandas and so on. This is important to know since on StackOverflow you might find a solution but without the import statements. So if the solution has np somewhere in it, it means that you should have a line which imports Numpy with the name np.
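A short sketch of how an import alias works, using the standard-library `math` module so that it runs without installing anything; the alias `m` is an arbitrary choice for illustration. The conventional NumPy import is shown only as a comment, since it assumes NumPy is installed.

```python
# The name after "as" is arbitrary; it becomes the prefix used in the code.
import math as m

root = m.sqrt(16.0)

# The community convention for NumPy would be:
#   import numpy as np
# so a snippet that calls np.array(...) expects that line to be present.
```
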

In Python, technically speaking, every function returns something. If no return command is issued, the function will return None, which is a special Python keyword for ‘nothing’. This is a subtle point, but also the cause of many intermediate-level bugs, and therefore it is worth noting now.
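The following sketch shows the kind of bug this note warns about. Both function names are hypothetical; the first one builds a value but forgets to return it.

```python
def greet(name):
    """Builds the greeting but never returns it."""
    message = "Hello, " + name
    # No return statement: Python returns None implicitly.


def greet_fixed(name):
    """Same function with the missing return added."""
    return "Hello, " + name


broken_result = greet("Ada")
fixed_result = greet_fixed("Ada")
```

Comparing against `None` with `is`, as in `broken_result is None`, is the idiomatic way to detect the problem.
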

In Python 3, this is no longer exactly that list, but this is a minor issue at this stage of learning Python. What you need to know is that you can count on it to behave exactly like that list.
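Assuming this note refers to the built-in `range` (an assumption, since the surrounding code is not shown here), the difference can be seen directly: in Python 3, `range` produces a lazy sequence object rather than a list, but it supports indexing, `len` and iteration like one.

```python
numbers = range(5)        # a range object, not a list

as_list = list(numbers)   # convert explicitly when a real list is needed

third = numbers[2]        # indexing works as it would on the list
count = len(numbers)      # so does len()
total = sum(numbers)      # and iteration
```
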

Notice that the code, as it stands now, does not exhibit this problem, but it is still a bug, since a problem would arise if the room temperature turned out to be an odd number, and not an even number as we have now.

JSON stands for JavaScript Object Notation, and JSONs (i.e. Python dictionaries) are referred to as objects in JavaScript.
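The correspondence between JSON text and Python dictionaries can be sketched with the standard-library `json` module. The record below is made-up example data.

```python
import json

# A hypothetical record stored as a Python dictionary.
record = {"name": "Ada", "scores": [1, 2, 3], "active": True}

# Serialise the dictionary to a JSON string ...
text = json.dumps(record)

# ... and parse the string back into a dictionary.
restored = json.loads(text)
```

Note that the JSON text uses JavaScript spellings such as `true`, which `json.loads` maps back to Python's `True`.
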

1. J.R. Hindley, J.P. Seldin, Lambda-Calculus and Combinators: An Introduction (Cambridge University Press, Cambridge, 2008)

2. G.S. Boolos, J.P. Burgess, R.C. Jeffrey, Computability and Logic (Cambridge University Press, Cambridge, 2007)

3. P. Renteln, Manifolds, Tensors, and Forms: An Introduction for Mathematicians and Physicists (Cambridge University Press, Cambridge, 2013)

4. R. Courant, F. John, Introduction to Calculus and Analysis, vol. 1 (Springer, New York, 1999)

5. S. Axler, Linear Algebra Done Right (Springer, New York, 2015)

6. P.N. Klein, Coding the Matrix (Newtonian Press, London, 2013)

7. H. Pishro-Nik, Introduction to Probability, Statistics, and Random Processes (Kappa Books Publishers, Blue Bell, 2014)

8. D.P. Bertsekas, J.N. Tsitsiklis, Introduction to Probability (Athena Scientific, Nashua, 2008)

9. S.M. Stigler, Laplace’s 1774 memoir on inverse probability. Stat. Sci. 1, 359–363 (1986)

10. A. Hald, Laplace’s Theory of Inverse Probability, 1774–1786 (Springer, New York, 2007), pp. 33–46

11. W. Rautenberg, A Concise Introduction to Mathematical Logic (Springer, New York, 2006)

12. D. van Dalen, Logic and Structure (Springer, New York, 2004)

13. A.M. Turing, On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 42(2), 230–265 (1936)

Title: Mathematical and Computational Prerequisites
Written by: Sandro Skansi
Publisher: Springer International Publishing
Book: Introduction to Deep Learning
Print ISBN: 978-3-319-73003-5
Electronic ISBN: 978-3-319-73004-2
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-73004-2_2
