2018 | Book

# Introduction to Deep Learning

## From Logical Calculus to Artificial Intelligence

Author: Dr. Sandro Skansi

Publisher: Springer International Publishing

Book Series: Undergraduate Topics in Computer Science


This textbook presents a concise, accessible and engaging first introduction to deep learning, offering a wide range of connectionist models which represent the current state-of-the-art. The text explores the most popular algorithms and architectures in a simple and intuitive style, explaining the mathematical derivations in a step-by-step manner. The content coverage includes convolutional networks, LSTMs, Word2vec, RBMs, DBNs, neural Turing machines, memory networks and autoencoders. Numerous examples in working Python code are provided throughout the book, and the code is also supplied separately at an accompanying website.

Topics and features: introduces the fundamentals of machine learning, and the mathematical and computational prerequisites for deep learning; discusses feed-forward neural networks, and explores the modifications to these which can be applied to any neural network; examines convolutional neural networks, and the recurrent connections to a feed-forward neural network; describes the notion of distributed representations, the concept of the autoencoder, and the ideas behind language processing with deep learning; presents a brief history of artificial intelligence and neural networks, and reviews interesting open research problems in deep learning and connectionism.

This clearly written and lively primer on deep learning is essential reading for graduate and advanced undergraduate students of computer science, cognitive science and mathematics, as well as fields such as linguistics, logic, philosophy, and psychology.

Abstract

This chapter focuses on the history of neural networks as a logical calculus aimed at formalizing neural activity. After a brief prehistory of artificial intelligence (in the works of Leibniz), the early beginnings of neural networks and their main protagonists, McCulloch, Pitts, Lettvin and Wiener, are presented in detail, and their connections with Russell and Carnap explained. Their encounters at the University of Chicago and MIT are described, focusing both on historical facts and curiosities. The next section introduces Rosenblatt, who invented the first true learning rule for neural networks, and Minsky and Papert, who identified one of the great problems for neural networks, the XOR problem. The chapter then turns to the 1980s, with the San Diego circle working on neural networks from a cognitive science perspective, and continues to recent history and the birth of deep learning. Major trends behind the history of artificial intelligence and neural networks are explored and placed in both a historical and a systematic context, with an exploration of the philosophical aspects.

Abstract

The mathematical part starts with a review of functions, derivatives, vectors and matrices, giving all the prerequisites for understanding gradient descent and for calculating gradients by hand. The chapter also provides an overview of basic probability concepts, as deep learning today (as opposed to the historical approach) is mainly perceived as calculating either conditional probabilities or probability distributions. The following section gives a brief overview of logic and Turing machines, aimed at a better understanding of the XOR problem and of memory-based architectures. Threshold logic gates are only briefly touched upon and placed in the context of a metatheory for deep learning. The remainder of the chapter is a quick introduction to Python, as this is the language used in the examples throughout the book; the introduction presented here is sufficient to understand all the code in the book.
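The gradient-descent prerequisite mentioned above can be sketched in a few lines of Python; the function being minimized, the starting point and the learning rate are illustrative choices made here, not taken from the book:

```python
# Minimal gradient descent on f(x) = (x - 3)^2, whose derivative is
# f'(x) = 2(x - 3). The minimum is at x = 3.

def f_prime(x):
    return 2 * (x - 3)

x = 0.0              # illustrative starting point
learning_rate = 0.1  # illustrative step size
for _ in range(100):
    x = x - learning_rate * f_prime(x)  # step against the gradient

print(round(x, 4))  # converges towards the minimum at x = 3
```

Calculating such a derivative by hand and stepping against it is exactly the pattern that backpropagation later automates for whole networks.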

Abstract

This chapter explores the fundamentals of machine learning, since deep learning is, above everything else, a technique for machine learning. We explore the idea of classification and what it means for a classifier to classify data, and proceed to evaluating the performance of a general classifier. The first actual classifier we present is naive Bayes (which includes a general discussion on data encoding and normalization), and we also present the simplest neural network, logistic regression, which is the bread and butter of deep learning. We introduce the classic MNIST dataset of handwritten digits, the so-called ‘fruit fly of machine learning’. We also present two showcase techniques of unsupervised learning: K-means, to explain clustering and the general principle of learning without labels, and principal component analysis (PCA), to explain how to learn representations; PCA is explored in more detail later on. We conclude with a brief exposition on how to represent language for learning with the bag-of-words model.
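As a sketch of what logistic regression amounts to computationally (the one-feature toy dataset and hyperparameters below are illustrative, not from the book):

```python
import math

# Logistic regression with one feature, trained by gradient descent on the
# cross-entropy loss. The toy data and hyperparameters are illustrative.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]  # (feature, label)

w, b = 0.0, 0.0
lr = 0.5
for _ in range(1000):
    for x, y in data:
        p = 1 / (1 + math.exp(-(w * x + b)))  # sigmoid activation
        # gradient of the cross-entropy loss with respect to w and b
        w -= lr * (p - y) * x
        b -= lr * (p - y)

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b)))

print([round(predict(x)) for x, _ in data])  # recovers the labels [0, 0, 1, 1]
```

A single sigmoid unit like this is, structurally, the smallest possible neural network, which is why the book treats it as the entry point to deep learning.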

Abstract

This chapter introduces feedforward neural networks and the basic terminology of deep learning. It also presents a discussion on how to represent these abstract, graphical objects as mathematical objects (vectors, matrices and tensors). Rosenblatt’s perceptron rule is presented in detail, which makes clear why it cannot be extended to a multilayered perceptron. The delta rule is presented as an alternative, and the idea of iterative weight updates is developed with abundant examples, to build both an abstract and a numerical intuition. Backpropagation is explained in great detail, with all the calculations for a simple example carried out, and error functions and their role in the whole system are explained as well. The chapter introduces the first example of Python code in Keras, with all the details of running Python and Keras explained (imports, Keras-specific functions and regular Python functions).
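The backpropagation calculation can be sketched for a minimal network of one sigmoid hidden unit and one sigmoid output, with the analytic gradient checked against a numerical one; the weights, input and target below are illustrative values chosen here:

```python
import math

# Forward and backward pass for a tiny network: input -> one sigmoid hidden
# unit -> one sigmoid output, with squared error. All values are illustrative.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x, t = 0.5, 1.0      # input and target
w1, w2 = 0.3, -0.2   # weights (no biases, to keep the example short)

def forward(w1, w2):
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y, 0.5 * (y - t) ** 2

h, y, loss = forward(w1, w2)

# Backpropagation: the chain rule applied layer by layer.
delta_out = (y - t) * y * (1 - y)         # dLoss/d(net input of output unit)
grad_w2 = delta_out * h
delta_hid = delta_out * w2 * h * (1 - h)  # error propagated to the hidden unit
grad_w1 = delta_hid * x

# Numerical check of dLoss/dw1 by central finite differences.
eps = 1e-6
numeric = (forward(w1 + eps, w2)[2] - forward(w1 - eps, w2)[2]) / (2 * eps)
print(abs(grad_w1 - numeric) < 1e-8)  # True: backprop matches the numerical gradient
```

Checking an analytic gradient against finite differences in this way is a standard sanity check when working through backpropagation by hand.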

Abstract

This chapter explores modifications and extensions to simple feedforward neural networks which can be applied to any other neural network. The problem of local minima, as one of the main problems in machine learning, is explored with all of its intricacies. The main strategy against local minima is regularization, achieved by adding a regularization parameter when learning; both L1 and L2 regularization are explored and explained in detail. The chapter also addresses the idea of the learning rate and shows how to implement it in backpropagation, in both the static and the dynamic setting. Momentum is explored as a technique that also helps against local minima, by adding inertia to gradient descent. The chapter then turns to stochastic gradient descent, in the form of learning with batches and pure online learning, and concludes with a view on the vanishing and exploding gradient problems, setting the stage for deep learning.
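How L2 regularization and momentum modify a plain gradient-descent update can be sketched on a one-dimensional quadratic loss; the loss function, learning rate, regularization strength and momentum coefficient are all assumptions made here for illustration:

```python
# Gradient descent with L2 regularization (weight decay) and momentum, shown
# on an illustrative 1-D quadratic loss L(w) = (w - 4)^2.
def grad(w):
    return 2 * (w - 4)

lr, lam, mu = 0.1, 0.5, 0.9   # learning rate, L2 strength, momentum coefficient

w_plain, w_reg = 0.0, 0.0
velocity = 0.0
for _ in range(200):
    # plain update: just follow the gradient
    w_plain -= lr * grad(w_plain)
    # L2 regularization adds lam * w to the gradient, pulling the weight
    # towards zero; momentum accumulates past gradients as inertia.
    velocity = mu * velocity - lr * (grad(w_reg) + lam * w_reg)
    w_reg += velocity

print(round(w_plain, 3), round(w_reg, 3))  # the regularized weight is smaller
```

The plain update settles at the unregularized minimum, while the penalty shifts the regularized weight towards zero, which is exactly the trade-off the chapter describes.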

Abstract

This chapter introduces the first deep learning architecture of the book, convolutional neural networks. It starts by redefining the way a logistic regression accepts data, and defines 1D and 2D convolutional layers as a natural extension of logistic regression. The chapter also details how to connect the layers and handle dimensionality problems. The local receptive field is introduced as a core concept of any convolutional architecture, and its connections with the vanishing gradient problem are explored. The idea of padding is introduced in the visual setting, as is the stride of the local receptive field. Pooling is explored both in the general setting and as max-pooling. A complete convolutional neural network for classifying MNIST is then presented in Keras code, with all the details of the code given as comments and illustrations. The final section of the chapter presents the modifications needed to adapt convolutional networks, which are primarily visual classifiers, to work with text and language.
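The core convolution and pooling operations can be sketched in plain Python for the 1D case; the input signal and kernel below are illustrative:

```python
# A 1-D convolution (no padding, stride 1) and max-pooling, written out in
# plain Python. The signal and kernel values are illustrative.

def conv1d(signal, kernel):
    """Slide the kernel (the local receptive field) across the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, size):
    """Keep only the maximum of each non-overlapping window."""
    return [max(signal[i:i + size]) for i in range(0, len(signal), size)]

feature_map = conv1d([1, 2, 3, 4], [1, 0, -1])  # receptive field of width 3
print(feature_map)                 # [-2, -2]
print(max_pool([1, 3, 2, 5], 2))   # [3, 5]
```

Without padding, the output is shorter than the input (here 4 − 3 + 1 = 2 values), which is precisely the dimensionality bookkeeping the chapter discusses.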

Abstract

This chapter introduces recurrent connections to a simple feedforward neural network. The idea of unfolding is presented in detail, and the three basic settings of learning (sequence to label, sequence to sequence of labels, and sequences with no labels) are introduced and explained in probabilistic terms. The role of hidden states is presented in a detailed exposition (with abundant illustrations) in the setting of a simple recurrent network, with both Elman and Jordan units, and all the necessary mathematical framework is developed. The chapter also introduces the long short-term memory (LSTM) and gives a detailed mathematical and graphical treatment of all the gates it uses. Finally, the chapter gives an example of a recurrent neural network for learning text without labels at the word level, and explains all the necessary details of the code, which also introduces a different kind of evaluation, typical for natural language learning.
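The recurrent update at the heart of an Elman-style unit can be sketched with a single hidden unit; the weights and input sequence below are illustrative values chosen here:

```python
import math

# One-unit Elman-style recurrence: h_t = tanh(w * x_t + u * h_{t-1}).
# The weights and the input sequence are illustrative.
w, u = 0.5, 0.3
h = 0.0                  # initial hidden state
inputs = [1.0, 0.0, 1.0]

hidden_states = []
for x in inputs:
    h = math.tanh(w * x + u * h)  # the hidden state carries memory forward
    hidden_states.append(h)

print(hidden_states)  # one hidden state per time step, each in (-1, 1)
```

Unfolding the network over time is just this loop written out: each step reuses the same weights but receives the previous hidden state as extra input.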

Abstract

This chapter revisits the principal component analysis and the notion of distributed representations. The main focus here is on filling the part that was left out in Chap. 3, completing the exposition of the principal component analysis, and demonstrating what a distributed representation is in mathematical terms. The chapter then introduces the main unsupervised learning technique for deep learning, the autoencoder. The structural aspects are presented in detail with both explanations and illustrations, and several different types of autoencoders are presented as variations of a single theme. The idea of stacking autoencoders to produce even more condensed distributed representations is presented in detail, and the Python code for stacking and saving representations with autoencoders is provided with abundant explanations and illustrations. The chapter concludes with the recreation of a classical result in deep learning, an autoencoder that can learn to draw cats from watching unlabeled videos.
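The autoencoder idea (compress, then reconstruct) can be sketched with a minimal linear autoencoder mapping two inputs through one hidden unit and back; the data and hyperparameters below are illustrative assumptions:

```python
# A tiny linear autoencoder: 2 inputs -> 1 hidden unit -> 2 outputs, trained
# by gradient descent to reconstruct its input. The data lie along the
# direction (1, 1), so one hidden unit suffices. All values are illustrative.
data = [(-1.0, -1.0), (-0.5, -0.5), (0.5, 0.5), (1.0, 1.0)]

w_enc = [0.1, -0.1]   # encoder weights
w_dec = [0.1, 0.1]    # decoder weights
lr = 0.05

def loss():
    total = 0.0
    for x in data:
        h = w_enc[0] * x[0] + w_enc[1] * x[1]    # encode to a single number
        r = (w_dec[0] * h, w_dec[1] * h)         # decode (reconstruct)
        total += (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2
    return total

initial = loss()
for _ in range(500):
    for x in data:
        h = w_enc[0] * x[0] + w_enc[1] * x[1]
        err = [w_dec[j] * h - x[j] for j in range(2)]      # reconstruction error
        dh = sum(2 * err[j] * w_dec[j] for j in range(2))  # dLoss/dh
        for j in range(2):
            w_dec[j] -= lr * 2 * err[j] * h
        for j in range(2):
            w_enc[j] -= lr * dh * x[j]

print(loss() < initial)  # True: reconstruction error decreased
```

The single hidden value `h` is the condensed distributed representation: stacking autoencoders, as the chapter describes, repeats this compression on the hidden codes themselves.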

Abstract

This chapter revisits language processing, this time equipped with deep learning. Recurrent neural networks and autoencoders are needed for this chapter, but the exposition uses them mainly in a conceptual rather than a computational sense. The idea of word embeddings is explored, and the main deep learning method for representing text, the neural word embedding, is described through the famous Word2vec algorithm, in both its Skip-gram and CBOW variants. A CBOW Word2vec architecture is explored in detail and presented in Python code. The code presupposes that the text is given as a list of words (code for this was written and explained in the previous chapters), and PCA is used to reduce the dimensionality of the vectors to enable an easy display. The chapter concludes with word analogies and the simple calculations that can be done with them, which form the basis of analogical reasoning: a simple reasoning calculus that is neural all the way, with no symbolic manipulation used.
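The word-analogy arithmetic can be sketched with hand-crafted toy vectors; real Word2vec embeddings are learned from text, whereas the dimensions here (loosely "royalty" and "gender") are set by hand purely for illustration:

```python
import math

# Word-analogy arithmetic on hand-crafted toy vectors. In real Word2vec these
# coordinates are learned; here they are illustrative.
vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.05, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, computed componentwise
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# nearest remaining word by cosine similarity
best = max((wd for wd in vectors if wd not in ("king", "man", "woman")),
           key=lambda wd: cosine(vectors[wd], target))
print(best)  # queen
```

Everything here is vector arithmetic and similarity, with no symbolic rules, which is the sense in which the chapter calls analogical reasoning "neural all the way".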

Abstract

Energy-based models are a specific class of neural networks. The simplest energy model is the Hopfield network, dating back to the 1980s (Hopfield, Proc. Natl. Acad. Sci. USA 79(8):2554–2558, 1982 [1]). Hopfield networks are often thought to be very simple, but they are quite different from what we have seen before.
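Hebbian storage and recall in a Hopfield network can be sketched as follows; the stored pattern and the corrupted probe are illustrative:

```python
# Hebbian storage and recall in a tiny Hopfield network over states in {-1, +1}.
# The stored pattern and the corrupted probe are illustrative.
pattern = [1, -1, 1, -1, 1]
n = len(pattern)

# Hebbian weight matrix: W[i][j] = p_i * p_j, symmetric with a zero diagonal.
W = [[pattern[i] * pattern[j] if i != j else 0 for j in range(n)]
     for i in range(n)]

# Probe the network with the stored pattern corrupted by one flipped bit.
probe = pattern[:]
probe[2] = -probe[2]

# Synchronous update: s_i <- sign(sum_j W[i][j] * s_j).
recalled = [1 if sum(W[i][j] * probe[j] for j in range(n)) >= 0 else -1
            for i in range(n)]
print(recalled == pattern)  # True: the network recovers the stored pattern
```

The update moves the state downhill in the network's energy, which is why the stored pattern acts as an attractor for nearby corrupted states.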

Abstract

This chapter concludes the book with a short recapitulation of its main ideas and an incomplete, interdisciplinary list of interesting open research problems in deep learning and connectionism.