About this Book

This SpringerBrief describes how to build a rigorous end-to-end mathematical framework for deep neural networks. The authors provide tools to represent and describe neural networks, casting previous results in the field in a more natural light. In particular, the authors derive gradient descent algorithms in a unified way for several neural network structures, including multilayer perceptrons, convolutional neural networks, deep autoencoders and recurrent neural networks. Furthermore, the framework developed by the authors is both more concise and mathematically intuitive than previous representations of neural networks.

This SpringerBrief is one step towards unlocking the black box of Deep Learning. The authors believe that this framework will help catalyze further discoveries regarding the mathematical properties of neural networks. This SpringerBrief is accessible not only to researchers, professionals and students working and studying in the field of deep learning, but also to those outside of the neural network community.

Table of Contents

Frontmatter

Chapter 1. Introduction and Motivation

Abstract
This chapter serves as a basic introduction to neural networks, including their history and some applications in which they have achieved state-of-the-art results.
Anthony L. Caterini, Dong Eui Chang

Chapter 2. Mathematical Preliminaries

Abstract
Current mathematical descriptions of neural networks are either based exclusively on scalars or rely on loosely defined vector-valued derivatives, both of which need to be improved upon. Thus, in this chapter we build up the framework by introducing prerequisite mathematical concepts and notation for handling generic vector-valued maps. The notation that we introduce is standard within vector calculus and provides us with a set of tools to establish a generic neural network structure. Even though some of the concepts in this chapter are quite basic, it is necessary to solidify the symbols and language that we use throughout the book to avoid the pitfall of ambiguous notation. The first topic that we examine is notation for linear maps, which are useful not only in the feedforward aspect of a generic network, but also in backpropagation. Then we define vector-valued derivative maps, which we require when performing gradient descent steps to optimize the neural network. To represent the dependence of a neural network on its parameters, we then introduce the notion of parameter-dependent maps, including distinct notation for derivatives with respect to parameters as opposed to state variables. Finally, we define elementwise functions, which are used in neural networks as nonlinear activation functions, i.e. to apply a nonlinear function to individual components of a vector.
Anthony L. Caterini, Dong Eui Chang
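
To make these objects concrete, the following is a minimal sketch, written in JAX with illustrative names of our own (layer, W, b, v) rather than the book's notation, of an elementwise activation function, a parameter-dependent layer map, and its derivative maps taken with respect to either the state variable or the parameters.

    import jax
    import jax.numpy as jnp

    def elementwise_tanh(x):
        # elementwise function: applies tanh to each component of the vector x
        return jnp.tanh(x)

    def layer(x, W, b):
        # parameter-dependent map: state variable x, parameters (W, b)
        return elementwise_tanh(W @ x + b)

    x = jnp.array([1.0, -2.0, 0.5])
    W = jnp.ones((2, 3)) * 0.1
    b = jnp.zeros(2)

    # derivative map with respect to the state variable, applied to a direction v
    v = jnp.array([0.0, 1.0, 0.0])
    _, dx = jax.jvp(lambda x_: layer(x_, W, b), (x,), (v,))

    # derivative map with respect to the parameters, applied to a direction in parameter space
    _, dparams = jax.jvp(lambda W_, b_: layer(x, W_, b_),
                         (W, b), (jnp.ones_like(W), jnp.ones_like(b)))

Both directional derivatives land in the layer's output space, mirroring the vector-valued derivative maps and the state/parameter distinction that the chapter sets up.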

Chapter 3. Generic Representation of Neural Networks

Abstract
In the previous chapter, we took the first step towards creating a standard mathematical framework for neural networks by developing mathematical tools for vector-valued functions and their derivatives. We use these tools in this chapter to describe the operations employed in a generic layered neural network. Since neural networks have been empirically shown to reap performance benefits from stacking ever more layers in succession, it is important to develop a solid and concise theory for representing repeated function composition as it pertains to neural networks, and we see how this can be done in this chapter. Furthermore, since neural networks often learn their parameters via some form of gradient descent, we also compute derivatives of these functions with respect to the parameters at each layer. The derivative maps that we compute remain in the same vector space as the parameters, which allows us to perform gradient descent naturally over these vector spaces for each parameter. This contrasts with standard approaches to neural network modelling, in which the parameters are broken down into their components; we can avoid this unnecessary operation using the framework that we describe. We begin this chapter by formulating a generic neural network as the composition of parameter-dependent functions. We then introduce standard loss functions based on this composition for both the regression and classification cases, and take their derivatives with respect to the parameters at each layer. There are some commonalities between these two cases that we explore. In particular, both employ the same form of error backpropagation, albeit with slightly different initializations. We are able to express backpropagation in terms of adjoints of derivative maps over generic vector spaces, which has not been explored before. We then outline a concise algorithm for computing derivatives of the loss functions with respect to their parameters directly over the vector space in which the parameters are defined, which helps to clarify the theoretical results presented. We also model a higher-order loss function that imposes a penalty on the derivative towards the end of this chapter. This shows one way to extend the framework that we have developed to a more complicated loss function and also demonstrates its flexibility.
Anthony L. Caterini, Dong Eui Chang
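
As a rough illustration of this chapter, and under assumed names that are not the book's notation, the sketch below composes parameter-dependent layer maps into a generic network, attaches a regression loss, and obtains gradients that retain the same structure as the parameters themselves; the vector-Jacobian product exposed by jax.vjp plays the role of the adjoint of the derivative map in backpropagation.

    import jax
    import jax.numpy as jnp

    def layer(x, theta):
        # one parameter-dependent layer map
        W, b = theta
        return jnp.tanh(W @ x + b)

    def network(x, thetas):
        # repeated composition of the layerwise function
        for theta in thetas:
            x = layer(x, theta)
        return x

    def regression_loss(thetas, x, y):
        return 0.5 * jnp.sum((network(x, thetas) - y) ** 2)

    dims = [4, 3, 2]
    thetas = [(jnp.ones((dims[i + 1], dims[i])) * 0.1, jnp.zeros(dims[i + 1]))
              for i in range(len(dims) - 1)]
    x, y = jnp.ones(4), jnp.zeros(2)

    # gradients keep the same structure as the parameters, so they live in the
    # same vector spaces in which the parameters are defined
    grads = jax.grad(regression_loss)(thetas, x, y)

    # backpropagation as adjoints: jax.vjp gives the adjoint of the network's derivative
    # map, and applying it to the output error (out - y) reproduces the gradient above
    out, pullback = jax.vjp(lambda th: network(x, th), thetas)
    (grads_via_adjoint,) = pullback(out - y)

    # a gradient descent step performed directly over those parameter spaces
    thetas = [(W - 0.1 * dW, b - 0.1 * db)
              for (W, b), (dW, db) in zip(thetas, grads)]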

Chapter 4. Specific Network Descriptions

Abstract
We developed an algebraic framework for a generic layered network in the preceding chapter, including a method to express error backpropagation and loss function derivatives directly over the inner product space in which the network parameters are defined. We dedicate this chapter to expressing three common neural network structures within this generic framework: the Multilayer Perceptron (MLP), the Convolutional Neural Network (CNN), and the Deep Auto-Encoder (DAE). The exact layout of this chapter is as follows. We first explore the simple case of the MLP, deriving the canonical vector-valued form of backpropagation along the way. Then, we shift our attention to the CNN. Here, our layerwise function is far more complicated, as our inputs and parameters are in tensor product spaces, and thus we require more complex operations to combine the inputs and the parameters. Nevertheless, CNNs still fit squarely into the framework developed in the previous chapter. The final network that we consider in this chapter, the DAE, does not fit as easily into that framework, as the parameters at any given layer have a deterministic relationship with the parameters at exactly one other layer, violating the assumption of parametric independence across layers. We overcome this issue, however, with a small adjustment to the framework. An algorithm for one step of gradient descent is derived for each type of network.
Anthony L. Caterini, Dong Eui Chang
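
The following is a hedged sketch of the canonical vector-valued form of backpropagation for an MLP, in the spirit of this chapter; the symbols sigma, W, b and delta are our own illustrative choices rather than the book's notation, and the loss is the simple regression loss 0.5 * ||a_L - y||^2.

    import jax.numpy as jnp

    def sigma(z):
        return jnp.tanh(z)

    def sigma_prime(z):
        return 1.0 - jnp.tanh(z) ** 2

    def mlp_forward(x, Ws, bs):
        # forward pass, caching pre-activations z_l and activations a_l for the backward pass
        zs, activations = [], [x]
        for W, b in zip(Ws, bs):
            z = W @ activations[-1] + b
            zs.append(z)
            activations.append(sigma(z))
        return zs, activations

    def mlp_backprop(x, y, Ws, bs):
        zs, activations = mlp_forward(x, Ws, bs)
        # initialize the backpropagated error from the regression loss 0.5 * ||a_L - y||^2
        delta = (activations[-1] - y) * sigma_prime(zs[-1])
        grads_W, grads_b = [], []
        for l in reversed(range(len(Ws))):
            # parameter gradients are formed directly as a matrix and a vector,
            # staying in the same spaces as W_l and b_l
            grads_W.insert(0, jnp.outer(delta, activations[l]))
            grads_b.insert(0, delta)
            if l > 0:
                # the adjoint (transpose) of the layer's derivative map carries the error backwards
                delta = (Ws[l].T @ delta) * sigma_prime(zs[l - 1])
        return grads_W, grads_b

    Ws = [jnp.ones((3, 4)) * 0.1, jnp.ones((2, 3)) * 0.1]
    bs = [jnp.zeros(3), jnp.zeros(2)]
    grads_W, grads_b = mlp_backprop(jnp.ones(4), jnp.zeros(2), Ws, bs)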

Chapter 5. Recurrent Neural Networks

Abstract
We applied the generic neural network framework from Chap. 3 to specific network structures in the previous chapter. Multilayer Perceptrons and Convolutional Neural Networks fit squarely into that framework, and we were also able to modify it to capture Deep Auto-Encoders. We now extend the generic framework even further to handle Recurrent Neural Networks (RNNs), the sequence-parsing network structure containing a recurring latent, or hidden, state that evolves at each layer of the network. This involves the development of new notation, but we remain as consistent as possible with previous chapters. The specific layout of this chapter is as follows. We first formulate a generic, feed-forward recurrent neural network. We calculate gradients of loss functions for these networks in two ways: Real-Time Recurrent Learning (RTRL) and Backpropagation Through Time (BPTT). Using our notation for vector-valued maps, we derive these algorithms directly over the inner product space in which the parameters reside. We then proceed to formally represent a vanilla RNN, which is the simplest form of RNN, and we formulate RTRL and BPTT for that as well. At the end of the chapter, we briefly mention modern RNN variants in the context of our generic framework.
Anthony L. Caterini, Dong Eui Chang
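
As a small illustration of the vanilla RNN this chapter formalizes, the sketch below unrolls the hidden-state recurrence over a sequence and takes the loss gradient by reverse-mode differentiation through the unrolled computation, which is what Backpropagation Through Time amounts to; the parameter names W, U, V, b, c and the dimensions are assumptions of ours, not the book's.

    import jax
    import jax.numpy as jnp

    def vanilla_rnn_loss(params, xs, ys, h0):
        W, U, b, V, c = params
        h, loss = h0, 0.0
        for x_t, y_t in zip(xs, ys):
            # the hidden state evolves at each step of the sequence
            h = jnp.tanh(W @ h + U @ x_t + b)
            y_hat = V @ h + c
            loss = loss + 0.5 * jnp.sum((y_hat - y_t) ** 2)
        return loss

    hidden, n_in, n_out, T = 3, 2, 1, 4
    params = (jnp.eye(hidden) * 0.5,            # W: hidden-to-hidden
              jnp.ones((hidden, n_in)) * 0.1,   # U: input-to-hidden
              jnp.zeros(hidden),                # b: hidden bias
              jnp.ones((n_out, hidden)) * 0.1,  # V: hidden-to-output
              jnp.zeros(n_out))                 # c: output bias
    xs = jnp.ones((T, n_in))
    ys = jnp.zeros((T, n_out))
    h0 = jnp.zeros(hidden)

    # BPTT: reverse-mode differentiation through the unrolled recurrence;
    # the gradients arrive with the same structure as the parameter tuple
    grads = jax.grad(vanilla_rnn_loss)(params, xs, ys, h0)

Real-Time Recurrent Learning would instead propagate parameter sensitivities forward alongside the hidden state; the chapter derives both algorithms directly over the inner product space in which the parameters reside.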

Chapter 6. Conclusion and Future Work

Abstract
This chapter serves as a conclusion of this book and provides possible directions for future research.
Anthony L. Caterini, Dong Eui Chang

Backmatter
