
Neural Networks

Volume 61, January 2015, Pages 85-117

Review
Deep learning in neural networks: An overview

https://doi.org/10.1016/j.neunet.2014.09.003

Abstract

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarizes relevant work, much of it from the previous millennium. Shallow and Deep Learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

Section snippets

Preface

This is the preprint of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and

Introduction to Deep Learning (DL) in Neural Networks (NNs)

Which modifiable components of a learning system are responsible for its success or failure? What changes to them improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963). There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses (Section  6.8). The present survey, however, will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural

Event-oriented notation for activation spreading in NNs

Throughout this paper, let i, j, k, t, p, q, r denote positive integer variables assuming ranges implicit in the given contexts. Let n, m, T denote positive integer constants.

An NN’s topology may change over time (e.g., Sections 5.3, 5.6.3). At any given moment, it can be described as a finite subset of units (or nodes or neurons) N = {u_1, u_2, …} and a finite set H ⊆ N × N of directed edges or connections between nodes. FNNs are acyclic graphs, RNNs cyclic. The first (input) layer is the set of input units, a
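To make this notation concrete, the following minimal sketch (illustrative only, not code from the survey; the names spread_activations and topological_order are assumptions) represents an FNN as a finite set of units and directed, weighted connections, and spreads activations through the acyclic graph in topological order:

```python
import math
from collections import defaultdict

def topological_order(units, connections):
    """Kahn's algorithm; valid only for acyclic graphs (FNNs, not RNNs)."""
    indegree = {u: 0 for u in units}
    for (_, v) in connections:
        indegree[v] += 1
    order = []
    frontier = [u for u in units if indegree[u] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for (a, b) in connections:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    frontier.append(b)
    return order

def spread_activations(units, connections, weights, inputs):
    """Propagate activations through an acyclic NN (an FNN).

    units:       unit identifiers, e.g. ["u1", "u2", ...]
    connections: set of directed edges (u, v), a subset of units x units
    weights:     dict mapping each edge (u, v) to a real-valued weight
    inputs:      dict mapping the input units to externally given activations
    """
    incoming = defaultdict(list)
    for (u, v) in connections:
        incoming[v].append(u)

    activation = dict(inputs)
    for v in topological_order(units, connections):
        if v in activation:        # input unit: activation given externally
            continue
        net = sum(weights[(u, v)] * activation[u] for u in incoming[v])
        activation[v] = math.tanh(net)   # one common choice of squashing nonlinearity
    return activation
```

For an RNN the same connection set contains cycles, and the natural analogue is to unfold the activation spreading over discrete time steps instead of a single topological pass.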

Depth of Credit Assignment Paths (CAPs) and of problems

To measure whether credit assignment in a given NN application is of the deep or shallow type, I introduce the concept of Credit Assignment Paths or CAPs, which are chains of possibly causal links between the events of Section  2, e.g., from input through hidden to output layers in FNNs, or through transformations over time in RNNs.

Let us first focus on SL. Consider two events x_p and x_q (1 ≤ p < q ≤ T). Depending on the application, they may have a Potential Direct Causal Connection (PDCC) expressed
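Under the assumption that the PDCCs of the feedforward case form an acyclic graph over events, the depth of the deepest CAP can be read as the longest chain of such links ending in an output event. A minimal sketch of that reading (the function deepest_cap and its arguments are hypothetical, not notation from the survey):

```python
from functools import lru_cache

def deepest_cap(events, pdcc, output_events):
    """Depth of the deepest Credit Assignment Path: the longest chain of
    potential direct causal connections (PDCCs) ending in an output event.

    events:        event identifiers; the PDCC graph is assumed acyclic (FNN case)
    pdcc:          set of pairs (p, q) meaning event p may directly cause event q
    output_events: events at which the CAPs of interest end
    """
    predecessors = {e: [] for e in events}
    for (p, q) in pdcc:
        predecessors[q].append(p)

    @lru_cache(maxsize=None)
    def depth_to(e):
        # Number of links on the longest PDCC chain ending at event e.
        if not predecessors[e]:
            return 0
        return 1 + max(depth_to(p) for p in predecessors[e])

    return max((depth_to(o) for o in output_events), default=0)
```

A shallow learner in this sense has CAPs of small, fixed depth, while a Deep Learner must assign credit across CAPs whose depth grows with the problem, e.g. with the number of layers of an FNN or the number of time steps over which an RNN is unfolded.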

Dynamic programming for Supervised/Reinforcement Learning (SL/RL)

One recurring theme of DL is Dynamic Programming (DP) (Bellman, 1957), which can help to facilitate credit assignment under certain assumptions. For example, in SL NNs, backpropagation itself can be viewed as a DP-derived method (Section  5.5). In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth (Section  6.2). DP algorithms are also essential for systems that combine concepts of NNs and graphical models, such as Hidden Markov
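As one standard instance of such a DP-derived method (not code from the survey), the following sketch performs value iteration on a known, finite Markov Decision Process; the transition and reward tables are hypothetical placeholders:

```python
def value_iteration(states, actions, transition, reward, gamma=0.95, tol=1e-6):
    """Classical dynamic programming (Bellman, 1957) for a known, finite MDP.

    transition[s][a]: list of (probability, next_state) pairs
    reward[s][a]:     expected immediate reward for taking action a in state s
    Returns a state-value function V approximately satisfying the Bellman
    optimality equation, from which a greedy policy can be read off.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward[s][a] + gamma * sum(p * V[s2] for p, s2 in transition[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

The strong Markovian assumption is what keeps credit assignment shallow here: the value of a state summarizes all relevant future consequences, so credit need not be propagated along the full history of events.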

Supervised NNs, some helped by unsupervised NNs

The main focus of current practical applications is on Supervised Learning (SL), which has dominated recent pattern recognition contests (Section 5.17: 2009, first official competitions won by RNNs, and with MPCNNs; Section 5.18: 2010, plain backprop (+ distortions) on GPU breaks MNIST record; Section 5.19: 2011, MPCNNs on GPU achieve superhuman vision performance; Section 5.20: 2011, Hessian-free optimization for RNNs; Section 5.21: 2012, first contests won on ImageNet, object detection, segmentation; Section 5.22: 2013–, more contests

DL in FNNs and RNNs for Reinforcement Learning (RL)

So far we have focused on Deep Learning (DL) in supervised or unsupervised NNs. Such NNs learn to perceive/encode/predict/classify patterns or pattern sequences, but they do not learn to act in the more general sense of Reinforcement Learning (RL) in unknown environments (see surveys, e.g.,  Kaelbling et al., 1996, Sutton and Barto, 1998, Wiering and van Otterlo, 2012). Here we add a discussion of DL FNNs and RNNs for RL. It will be shorter than the discussion of FNNs and RNNs for SL and UL
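To illustrate the difference in the learning signal (acting from scalar reward alone, without target outputs), here is a minimal tabular Q-learning sketch, a standard RL method rather than anything specific to this survey; the environment interface (reset, step, actions) is an assumption made for the example:

```python
import random
from collections import defaultdict

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: the agent learns to act in an unknown environment
    from scalar rewards only. `env` is assumed to expose reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    def greedy(state):
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy exploration: act, observe the consequence, assign credit.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = greedy(state)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * Q[(next_state, greedy(next_state))])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```

In the deep RL settings surveyed here, such tables are replaced by FNNs or RNNs that generalize across states and, in partially observable environments, compress histories of observations.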

Conclusion and outlook

Deep Learning (DL) in Neural Networks (NNs) is relevant for Supervised Learning (SL) (Section 5), Unsupervised Learning (UL) (Section 5), and Reinforcement Learning (RL) (Section 6). By alleviating problems with deep Credit Assignment Paths (CAPs, Sections 3, 5.9), UL (Section 5.6.4) can not only facilitate SL of sequences (Section 5.10) and stationary patterns (Sections 5.7, 5.15), but also RL (Sections 6.4, 4.2). Dynamic Programming (DP, Section 4.1) is important for both deep SL

Acknowledgments

Since 16 April 2014, drafts of this paper have undergone massive open online peer review through public mailing lists including [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], and the Google+ machine learning forum. Thanks to numerous NN/DL experts for valuable comments. Thanks to SNF, DFG, and the European Commission for partially funding my DL research group in the past quarter-century.
