About this book

Artificial "neural networks" are widely used as flexible models for classification and regression applications, but questions remain about how the power of these models can be safely exploited when training data is limited. This book demonstrates how Bayesian methods allow complex neural network models to be used without fear of the "overfitting" that can occur with traditional training methods. Insight into the nature of these complex Bayesian models is provided by a theoretical investigation of the priors over functions that underlie them. A practical implementation of Bayesian neural network learning using Markov chain Monte Carlo methods is also described, and software for it is freely available over the Internet. Presupposing only basic knowledge of probability and statistics, this book should be of interest to researchers in statistics, engineering, and artificial intelligence.

Table of contents

Frontmatter

Chapter 1. Introduction

Abstract
This book develops the Bayesian approach to learning for neural networks by examining the meaning of the prior distributions that are the starting point for Bayesian learning, by showing how the computations required by the Bayesian approach can be performed using Markov chain Monte Carlo methods, and by evaluating the effectiveness of Bayesian methods on several real and synthetic data sets. This work has practical significance for modeling data with neural networks. From a broader perspective, it shows how the Bayesian approach can be successfully applied to complex models, and in particular, challenges the common notion that one must limit the complexity of the model used when the amount of training data is small. I begin here by introducing the Bayesian framework, discussing past work on applying it to neural networks, and reviewing the basic concepts of Markov chain Monte Carlo implementation.
Radford M. Neal

Chapter 2. Priors for Infinite Networks

Abstract
In this chapter, I show that priors over network parameters can be defined in such a way that the corresponding priors over functions computed by the network reach reasonable limits as the number of hidden units goes to infinity. When using such priors, there is thus no need to limit the size of the network in order to avoid “overfitting”. The infinite network limit also provides insight into the properties of different priors. A Gaussian prior for hidden-to-output weights results in a Gaussian process prior for functions, which may be smooth, Brownian, or fractional Brownian. Quite different effects can be obtained using priors based on non-Gaussian stable distributions. In networks with more than one hidden layer, a combination of Gaussian and non-Gaussian priors appears most interesting.
Radford M. Neal
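
The limiting behaviour described in the Chapter 2 abstract can be illustrated numerically. The sketch below is not taken from the book: it assumes a network with one input, one layer of tanh hidden units, and a Gaussian prior on the hidden-to-output weights whose standard deviation is scaled as `omega_v / sqrt(n_hidden)`. The function name, the parameter names, and the default prior widths are all hypothetical choices made for illustration. Under that assumed scaling, the prior variance of the network output at a fixed input should stay roughly constant as the number of hidden units grows.

```python
import numpy as np

def sample_network_outputs(x, n_hidden, n_draws, rng,
                           sigma_u=5.0, sigma_a=5.0, omega_v=1.0):
    """Draw networks from the prior and evaluate each one at the inputs x.

    Assumed (illustrative) prior: input-to-hidden weights u ~ N(0, sigma_u^2),
    hidden biases a ~ N(0, sigma_a^2), and hidden-to-output weights
    v ~ N(0, omega_v^2 / n_hidden). Returns an array of shape (n_draws, len(x)).
    """
    x = np.asarray(x, dtype=float)
    u = rng.normal(0.0, sigma_u, size=(n_draws, n_hidden))
    a = rng.normal(0.0, sigma_a, size=(n_draws, n_hidden))
    # The 1/sqrt(n_hidden) scaling is the assumption that keeps the sum finite.
    v = rng.normal(0.0, omega_v / np.sqrt(n_hidden), size=(n_draws, n_hidden))
    # Hidden unit values, shape (n_draws, n_hidden, n_points).
    h = np.tanh(a[:, :, None] + u[:, :, None] * x[None, None, :])
    # Network output: weighted sum over hidden units for each draw and input.
    return np.einsum("dh,dhp->dp", v, h)

rng = np.random.default_rng(0)
for n_hidden in (10, 100, 1000):
    f = sample_network_outputs([0.0], n_hidden, n_draws=2000, rng=rng)
    print(n_hidden, f.var())
```

The printed variances should be of similar size for all three network widths, which is the sense in which the prior over functions "reaches a reasonable limit" under this particular scaling.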

Chapter 3. Monte Carlo Implementation

Abstract
This chapter presents a Markov chain Monte Carlo implementation of Bayesian learning for neural networks in which network parameters are updated using the hybrid Monte Carlo algorithm, a form of the Metropolis algorithm in which candidate states are found by means of dynamical simulation. Hyperparameters are updated separately using Gibbs sampling, allowing their values to be used in choosing good stepsizes for the discretized dynamics. I show that hybrid Monte Carlo performs better than simple Metropolis, due to its avoidance of random walk behaviour. I also discuss variants of hybrid Monte Carlo in which dynamical computations are done using “partial gradients”, in which acceptance is based on a “window” of states, and in which momentum updates incorporate “persistence”.
Radford M. Neal
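
The basic hybrid Monte Carlo update named in the Chapter 3 abstract (now usually called Hamiltonian Monte Carlo) can be sketched in a few lines. This is a minimal illustration, not the book's software: it assumes the common leapfrog discretization, unit-variance Gaussian momenta, and a toy two-dimensional Gaussian target rather than a neural network posterior. The names `leapfrog`, `hmc_step`, `run_chain`, `U`, and `grad_U` are hypothetical, and the Gibbs updates for hyperparameters and the "partial gradient", "window", and "persistence" variants are not shown.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Simulate the dynamics for n_steps leapfrog steps of size eps."""
    q = np.array(q, dtype=float)  # copies, so the caller's arrays are untouched
    p = np.array(p, dtype=float)
    p -= 0.5 * eps * grad_U(q)            # initial half step for momentum
    for i in range(n_steps):
        q += eps * p                      # full step for position
        if i < n_steps - 1:
            p -= eps * grad_U(q)          # full step for momentum
    p -= 0.5 * eps * grad_U(q)            # final half step for momentum
    return q, p

def hmc_step(q, U, grad_U, eps, n_steps, rng):
    """One update: fresh momentum, trajectory, then Metropolis accept/reject."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_U, eps, n_steps)
    h_old = U(q) + 0.5 * p0 @ p0          # total energy at the start
    h_new = U(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True
    return q, False

def run_chain(q0, U, grad_U, eps, n_steps, n_samples, seed=0):
    """Run a chain and return the samples and the acceptance rate."""
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    n_accept = 0
    for i in range(n_samples):
        q, accepted = hmc_step(q, U, grad_U, eps, n_steps, rng)
        samples[i] = q
        n_accept += accepted
    return samples, n_accept / n_samples

# Toy target (an assumption for illustration): standard Gaussian,
# with potential energy U(q) = q.q / 2, the negative log density up to a constant.
U = lambda q: 0.5 * q @ q
grad_U = lambda q: q

samples, rate = run_chain(np.zeros(2), U, grad_U, eps=0.2, n_steps=10, n_samples=3000)
print("acceptance rate:", rate)
print("sample mean:", samples.mean(axis=0))
print("sample variance:", samples.var(axis=0))
```

Because each proposal follows a trajectory of several steps in a consistent direction, successive states can be far apart while still being accepted with high probability, which is the avoidance of random walk behaviour that the abstract refers to.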

Chapter 4. Evaluation of Neural Network Models

Abstract
This chapter reports empirical evaluations of the predictive performance of Bayesian neural network models applied to several synthetic and real data sets. Good results were obtained when large networks with appropriate priors were used on small data sets for a synthetic regression problem, confirming expectations based on properties of the associated priors over functions. The Automatic Relevance Determination model was effective in suppressing irrelevant inputs in tests on synthetic regression and classification problems. Tests on two real data sets showed that Bayesian neural network models, implemented using hybrid Monte Carlo, can produce good results when applied to realistic problems of moderate size.
Radford M. Neal

Chapter 5. Conclusions and Further Work

Abstract
The preceding three chapters have examined the meaning of Bayesian neural network models, shown how these models can be implemented by Markov chain Monte Carlo methods, and demonstrated that such an implementation can be applied in practice to problems of moderate size, with good results. In this concluding chapter, I will review what has been accomplished in these areas, and describe on-going and potential future work to extend these results, both for neural networks and for other flexible Bayesian models.
Radford M. Neal

Backmatter
