Abstract
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in Section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
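The change of viewpoint described above can be sketched numerically: a Bayesian linear model with a Gaussian prior over the weights induces a Gaussian process whose covariance is the inner product of feature vectors, and the two views give identical predictions. The feature map, data, and noise level below are illustrative choices, not taken from the paper.

```python
import numpy as np

def phi(x):
    # illustrative polynomial feature map [1, x, x^2]
    return np.array([1.0, x, x**2])

def k(x1, x2):
    # covariance induced by a unit Gaussian prior over the weights:
    # k(x, x') = phi(x) . phi(x')
    return phi(x1) @ phi(x2)

# toy training data (hypothetical)
X = np.array([-1.0, 0.0, 1.0])
y = np.array([1.2, 0.1, 0.9])
noise = 0.1  # noise variance

# Function-space (GP) view: predictive mean at a test point x*
#   mean = k(x*, X) [K + noise I]^{-1} y
K = np.array([[k(a, b) for b in X] for a in X])
x_star = 0.5
k_star = np.array([k(x_star, a) for a in X])
alpha = np.linalg.solve(K + noise * np.eye(len(X)), y)
gp_mean = k_star @ alpha

# Weight-space view: posterior mean of the weights, then project onto phi(x*)
Phi = np.vstack([phi(a) for a in X])      # design matrix
A = Phi.T @ Phi / noise + np.eye(3)       # posterior precision of the weights
w_mean = np.linalg.solve(A, Phi.T @ y / noise)
ws_mean = w_mean @ phi(x_star)

# The two predictive means agree up to numerical precision
print(abs(gp_mean - ws_mean) < 1e-9)
```

The agreement follows from the matrix identity (Φ⊤Φ/σ² + I)⁻¹Φ⊤/σ² = Φ⊤(ΦΦ⊤ + σ²I)⁻¹, which is what lets one trade a computation over weights for one over function values at the data points.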
References
Aizerman, M. A., E. M. Braverman, and L. I. Rozoner (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821–837.
Barber, D. and C. K. I. Williams (1997). Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo. In M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. MIT Press.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Box, G. E. P. and G. C. Tiao (1973). Bayesian Inference in Statistical Analysis. Reading, Mass.: Addison-Wesley.
Bridle, J. (1990). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fougelman-Soulie and J. Herault (Eds.), NATO ASI series on systems and computer science. Springer-Verlag.
Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: Wiley.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987). Hybrid Monte Carlo. Physics Letters B 195, 216–222.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian Data Analysis. London: Chapman and Hall.
Gibbs, M. and D. J. C. Mackay (1997a). Efficient Implementation of Gaussian Processes. Draft manuscript, available from http://wol.ra.phy.cam.ac.uk/mackay/homepage.html.
Gibbs, M. and D. J. C. Mackay (1997b). Variational Gaussian Process Classifiers. Draft manuscript, available via http://wol.ra.phy.cam.ac.uk/mackay/homepage.html.
Girard, D. (1989). A fast "Monte Carlo cross-validation" procedure for large least squares problems with noisy data. Numer. Math. 56, 1–23.
Girosi, F., M. Jones, and T. Poggio (1995). Regularization Theory and Neural Networks Architectures. Neural Computation 7(2), 219–269.
Goldberg, P. W., C. K. I. Williams, and C. M. Bishop (1997). Regression with Input-dependent Noise: A Gaussian Process Treatment. Accepted to NIPS*97.
Green, P. J. and B. W. Silverman (1994). Nonparametric regression and generalized linear models. London: Chapman and Hall.
Handcock, M. S. and M. L. Stein (1993). A Bayesian Analysis of kriging. Technometrics 35(4), 403–410.
Hastie, T. (1996). Pseudosplines. Journal of the Royal Statistical Society B 58, 379–396.
Hastie, T. J. and R. J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.
Hornik, K. (1993). Some new results on neural network approximation. Neural Networks 6(8), 1069–1072.
Hutchinson, M. (1989). A stochastic estimator for the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics: Simulation and Computation 18, 1059–1076.
Journel, A. G. and C. J. Huijbregts (1978). Mining Geostatistics. Academic Press.
Kimeldorf, G. and G. Wahba (1970). A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Annals of Mathematical Statistics 41, 495–502.
MacKay, D. J. C. (1992). A Practical Bayesian Framework for Backpropagation Networks. Neural Computation 4(3), 448–472.
MacKay, D. J. C. (1993). Bayesian Methods for Backpropagation Networks. In J. L. van Hemmen, E. Domany, and K. Schulten (Eds.), Models of Neural Networks II. Springer.
Mardia, K. V. and R. J. Marshall (1984). Maximum likelihood estimation for models of residual covariance in spatial regression. Biometrika 71(1), 135–146.
Neal, R. M. (1996). Bayesian Learning for Neural Networks. New York: Springer. Lecture Notes in Statistics 118.
Neal, R. M. (1997). Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification. Draft manuscript, available from http://www.cs.toronto.edu/~radford/.
O’Hagan, A. (1978). Curve Fitting and Optimal Design for Prediction (with discussion). Journal of the Royal Statistical Society B 40(1), 1–42.
O’Sullivan, F., B. S. Yandell, and W. J. Raynor (1986). Automatic Smoothing of Regression Functions in Generalized Linear Models. Journal of the American Statistical Association 81, 96–103.
Poggio, T. and F. Girosi (1990). Networks for approximation and learning. Proceedings of IEEE 78, 1481–1497.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in C (second ed.). Cambridge University Press.
Rasmussen, C. E. (1996). Evaluation of Gaussian Processes and Other Methods for Nonlinear Regression. Ph.D. thesis, Dept. of Computer Science, University of Toronto. Available from http://ward.cs.utoronto.ca/~carl/.
Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge, UK: Cambridge University Press.
Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn (1989). Design and Analysis of Computer Experiments. Statistical Science 4(4), 409–435.
Sampson, P. D. and P. Guttorp (1992). Nonparametric estimation of nonstationary covariance structure. Journal of the American Statistical Association 87, 108–119.
Silverman, B. W. (1978). Density Ratios, Empirical Likelihood and Cot Death. Applied Statistics 27(1), 26–33.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). J. Roy. Stat. Soc. B 47(1), 1–52.
Skilling, J. (1993). Bayesian numerical analysis. In W. T. Grandy, Jr. and P. Milonni (Eds.), Physics and Probability. Cambridge University Press.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer Verlag.
von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Academic Press.
Wahba, G. (1990). Spline Models for Observational Data. Society for Industrial and Applied Mathematics. CBMS-NSF Regional Conference series in applied mathematics.
Whittle, P. (1963). Prediction and regulation by linear least-square methods. English Universities Press.
Williams, C. K. I. (1997a). Computation with infinite neural networks. Submitted to Neural Computation.
Williams, C. K. I. (1997b). Computing with infinite networks. In M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. MIT Press.
Williams, C. K. I. and C. E. Rasmussen (1996). Gaussian processes for regression. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8, pp. 514–520. MIT Press.
Wong, E. (1971). Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill.
Zhu, H. and R. Rohwer (1996). Bayesian Regression Filters and the Issue of Priors. Neural Computing and Applications 4, 130–142.
Zhu, H., C. K. I. Williams, R. J. Rohwer, and M. Morciniec (1997). Gaussian Regression and Optimal Finite Dimensional Linear Models. Technical Report NCRG/97/011, Aston University, UK. Available from http://www.ncrg.aston.ac.uk/Papers/.
Copyright information
© 1998 Springer Science+Business Media Dordrecht
Cite this chapter
Williams, C.K.I. (1998). Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond. In: Jordan, M.I. (eds) Learning in Graphical Models. NATO ASI Series, vol 89. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5014-9_23
Print ISBN: 978-94-010-6104-9
Online ISBN: 978-94-011-5014-9