Abstract
The main aim of this paper is to provide a tutorial on regression with Gaussian processes. We start from Bayesian linear regression, and show how by a change of viewpoint one can see this method as a Gaussian process predictor based on priors over functions, rather than on priors over parameters. This leads into a more general discussion of Gaussian processes in Section 4. Section 5 deals with further issues, including hierarchical modelling and the setting of the parameters that control the Gaussian process, the covariance functions for neural network models and the use of Gaussian processes in classification problems.
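The change of viewpoint described above can be sketched numerically: a Bayesian linear model with a Gaussian prior over the weights induces a Gaussian process whose covariance is the inner product of feature vectors, and the two views give identical predictions. The feature map, data, and noise level below are illustrative choices, not taken from the paper.

```python
import numpy as np

def phi(x):
    # illustrative polynomial feature map [1, x, x^2]
    return np.array([1.0, x, x**2])

def k(x1, x2):
    # covariance induced by a unit Gaussian prior over the weights:
    # k(x, x') = phi(x) . phi(x')
    return phi(x1) @ phi(x2)

# toy training data (hypothetical)
X = np.array([-1.0, 0.0, 1.0])
y = np.array([1.2, 0.1, 0.9])
noise = 0.1  # noise variance

# Function-space (GP) view: predictive mean at a test point x*
#   mean = k(x*, X) [K + noise I]^{-1} y
K = np.array([[k(a, b) for b in X] for a in X])
x_star = 0.5
k_star = np.array([k(x_star, a) for a in X])
alpha = np.linalg.solve(K + noise * np.eye(len(X)), y)
gp_mean = k_star @ alpha

# Weight-space view: posterior mean of the weights, then project onto phi(x*)
Phi = np.vstack([phi(a) for a in X])      # design matrix
A = Phi.T @ Phi / noise + np.eye(3)       # posterior precision of the weights
w_mean = np.linalg.solve(A, Phi.T @ y / noise)
ws_mean = w_mean @ phi(x_star)

# The two predictive means agree up to numerical precision
print(abs(gp_mean - ws_mean) < 1e-9)
```

The agreement follows from the matrix identity (Φ⊤Φ/σ² + I)⁻¹Φ⊤/σ² = Φ⊤(ΦΦ⊤ + σ²I)⁻¹, which is what lets one trade a computation over weights for one over function values at the data points.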
References
Aizerman, M. A., E. M. Braverman, and L. I. Rozoner (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821–837.
Barber, D. and C. K. I. Williams (1997). Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo. In M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. MIT Press.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Box, G. E. P. and G. C. Tiao (1973). Bayesian Inference in Statistical Analysis. Reading, Mass.: Addison-Wesley.
Bridle, J. (1990). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fougelman-Soulie and J. Herault (Eds.), NATO ASI series on systems and computer science. Springer-Verlag.
Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: Wiley.
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth (1987). Hybrid Monte Carlo. Physics Letters B 195, 216–222.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin (1995). Bayesian Data Analysis. London: Chapman and Hall.
Gibbs, M. and D. J. C. Mackay (1997a). Efficient Implementation of Gaussian Processes. Draft manuscript, available from http://wol.ra.phy.cam.ac.uk/mackay/homepage.html.
Gibbs, M. and D. J. C. Mackay (1997b). Variational Gaussian Process Classifiers. Draft manuscript, available via http://wol.ra.phy.cam.ac.uk/mackay/homepage.html.
Girard, D. (1989). A fast "Monte Carlo cross-validation" procedure for large least squares problems with noisy data. Numer. Math. 56, 1–23.
Girosi, F., M. Jones, and T. Poggio (1995). Regularization Theory and Neural Networks Architectures. Neural Computation 7(2), 219–269.
Goldberg, P. W., C. K. I. Williams, and C. M. Bishop (1997). Regression with Input-dependent Noise: A Gaussian Process Treatment. Accepted to NIPS*97.
Green, P. J. and B. W. Silverman (1994). Nonparametric regression and generalized linear models. London: Chapman and Hall.
Handcock, M. S. and M. L. Stein (1993). A Bayesian Analysis of kriging. Technometrics 35(4), 403–410.
Hastie, T. (1996). Pseudosplines. Journal of the Royal Statistical Society B 58, 379–396.
Hastie, T. J. and R. J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.
Hornik, K. (1993). Some new results on neural network approximation. Neural Networks 6(8), 1069–1072.
Hutchinson, M. (1989). A stochastic estimator for the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics: Simulation and Computation 18, 1059–1076.
Journel, A. G. and C. J. Huijbregts (1978). Mining Geostatistics. Academic Press.
Kimeldorf, G. and G. Wahba (1970). A correspondence between Bayesian estimation of stochastic processes and smoothing by splines. Annals of Mathematical Statistics 41, 495–502.
MacKay, D. J. C. (1992). A Practical Bayesian Framework for Backpropagation Networks. Neural Computation 4(3), 448–472.
MacKay, D. J. C. (1993). Bayesian Methods for Backpropagation Networks. In J. L. van Hemmen, E. Domany, and K. Schulten (Eds.), Models of Neural Networks II. Springer.
Mardia, K. V. and R. J. Marshall (1984). Maximum likelihood estimation for models of residual covariance in spatial regression. Biometrika 71(1), 135–146.
Neal, R. M. (1996). Bayesian Learning for Neural Networks. New York: Springer. Lecture Notes in Statistics 118.
Neal, R. M. (1997). Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification. Draft manuscript, available from http://www.cs.toronto.edu/~radford/.
O’Hagan, A. (1978). Curve Fitting and Optimal Design for Prediction (with discussion). Journal of the Royal Statistical Society B 40(1), 1–42.
O’Sullivan, F., B. S. Yandell, and W. J. Raynor (1986). Automatic Smoothing of Regression Functions in Generalized Linear Models. Journal of the American Statistical Association 81, 96–103.
Poggio, T. and F. Girosi (1990). Networks for approximation and learning. Proceedings of IEEE 78, 1481–1497.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in C (second ed.). Cambridge University Press.
Rasmussen, C. E. (1996). Evaluation of Gaussian Processes and Other Methods for Nonlinear Regression. Ph.D. thesis, Dept. of Computer Science, University of Toronto. Available from http://ward.cs.utoronto.ca/~carl/.
Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge, UK: Cambridge University Press.
Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn (1989). Design and Analysis of Computer Experiments. Statistical Science 4(4), 409–435.
Sampson, P. D. and P. Guttorp (1992). Nonparametric estimation of nonstationary covariance structure. Journal of the American Statistical Association 87, 108–119.
Silverman, B. W. (1978). Density Ratios, Empirical Likelihood and Cot Death. Applied Statistics 27(1), 26–33.
Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). J. Roy. Stat. Soc. B 47(1), 1–52.
Skilling, J. (1993). Bayesian numerical analysis. In W. T. Grandy, Jr. and P. Milonni (Eds.), Physics and Probability. Cambridge University Press.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. New York: Springer Verlag.
von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Academic Press.
Wahba, G. (1990). Spline Models for Observational Data. Society for Industrial and Applied Mathematics. CBMS-NSF Regional Conference series in applied mathematics.
Whittle, P. (1963). Prediction and regulation by linear least-square methods. English Universities Press.
Williams, C. K. I. (1997a). Computation with infinite neural networks. Submitted to Neural Computation.
Williams, C. K. I. (1997b). Computing with infinite networks. In M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.), Advances in Neural Information Processing Systems 9. MIT Press.
Williams, C. K. I. and C. E. Rasmussen (1996). Gaussian processes for regression. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8, pp. 514–520. MIT Press.
Wong, E. (1971). Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill.
Zhu, H. and R. Rohwer (1996). Bayesian Regression Filters and the Issue of Priors. Neural Computing and Applications 4, 130–142.
Zhu, H., C. K. I. Williams, R. J. Rohwer, and M. Morciniec (1997). Gaussian Regression and Optimal Finite Dimensional Linear Models. Technical Report NCRG/97/011, Aston University, UK. Available from http://www.ncrg.aston.ac.uk/Papers/.
Copyright information
© 1998 Springer Science+Business Media Dordrecht
Cite this chapter
Williams, C.K.I. (1998). Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond. In: Jordan, M.I. (eds) Learning in Graphical Models. NATO ASI Series, vol 89. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5014-9_23
Print ISBN: 978-94-010-6104-9
Online ISBN: 978-94-011-5014-9