Abstract
This chapter is devoted to Gaussian processes. Compared to the existing literature, it approaches this rather abstract and complex topic from an intuitive perspective. The features of kernel methods are explained, and their characteristics are highlighted. The key ideas are illustrated with many extremely simplified examples, typically just 1D or 2D and with very few data points, which should make it possible to grasp the basic concepts involved. All toy examples are carried out in parallel with two different kernel functions: Gaussian and inverse quadratic. The concepts are introduced step by step, starting with just the mean prediction in the noise-free case and adding complexity gradually. The relationship with RBF networks is discussed explicitly. By shedding light on Gaussian processes from several directions, this chapter aims to make them easier to understand than standard textbooks on the topic.
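To make the noise-free mean prediction concrete, here is a minimal sketch in the spirit of the chapter's toy examples: a handful of 1D data points interpolated with either a Gaussian or an inverse quadratic kernel. The data values, length scale, and function names are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Tiny 1D training set in the spirit of the chapter's toy examples (values are made up).
X_train = np.array([-2.0, 0.0, 1.5])   # training inputs
y_train = np.array([0.5, 1.0, -0.3])   # noise-free training outputs

def gaussian_kernel(a, b, length=1.0):
    """Gaussian (squared-exponential) kernel."""
    return np.exp(-0.5 * ((a - b) / length) ** 2)

def inverse_quadratic_kernel(a, b, length=1.0):
    """Inverse quadratic kernel."""
    return 1.0 / (1.0 + ((a - b) / length) ** 2)

def gp_mean_prediction(x_test, kernel):
    # Gram matrix of the training inputs, K[i, j] = k(x_i, x_j).
    K = kernel(X_train[:, None], X_train[None, :])
    # Kernel values between each test input and every training input.
    k_star = kernel(x_test[:, None], X_train[None, :])
    # Noise-free mean prediction: y_hat = k_* K^{-1} y.
    alpha = np.linalg.solve(K, y_train)   # "dual variables"
    return k_star @ alpha

x_test = np.linspace(-3, 3, 7)
print(gp_mean_prediction(x_test, gaussian_kernel))
print(gp_mean_prediction(x_test, inverse_quadratic_kernel))
```

In the noise-free case the prediction interpolates the training points exactly; running both kernels on the same data, as the chapter does throughout, shows how the choice of kernel shapes the behavior between and beyond the data points.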
Notes
- 1.
- 2.
The original ideas go back to “the Master’s thesis of Danie G. Krige” in 1951 [317], https://en.wikipedia.org/wiki/Kriging.
- 3.
Engineering notation
- 4.
Neural network notation
- 5.
Statistics notation
- 6.
≈ 1 for points nearby, ≈ 0 for points far away from each other
- 7.
for the dual variables
- 8.
The size of the training data set N is not selected to control the network's flexibility, as is usually done with the number of neurons M.
- 9.
- 10.
Slices through Gaussians and marginal distributions of Gaussians are always Gaussians themselves. This is the reason why Gaussian process models work so elegantly and efficiently (see the conditioning identity after this list).
- 11.
For matrices and vectors, it can be interpreted as an increasing or decreasing norm.
- 12.
Note that in most of the standard literature, both r and k are typically denoted by p, which is reserved for the number of inputs throughout this book.
- 13.
The probability of fitting any real number exactly is zero, of course.
- 14.
- 15.
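As an illustration of note 10, the standard conditioning identity for a jointly Gaussian vector (generic notation, not necessarily the book's) shows why both the mean prediction and the predictive variance remain Gaussian and are available in closed form:

```latex
\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} \mathbf{K} & \mathbf{k}_* \\ \mathbf{k}_*^{\mathsf{T}} & k_{**} \end{bmatrix}\right)
\;\;\Longrightarrow\;\;
y_* \mid \mathbf{y} \sim \mathcal{N}\!\left(
\mathbf{k}_*^{\mathsf{T}}\mathbf{K}^{-1}\mathbf{y},\;
k_{**} - \mathbf{k}_*^{\mathsf{T}}\mathbf{K}^{-1}\mathbf{k}_* \right)
```

Here K denotes the kernel (Gram) matrix of the training inputs, k_* the vector of kernel values between a test input and the training inputs, and k_** the kernel value of the test input with itself.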
References
Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: International Conference on Database Theory, pp. 420–434. Springer (2001)
Anjyo, K., Lewis, J.P.: RBF interpolation and Gaussian process regression through an RKHS formulation. J. Math. Ind. 3(6), 63–71 (2011)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012)
Brovelli, M.A., Sanso, F., Venuti, G.: A discussion on the Wiener–Kolmogorov prediction principle with easy-to-compute and robust variants. J. Geod. 76(11–12), 673–683 (2003)
Cawley, G.C., Talbot, N.L.C.: On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010)
Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S.: Choosing multiple parameters for support vector machines. Mach. Learn. 46(1–3), 131–159 (2002)
Chen, P.-W., Wang, J.-Y., Lee, H.-M.: Model selection of SVMs using GA approach. In: 2004 IEEE International Joint Conference on Neural Networks. Proceedings. vol. 3, pp. 2035–2040. IEEE (2004)
Duan, K., Keerthi, S.S., Poo, A.N.: Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing 51, 41–59 (2003)
Duvenaud, D.: Automatic Model Construction with Gaussian Processes. Ph.D. thesis, University of Cambridge (2014)
Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)
Francois, D., Wertz, V., Verleysen, M., et al.: About the locality of kernels in high-dimensional spaces. In: International Symposium on Applied Stochastic Models and Data Analysis, pp. 238–245 (2005)
Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Comput. 7(2), 219–269 (1995)
Guo, X.C., Yang, J.H., Wu, C.G., Wang, C.Y., Liang, Y.C.: A novel LS-SVMS hyper-parameter selection based on particle swarm optimization. Neurocomputing 71(16), 3211–3215 (2008)
Hainmueller, J., Hazlett, C.: Kernel regularized least squares: reducing misspecification bias with a flexible and interpretable machine learning approach. Polit. Anal. mpt019 (2013)
Hoffmann, S., Schrott, M., Huber, T., Kruse, T.: Model-based methods for the calibration of modern internal combustion engines. MTZ Worldwide 76(4), 24–29 (2015)
Hofmann, T., Schölkopf, B., Smola, A.J.: A tutorial review of RKHS methods in machine learning (2005)
Hsu, C.-W., Lin, C.-J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
Jeng, J.-T.: Hybrid approach of selecting hyperparameters of support vector machine for regression. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(3), 699–709 (2005)
Krige, D.G.: A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. S. Afr. (1951)
Lin, S.-W., Lee, Z.-J., Chen, S.-C., Tseng, T.-Y.: Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl. Soft Comput. 8(4), 1505–1512 (2008)
Martin, J.D., Simpson, T.W.: Use of kriging models to approximate deterministic computer models. AIAA J. 43(4), 853–863 (2005)
Monaghan, J.J., Gingold, R.A.: Shock simulation by the particle method SPH. J. Comput. Phys. 52(2), 374–389 (1983)
Ong, C.S., Williamson, R.C., Smola, A.J.: Learning the kernel with hyperkernels. J. Mach. Learn. Res. 6(1), 1043–1071 (2005)
Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., Ljung, L.: Kernel methods in system identification, machine learning and function estimation: a survey. Automatica 50(3), 657–682 (2014)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990)
Quinonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA (2006)
Rifkin, R.M.: Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. Ph.D. thesis, Massachusetts Institute of Technology (2002)
Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)
Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
Snelson, E., Ghahramani, Z.: Local and global sparse Gaussian process approximations. In: AISTATS, vol. 11, pp. 524–531 (2007)
Sollich, P., Williams, C.K.I.: Understanding Gaussian process regression using the equivalent kernel. In: Deterministic and Statistical Methods in Machine Learning, pp. 211–228. Springer (2005)
Suykens, J.A.K., Gestel, T.V., Brabanter, J., Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing, New Jersey (2003)
Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann. Stat. 13(4), 1378–1402 (1985)
Wahba, G.: Spline Models for Observational Data, vol. 59. SIAM (1990)
Welling, M.: Kernel ridge regression. Max Welling’s Classnotes in Machine Learning (http://www.ics.uci.edu/welling/classnotes/classnotes.html), pp. 1–3 (2013)
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nelles, O. (2020). Gaussian Process Models (GPMs). In: Nonlinear System Identification. Springer, Cham. https://doi.org/10.1007/978-3-030-47439-3_16
DOI: https://doi.org/10.1007/978-3-030-47439-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47438-6
Online ISBN: 978-3-030-47439-3
eBook Packages: Physics and Astronomy (R0)