Abstract
This chapter is devoted to Gaussian processes. Compared to the existing literature, it approaches this rather abstract and complex topic from an intuitive perspective. The features of kernel methods are explained, and their characteristics are highlighted. The key ideas are illustrated with many extremely simplified examples, typically just 1D or 2D and with very few data points, which should make it possible to grasp the basic concepts involved. All toy examples are carried out in parallel with two different kernel functions: Gaussian and inverse quadratic. The concepts are introduced step by step, starting with just the mean prediction in the noise-free case and adding complexity gradually. The relationship with RBF networks is discussed explicitly. By shedding light on Gaussian processes from several directions, this chapter aims to make them easier to understand than standard textbooks on the topic.
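To make the noise-free mean prediction concrete, here is a minimal sketch in the spirit of the chapter's toy examples: a handful of 1D data points interpolated with either a Gaussian or an inverse quadratic kernel. The data values, length scale, and function names are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

# Tiny 1D training set in the spirit of the chapter's toy examples (values are made up).
X_train = np.array([-2.0, 0.0, 1.5])   # training inputs
y_train = np.array([0.5, 1.0, -0.3])   # noise-free training outputs

def gaussian_kernel(a, b, length=1.0):
    """Gaussian (squared-exponential) kernel."""
    return np.exp(-0.5 * ((a - b) / length) ** 2)

def inverse_quadratic_kernel(a, b, length=1.0):
    """Inverse quadratic kernel."""
    return 1.0 / (1.0 + ((a - b) / length) ** 2)

def gp_mean_prediction(x_test, kernel):
    # Gram matrix of the training inputs, K[i, j] = k(x_i, x_j).
    K = kernel(X_train[:, None], X_train[None, :])
    # Kernel values between each test input and every training input.
    k_star = kernel(x_test[:, None], X_train[None, :])
    # Noise-free mean prediction: y_hat = k_* K^{-1} y.
    alpha = np.linalg.solve(K, y_train)   # "dual variables"
    return k_star @ alpha

x_test = np.linspace(-3, 3, 7)
print(gp_mean_prediction(x_test, gaussian_kernel))
print(gp_mean_prediction(x_test, inverse_quadratic_kernel))
```

In the noise-free case the prediction interpolates the training points exactly; running both kernels on the same data, as the chapter does throughout, shows how the choice of kernel shapes the behavior between and beyond the data points.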
Notes
- 1.
- 2.
The original ideas go back to “the Master’s thesis of Danie G. Krige” in 1951 [317], https://en.wikipedia.org/wiki/Kriging.
- 3.
Engineering notation
- 4.
Neural network notation
- 5.
Statistics notation
- 6.
≈ 1 for points nearby, ≈ 0 for points far away from each other
- 7.
for the dual variables
- 8.
The size of the training data set N is not selected to control the network's flexibility, as is usually done with the number of neurons M.
- 9.
- 10.
Slices through Gaussians and marginal distributions of Gaussians are always Gaussians themselves. This is the reason why Gaussian process models work so elegantly and efficiently (see the conditioning identity after this list).
- 11.
For matrices and vectors, it can be interpreted as an increasing or decreasing norm.
- 12.
Note that in most of the standard literature, both r and k are typically denoted by p, which is reserved for the number of inputs throughout this book.
- 13.
The probability of fitting any real number exactly is zero, of course.
- 14.
- 15.
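As an illustration of note 10, the standard conditioning identity for a jointly Gaussian vector (generic notation, not necessarily the book's) shows why both the mean prediction and the predictive variance remain Gaussian and are available in closed form:

```latex
\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},
\begin{bmatrix} \mathbf{K} & \mathbf{k}_* \\ \mathbf{k}_*^{\mathsf{T}} & k_{**} \end{bmatrix}\right)
\;\;\Longrightarrow\;\;
y_* \mid \mathbf{y} \sim \mathcal{N}\!\left(
\mathbf{k}_*^{\mathsf{T}}\mathbf{K}^{-1}\mathbf{y},\;
k_{**} - \mathbf{k}_*^{\mathsf{T}}\mathbf{K}^{-1}\mathbf{k}_* \right)
```

Here K denotes the kernel (Gram) matrix of the training inputs, k_* the vector of kernel values between a test input and the training inputs, and k_** the kernel value of the test input with itself.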
References
Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: International Conference on Database Theory, pp. 420–434. Springer (2001)
Anjyo, K., Lewis, J.P.: RBF interpolation and Gaussian process regression through an RKHS formulation. J. Math. Ind. 3(6), 63–71 (2011)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012)
Brovelli, M.A., Sanso, F., Venuti, G.: A discussion on the Wiener–Kolmogorov prediction principle with easy-to-compute and robust variants. J. Geod. 76(11–12), 673–683 (2003)
Cawley, G.C., Talbot, N.L.C.: On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010)
Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S.: Choosing multiple parameters for support vector machines. Mach. Learn. 46(1–3), 131–159 (2002)
Chen, P.-W., Wang, J.-Y., Lee, H.-M.: Model selection of SVMs using GA approach. In: 2004 IEEE International Joint Conference on Neural Networks. Proceedings. vol. 3, pp. 2035–2040. IEEE (2004)
Duan, K., Keerthi, S.S., Poo, A.N.: Evaluation of simple performance measures for tuning SVM hyperparameters. Neurocomputing 51, 41–59 (2003)
Duvenaud, D.: Automatic Model Construction with Gaussian Processes. Ph.D. thesis, University of Cambridge (2014)
Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)
Francois, D., Wertz, V., Verleysen, M., et al.: About the locality of kernels in high-dimensional spaces. In: International Symposium on Applied Stochastic Models and Data Analysis, pp. 238–245 (2005)
Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures. Neural Comput. 7(2), 219–269 (1995)
Guo, X.C., Yang, J.H., Wu, C.G., Wang, C.Y., Liang, Y.C.: A novel LS-SVMS hyper-parameter selection based on particle swarm optimization. Neurocomputing 71(16), 3211–3215 (2008)
Hainmueller, J., Hazlett, C.: Kernel regularized least squares: reducing misspecification bias with a flexible and interpretable machine learning approach. Polit. Anal. mpt019 (2013)
Hoffmann, S., Schrott, M., Huber, T., Kruse, T.: Model-based methods for the calibration of modern internal combustion engines. MTZ Worldwide 76(4), 24–29 (2015)
Hofmann, T., Schölkopf, B., Smola, A.J.: A tutorial review of RKHS methods in machine learning (2005)
Hsu, C.-W., Lin, C.-J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
Jeng, J.-T.: Hybrid approach of selecting hyperparameters of support vector machine for regression. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(3), 699–709 (2005)
Krige, D.G.: A statistical approach to some basic mine valuation problems on the Witwatersrand. J. Chem. Metall. Min. Soc. S. Afr. (1951)
Lin, S.-W., Lee, Z.-J., Chen, S.-C., Tseng, T.-Y.: Parameter determination of support vector machine and feature selection using simulated annealing approach. Appl. Soft Comput. 8(4), 1505–1512 (2008)
Martin, J.D., Simpson, T.W.: Use of kriging models to approximate deterministic computer models. AIAA J. 43(4), 853–863 (2005)
Monaghan, J.J., Gingold, R.A.: Shock simulation by the particle method SPH. J. Comput. Phys. 52(2), 374–389 (1983)
Ong, C.S., Williamson, R.C., Smola, A.J.: Learning the kernel with hyperkernels. J. Mach. Learn. Res. 6(1), 1043–1071 (2005)
Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., Ljung, L.: Kernel methods in system identification, machine learning and function estimation: a survey. Automatica 50(3), 657–682 (2014)
Poggio, T., Girosi, F.: Networks for approximation and learning. Proc. IEEE 78(9), 1481–1497 (1990)
Quinonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA (2006)
Rifkin, R.M.: Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. Ph.D. thesis, Massachusetts Institute of Technology (2002)
Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004)
Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
Snelson, E., Ghahramani, Z.: Local and global sparse Gaussian process approximations. In: AISTATS, vol. 11, pp. 524–531 (2007)
Sollich, P., Williams, C.K.I.: Understanding Gaussian process regression using the equivalent kernel. In: Deterministic and Statistical Methods in Machine Learning, pp. 211–228. Springer (2005)
Suykens, J.A.K., Gestel, T.V., Brabanter, J., Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing, New Jersey (2003)
Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann. Stat. 13(4), 1378–1402 (1985)
Wahba, G.: Spline Models for Observational Data, vol. 59. SIAM (1990)
Welling, M.: Kernel ridge regression. Max Welling’s Classnotes in Machine Learning (http://www.ics.uci.edu/welling/classnotes/classnotes.html), pp. 1–3 (2013)
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nelles, O. (2020). Gaussian Process Models (GPMs). In: Nonlinear System Identification. Springer, Cham. https://doi.org/10.1007/978-3-030-47439-3_16
DOI: https://doi.org/10.1007/978-3-030-47439-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47438-6
Online ISBN: 978-3-030-47439-3
eBook Packages: Physics and Astronomy (R0)