
Gaussian Process Models (GPMs)

Chapter in: Nonlinear System Identification

  • The original version of this chapter was revised: the abbreviation "ACSMO" has been corrected to "ASCMO" on page 640. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-47439-3_30

Abstract

This chapter is devoted to Gaussian processes. Compared to the existing literature, it approaches this abstract and complex topic from an intuitive perspective. The features and characteristics of kernel methods are explained and highlighted. The key ideas are illustrated with many heavily simplified examples, typically just 1D or 2D with very few data points, so that the basic concepts can be grasped. All toy examples are carried out in parallel with two different kernel functions: the Gaussian and the inverse quadratic. The concepts are introduced step by step, starting with the mean prediction in the noise-free case and adding complexity gradually. The relationship to RBF networks is discussed explicitly. By shedding light on Gaussian processes from several directions, the chapter aims to make them easier to understand than standard textbooks on the topic do.
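
To make the abstract's starting point concrete, here is a minimal sketch of the first step it mentions: the mean prediction in the noise-free case, run in parallel with the chapter's two kernels. The kernel parameterizations, length scale, and toy data below are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def gaussian_kernel(xa, xb, length=1.0):
    # Gaussian (squared-exponential) kernel: k(x, x') = exp(-0.5 (x - x')^2 / l^2)
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def inverse_quadratic_kernel(xa, xb, length=1.0):
    # Inverse quadratic kernel: k(x, x') = 1 / (1 + (x - x')^2 / l^2)
    d = xa[:, None] - xb[None, :]
    return 1.0 / (1.0 + (d / length) ** 2)

def gp_mean(x_train, y_train, x_test, kernel):
    # Noise-free GP mean prediction: m(x*) = k(x*, X) K^{-1} y
    K = kernel(x_train, x_train)          # N x N Gram matrix
    alpha = np.linalg.solve(K, y_train)   # alpha = K^{-1} y, the dual variables
    return kernel(x_test, x_train) @ alpha

# Toy 1D example in the spirit of the chapter: very few data points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
x_star = np.linspace(0.0, 3.0, 7)
for kern in (gaussian_kernel, inverse_quadratic_kernel):
    print(kern.__name__, np.round(gp_mean(x, y, x_star, kern), 3))
```

Because no noise term is added to the Gram matrix, the predicted mean interpolates the training points exactly; the vector α = K⁻¹y is what is commonly called the dual variables (cf. Note 7).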


Notes

  1. http://www.etas.com/download-center-files/products_ASCMO/ascmo_flyer_en.pdf

  2. The original ideas go back to the Master's thesis of Danie G. Krige in 1951 [317], https://en.wikipedia.org/wiki/Kriging.

  3. Engineering notation.

  4. Neural network notation.

  5. Statistics notation.

  6. ≈ 1 for points close to each other, ≈ 0 for points far away from each other. For instance, for a Gaussian kernel k(x, x′) = exp(−(x − x′)²/2) with unit length scale, k(0, 0.1) ≈ 0.995, while k(0, 5) ≈ 3.7 · 10⁻⁶.

  7. For the dual variables.

  8. The size N of the training data set is not selected to control the network's flexibility, as is usually done with the number of neurons M.

  9. http://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution

  10. Slices through Gaussians and marginal distributions of Gaussians are always Gaussians themselves. This is the reason why Gaussian process models work so nicely and efficiently; see the conditioning identity sketched after these notes.

  11. For matrices and vectors, it can be interpreted as some norm that increases or decreases.

  12. Note that in most of the standard literature, both r and k are typically denoted by p, which throughout this book is reserved for the number of inputs.

  13. The probability of fitting any real number exactly is zero, of course.

  14. http://stats.stackexchange.com/questions/24799/cross-validation-vs-empirical-bayes-for-estimating-hyperparameters/24818

  15. http://www.gaussianprocess.org/gpml/code/matlab/doc/index.html
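
The fact in Note 10 is the standard conditioning property of the multivariate Gaussian; a compact statement of this textbook identity (not quoted from the chapter): if a and b are jointly Gaussian,

```latex
\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix}
\sim \mathcal{N}\!\left(
\begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix},
\begin{bmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\
                \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{bmatrix}
\right)
\quad\Longrightarrow\quad
\mathbf{a} \mid \mathbf{b} \sim \mathcal{N}\!\left(
\boldsymbol{\mu}_a + \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}
(\mathbf{b} - \boldsymbol{\mu}_b),\;
\boldsymbol{\Sigma}_{aa} - \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}
\boldsymbol{\Sigma}_{ba}
\right)
```

Conditioning the joint Gaussian of training and test outputs in this way is exactly what produces the GP posterior mean and covariance in closed form.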



Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Nelles, O. (2020). Gaussian Process Models (GPMs). In: Nonlinear System Identification. Springer, Cham. https://doi.org/10.1007/978-3-030-47439-3_16
