Abstract

This chapter deals with linear regression in detail. Modeling problems in which the model output is linear in the model’s parameters are studied. The method of “least squares,” which minimizes the sum of squared model errors, is derived, and its properties are analyzed. Furthermore, various extensions such as weighting and regularization are introduced. The concept of the smoothing or hat matrix, which maps the measured output values to the model output values, is introduced; it is required to understand the leave-one-out error in linear regression and the local behavior of kernel methods in later chapters. The important aspect of the “effective number of parameters” is also discussed, as it is a key topic throughout the whole book. In addition, methods for recursive updating that can deal with data streams are introduced. Finally, the more advanced issue of linear subset selection is discussed in detail. It allows regressors to be selected incrementally, thereby carrying out structure optimization. These approaches are also a recurrent theme throughout the book.
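
The following minimal sketch illustrates these central quantities numerically (it is not taken from the chapter; the data and variable names are purely illustrative): the least squares estimate, the smoothing or hat matrix, its trace as the effective number of parameters, and a ridge-regularized variant.

    import numpy as np

    # Illustrative data: regression matrix X (N x n) and measured outputs y.
    rng = np.random.default_rng(0)
    N, n = 50, 3
    X = rng.standard_normal((N, n))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + 0.1 * rng.standard_normal(N)

    # Least squares: minimize the sum of squared model errors ||y - X theta||^2.
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Smoothing (hat) matrix: maps measured outputs to model outputs, y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat = S @ y

    # Effective number of parameters = trace(S); equals n for plain least squares.
    n_eff = np.trace(S)

    # Ridge regularization shrinks the effective number of parameters below n.
    lam = 1.0
    S_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)
    n_eff_ridge = np.trace(S_ridge)

For ordinary least squares the trace of the hat matrix equals the number of regressors; with regularization it drops below that value, which is one way to read off the “effective number of parameters.”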


Notes

  1. This error is called the equation error; it differs from the output error (the difference between process and model output) used in the other examples. The equation error is used here because, for IIR filters, the output error would not be linear in the parameters; see Chap. 18. (A brief sketch follows these notes.)

  2. Strictly speaking, the regression matrix must have at least as many rows as columns. (Recall that for the FIR and IIR filter examples, the number of rows is smaller than N.) Moreover, this condition is not sufficient: in addition, the columns must be linearly independent. (A small numerical sketch illustrating this and the following note appears after these notes.)

  3. Note that the Hessian \( \underline{H} \) is symmetric, and therefore all its eigenvalues are real. Furthermore, the eigenvalues are non-negative because the Hessian is positive semi-definite since \( \underline{H} = \underline{X}^T \underline{X} \). If \( \underline{X} \) and thus \( \underline{H} \) are not singular (i.e., have full rank), the eigenvalues are strictly positive.

  4. Available at https://onlinecourses.science.psu.edu/stat857/node/155.
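
Sketch for Note 1 (a hypothetical first-order model is assumed; not from the chapter): the equation error places measured past outputs in the regressor and is therefore linear in the parameters, whereas the output error feeds back the model’s own past outputs and is not.

    import numpy as np

    def equation_error(a, b, u, y):
        # e(k) = y(k) - a*y(k-1) - b*u(k-1): linear in (a, b) because
        # the measured output y(k-1) appears in the regressor.
        return y[1:] - a * y[:-1] - b * u[:-1]

    def output_error(a, b, u, y):
        # The model output is simulated recursively, so it contains powers
        # of a; the error is therefore nonlinear in the parameter a.
        y_model = np.zeros_like(y)
        for k in range(1, len(y)):
            y_model[k] = a * y_model[k - 1] + b * u[k - 1]
        return y - y_model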
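
Sketch for Notes 2 and 3 (a small made-up example): the Hessian \( \underline{H} = \underline{X}^T \underline{X} \) is symmetric and positive semi-definite, so its eigenvalues are real and non-negative; linearly dependent columns make it singular.

    import numpy as np

    # The third column equals the sum of the first two, so the columns are
    # linearly dependent and X^T X is singular.
    X = np.array([[1.0, 2.0, 3.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [1.0, 3.0, 4.0]])
    H = X.T @ X
    eigvals = np.linalg.eigvalsh(H)  # real and non-negative (up to round-off)
    print(eigvals)                   # smallest eigenvalue is (numerically) zero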

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Nelles, O. (2020). Linear Optimization. In: Nonlinear System Identification. Springer, Cham. https://doi.org/10.1007/978-3-030-47439-3_3
