Abstract

This chapter deals with linear regression in detail. Modeling problems in which the model output is linear in the model’s parameters are studied. The method of “least squares,” which minimizes the sum of squared model errors, is derived, and its properties are analyzed. Furthermore, various extensions such as weighting and regularization are introduced. The concept of the smoothing or hat matrix, which maps the measured output values to the model output values, is introduced; it is required to understand the leave-one-out error in linear regression and the local behavior of kernel methods in later chapters. The important aspect of the “effective number of parameters” is also discussed, as it is a key topic throughout the whole book. In addition, methods for recursive updating that can deal with data streams are introduced. Finally, the more advanced issue of linear subset selection is discussed in detail. It allows regressors to be selected incrementally, thereby carrying out structure optimization. These approaches are also a recurrent theme throughout the book.
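
The following minimal sketch illustrates these central quantities numerically (it is not taken from the chapter; the data and variable names are purely illustrative): the least squares estimate, the smoothing or hat matrix, its trace as the effective number of parameters, and a ridge-regularized variant.

    import numpy as np

    # Illustrative data: regression matrix X (N x n) and measured outputs y.
    rng = np.random.default_rng(0)
    N, n = 50, 3
    X = rng.standard_normal((N, n))
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + 0.1 * rng.standard_normal(N)

    # Least squares: minimize the sum of squared model errors ||y - X theta||^2.
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Smoothing (hat) matrix: maps measured outputs to model outputs, y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X, X.T)
    y_hat = S @ y

    # Effective number of parameters = trace(S); equals n for plain least squares.
    n_eff = np.trace(S)

    # Ridge regularization shrinks the effective number of parameters below n.
    lam = 1.0
    S_ridge = X @ np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)
    n_eff_ridge = np.trace(S_ridge)

For ordinary least squares the trace of the hat matrix equals the number of regressors; with regularization it drops below that value, which is one way to read off the “effective number of parameters.”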


Notes

  1. This error is called the equation error; it differs from the output error (the difference between process and model output) used in the other examples. The equation error is used here because, for IIR filters, the output error would not be linear in the parameters; see Chap. 18. (A brief sketch follows these notes.)

  2. Strictly speaking, the regression matrix must have at least as many rows as columns. (Recall that for the FIR and IIR filter examples, the number of rows is smaller than N.) Moreover, this condition is not sufficient: in addition, the columns must be linearly independent. (A small numerical sketch illustrating this and the following note appears after these notes.)

  3. Note that the Hessian \( \underline{H} \) is symmetric, and therefore all its eigenvalues are real. Furthermore, the eigenvalues are non-negative because the Hessian is positive semi-definite since \( \underline{H} = \underline{X}^T \underline{X} \). If \( \underline{X} \) and thus \( \underline{H} \) are not singular (i.e., have full rank), the eigenvalues are strictly positive.

  4. Available at https://onlinecourses.science.psu.edu/stat857/node/155.
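
Sketch for Note 1 (a hypothetical first-order model is assumed; not from the chapter): the equation error places measured past outputs in the regressor and is therefore linear in the parameters, whereas the output error feeds back the model’s own past outputs and is not.

    import numpy as np

    def equation_error(a, b, u, y):
        # e(k) = y(k) - a*y(k-1) - b*u(k-1): linear in (a, b) because
        # the measured output y(k-1) appears in the regressor.
        return y[1:] - a * y[:-1] - b * u[:-1]

    def output_error(a, b, u, y):
        # The model output is simulated recursively, so it contains powers
        # of a; the error is therefore nonlinear in the parameter a.
        y_model = np.zeros_like(y)
        for k in range(1, len(y)):
            y_model[k] = a * y_model[k - 1] + b * u[k - 1]
        return y - y_model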
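
Sketch for Notes 2 and 3 (a small made-up example): the Hessian \( \underline{H} = \underline{X}^T \underline{X} \) is symmetric and positive semi-definite, so its eigenvalues are real and non-negative; linearly dependent columns make it singular.

    import numpy as np

    # The third column equals the sum of the first two, so the columns are
    # linearly dependent and X^T X is singular.
    X = np.array([[1.0, 2.0, 3.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 2.0],
                  [1.0, 3.0, 4.0]])
    H = X.T @ X
    eigvals = np.linalg.eigvalsh(H)  # real and non-negative (up to round-off)
    print(eigvals)                   # smallest eigenvalue is (numerically) zero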

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Nelles, O. (2020). Linear Optimization. In: Nonlinear System Identification. Springer, Cham. https://doi.org/10.1007/978-3-030-47439-3_3
