Excerpt
In statistics, the technique of
least squares is used for estimating the unknown parameters in a linear regression model (see
Linear Regression Models). This method minimizes the sum of squared distances between the observed responses in a set of data and the fitted responses from the regression model. Suppose we observe a collection of data \(\{{y}_{i},{x}_{i}\}_{i=1}^{n}\) on \(n\) units, where the \({y}_{i}\) are responses and \({x}_{i} = {({x}_{i1},{x}_{i2},\ldots ,{x}_{ip})}^{T}\) is a vector of predictors. It is convenient to write the model in matrix notation as
$$y = X\beta + \epsilon ,$$
(1)
where \(y\) is an \(n \times 1\) vector of responses, \(X\) is an \(n \times p\) matrix, known as the design matrix, \(\beta = {({\beta }_{1},{\beta }_{2},\ldots ,{\beta }_{p})}^{T}\) is the unknown parameter vector, and \(\epsilon\) is the vector of random errors. In ordinary least squares (OLS) regression, we estimate \(\beta\) by minimizing the residual sum of squares, \(RSS = {(y - X\beta )}^{T}(y - X\beta ),\) giving \(\hat{{\beta }}_{\mathrm{OLS}} = {({X}^{T}\!X)}^{-1}{X}^{T}\!y.\) This estimator is simple and has good statistical properties. However, it is not unique when the design matrix \(X\) is less than full rank, and it is unstable when the columns of \(X\) are (nearly) collinear. To achieve better prediction and to alleviate the ill-conditioning of \({X}^{T}\!X\), Hoerl and Kennard (1970) introduced ridge regression (see Ridge and Surrogate Ridge Regressions), which minimizes the RSS subject to the constraint \(\sum\nolimits_{j=1}^{p}{\beta }_{j}^{2} \leq t\); in other words,
$$\hat{{\beta }}^{\mathrm{ridge}} = \mathop {\arg\min }\limits_\beta \left \{ \sum \limits_{i=1}^{n}{({y}_{ i} - {\beta }_{0} -\sum \limits_{j=1}^{p}{x}_{ ij}{\beta }_{j})}^{2} + \lambda \sum \limits_{j=1}^{p}{\beta }_{ j}^{2}\right \},$$
(2)
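For centered predictors (so the intercept \({\beta }_{0}\) drops out), the criterion in (2) has the closed-form minimizer \(\hat{{\beta }}^{\mathrm{ridge}} = {({X}^{T}\!X + \lambda I)}^{-1}{X}^{T}\!y\). A minimal NumPy sketch, illustrative rather than from the source, contrasting it with OLS:

```python
import numpy as np

def ols(X, y):
    # OLS via the normal equations; unstable or non-unique when X'X
    # is (nearly) singular
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, lam):
    # Ridge regression: adding lam * I makes X'X + lam*I invertible
    # even when the columns of X are (nearly) collinear
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

At \(\lambda = 0\) the two estimators coincide; as \(\lambda\) grows, the ridge coefficients are shrunk toward zero.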
where \(\lambda \geq 0\) is known as the complexity parameter and controls the amount of shrinkage: the larger the value of \(\lambda\), the greater the shrinkage. The quadratic penalty term makes \(\hat{{\beta }}^{\mathrm{ridge}}\) a linear function of \(y\). Frank and Friedman (1993) introduced bridge regression, a generalized version of penalized (or absolute-penalty-type) estimation, which includes ridge regression as the special case \(\gamma = 2\). For a given penalty function \(\pi (\cdot )\) and regularization parameter \(\lambda\), the general form can be written as
$$\phi (\beta ) = {(y - X\beta )}^{T}(y - X\beta ) + \lambda \pi (\beta ),$$
where the penalty function is of the form
$$\pi (\beta ) = \sum \limits_{j=1}^{p}\vert {\beta }_{ j}{\vert }^{\gamma },\quad \gamma > 0.$$
(3)
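The effect of the exponent \(\gamma\) in (3) is easiest to see in one dimension, where the problem reduces to \(\min_{b}\,{(z - b)}^{2} + \lambda \vert b{\vert }^{\gamma }\). A grid-search sketch (a toy illustration, not from the source):

```python
import numpy as np

def bridge_1d(z, lam, gamma):
    # Minimize (z - b)^2 + lam * |b|^gamma over a fine grid (1-D toy problem)
    grid = np.linspace(-3.0, 3.0, 600001)  # step 1e-5, passes through zero
    obj = (z - grid) ** 2 + lam * np.abs(grid) ** gamma
    return grid[np.argmin(obj)]
```

For \(\gamma = 2\) the minimizer is proportional shrinkage, \(z/(1 + \lambda )\); for \(\gamma = 1\) it is soft thresholding, which returns exactly zero once \(\vert z\vert \leq \lambda /2\) — the sparsity property exploited by the lasso.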
The penalty function in (3) bounds the \({L}_{\gamma }\) norm of the parameters in the given model, \(\sum\nolimits_{j=1}^{p}\vert {\beta }_{j}{\vert }^{\gamma } \leq t\), where \(t\) is the tuning parameter that controls the amount of shrinkage. We see that for \(\gamma = 2\) we obtain ridge regression. However, if \(\gamma \neq 2\), the penalty function is not rotationally invariant. Interestingly, for \(\gamma \leq 1\) the penalty shrinks the coefficients toward zero and, depending on the value of \(\lambda\), sets some of them exactly to zero; the procedure thus combines variable selection with the coefficient shrinkage of penalized regression. An important member of the penalized least squares (PLS) family is the \({L}_{1}\) penalized least squares estimator, or the lasso [least absolute shrinkage and selection operator; Tibshirani (1996)]. In other words, the absolute penalty estimator (APE) arises when the absolute-value penalty is used, i.e., \(\gamma = 1\) in (3). Similar to ridge regression, the lasso estimates are obtained as
$$\hat{{\beta }}^{\mathrm{lasso}} =\mathop {\arg \min }\limits_\beta \left \{ \sum \limits_{i=1}^{n}{({y}_{ i} - {\beta }_{0} -\sum \limits_{j=1}^{p}{x}_{ ij}{\beta }_{j})}^{2} + \lambda \sum \limits_{j=1}^{p}\vert {\beta }_{ j}\vert \right \}.$$
(4)
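Beyond quadratic programming, (4) can also be solved by cyclic coordinate descent, which applies a closed-form soft-thresholding update to one coefficient at a time. A minimal sketch (illustrative, not the source's algorithm; predictors assumed centered so \({\beta }_{0}\) is omitted):

```python
import numpy as np

def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Minimizes ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            z = X[:, j] @ r_j
            b[j] = soft_threshold(z, lam / 2.0) / col_sq[j]
    return b
```

Each update has a closed form because the \({L}_{1}\) penalty is separable across coordinates; a coefficient whose correlation with the partial residual falls below \(\lambda /2\) is set exactly to zero.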
The lasso shrinks the OLS estimator toward zero and, depending on the value of \(\lambda\), sets some coefficients exactly to zero. Tibshirani (1996) used a quadratic programming method to solve (4) for \(\hat{{\beta }}^{\mathrm{lasso}}\). Later, Efron et al. (2004) proposed least angle regression (LAR), a type of stepwise regression with which the lasso estimates can be obtained at the same computational cost as an ordinary least squares fit (Hastie et al. 2009). Further, the lasso estimator remains numerically feasible for dimensions \(p\) much higher than the sample size \(n\). Zou and Hastie (2005) introduced a hybrid PLS regression with the so-called
elastic net penalty, defined as \(\lambda \sum\nolimits_{j=1}^{p}(\alpha {\beta }_{j}^{2} + (1 - \alpha )\vert {\beta }_{j}\vert )\). Here the penalty function is a linear combination of the ridge regression penalty and the lasso penalty. A different type of PLS, called the garotte, is due to Breiman (1993). Further, PLS estimation provides a generalization of both nonparametric least squares and weighted projection estimators, and a popular version of PLS is given by Tikhonov regularization (Tikhonov 1963). Generally speaking, ridge regression is highly efficient and stable when there are many small coefficients. The performance of the lasso is superior when there is a small-to-medium number of moderate-sized coefficients. Shrinkage estimators, on the other hand, perform well when a large number of coefficients are known to be zero. …
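The elastic net penalty above interpolates between the ridge and lasso penalties. A small sketch of the penalty function itself, using the \(\alpha\)-weighting as written in the text (illustrative only):

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    # lam * sum_j (alpha * beta_j^2 + (1 - alpha) * |beta_j|):
    # alpha = 1 recovers the ridge penalty, alpha = 0 the lasso penalty
    beta = np.asarray(beta, dtype=float)
    return lam * np.sum(alpha * beta ** 2 + (1.0 - alpha) * np.abs(beta))
```

The quadratic part stabilizes the fit under collinearity (grouping correlated predictors), while the absolute-value part retains the lasso's ability to set coefficients exactly to zero.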