
2011 | OriginalPaper | Chapter

Absolute Penalty Estimation

Authors: Ejaz S. Ahmed, Enayetur Raheem, Shakhawat Hossain

Published in: International Encyclopedia of Statistical Science

Publisher: Springer Berlin Heidelberg

Excerpt

In statistics, the technique of least squares is used for estimating the unknown parameters in a linear regression model (see Linear Regression Models). This method minimizes the sum of squared distances between the observed responses in a set of data and the fitted responses from the regression model. Suppose we observe a collection of data $\{y_i, x_i\}_{i=1}^{n}$ on $n$ units, where the $y_i$ are responses and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a vector of predictors. It is convenient to write the model in matrix notation as
$$y = X\beta + \epsilon ,$$
(1)
where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ matrix known as the design matrix, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the unknown parameter vector, and $\epsilon$ is the vector of random errors. In ordinary least squares (OLS) regression, we estimate $\beta$ by minimizing the residual sum of squares, \(RSS = {(y -X\beta )}^{T}(y -X\beta ),\) giving \(\hat{{\beta }}_{\mathrm{OLS}} = {({X}^{T}\!X)}^{-1}{X}^{T}\!y.\) This estimator is simple and has some good statistical properties. However, it is not unique when the design matrix $X$ is less than full rank, and it is highly variable when the columns of $X$ are (nearly) collinear. To achieve better prediction and to alleviate the ill-conditioning of $X^T X$, Hoerl and Kennard (1970) introduced ridge regression (see Ridge and Surrogate Ridge Regressions), which minimizes the RSS subject to the constraint $\sum_{j=1}^{p} \beta_j^2 \le t$; in other words,
$$\hat{{\beta }}^{\mathrm{ridge}} = \mathop {\arg\min }\limits_\beta \left \{ \sum \limits_{i=1}^{n}{({y}_{ i} - {\beta }_{0} -\sum \limits_{j=1}^{p}{x}_{ ij}{\beta }_{j})}^{2} + \lambda \sum \limits_{j=1}^{p}{\beta }_{ j}^{2}\right \},$$
(2)
where $\lambda \geq 0$ is known as the complexity parameter that controls the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage. The quadratic penalty term makes \(\hat{{\beta }}^{\mathrm{ridge}}\) a linear function of $y$; for centered data (no intercept), it has the closed form \(\hat{{\beta }}^{\mathrm{ridge}} = {({X}^{T}\!X + \lambda I)}^{-1}{X}^{T}\!y.\) Frank and Friedman (1993) introduced bridge regression, a generalized version of penalty (or absolute penalty type) estimation, which includes ridge regression as the special case $\gamma = 2$. For a given penalty function $\pi(\cdot)$ and regularization parameter $\lambda$, the general form can be written as
$$\phi (\beta ) = {(\!\,y -X\beta )}^{T}(\!\,y -X\beta ) + \lambda \pi (\beta ),$$
where the penalty function is of the form
$$\pi (\beta ) = \sum \limits_{j=1}^{p}\vert {\beta }_{ j}{\vert }^{\gamma },\ \gamma> 0.$$
(3)
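To make (3) concrete, here is a minimal Python sketch (not from the original text) that evaluates the penalized objective $\phi(\beta)$ for an arbitrary $\gamma$; the function name bridge_objective and the simulated data are hypothetical, chosen only for illustration.

import numpy as np

def bridge_objective(beta, X, y, lam, gamma):
    """Penalized RSS: (y - X beta)^T (y - X beta) + lam * sum |beta_j|^gamma."""
    resid = y - X @ beta
    rss = resid @ resid
    penalty = lam * np.sum(np.abs(beta) ** gamma)
    return rss + penalty

# gamma = 2 gives the ridge objective; gamma = 1 gives the lasso objective.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
beta = np.array([2.0, 0.0, -1.5, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(50)
print(bridge_objective(beta, X, y, lam=1.0, gamma=2))
print(bridge_objective(beta, X, y, lam=1.0, gamma=1))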
The penalty function in (3) bounds the $L_{\gamma}$ norm of the parameters in the given model as $\sum_{j=1}^{p} \vert \beta_j \vert^{\gamma} \le t$, where $t$ is the tuning parameter that controls the amount of shrinkage. We see that for $\gamma = 2$ we obtain ridge regression. However, if $\gamma \neq 2$, the penalty function is not rotationally invariant. Interestingly, for $\gamma < 2$ the penalty shrinks the coefficients toward zero, and for $\gamma \le 1$, depending on the value of $\lambda$, it sets some of them exactly to zero. Thus, the procedure combines variable selection with shrinkage of the coefficients. An important member of the penalized least squares (PLS) family is the $L_1$ penalized least squares estimator, the lasso [least absolute shrinkage and selection operator, Tibshirani (1996)]. In other words, the absolute penalty estimator (APE) arises when the absolute value penalty is used, i.e., $\gamma = 1$ in (3). Similar to ridge regression, the lasso estimates are obtained as
$$\hat{{\beta }}^{\mathrm{lasso}} =\mathop {\arg \min }\limits_\beta \left \{ \sum \limits_{i=1}^{n}{({y}_{ i} - {\beta }_{0} -\sum \limits_{j=1}^{p}{x}_{ ij}{\beta }_{j})}^{2} + \lambda \sum \limits_{j=1}^{p}\vert {\beta }_{ j}\vert \right \}.$$
(4)
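As a hedged numerical illustration of (4): scikit-learn's Lasso solves the same L1-penalized problem by coordinate descent, though it scales the RSS term by 1/(2n), so its alpha parameter corresponds to $\lambda/(2n)$ in (4); the simulated data below are illustrative only.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.5]   # sparse true coefficient vector
y = X @ beta_true + 0.5 * rng.standard_normal(100)

fit = Lasso(alpha=0.1).fit(X, y)   # alpha plays the role of lambda/(2n)
print(fit.coef_)                   # several coefficients are exactly zero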
The lasso shrinks the OLS estimator toward zero and, depending on the value of $\lambda$, sets some coefficients exactly to zero. Tibshirani (1996) used a quadratic programming method to solve (4) for \(\hat{{\beta }}^{\mathrm{lasso}}.\) Later, Efron et al. (2004) proposed least angle regression (LAR), a type of stepwise regression, with which the lasso estimates can be obtained at the same computational cost as that of ordinary least squares estimation (Hastie et al. 2009). Further, the lasso estimator remains numerically feasible for dimensions $p$ that are much higher than the sample size $n$. Zou and Hastie (2005) introduced a hybrid PLS regression with the so-called elastic net penalty, defined as $\lambda \sum_{j=1}^{p}(\alpha \beta_j^2 + (1 - \alpha)\vert \beta_j \vert)$. Here the penalty function is a linear combination of the ridge regression penalty and the lasso penalty. A different type of PLS, called the garotte, is due to Breiman (1993). Further, PLS estimation provides a generalization of both nonparametric least squares and weighted projection estimators; a popular version of PLS is given by Tikhonov regularization (Tikhonov 1963). Generally speaking, ridge regression is highly efficient and stable when there are many small coefficients. The performance of the lasso is superior when there is a small-to-medium number of moderate-sized coefficients. On the other hand, shrinkage estimators perform well when a large number of the coefficients are known to be zero. …
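As a companion sketch for the elastic net penalty discussed above (again illustrative, not from the original text): in scikit-learn's ElasticNet, l1_ratio weights the L1 part of the penalty, playing roughly the role of $(1 - \alpha)$ in the notation above, up to the library's internal scaling. The correlated predictors below illustrate the setting the penalty was designed for.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(100)   # two highly correlated predictors
beta_true = np.zeros(10)
beta_true[:2] = [1.5, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(100)

fit = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(fit.coef_)

Unlike the lasso, which tends to pick one member of a group of correlated predictors arbitrarily, the elastic net tends to keep or drop such predictors together.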


Literature
Ahmed SE, Doksum KA, Hossain S, You J (2007) Shrinkage, pretest and absolute penalty estimators in partially linear models. Aust NZ J Stat 49(4):435–454
Breiman L (1993) Better subset selection using the non-negative garotte. Technical report, University of California, Berkeley
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression (with discussion). Ann Stat 32(2):407–499
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58(1):267–288
Tikhonov AN (1963) Solution of incorrectly formulated problems and the regularization method. Soviet Math Dokl 4:1035–1038. English translation of Dokl Akad Nauk SSSR 151:501–504
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67(2):301–320
DOI
https://doi.org/10.1007/978-3-642-04898-2_102