Abstract
We review variable selection and variable screening in high-dimensional linear models. A major focus is an empirical comparison of various estimation methods with respect to true and false positive selection rates, based on 128 different sparse scenarios built from semi-real data (real data covariables but synthetic regression coefficients and noise). Furthermore, we present theoretical bounds on the bias of subsequent least squares estimation that uses the variables selected in the first stage; these bounds have direct implications for the construction of p-values for regression coefficients.
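The two-stage procedure described in the abstract — screen variables in a high-dimensional linear model, then refit by least squares on the selected set and count true/false positives against the truly active coefficients — can be sketched as follows. This is a minimal illustration, not the paper's actual experimental setup: the data here are fully synthetic (the paper uses real covariables with synthetic coefficients and noise), and the screening rule used is marginal correlation ranking in the spirit of sure independence screening; the sample sizes, sparsity level, and screening size `d` are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a semi-real scenario: n observations, p covariables,
# a sparse coefficient vector with s truly active variables, Gaussian noise.
n, p, s = 100, 1000, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
active = rng.choice(p, size=s, replace=False)
beta[active] = 1.0
y = X @ beta + rng.standard_normal(n)

# Stage 1 (screening): rank variables by absolute marginal correlation with
# the response and keep the top d, as in sure independence screening.
d = 20
corr = np.abs(X.T @ (y - y.mean())) / n
selected = np.argsort(corr)[-d:]

# Stage 2: ordinary least squares refit on the screened variables only.
beta_hat = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]

# Evaluate screening quality against the truly active set.
tp = len(set(selected) & set(active))  # true positives
fp = d - tp                            # false positives
print(f"true positives: {tp} of {s}, false positives: {fp}")
```

Averaging such true/false positive counts over many simulated scenarios gives selection-rate comparisons of the kind the paper reports; the bias bounds concern how the data-driven choice of `selected` distorts the second-stage least squares estimate `beta_hat`.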
Cite this article
Bühlmann, P., Mandozzi, J. High-dimensional variable screening and bias in subsequent inference, with an empirical comparison. Comput Stat 29, 407–430 (2014). https://doi.org/10.1007/s00180-013-0436-3