ABSTRACT
We propose a new penalty function which, when used as regularization for empirical risk minimization, leads to sparse estimators. The support of the resulting sparse vector is typically a union of potentially overlapping groups of covariates defined a priori, or a set of covariates that tend to be connected to each other when a graph of covariates is given. We study the theoretical properties of the estimator and illustrate its behavior on simulated data and on breast cancer gene expression data.
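The abstract does not spell out the construction, but the overlap group-lasso penalty it describes is commonly realized through a latent decomposition: the penalty of a weight vector w is the smallest total group-norm over ways of writing w as a sum of latent vectors, each supported on one group. A standard computational trick is to duplicate every covariate once per group containing it, so an ordinary (disjoint) group lasso on the duplicated design induces the overlap penalty. The sketch below is a minimal illustration of that duplication trick, not the paper's own implementation; the function names (`group_prox`, `overlap_group_lasso`) and the plain ISTA solver are assumptions for the sake of the example.

```python
import numpy as np

def group_prox(v, groups, t):
    """Block soft-thresholding: prox of t * sum_g ||v_g||_2 for disjoint groups."""
    out = v.copy()
    for g in groups:
        ng = np.linalg.norm(v[g])
        out[g] = 0.0 if ng <= t else (1.0 - t / ng) * v[g]
    return out

def overlap_group_lasso(X, y, groups, lam, n_iter=500):
    """Overlap group lasso via covariate duplication (assumed illustrative solver).

    Each group gets its own copy of its covariates; a disjoint group lasso on
    the duplicated design induces the overlap penalty
        Omega(w) = min { sum_g ||v_g||_2 : w = sum_g v_g, supp(v_g) within g }.
    """
    dup_cols = np.concatenate(groups)           # duplicated covariate indices
    Xd = X[:, dup_cols]                         # duplicated design matrix
    # Disjoint groups in the duplicated space.
    starts = np.cumsum([0] + [len(g) for g in groups])
    dgroups = [np.arange(starts[i], starts[i + 1]) for i in range(len(groups))]
    v = np.zeros(Xd.shape[1])
    L = np.linalg.norm(Xd, 2) ** 2              # Lipschitz constant of the gradient
    for _ in range(n_iter):                     # ISTA on the duplicated problem
        grad = Xd.T @ (Xd @ v - y)
        v = group_prox(v - grad / L, dgroups, lam / L)
    # Recover w by summing the latent group vectors.
    w = np.zeros(X.shape[1])
    for i, g in enumerate(groups):
        w[g] += v[dgroups[i]]
    return w
```

Because the penalty operates on the latent vectors rather than on w directly, the support of the estimate is a union of the selected groups, which is exactly the structure the abstract describes.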
Index Terms
- Group lasso with overlap and graph lasso