ABSTRACT
In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims to extend the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1,∞-regularized problems. The main challenge in developing such a method lies in being able to compute efficient projections onto the l1,∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ regularization leads to better performance than both l2 and l1 regularization, and that it is effective in discovering jointly sparse solutions.
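To make the projection the abstract refers to concrete: in the joint-sparsity setting, the parameters form a matrix W with (we assume here) one row per feature and one column per task, and ||W||_{1,∞} = Σ_i max_j |W_ij|. The sketch below is our own illustrative implementation, not the paper's exact O(n log n) algorithm: it finds the Lagrange multiplier θ by bisection, using the fact that the projection has the form V_ij = sign(W_ij) · min(|W_ij|, μ_i), where each row maximum μ_i(θ) solves a per-row water-filling condition. The function names (`project_l1inf`, `row_maxima`) and the fixed bisection depth are our choices.

```python
import numpy as np

def row_maxima(W_abs, theta):
    # For each row w of W_abs, find mu >= 0 such that
    # sum_j max(w_j - mu, 0) = theta (mu = 0 when the row's mass is <= theta).
    # Solved per row by sorting, as in simplex/l1-ball projections.
    d, m = W_abs.shape
    mu = np.zeros(d)
    ks = np.arange(1, m + 1)
    for i in range(d):
        row = np.sort(W_abs[i])[::-1]        # entries in decreasing order
        if row.sum() <= theta:
            continue                          # row fully shrunk: mu_i = 0
        # If the k largest entries lie above mu: mu = (sum of top k - theta) / k.
        cand = (np.cumsum(row) - theta) / ks
        k = np.nonzero(row > cand)[0].max()   # largest consistent k
        mu[i] = cand[k]
    return mu

def project_l1inf(W, C, iters=60):
    # Euclidean projection of W onto {V : sum_i max_j |V_ij| <= C}.
    # The optimum is V_ij = sign(W_ij) * min(|W_ij|, mu_i), with mu = mu(theta)
    # for a multiplier theta located here by bisection.
    W_abs = np.abs(W)
    if W_abs.max(axis=1).sum() <= C:
        return W.copy()                       # already inside the ball
    lo, hi = 0.0, W_abs.sum()                 # bracket for theta
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if row_maxima(W_abs, theta).sum() > C:
            lo = theta                        # constraint still violated: shrink more
        else:
            hi = theta                        # feasible: try shrinking less
    mu = row_maxima(W_abs, hi)                # hi is always on the feasible side
    return np.sign(W) * np.minimum(W_abs, mu[:, None])
```

In a projected gradient method, this projection would be applied after each gradient step, e.g. W = project_l1inf(W - eta * grad, C). Note that this bisection sketch costs O(dm log m) per call times the number of bisection iterations, so it only approximates the exact projection that the paper computes in O(n log n).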