DOI: 10.1145/1553374.1553484
Research article

An efficient projection for l1,∞ regularization

Authors: Ariadna Quattoni, Xavier Carreras, Michael Collins, Trevor Darrell
Published: 14 June 2009

ABSTRACT

In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims at extending the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1,∞ regularized problems. The main challenge in developing such a method lies in being able to compute efficient projections onto the l1,∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ regularization leads to better performance than both l2 and l1 regularization, and that it is effective in discovering jointly sparse solutions.
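The core operation the abstract describes is a Euclidean projection onto the l1,∞ ball, i.e. the set {B : Σ_j max_k |B_jk| ≤ C}, where rows index features and columns index tasks. As an illustration only, the sketch below solves that projection numerically with NumPy by bisection on the shared Lagrange multiplier: at optimality, every row whose new maximum is positive gives up the same amount θ of l1 mass, so a scalar search on θ recovers each row's new maximum, and entries above that maximum are clipped to it. This is a simpler swapped-in technique, not the paper's O(n log n) algorithm, and the function names are hypothetical.

```python
import numpy as np

def _row_threshold(a_desc, csum, theta):
    # For a nonnegative row sorted in descending order (csum = its
    # cumulative sums), return mu >= 0 with sum_k max(a_k - mu, 0) = theta.
    # This is the standard simplex-projection threshold; mu becomes the
    # row's new maximum after clipping.
    if theta <= 0.0:
        return a_desc[0]          # no mass removed: keep the row maximum
    if theta >= csum[-1]:
        return 0.0                # all mass removed: row is zeroed out
    k = np.arange(1, a_desc.size + 1)
    mu = (csum - theta) / k
    rho = np.nonzero(a_desc > mu)[0][-1]
    return (csum[rho] - theta) / (rho + 1)

def project_l1inf(A, C, tol=1e-8):
    # Euclidean projection of matrix A onto {B : sum_j max_k |B_jk| <= C}.
    # Bisection sketch over the common multiplier theta (the l1 mass each
    # active row gives up), not the paper's O(n log n) method.
    signs, X = np.sign(A), np.abs(A)
    if X.max(axis=1).sum() <= C:
        return A.copy()           # already inside the l1,inf ball
    Xs = -np.sort(-X, axis=1)     # each row sorted in descending order
    csums = np.cumsum(Xs, axis=1)
    lo, hi = 0.0, csums[:, -1].max()
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        mus = np.array([_row_threshold(Xs[j], csums[j], theta)
                        for j in range(X.shape[0])])
        if mus.sum() > C:
            lo = theta            # constraint still violated: remove more
        else:
            hi = theta
    mus = np.array([_row_threshold(Xs[j], csums[j], hi)
                    for j in range(X.shape[0])])
    return signs * np.minimum(X, mus[:, None])  # clip rows at new maxima

# Example: after projecting, the l1,inf norm is (approximately) at most C.
A = np.random.randn(50, 10)
B = project_l1inf(A, 5.0)
print(np.abs(B).max(axis=1).sum())  # ~5.0
```

In a projected gradient method of the kind the abstract outlines, a routine like this would be called after every gradient step to map the iterate back onto the constraint set.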



Published in
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates
Overall acceptance rate: 140 of 548 submissions, 26%
