ABSTRACT
In recent years the l1,∞ norm has been proposed for joint regularization. In essence, this type of regularization aims to extend the l1 framework for learning sparse models to a setting where the goal is to learn a set of jointly sparse models. In this paper we derive a simple and effective projected gradient method for optimization of l1,∞-regularized problems. The main challenge in developing such a method lies in being able to compute efficient projections onto the l1,∞ ball. We present an algorithm that works in O(n log n) time and O(n) memory, where n is the number of parameters. We test our algorithm on a multi-task image annotation problem. Our results show that l1,∞ regularization leads to better performance than both l2 and l1 regularization, and that it is effective in discovering jointly sparse solutions.
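To make the projection the abstract refers to concrete: in the joint-sparsity setting, the parameters form a matrix W with (we assume here) one row per feature and one column per task, and ||W||_{1,∞} = Σ_i max_j |W_ij|. The sketch below is our own illustrative implementation, not the paper's exact O(n log n) algorithm: it finds the Lagrange multiplier θ by bisection, using the fact that the projection has the form V_ij = sign(W_ij) · min(|W_ij|, μ_i), where each row maximum μ_i(θ) solves a per-row water-filling condition. The function names (`project_l1inf`, `row_maxima`) and the fixed bisection depth are our choices.

```python
import numpy as np

def row_maxima(W_abs, theta):
    # For each row w of W_abs, find mu >= 0 such that
    # sum_j max(w_j - mu, 0) = theta (mu = 0 when the row's mass is <= theta).
    # Solved per row by sorting, as in simplex/l1-ball projections.
    d, m = W_abs.shape
    mu = np.zeros(d)
    ks = np.arange(1, m + 1)
    for i in range(d):
        row = np.sort(W_abs[i])[::-1]        # entries in decreasing order
        if row.sum() <= theta:
            continue                          # row fully shrunk: mu_i = 0
        # If the k largest entries lie above mu: mu = (sum of top k - theta) / k.
        cand = (np.cumsum(row) - theta) / ks
        k = np.nonzero(row > cand)[0].max()   # largest consistent k
        mu[i] = cand[k]
    return mu

def project_l1inf(W, C, iters=60):
    # Euclidean projection of W onto {V : sum_i max_j |V_ij| <= C}.
    # The optimum is V_ij = sign(W_ij) * min(|W_ij|, mu_i), with mu = mu(theta)
    # for a multiplier theta located here by bisection.
    W_abs = np.abs(W)
    if W_abs.max(axis=1).sum() <= C:
        return W.copy()                       # already inside the ball
    lo, hi = 0.0, W_abs.sum()                 # bracket for theta
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        if row_maxima(W_abs, theta).sum() > C:
            lo = theta                        # constraint still violated: shrink more
        else:
            hi = theta                        # feasible: try shrinking less
    mu = row_maxima(W_abs, hi)                # hi is always on the feasible side
    return np.sign(W) * np.minimum(W_abs, mu[:, None])
```

In a projected gradient method, this projection would be applied after each gradient step, e.g. W = project_l1inf(W - eta * grad, C). Note that this bisection sketch costs O(dm log m) per call times the number of bisection iterations, so it only approximates the exact projection that the paper computes in O(n log n).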