Abstract
We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multitask learning formulation in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is nonconvex; we convert it into a convex surrogate, which can be routinely solved via semidefinite programming for small problems. We propose employing the general projected gradient scheme to solve the convex surrogate efficiently; however, in this optimization formulation the objective function is nondifferentiable and the feasible domain is nontrivial. We present procedures for computing the projected gradient and for ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that its optimal solution can be obtained by solving an unconstrained optimization subproblem and a Euclidean projection subproblem. We also present two projected gradient algorithms and analyze their convergence rates in detail. In addition, we illustrate the use of the presented projected gradient algorithms for the proposed multitask learning formulation using the least squares loss. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed multitask learning formulation and the efficiency of the proposed projected gradient algorithms.
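The scheme described above — a gradient step on the smooth least squares loss, followed by a proximal step on the sparse component and a Euclidean projection of the low-rank component onto a trace-norm ball — can be sketched as follows. This is a minimal illustration, assuming the usual convex surrogates (an l1 penalty standing in for the cardinality term, a trace-norm ball standing in for the rank constraint) and a decomposition W = P + Q; the function names, fixed step size, and iteration count are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(A, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def project_trace_ball(A, tau):
    """Euclidean projection onto {Q : ||Q||_* <= tau}: project the singular
    values onto the l1 ball of radius tau (a simplex projection, since they
    are nonnegative) and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    if s.sum() <= tau:
        return A
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1.0) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return U @ np.diag(np.maximum(s - theta, 0.0)) @ Vt

def mtl_sparse_lowrank(X, Y, gamma=0.1, tau=1.0, iters=300):
    """Projected/proximal gradient sketch for a multitask least squares model
    W = P + Q, with an l1 penalty (weight gamma) on the sparse part P and a
    trace-norm ball of radius tau constraining the low-rank part Q."""
    d, k = X.shape[1], Y.shape[1]
    P, Q = np.zeros((d, k)), np.zeros((d, k))
    # Conservative step size: the joint gradient in (P, Q) is Lipschitz
    # with constant at most 2 * ||X||_2^2 for the loss 0.5 ||X(P+Q) - Y||_F^2.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(iters):
        G = X.T @ (X @ (P + Q) - Y)          # gradient of the smooth loss
        P = soft_threshold(P - step * G, step * gamma)
        Q = project_trace_ball(Q - step * G, tau)
    return P, Q
```

On synthetic regression data, the iterates reduce the least squares loss while the projection keeps the trace norm of Q within the prescribed radius at every step.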