
Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks

Published: 01 February 2012

Abstract

We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multitask learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is nonconvex; we convert it into its convex surrogate, which can be routinely solved via semidefinite programming for small-size problems. We propose employing the general projected gradient scheme to efficiently solve such a convex surrogate; however, in the optimization formulation, the objective function is nondifferentiable and the feasible domain is nontrivial. We present the procedures for computing the projected gradient and ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that the optimal solution to such a problem can be obtained via solving an unconstrained optimization subproblem and a Euclidean projection subproblem. We also present two projected gradient algorithms and analyze their rates of convergence in detail. In addition, we illustrate the use of the presented projected gradient algorithms for the proposed multitask learning formulation using the least squares loss. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed multitask learning formulation and the efficiency of the proposed projected gradient algorithms.
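The scheme described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes (as the abstract suggests via the convex surrogate) that the weight matrix decomposes as W = P + Q, with an ℓ1 penalty inducing sparsity in P and a trace-norm ball constraint inducing low rank in Q, instantiated with the least squares loss. All function names and parameter choices are our own. The projected-gradient step splits, as the abstract states, into an unconstrained subproblem (solved in closed form by soft-thresholding) and a Euclidean projection subproblem (projection onto the trace-norm ball).

```python
import numpy as np

def soft_threshold(A, t):
    # Closed-form solution of the unconstrained l1 subproblem (elementwise).
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def project_l1_ball(v, tau):
    # Euclidean projection of a nonnegative vector onto {x >= 0, sum(x) <= tau}.
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1.0) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_trace_ball(Q, tau):
    # Euclidean projection onto the trace-norm ball {Q : ||Q||_* <= tau},
    # obtained by projecting the singular values onto an l1 ball.
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, tau)) @ Vt

def sparse_lowrank_mtl(X, Y, gamma=0.1, tau=1.0, iters=500):
    """Projected-gradient sketch of
       min_{P,Q} ||X(P+Q) - Y||_F^2 + gamma*||P||_1  s.t.  ||Q||_* <= tau."""
    d, m = X.shape[1], Y.shape[1]
    P, Q = np.zeros((d, m)), np.zeros((d, m))
    # Conservative constant step size: 1/L for the smooth part in (P, Q) jointly.
    step = 1.0 / (4.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(iters):
        G = 2.0 * X.T @ (X @ (P + Q) - Y)   # gradient of the least squares loss
        P = soft_threshold(P - step * G, step * gamma)  # prox step (sparse part)
        Q = project_trace_ball(Q - step * G, tau)       # projection (low-rank part)
    return P, Q
```

On synthetic data with a sparse ground-truth weight matrix, the returned P is entrywise sparse and Q satisfies the trace-norm constraint by construction, since it is the output of the projection at every iteration.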


Published in

ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 4 (February 2012), 176 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/2086737

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 1 February 2012
• Accepted: 1 November 2011
• Revised: 1 October 2011
• Received: 1 December 2010

Published in TKDD Volume 5, Issue 4

      Qualifiers

      • research-article
      • Research
      • Refereed
