
Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks

Published: 01 February 2012

Abstract

We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multitask learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is nonconvex; we convert it into its convex surrogate, which can be routinely solved via semidefinite programming for small-size problems. We propose employing the general projected gradient scheme to efficiently solve such a convex surrogate; however, in the optimization formulation, the objective function is nondifferentiable and the feasible domain is nontrivial. We present the procedures for computing the projected gradient and ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that the optimal solution to such a problem can be obtained via solving an unconstrained optimization subproblem and a Euclidean projection subproblem. We also present two projected gradient algorithms and analyze their rates of convergence in detail. In addition, we illustrate the use of the presented projected gradient algorithms for the proposed multitask learning formulation using the least squares loss. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed multitask learning formulation and the efficiency of the proposed projected gradient algorithms.
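The scheme described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes (as the abstract suggests via the convex surrogate) that the weight matrix decomposes as W = P + Q, with an ℓ1 penalty inducing sparsity in P and a trace-norm ball constraint inducing low rank in Q, instantiated with the least squares loss. All function names and parameter choices are our own. The projected-gradient step splits, as the abstract states, into an unconstrained subproblem (solved in closed form by soft-thresholding) and a Euclidean projection subproblem (projection onto the trace-norm ball).

```python
import numpy as np

def soft_threshold(A, t):
    # Closed-form solution of the unconstrained l1 subproblem (elementwise).
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def project_l1_ball(v, tau):
    # Euclidean projection of a nonnegative vector onto {x >= 0, sum(x) <= tau}.
    if v.sum() <= tau:
        return v
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - tau) / (np.arange(len(u)) + 1.0) > 0)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_trace_ball(Q, tau):
    # Euclidean projection onto the trace-norm ball {Q : ||Q||_* <= tau},
    # obtained by projecting the singular values onto an l1 ball.
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, tau)) @ Vt

def sparse_lowrank_mtl(X, Y, gamma=0.1, tau=1.0, iters=500):
    """Projected-gradient sketch of
       min_{P,Q} ||X(P+Q) - Y||_F^2 + gamma*||P||_1  s.t.  ||Q||_* <= tau."""
    d, m = X.shape[1], Y.shape[1]
    P, Q = np.zeros((d, m)), np.zeros((d, m))
    # Conservative constant step size: 1/L for the smooth part in (P, Q) jointly.
    step = 1.0 / (4.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(iters):
        G = 2.0 * X.T @ (X @ (P + Q) - Y)   # gradient of the least squares loss
        P = soft_threshold(P - step * G, step * gamma)  # prox step (sparse part)
        Q = project_trace_ball(Q - step * G, tau)       # projection (low-rank part)
    return P, Q
```

On synthetic data with a sparse ground-truth weight matrix, the returned P is entrywise sparse and Q satisfies the trace-norm constraint by construction, since it is the output of the projection at every iteration.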


Published in

ACM Transactions on Knowledge Discovery from Data, Volume 5, Issue 4 (February 2012), 176 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/2086737

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 1 February 2012
• Accepted: 1 November 2011
• Revised: 1 October 2011
• Received: 1 December 2010

Published in TKDD Volume 5, Issue 4

      Qualifiers

      • research-article
      • Research
      • Refereed
