DOI: 10.1145/2020408.2020577
KDD Conference Proceedings · Poster

Fast coordinate descent methods with variable selection for non-negative matrix factorization

Published: 21 August 2011

ABSTRACT

Nonnegative Matrix Factorization (NMF) is an effective dimension reduction method for non-negative dyadic data, and has proven useful in many areas, such as text mining, bioinformatics and image processing. NMF is usually formulated as a constrained non-convex optimization problem, and many algorithms have been developed for solving it. Recently, a coordinate descent method, called FastHals, has been proposed to solve least squares NMF and is regarded as one of the state-of-the-art techniques for the problem. In this paper, we first show that FastHals has an inefficiency: it uses a cyclic coordinate descent scheme and thus performs unneeded descent steps on unimportant variables. We then present a variable selection scheme that uses the gradient of the objective function to arrive at a new coordinate descent method. Our new method is considerably faster in practice, and we show that it has theoretical convergence guarantees. Moreover, when the solution is sparse, as is often the case in real applications, our new method benefits by updating important variables more often, resulting in higher speed. As an example, on the text dataset RCV1, our method is 7 times faster than FastHals, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods for the case where the error in NMF is measured by KL-divergence, by applying Newton's method to solve the one-variable sub-problems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee & Seung multiplicative rule by a factor of 10 on the CBCL image dataset.
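To make the variable-selection idea concrete, here is a minimal NumPy sketch of a greedy coordinate-descent pass over W for the least-squares objective 0.5*||V - WH||^2 with W >= 0. This is an illustrative reconstruction, not the paper's reference implementation: the function name, the inner-iteration cap, and the tolerance are assumptions introduced here.

```python
import numpy as np

def gcd_update_W(V, W, H, inner_iters=5, tol=1e-6):
    """One greedy coordinate-descent pass on W (in place) for
    min 0.5*||V - W H||_F^2 subject to W >= 0.

    Illustrative sketch: names and stopping rule are assumptions.
    """
    HHt = H @ H.T                  # k x k Gram matrix, shared by every row of W
    diag = np.diag(HHt) + 1e-12    # guard against a zero row in H
    G = W @ HHt - V @ H.T          # gradient of the objective w.r.t. W
    for i in range(W.shape[0]):
        for _ in range(inner_iters):
            # Closed-form minimizer of each one-variable quadratic subproblem,
            # projected onto the non-negativity constraint.
            s = np.maximum(W[i] - G[i] / diag, 0.0) - W[i]
            # Objective decrease each coordinate step would achieve.
            d = -(G[i] * s + 0.5 * diag * s ** 2)
            r = int(np.argmax(d))  # greedy variable selection via the gradient
            if d[r] <= tol:
                break              # no coordinate in this row helps enough
            W[i, r] += s[r]
            G[i] += s[r] * HHt[r]  # rank-one update keeps the gradient exact
    return W
```

Updating H is symmetric: apply the same routine to H.T with V.T, holding W fixed. The rank-one gradient update is what makes repeated greedy picks within a row cheap, so coordinates in dense regions of the solution can be updated many times while near-zero coordinates are skipped.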

References

  1. M. Berry, M. Browne, A. Langville, P. Pauca, and R. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 2007.
  2. D. Bertsekas and J. Tsitsiklis. Parallel and Distributed Computation. Prentice-Hall, 1989.
  3. A. Cichocki and A.-H. Phan. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals, E92-A(3):708--721, 2009.
  4. E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference, 2005.
  5. E. F. Gonzalez and Y. Zhang. Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Technical report, Department of Computational and Applied Mathematics, Rice University, 2005.
  6. L. Grippo and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26:127--136, 2000.
  7. P. O. Hoyer. Non-negative sparse coding. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pages 557--565, 2002.
  8. P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457--1469, 2004.
  9. C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. Technical Report TR-11-06, Department of Computer Science, University of Texas at Austin, 2011.
  10. D. Kim, S. Sra, and I. S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In Proceedings of the Sixth SIAM International Conference on Data Mining, pages 343--354, 2007.
  11. J. Kim and H. Park. Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, 30(2):713--730, 2008.
  12. J. Kim and H. Park. Toward faster nonnegative matrix factorization: A new algorithm and comparisons. In Proceedings of the IEEE International Conference on Data Mining, pages 353--362, 2008.
  13. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, November 1998. MNIST database available at http://yann.lecun.com/exdb/mnist/.
  14. D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, 1999.
  15. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 556--562. MIT Press, 2001.
  16. D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004.
  17. Y. Li and S. Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging, 3(3):487--503, 2009.
  18. C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19:2756--2779, 2007.
  19. C. Liu, H.-C. Yang, J. Fan, L.-W. He, and Y.-M. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce. In Proceedings of the 19th International World Wide Web Conference, 2010.
  20. P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111--126, 1994.
  21. S. Perkins, K. Lacker, and J. Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 3:1333--1356, 2003.
  22. J. Piper, P. Pauca, R. Plemmons, and M. Giffin. Object characterization from spectral data using nonnegative factorization and information theory. In Proceedings of the AMOS Technical Conference, 2004.
  23. R. Zdunek and A. Cichocki. Non-negative matrix factorization with quasi-Newton optimization. In Proceedings of the Eighth International Conference on Artificial Intelligence and Soft Computing (ICAISC), pages 870--879, 2006.

Published in

KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2011, 1446 pages
ISBN: 978-1-4503-0813-7
DOI: 10.1145/2020408

          Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States




Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
