DOI: 10.1145/2020408.2020577
KDD Conference Proceedings · Poster

Fast coordinate descent methods with variable selection for non-negative matrix factorization

Published: 21 August 2011

ABSTRACT

Nonnegative Matrix Factorization (NMF) is an effective dimension reduction method for non-negative dyadic data, and has proven useful in many areas, such as text mining, bioinformatics and image processing. NMF is usually formulated as a constrained non-convex optimization problem, and many algorithms have been developed for solving it. Recently, a coordinate descent method, called FastHals, has been proposed to solve least squares NMF and is regarded as one of the state-of-the-art techniques for the problem. In this paper, we first show that FastHals has an inefficiency: it uses a cyclic coordinate descent scheme and thus performs unneeded descent steps on unimportant variables. We then present a variable selection scheme that uses the gradient of the objective function to arrive at a new coordinate descent method. Our new method is considerably faster in practice, and we show that it has theoretical convergence guarantees. Moreover, when the solution is sparse, as is often the case in real applications, our new method benefits by updating important variables more often, resulting in higher speed. As an example, on the text dataset RCV1, our method is 7 times faster than FastHals, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods for the case where the error in NMF is measured by KL-divergence, by applying Newton's method to solve the one-variable sub-problems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee & Seung multiplicative rule by a factor of 10 on the CBCL image dataset.
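To make the variable-selection idea concrete, here is a minimal NumPy sketch of a greedy coordinate-descent pass over W for the least-squares objective 0.5*||V - WH||^2 with W >= 0. This is an illustrative reconstruction, not the paper's reference implementation: the function name, the inner-iteration cap, and the tolerance are assumptions introduced here.

```python
import numpy as np

def gcd_update_W(V, W, H, inner_iters=5, tol=1e-6):
    """One greedy coordinate-descent pass on W (in place) for
    min 0.5*||V - W H||_F^2 subject to W >= 0.

    Illustrative sketch: names and stopping rule are assumptions.
    """
    HHt = H @ H.T                  # k x k Gram matrix, shared by every row of W
    diag = np.diag(HHt) + 1e-12    # guard against a zero row in H
    G = W @ HHt - V @ H.T          # gradient of the objective w.r.t. W
    for i in range(W.shape[0]):
        for _ in range(inner_iters):
            # Closed-form minimizer of each one-variable quadratic subproblem,
            # projected onto the non-negativity constraint.
            s = np.maximum(W[i] - G[i] / diag, 0.0) - W[i]
            # Objective decrease each coordinate step would achieve.
            d = -(G[i] * s + 0.5 * diag * s ** 2)
            r = int(np.argmax(d))  # greedy variable selection via the gradient
            if d[r] <= tol:
                break              # no coordinate in this row helps enough
            W[i, r] += s[r]
            G[i] += s[r] * HHt[r]  # rank-one update keeps the gradient exact
    return W
```

Updating H is symmetric: apply the same routine to H.T with V.T, holding W fixed. The rank-one gradient update is what makes repeated greedy picks within a row cheap, so coordinates in dense regions of the solution can be updated many times while near-zero coordinates are skipped.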

References

  1. M. Berry, M. Browne, A. Langville, P. Pauca, and R. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 2007.
  2. D. Bertsekas and J. Tsitsiklis. Parallel and Distributed Computation. Prentice-Hall, 1989.
  3. A. Cichocki and A.-H. Phan. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals, E92-A(3):708--721, 2009.
  4. E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proceedings of the 28th Annual International ACM SIGIR Conference, 2005.
  5. E. F. Gonzalez and Y. Zhang. Accelerating the Lee-Seung algorithm for non-negative matrix factorization. Technical report, Department of Computational and Applied Mathematics, Rice University, 2005.
  6. L. Grippo and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26:127--136, 2000.
  7. P. O. Hoyer. Non-negative sparse coding. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pages 557--565, 2002.
  8. P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5:1457--1469, 2004.
  9. C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. Technical Report TR-11-06, Department of Computer Science, University of Texas at Austin, 2011.
  10. D. Kim, S. Sra, and I. S. Dhillon. Fast Newton-type methods for the least squares nonnegative matrix approximation problem. In Proceedings of the Sixth SIAM International Conference on Data Mining, pages 343--354, 2007.
  11. J. Kim and H. Park. Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, 30(2):713--730, 2008.
  12. J. Kim and H. Park. Toward faster nonnegative matrix factorization: A new algorithm and comparisons. In Proceedings of the IEEE International Conference on Data Mining, pages 353--362, 2008.
  13. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324, November 1998. MNIST database available at http://yann.lecun.com/exdb/mnist/.
  14. D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, 1999.
  15. D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 556--562. MIT Press, 2001.
  16. D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004.
  17. Y. Li and S. Osher. Coordinate descent optimization for l1 minimization with application to compressed sensing; a greedy algorithm. Inverse Problems and Imaging, 3(3):487--503, 2009.
  18. C.-J. Lin. Projected gradient methods for non-negative matrix factorization. Neural Computation, 19:2756--2779, 2007.
  19. C. Liu, H.-C. Yang, J. Fan, L.-W. He, and Y.-M. Wang. Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce. In Proceedings of the 19th International World Wide Web Conference, 2010.
  20. P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111--126, 1994.
  21. S. Perkins, K. Lacker, and J. Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 3:1333--1356, 2003.
  22. J. Piper, P. Pauca, R. Plemmons, and M. Giffin. Object characterization from spectral data using nonnegative factorization and information theory. In Proceedings of the AMOS Technical Conference, 2004.
  23. R. Zdunek and A. Cichocki. Non-negative matrix factorization with quasi-Newton optimization. In Proceedings of the Eighth International Conference on Artificial Intelligence and Soft Computing (ICAISC), pages 870--879, 2006.

Published in

KDD '11: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2011, 1446 pages
ISBN: 978-1-4503-0813-7
DOI: 10.1145/2020408

          Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States




Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions, 13%
