Algorithms and applications for approximate nonnegative matrix factorization

https://doi.org/10.1016/j.csda.2006.11.006

Abstract

The development and use of low-rank approximate nonnegative matrix factorization (NMF) algorithms for feature extraction and identification in the fields of text mining and spectral data analysis are presented. The evolution and convergence properties of hybrid methods based on both sparsity and smoothness constraints for the resulting nonnegative matrix factors are discussed. The interpretability of NMF outputs in specific contexts is discussed, along with opportunities for future work on modifying NMF algorithms for large-scale and time-varying data sets.

Introduction

Recent technological developments in sensor technology and computer hardware have resulted in increasing quantities of data, rapidly overwhelming many of the classical data analysis tools available. Processing these large amounts of data has created new concerns with respect to data representation, disambiguation, and dimensionality reduction. Because information-gathering devices have only finite bandwidth, the collected data are often inexact. For example, signals received by antenna arrays are often contaminated by noise and other degradations. Before useful deductive science can be applied, it is often important to first reconstruct or represent the data so that the inexactness is reduced while certain feasibility conditions are satisfied.

Secondly, in many situations the data observed from complex phenomena represent the integrated result of several interrelated variables acting together. When these variables are imprecisely defined, the information contained in the original data may be overlapping and ambiguous, yet a reduced model of the system can often provide fidelity close to that of the original. One common thread in the various approaches to noise removal, model reduction, feasibility reconstruction, and so on is to replace the original data by a lower-dimensional representation obtained via subspace approximation. The use of low-rank approximations therefore comes to the forefront in a wide range of important applications. Factor analysis and principal component analysis are two of the many classical methods used to reduce the number of variables and to detect structure among them.

Often the data to be analyzed are nonnegative, and the low-rank approximation is further required to be composed of nonnegative values in order to avoid contradicting physical realities. Classical tools cannot guarantee that nonnegativity will be maintained. Finding reduced-rank nonnegative factors to approximate a given nonnegative data matrix thus becomes a natural choice. This is the so-called nonnegative matrix factorization (NMF) problem, which can be stated in generic form as follows:

NMF problem

Given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$ and a positive integer $k < \min\{m,n\}$, find nonnegative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ to minimize the functional
$$f(W,H) = \tfrac{1}{2}\,\|A - WH\|_F^2. \qquad (1)$$
The product $WH$ is called an NMF of $A$, although $A$ is not necessarily equal to the product $WH$. Clearly the product $WH$ is an approximate factorization of rank at most $k$, but we will omit the word “approximate” in the remainder of this paper. An appropriate decision on the value of $k$ is critical in practice, but the choice of $k$ is very often problem dependent. In most cases, $k$ is chosen such that $k \ll \min(m,n)$, in which case $WH$ can be thought of as a compressed form of the data in $A$.
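The objective in Eq. (1) can be written directly in a few lines. The sketch below uses NumPy; the matrix sizes, the random data, and the function name `nmf_objective` are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2                      # k < min(m, n), illustrative sizes

A = rng.random((m, n))                 # nonnegative data matrix
W = rng.random((m, k))                 # nonnegative factor, m x k
H = rng.random((k, n))                 # nonnegative factor, k x n

def nmf_objective(A, W, H):
    """Half the squared Frobenius norm of the residual A - WH."""
    return 0.5 * np.linalg.norm(A - W @ H, "fro") ** 2

print(nmf_objective(A, W, H))
```

Note that any pair of nonnegative factors gives a finite, nonnegative objective value; the NMF problem is to drive this value as low as possible.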

Another key characteristic of NMF is the ability of numerical methods that minimize (1) to extract underlying features as basis vectors in W, which can then be subsequently used for identification and classification. By not allowing negative entries in W and H, NMF enables a non-subtractive combination of parts to form a whole (Lee and Seung, 1999). Features may be parts of faces in image data, topics or clusters in textual data, or specific absorption characteristics in hyperspectral data. In this paper, we discuss the enhancement of NMF algorithms for the primary goal of feature extraction and identification in text and spectral data mining.

Important challenges affecting the numerical minimization of (1) include the existence of local minima due to the non-convexity of $f(W,H)$ in both $W$ and $H$, and, perhaps more importantly, the lack of a unique solution, which is easily seen by considering $WDD^{-1}H$ for any nonnegative invertible matrix $D$ whose inverse $D^{-1}$ is also nonnegative. These and other convergence-related issues are dealt with in Section 3. Still, NMF is quite appealing for data mining applications since, in practice, even local minima can provide desirable properties such as data compression and feature extraction, as previously explained.

The remainder of this paper is organized as follows. In Section 2 we give a brief description of numerical approaches for the solution of the NMF problem. Fundamental NMF algorithms and their convergence properties are discussed in Section 3. The use of constraints or penalty terms to augment solutions is discussed in Section 4 and applications of NMF algorithms in the fields of text mining and spectral data analysis are highlighted in Section 5. The need for further research in NMF algorithms concludes the paper in Section 6.

Section snippets

Numerical approaches for NMF

The 1999 article in Nature by Daniel Lee and Sebastian Seung (Lee and Seung, 1999) started a flurry of research into NMF. Hundreds of papers have since cited Lee and Seung, but prior to its publication several lesser-known papers by Pentti Paatero (Paatero and Tapper, 1994; Paatero, 1997; Paatero, 1999) actually deserve more credit for the factorization's historical development. Though Lee and Seung cite Paatero's 1997 paper on his so-called positive matrix factorization in their Nature …

Fundamental algorithms

In this section, we provide a basic classification scheme that encompasses many of the NMF algorithms previously mentioned. Although such algorithms can straddle more than one class, in general they can be divided into three general classes: multiplicative update algorithms, gradient descent algorithms, and alternating least squares (ALS) algorithms. We note that Cichocki and Zdunek (2006) have recently created an entire library (NMFLAB) of MATLAB® routines for each class of the NMF algorithms.
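As a minimal sketch of the first class in this taxonomy, the well-known Lee–Seung multiplicative updates for the Frobenius-norm objective alternate elementwise rescalings of $H$ and $W$; because each update multiplies by a ratio of nonnegative quantities, nonnegativity of the factors is preserved automatically. The matrix sizes, iteration count, and the small `eps` safeguard below are illustrative choices, not values from the paper.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min 0.5*||A - WH||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps       # strictly positive random start
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Elementwise multiplicative updates; eps avoids division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.random.default_rng(2).random((8, 6))   # nonnegative data
W, H = nmf_multiplicative(A, k=3)
print(np.linalg.norm(A - W @ H, "fro"))       # final Frobenius residual
```

Gradient descent and ALS variants replace the multiplicative step with, respectively, an additive step along the negative gradient and an exact least squares solve for one factor at a time.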

Application-dependent auxiliary constraints

As previously explained, the NMF problem formulation given in Section 1 is sometimes extended to include auxiliary constraints on $W$ and/or $H$. This is often done to compensate for uncertainties in the data, to enforce desired characteristics in the computed solution, or to impose prior knowledge about the application at hand. Penalty terms are typically used to enforce auxiliary constraints, extending the cost function of Eq. (1) as follows:
$$f(W,H) = \|A - WH\|_F^2 + \alpha J_1(W) + \beta J_2(H).$$
Here $J_1(W)$ and $J_2(H)$ are …
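One common (though not the only) choice for the penalties is $J_1(W) = \|W\|_F^2$ and $J_2(H) = \|H\|_F^2$, in which case each subproblem becomes a ridge-regularized least squares solve. The sketch below alternates these solves and clips negatives to zero after each one; the function name, parameter values, and projection step are our own illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def penalized_nmf_als(A, k, alpha=0.1, beta=0.1, n_iter=100, seed=0):
    """Alternating regularized least squares for
    min ||A - WH||_F^2 + alpha*||W||_F^2 + beta*||H||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    for _ in range(n_iter):
        # Ridge-regularized solve for H with W fixed, then project.
        H = np.linalg.solve(W.T @ W + beta * np.eye(k), W.T @ A)
        H = np.maximum(H, 0)
        # Ridge-regularized solve for W with H fixed, then project.
        W = np.linalg.solve(H @ H.T + alpha * np.eye(k), H @ A.T).T
        W = np.maximum(W, 0)
    return W, H

A = np.random.default_rng(3).random((8, 6))   # nonnegative data
W, H = penalized_nmf_als(A, k=3)
```

The regularization terms keep the normal-equation matrices invertible even when a factor temporarily loses rank after clipping.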

Sample applications

The remaining sections of the paper illustrate two prominent applications of NMF algorithms: text mining and spectral data analysis. In each case, several references are provided along with new results achieved with the CNMF algorithm specified above.

Further improvements

In this paper, we have attempted to outline some of the major concepts related to nonnegative matrix factorization. Although applications have been developed for space object identification and classification and for topic detection and tracking in text mining, several open problems with NMF remain. Here are a few of them:

  • Initializing the factors W and H: Methods for choosing, or seeding, the initial matrices W and H for various algorithms (see, e.g., Wild, 2002; Wild et al., 2003; Boutsidis and …
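The seeding question can be illustrated with a simple SVD-based initialization, loosely in the spirit of structured seeding schemes but not a reimplementation of the cited methods; `svd_seed` and its clipping threshold are our own illustrative choices.

```python
import numpy as np

def svd_seed(A, k):
    """Nonnegative seed built from the leading k singular triplets of A.
    Negative entries of the scaled singular vectors are clipped to a
    small positive floor so multiplicative updates can still move them."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    W0 = np.maximum(U[:, :k] * np.sqrt(s[:k]), 1e-12)
    H0 = np.maximum(np.sqrt(s[:k])[:, None] * Vt[:k], 1e-12)
    return W0, H0

A = np.random.default_rng(4).random((10, 7))
W0, H0 = svd_seed(A, 3)
```

A structured seed like this typically gives a lower starting residual than a purely random one, though the best choice is algorithm and data dependent.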

Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and suggestions on the original manuscript.

References (65)

  • British National Corpus (BNC), 2004. …
  • R. Bro et al., A fast non-negativity constrained linear least squares algorithm. J. Chemometrics (1997).
  • J. Cea, Optimisation: Théorie et algorithmes (1971).
  • C.-I. Chang, An information theoretic-based approach to spectral variability, similarity, and discriminability for hyperspectral image analysis. IEEE Trans. Inform. Theory (2000).
  • Chen, Z., Cichocki, A., 2005. Nonnegative matrix factorization with temporal smoothness and/or spatial decorrelation …
  • Chu, M., Diele, F., Plemmons, R., Ragni, S., 2004. Optimality, computation, and interpretations of nonnegative matrix …
  • Cichocki, A., Zdunek, R., 2006. NMFLAB for Signal Processing, available at …
  • Cichocki, A., Zdunek, R., Amari, S., 2006. Csiszar's divergences for non-negative matrix factorization: family of new …
  • J. de Leeuw et al., Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika (1976).
  • Dhillon, I., Sra, S., 2005. Generalized nonnegative matrix approximations with Bregman divergences. In: Proceedings of …
  • Ding, C., He, X., Simon, H., 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In: …
  • Donoho, D., Stodden, V., 2003. When does non-negative matrix factorization give a correct decomposition into parts? In: …
  • P. Drineas et al., Fast Monte Carlo algorithms for matrices III: computing a compressed approximate matrix decomposition. SIAM J. Comput. (2006).
  • Finesso, L., Spreij, P., 2004. Approximate nonnegative matrix factorization via alternating minimization. In: …
  • P. Gill et al., Practical Optimization (1981).
  • Gonzalez, E., Zhang, Y., 2005. Accelerating the Lee–Seung algorithm for nonnegative matrix factorization. Technical …
  • Grieve, T., 2003. The Decline and Fall of the Enron Empire. Slate …
  • Guillamet, D., Bressan, M., Vitria, J., 2001. A weighted non-negative matrix factorization for local representations. …
  • A.B. Hamza et al., Reconstruction of reflectance spectra using robust non-negative matrix factorization. IEEE Trans. Signal Process. (2006).
  • Hoyer, P., 2002. Non-negative sparse coding. In: Proceedings of the IEEE Workshop on Neural Networks for Signal …
  • P. Hoyer, Non-negative matrix factorization with sparseness constraints. J. Mach. Learning Res. (2004).
  • Karvanen, J., Cichocki, A., 2003. Measuring sparseness of noisy signals. In: Proceedings of the Fourth International …
1. Research supported in part by the National Science Foundation under Grant CAREER-CCF-0546622.

2. Research supported in part by the Air Force Office of Scientific Research under Grant FA49620-03-1-0215, the Army Research Office under Grants DAAD19-00-1-0540 and W911NF-05-1-0402, and by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Grant DAAD19-01-2-0008.
