Algorithms and applications for approximate nonnegative matrix factorization
Introduction
Recent developments in sensor technology and computer hardware have resulted in rapidly increasing quantities of data, overwhelming many of the classical data analysis tools available. Processing these large amounts of data has created new concerns with respect to data representation, disambiguation, and dimensionality reduction. Because information-gathering devices have only finite bandwidth, the collected data are often not exact. For example, signals received by antenna arrays are often contaminated by noise and other degradations. Before useful deductive science can be applied, it is often important to first reconstruct or represent the data so that the inexactness is reduced while certain feasibility conditions are satisfied.
Secondly, in many situations the data observed from complex phenomena represent the integrated result of several interrelated variables acting together. When these variables are less precisely defined, the information contained in the original data may be overlapping and ambiguous, yet a reduced model of the system can often provide fidelity close to that of the original. One common ground among the various approaches to noise removal, model reduction, feasibility reconstruction, and so on is to replace the original data by a lower-dimensional representation obtained via subspace approximation. The use of low-rank approximations therefore comes to the forefront in a wide range of important applications. Factor analysis and principal component analysis are two of the many classical methods used to reduce the number of variables and to detect structure among them.
Often the data to be analyzed are nonnegative, and the low-rank approximation is further required to be comprised of nonnegative values in order to avoid contradicting physical realities. Classical tools cannot guarantee to maintain the nonnegativity. The approach of finding reduced-rank nonnegative factors to approximate a given nonnegative data matrix thus becomes a natural choice. This is the so-called nonnegative matrix factorization (NMF) problem, which can be stated in generic form as follows:

NMF problem. Given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$ and a positive integer $k < \min(m,n)$, find nonnegative matrices $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$ to minimize the functional

$$f(W,H) = \tfrac{1}{2}\,\|A - WH\|_F^2. \qquad (1)$$

The product $WH$ is called a NMF of $A$, although $A$ is not necessarily equal to the product $WH$. Clearly the product $WH$ is an approximate factorization of rank at most $k$, but we will omit the word "approximate" in the remainder of this paper. An appropriate decision on the value of $k$ is critical in practice, but the choice of $k$ is very often problem dependent. In most cases, however, $k$ is chosen such that $k \ll \min(m,n)$, in which case $WH$ can be thought of as a compressed form of the data in $A$.
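As a minimal numerical illustration of the problem statement (using NumPy; the matrix sizes and random data here are our own hypothetical choices, not from the paper), the Frobenius-norm objective and the compression implied by a small rank $k$ can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 100, 10          # illustrative sizes with k << min(m, n)
A = rng.random((m, n))          # nonnegative data matrix A
W = rng.random((m, k))          # nonnegative factor W, m x k
H = rng.random((k, n))          # nonnegative factor H, k x n

# The NMF objective: f(W, H) = 0.5 * ||A - W H||_F^2
f = 0.5 * np.linalg.norm(A - W @ H, "fro") ** 2

# Storing W and H takes (m + n) * k numbers versus m * n for A,
# so WH acts as a compressed representation of A when k is small.
compressed, full = (m + n) * k, m * n
```

With these sizes, the two factors require 3000 stored values against 20,000 for the full matrix, which is the sense in which $WH$ compresses the data.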
Another key characteristic of NMF is the ability of numerical methods that minimize (1) to extract underlying features as basis vectors in $W$, which can then be subsequently used for identification and classification. By not allowing negative entries in $W$ and $H$, NMF enables a non-subtractive combination of parts to form a whole (Lee and Seung, 1999). Features may be parts of faces in image data, topics or clusters in textual data, or specific absorption characteristics in hyperspectral data. In this paper, we discuss the enhancement of NMF algorithms for the primary goal of feature extraction and identification in text and spectral data mining.
Important challenges affecting the numerical minimization of (1) include the existence of local minima due to the non-convexity of $f(W,H)$ in both $W$ and $H$, and, perhaps more importantly, the lack of a unique solution, which can easily be seen by considering $(WD)(D^{-1}H)$ for any nonnegative invertible matrix $D$ whose inverse, $D^{-1}$, is also nonnegative. These and other convergence-related issues are dealt with in Section 3. Still, NMF is quite appealing for data mining applications since, in practice, even local minima can provide desirable properties such as data compression and feature extraction, as previously explained.
The remainder of this paper is organized as follows. In Section 2 we give a brief description of numerical approaches for the solution of the NMF problem. Fundamental NMF algorithms and their convergence properties are discussed in Section 3. The use of constraints or penalty terms to augment solutions is discussed in Section 4 and applications of NMF algorithms in the fields of text mining and spectral data analysis are highlighted in Section 5. The need for further research in NMF algorithms concludes the paper in Section 6.
Numerical approaches for NMF
The 1999 article in Nature by Daniel Lee and Sebastian Seung (Lee and Seung, 1999) started a flurry of research into the new NMF. Hundreds of papers have cited Lee and Seung, but prior to its publication several lesser known papers by Pentti Paatero (Paatero and Tapper, 1994, Paatero, 1997, Paatero, 1999) actually deserve more credit for the factorization's historical development. Though Lee and Seung cite Paatero's 1997 paper on his so-called positive matrix factorization in their Nature
Fundamental algorithms
In this section, we provide a basic classification scheme that encompasses many of the NMF algorithms previously mentioned. Although individual algorithms can straddle more than one class, they can generally be divided into three classes: multiplicative update algorithms, gradient descent algorithms, and alternating least squares (ALS) algorithms. We note that Cichocki and Zdunek (2006) have recently created an entire library (NMFLAB) of MATLAB routines for each class of NMF algorithms.
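As a sketch of the first class, the Lee–Seung multiplicative update rules for the Frobenius-norm objective fit in a few lines of NumPy. The random test matrix, iteration count, and the small epsilon safeguard against division by zero are our own illustrative choices:

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min 0.5 * ||A - WH||_F^2.

    The elementwise multiplicative form keeps W and H nonnegative
    automatically, given nonnegative starting matrices.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        # H <- H .* (W^T A) ./ (W^T W H); eps avoids division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # W <- W .* (A H^T) ./ (W H H^T)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
A = rng.random((30, 20))
W, H = nmf_multiplicative(A, k=5)
residual = np.linalg.norm(A - W @ H, "fro") / np.linalg.norm(A, "fro")
```

A known design trade-off of this class, discussed in the convergence literature, is that the updates are simple and monotonically nonincreasing in the objective but can be slow to converge compared with ALS-type methods.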
Application-dependent auxiliary constraints
As previously explained, the NMF problem formulation given in Section 1 is sometimes extended to include auxiliary constraints on $W$ and/or $H$. This is often done to compensate for uncertainties in the data, to enforce desired characteristics in the computed solution, or to impose prior knowledge about the application at hand. Penalty terms are typically used to enforce auxiliary constraints, extending the cost function of Eq. (1) as follows:

$$f(W,H) = \tfrac{1}{2}\,\|A - WH\|_F^2 + \alpha J_1(W) + \beta J_2(H).$$

Here $J_1(W)$ and $J_2(H)$ are penalty terms enforcing the auxiliary constraints, and $\alpha$ and $\beta$ are regularization parameters balancing the trade-off between the approximation error and the constraints.
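For instance, taking the smoothing penalties $J_1(W)=\|W\|_F^2$ and $J_2(H)=\|H\|_F^2$ (one common choice; the general formulation leaves $J_1$ and $J_2$ open), the extended objective is straightforward to evaluate. The data and parameter values below are hypothetical:

```python
import numpy as np

def penalized_objective(A, W, H, alpha=0.1, beta=0.1):
    """0.5 * ||A - WH||_F^2 + alpha * J1(W) + beta * J2(H),
    with the smoothing penalties J1(W) = ||W||_F^2 and
    J2(H) = ||H||_F^2 (one possible choice of penalty terms)."""
    fit = 0.5 * np.linalg.norm(A - W @ H, "fro") ** 2
    return (fit
            + alpha * np.linalg.norm(W, "fro") ** 2
            + beta * np.linalg.norm(H, "fro") ** 2)

rng = np.random.default_rng(2)
A = rng.random((15, 10))
W = rng.random((15, 3))
H = rng.random((3, 10))
f_pen = penalized_objective(A, W, H)                         # penalized
f_fit = penalized_objective(A, W, H, alpha=0.0, beta=0.0)    # fit only
```

Larger $\alpha$ and $\beta$ trade approximation accuracy for smaller (smoother) factors, which is the usual mechanism for encoding prior knowledge about $W$ and $H$.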
Sample applications
The remaining sections of the paper illustrate two prominent applications of NMF algorithms: text mining and spectral data analysis. In each case, several references are provided along with new results achieved with the CNMF algorithm specified above.
Further improvements
In this paper, we have attempted to outline some of the major concepts related to nonnegative matrix factorization. While applications such as space object identification and classification, and topic detection and tracking in text mining, continue to develop, several open problems with NMF remain. Here are a few of them:
- Initializing the factors $W$ and $H$: Methods for choosing, or seeding, the initial matrices $W$ and $H$ for various algorithms (see, e.g., Wild, 2002; Wild et al., 2003; Boutsidis and Gallopoulos, 2005).
Acknowledgments
The authors would like to thank the anonymous referees for their valuable comments and suggestions on the original manuscript.
References
- On reduced rank non-negative factorization for symmetric non-negative matrices. Linear Algebra Appl. (2004)
- On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. (2000)
- Least squares formulation of robust non-negative factor analysis. Chemometrics and Intell. Laboratory Syst. (1997)
- Document clustering using nonnegative matrix factorization. Inform. Process. Manage. (2006)
- Non-Negative Matrices in the Mathematical Sciences (1994)
- Lattice approximations to the minima of functions of several variables. J. Assoc. Comput. Mach. (1969)
- Email surveillance using nonnegative matrix factorization. Comput. Math. Organization Theory (2005)
- Understanding Search Engines: Mathematical Modeling and Text Retrieval (2005)
- Nonlinear Programming (1999)
- Boutsidis, C., Gallopoulos, E., 2005. On SVD-based initialization for nonnegative matrix factorization. Technical...
- A fast non-negativity constrained linear least squares algorithm. J. Chemometrics
- Optimisation: Théorie et algorithmes
- An information theoretic-based approach to spectral variability, similarity, and discriminability for hyperspectral image analysis. IEEE Trans. Inform. Theory
- Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika
- Fast Monte Carlo algorithms for matrices III: computing a compressed approximate matrix decomposition. SIAM J. Comput.
- Practical Optimization
- Reconstruction of reflectance spectra using robust non-negative matrix factorization. IEEE Trans. Signal Process.
- Non-negative matrix factorization with sparseness constraints. J. Mach. Learning Res.
1. Research supported in part by the National Science Foundation under Grant CAREER-CCF-0546622.
2. Research supported in part by the Air Force Office of Scientific Research under Grant FA49620-03-1-0215, the Army Research Office under Grants DAAD19-00-1-0540 and W911NF-05-1-0402, and by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Grant DAAD19-01-2-0008.