
An NMF-L2,1-Norm Constraint Method for Characteristic Gene Selection

  • Dong Wang,

    Affiliation School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China

  • Jin-Xing Liu ,

    sdcavell@126.com

    Affiliations School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China, Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China

  • Ying-Lian Gao,

    Affiliation Library of Qufu Normal University, Qufu Normal University, Rizhao, 276826, China

  • Jiguo Yu,

    Affiliation School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China

  • Chun-Hou Zheng,

    Affiliation School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China

  • Yong Xu

    Affiliation Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China

Abstract

Recent research has demonstrated that characteristic gene selection based on gene expression data remains faced with considerable challenges, primarily because gene expression data are typically high dimensional, nonnegative, non-sparse and noisy. Existing methods for data analysis are able to cope with only some of these challenges. In this paper, we address all of them with a unified method: nonnegative matrix factorization via the L2,1-norm (NMF-L2,1). Because L2,1-norm minimization is applied to both the error function and the regularization term, our method is robust to outliers and noise in the data and generates sparse results. The application of our method to plant and tumor gene expression data demonstrates that NMF-L2,1 can extract more characteristic genes than other existing state-of-the-art methods.

1 Introduction

The development of microarray technologies makes the study of complex biological gene expression networks possible. Microarray datasets typically contain expression data for the thousands of genes profiled on each chip, and the number of replicates is much smaller than the number of genes, making the selection of genes difficult [1]. In addition, the inclusion of irrelevant or noisy variables may decrease the accuracy of selection [2]. The problem of how to select genes associated with the target terms has become a challenge for scientists [3]. For example, plants are able to cope with environmental challenges such as cold, heat, and salt, which are referred to as abiotic stresses; there must therefore exist specific interacting genes that respond to each abiotic stress. Another typical example is that of cancer, an important cause of human morbidity; the identification of genes that are frequently mutated in cancers and play an essential role in cancer development is critical. Many methods have been proposed for processing gene expression data collected by DNA microarray profiling [4–9]. For example, Liu et al. [10] used a method based on penalized matrix decomposition (PMD) to extract characteristic plant genes, and Zheng et al. [11] applied nonnegative matrix factorization (NMF) to tumor gene selection. Principal component analysis (PCA) and singular value decomposition (SVD) have also been used to analyze gene expression data [12]. Liu et al. [13] proposed a class-information-based penalized matrix decomposition (CIPMD) algorithm for identifying plant core genes responding to abiotic stresses; CIPMD is the PMD method augmented with label information. Liu et al. [14] proposed a p-norm robust feature extraction (PRFE) algorithm for identifying differentially expressed genes. Although these methods are all feature selection methods in widespread use, they present some disadvantages:

  1. Although the elements of the initial data matrix are entirely nonnegative, traditional low-rank algorithms [15] cannot guarantee nonnegative values in the projection matrix, thereby complicating its biological interpretation.
  2. The high dimensionality of data poses challenges, such as the so-called curse of dimensionality[16,17].
  3. Faced with millions of individual data points, it is difficult to interpret gene expression data without sparse constraints.
  4. Gene expression data often contain numerous outliers and abundant noise, which traditional methods do not effectively address.

NMF has been widely used in various fields because it can generate low-rank and nonnegative results. The ability to generate a low-rank nonnegative matrix to approximate a given nonnegative data matrix is a significant advantage [18], but the lack of sparsity in data processed via NMF makes this method less than ideal for characteristic gene selection. In high-throughput datasets, gene expression data are high dimensional and always contain some redundant information (i.e., not all features are relevant). To address these problems, we sought to incorporate sparsity, i.e., the reduction of certain vector elements to zero. The inclusion of sparsity has played a significant role in dimensionality reduction and feature selection [19]. For example, Journée et al. [20] proposed a sparse principal component analysis (SPCA) method using the generalized power method, and Witten et al. [21] proposed a penalized matrix decomposition (PMD) method, which has been proven useful in microarray analysis by imposing penalization on factor matrices. Nonnegative matrix factorization with sparse constraints (NMFSC), first introduced by Patrik O. Hoyer in 2004 [22], accurately controls sparsity. NMFSC has been applied to the problems of imaging and gene selection, among others. However, it does not guarantee that entire rows of a matrix are sparse, which can lead to difficulties during feature selection. To address these issues, the L2,1 version of NMF favors a small number of non-zero rows in the factor matrix, and is therefore proposed here to generate row-wise sparse results.

However, these methods for generating sparsity apply the least-squares error function, which does not reliably address noise and outliers [23]. With such a function, the error for both features and samples is squared [24], increasing the effect of large noise values or outliers [25]. As a result, the L2,1 version of the error function has been proposed to address noisy data [26].

In light of these problems, we propose a novel method called nonnegative matrix factorization with the L2,1-norm (NMF-L2,1), which imposes an L2,1-norm constraint on both the error function and a regularization term to solve the aforementioned problems simultaneously. The sparse regularization term avoids the potential problem of over-fitting and selects a sparse subset of features. Rather than an L2-norm-based error function, the L2,1-norm-based error function diminishes the impact of outliers and noise in a dataset [25,27].

The main contributions of this paper are the following:

  1. First, L2,1-norm is employed for regularization in our method to generate sparse results, making the results easier to interpret.
  2. Second, nonnegative matrix factorization is utilized to generate low-rank results with nonnegative values.
  3. Third, the L2,1-norm-based error function is used to diminish the effect of the outliers and noise inherent in gene expression data.

This paper is organized as follows. The Methodology section introduces the NMF-L2,1 method and provides an efficient algorithm for its estimation. The Results and Discussion section compares our method with five other methods: PMD, NMFSC, SPCA, CIPMD and PRFE. Our conclusions are presented in the final section.

2 Methodology

2.1 Mathematical Definition of L2,1-norm

This subsection briefly introduces the L2,1-norm proposed in [28]. For an n × s matrix M, it is defined as

‖M‖_{2,1} = Σ_{i=1}^{n} (Σ_{j=1}^{s} m_{ij}^2)^{1/2} = Σ_{i=1}^{n} ‖m_i‖_2, (1)

where m_i is the i-th row of M and m_ij is the (i, j)-th entry of M. In other words, we first compute the L2-norm of each row m_i, then compute the L1-norm of the vector b(M) = (‖m_1‖_2, ‖m_2‖_2, …, ‖m_n‖_2). The magnitudes of the components of b(M) indicate how important each dimension is. The L2,1-norm favors a small number of non-zero rows in M, ensuring that an appropriate dimensionality reduction is achieved [29].
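As a quick illustration (not part of the original paper), the definition above can be computed in a few lines of NumPy; the matrix values here are arbitrary:

```python
import numpy as np

def l21_norm(M):
    """L2,1-norm: the L2-norm of each row of M, summed (an L1-norm over row norms)."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Arbitrary example: row norms are 5, 0 and 13, so the L2,1-norm is 18.
M = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
print(l21_norm(M))  # 18.0
```

Note that a single zero row contributes nothing, which is why minimizing this norm drives entire rows to zero rather than scattering individual zeros.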

2.2 Extracting Characteristic Genes by NMF-L2,1

In this paper, the matrix X denotes the initial gene expression dataset, whose size is n × c. Each column of X represents the transcriptional response of the n genes in one sample, and each row of X represents the expression levels of one gene across all samples. Thus, X can be approximated as

X ≈ AY, (2)

where A is an n × d matrix, Y is a d × c matrix, and d < min(n, c).

The matrices Y and A are called the coefficient and basis matrices, respectively. Given suitable parameters for NMF-L2,1, the sparse matrix A can be obtained. Characteristic genes can then be extracted according to the non-zero entries in A [30].

2.3 NMF based on L2,1-norm (NMF-L2,1)

Let X = (x_1, x_2, …, x_c) ∊ R^{n×c} and Y = (y_1, y_2, …, y_c) ∊ R^{d×c}. The error function of standard NMF [31] is

min_{A≥0, Y≥0} ‖X − AY‖_F^2 = Σ_{i=1}^{c} ‖x_i − Ay_i‖^2. (3)

Here, the error for each data point is calculated as a squared residual error in terms of ‖x_i − Ay_i‖^2. As a result, a few outliers with large errors can easily dominate the objective function due to the squared errors. Thus, it is reasonable to propose an NMF-L2,1 formulation to reduce the influence of outliers and errors.

The error function of the NMF-L2,1 formulation is:

min_{A≥0, Y≥0} Σ_{i=1}^{c} ‖x_i − Ay_i‖_2. (4)

In this formulation, the error for each data point is ‖x_i − Ay_i‖_2, which is not squared; thus, the large errors caused by outliers do not dominate the objective function [32].
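The effect can be seen numerically with made-up residual norms (an illustration, not data from the paper): squaring the per-sample errors lets a single outlier dominate the total, while summing the unsquared norms keeps its influence bounded.

```python
import numpy as np

# Hypothetical column-wise residual norms ||x_i - A y_i||; the last sample is an outlier.
residuals = np.array([1.0, 1.0, 1.0, 10.0])
print(residuals.sum())        # unsquared (L2,1-style) total error: 13.0
print((residuals**2).sum())   # squared (Frobenius-style) total error: 103.0
```

In the unsquared total the outlier accounts for about 77% of the error; in the squared total it accounts for about 97%, so the fit is pulled toward the outlier.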

The NMF-L2,1 optimization problem is formulated as

min_{A≥0, Y≥0} ‖X − AY‖_{2,1} + λ‖Y‖_{2,1}. (5)

The problem in Eq (5) is equivalent to

min_{A≥0, Y≥0, E} ‖E‖_{2,1} + λ‖Y‖_{2,1}  s.t.  AY + E = X, (6)

where E = X − AY denotes the residual matrix. Thus, the above problem can be rewritten as

min ‖[Y; (1/λ)E]‖_{2,1}  s.t.  [A  λI][Y; (1/λ)E] = X, (7)

where I ∊ R^{n×n} is an identity matrix. Let B = [A  λI] ∊ R^{n×b} and U = [Y; (1/λ)E] ∊ R^{b×c}, where b = d + n; note that ‖U‖_{2,1} = ‖Y‖_{2,1} + (1/λ)‖E‖_{2,1}, which is proportional to the objective of Eq (6).

Then, the problem can be reformulated as

min_U ‖U‖_{2,1}  s.t.  BU = X. (8)

How can this optimization problem handle high dimensional, nonnegative, noisy and sparse data simultaneously? The reasons are as follows:

  1. The L2,1-norm error function term is designed to diminish the impact of noise or outliers contained in the original data. As a result, we can expect to obtain cleaner data for subsequent analyses.
  2. This clean data may not be sparse (some features may be irrelevant to the learning procedure), so the L2,1-norm regularization term [33] can be used to generate a sparse solution.
  3. In the next section, we will demonstrate that the above conditions produce more effective models, especially for datasets that are sparse, nonnegative, high dimensional and noisy.

2.4 An Efficient Algorithm for NMF-L2,1

To solve the constrained optimization problem in Eq (8), Nie et al. [34] have provided an efficient algorithm, which we briefly introduce here.

By introducing the Lagrangian multiplier Λ, we first give the Lagrangian function as follows:

L(U, Λ) = ‖U‖_{2,1} − Tr(Λ^T(BU − X)), (9)

where Tr(·) is the trace function of a matrix. Here we introduce the augmented cost function

J(U, q) = Tr(U^T QU) − Tr(Λ^T(BU − X)), (10)

where q ∊ R^b is an auxiliary vector and Q = diag(q) is a diagonal matrix with the diagonal elements

q_i = 1 / (2 (‖u^i‖_2^2 + ε)^{1/2}), (11)

in which u^i is the i-th row of U and ε is a positive number infinitely close to, but not equal to, zero.

Setting the derivative of J(U, q) with respect to U to zero, we obtain:

2QU − B^TΛ = 0. (12)

By multiplying both sides of Eq (12) by BQ^{−1} and using the constraint BU = X, we obtain

Λ = 2(BQ^{−1}B^T)^{−1}X. (13)

Then, we obtain:

U = Q^{−1}B^T(BQ^{−1}B^T)^{−1}X. (14)

More details of the algorithm can be found in [34]. Here, we summarize our method in Box 1. In each iteration, U is calculated with the current Q; then, Q is updated based on the current U. The iteration procedure is repeated until the algorithm converges.

Box 1. NMF-L2,1 method.

Input: X ∊ R^{n×c} and parameter λ.

Output: Y ∊ R^{d×c}, A ∊ R^{n×d}.

1: Initialize Q_t ∊ R^{b×b} as an identity matrix and A ∊ R^{n×d} as a nonnegative matrix;

 set t = 0.

2: Repeat

 Compute U_{t+1} = Q_t^{−1}B^T(BQ_t^{−1}B^T)^{−1}X according to Eq (14).

 Set the negative entries of U_{t+1} to zero.

 Compute the diagonal matrix Q_{t+1} according to Eq (11).

 t = t + 1.

Until convergence.

3: A and Y are obtained from U according to Eq (7).
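The core reweighted iteration of Box 1 can be sketched in NumPy for the generic subproblem min ‖U‖_{2,1} s.t. BU = X. This is an illustrative implementation under stated assumptions: B and X here are random stand-ins (not gene expression data), and the nonnegativity projection of Box 1 is omitted so that the constraint BU = X holds exactly at every iterate.

```python
import numpy as np

def l21_min(B, X, eps=1e-8, n_iter=50):
    """Iteratively reweighted solver for min ||U||_{2,1} s.t. BU = X (cf. Nie et al.).

    Each iteration computes U = Q^{-1} B^T (B Q^{-1} B^T)^{-1} X for the current
    diagonal weight matrix Q, then refreshes Q from the row norms of U.
    """
    b = B.shape[1]
    q = np.ones(b)  # diagonal of Q, initialized as the identity
    for _ in range(n_iter):
        Qinv = np.diag(1.0 / q)
        # Closed-form update for U given Q (the Eq (14)-style step).
        U = Qinv @ B.T @ np.linalg.solve(B @ Qinv @ B.T, X)
        # Reweight: small rows get large weights, pushing them toward zero
        # (the Eq (11)-style step, with eps guarding against division by zero).
        row_norms = np.linalg.norm(U, axis=1)
        q = 1.0 / (2.0 * np.sqrt(row_norms**2 + eps))
    return U

rng = np.random.default_rng(0)
B = rng.random((5, 12))   # wide matrix, so BU = X is underdetermined
X = rng.random((5, 3))
U = l21_min(B, X)
print(np.allclose(B @ U, X))  # True: the constraint holds at every iterate
```

Because each U is solved exactly from the linear system, feasibility is maintained throughout, and the reweighting progressively concentrates the solution on a few rows of U.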

In this paper, the characteristic genes are extracted via the basis matrix A. We summarize the NMF-L2,1 method to extract core genes as follows:

  1. Create the data matrix X based on gene expression data.
  2. Obtain the basis matrix A by using the NMF-L2,1 method.
  3. Extract characteristic genes from non-zero entries in matrix A.
  4. Use the Gene Ontology (GO) tool to investigate the extracted genes.
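Step 3 above, reading off characteristic genes from a row-sparse basis matrix, amounts to keeping the genes whose rows of A have non-zero L2-norm. A minimal sketch with a hypothetical, made-up A:

```python
import numpy as np

# Hypothetical basis matrix A (rows = genes). A row-sparse A from NMF-L2,1
# leaves most rows at (near) zero; the non-zero rows mark characteristic genes.
A = np.array([[0.0, 0.0],
              [1.2, 0.3],
              [0.0, 0.0],
              [0.0, 2.1]])
row_norms = np.linalg.norm(A, axis=1)
characteristic = np.flatnonzero(row_norms > 1e-6)
print(characteristic)  # indices of genes with non-zero rows: [1 3]
```

In practice one would rank genes by row norm and keep the top 500 (plant data) or top 100 (tumor data), as done in the experiments below.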

3 Results and Discussion

In this section, several experiments are carried out. In the first subsection, the NMF-L2,1 method is compared with the following methods on a gene expression dataset obtained from plants responding to abiotic stresses: (a) the PMD method (proposed by Witten et al. [15]); (b) the SPCA method (proposed by Journée et al. [35]); (c) the NMFSC method (proposed by Patrik O. Hoyer [36]); (d) the CIPMD method (proposed by Liu et al. [13]); and (e) the PRFE method (proposed by Liu et al. [14]). In the second subsection, the six methods are compared on the medulloblastoma and leukemia tumor datasets.

3.1 Results for Plant Gene Expression Data

Plants are continually challenged by environmental parameters such as drought, salt, cold, osmotic pressure, and UV-B light [37]. Among plant genes, there must exist a specific set of interacting genes that respond to each abiotic stress. Thus, it is important but challenging to extract genes responding to each abiotic stress from plant gene expression data.

3.1.1 Data source.

The gene expression datasets used in our experiment were downloaded from NASC Arrays [http://affy.arabidopsis.info/link_to_iplant.shtml]. Each sample profiles 22810 genes. The plant gene expression datasets are listed in the supplementary file (S1 Table). Table 1 lists the reference numbers and sample numbers for each stressor.

3.1.2 The selection of parameter λ.

In order to obtain the most effective results, we used the gene expression data to train the parameter λ. For each sample, the parameter was varied from 0 to 1 in steps of 0.1, and GO terms were used to select the most appropriate value. The results are provided in Table 2.

3.1.3 Gene Ontology (GO) analysis.

In this paper, GO terms are used to evaluate the genes that responded to plant abiotic stressors [38]. GO Term Finder analysis provides information to aid the biological interpretation of high-throughput experiments. GO Term Finder is publicly available at [http://go.princeton.edu/cgi-bin/GOTermFinderS] [39]; it describes the genes in a query/input set and finds what those genes may have in common.

For the sake of simplicity, 500 genes were selected from the gene expression data by each of the NMFSC, PMD, SPCA, CIPMD, PRFE and NMF-L2,1 methods. The threshold parameters used were a maximum P-value of 0.01 and a minimum of 2 genes.

3.1.4 Response to stimulus.

Table 3 summarizes the results for the response to stimulus, whose background frequency in the TAIR (A. thaliana (common wallcress)) set was 6617/30320 (21.8%). The results are presented according to P-value and sample frequency. The P-value was calculated using a hyper-geometric distribution (details can be found in [40]). The sample frequency denotes the fraction of the selected characteristic genes annotated with the GO term; for example, 330/500 denotes that 330 of the 500 genes selected by a method correspond to the GO term.
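The hyper-geometric P-value used here is the probability of seeing at least k annotated genes among n selected genes, given K annotated genes in a background of N. It can be computed with the standard library; the small numbers below are a toy check, not values from the paper:

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from a background of
    N genes, K of which carry the GO annotation (upper-tail hypergeometric)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Toy check: drawing 2 genes from a population of 5 with 3 annotated,
# P(X >= 2) = C(3,2)*C(2,0)/C(5,2) = 3/10.
print(hypergeom_pvalue(2, 2, 3, 5))  # 0.3
```

For realistic background sizes (e.g. N = 30320) the exact sum is still feasible with Python's arbitrary-precision integers, though log-space computation is numerically nicer for the extremely small enrichment P-values reported in Tables 3–7.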

As listed in Table 3, the six methods were compared by sample frequency and P-value. NMFSC, NMF-L2,1, PRFE, SPCA and PMD are unsupervised methods, so we first compare these five algorithms. Across the twelve terms, the results show that our method could extract more characteristic genes than the other methods for eight of the twelve samples. For example, for shoot samples exposed to UV-B stress, the sample frequency was 76.4% by our method, 72% by NMFSC, 72.4% by PMD, 66.4% by SPCA and 61.8% by PRFE. This shows that our method is markedly improved over PRFE, PMD and SPCA. When compared with the supervised method CIPMD, our method performs worse than CIPMD except for the salt and UV-B stresses. Generally speaking, since supervised methods take the class labels into consideration, they usually perform better than unsupervised methods.

3.1.5 Response to the abiotic stimulus.

Table 4 summarizes the results of the six methods for datasets describing the response to abiotic stimulus, whose background frequency in the TAIR (A. thaliana (common wallcress)) set is 1539/29556 (5.2%). The numbers of characteristic genes and the P-values of genes responding to an abiotic stimulus (GO:0009628) in root and shoot samples are listed in Table 4.

As described above in the 'Response to stimulus' section, we first compare the five unsupervised algorithms: NMFSC, NMF-L2,1, PRFE, SPCA and PMD. From the results we can see that our method could extract more characteristic genes than the PMD and SPCA methods for all datasets. For four of the sample datasets (drought, salt, UV-B and osmotic), our method performed worse than NMFSC but better than PMD and SPCA. When compared with the supervised method CIPMD, CIPMD performs better than the other methods except for the salt, heat and UV-B stresses.

3.1.6 Characteristic terms.

In Table 5, we list the characteristic terms. Our method outperformed SPCA and PMD for all 12 items and outperformed NMFSC for seven items. Only for one item (the shoot sample under UV-B stress) did PRFE outperform our method, and for one of the twelve items (cold in root) our method produced the same result as NMFSC. From these results, it can be concluded that our method is more effective than the other unsupervised methods. The response to water deprivation (GO:0009414) for shoot samples is also analyzed in Table 5; its background frequency is 1.4%. NMF-L2,1 is clearly able to extract more characteristic genes than the other methods: the sample frequency in response to water deprivation is 18.1% by our method, versus 13.2% for NMFSC, 11.9% for PMD, 16.8% for CIPMD, 11.2% for PRFE and 8.2% for SPCA, indicating that our method performs 6.2% better than PMD, almost 7% better than PRFE and almost 10% better than SPCA.

Table 5. Characteristic Terms Selected from GO by Algorithms.

https://doi.org/10.1371/journal.pone.0158494.t005

3.2 Results for Tumor Datasets

Two tumor datasets were also analyzed to verify the performance of the proposed method. The medulloblastoma dataset contains 34 samples, which can be divided into 25 tumor and 9 normal tissue samples, and assesses the expression of 5893 genes [41]. The leukemia dataset consists of 5000 genes and 38 samples [42]. The samples include 27 tumor and 11 normal tissue samples.

To make a fair comparison, all six methods extracted 100 genes as characteristic genes from the two tumor datasets. Gene Ontology (GO) enrichment and functional annotation of the genes extracted by all six methods were performed with ToppFun, which is publicly available at [http://toppgene.cchmc.org/enrichment.jsp].

Tables 6 and 7 list the top 10 closely related terms, with P-values, for the different methods on the two tumor datasets. From Table 6, it can be seen that NMF-L2,1 outperforms the other five methods for all terms. For example, for the term M11197 in Table 6, the P-value from NMF-L2,1 is 7.36E-129, and 70/389 denotes that NMF-L2,1 extracts 70 genes corresponding to M11197, whereas NMFSC, PMD, CIPMD, PRFE and SPCA extracted 62, 62, 48, 52 and 55 genes corresponding to M11197, respectively, out of a total of 389 genes corresponding to M11197. In Table 7, we can see that our method outperforms the other methods for seven terms; only for the other three terms (17092989, 19755675, 11108479) does PRFE have a lower P-value than our method. Thus, we can conclude that our method extracts more genes than the others.

In summary, we conclude that our method is generally superior to others and is effective for the extraction of genes.

4 Conclusions

In this paper, we proposed an effective method to select characteristic genes with L2,1-norm minimization of both the error function and the regularization term. The L2,1-norm-based error function is robust to outliers and noise in the data points and is computationally efficient. Furthermore, the L2,1-norm-based regularization term is used to generate a sparse solution. We also used nonnegative matrix factorization to avoid the problems stemming from the high-dimensional and nonnegative nature of the data. In summary, our method can cope with high dimensionality, non-negativity, sparseness and noise simultaneously.

Furthermore, the genes selected by our method and others from both plant and tumor datasets were compared using GO enrichment. These results indicate that the proposed NMF-L2,1 method is superior to SPCA and PMD for selecting characteristic genes.

Supporting Information

S1 Table. The plant gene expression dataset.

https://doi.org/10.1371/journal.pone.0158494.s001

(ZIP)

Author Contributions

Conceived and designed the experiments: JXL. Performed the experiments: DW JXL. Analyzed the data: YLG JGY CHZ. Contributed reagents/materials/analysis tools: DW YX. Wrote the paper: DW JXL YX.

References

  1. Zheng CH, Huang DS, Zhang L, Kong XZ (2009) Tumor clustering using nonnegative matrix factorization with gene selection. IEEE Transactions on Information Technology in Biomedicine 13: 599–607. pmid:19369170
  2. Hou C, Nie F, Li X, Yi D, Wu Y (2014) Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Transactions on Cybernetics 44: 793–804.
  3. Nie F, Xiang S, Jia Y, Zhang C, Yan S (2008) Trace ratio criterion for feature selection. pp. 671–676.
  4. Jauhari S, Rizvi S (2014) Mining gene expression data focusing cancer therapeutics: a digest. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 11: 533–547.
  5. Fa R, Nandi AK (2014) Noise resistant generalized parametric validity index of clustering for gene expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 11: 741–752.
  6. Baladandayuthapani V, Coombes K, Momin A (2014) Latent feature decompositions for integrative analysis of diverse high-throughput genomic data. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 1.
  7. Mazza T, Fusilli C, Saracino C, Mazzoccoli G, Tavano F, Vinciguerra M, et al. (2015) Functional impact of autophagy-related genes on the homeostasis and dynamics of pancreatic cancer cell lines. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 1.
  8. Fang X, Xu Y, Li X, Fan Z, Liu H, Chen Y (2014) Locality and similarity preserving embedding for feature selection. Neurocomputing 128: 304–315.
  9. Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. pp. 1062–1070.
  10. Liu J-X, Zheng C-H, Xu Y (2012) Extracting plants core genes responding to abiotic stresses by penalized matrix decomposition. Computers in Biology and Medicine 42: 582–589.
  11. Zheng CH, Ng To-Yee V, Zhang L, Shiu CK, Wang HQ (2011) Tumor classification based on non-negative matrix factorization using gene expression data. IEEE Transactions on NanoBioscience 10: 86–93. pmid:21742573
  12. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^−ΔΔCT method. Methods 25: 402–408. pmid:11846609
  13. Liu J-X, Liu J, Gao Y-L, Mi J-X, Ma C-X, Wang D (2014) A class-information-based penalized matrix decomposition for identifying plants core genes responding to abiotic stresses. PLoS ONE 9: e106097. pmid:25180509
  14. Liu J, Liu J-X, Gao Y-L, Kong X-Z, Wang X-S, Wang D (2015) A P-norm robust feature extraction method for identifying differentially expressed genes. PLoS ONE 10: e0133124. pmid:26201006
  15. Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics: kxp008.
  16. Chen D, Cao X, Wen F, Sun J (2013) Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. IEEE. pp. 3025–3032.
  17. Hall P, Marron J, Neeman A (2005) Geometric representation of high dimension, low sample size data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67: 427–444.
  18. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. pp. 556–562.
  19. Di L, Pagan PE, Packer D, Martin CL, Akther S, Ramrattan G, et al. (2014) BorreliaBase: a phylogeny-centered browser of Borrelia genomes. BMC Bioinformatics 15: 233. pmid:24994456
  20. Journée M, Nesterov Y, Richtarik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. The Journal of Machine Learning Research 11: 517–553.
  21. Yalavarthy PK, Pogue BW, Dehghani H, Paulsen KD (2007) Weight-matrix structured regularization provides optimal generalized least-squares estimate in diffuse optical tomography. Medical Physics 34: 2085–2098. pmid:17654912
  22. Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. The Journal of Machine Learning Research 5: 1457–1469.
  23. Lin C-F, Wang S-D (2004) Training algorithms for fuzzy support vector machines with noisy data. Pattern Recognition Letters 25: 1647–1656.
  24. Ferson W, Nallareddy S, Xie B (2013) The "out-of-sample" performance of long run risk models. Journal of Financial Economics 107: 537–556.
  25. Nikolova M (2004) A variational approach to remove outliers and impulse noise. Journal of Mathematical Imaging and Vision 20: 99–120.
  26. Ding H, Wang C, Huang K, Machiraju R (2014) iGPSe: A visual analytic system for integrative genomic based cancer patient stratification. BMC Bioinformatics 15: 203. pmid:25000928
  27. Utreras F (2013) Optimal smoothing of noisy data using spline functions. SIAM Journal on Scientific and Statistical Computing 2.
  28. Kong D, Ding C, Huang H (2011) Robust nonnegative matrix factorization using L21-norm. ACM. pp. 673–682.
  29. Nie F, Huang H, Cai X, Ding C (2010) Efficient and robust feature selection via joint ℓ2,1-norms minimization. Advances in Neural Information Processing Systems 23: 1813–1821.
  30. Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23: 1061–1067. pmid:17332020
  31. Ortega-Martorell S, Lisboa PJ, Vellido A, Julià-Sapé M, Arús C (2012) Non-negative matrix factorisation methods for the spectral decomposition of MRS data from human brain tumours. BMC Bioinformatics 13: 38. pmid:22401579
  32. Liu J, Ji S, Ye J (2009) Multi-task feature learning via efficient ℓ2,1-norm minimization. AUAI Press. pp. 339–348.
  33. Yang S, Hou C, Zhang C, Wu Y (2013) Robust non-negative matrix factorization via joint sparse and graph regularization for transfer learning. Neural Computing and Applications 23: 541–559.
  34. Nie F, Huang H, Cai X, Ding CH (2010) Efficient and robust feature selection via joint ℓ2,1-norms minimization. pp. 1813–1821.
  35. Nyamundanda G, Gormley IC, Brennan L (2014) A dynamic probabilistic principal components model for the analysis of longitudinal metabolomics data. Journal of the Royal Statistical Society: Series C (Applied Statistics).
  36. Zhang Y, Mu Z-C (2006) Ear recognition based on improved NMFSC. Journal of Computer Applications 4: 010.
  37. Allen GJ, Chu SP, Schumacher K, Shimazaki CT, Vafeados D, Kemper A, et al. (2000) Alteration of stimulus-specific guard cell calcium oscillations and stomatal closing in Arabidopsis det3 mutant. Science 289: 2338–2342. pmid:11009417
  38. Jenks MA, Hasegawa PM (2008) Plant abiotic stress. John Wiley & Sons.
  39. Feigelman J, Theis FJ, Marr C (2014) MCA: Multiresolution Correlation Analysis, a graphical tool for subpopulation identification in single-cell gene expression data. arXiv preprint arXiv:1407.2112.
  40. Dinkla K, El-Kebir M, Bucur C-I, Siderius M, Smit MJ, Westenberg MA, et al. (2014) eXamine: Exploring annotated modules in networks. BMC Bioinformatics 15: 201. pmid:25002203
  41. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415: 436–442. pmid:11807556
  42. Wu M-Y, Dai D-Q, Zhang X-F, Zhu Y (2013) Cancer subtype discovery and biomarker identification via a new robust network clustering algorithm. PLoS ONE 8.