Abstract
Feature selection techniques have been widely applied to tumor gene expression data analysis in recent years. The marginal Fisher analysis score (MFA score), a filter feature selection method based on graph embedding, has been proposed and is widely used, mainly because it outperforms the Fisher score. Because gene expression data contain heavy redundancy, in this paper we propose a new filter feature selection technique, MFA score+, which combines the MFA score with redundancy excluding. We applied it to an artificial dataset and eight tumor gene expression datasets to select important features, and then used a support vector machine (SVM) as the classifier to classify the samples. Compared with the MFA score, the t-test and the Fisher score, MFA score+ achieved higher classification accuracy.
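The MFA score+ procedure summarized above (score each feature with a graph-embedding criterion, then exclude redundant features before SVM classification) can be illustrated with a small sketch. It is written under stated assumptions and is not the authors' code: the score form (penalty-graph scatter divided by intrinsic-graph scatter, with k-nearest-neighbor graphs built in the full feature space), the neighbor counts `k1` and `k2`, and the redundancy rule (skip a feature whose absolute Pearson correlation with an already selected feature reaches `max_corr`) are all assumptions, and the function names `mfa_scores`, `pearson` and `select_features` are hypothetical. Only the Python standard library is used.

```python
import math


def sq_dist(a, b):
    # Squared Euclidean distance between two samples (rows).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mfa_scores(X, y, k1=3, k2=3):
    """Assumed MFA-style score per feature; larger means more discriminative.

    Intrinsic graph: each sample joined to its k1 nearest same-class samples.
    Penalty graph: each sample joined to its k2 nearest other-class samples.
    Score = feature scatter over penalty edges / scatter over intrinsic edges.
    """
    n, d = len(X), len(X[0])
    intrinsic, penalty = set(), set()
    for i in range(n):
        same = sorted((sq_dist(X[i], X[j]), j) for j in range(n)
                      if j != i and y[j] == y[i])
        other = sorted((sq_dist(X[i], X[j]), j) for j in range(n)
                       if y[j] != y[i])
        # Store undirected edges as (smaller index, larger index).
        for _, j in same[:k1]:
            intrinsic.add((min(i, j), max(i, j)))
        for _, j in other[:k2]:
            penalty.add((min(i, j), max(i, j)))
    scores = []
    for r in range(d):
        within = sum((X[i][r] - X[j][r]) ** 2 for i, j in intrinsic)
        between = sum((X[i][r] - X[j][r]) ** 2 for i, j in penalty)
        scores.append(between / (within + 1e-12))  # epsilon avoids 0/0
    return scores


def pearson(u, v):
    # Pearson correlation; returns 0.0 if either vector is constant.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def select_features(X, y, m, max_corr=0.9, k1=3, k2=3):
    """Rank features by the MFA-style score, then greedily skip redundant ones.

    Hypothetical redundancy rule: a feature is excluded if its absolute
    Pearson correlation with any already selected feature is >= max_corr.
    """
    scores = mfa_scores(X, y, k1, k2)
    cols = [[row[r] for row in X] for r in range(len(X[0]))]
    # Stable sort: tied scores keep their original feature order.
    order = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
    chosen = []
    for r in order:
        if len(chosen) == m:
            break
        if all(abs(pearson(cols[r], cols[s])) < max_corr for s in chosen):
            chosen.append(r)
    return chosen


if __name__ == "__main__":
    # Toy data: feature 1 duplicates feature 0; features 2 and 3 are weak or noisy.
    X = [[0.0, 0.0, 0.0, 0.5], [0.1, 0.1, 0.5, 0.1], [0.2, 0.2, 0.2, 0.9],
         [0.3, 0.3, 0.7, 0.3], [1.0, 1.0, 0.4, 0.4], [1.1, 1.1, 0.9, 0.8],
         [1.2, 1.2, 0.6, 0.2], [1.3, 1.3, 1.1, 0.6]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(select_features(X, y, m=2))
```

On the toy data, feature 1 ties with feature 0 for the top score because it is an exact copy, yet it is excluded: the stable sort visits feature 0 first, and the duplicate then fails the correlation check. In the pipeline the abstract describes, the selected features would then be passed to an SVM classifier, which this sketch does not include.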
Abbreviations
- MFA: Marginal Fisher analysis
- FDA: Fisher discriminant analysis
- SVM: Support vector machine
- MFA score+: Marginal Fisher analysis score and redundancy excluding
Acknowledgments
This work is supported by the Project for the National Key Technology R&D Program under Grant No. 2011BAC12B0304 and the Scientific Plan of Beijing Municipal Commission of Education under Grant No. JC002011200903.
About this article
Cite this article
Li, J., Su, L. & Pang, Z. A Filter Feature Selection Method Based on MFA Score and Redundancy Excluding and It’s Application to Tumor Gene Expression Data Analysis. Interdiscip Sci Comput Life Sci 7, 391–396 (2015). https://doi.org/10.1007/s12539-015-0272-y
DOI: https://doi.org/10.1007/s12539-015-0272-y