Published in: International Journal of Machine Learning and Cybernetics 4/2012

01.12.2012 | Original Article

Null space based feature selection method for gene expression data

Authors: Alok Sharma, Seiya Imoto, Satoru Miyano, Vandana Sharma


Abstract

Feature selection is an important step in gene expression data analysis. Feature selection methods discard unimportant genes from among several thousand in order to find the genes or pathways relevant to a target biological phenomenon such as cancer. The resulting gene subset is used for statistical prediction, for example of survival, and for functional analysis aimed at understanding biological characteristics. In this paper we propose a null space based feature selection method for gene expression data in the setting of supervised classification. The proposed method discards redundant genes using information from the null space of scatter matrices. We derive the method theoretically and demonstrate its effectiveness on several DNA gene expression datasets. The method is easy to implement and computationally efficient.
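
The abstract does not spell out the procedure, so the following is only a sketch of the underlying concept, not the authors' algorithm. It assumes the standard within-class scatter matrix used in linear discriminant analysis and shows that, when there are far fewer samples than genes, this matrix has a large null space. The function names, the tolerance, and the random toy data are illustrative assumptions.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of (X_c - mean_c)^T (X_c - mean_c)."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        centered = X_c - X_c.mean(axis=0)
        S_w += centered.T @ centered
    return S_w

def null_space_basis(S, tol=1e-10):
    """Orthonormal basis of the null space of a symmetric PSD matrix.

    Eigenvectors whose eigenvalue is below a relative threshold are
    treated as spanning the null space (the threshold is an assumption).
    """
    eigvals, eigvecs = np.linalg.eigh(S)
    threshold = tol * max(eigvals.max(), 1.0)
    return eigvecs[:, eigvals <= threshold]

# Toy data: 6 samples, 10 "genes", 2 classes (hypothetical values).
rng = np.random.default_rng(0)
n_samples, n_genes = 6, 10
X = rng.normal(size=(n_samples, n_genes))
y = np.array([0, 0, 0, 1, 1, 1])

S_w = within_class_scatter(X, y)
N = null_space_basis(S_w)

# Every null space direction is annihilated by S_w (up to rounding).
residual = np.abs(S_w @ N).max()
print("null space dimension:", N.shape[1])
print("largest entry of S_w @ N:", residual)
```

With n samples in c classes, the rank of the within-class scatter matrix is at most n - c, so for gene expression data with thousands of genes and tens of samples almost all directions in feature space lie in its null space.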

Footnotes
1
A finer categorization of feature selection methods includes the filter, wrapper and embedded approaches [19].
 
2
Most of the datasets were downloaded from the Kent Ridge Bio-medical Dataset (KRBD) repository (http://datam.i2r.a-star.edu.sg/datasets/krbd/). The datasets were transformed or reformatted and made available by the KRBD repository, and we used them without any further preprocessing. Datasets not available in the KRBD repository were downloaded from the respective authors' supplement links and used directly. The URLs for all the datasets are given in the References section.
 
References
1.
Arif M, Akram MU, Minhas FAA (2010) Pruned fuzzy k-nearest neighbor classifier for beat classification. J Biomed Sci Eng 3:380–389
3.
Banerjee M, Mitra S, Banka H (2007) Evolutionary-rough feature selection in gene expression data. IEEE Trans Syst Man Cybern Part C Appl Rev 37:622–632
4.
Chen L-F, Liao H-YM, Ko M-T, Lin J-C, Yu G-J (2000) A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognit 33:1713–1726
5.
Boehm O, Hardoon DR, Manevitz LM (2011) Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2(3):125–134
6.
Caballero JCF, Martinez FJ, Hervas C, Gutierrez PA (2010) Sensitivity versus accuracy in multiclass problems using memetic Pareto evolutionary neural networks. IEEE Trans Neural Netw 21(5):750–770
7.
Cong G, Tan K-L, Tung AKH, Xu X (2005) Mining top-k covering rule groups for gene expression data. In: The ACM SIGMOD International Conference on Management of Data, pp 670–681
8.
Duda RO, Hart PE (1973) Pattern classification and scene analysis. Wiley, New York
9.
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discriminant methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87
10.
Fukunaga K (1990) Introduction to statistical pattern recognition. Academic Press Inc., Harcourt Brace Jovanovich, Publishers, Boston
11.
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914
12.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537 (Data Source: http://datam.i2r.a-star.edu.sg/datasets/krbd/)
14.
Huang R, Liu Q, Lu H, Ma S (2002) Solving the small sample size problem of LDA. Proc ICPR 3:29–32
15.
16.
Li J, Wong L (2003) Using rules to analyse bio-medical data: a comparison between C4.5 and PCL. In: Advances in Web-Age Information Management. Springer, Berlin/Heidelberg, pp 254–265
17.
Pan W (2002) A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 18:546–554
18.
Pavlidis P, Weston J, Cai J, Grundy WN (2001) Gene functional classification from heterogeneous data. In: International Conference on Computational Biology, pp 249–255
19.
Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
20.
Sharma A, Paliwal KK (2010) Improved nearest centroid classifier with shrunken distance measure for null LDA method on cancer classification problem. Electron Lett IEE 46(18):1251–1252
21.
Sharma A, Koh CH, Imoto S, Miyano S (2011) Strategy of finding optimal number of features on gene expression data. Electron Lett IEE 47(8):480–482
23.
Tan AC, Gilbert D (2003) Ensemble machine learning on gene expression data for cancer classification. Appl Bioinforma 2(3 Suppl):S75–S83
24.
Tao L, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(14):2429–2437
25.
Thomas J, Olson JM, Tapscott SJ, Zhao LP (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res 11:1227–1236
26.
Tong DL, Mintram R (2010) Genetic Algorithm-Neural Network (GANN): a study of neural network activation functions and depth of genetic algorithm search applied to feature selection. Int J Mach Learn Cybern 1(1–4):75–87
27.
Wang X-Z, Dong C-R (2009) Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy. IEEE Trans Fuzzy Syst 17(3):556–567
28.
Wang X-Z, Zhai J-H, Lu S-X (2008) Induction of multiple fuzzy decision trees based on rough set technique. Inf Sci 178(16):3188–3202
29.
Ye J (2005) Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J Mach Learn Res 6:483–502
30.
Zhao H-X, Xing H-J, Wang X-Z (2011) Two-stage dimensionality reduction approach based on 2DLDA and fuzzy rough sets technique. Neurocomputing 74:3722–3727
Metadata
Title
Null space based feature selection method for gene expression data
Authors
Alok Sharma
Seiya Imoto
Satoru Miyano
Vandana Sharma
Publication date
01.12.2012
Publisher
Springer-Verlag
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2012
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-011-0061-9
