Published in: International Journal of Machine Learning and Cybernetics 1-4/2010

1 December 2010 | Original Article

An efficient gene selection technique for cancer recognition based on neighborhood mutual information

Authors: Qinghua Hu, Wei Pan, Shuang An, Peijun Ma, Jinmao Wei


Abstract

Gene selection is a key problem in gene expression based cancer recognition and related tasks. This work introduces a measure, called neighborhood mutual information (NMI), to evaluate the relevance between genes and the related decision. The measure is then combined with the minimal redundancy and maximal relevancy (mRMR) search strategy to construct an NMI-based mRMR gene selection algorithm (NMI_mRMR). It is also found that the first k best genes with respect to NMI are usually enough for cancer classification, so mRMR can be performed on these genes alone and the rest removed in a preprocessing step, which reduces computational time. Based on this observation, an efficient gene selection algorithm, denoted NMI_EmRMR, is proposed. The proposed technique is tested on several cancer recognition tasks. The experimental results show that NMI_EmRMR is both effective and efficient.
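The abstract describes the procedure only in words and gives no formula for NMI. As a rough sketch, the prefilter-then-mRMR idea might look like the code below. It assumes a neighborhood-based formulation in which NMI(A;B) = -(1/n) Σ_i log2(|δ_A(x_i)|·|δ_B(x_i)| / (n·|δ_A(x_i) ∩ δ_B(x_i)|)), where δ(x_i) is the set of samples within a radius of sample x_i. The function names, the `delta` radius, the intersection used for the joint neighborhood, and the difference form of the mRMR score are all illustrative assumptions, not taken from the paper.

```python
import math


def value_neighborhoods(values, delta):
    """Neighborhood of each sample: indices whose value lies within delta (assumed radius)."""
    n = len(values)
    return [{j for j in range(n) if abs(values[i] - values[j]) <= delta}
            for i in range(n)]


def class_neighborhoods(labels):
    """For a discrete decision, the neighborhood of a sample is its own class."""
    return [{j for j, y in enumerate(labels) if y == labels[i]}
            for i in range(len(labels))]


def nmi(nb_a, nb_b):
    """Assumed NMI: -(1/n) * sum_i log2(|A_i| * |B_i| / (n * |A_i & B_i|))."""
    n = len(nb_a)
    total = 0.0
    for a, b in zip(nb_a, nb_b):
        joint = len(a & b)  # never zero: sample i lies in both of its own neighborhoods
        total += math.log2(len(a) * len(b) / (n * joint))
    return -total / n


def nmi_emrmr(genes, labels, k, m, delta=0.15):
    """Keep the k most relevant genes, then greedily pick up to m of them with mRMR.

    genes: list of per-gene expression lists, one value per sample.
    Returns the indices of the selected genes in selection order.
    """
    nb_y = class_neighborhoods(labels)
    nb = [value_neighborhoods(g, delta) for g in genes]
    relevance = [nmi(nb_g, nb_y) for nb_g in nb]

    # Preprocessing step: discard everything outside the top-k genes by relevance.
    candidates = sorted(range(len(genes)), key=lambda g: relevance[g], reverse=True)[:k]
    if not candidates or m <= 0:
        return []

    selected = [candidates[0]]
    remaining = candidates[1:]
    while remaining and len(selected) < m:
        def score(g):
            # mRMR in difference form: relevance minus mean redundancy with chosen genes.
            redundancy = sum(nmi(nb[g], nb[s]) for s in selected) / len(selected)
            return relevance[g] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch the greedy loop costs one pairwise NMI evaluation per candidate per selected gene, so shrinking the candidate pool from all genes to k genes is where the time saving described in the abstract would come from. The pairwise neighborhood computation here is quadratic in the number of samples, which is acceptable only because expression datasets typically have few samples.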

Metadata
Title
An efficient gene selection technique for cancer recognition based on neighborhood mutual information
Authors
Qinghua Hu
Wei Pan
Shuang An
Peijun Ma
Jinmao Wei
Publication date
1 December 2010
Publisher
Springer-Verlag
Published in
International Journal of Machine Learning and Cybernetics / Issue 1-4/2010
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-010-0008-6
