Published in: International Journal of Machine Learning and Cybernetics 1-4/2010

01-12-2010 | Original Article

An efficient gene selection technique for cancer recognition based on neighborhood mutual information

Authors: Qinghua Hu, Wei Pan, Shuang An, Peijun Ma, Jinmao Wei



Abstract

Gene selection is a key problem in gene-expression-based cancer recognition and related tasks. In this work, a measure called neighborhood mutual information (NMI) is introduced to evaluate the relevance between genes and the decision (class label). This measure is then combined with the minimal redundancy and maximal relevance (mRMR) search strategy to construct an NMI-based mRMR gene selection algorithm (NMI_mRMR). In addition, we find that the k genes ranked highest by NMI are usually sufficient for cancer classification. mRMR can therefore be performed on these genes alone, with the remaining genes removed in a preprocessing step, which reduces computational time. Based on this observation, an efficient gene selection algorithm, denoted NMI_EmRMR, is proposed. The proposed technique is tested on several cancer recognition tasks, and the experimental results show that NMI_EmRMR is both effective and efficient.
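The abstract does not give the formulas, so the Python sketch below only illustrates the general idea under stated assumptions. It assumes a neighborhood-entropy formulation of NMI: each gene is min-max normalized to [0, 1], sample j lies in the δ-neighborhood of sample i when their values differ by at most `delta`, the "neighborhood" of a sample under the class label is the set of samples sharing that label, and NMI(A; B) = -(1/n) Σᵢ log(|Aᵢ|·|Bᵢ| / (n·|Aᵢ ∩ Bᵢ|)), which equals NH(A) + NH(B) - NH(A, B) for neighborhood entropy NH. Taking the joint neighborhood as an intersection, the difference form of the mRMR criterion (relevance minus mean redundancy), and all names (`nmi_emrmr`, `delta`, `k`, `n_select`) are hypothetical choices, not details confirmed by the paper.

```python
import numpy as np


def neighborhood_matrix(x, delta):
    """Boolean n-by-n matrix: (i, j) is True when sample j lies in the
    delta-neighborhood of sample i for a single gene (absolute difference)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :]) <= delta


def class_matrix(y):
    """Neighborhood relation induced by a discrete decision: same label."""
    y = np.asarray(y)
    return y[:, None] == y[None, :]


def nmi(a, b):
    """Neighborhood mutual information between two neighborhood relations.

    Assumption: the joint neighborhood is the intersection a & b.
    Every sample is its own neighbor, so |a_i & b_i| >= 1 (no division by 0).
    """
    n = a.shape[0]
    size_a = a.sum(axis=1)
    size_b = b.sum(axis=1)
    size_ab = (a & b).sum(axis=1)
    return -np.mean(np.log(size_a * size_b / (n * size_ab)))


def nmi_emrmr(X, y, n_select, k=50, delta=0.15):
    """Hypothetical sketch: keep the top-k genes by NMI relevance, then run
    greedy mRMR (relevance minus mean redundancy) on those genes only."""
    X = np.asarray(X, dtype=float)
    # Min-max normalize each gene so one delta applies to all genes (assumption).
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    X = (X - lo) / span

    d = class_matrix(y)
    relevance = np.array([nmi(neighborhood_matrix(X[:, g], delta), d)
                          for g in range(X.shape[1])])

    # Preprocessing step: discard all but the k most relevant genes.
    candidates = list(np.argsort(-relevance, kind="stable")[:k])
    nbr = {g: neighborhood_matrix(X[:, g], delta) for g in candidates}

    selected = [candidates.pop(0)]  # start with the most relevant gene
    while candidates and len(selected) < n_select:
        scores = {g: relevance[g] - np.mean([nmi(nbr[g], nbr[s]) for s in selected])
                  for g in candidates}
        best = max(candidates, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return [int(g) for g in selected]
```

Restricting the pairwise redundancy computations to the k pre-ranked genes is what saves time in this sketch: relevance is computed once per gene, but the more expensive gene-to-gene NMI terms are only evaluated inside the small candidate pool.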

Metadata
Title: An efficient gene selection technique for cancer recognition based on neighborhood mutual information
Authors: Qinghua Hu, Wei Pan, Shuang An, Peijun Ma, Jinmao Wei
Publication date: 01-12-2010
Publisher: Springer-Verlag
Published in: International Journal of Machine Learning and Cybernetics, Issue 1-4/2010
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-010-0008-6
