Published in: Neural Computing and Applications 7/2020

10 November 2018 | Original Article

An entropy-based classification of breast cancerous genes using microarray data

Authors: Mausami Mondal, Rahul Semwal, Utkarsh Raj, Imlimaong Aier, Pritish Kumar Varadwaj

Abstract

Gene expression levels measured with microarrays offer a promising basis for classifying cancer samples. Because microarray datasets are high-dimensional, redundant genes must be removed so that the classifier is built only on significant genes. In this work, an entropy-based supervised learning method was used to distinguish normal tissue from breast tumor tissue on the basis of gene expression profiles. Four widely used machine learning techniques were applied to breast cancer prediction: support vector machine (SVM), random forest, k-nearest neighbor (KNN) and naive Bayes. Their performance was evaluated with four classification performance measures, and SVM achieved higher accuracy than the other algorithms, with a classification accuracy of 91.5% and an F1 measure of 0.833. The techniques were further compared using ROC curves and calibration graphs.
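The abstract does not state which entropy criterion was used for gene selection. As an illustration only, the sketch below assumes information gain, one common entropy-based filter: each gene is split at its mean expression level, and genes are ranked by how much that split reduces the Shannon entropy of the class labels. The gene names, expression values and the mean-threshold rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    if not low or not high:
        return 0.0  # a one-sided split carries no information
    n = len(labels)
    remainder = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - remainder


def rank_genes(expression, labels):
    """Return (gene, information gain) pairs, most informative gene first.

    Assumption: each gene is binarised at its mean expression level.
    """
    scores = []
    for gene, values in expression.items():
        threshold = sum(values) / len(values)
        scores.append((gene, information_gain(values, labels, threshold)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three tumor and three normal samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "gene_a": [5.1, 4.8, 5.3, 1.2, 0.9, 1.1],  # separates the classes cleanly
    "gene_b": [1.0, 3.0, 3.1, 1.1, 2.9, 3.2],  # split is unrelated to class
    "gene_c": [3.0, 1.0, 3.2, 1.1, 2.9, 0.8],  # weakly related to class
}

ranked = rank_genes(expression, labels)
for gene, gain in ranked:
    print(f"{gene}: {gain:.3f}")
```

On this toy data `gene_a` ranks first, because its mean split separates tumor from normal samples completely. In a pipeline like the one the abstract describes, only the top-ranked genes would then be passed to the classifiers (SVM, random forest, KNN, naive Bayes).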

Metadata
Title
An entropy-based classification of breast cancerous genes using microarray data
Authors
Mausami Mondal
Rahul Semwal
Utkarsh Raj
Imlimaong Aier
Pritish Kumar Varadwaj
Publication date
10 November 2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 7/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3864-8
