
2015 | OriginalPaper | Book chapter

50. Using Information Gain to Compare the Effeciency of Machine Learning Techniques When Classifying Influenza Based on Viral Hosts

Authors: Nermin Shaltout, Ahmed Rafea, Ahmed Moustafa, Mahmoud ElHefnawi

Published in: Transactions on Engineering Technologies

Publisher: Springer Netherlands


Abstract

This paper compares the performance of two classical machine learning techniques when feature selection is used to improve Influenza-A host classification. It measures the impact of using only the most informative positions on the efficiency and performance of decision trees (DTs) and neural networks (NNs). The experiments were conducted on cDNA sequences from all viral segments of subtype H1 to ensure the authenticity of the results. Sequences belonging to each viral segment were further divided into viruses infecting human and non-human hosts prior to classification analysis. Five performance measures were used: accuracy, sensitivity, specificity, precision, and time. Extracting the hundred most informative positions with the information gain (IG) algorithm increased classification efficiency for both classifiers by more than 80% for all viral segments, while the change in performance was insignificant. Overall, statistical significance tests showed that NNs classified viral hosts more accurately than DTs for subtype H1. The tests also showed that DTs are significantly faster than NNs at classifying Influenza hosts, despite a slight decrease in performance.
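To make the feature selection step concrete, the following is a minimal sketch of ranking aligned sequence positions by information gain with respect to a binary host label. It is an illustration only, not the authors' implementation: the function names (`entropy`, `information_gain`, `top_positions`), the toy sequences, and the labels are all hypothetical, and the sketch assumes equal-length, pre-aligned sequences.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one alignment column with respect to the labels.

    IG = H(labels) - sum over symbols of P(symbol) * H(labels | symbol).
    """
    total = len(labels)
    groups = {}
    for symbol, label in zip(column, labels):
        groups.setdefault(symbol, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_positions(sequences, labels, k):
    """Indices of the k positions with the highest information gain, best first."""
    length = len(sequences[0])
    gains = [
        information_gain([seq[i] for seq in sequences], labels)
        for i in range(length)
    ]
    return sorted(range(length), key=lambda i: gains[i], reverse=True)[:k]


# Toy, made-up data (assumption): four aligned sequences, two per host class.
sequences = ["ACGT", "ACGA", "TCGT", "TCGA"]
labels = ["human", "human", "non-human", "non-human"]

# Only position 0 separates the two classes in this toy example.
best = top_positions(sequences, labels, k=1)
print(best)
```

In the setting the abstract describes, `k` would be 100 and only the selected positions would be passed to the classifier; how the authors encoded those positions for the DT and NN models is not stated here.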

Metadata
Title
Using Information Gain to Compare the Effeciency of Machine Learning Techniques When Classifying Influenza Based on Viral Hosts
Authors
Nermin Shaltout
Ahmed Rafea
Ahmed Moustafa
Mahmoud ElHefnawi
Copyright year
2015
Publisher
Springer Netherlands
DOI
https://doi.org/10.1007/978-94-017-9804-4_50
