2020 | OriginalPaper | Chapter

Evaluating Mutual Information and Chi-Square Metrics in Text Features Selection Process: A Study Case Applied to the Text Classification in PubMed

Authors: José Párraga-Valle, Rodolfo García-Bermúdez, Fernando Rojas, Christian Torres-Morán, Alfredo Simón-Cuevas

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing

Abstract

This work compares mutual information and Chi-square as metrics for evaluating the relevance of terms extracted from documents related to “software design” retrieved from the PubMed database. The comparison was tested in two contexts: using the set of terms obtained from vectorizing the corpus of abstracts, and using only the terms from the vocabulary defined by the standard ISO/IEC/IEEE 24765. A search was conducted on the subject “software” covering the last 6 years, and the Medical Subject Headings (MeSH) term “software design” assigned to the articles was used to label them. Mutual information and Chi-square were then computed to sort and select features. Chi-square obtained the highest accuracy scores in document classification using a multinomial naive Bayes classifier. Although these results suggest that Chi-square is better than mutual information for estimating feature relevance in the context of this work, further research is needed to put this conclusion on a consistent foundation.
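
As an illustration of the two relevance metrics named in the abstract, the following minimal sketch scores terms against a binary label using the textbook 2x2 contingency-table formulations of Chi-square and mutual information. It is not the authors' implementation, which the abstract does not describe. The toy corpus, the labels, and the helper names (`contingency`, `chi_square`, `mutual_information`, `rank_terms`) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def contingency(docs, labels, term):
    """Count documents by (term present?, label positive?).

    Returns (n11, n10, n01, n00): first index = term present (1) or absent (0),
    second index = document labeled positive (1) or negative (0).
    """
    counts = Counter((term in doc, label) for doc, label in zip(docs, labels))
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term/class table (0.0 if a margin is empty)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between term presence and class label."""
    n = n11 + n10 + n01 + n00
    # Each entry: (cell count, term-margin total, class-margin total).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    # Empty cells contribute nothing (limit of p * log p as p -> 0).
    return sum((c / n) * math.log2(n * c / (row * col))
               for c, row, col in cells if c > 0)

def rank_terms(docs, labels, score):
    """Sort the vocabulary by descending relevance under the given metric."""
    vocabulary = sorted(set().union(*docs))
    return sorted(vocabulary,
                  key=lambda t: score(*contingency(docs, labels, t)),
                  reverse=True)

# Hypothetical toy corpus: each document is a set of tokens;
# True means the document carries the "software design" label.
docs = [
    {"software", "design", "architecture"},
    {"software", "design", "pattern"},
    {"software", "genome", "alignment"},
    {"software", "protein", "alignment"},
]
labels = [True, True, False, False]

chi_ranking = rank_terms(docs, labels, chi_square)
mi_ranking = rank_terms(docs, labels, mutual_information)
```

In this toy setting a term that appears in every document, such as "software", carries no information about the label and ends up last under both metrics, while a term that occurs only in labeled documents scores highest. Selecting the top-ranked terms from either list would then give the feature set passed to a classifier.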

Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45385-5_57
