
2019 | OriginalPaper | Chapter

Weighting Words Using Bi-Normal Separation for Text Classification Tasks with Multiple Classes

Authors: Jean-Thomas Baillargeon, Luc Lamontagne, Étienne Marceau

Published in: Advances in Artificial Intelligence

Publisher: Springer International Publishing


Abstract

An important use of natural language processing is building vector representations of documents that serve as features in a classification task. The traditional bag-of-words approach represents each word as a one-hot vector and aggregates these vectors into a sparse document vector. This representation can be enhanced by giving more weight to the words that contribute most to the classification task. In this paper, we propose a generalization of the Bi-Normal Separation (BNS) metric that enhances vector representations of documents and outperforms TF-IDF scaling algorithms for one-of-m classification tasks.
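To make the idea of word weighting concrete, the sketch below computes the standard two-class BNS score, the absolute difference between the inverse normal CDF of a word's true-positive rate and of its false-positive rate. The abstract does not describe the authors' multi-class generalization, so the one-vs-rest maximum used here is only an illustrative assumption, not the method proposed in the chapter. The names `bns`, `bns_weights_one_vs_rest`, and `EPS` are hypothetical, and the clipping bound of 0.0005 is a commonly used choice rather than a value taken from this text.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and std 1
EPS = 0.0005                # assumed clipping bound; keeps inv_cdf finite


def bns(tp, fp, pos, neg):
    """Two-class BNS score: |F^-1(tpr) - F^-1(fpr)|."""
    tpr = min(max(tp / pos, EPS), 1 - EPS)
    fpr = min(max(fp / neg, EPS), 1 - EPS)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))


def bns_weights_one_vs_rest(docs, labels):
    """Weight each word by its largest one-vs-rest BNS score.

    docs   -- list of token sets, one per document
    labels -- list of class labels (at least two distinct classes)
    This aggregation is an assumption made for illustration only.
    """
    classes = sorted(set(labels))
    vocab = set().union(*docs)
    weights = {}
    for word in vocab:
        scores = []
        for c in classes:
            pos = sum(1 for y in labels if y == c)
            neg = len(labels) - pos
            tp = sum(1 for d, y in zip(docs, labels) if y == c and word in d)
            fp = sum(1 for d, y in zip(docs, labels) if y != c and word in d)
            scores.append(bns(tp, fp, pos, neg))
        weights[word] = max(scores)
    return weights


docs = [{"the", "goal"}, {"the", "goal"}, {"the", "vote"}, {"the", "stock"}]
labels = ["sport", "sport", "politics", "finance"]
weights = bns_weights_one_vs_rest(docs, labels)
```

In this toy corpus, a word that occurs in every document ("the") receives a weight of zero, while a word confined to one class ("goal") receives a large weight. The resulting weights could then multiply the corresponding bag-of-words counts in place of an IDF factor.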


Metadata
Title
Weighting Words Using Bi-Normal Separation for Text Classification Tasks with Multiple Classes
Authors
Jean-Thomas Baillargeon
Luc Lamontagne
Étienne Marceau
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-18305-9_41
