Published in: The Journal of Supercomputing 9/2019

30 April 2019

Application of improved distributed naive Bayesian algorithms in text classification

Authors: Hongyi Gao, Xi Zeng, Chunhua Yao

Abstract

The naive Bayes classifier is a widely used text classification method grounded in statistical theory. Because of the nature of text, related feature terms may jointly produce new semantic information, and this information can be lost when text is represented with the traditional vector space model. This paper studies the construction and improvement of a distributed naive Bayes automatic classification system, with a particular focus on applying Hadoop cloud computing to web page classification. First, the text classification system and the Bayesian classification model are analyzed and discussed, including the representation and extraction of text information, text classification methods in general, and Bayesian text classification methods in particular. Then, to address the shortcomings of the naive Bayesian text classification method, the training stage uses the mutual information method to check the correlation between the features produced by feature selection and combines the features that are highly correlated. Experimental data from a series of tests show that the improved text classification system can achieve better classification results.
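
The abstract does not spell out how correlated features are combined, so the following is only a minimal single-machine sketch of the general idea, not the authors' implementation. It assumes that a pair of terms is "highly correlated" when the mutual information between their presence indicators reaches a hypothetical threshold and they co-occur more often than chance, and that "combining" means replacing the pair with one joint token before training a Laplace-smoothed multinomial naive Bayes model. The function names, the threshold value, and the toy corpus are all illustrative assumptions; the Hadoop distribution of the work is not shown.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def mutual_information(doc_sets, a, b):
    """Mutual information (bits) between the presence of terms a and b."""
    n = len(doc_sets)
    joint = Counter((a in d, b in d) for d in doc_sets)
    p_a = {v: sum(c for (x, _), c in joint.items() if x == v) / n for v in (True, False)}
    p_b = {v: sum(c for (_, y), c in joint.items() if y == v) / n for v in (True, False)}
    # Only observed (presence, presence) combinations appear in joint, so no log(0).
    return sum((c / n) * math.log2((c / n) / (p_a[x] * p_b[y]))
               for (x, y), c in joint.items())


def correlated_pairs(docs, threshold):
    """Term pairs that co-occur more than chance and have MI >= threshold."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    n = len(doc_sets)
    pairs = []
    for a, b in combinations(vocab, 2):
        p_a = sum(1 for d in doc_sets if a in d) / n
        p_b = sum(1 for d in doc_sets if b in d) / n
        p_ab = sum(1 for d in doc_sets if a in d and b in d) / n
        if p_ab > p_a * p_b and mutual_information(doc_sets, a, b) >= threshold:
            pairs.append((a, b))
    return pairs


def combine_features(doc, pairs):
    """Replace each correlated pair present in doc with one joint token (assumed rule)."""
    tokens = list(doc)
    for a, b in pairs:
        if a in tokens and b in tokens:
            tokens = [t for t in tokens if t not in (a, b)] + [a + "_" + b]
    return tokens


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for counts in term_counts.values() for t in counts}
    return class_docs, term_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)
        for t in doc:
            if t in vocab:  # terms never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus (hypothetical data, for illustration only).
train_docs = [
    ["machine", "learning", "model"],
    ["machine", "learning", "data"],
    ["stock", "market", "price"],
    ["stock", "market", "trade"],
]
train_labels = ["tech", "tech", "finance", "finance"]

pairs = correlated_pairs(train_docs, threshold=0.9)  # threshold is an assumption
model = train_nb([combine_features(d, pairs) for d in train_docs], train_labels)
prediction = predict_nb(model, combine_features(["machine", "learning", "price"], pairs))
```

In this sketch the merged token acts as a single feature, so the classifier no longer treats the two strongly dependent terms as independent evidence. In a MapReduce setting, the counting steps in `correlated_pairs` and `train_nb` are the parts that would naturally be split across mappers and summed in reducers.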

Metadata
Title
Application of improved distributed naive Bayesian algorithms in text classification
Authors
Hongyi Gao
Xi Zeng
Chunhua Yao
Publication date
30 April 2019
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 9/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-019-02862-1
