Knowledge and Information Systems

1 July 2015 | Regular Paper

Adapting naive Bayes tree for text classification

Authors: Shasha Wang, Liangxiao Jiang, Chaoqun Li

Published in: Knowledge and Information Systems | Issue 1/2015

Abstract

Naive Bayes (NB) is one of the top 10 algorithms in data mining thanks to its simplicity, efficiency, and interpretability. Naive Bayes tree (NBTree) was proposed to weaken NB's attribute independence assumption. NBTree is a hybrid algorithm that deploys a naive Bayes classifier on each leaf node of the built decision tree, and it has demonstrated remarkable classification performance. For text classification tasks, multinomial naive Bayes (MNB) has become a dominant modeling approach, succeeding the multivariate Bernoulli model. Inspired by the success of NBTree, we propose a new algorithm called multinomial naive Bayes tree (MNBTree), which deploys a multinomial naive Bayes text classifier on each leaf node of the built decision tree. Unlike NBTree, MNBTree builds a binary tree in which the values of each split attribute are divided into just two branches: zero and nonzero. In addition, MNBTree uses the information gain measure instead of the classification accuracy measure to build the tree, which reduces the time needed to build it. To further improve the classification performance of MNBTree, we propose a multiclass learning version called multiclass multinomial naive Bayes tree (MMNBTree), which applies the multiclass technique to MNBTree. Experimental results on a large number of widely used text classification benchmark datasets validate the effectiveness of the proposed algorithms, MNBTree and MMNBTree.
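The abstract describes MNBTree in three parts: an MNB text classifier at each leaf, a binary zero/nonzero split on the chosen term, and information gain to choose that term. The Python sketch below wires these parts together under stated assumptions; it is an illustration, not the authors' implementation. Documents are assumed to be sparse dicts mapping a term id to its count; the Laplace smoothing parameter `alpha`, the stopping rules `max_depth` and `min_leaf`, and all function names are hypothetical choices, since the abstract does not specify them. The multiclass variant MMNBTree is not implemented here.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, vocab_size, alpha=1.0):
    """Fit multinomial naive Bayes with add-alpha (Laplace) smoothing.

    docs: list of sparse dicts {term_id: count}; labels: class label per doc.
    """
    class_counts = Counter(labels)
    term_totals = defaultdict(Counter)  # class -> summed term frequencies
    for doc, y in zip(docs, labels):
        term_totals[y].update(doc)
    n, q = len(labels), len(class_counts)
    model = {}
    for c, nc in class_counts.items():
        denom = sum(term_totals[c].values()) + alpha * vocab_size
        model[c] = (
            math.log((nc + alpha) / (n + alpha * q)),  # smoothed log prior
            {t: math.log((k + alpha) / denom) for t, k in term_totals[c].items()},
            math.log(alpha / denom),  # log probability of a term unseen in class c
        )
    return model


def predict_mnb(model, doc):
    """Return the class maximising log P(c) + sum_t f_t * log P(t | c)."""
    def score(c):
        log_prior, log_term, log_unseen = model[c]
        return log_prior + sum(f * log_term.get(t, log_unseen) for t, f in doc.items())
    return max(model, key=score)


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def info_gain(docs, labels, t):
    """Information gain of splitting on 'count of term t is zero vs. nonzero'."""
    nonzero = [y for d, y in zip(docs, labels) if d.get(t, 0) > 0]
    zero = [y for d, y in zip(docs, labels) if d.get(t, 0) == 0]
    n = len(labels)
    return (entropy(labels)
            - len(nonzero) / n * entropy(nonzero)
            - len(zero) / n * entropy(zero))


def build_tree(docs, labels, vocab_size, depth=0, max_depth=3, min_leaf=2):
    """Grow a binary zero/nonzero tree with an MNB model at every leaf.

    max_depth and min_leaf are hypothetical stopping rules for this sketch.
    """
    def leaf():
        return ("leaf", train_mnb(docs, labels, vocab_size))

    if len(set(labels)) == 1 or depth >= max_depth:
        return leaf()
    terms = sorted({t for d in docs for t in d})  # sorted: deterministic tie-breaking
    gains = {t: info_gain(docs, labels, t) for t in terms}
    best = max(terms, key=gains.get, default=None)
    if best is None or gains[best] <= 0:
        return leaf()
    nz = [i for i, d in enumerate(docs) if d.get(best, 0) > 0]
    z = [i for i, d in enumerate(docs) if d.get(best, 0) == 0]
    if len(nz) < min_leaf or len(z) < min_leaf:
        return leaf()

    def grow(idx):
        return build_tree([docs[i] for i in idx], [labels[i] for i in idx],
                          vocab_size, depth + 1, max_depth, min_leaf)

    return ("split", best, grow(nz), grow(z))


def classify(node, doc):
    """Route doc to a leaf by its zero/nonzero term counts, then apply that leaf's MNB."""
    while node[0] == "split":
        _, t, nonzero_branch, zero_branch = node
        node = nonzero_branch if doc.get(t, 0) > 0 else zero_branch
    return predict_mnb(node[1], doc)


# Toy usage: term ids 0-1 appear only in "sport" docs, 2-3 only in "politics" docs.
docs = [{0: 2, 1: 1}, {0: 1, 1: 3}, {1: 2}, {2: 2, 3: 1}, {2: 1, 3: 2}, {3: 3}]
labels = ["sport"] * 3 + ["politics"] * 3
tree = build_tree(docs, labels, vocab_size=4)
print(tree[1], classify(tree, {1: 1}), classify(tree, {3: 2}))  # → 1 sport politics
```

Scoring a split by information gain needs only the class counts on each side of the zero/nonzero partition, whereas an accuracy-based criterion has to train and evaluate naive Bayes models for every candidate split; this is presumably where the time saving described in the abstract comes from.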

Title: Adapting naive Bayes tree for text classification
Authors: Shasha Wang, Liangxiao Jiang, Chaoqun Li
Publication date: 1 July 2015
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 1/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-014-0746-y
