Published in: Knowledge and Information Systems, Issue 1/2015

01.07.2015 | Regular Paper

Adapting naive Bayes tree for text classification

Authors: Shasha Wang, Liangxiao Jiang, Chaoqun Li

Abstract

Naive Bayes (NB) is one of the top 10 algorithms in data mining thanks to its simplicity, efficiency, and interpretability. To weaken its attribute independence assumption, the naive Bayes tree (NBTree) was proposed. NBTree is a hybrid algorithm that deploys a naive Bayes classifier on each leaf node of the built decision tree, and it has demonstrated remarkable classification performance. When it comes to text classification tasks, multinomial naive Bayes (MNB) has become a dominant modeling approach, following the multi-variate Bernoulli model. Inspired by the success of NBTree, we propose a new algorithm called multinomial naive Bayes tree (MNBTree), which deploys a multinomial naive Bayes text classifier on each leaf node of the built decision tree. Unlike NBTree, MNBTree builds a binary tree in which the values of each split attribute are simply divided into zero and nonzero. In addition, MNBTree uses the information gain measure instead of the classification accuracy measure to build the tree, which reduces time consumption. To further improve the classification performance of MNBTree, we propose a multiclass learning version, called multiclass multinomial naive Bayes tree (MMNBTree), which applies the multiclass technique to MNBTree. Experimental results on a large number of widely used text classification benchmark datasets validate the effectiveness of the two proposed algorithms, MNBTree and MMNBTree.
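MNBTree places a multinomial naive Bayes text classifier on each leaf. The following is a minimal sketch of such a leaf model. It is not the authors' code and not scikit-learn's API. It assumes Laplace smoothing (`alpha = 1`) and documents stored as sparse dicts that map a word index to its count. The class name `SimpleMNB` and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict


class SimpleMNB:
    """Hypothetical multinomial naive Bayes over sparse word-count dicts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant, assumed 1.0

    def fit(self, docs, labels, vocab_size):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior, self.log_cond = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # P(c): Laplace-smoothed fraction of training documents in class c
            self.log_prior[c] = math.log((len(class_docs) + 1) / (n + len(self.classes)))
            # Sum the frequency of every word over all documents of class c
            word_counts = defaultdict(float)
            for d in class_docs:
                for w, f in d.items():
                    word_counts[w] += f
            denom = sum(word_counts.values()) + self.alpha * vocab_size
            # P(w | c) = (count of w in class c + alpha) / (total words in c + alpha * |V|)
            self.log_cond[c] = [math.log((word_counts[w] + self.alpha) / denom)
                                for w in range(vocab_size)]
        return self

    def predict(self, doc):
        # argmax over c of: log P(c) + sum over w of f_w * log P(w | c)
        def score(c):
            return self.log_prior[c] + sum(f * self.log_cond[c][w] for w, f in doc.items())
        return max(self.classes, key=score)


# Toy corpus (hypothetical): vocabulary of 4 word indices, two classes.
docs = [{0: 3, 1: 1}, {0: 2}, {2: 2, 3: 1}, {3: 3}]
labels = ["sport", "sport", "tech", "tech"]
clf = SimpleMNB().fit(docs, labels, vocab_size=4)
print(clf.predict({0: 1}))  # sport
print(clf.predict({3: 2}))  # tech
```

Word frequencies multiply the log conditional probabilities. This is what separates the multinomial event model from the multi-variate Bernoulli model, which only records whether each word is present or absent.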

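MNBTree splits each attribute only into zero and nonzero values, and it chooses splits by information gain rather than by classification accuracy. The sketch below scores each candidate word by the information gain of that zero/nonzero partition. It assumes the same sparse dict-of-counts documents and base-2 entropy. The function names are hypothetical. The paper's stopping criteria and tie-breaking rules are not described here, so this block implies none.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def zero_nonzero_gain(docs, labels, word):
    """Information gain of splitting documents on whether `word` occurs (count > 0)."""
    nonzero = [y for d, y in zip(docs, labels) if d.get(word, 0) > 0]
    zero = [y for d, y in zip(docs, labels) if d.get(word, 0) == 0]
    n = len(labels)
    remainder = (len(zero) / n) * entropy(zero) + (len(nonzero) / n) * entropy(nonzero)
    return entropy(labels) - remainder


def best_split(docs, labels, vocab_size):
    """Return (word, gain) for the word whose zero/nonzero split has the highest gain."""
    # max() keeps the first word among ties, an arbitrary choice for this sketch
    return max(((w, zero_nonzero_gain(docs, labels, w)) for w in range(vocab_size)),
               key=lambda pair: pair[1])


# Toy corpus (hypothetical): word 0 appears only in "sport" documents.
docs = [{0: 3, 1: 1}, {0: 2}, {2: 2, 3: 1}, {3: 3}]
labels = ["sport", "sport", "tech", "tech"]
print(best_split(docs, labels, vocab_size=4))  # (0, 1.0)
```

This gain needs only the class counts on each side of the split. An accuracy-based criterion, as in NBTree, typically requires training and evaluating naive Bayes models for every candidate split, which is the time saving the abstract refers to. In a full tree, the selected word would partition the documents and the procedure would repeat on each branch. Once a stopping rule applies, each leaf would get its own multinomial naive Bayes model.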
Metadata
Title
Adapting naive Bayes tree for text classification
Authors
Shasha Wang
Liangxiao Jiang
Chaoqun Li
Publication date
01.07.2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0746-y
