Skip to main content

2015 | OriginalPaper | Buchkapitel

Incremental Weighted Naive Bays Classifiers for Data Stream

verfasst von : Christophe Salperwyck, Vincent Lemaire, Carine Hue

Erschienen in: Data Science, Learning by Latent Structures, and Knowledge Discovery

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with naive independence assumption. The explanatory variables (X i ) are assumed to be independent from the target variable (Y ). Despite this strong assumption this classifier has proved to be very effective on many real applications and is often used on data stream for supervised classification. The naive Bayes classifier simply relies on the estimation of the univariate conditional probabilities P(X i  | C). This estimation can be provided on a data stream using a “supervised quantiles summary.” The literature shows that the naive Bayes classifier can be improved (1) using a variable selection method (2) weighting the explanatory variables. Most of these methods are related to batch (off-line) learning and need to store all the data in memory and/or require reading more than once each example. Therefore they cannot be used on data stream. This paper presents a new method based on a graphical model which computes the weights on the input variables using a stochastic estimation. The method is incremental and produces a Weighted Naive Bayes Classifier for data stream. This method will be compared to classical naive Bayes classifier on the Large Scale Learning challenge datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
Zurück zum Zitat Bishop, C. M. (1995). Neural networks for pattern recognition. New York: Oxford University Press. Bishop, C. M. (1995). Neural networks for pattern recognition. New York: Oxford University Press.
Zurück zum Zitat Boullé, M. (2006a). MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1), 131–165.CrossRef Boullé, M. (2006a). MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1), 131–165.CrossRef
Zurück zum Zitat Boullé, M. (2006b). Regularization and averaging of the selective naive bayes classifier. In The 2006 IEEE International Joint Conference on Neural Network Proceedings (pp. 1680–1688). Boullé, M. (2006b). Regularization and averaging of the selective naive bayes classifier. In The 2006 IEEE International Joint Conference on Neural Network Proceedings (pp. 1680–1688).
Zurück zum Zitat Gama, J. (2010). Knowledge discovery from data streams. Chapman and Hall/CRC Press Gama, J. (2010). Knowledge discovery from data streams. Chapman and Hall/CRC Press
Zurück zum Zitat Gama, J., & Pinto, C. (2006). Discretization from data streams: applications to histograms and data mining. In Proceedings of the 2006 ACM Symposium on Applied Computing (pp. 662–667). Gama, J., & Pinto, C. (2006). Discretization from data streams: applications to histograms and data mining. In Proceedings of the 2006 ACM Symposium on Applied Computing (pp. 662–667).
Zurück zum Zitat Greenwald, M., & Khanna, S. (2001). Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2), 58–66.CrossRef Greenwald, M., & Khanna, S. (2001). Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2), 58–66.CrossRef
Zurück zum Zitat Guigourès, R., & Boullé, M. (2011). Optimisation directe des poids de modèles dans un prédicteur Bayésien naif moyenné. In Extraction et gestion des connaissances EGC’2011 (pp. 77–82). Guigourès, R., & Boullé, M. (2011). Optimisation directe des poids de modèles dans un prédicteur Bayésien naif moyenné. In Extraction et gestion des connaissances EGC’2011 (pp. 77–82).
Zurück zum Zitat Guyon, I., Lemaire, V., Boullé, M., Dror, G., & Vogel, D. (2009). Analysis of the KDD cup 2009: Fast scoring on a large orange customer database. In JMLR: Workshop and Conference Proceedings (Vol. 7, pp. 1–22). Guyon, I., Lemaire, V., Boullé, M., Dror, G., & Vogel, D. (2009). Analysis of the KDD cup 2009: Fast scoring on a large orange customer database. In JMLR: Workshop and Conference Proceedings (Vol. 7, pp. 1–22).
Zurück zum Zitat Koller, D., & Sahami, M. (1996, May). Toward optimal feature selection. In International Conference on Machine Learning (pp. 284–292). Koller, D., & Sahami, M. (1996, May). Toward optimal feature selection. In International Conference on Machine Learning (pp. 284–292).
Zurück zum Zitat Kuncheva, L. I., & Plumpton, C. O. (2008). Adaptive learning rate for online linear discriminant classifiers. In Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (pp. 510–519). Heidelberg: Springer. Kuncheva, L. I., & Plumpton, C. O. (2008). Adaptive learning rate for online linear discriminant classifiers. In Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition (pp. 510–519). Heidelberg: Springer.
Zurück zum Zitat Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of the National Conference on Artificial Intelligence (pp. 223–228). Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. In Proceedings of the National Conference on Artificial Intelligence (pp. 223–228).
Zurück zum Zitat Langley, P., & Sage, S. (1994). Induction of selective Bayesian classifiers. In R. L. Mantaras & D. Poole (Eds.), Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (pp. 399–406). Seattle, WA: Morgan Kaufmann. Langley, P., & Sage, S. (1994). Induction of selective Bayesian classifiers. In R. L. Mantaras & D. Poole (Eds.), Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (pp. 399–406). Seattle, WA: Morgan Kaufmann.
Zurück zum Zitat Lecun, Y., Bottou, L., Orr, G. B., & Müller, K. R. (1998). Efficient BackProp. In G. Orr & K. Müller (Eds.), Neural networks: Tricks of the trade. Lecture notes in computer science (Vol. 1524, pp. 5–50). Heidelberg: Springer. Lecun, Y., Bottou, L., Orr, G. B., & Müller, K. R. (1998). Efficient BackProp. In G. Orr & K. Müller (Eds.), Neural networks: Tricks of the trade. Lecture notes in computer science (Vol. 1524, pp. 5–50). Heidelberg: Springer.
Zurück zum Zitat Salperwyck, C. (2012). Apprentissage incrémental en ligne sur flux de données. PhD thesis, University of Lille. Salperwyck, C. (2012). Apprentissage incrémental en ligne sur flux de données. PhD thesis, University of Lille.
Metadaten
Titel
Incremental Weighted Naive Bays Classifiers for Data Stream
verfasst von
Christophe Salperwyck
Vincent Lemaire
Carine Hue
Copyright-Jahr
2015
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-44983-7_16