Abstract
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates.
This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces “model trees”—decision trees with linear regression functions at the leaves. Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes, applied to regression problems by discretizing the target value, performs similarly poorly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
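The core idea described above—estimating the posterior p(y | x) ∝ p(y) ∏_j p(x_j | y) with kernel density estimators and predicting from that posterior—can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the Gaussian kernel, the fixed bandwidths `h_x` and `h_y`, evaluating the posterior only at the training targets, and predicting the posterior mean are all simplifying assumptions made here for brevity.

```python
import numpy as np

def gauss_kernel(u, h):
    """Gaussian kernel with bandwidth h (illustrative choice)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

def nb_regress(X_train, y_train, x_test, h_x=0.5, h_y=0.5):
    """Naive Bayes for regression, sketched with kernel density estimates.

    The posterior p(y | x) ∝ p(y) * prod_j p(x_j | y) is evaluated at the
    observed training targets, and the prediction is its mean. Both p(y)
    and each conditional p(x_j | y) come from Gaussian kernel smoothing
    over the training data.
    """
    grid = y_train  # candidate target values: the training targets themselves
    # p(y): kernel density estimate over the training targets, at each grid point
    w_y = gauss_kernel(grid[:, None] - y_train[None, :], h_y)  # shape (grid, n)
    p_y = w_y.mean(axis=1)
    post = p_y.copy()
    for j in range(X_train.shape[1]):
        # p(x_j | y): weight each training case by its kernel proximity in y,
        # then smooth over attribute j at the query value x_test[j]
        k_x = gauss_kernel(x_test[j] - X_train[:, j], h_x)     # shape (n,)
        p_xj_given_y = (w_y * k_x[None, :]).sum(axis=1) / w_y.sum(axis=1)
        post *= p_xj_given_y
    post /= post.sum()                 # normalise over the grid
    return float((post * grid).sum())  # posterior mean as the prediction
```

For example, on the toy data `X_train = [[0], [1], [2], [3]]` with targets `y = [0, 1, 2, 3]`, querying `x_test = [1.5]` yields a prediction of 1.5 by symmetry. Note that predicting the posterior mean is one choice of loss; the paper's point is that however the posterior is summarised, numeric prediction is more sensitive to distortions of p(y | x)—such as those induced by the independence assumption—than classification, which only needs the correct class to receive maximum probability.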
Cite this article
Frank, E., Trigg, L., Holmes, G. et al. Technical Note: Naive Bayes for Regression. Machine Learning 41, 5–25 (2000). https://doi.org/10.1023/A:1007670802811