Abstract
Quantitative attributes are usually discretized in Naive-Bayes learning. We establish simple conditions under which discretization is equivalent to use of the true probability density function during naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we supply insights into managing discretization bias and variance by adjusting the number of intervals and the number of training instances contained in each interval. We accordingly propose proportional discretization and fixed frequency discretization, two efficient unsupervised discretization methods that are able to effectively manage discretization bias and variance. We evaluate our new techniques against four key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical analyses by showing that with statistically significant frequency, naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by current established discretization methods.
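As a rough illustration of the two proposed methods (a sketch, not the authors' implementation), both can be viewed as equal-frequency cutting with different interval counts: proportional discretization lets both the number of intervals and the instances per interval grow as roughly the square root of the training size, while fixed frequency discretization holds the per-interval frequency constant. The √n sizing and the default frequency `m = 30` are illustrative assumptions here.

```python
import math

def equal_frequency_cut_points(values, k):
    """Return k-1 cut points splitting sorted values into k
    (approximately) equal-frequency intervals."""
    s = sorted(values)
    n = len(s)
    return [s[i * n // k] for i in range(1, k)]

def proportional_discretization(values):
    """Proportional discretization (sketch): set both the number of
    intervals and the expected instances per interval to about sqrt(n),
    so both grow with the amount of training data."""
    n = len(values)
    k = max(1, round(math.sqrt(n)))  # number of intervals
    return equal_frequency_cut_points(values, k)

def fixed_frequency_discretization(values, m=30):
    """Fixed frequency discretization (sketch): keep roughly m training
    instances per interval, so the number of intervals grows with n
    while per-interval frequency stays fixed (m=30 is illustrative)."""
    n = len(values)
    k = max(1, n // m)  # number of intervals
    return equal_frequency_cut_points(values, k)
```

On this reading, the two methods trade off the same quantities in opposite directions: more intervals lower discretization bias, while more instances per interval lower discretization variance.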
Editor: Dan Roth.
Cite this article
Yang, Y., Webb, G.I. Discretization for naive-Bayes learning: managing discretization bias and variance. Mach Learn 74, 39–74 (2009). https://doi.org/10.1007/s10994-008-5083-5