Abstract
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called POLYCLASS at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. POLYCLASS, for example, is third slowest in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The QUEST and logistic regression algorithms are substantially faster. Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed. But C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
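As a rough illustration (not taken from the paper), the two accuracy criteria can be computed from a datasets-by-algorithms matrix of error rates: the mean error rate averages each column, while the mean rank of error rate first ranks the algorithms within each dataset (Friedman-style, with ties averaged) and then averages those ranks. The following Python sketch uses made-up numbers purely for demonstration.

import numpy as np
from scipy.stats import rankdata

# rows = datasets, columns = algorithms; entries are error rates.
# These values are invented for illustration only.
error_rates = np.array([
    [0.12, 0.15, 0.11],   # dataset 1
    [0.30, 0.28, 0.33],   # dataset 2
    [0.05, 0.07, 0.06],   # dataset 3
])

# Criterion 1: mean error rate of each algorithm across datasets.
mean_error = error_rates.mean(axis=0)

# Criterion 2: within each dataset, rank the algorithms by error rate
# (1 = lowest error; ties receive averaged ranks), then average the
# ranks across datasets, as in a Friedman-style comparison.
ranks = np.apply_along_axis(rankdata, 1, error_rates)
mean_rank = ranks.mean(axis=0)

for j, (e, r) in enumerate(zip(mean_error, mean_rank)):
    print(f"algorithm {j}: mean error = {e:.3f}, mean rank = {r:.2f}")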
References
Agresti, A. (1990). Categorical Data Analysis. New York, NY: John Wiley & Sons.
Aronis, J. M. & Provost, F. J. (1997). Increasing the efficiency of data mining algorithms with breadth-first marker propagation. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 119–122). Menlo Park, CA: AAAI Press.
Auer, P., Holte, R. C., & Maass, W. (1995). Theory and applications of agnostic PAC-learning with small decision trees. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 21–29). San Francisco, CA: Morgan Kaufmann.
Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language. Pacific Grove, CA: Wadsworth.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. New York, NY: Oxford University Press.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1994). Classification and Regression Trees. New York, NY: Chapman and Hall.
Breslow, L. A. & Aha, D. W. (1997). Simplifying decision trees: A survey. Knowledge Engineering Review, 12, 1–40.
Brodley, C. E. & Utgoff, P. E. (1992). Multivariate versus univariate decision trees. Technical Report 92–8, Department of Computer Science, University of Massachusetts, Amherst, MA.
Brodley, C. E. & Utgoff, P. E. (1995). Multivariate decision trees. Machine Learning, 19, 45–77.
Brown, D. E., Corruble, V., & Pittard, C. L. (1993). A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recognition, 26, 953–961.
Bull, S. (1994). Analysis of attitudes toward workplace smoking restrictions. In N. Lange, L. Ryan, L. Billard, D. Brillinger, L. Conquest, & J. Greenhouse (Eds.), Case Studies in Biometry (pp. 249–271). New York, NY: John Wiley & Sons.
Buntine, W. (1992). Learning classification trees. Statistics and Computing, 2, 63–73.
Buntine, W. & Caruana, R. (1992). Introduction to IND Version 2.1 and Recursive Partitioning. NASA Ames Research Center, Moffett Field, CA.
Clark, L. A. & Pregibon, D. (1993). Tree-based models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical Models in S (pp. 377–419). New York, NY: Chapman & Hall.
Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 115–123). San Francisco, CA: Morgan Kaufmann.
Curram, S. P. & Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45, 440–450.
Friedman, J. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Hand, D. J. (1997). Construction and Assessment of Classification Rules. Chichester, England: John Wiley & Sons.
Harrison, D. & Rubinfeld, D. L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
Hastie, T., Buja, A., & Tibshirani, R. (1995). Penalized discriminant analysis. Annals of Statistics, 23, 73–102.
Hastie, T. & Tibshirani, R. (1996). Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58, 155–176.
Hastie, T., Tibshirani, R., & Buja, A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89, 1255–1270.
Hollander, M. & Wolfe, D. A. (1994). Nonparametric Statistical Methods, (2nd ed.). New York, NY: John Wiley & Sons.
Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–90.
Johnson, R. A. & Wichern, D. W. (1992). Applied Multivariate Statistical Analysis, (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Kohonen, T. (1995). Self-Organizing Maps. Heidelberg: Springer-Verlag.
Kooperberg, C., Bose, S., & Stone, C. J. (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117–127.
Lerman, C., Molyneaux, J. W., Pangemanan, S., & Iswarati. (1991). The determinants of contraceptive method and service point choice. In Secondary Analysis of the 1987 National Indonesia Contraceptive Prevalence Survey, Volume 1: Fertility and Family Planning. Honolulu, HI: East-West Population Institute.
Loh, W.-Y. & Shih, Y.-S. (1997). Split selection methods for classification trees. Statistica Sinica, 7, 815–840.
Loh, W.-Y. & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis (with discussion). Journal of the American Statistical Association, 83, 715–728.
Mangasarian, O. L. & Wolberg, W. H. (1990). Cancer diagnosis via linear programming. SIAM News, 23, 1–18.
Merz, C. J. & Murphy, P. M. (1996). UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine, CA (http://www.ics.uci.edu/~mlearn/MLRepository.html).
Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (Eds.). (1994). Machine Learning, Neural and Statistical Classification. London: Ellis Horwood.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference, (2nd ed.). New York, NY: Springer-Verlag.
Müller, W. & Wysotzki, F. (1994). Automatic construction of decision trees for classification. Annals of Operations Research, 52, 231–247.
Müller, W. & Wysotzki, F. (1997). The decision-tree algorithm CAL5 based on a statistical approach to its splitting algorithm. In G. Nakhaeizadeh & C. C. Taylor (Eds.), Machine Learning and Statistics: The Interface (pp. 45–65). New York, NY: John Wiley & Sons.
Murthy, S. K., Kasif, S., & Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1–33.
Neter, J., Wasserman, W., & Kutner, M. H. (1990). Applied Linear Statistical Models, (3rd ed.). Boston, MA: Irwin.
Oates, T. & Jensen, D. (1997). The effects of training set size on decision tree complexity. In D. H. Fisher, Jr. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (pp. 254–262). San Francisco, CA: Morgan Kaufmann.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77–90.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Sarle, W. S. (1994). Neural networks and statistical models. In Proceedings of the Nineteenth Annual SAS Users Groups International Conference, Cary, NC (pp. 1538–1550). SAS Institute, Inc. (ftp://ftp.sas.com/pub/neural/neural1.ps).
SAS Institute, Inc. (1990). SAS/STAT User's Guide, Version 6, (Vols. 1 & 2). Cary, NC: SAS Institute, Inc.
Shavlik, J. W., Mooney, R. J., & Towell, G. G. (1991). Symbolic and neural learning algorithms: An empirical comparison. Machine Learning, 6, 111–144.
Venables, W. N. & Ripley, B. D. (1997). Modern Applied Statistics with S-Plus, (2nd ed.). New York, NY: Springer.
Wolberg, W. H., Tanner, M. A., & Loh, W.-Y. (1988). Diagnostic schemes for fine needle aspirates of breast masses. Analytical and Quantitative Cytology and Histology, 10, 225–228.
Wolberg, W. H., Tanner, M. A., & Loh, W.-Y. (1989). Fine needle aspiration for breast mass diagnosis. Archives of Surgery, 124, 814–818.
Wolberg, W. H., Tanner, M. A., Loh, W.-Y., & Vanichsetakul, N. (1987). Statistical approach to fine needle aspiration diagnosis of breast masses. Acta Cytologica, 31, 737–741.
Cite this article
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228. https://doi.org/10.1023/A:1007608224229