Abstract
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called POLYCLASS at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. POLYCLASS, for example, is third slowest in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The QUEST and logistic regression algorithms are substantially faster. Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed. But C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.
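As a rough illustration (not taken from the paper), the two accuracy criteria can be computed from a datasets-by-algorithms matrix of error rates: the mean error rate averages each column, while the mean rank of error rate first ranks the algorithms within each dataset (Friedman-style, with ties averaged) and then averages those ranks. The following Python sketch uses made-up numbers purely for demonstration.

import numpy as np
from scipy.stats import rankdata

# rows = datasets, columns = algorithms; entries are error rates.
# These values are invented for illustration only.
error_rates = np.array([
    [0.12, 0.15, 0.11],   # dataset 1
    [0.30, 0.28, 0.33],   # dataset 2
    [0.05, 0.07, 0.06],   # dataset 3
])

# Criterion 1: mean error rate of each algorithm across datasets.
mean_error = error_rates.mean(axis=0)

# Criterion 2: within each dataset, rank the algorithms by error rate
# (1 = lowest error; ties receive averaged ranks), then average the
# ranks across datasets, as in a Friedman-style comparison.
ranks = np.apply_along_axis(rankdata, 1, error_rates)
mean_rank = ranks.mean(axis=0)

for j, (e, r) in enumerate(zip(mean_error, mean_rank)):
    print(f"algorithm {j}: mean error = {e:.3f}, mean rank = {r:.2f}")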
References
Agresti, A. (1990). Categorical Data Analysis. New York, NY: John Wiley & Sons.
Aronis, J. M. & Provost, F. J. (1997). Increasing the efficiency of data mining algorithms with breadth-first marker propagation. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 119–122). Menlo Park, CA: AAAI Press.
Auer, P., Holte, R. C., & Maass, W. (1995). Theory and applications of agnostic PAC-learning with small decision trees. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 21–29). San Francisco, CA: Morgan Kaufmann.
Becker, R. A., Chambers, J. M., & Wilks, A. R. (1988). The New S Language. Pacific Grove, CA: Wadsworth.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. New York, NY: Oxford University Press.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1994). Classification and Regression Trees. New York, NY: Chapman and Hall.
Breslow, L. A. & Aha, D. W. (1997). Simplifying decision trees: A survey. Knowledge Engineering Review, 12, 1–40.
Brodley, C. E. & Utgoff, P. E. (1992). Multivariate versus univariate decision trees. Technical Report 92–8, Department of Computer Science, University of Massachusetts, Amherst, MA.
Brodley, C. E. & Utgoff, P. E. (1995). Multivariate decision trees. Machine Learning, 19, 45–77.
Brown, D. E., Corruble, V., & Pittard, C. L. (1993). A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recognition, 26, 953–961.
Bull, S. (1994). Analysis of attitudes toward workplace smoking restrictions. In N. Lange, L. Ryan, L. Billard, D. Brillinger, L. Conquest, & J. Greenhouse (Eds.), Case Studies in Biometry (pp. 249–271). New York, NY: John Wiley & Sons.
Buntine, W. (1992). Learning classification trees. Statistics and Computing, 2, 63–73.
Buntine, W. & Caruana, R. (1992). Introduction to IND Version 2.1 and Recursive Partitioning. NASA Ames Research Center, Moffett Field, CA.
Clark, L. A. & Pregibon, D. (1993). Tree-based models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical Models in S (pp. 377–419). New York, NY: Chapman & Hall.
Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 115–123). San Francisco, CA: Morgan Kaufmann.
Curram, S. P. & Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45, 440–450.
Friedman, J. (1991). Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19, 1–141.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32, 675–701.
Hand, D. J. (1997). Construction and Assessment of Classification Rules. Chichester, England: John Wiley & Sons.
Harrison, D. & Rubinfeld, D. L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
Hastie, T., Buja, A., & Tibshirani, R. (1995). Penalized discriminant analysis. Annals of Statistics, 23, 73–102.
Hastie, T. & Tibshirani, R. (1996). Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58, 155–176.
Hastie, T., Tibshirani, R., & Buja, A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89, 1255–1270.
Hollander, M. & Wolfe, D. A. (1994). Nonparametric Statistical Methods, (2nd ed.). New York, NY: John Wiley & Sons.
Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–90.
Johnson, R. A. & Wichern, D. W. (1992). Applied Multivariate Statistical Analysis, (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Kohonen, T. (1995). Self-Organizing Maps. Heidelberg: Springer-Verlag.
Kooperberg, C., Bose, S., & Stone, C. J. (1997). Polychotomous regression. Journal of the American Statistical Association, 92, 117–127.
Lerman, C., Molyneaux, J. W., Pangemanan, S., & Iswarati. (1991). The determinants of contraceptive method and service point choice. In Secondary Analysis of the 1987 National Indonesia Contraceptive Prevalence Survey, Volume 1: Fertility and Family Planning. Honolulu, HI: East-West Population Institute.
Loh, W.-Y. & Shih, Y.-S. (1997). Split selection methods for classification trees. Statistica Sinica, 7, 815–840.
Loh, W.-Y. & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis (with discussion). Journal of the American Statistical Association, 83, 715–728.
Mangasarian, O. L. & Wolberg, W. H. (1990). Cancer diagnosis via linear programming. SIAM News, 23, 1–18.
Merz, C. J. & Murphy, P. M. (1996). UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, Irvine, CA (http://www.ics.uci.edu/~mlearn/MLRepository.html).
Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (Eds.). (1994). Machine Learning, Neural and Statistical Classification. London: Ellis Horwood.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference, (2nd ed.). New York, NY: Springer-Verlag.
Müller, W. & Wysotzki, F. (1994). Automatic construction of decision trees for classification. Annals of Operations Research, 52, 231–247.
Müller, W. & Wysotzki, F. (1997). The decision-tree algorithm CAL5 based on a statistical approach to its splitting algorithm. In G. Nakhaeizadeh & C. C. Taylor (Eds.), Machine Learning and Statistics: The Interface (pp. 45–65). New York, NY: John Wiley & Sons.
Murthy, S. K., Kasif, S., & Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1–33.
Neter, J., Wasserman, W., & Kutner, M. H. (1990). Applied Linear Statistical Models, (3rd ed.). Boston, MA: Irwin.
Oates, T. & Jensen, D. (1997). The effects of training set size on decision tree complexity. In D. H. Fisher, Jr. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (pp. 254–262). San Francisco, CA: Morgan Kaufmann.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77–90.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Sarle, W. S. (1994). Neural networks and statistical models. In Proceedings of the Nineteenth Annual SAS Users Groups International Conference, Cary, NC (pp. 1538–1550). SAS Institute, Inc. (ftp://ftp.sas.com/pub/neural/neural1.ps).
SAS Institute, Inc. (1990). SAS/STAT User's Guide, Version 6, (Vols. 1 & 2). Cary, NC: SAS Institute, Inc.
Shavlik, J. W., Mooney, R. J., & Towell, G. G. (1991). Symbolic and neural learning algorithms: An empirical comparison. Machine Learning, 6, 111–144.
Venables, W. N. & Ripley, B. D. (1997). Modern Applied Statistics with S-Plus, (2nd ed.). New York, NY: Springer.
Wolberg, W. H., Tanner, M. A., & Loh, W.-Y. (1988). Diagnostic schemes for fine needle aspirates of breast masses. Analytical and Quantitative Cytology and Histology, 10, 225–228.
Wolberg, W. H., Tanner, M. A., & Loh, W.-Y. (1989). Fine needle aspiration for breast mass diagnosis. Archives of Surgery, 124, 814–818.
Wolberg, W. H., Tanner, M. A., Loh, W.-Y., & Vanichsetakul, N. (1987). Statistical approach to fine needle aspiration diagnosis of breast masses. Acta Cytologica, 31, 737–741.
Cite this article
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228. https://doi.org/10.1023/A:1007608224229