Abstract
Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present two related methods for merging classifiers. The first method, Cascade Generalization, couples classifiers loosely and belongs to the family of stacking algorithms. The basic idea of Cascade Generalization is to apply a set of classifiers sequentially, at each step extending the original data with new attributes. The new attributes are derived from the class probability distribution given by a base classifier. This constructive step extends the representational language of the high-level classifiers, relaxing their bias. The second method exploits tight coupling of classifiers by applying Cascade Generalization locally: at each iteration of a divide-and-conquer algorithm, the instance space is reconstructed by the addition of new attributes, each representing the probability that an example belongs to a class, as given by a base classifier. We have implemented three local generalization algorithms: the first merges a linear discriminant with a decision tree, the second merges naive Bayes with a decision tree, and the third merges both a linear discriminant and naive Bayes with a decision tree. All three algorithms outperform the corresponding single models. Cascade also outperforms other methods for combining classifiers, such as Stacked Generalization, and competes well against Boosting, at statistically significant confidence levels.
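To make the constructive step concrete, here is a minimal sketch of (global) Cascade Generalization, assuming scikit-learn-style estimators; the dataset, the choice of naive Bayes as base and a decision tree as high-level learner, and the `extend` helper are illustrative, not the paper's implementation.

```python
# Minimal sketch of Cascade Generalization (loose coupling),
# assuming scikit-learn-style estimators.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def extend(X, base_model):
    """Append the base classifier's class-probability distribution
    to the original attributes (the constructive step)."""
    return np.hstack([X, base_model.predict_proba(X)])

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 0: fit the base classifier and extend both data sets.
base = GaussianNB().fit(X_tr, y_tr)
X_tr_ext, X_te_ext = extend(X_tr, base), extend(X_te, base)

# Level 1: the high-level classifier learns on the extended space,
# whose new attributes relax its representational bias.
top = DecisionTreeClassifier(random_state=0).fit(X_tr_ext, y_tr)
print("cascade accuracy:", top.score(X_te_ext, y_te))
```

Longer cascades follow the same pattern: each subsequent classifier is trained on the data extended by all of its predecessors.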
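The local variant can be sketched in the same spirit. The hypothetical `grow` routine below fits naive Bayes at every node of a divide-and-conquer tree and extends that node's examples with its class probabilities before choosing a split; the split criterion (median thresholds scored by majority-vote error) is a simplification chosen for brevity, not the criterion used in the paper.

```python
# Hedged sketch of local Cascade Generalization: naive Bayes
# probabilities are injected at every node of the tree.
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB

def grow(X, y, depth=0, max_depth=3, min_size=10):
    # Return a majority-class leaf on small or pure nodes.
    if depth == max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    # Constructive step: extend this node's data with the class
    # probabilities of a naive Bayes fitted to the node's examples.
    nb = GaussianNB().fit(X, y)
    Xe = np.hstack([X, nb.predict_proba(X)])
    # Pick the best median-threshold split on the extended space,
    # scored by the number of examples a majority vote misses.
    best = None
    for j in range(Xe.shape[1]):
        t = np.median(Xe[:, j])
        left = Xe[:, j] <= t
        if left.all() or not left.any():
            continue
        err = lambda m: len(y[m]) - Counter(y[m]).most_common(1)[0][1]
        score = err(left) + err(~left)
        if best is None or score < best[0]:
            best = (score, j, t, left)
    if best is None:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, j, t, left = best
    return {"nb": nb, "attr": j, "thr": t,
            "l": grow(X[left], y[left], depth + 1, max_depth, min_size),
            "r": grow(X[~left], y[~left], depth + 1, max_depth, min_size)}

def predict_one(node, x):
    # Re-extend the example with each node's own naive Bayes output.
    while "leaf" not in node:
        xe = np.hstack([x, node["nb"].predict_proba(x.reshape(1, -1))[0]])
        node = node["l"] if xe[node["attr"]] <= node["thr"] else node["r"]
    return node["leaf"]

# Usage, continuing from the arrays in the previous sketch:
# tree = grow(X_tr, y_tr)
# acc = np.mean(np.array([predict_one(tree, x) for x in X_te]) == y_te)
```

The point of the tight coupling is that each subtree sees probabilities estimated from its own, progressively purer, region of the instance space.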
References
Ali, K. M. & Pazzani, M. J. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24, 173–202.
Bauer, E. & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105–139.
Blake, C., Keogh, E., & Merz, C. (1999). UCI repository of Machine Learning databases. Department of Information and Computer Science, University of California at Irvine, Irvine, CA.
Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26(3), 801–849.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth International Group.
Brodley, C. E. (1995). Recursive automatic bias selection for classifier construction. Machine Learning, 20, 63–94.
Brodley, C. E. & Utgoff, P. E. (1995). Multivariate decision trees. Machine Learning, 19, 45–77.
Buntine, W. (1990). A theory of learning classification rules. Ph.D. Thesis, University of Technology, Sydney.
Chan, P. & Stolfo, S. (1995a). A comparative evaluation of voting and meta-learning on partitioned data. In A. Prieditis & S. Russell (Eds.), Machine Learning, Proc. of 12th International Conference. Morgan Kaufmann.
Chan, P. & Stolfo, S. (1995b). Learning arbiter and combiner trees from partitioned data for scaling machine learning. In U. M. Fayyad & R. Uthurusamy (Eds.), Proc. of the First Intern. Conference on Knowledge Discovery and Data Mining. AAAI Press.
Dillon, W. & Goldstein, M. (1984). Multivariate analysis, methods and applications. J. Wiley and Sons, Inc.
Domingos, P. & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–129.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In A. Prieditis & S. Russell (Eds.), Machine Learning, Proc. of 12th International Conference. Morgan Kaufmann.
Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (Eds.), Advances in neural information processing systems (Vol. 3, pp. 190–196). Morgan Kaufmann Publishers, Inc.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In L. Saitta (Ed.), Machine Learning, Proc. of the 13th International Conference. Morgan Kaufmann.
Gama, J. (1998). Combining classifiers with constructive induction. In C. Nedellec & C. Rouveirol (Eds.), Proc. of European Conf. on Machine Learning ECML-98. LNAI 1398, Springer Verlag.
Gama, J. & Brazdil, P. (1999). Linear tree. Intelligent Data Analysis, 3(1), 1–22.
Henery, B. (1997). Combining classification procedures. In G. Nakhaeizadeh & C. Taylor (Eds.), Machine learning and statistics: The interface. John Wiley & Sons, Inc.
Kohavi, R. & Wolpert, D. H. (1996). Bias plus variance decomposition for zero-one loss functions. In L. Saitta (Ed.), Machine Learning, Proceedings of the 13th International Conference. Morgan Kaufmann.
Langley, P. (1993). Induction of recursive Bayesian classifiers. In P. Brazdil (Ed.), Proc. of European Conf. on Machine Learning: ECML-93. LNAI 667, Springer Verlag.
Langley, P. (1996). Elements of machine learning. Morgan Kaufmann.
Michie, D., Spiegelhalter, D., & Taylor, C. (1994). Machine learning, neural and statistical classification. Ellis Horwood.
Mitchell, T. (1997). Machine learning. McGraw-Hill Companies, Inc.
Murthy, S., Kasif, S., & Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1–32.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann Publishers, Inc.
Quinlan, R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers, Inc.
Quinlan, R. (1996). Bagging, boosting, and C4.5. In Proc. of the 13th National Conference on Artificial Intelligence. AAAI Press.
Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2, 229–246.
Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135–143.
Skalak, D. (1997). Prototype selection for composite nearest neighbor classifiers. Ph.D. Thesis, University of Massachusetts Amherst.
Ting, K. & Witten, I. (1997). Stacked generalization: When does it work? In Proc. International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
Tumer, K. & Ghosh, J. (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, Special Issue on Combining Artificial Neural Networks: Ensemble Approaches, 8(3–4), 385–404.
Wolpert, D. (1992). Stacked generalization. Neural Networks, 5, 241–259.
Cite this article
Gama, J., Brazdil, P. Cascade Generalization. Machine Learning 41, 315–343 (2000). https://doi.org/10.1023/A:1007652114878