Abstract
Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present two related methods for merging classifiers. The first method, Cascade Generalization, couples classifiers loosely and belongs to the family of stacking algorithms. The basic idea of Cascade Generalization is to apply a set of classifiers sequentially, at each step extending the original data with new attributes. The new attributes are derived from the class probability distribution given by a base classifier. This constructive step extends the representational language of the high-level classifiers, relaxing their bias. The second method exploits tight coupling of classifiers by applying Cascade Generalization locally: at each iteration of a divide-and-conquer algorithm, the instance space is reconstructed by the addition of new attributes, each representing the probability that an example belongs to a class, as given by a base classifier. We have implemented three local generalization algorithms: the first merges a linear discriminant with a decision tree, the second merges naive Bayes with a decision tree, and the third merges both a linear discriminant and naive Bayes with a decision tree. All three algorithms outperform the corresponding single models. Cascade also outperforms other methods for combining classifiers, such as Stacked Generalization, and competes well against Boosting, at statistically significant confidence levels.
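To make the constructive step concrete, here is a minimal sketch of (global) Cascade Generalization, assuming scikit-learn-style estimators; the dataset, the choice of naive Bayes as base and a decision tree as high-level learner, and the `extend` helper are illustrative, not the paper's implementation.

```python
# Minimal sketch of Cascade Generalization (loose coupling),
# assuming scikit-learn-style estimators.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def extend(X, base_model):
    """Append the base classifier's class-probability distribution
    to the original attributes (the constructive step)."""
    return np.hstack([X, base_model.predict_proba(X)])

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 0: fit the base classifier and extend both data sets.
base = GaussianNB().fit(X_tr, y_tr)
X_tr_ext, X_te_ext = extend(X_tr, base), extend(X_te, base)

# Level 1: the high-level classifier learns on the extended space,
# whose new attributes relax its representational bias.
top = DecisionTreeClassifier(random_state=0).fit(X_tr_ext, y_tr)
print("cascade accuracy:", top.score(X_te_ext, y_te))
```

Longer cascades follow the same pattern: each subsequent classifier is trained on the data extended by all of its predecessors.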
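The local variant can be sketched in the same spirit. The hypothetical `grow` routine below fits naive Bayes at every node of a divide-and-conquer tree and extends that node's examples with its class probabilities before choosing a split; the split criterion (median thresholds scored by majority-vote error) is a simplification chosen for brevity, not the criterion used in the paper.

```python
# Hedged sketch of local Cascade Generalization: naive Bayes
# probabilities are injected at every node of the tree.
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB

def grow(X, y, depth=0, max_depth=3, min_size=10):
    # Return a majority-class leaf on small or pure nodes.
    if depth == max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    # Constructive step: extend this node's data with the class
    # probabilities of a naive Bayes fitted to the node's examples.
    nb = GaussianNB().fit(X, y)
    Xe = np.hstack([X, nb.predict_proba(X)])
    # Pick the best median-threshold split on the extended space,
    # scored by the number of examples a majority vote misses.
    best = None
    for j in range(Xe.shape[1]):
        t = np.median(Xe[:, j])
        left = Xe[:, j] <= t
        if left.all() or not left.any():
            continue
        err = lambda m: len(y[m]) - Counter(y[m]).most_common(1)[0][1]
        score = err(left) + err(~left)
        if best is None or score < best[0]:
            best = (score, j, t, left)
    if best is None:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    _, j, t, left = best
    return {"nb": nb, "attr": j, "thr": t,
            "l": grow(X[left], y[left], depth + 1, max_depth, min_size),
            "r": grow(X[~left], y[~left], depth + 1, max_depth, min_size)}

def predict_one(node, x):
    # Re-extend the example with each node's own naive Bayes output.
    while "leaf" not in node:
        xe = np.hstack([x, node["nb"].predict_proba(x.reshape(1, -1))[0]])
        node = node["l"] if xe[node["attr"]] <= node["thr"] else node["r"]
    return node["leaf"]

# Usage, continuing from the arrays in the previous sketch:
# tree = grow(X_tr, y_tr)
# acc = np.mean(np.array([predict_one(tree, x) for x in X_te]) == y_te)
```

The point of the tight coupling is that each subtree sees probabilities estimated from its own, progressively purer, region of the instance space.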
References
Ali, K. M. & Pazzani, M. J. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24, 173–202.
Bauer, E. & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105–139.
Blake, C., Keogh, E., & Merz, C. (1999). UCI repository of Machine Learning databases. Department of Information and Computer Science, University of California at Irvine, Irvine, CA.
Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26(3), 801–849.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth International Group.
Brodley, C. E. (1995). Recursive automatic bias selection for classifier construction. Machine Learning, 20, 63–94.
Brodley, C. E. & Utgoff, P. E. (1995). Multivariate decision trees. Machine Learning, 19, 45–77.
Buntine, W. (1990). A theory of learning classification rules. Ph.D. Thesis, University of Technology, Sydney.
Chan, P. & Stolfo, S. (1995a). A comparative evaluation of voting and meta-learning on partitioned data. In A. Prieditis & S. Russell (Eds.), Machine Learning, Proc. of 12th International Conference. Morgan Kaufmann.
Chan, P. & Stolfo, S. (1995b). Learning arbiter and combiner trees from partitioned data for scaling machine learning. In U. M. Fayyad & R. Uthurusamy (Eds.), Proc. of the First Intern. Conference on Knowledge Discovery and Data Mining. AAAI Press.
Dillon, W. & Goldstein, M. (1984). Multivariate analysis, methods and applications. J. Wiley and Sons, Inc.
Domingos, P. & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–129.
Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In A. Prieditis & S. Russell (Eds.), Machine Learning, Proc. of 12th International Conference. Morgan Kaufmann.
Fahlman, S. E. (1991). The recurrent cascade-correlation architecture. In R. P. Lippmann, J. E. Moody, & D. S. Touretzky (Eds.), Advances in neural information processing systems (Vol. 3, pp. 190–196). Morgan Kaufmann Publishers, Inc.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In L. Saitta (Ed.), Machine Learning, Proc. of the 13th International Conference. Morgan Kaufmann.
Gama, J. (1998). Combining classifiers with constructive induction. In C. Nedellec & C. Rouveirol (Eds.), Proc. of European Conf. on Machine Learning ECML-98. LNAI 1398, Springer Verlag.
Gama, J. & Brazdil, P. (1999). Linear tree. Intelligent Data Analysis, 3(1), 1–22.
Henery, B. (1997). Combining classification procedures. In G. Nakhaeizadeh & C. Taylor (Eds.), Machine learning and statistics: The interface. John Wiley & Sons, Inc.
Kohavi, R. & Wolpert, D. H. (1996). Bias plus variance decomposition for zero-one loss functions. In L. Saitta (Ed.), Machine Learning, Proceedings of the 13th International Conference. Morgan Kaufmann.
Langley, P. (1993). Induction of recursive Bayesian classifiers. In P. Brazdil (Ed.), Proc. of European Conf. on Machine Learning: ECML-93. LNAI 667, Springer Verlag.
Langley, P. (1996). Elements of machine learning. Morgan Kaufmann.
Michie, D., Spiegelhalter, D., & Taylor, C. (1994). Machine learning, neural and statistical classification. Ellis Horwood.
Mitchell, T. (1997). Machine learning. McGraw-Hill Companies, Inc.
Murthy, S., Kasif, S., & Salzberg, S. (1994). A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1–32.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann Publishers, Inc.
Quinlan, R. (1986). Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers, Inc.
Quinlan, R. (1996). Bagging, boosting, and C4.5. In Proc. of the 13th National Conference on Artificial Intelligence. AAAI Press.
Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2, 229–246.
Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135–143.
Skalak, D. (1997). Prototype selection for composite nearest neighbor classifiers. Ph.D. Thesis, University of Massachusetts Amherst.
Ting, K. & Witten, I. (1997). Stacked generalization: When does it work? In Proc. International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
Tumer, K. & Ghosh, J. (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, Special Issue on Combining Artificial Neural Networks: Ensemble Approaches, 8(3–4), 385–404.
Wolpert, D. (1992). Stacked generalization. Neural Networks, 5, 241–259.
Cite this article
Gama, J., Brazdil, P. Cascade Generalization. Machine Learning 41, 315–343 (2000). https://doi.org/10.1023/A:1007652114878