Abstract
We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
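To make the two strategies being compared concrete, the following is a minimal sketch (not the authors' code) contrasting (a) selecting the single best base-level classifier by cross-validation with (b) stacking, where the base classifiers' class-probability distributions are fed to a meta-level learner. It uses scikit-learn; the dataset and hyperparameters are illustrative, and since scikit-learn provides no multi-response model tree, logistic regression stands in for the paper's meta-level learners.

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Heterogeneous base-level classifiers, echoing the paper's setup of
# combining different learning algorithms rather than one algorithm
# trained on resampled data.
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
]

# Strategy (a): select the best single classifier by cross-validation.
best_name, best_score = max(
    ((name, cross_val_score(clf, X, y, cv=10).mean()) for name, clf in base),
    key=lambda t: t[1],
)

# Strategy (b): stacking. stack_method="predict_proba" passes the base-level
# probability distributions to the meta-level learner, analogous to stacking
# with probability distributions and MLR; LogisticRegression is a stand-in
# for the multi-response meta-level models studied in the paper.
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=10,
)
stack_score = cross_val_score(stack, X, y, cv=10).mean()

print(f"best single classifier ({best_name}): {best_score:.3f}")
print(f"stacking on probability distributions: {stack_score:.3f}")

On a single small dataset the two accuracies will often be close, which is consistent with the paper's point: whether stacking beats the best selected classifier depends on the meta-level learner, not on stacking per se.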
Cite this article
Džeroski, S., & Ženko, B. Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54, 255–273 (2004). https://doi.org/10.1023/B:MACH.0000015881.36452.6e