Abstract
We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
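To make the two strategies being compared concrete, the following is a minimal sketch (not the authors' code) contrasting (a) selecting the single best base-level classifier by cross-validation with (b) stacking, where the base classifiers' class-probability distributions are fed to a meta-level learner. It uses scikit-learn; the dataset and hyperparameters are illustrative, and since scikit-learn provides no multi-response model tree, logistic regression stands in for the paper's meta-level learners.

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Heterogeneous base-level classifiers, echoing the paper's setup of
# combining different learning algorithms rather than one algorithm
# trained on resampled data.
base = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
]

# Strategy (a): select the best single classifier by cross-validation.
best_name, best_score = max(
    ((name, cross_val_score(clf, X, y, cv=10).mean()) for name, clf in base),
    key=lambda t: t[1],
)

# Strategy (b): stacking. stack_method="predict_proba" passes the base-level
# probability distributions to the meta-level learner, analogous to stacking
# with probability distributions and MLR; LogisticRegression is a stand-in
# for the multi-response meta-level models studied in the paper.
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=10,
)
stack_score = cross_val_score(stack, X, y, cv=10).mean()

print(f"best single classifier ({best_name}): {best_score:.3f}")
print(f"stacking on probability distributions: {stack_score:.3f}")

On a single small dataset the two accuracies will often be close, which is consistent with the paper's point: whether stacking beats the best selected classifier depends on the meta-level learner, not on stacking per se.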
Cite this article
Džeroski, S., & Ženko, B. Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54, 255–273 (2004). https://doi.org/10.1023/B:MACH.0000015881.36452.6e