One of the crucial problems in designing a classifier ensemble is the proper choice of the base classifier line-up. Such an ensemble is typically formed either from individual classifiers trained to ensure high diversity, or by pruning, which reduces the number of predictive models in order to improve the efficiency and predictive performance of the ensemble. This work focuses on clustering-based ensemble pruning, which looks for groups of similar classifiers and replaces each group with a representative. We propose a novel pruning criterion based on well-known diversity measures and describe three algorithms that use classifier clustering. The first method selects the model with the best predictive performance from each cluster to form the final ensemble. The second employs a multistage organization in which, instead of removing classifiers from the ensemble, each classifier cluster makes its decision independently. The third combines multistage organization with sampling with replacement. The proposed approaches were evaluated on 30 datasets with different characteristics. Experimental results, validated through statistical tests, confirmed the usefulness of the proposed approaches.
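To make the first two approaches concrete, the following is a minimal sketch and not the authors' implementation. It assumes the pairwise disagreement measure as the diversity-based distance, average-linkage agglomerative clustering, and plain majority voting at both stages of the multistage variant; the abstract does not specify any of these choices. All function names (`disagreement`, `cluster_classifiers`, `prune_best_per_cluster`, `multistage_predict`) are hypothetical, and each classifier is represented only by its vector of predictions on a validation set.

```python
from collections import Counter
from itertools import combinations


def disagreement(a, b):
    """Fraction of samples on which two prediction vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def accuracy(pred, y_true):
    """Fraction of samples predicted correctly."""
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)


def cluster_classifiers(preds, n_clusters):
    """Group similar classifiers (assumed: average-linkage agglomeration
    over pairwise disagreement). Returns lists of classifier indices."""
    clusters = [[i] for i in range(len(preds))]
    while len(clusters) > n_clusters:
        best = None
        for ci, cj in combinations(range(len(clusters)), 2):
            # Average disagreement between all cross-cluster pairs.
            dist = sum(
                disagreement(preds[a], preds[b])
                for a in clusters[ci]
                for b in clusters[cj]
            ) / (len(clusters[ci]) * len(clusters[cj]))
            if best is None or dist < best[0]:
                best = (dist, ci, cj)
        _, ci, cj = best
        clusters[ci] = clusters[ci] + clusters[cj]
        del clusters[cj]  # cj > ci, so index ci is unaffected
    return clusters


def prune_best_per_cluster(preds, y_true, n_clusters):
    """First approach: keep the most accurate classifier of each cluster."""
    clusters = cluster_classifiers(preds, n_clusters)
    return sorted(
        max(c, key=lambda i: accuracy(preds[i], y_true)) for c in clusters
    )


def majority(votes):
    """Most frequent label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def multistage_predict(preds, clusters):
    """Second approach (assumed combination rule): each cluster votes
    internally, then the cluster-level decisions are voted on again."""
    n_samples = len(preds[0])
    return [
        majority([majority([preds[i][s] for i in c]) for c in clusters])
        for s in range(n_samples)
    ]
```

With four classifiers whose predictions form two obviously similar pairs, `prune_best_per_cluster(preds, y_true, 2)` keeps one index per pair, while `multistage_predict` retains every classifier and only changes how the votes are aggregated. The third approach, which adds sampling with replacement, is not sketched here because the abstract does not say what is sampled.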