The robustness of majority voting compared to filtering misclassified instances in supervised classification tasks

Authors: Michael R. Smith, Tony Martinez

Published in: Artificial Intelligence Review | Issue 1/2018

Abstract

Removing or filtering outliers and mislabeled instances prior to training a learning algorithm has been shown to increase classification accuracy, especially in noisy data sets. A popular approach is to remove any instance that is misclassified by a learning algorithm. However, the use of ensemble methods has also been shown to generally increase classification accuracy. In this paper, we extensively examine filtering and ensembling. On a set of 54 data sets, we examine 9 learning algorithms, both individually and ensembled together, as filtering algorithms, as well as the effects of filtering on the 9 chosen learning algorithms. We compare the filtering results with using a majority voting ensemble. We find that the majority voting ensemble significantly outperforms filtering unless high amounts of noise are present in the data set. Additionally, in most cases, using an ensemble of learning algorithms for filtering produces a greater increase in classification accuracy than using a single learning algorithm for filtering.
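
As a rough illustration of the two strategies the abstract compares, the sketch below contrasts an ensemble misclassification filter with a plain majority-voting ensemble. It is a minimal sketch only: it assumes scikit-learn, synthetic noisy data, and a hypothetical three-model pool standing in for the paper's 9 learning algorithms and 54 data sets, and the 5-fold majority filter is a generic version rather than the authors' exact setup.

```python
# Illustrative sketch, not the paper's implementation: (1) filter training
# instances that a majority of a diverse model pool misclassifies, then train
# a single model on the filtered set; (2) a majority-voting ensemble over the
# same pool on the unfiltered set. Data and model pool are stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y injects label noise like the noisy sets studied.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pool = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0))]

# Strategy 1: ensemble filtering. Flag an instance when a majority of the
# pool misclassifies it in 5-fold cross-validation, then drop it.
misclassified_votes = np.zeros(len(y_tr))
for _, clf in pool:
    preds = cross_val_predict(clf, X_tr, y_tr, cv=5)
    misclassified_votes += (preds != y_tr)
keep = misclassified_votes < len(pool) / 2.0
filtered_model = DecisionTreeClassifier(random_state=0)
filtered_model.fit(X_tr[keep], y_tr[keep])

# Strategy 2: majority voting over the unfiltered training set.
voter = VotingClassifier(estimators=pool, voting="hard")
voter.fit(X_tr, y_tr)

print("filtered single model:  ", filtered_model.score(X_te, y_te))
print("majority-vote ensemble: ", voter.score(X_te, y_te))
```

Which strategy wins depends on the noise level, which is the comparison the paper quantifies across its 54 data sets.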


Footnotes
1
As opposed to an ensemble composed of models induced by the same learning algorithm, such as bagging or boosting.
 
2
The NNge learning algorithm did not finish running on two data sets: eye-movements and Magic telescope. RIPPER did not finish on the lung cancer data set. In these cases, the data sets are omitted from the presented results. As such, NNge was evaluated on a set of 52 data sets and RIPPER on a set of 53 data sets.
 
Metadata
Title
The robustness of majority voting compared to filtering misclassified instances in supervised classification tasks
Authors
Michael R. Smith
Tony Martinez
Publication date
22-09-2016
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 1/2018
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-016-9518-2