
08.04.2019 | Focus

A taxonomy on impact of label noise and feature noise using machine learning techniques

Authors: A. Shanthini, G. Vinodhini, R. M. Chandrasekaran, P. Supraja

Published in: Soft Computing | Issue 18/2019


Abstract

Soft computing techniques are effective for handling noise in datasets, a major cause of misclassification. Classification assumes perfectly labeled data, but noise in real-world data corrupts both the class labels assigned to instances and their input feature values, which complicates prediction and degrades classifier performance. The present study provides a quantitative assessment of the impact of label noise and feature noise on classification performance in medical datasets, since noise handling has become an important aspect of data mining research and its applications. Boosting of weak classifiers achieves high accuracy in classification problems. This study examines recent weak-learner-based boosting algorithms, namely adaptive boosting, generalized tree boosting and extreme gradient boosting, and compares them under different label-noise and feature-noise levels (5%, 10%, 15% and 20%) on several medical datasets. The performance of the weak learners is measured in terms of accuracy and equalized loss of accuracy.
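To make the evaluation protocol described in the abstract concrete, the following Python snippet is a minimal sketch of such an experiment. The dataset (scikit-learn's breast cancer data as a stand-in for the medical datasets), the noise-injection helpers and the equalized-loss-of-accuracy formulation ELA = (100 − A_noise) / A_clean are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch: inject label/feature noise at several rates, train three
# boosting learners, and report accuracy plus equalized loss of accuracy (ELA).
# Dataset, helpers and the ELA formula are assumptions for illustration.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

def add_label_noise(y, rate):
    """Flip the class label of a random fraction `rate` of the instances."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = 1 - y[idx]  # binary labels assumed
    return y

def add_feature_noise(X, rate):
    """Perturb a random fraction `rate` of feature values with Gaussian noise."""
    X = X.copy()
    mask = rng.random(X.shape) < rate
    X[mask] += rng.normal(0.0, X.std(axis=0), X.shape)[mask]
    return X

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the medical datasets
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

learners = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "GradientTreeBoosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=100, eval_metric="logloss", random_state=0),
}

for name, clf in learners.items():
    # Baseline accuracy on clean training data.
    acc_clean = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te)) * 100
    for rate in (0.05, 0.10, 0.15, 0.20):
        Xn, yn = add_feature_noise(X_tr, rate), add_label_noise(y_tr, rate)
        acc = accuracy_score(y_te, clf.fit(Xn, yn).predict(X_te)) * 100
        ela = (100 - acc) / acc_clean  # equalized loss of accuracy (assumed form)
        print(f"{name:22s} noise={int(rate*100):2d}%  acc={acc:5.2f}%  ELA={ela:.3f}")
```

In this formulation, a lower ELA indicates a learner that both performs well on noisy data and loses little accuracy relative to its clean baseline, which is how the measure is typically used to compare noise robustness across classifiers.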


Metadata
Title
A taxonomy on impact of label noise and feature noise using machine learning techniques
Authors
A. Shanthini
G. Vinodhini
R. M. Chandrasekaran
P. Supraja
Publication date
08.04.2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 18/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-03968-7
