Published in: Soft Computing 18/2019

08-04-2019 | Focus

A taxonomy on impact of label noise and feature noise using machine learning techniques

Authors: A. Shanthini, G. Vinodhini, R. M. Chandrasekaran, P. Supraja


Abstract

Soft computing techniques are effective for predicting noise in datasets, where noise causes misclassification. Classification assumes perfectly labeled training data, but in practice noise can corrupt both the class label assigned to an instance (label noise) and the values of its input features (feature noise). The presence of noise complicates prediction on real-world data and degrades classifier performance. The present study provides a quantitative assessment of label noise and feature noise, and of the resulting classification performance, on medical datasets using machine learning techniques, since noise handling has become an important aspect of research in data mining and its applications. Boosting of weak classifiers achieves high accuracy in classification problems. This study explores the performance of recent soft computing techniques in machine learning, namely weak-learner-based boosting algorithms: adaptive boosting, generalized tree boosting and extreme gradient boosting. These boosting algorithms are compared and analyzed at different label noise and feature noise levels (5%, 10%, 15% and 20%) on distinct medical datasets. The performance of the weak learners is measured in terms of accuracy and equalized loss of accuracy.
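The evaluation protocol described in the abstract — inject label noise at a controlled rate, then relate accuracy under noise to accuracy on clean data via the equalized loss of accuracy — can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' implementation: the helper names (`inject_label_noise`, `equalized_loss_of_accuracy`) and the toy accuracy figures are assumptions introduced here for clarity.

```python
import random

def inject_label_noise(labels, rate, classes, seed=0):
    """Flip a fraction `rate` of labels to a different class, chosen
    uniformly at random among the remaining classes."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

def equalized_loss_of_accuracy(acc_noisy, acc_clean):
    """Equalized loss of accuracy at a given noise level:
    (100 - accuracy with noise) / accuracy without noise.
    Lower values indicate greater robustness to the injected noise."""
    return (100.0 - acc_noisy) / acc_clean

# 100 clean binary labels; inject noise at the 10% level.
labels = [0, 1] * 50
noisy = inject_label_noise(labels, rate=0.10, classes=[0, 1])
flipped = sum(a != b for a, b in zip(labels, noisy))
print(flipped)  # 10 of 100 labels flipped

# Toy figures: 95% accuracy on clean data, 85% under noise.
print(equalized_loss_of_accuracy(acc_noisy=85.0, acc_clean=95.0))
```

Feature noise can be injected analogously by perturbing a fixed fraction of input feature values rather than labels; the same equalized-loss comparison then applies.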


Metadata
Title
A taxonomy on impact of label noise and feature noise using machine learning techniques
Authors
A. Shanthini
G. Vinodhini
R. M. Chandrasekaran
P. Supraja
Publication date
08-04-2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 18/2019
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-03968-7
