30 August 2018 | Theoretical Advances

Towards instance-dependent label noise-tolerant classification: a probabilistic approach

Authors: Jakramate Bootkrajang, Jeerayut Chaijaruwanich

Published in: Pattern Analysis and Applications

Abstract

Learning from labelled data is becoming increasingly challenging because training labels are inherently imperfect. Existing label noise-tolerant learning machines were primarily designed to tackle class-conditional noise, which occurs at random and independently of the input instances. Relatively little attention has been given to a more general type of label noise that is influenced by the input features. In this paper, we address the problem of learning a classifier in the presence of instance-dependent label noise by developing a novel label noise model that is expected to capture the variation of the label noise rate within a class. This is accomplished by adopting the probability density function of a mixture of Gaussians to approximate the label flipping probabilities. Experimental results demonstrate the effectiveness of the proposed method over existing approaches.
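
As a rough illustration of the idea of instance-dependent label flipping, the following sketch computes a flip probability for each input from a scaled mixture-of-Gaussians density and uses it to corrupt binary labels. It is not the authors' model or their estimation procedure: the one-dimensional inputs, the function names, the `scale` factor, the cap at 1 and the example parameters are all assumptions made purely for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def flip_probability(x, components, scale=1.0):
    """Hypothetical flip rate: a scaled mixture-of-Gaussians density, capped at 1.

    components is a list of (weight, mean, std) tuples whose weights sum to 1.
    The scale factor and the cap are illustrative assumptions.
    """
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return min(1.0, scale * density)


def corrupt_labels(xs, ys, components, scale=1.0, seed=0):
    """Flip each binary label (0 or 1) with its instance-dependent probability."""
    rng = random.Random(seed)
    noisy = []
    for x, y in zip(xs, ys):
        p = flip_probability(x, components, scale)
        noisy.append(1 - y if rng.random() < p else y)
    return noisy


# Illustrative setup: the noise rate peaks around x = 0 and decays away from it.
components = [(1.0, 0.0, 0.5)]
xs = [-3.0, -0.1, 0.0, 0.1, 3.0]
ys = [0, 0, 1, 1, 1]
rates = [flip_probability(x, components, scale=0.5) for x in xs]
noisy = corrupt_labels(xs, ys, components, scale=0.5)
```

In this toy setting the flip rate varies within a class: instances near x = 0 are far more likely to be mislabelled than instances near x = ±3, which is the kind of within-class variation that a class-conditional noise model cannot express.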


Footnotes
1. We used LIBLINEAR [31] in this study.
 
References
1. Beigman E, Klebanov BB (2009) Learning with annotation noise. In: ACL 2009, Proceedings of the 47th annual meeting of the association for computational linguistics, 2–7 August 2009, Singapore, pp 280–287
2. Kolcz A, Cormack GV (2009) Genre-based decomposition of email class noise. In: SIGKDD’09, pp 427–436
3. Johnson BA, Iizuka K (2016) Integrating openstreetmap crowdsourced data and landsat time-series imagery for rapid land use/land cover (LULC) mapping: case study of the laguna de bay area of the philippines. Appl Geogr 67:140–149
4. Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In: EMNLP, pp 254–263
5. Shen D, Ruvini J-D, Sarwar B (2012) Large-scale item categorization for e-commerce. In: Proceedings of the 21st ACM international conference on information and knowledge management, CIKM ’12, New York, NY, USA. ACM, pp 595–604
6. Xiao T, Xia T, Yang Y, Huang C, Wang X (2015) Learning from massive noisy labeled data for image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2691–2699
7. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869
8. Menon AK, van Rooyen B, Natarajan N (2016) Learning from binary labels with instance-dependent corruption. arXiv preprint arXiv:1605.00751
9. Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. In: ACML, volume 20 of JMLR proceedings, pp 97–112. JMLR.org
10. Chhikara RS, McKeon J (1984) Linear discriminant analysis with misallocation in training samples. J Am Stat Assoc 79(388):899–906
11. Lawrence ND, Schölkopf B (2001) Estimating a Kernel fisher discriminant in the presence of label noise. In: ICML’01. Morgan Kaufmann, pp 306–313
12. Li Y, Wessels LFA, de Ridder D, Reinders MJT (2007) Classification in the presence of class noise using a probabilistic kernel Fisher method. Pattern Recognit 40(12):3349–3357
13. Raykar VC, Shipeng Y, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11:1297–1322
14. Bootkrajang J, Kabán A (2012) Label-noise robust logistic regression and its applications. In: ECML-PKDD’12, pp 143–158
15. Bootkrajang J, Kabán A (2014) Learning kernel logistic regression in the presence of class label noise. Pattern Recognit 47(11):3641–3655
17. Long PM, Servedio RA (2010) Random classification noise defeats all convex potential boosters. Mach Learn 78(3):287–304
18. Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: NIPS’13, pp 1196–1204
19. Manwani N, Sastry PS (2013) Noise tolerance under risk minimization. IEEE Trans Cybernet 43(3):1146–1151
20. Ghosh A, Manwani N, Sastry PS (2015) Making risk minimization tolerant to label noise. Neurocomputing 160:93–107
21. Lachenbruch PA (1974) Discriminant analysis when the initial samples are misclassified II: non-random misclassification models. Technometrics 16(3):419–424
22. Bootkrajang J (2016) A generalised label noise model for classification in the presence of annotation errors. Neurocomputing 192:61–71
23. Du J, Cai Z (2015) Modelling class noise with symmetric and asymmetric distributions. In: AAAI, pp 2589–2595
26. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98(20):11462–11467
27. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750
28. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
30. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
31. Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
Metadata
Title
Towards instance-dependent label noise-tolerant classification: a probabilistic approach
Authors
Jakramate Bootkrajang
Jeerayut Chaijaruwanich
Publication date
30 August 2018
Publisher
Springer London
Published in
Pattern Analysis and Applications
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-018-0750-z