
2015 | OriginalPaper | Book chapter

Identifying and Mitigating Labelling Errors in Active Learning

Authors: Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd

Published in: Pattern Recognition: Applications and Methods

Publisher: Springer International Publishing


Abstract

Most existing active learning methods for classification assume that the observed labels (i.e. those given by a human labeller) are perfectly correct. In real-world applications, however, the labeller is usually subject to labelling errors that reduce the classification accuracy of the learned model. In this paper, we address this issue for active learning in the streaming setting and try to answer the following questions: (1) which labelled instances are most likely to be mislabelled? (2) is it always good to abstain from learning when data are suspected to be mislabelled? (3) which mislabelled instances require relabelling? We propose a hybrid active learning strategy based on two measures. The first measure filters potentially mislabelled instances based on the degree of disagreement between the manually given label and the predicted class label. The second measure selects for relabelling only the most informative instances, i.e. those that deserve to be corrected. An instance is worth relabelling if it shows highly conflicting information between the predicted and the queried labels. Experiments on several real-world datasets show that filtering mislabelled instances according to the first measure, and relabelling only a few instances selected according to the second measure, greatly improve the classification accuracy of stream-based active learning.
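The filter-then-relabel idea can be illustrated with a small sketch. The abstract does not give the actual formulas, so everything here is an assumption: `disagreement` is a hypothetical stand-in for the first measure (the gap between the model's top class probability and the probability it assigns to the labeller's label), treating very strong disagreement as "worth relabelling" is only a simple proxy for the second measure's notion of highly conflicting information, and `filter_threshold`, `relabel_threshold` and `relabel_budget` are illustrative parameters, not values from the paper.

```python
from typing import List, Sequence, Tuple


def disagreement(probs: Sequence[float], given_label: int) -> float:
    """Hypothetical first measure: probability of the model's predicted class
    minus probability of the labeller's class (0.0 when they agree)."""
    predicted = max(range(len(probs)), key=lambda c: probs[c])
    return probs[predicted] - probs[given_label]


def decide(probs: Sequence[float], given_label: int,
           filter_threshold: float = 0.3,
           relabel_threshold: float = 0.7) -> str:
    """Return 'learn', 'discard' or 'relabel' for one labelled instance.
    Both thresholds are illustrative assumptions."""
    d = disagreement(probs, given_label)
    if d < filter_threshold:
        return "learn"      # label looks consistent with the model
    if d >= relabel_threshold:
        return "relabel"    # strong conflict: assumed proxy for "worth correcting"
    return "discard"        # suspected mislabel: abstain from learning


def process_stream(stream: Sequence[Tuple[Sequence[float], int]],
                   relabel_budget: int = 1) -> List[str]:
    """Apply `decide` to each (class probabilities, given label) pair,
    spending at most `relabel_budget` relabelling queries."""
    actions = []
    for probs, label in stream:
        action = decide(probs, label)
        if action == "relabel":
            if relabel_budget > 0:
                relabel_budget -= 1
            else:
                action = "discard"  # budget exhausted: fall back to filtering
        actions.append(action)
    return actions


if __name__ == "__main__":
    # Each item: (model's class probabilities, label given by the labeller).
    stream = [([0.9, 0.1], 0), ([0.6, 0.4], 1), ([0.7, 0.3], 1),
              ([0.95, 0.05], 1), ([0.1, 0.9], 0)]
    print(process_stream(stream, relabel_budget=1))
    # prints ['learn', 'learn', 'discard', 'relabel', 'discard']
```

With this example stream, the first two labels are close enough to the model's belief to be learned, the third is filtered out, the fourth is sent back for relabelling, and the fifth, though equally suspicious, is discarded because the hypothetical budget of one relabelling query is already spent.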


Metadata
Title
Identifying and Mitigating Labelling Errors in Active Learning
Authors
Mohamed-Rafik Bouguelia
Yolande Belaïd
Abdel Belaïd
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-27677-9_3
