2021 | OriginalPaper | Book chapter

CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data

Authors: Mani Sotoodeh, Li Xiong, Joyce Ho

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

Ground-truth labels are unavailable in many domains. Although learning from crowdsourced labels has been explored, existing models can still fail when annotations are sparse, unreliable, or conflicting. Co-teaching methods have shown promising improvements for computer vision problems with noisy labels by training two classifiers on each other's confident samples in each batch. Inspired by the idea of separating confident and uncertain samples during training, we extend it to the crowdsourcing problem. Our model, CrowdTeacher, builds on the idea that perturbation in the input space can make a classifier more robust to noisy labels. Treating crowdsourced annotations as a source of noisy labels, we perturb samples based on the certainty of the aggregated annotations. The perturbed samples are fed to a Co-teaching algorithm tuned to also accommodate smaller tabular data. We demonstrate the gain in predictive power attained with CrowdTeacher on both synthetic and real datasets across various label density settings. Our experiments show that the proposed approach outperforms three kinds of baselines: methods that model individual annotations and then combine them, methods that simultaneously learn a classifier and infer truth labels, and the Co-teaching algorithm trained on labels aggregated through common truth inference methods.
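The abstract names three ingredients: aggregating crowd annotations, perturbing each sample according to the certainty of its aggregated label, and the Co-teaching exchange of confident samples between two classifiers. The following is a minimal NumPy sketch of those ingredients, not the authors' implementation. Everything specific in it is an assumption made for illustration: majority voting as the aggregation rule, binary labels, Gaussian noise whose scale grows as certainty falls, small training loss as the proxy for "confident", and all function and parameter names (`aggregate_votes`, `perturb`, `co_teaching_selection`, `scale`, `keep_ratio`).

```python
import numpy as np


def aggregate_votes(annotations):
    """Majority-vote aggregation for binary crowd labels (assumed rule).

    annotations: float array of shape (n_samples, n_annotators) with entries
    0.0, 1.0, or np.nan where an annotator did not label the sample.
    Assumes every sample has at least one annotation.
    Returns (labels, certainty); certainty is the share of available votes
    that agree with the majority label.
    """
    positive_share = np.nanmean(annotations, axis=1)
    labels = (positive_share >= 0.5).astype(int)  # ties resolve to label 1
    certainty = np.where(labels == 1, positive_share, 1.0 - positive_share)
    return labels, certainty


def perturb(X, certainty, scale=0.1, seed=None):
    """Add sample-specific Gaussian noise to tabular features.

    ASSUMPTION: noise grows as certainty falls, so a sample with unanimous
    annotations (certainty 1.0) is left unchanged. The direction and form of
    the perturbation are illustrative choices, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    feature_std = X.std(axis=0, keepdims=True)
    sigma = scale * (1.0 - certainty)[:, None] * feature_std
    return X + rng.normal(size=X.shape) * sigma


def small_loss_indices(losses, keep_ratio):
    """Indices of the keep_ratio fraction of samples with the smallest loss."""
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    return np.argsort(losses)[:n_keep]


def co_teaching_selection(losses_a, losses_b, keep_ratio):
    """Cross-selection step: each classifier trains on its peer's picks.

    losses_a, losses_b: per-sample losses of classifiers A and B on one batch.
    Returns (train_a_on, train_b_on), two arrays of batch indices.
    """
    train_b_on = small_loss_indices(losses_a, keep_ratio)
    train_a_on = small_loss_indices(losses_b, keep_ratio)
    return train_a_on, train_b_on
```

In a training loop under these assumptions, `perturb` would be applied to the features before batching, and `co_teaching_selection` would be called once per batch so that each classifier's gradient step uses only the samples its peer considers low-loss.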

https://github.com/manisci/CrowdTeacher.

Title: CrowdTeacher: Robust Co-teaching with Noisy Answers and Sample-Specific Perturbations for Tabular Data
Authors: Mani Sotoodeh, Li Xiong, Joyce Ho
Publisher: Springer International Publishing
Book: Advances in Knowledge Discovery and Data Mining
Print ISBN: 978-3-030-75764-9
Electronic ISBN: 978-3-030-75765-6
Copyright year: 2021
DOI: https://doi.org/10.1007/978-3-030-75765-6_15
