
2020 | Original Paper | Book Chapter

Impact of Noisy Labels in Learning Techniques: A Survey

Authors: Nitika Nigam, Tanima Dutta, Hari Prabhat Gupta

Published in: Advances in Data and Information Sciences

Publisher: Springer Singapore


Abstract

Noisy labels are a central problem in classification. Label noise can arise from insufficient information, encoding or communication errors, or data-entry mistakes by experts and non-experts alike, and it can substantially degrade a model's performance and accuracy. In real-world datasets, such as images collected from Flickr, the likelihood of noisy labels is high. Early approaches improved performance by identifying, correcting, or eliminating noisy instances. A variety of machine learning algorithms have since been applied to mitigate label noise, and recent studies address the problem with deep learning models. This survey provides a brief introduction to solutions for learning with noisy labels.
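The identification step mentioned above can be illustrated with a minimal sketch: flag an instance as possibly mislabeled when the majority of its nearest neighbors carry a different label. The dataset, function name, and majority threshold below are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of noise identification by neighborhood voting:
# a point is flagged as possibly mislabeled when the majority of its
# k nearest neighbors disagree with its own label.
from math import dist

def knn_noise_filter(points, labels, k=3):
    """Return indices of points whose k nearest neighbors mostly disagree."""
    flagged = []
    for i, p in enumerate(points):
        # k nearest neighbors of p (excluding p itself), by Euclidean distance
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        disagree = sum(1 for j in neighbors if labels[j] != labels[i])
        if disagree > k // 2:  # majority of neighbors carry a different label
            flagged.append(i)
    return flagged

# Two well-separated clusters; index 3 is deliberately mislabeled.
points = [(0, 0), (0, 1), (1, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 0, 0, 1, 1, 1, 1, 1]  # point 3 sits in cluster 0 but is labeled 1
print(knn_noise_filter(points, labels))  # -> [3]
```

Once flagged, such instances can either be relabeled (correction) or dropped from the training set (elimination), the two remedies discussed in the survey.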


Metadata
Title
Impact of Noisy Labels in Learning Techniques: A Survey
Authors
Nitika Nigam
Tanima Dutta
Hari Prabhat Gupta
Copyright Year
2020
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-0694-9_38
