Published in: Neural Computing and Applications 14/2022

04-03-2022 | Original Article

Binary cross-entropy with dynamical clipping

Authors: Petr Hurtik, Stefania Tomasiello, Jan Hula, David Hynar


Abstract

We investigate the adverse effect of noisy labels in a training dataset on a neural network’s precision in an image classification task. The importance of this research lies in the fact that most datasets include noisy labels. To reduce the impact of noisy labels, we propose to extend the binary cross-entropy by dynamical clipping, which clips all samples’ loss values in a mini-batch by a clipping constant. Such a constant is dynamically determined for every mini-batch using its statistics. The advantage is the dynamic adaptation to any number of noisy labels in a training dataset. Thanks to that, the proposed binary cross-entropy with dynamical clipping can be used in any model utilizing cross-entropy or focal loss, including pre-trained models. We prove that the proposed loss function is an \(\alpha \)-calibrated classification loss, implying consistency and robustness to noise misclassification in more general asymmetric problems. We demonstrate our loss function’s usefulness on the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets, where we heuristically create training data with noisy labels and achieve a notable performance boost compared to the standard binary cross-entropy. These results are also confirmed in the second experiment, where we use a model trained on Google Images to classify the ImageWoof dataset, and in the third experiment, where we deal with the WebVision and ANIMAL-10N datasets. We also show that the proposed technique yields significantly better performance than gradient clipping. Code: gitlab.com/irafm-ai/clipping_cross_entropy
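To make the idea concrete, the following PyTorch sketch clips every sample's binary cross-entropy in a mini-batch at a constant computed from that mini-batch's own loss statistics. It is a minimal illustration, not the authors' reference implementation (see the linked GitLab repository); in particular, using a quantile of the per-sample losses as the clipping constant is an assumption made here for demonstration, since the abstract does not state the exact statistic.

    import torch
    import torch.nn.functional as F

    def bce_with_dynamic_clipping(logits, targets, quantile=0.9):
        """Binary cross-entropy with a per-mini-batch clipping constant.

        Illustrative sketch: the clipping constant is taken as the
        `quantile`-th quantile of the per-sample losses of the current
        mini-batch, which is an assumption for demonstration purposes.
        """
        # Per-sample (unreduced) BCE so that mini-batch statistics are available.
        per_sample = F.binary_cross_entropy_with_logits(
            logits, targets, reduction="none"
        )
        # Dynamic clipping constant, recomputed for every mini-batch.
        clip_value = torch.quantile(per_sample.detach(), quantile).item()
        # Losses above the constant are clipped, so the hardest (possibly
        # mislabeled) samples stop contributing gradient.
        return torch.clamp(per_sample, max=clip_value).mean()

Because the constant is recomputed from each mini-batch rather than fixed in advance, no noise-rate hyperparameter is required, which corresponds to the claimed adaptation to any number of noisy labels; the loss can be dropped in wherever binary cross-entropy or focal loss is currently used.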


Footnotes
1. https://paperswithcode.com/sota/image-classification-on-imagenet
2. https://https-deeplearning-ai.github.io/data-centric-comp
3. https://gitlab.com/irafm-ai/clipping_cross_entropy
6. https://paperswithcode.com/sota/learning-with-noisy-labels-on-animal
Metadata
Title
Binary cross-entropy with dynamical clipping
Authors
Petr Hurtik
Stefania Tomasiello
Jan Hula
David Hynar
Publication date
04-03-2022
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-022-07091-x
