Published in: Neural Computing and Applications 14/2022

04-03-2022 | Original Article

Binary cross-entropy with dynamical clipping

Authors: Petr Hurtik, Stefania Tomasiello, Jan Hula, David Hynar


Abstract

We investigate the adverse effect of noisy labels in a training dataset on a neural network’s precision in an image classification task. The importance of this research lies in the fact that most datasets include noisy labels. To reduce the impact of noisy labels, we propose to extend the binary cross-entropy by dynamical clipping, which clips all samples’ loss values in a mini-batch by a clipping constant. Such a constant is dynamically determined for every mini-batch using its statistics. The advantage is the dynamic adaptation to any number of noisy labels in a training dataset. Thanks to that, the proposed binary cross-entropy with dynamical clipping can be used in any model utilizing cross-entropy or focal loss, including pre-trained models. We prove that the proposed loss function is an \(\alpha \)-calibrated classification loss, implying consistency and robustness to noise misclassification in more general asymmetric problems. We demonstrate our loss function’s usefulness on the Fashion MNIST, CIFAR-10, and CIFAR-100 datasets, where we heuristically create training data with noisy labels and achieve a notable performance boost compared to the standard binary cross-entropy. These results are also confirmed in the second experiment, where we use a model trained on Google Images to classify the ImageWoof dataset, and in the third experiment, where we deal with the WebVision and ANIMAL-10N datasets. We also show that the proposed technique yields significantly better performance than gradient clipping. Code: gitlab.com/irafm-ai/clipping_cross_entropy
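To make the idea concrete, the following PyTorch sketch clips every sample's binary cross-entropy in a mini-batch at a constant computed from that mini-batch's own loss statistics. It is a minimal illustration, not the authors' reference implementation (see the linked GitLab repository); in particular, using a quantile of the per-sample losses as the clipping constant is an assumption made here for demonstration, since the abstract does not state the exact statistic.

    import torch
    import torch.nn.functional as F

    def bce_with_dynamic_clipping(logits, targets, quantile=0.9):
        """Binary cross-entropy with a per-mini-batch clipping constant.

        Illustrative sketch: the clipping constant is taken as the
        `quantile`-th quantile of the per-sample losses of the current
        mini-batch, which is an assumption for demonstration purposes.
        """
        # Per-sample (unreduced) BCE so that mini-batch statistics are available.
        per_sample = F.binary_cross_entropy_with_logits(
            logits, targets, reduction="none"
        )
        # Dynamic clipping constant, recomputed for every mini-batch.
        clip_value = torch.quantile(per_sample.detach(), quantile).item()
        # Losses above the constant are clipped, so the hardest (possibly
        # mislabeled) samples stop contributing gradient.
        return torch.clamp(per_sample, max=clip_value).mean()

Because the constant is recomputed from each mini-batch rather than fixed in advance, no noise-rate hyperparameter is required, which corresponds to the claimed adaptation to any number of noisy labels; the loss can be dropped in wherever binary cross-entropy or focal loss is currently used.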


Footnotes
1. https://paperswithcode.com/sota/image-classification-on-imagenet
2. https://https-deeplearning-ai.github.io/data-centric-comp
3. https://gitlab.com/irafm-ai/clipping_cross_entropy
6. https://paperswithcode.com/sota/learning-with-noisy-labels-on-animal
Metadata
Title
Binary cross-entropy with dynamical clipping
Authors
Petr Hurtik
Stefania Tomasiello
Jan Hula
David Hynar
Publication date
04-03-2022
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-022-07091-x
