2018 | OriginalPaper | Chapter

Group Normalization

Authors: Yuxin Wu, Kaiming He

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems—BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained in ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code.
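
The abstract notes that GN can be implemented in a few lines of code. As a rough illustration only, the sketch below computes GN over an NCHW feature map in PyTorch; the function name group_norm, the default group count G=32, the epsilon value, and the per-channel gamma/beta parameters are illustrative assumptions, not the authors' reference implementation.

    import torch

    def group_norm(x, gamma, beta, G=32, eps=1e-5):
        # x: features of shape (N, C, H, W); gamma, beta: learnable
        # per-channel scale and shift of shape (1, C, 1, 1). C must be
        # divisible by the number of groups G (illustrative default: 32).
        N, C, H, W = x.shape
        x = x.view(N, G, C // G, H, W)
        # Mean and variance are computed per sample and per group over the
        # (C // G, H, W) elements, so the statistics do not depend on the
        # batch size N.
        mean = x.mean(dim=(2, 3, 4), keepdim=True)
        var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
        x = (x - mean) / torch.sqrt(var + eps)
        x = x.view(N, C, H, W)
        return x * gamma + beta

    # Example: a batch of only 2 samples, as in the small-batch setting above.
    x = torch.randn(2, 64, 56, 56)
    gamma = torch.ones(1, 64, 1, 1)
    beta = torch.zeros(1, 64, 1, 1)
    y = group_norm(x, gamma, beta)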


Footnotes
1
In the context of this paper, we use “batch size” to refer to the number of samples per worker (e.g., GPU). BN’s statistics are computed for each worker, but not broadcast across workers, as is standard in many libraries.
 
2
Detectron [59] uses pre-trained models provided by the authors of [3]. For fair comparisons, we instead use the models pre-trained in this paper. The object detection and segmentation accuracy is statistically similar between these pre-trained models.
 
Literature
1. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
2. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR (2016)
3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
4. Silver, D., et al.: Mastering the game of go without human knowledge. Nature 550(7676), 354–359 (2017)
5. Szegedy, C., Ioffe, S., Vanhoucke, V.: Inception-v4, inception-resnet and the impact of residual connections on learning. In: ICLR Workshop (2016)
6. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR (2017)
7. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)
8.
9. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
10. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
11. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)
12. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV (2015)
13. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: CVPR (2017)
15. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
16. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. In: IJCV (2015)
18.
19. Salimans, T., Kingma, D.P.: Weight normalization: a simple reparameterization to accelerate training of deep neural networks. In: NIPS (2016)
22. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
23. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
24. Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014)
25. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: CVPR (2017)
26. Lyu, S., Simoncelli, E.P.: Nonlinear image representation using divisive normalization. In: CVPR (2008)
27. Jarrett, K., Kavukcuoglu, K., LeCun, Y., et al.: What is the best multi-stage architecture for object recognition? In: ICCV (2009)
28. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
30. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. In: ICLR (2014)
31. Szegedy, C., et al.: Going deeper with convolutions. In: CVPR (2015)
32. Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: NIPS (2017)
33. Arpit, D., Zhou, Y., Kota, B., Govindaraju, V.: Normalization propagation: a parametric technique for removing internal covariate shift in deep networks. In: ICML (2016)
34. Ren, M., Liao, R., Urtasun, R., Sinz, F.H., Zemel, R.S.: Normalizing the normalizers: comparing and extending network normalization schemes. In: ICLR (2017)
35. Ioffe, S.: Batch renormalization: towards reducing minibatch dependence in batch-normalized models. In: NIPS (2017)
36. Peng, C., et al.: MegDet: a large mini-batch object detector. In: CVPR (2018)
37. Dean, J., et al.: Large scale distributed deep networks. In: NIPS (2012)
38.
39. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR (2017)
40. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: CVPR (2018)
41. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. In: IJCV (2001)
42. Jegou, H., Douze, M., Schmid, C., Perez, P.: Aggregating local descriptors into a compact image representation. In: CVPR (2010)
43. Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: CVPR (2007)
44. Dieleman, S., De Fauw, J., Kavukcuoglu, K.: Exploiting cyclic symmetry in convolutional neural networks. In: ICML (2016)
45. Cohen, T., Welling, M.: Group equivariant convolutional networks. In: ICML (2016)
46.
47. Schwartz, O., Simoncelli, E.P.: Natural signal statistics and sensory gain control. Nat. Neurosci. 4(8), 819 (2001)
48. Simoncelli, E.P., Olshausen, B.A.: Natural image statistics and neural representation. Ann. Rev. Neurosci. 24(1), 1193–1216 (2001)
49. Carandini, M., Heeger, D.J.: Normalization as a canonical neural computation. Nat. Rev. Neurosci. 13(1), 51 (2012)
50. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
51. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI (2016)
53. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)
57. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
58. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV (2017)
60. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: CVPR (2017)
61. Ren, S., He, K., Girshick, R., Zhang, X., Sun, J.: Object detection networks on convolutional feature maps. TPAMI 39(7), 1476–1481 (2017)
62.
63. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR (2018)
Metadata
Title
Group Normalization
Authors
Yuxin Wu
Kaiming He
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01261-8_1
