SSN: Learning Sparse Switchable Normalization via SparsestMax

Authors: Wenqi Shao, Jingyu Li, Jiamin Ren, Ruimao Zhang, Xiaogang Wang, Ping Luo

Published in: International Journal of Computer Vision | Issue 8-9/2020

Abstract

Normalization methods deal with the training of the parameters of convolutional neural networks (CNNs), which typically contain many convolution layers. Although the layers of a CNN are not homogeneous in the roles they play in representing the prediction function, existing works often employ an identical normalizer in every layer, leaving performance short of optimal. To tackle this problem and further boost performance, the recently-proposed switchable normalization (SN) offers a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses the softmax function to learn the importance ratios that combine normalizers, which not only leads to redundant computation compared with a single normalizer but also makes the model less interpretable. This work addresses these issues by presenting sparse switchable normalization (SSN), in which the importance ratios are constrained to be sparse. Unlike \(\ell _1\) and \(\ell _0\) regularizations, which make tuning layer-wise regularization coefficients difficult, we turn this sparsity-constrained optimization problem into a feed-forward computation by proposing SparsestMax, a sparse version of softmax. SSN has several appealing properties. (1) It inherits all the benefits of SN, such as applicability to various tasks and robustness to a wide range of batch sizes. (2) It is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computation and making normalizer selection interpretable. (3) SSN can be transferred to various tasks in an end-to-end manner. Extensive experiments show that SSN outperforms its counterparts on challenging benchmarks such as ImageNet, COCO, Cityscapes, ADE20K, Kinetics and MegaFace. Models and code are available at https://github.com/switchablenorms/Sparse_SwitchNorm.
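To make the sparsity contrast concrete, below is a minimal NumPy sketch of sparsemax, the simplex-projection variant of softmax that SparsestMax builds on: unlike softmax, it can assign exact zeros to some importance ratios. This is an illustrative sketch of the underlying idea only, not the authors' SparsestMax, which additionally guarantees that the ratios converge to a one-hot vector so that each layer keeps a single normalizer.

import numpy as np

def sparsemax(z):
    # Euclidean projection of the logits z onto the probability simplex;
    # entries below a data-dependent threshold tau become exactly zero.
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # z_(1) >= z_(2) >= ... (cf. footnote 2)
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # which sorted entries stay positive
    k_z = k[support][-1]                   # support size
    tau = (cumsum[k_z - 1] - 1.0) / k_z    # threshold that makes the output sum to 1
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.0, 0.8, 0.1]))  # [0.6, 0.4, 0.0]; softmax would give ~[0.45, 0.37, 0.18]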


Footnotes
1
The softmax function is defined by \(p_k=\mathrm{softmax}_k(\mathbf{z})=\exp(z_k)/\sum_{j=1}^{|\varOmega|}\exp(z_j)\).
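As an illustration of this formula, a minimal numerically stable version in NumPy (subtracting \(\max(\mathbf{z})\) before exponentiating is a standard overflow guard and leaves the result unchanged; the snippet is illustrative rather than code from the paper):

import numpy as np

def softmax(z):
    # p_k = exp(z_k) / sum_j exp(z_j), computed stably.
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([1.0, 0.8, 0.1])))  # ~[0.449, 0.368, 0.183]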
 
2
Unless otherwise stated, all \((1),(2),\ldots ,(K)\) in this paper denote subscripts in descending order of the elements of \(\mathbf{z}\); for example, if \(\mathbf{z}=(0.1, 2.0, 1.0)\), then \(z_{(1)}=2.0\), \(z_{(2)}=1.0\) and \(z_{(3)}=0.1\).
 
Metadata
Title
SSN: Learning Sparse Switchable Normalization via SparsestMax
Authors
Wenqi Shao
Jingyu Li
Jiamin Ren
Ruimao Zhang
Xiaogang Wang
Ping Luo
Publication date
09.12.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 8-9/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01269-y
