
2021 | Original Paper | Chapter

FreezeNet: Full Performance by Reduced Storage Costs

Authors: Paul Wimmer, Jens Mehnert, Alexandru Condurache

Published in: Computer Vision – ACCV 2020

Publisher: Springer International Publishing


Abstract

Pruning generates sparse networks by setting parameters to zero. In this work we improve one-shot pruning methods applied before training, without adding any additional storage costs, while preserving the sparse gradient computations. The main difference to pruning is that we do not sparsify the network’s weights but learn just a few key parameters and keep the others fixed at their randomly initialized values. This mechanism is called freezing the parameters. The frozen weights can be stored efficiently with a single 32-bit random seed number. The parameters to be frozen are determined one-shot by a single forward and backward pass applied before training starts. We call the introduced method FreezeNet. In our experiments we show that FreezeNets achieve good results, especially for extreme freezing rates. Freezing weights preserves the gradient flow throughout the network; consequently, FreezeNets train better and have an increased capacity compared to their pruned counterparts. On the classification tasks MNIST and CIFAR-10/100 we outperform SNIP, the best reported one-shot pruning method applied before training in this setting. On MNIST, FreezeNet achieves \(99.2\%\) of the performance of the baseline LeNet-5-Caffe architecture, while compressing the number of trained and stored parameters by a factor of \(157\times\).
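
For readers without access to the full text, the following is a minimal PyTorch sketch of the mechanism the abstract describes; it is our reading of the abstract, not the authors’ reference implementation. The function names, the keep_ratio parameter, and the toy model are illustrative, and the saliency \(\vert w \cdot \partial L / \partial w \vert\) is the SNIP-style criterion the paper reports improving upon.

import torch

def select_trainable(model, loss_fn, x, y, keep_ratio=0.01):
    # One forward and one backward pass, applied before training starts.
    params = list(model.parameters())
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    # SNIP-style saliency |w * dL/dw|, i.e. the mask gradient |dL/dm| at m = 1.
    scores = torch.cat([(p * g).abs().flatten() for p, g in zip(params, grads)])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = scores.topk(k).values.min()
    # 1.0 = trainable "key" parameter, 0.0 = frozen at its initial value.
    return [((p * g).abs() >= threshold).float() for p, g in zip(params, grads)]

def freeze_gradients(model, masks):
    # Zero the gradients of frozen weights so the optimizer never updates them;
    # with plain SGD they then stay exactly at their seed-determined init.
    for p, m in zip(model.parameters(), masks):
        if p.grad is not None:
            p.grad.mul_(m)

torch.manual_seed(0)  # the frozen weights are fully recoverable from this seed
model = torch.nn.Sequential(
    torch.nn.Linear(784, 300), torch.nn.ReLU(), torch.nn.Linear(300, 10))
x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
masks = select_trainable(model, torch.nn.functional.cross_entropy, x, y)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = torch.nn.functional.cross_entropy(model(x), y)  # one training step
opt.zero_grad()
loss.backward()
freeze_gradients(model, masks)
opt.step()

Under this reading, only the mask and the few trained values need to be stored alongside the 32-bit seed, which is what allows compression factors like the reported \(157\times\) on LeNet-5-Caffe; how the paper encodes the mask is not visible from this page.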

Footnotes
1. By an abuse of notation, we also use W and B as the vectors containing all elements of the set of all weights and biases, respectively.
2. To obtain differentiability in equation (1), the mask is assumed to be continuous, i.e. \(m\in \mathbb{R}^{\vert W\vert}\) (see the note after these footnotes).
3. Based on the official implementation: https://github.com/namhoonlee/snip-public.
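
A note on footnote 2: equation (1) itself is not reproduced on this page, but the connection-sensitivity criterion of SNIP (Lee et al., entry 29 below), on which FreezeNet builds, has the following published form; relaxing the binary mask \(m\) to a continuous vector is what makes the derivative well defined. Up to normalization,

\[
s_j \;=\; \left|\,\frac{\partial L(m \odot W;\, \mathcal{D})}{\partial m_j}\,\right|_{m=\mathbf{1}} \;=\; \left|\, w_j \, \frac{\partial L}{\partial w_j}\,\right| ,
\]

and the connections with the largest \(s_j\) are kept: trained in FreezeNet, retained rather than pruned in SNIP.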
 
Literature
1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. CoRR abs/1603.04467 (2016)
2. Bellec, G., Kappel, D., Maass, W., Legenstein, R.: Deep rewiring: training very sparse deep networks. In: International Conference on Learning Representations. OpenReview.net (2018)
3. Carreira-Perpinan, M.A., Idelbayev, Y.: “Learning-compression” algorithms for neural net pruning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8532–8541. IEEE Computer Society (2018)
4. Chauvin, Y.: A back-propagation algorithm with optimal use of hidden units. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1. Morgan-Kaufmann, Burlington (1989)
6. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE Computer Society (2009)
7. Dettmers, T., Zettlemoyer, L.: Sparse networks from scratch: faster training without losing performance. CoRR abs/1907.04840 (2019)
8. Ding, X., Ding, G., Zhou, X., Guo, Y., Han, J., Liu, J.: Global sparse momentum SGD for pruning very deep neural networks. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 6382–6394. Curran Associates, Inc. (2019)
9. Dong, X., Huang, J., Yang, Y., Yan, S.: More is less: a more complicated network with less inference complexity. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
10. Dong, X., Yang, Y.: Network pruning via transformable architecture search. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 760–771. Curran Associates, Inc. (2019)
11. Duda, J., Tahboub, K., Gadgil, N.J., Delp, E.J.: The use of asymmetric numeral systems as an accurate replacement for Huffman coding. In: Picture Coding Symposium, pp. 65–69 (2015)
12. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: International Conference on Learning Representations (2018)
13. Frankle, J., Dziugaite, G.K., Roy, D.M., Carbin, M.: Stabilizing the lottery ticket hypothesis. CoRR abs/1903.01611 (2019)
14. Frankle, J., Schwab, D.J., Morcos, A.S.: Training BatchNorm and only BatchNorm: on the expressive power of random features in CNNs. CoRR abs/2003.00152 (2020)
15. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Comput. 4(1), 1–58 (1992)
16. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, M. (eds.) Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. PMLR (2010)
17. Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 1379–1387. Curran Associates, Inc. (2016)
19. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28. Curran Associates, Inc. (2015)
20. Hanson, S.J., Pratt, L.Y.: Comparing biases for minimal network construction with back-propagation. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1. Morgan-Kaufmann, Burlington (1989)
21. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. CoRR abs/1502.01852 (2015)
23. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning, pp. 448–456. PMLR (2015)
24. Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV, pp. 2146–2153. IEEE (2009)
25. Karnin, E.D.: A simple procedure for pruning back-propagation trained neural networks. IEEE Trans. Neural Networks 1(2), 239–242 (1990)
27. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
28. LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 2. Morgan-Kaufmann, Burlington (1990)
29. Lee, N., Ajanthan, T., Torr, P.: SNIP: single-shot network pruning based on connection sensitivity. In: International Conference on Learning Representations. OpenReview.net (2019)
30. Mocanu, D., Mocanu, E., Stone, P., Nguyen, P., Gibescu, M., Liotta, A.: Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat. Commun. 9, 1–12 (2018)
31. Mozer, M.C., Smolensky, P.: Skeletonization: a technique for trimming the fat from a network via relevance assessment. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems 1. Morgan-Kaufmann, Burlington (1989)
32. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019)
33. Ramanujan, V., Wortsman, M., Kembhavi, A., Farhadi, A., Rastegari, M.: What’s hidden in a randomly weighted neural network? In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society (2020)
35. Saxe, A., Koh, P.W., Chen, Z., Bhand, M., Suresh, B., Ng, A.: On random weights and unsupervised feature learning. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning, pp. 1089–1096. ACM (2011)
36. Schwartz, R., Dodge, J., Smith, N.A., Etzioni, O.: Green AI. CoRR abs/1907.10597 (2019)
37. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) International Conference on Learning Representations. OpenReview.net (2015)
38. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1139–1147. PMLR (2013)
39. Wang, C., Zhang, G., Grosse, R.: Picking winning tickets before training by preserving gradient flow. In: International Conference on Learning Representations. OpenReview.net (2020)
40. Wortsman, M., Farhadi, A., Rastegari, M.: Discovering neural wirings. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 2684–2694. Curran Associates, Inc. (2019)
41. Xie, S., Kirillov, A., Girshick, R., He, K.: Exploring randomly wired neural networks for image recognition. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019)
42. Zhou, H., Lan, J., Liu, R., Yosinski, J.: Deconstructing lottery tickets: zeros, signs, and the supermask. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 3597–3607. Curran Associates, Inc. (2019)
Metadata
Title
FreezeNet: Full Performance by Reduced Storage Costs
Authors
Paul Wimmer
Jens Mehnert
Alexandru Condurache
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-69544-6_41