2021 | OriginalPaper | Chapter

The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks

Authors: Karim Huesmann, Luis Garcia Rodriguez, Lars Linsen, Benjamin Risse

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by a diverging training and test loss. However, the underlying dynamics of how the flow of activations induces overfitting are poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable-AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature-extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies, so our measures can be used as a reliable indicator of overfitting while decoupling the network’s generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity formulation can be used to explicitly penalise the emergence of sparsity during training, so that the impact of reduced sparsity on overfitting can be studied in real time. Applying this penalty and analysing activation sparsity for well-known regularisers and in common network architectures supports the hypothesis that reduced activation sparsity can effectively improve generalisation and classification performance. In line with other recent work on this topic, our methods reveal novel insights into the seemingly contradictory concepts of activation sparsity and network capacity by demonstrating that dense activations can enable discriminative feature learning while efficiently exploiting the capacity of deep models without suffering from overfitting, even when trained excessively.
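
The abstract refers to a differentiable, perplexity-based sparsity measure and penalty without giving their exact form here. The following is a minimal PyTorch sketch, assuming sparsity is defined as one minus the normalised perplexity (exponentiated Shannon entropy) of a layer's non-negative activation distribution; the function name perplexity_sparsity and the penalty weight are illustrative assumptions, not the chapter's actual formulation.

import torch

def perplexity_sparsity(activations: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Layer-wise sparsity in [0, 1]: near 0 for dense activations, near 1 for sparse ones.

    `activations` is assumed to have shape (batch, units), e.g. flattened ReLU outputs.
    """
    a = activations.clamp_min(0.0)                 # non-negative magnitudes
    p = a / (a.sum(dim=1, keepdim=True) + eps)     # per-sample distribution over units
    entropy = -(p * (p + eps).log()).sum(dim=1)    # Shannon entropy of that distribution
    perplexity = entropy.exp()                     # "effective number" of active units
    n_units = activations.shape[1]
    return (1.0 - perplexity / n_units).mean()     # average sparsity over the batch

# Hypothetical usage: add the measure of selected hidden layers to the task loss,
# weighted by a tunable coefficient, to penalise emerging sparsity during training.
# loss = task_loss + 0.01 * sum(perplexity_sparsity(act) for act in hidden_activations)

Because the measure is built from differentiable operations only, it can be both plotted per layer as a diagnostic and back-propagated as a regulariser, which is the dual use the abstract describes.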

Metadata
Title: The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks
Authors: Karim Huesmann, Luis Garcia Rodriguez, Lars Linsen, Benjamin Risse
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-68796-0_10
