Published in: Neural Computing and Applications 1/2022

29-08-2021 | Original Article

EPMC: efficient parallel memory compression in deep neural network training

Authors: Zailong Chen, Shenghong Yang, Chubo Liu, Yikun Hu, Kenli Li, Keqin Li


Abstract

Deep neural networks (DNNs) are growing deeper and larger, making memory one of the most severe bottlenecks during training. Researchers have found that the feature maps generated during DNN training occupy the major portion of the memory footprint. To reduce memory demand, prior work proposed encoding the feature maps in the forward pass and decoding them in the backward pass. However, we observe that encoding and decoding are time-consuming, severely slowing down DNN training. To solve this problem, we present EPMC, an efficient parallel memory compression framework that simultaneously reduces the memory footprint and the impact of encoding/decoding on DNN training. Our framework employs pipeline-parallel optimization and specific-layer parallelism for encoding and decoding to reduce their impact on overall training, and it combines precision reduction with encoding to improve the data compression ratio. We evaluate EPMC across four state-of-the-art DNNs. Experimental results show that EPMC reduces the memory footprint during training by 2.3 times on average without accuracy loss. In addition, it reduces DNN training time by more than 2.1 times on average compared with the unoptimized encoding/decoding scheme. Moreover, compared with the common Compressed Sparse Row compression scheme, EPMC achieves a 2.2 times higher data compression ratio.
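As a rough illustration of the encode-in-forward / decode-in-backward idea the abstract describes, the minimal PyTorch sketch below routes activations saved for backward through a lossy codec that combines precision reduction (FP32 to FP16) with a sparse index-value encoding for mostly-zero (e.g., post-ReLU) feature maps. This is not EPMC's actual implementation, which pipelines and parallelizes the codecs on the GPU; the pack/unpack functions, the toy model, and the chosen encoding are all illustrative assumptions.

```python
# A minimal sketch (illustrative, not EPMC's CUDA-level implementation):
# compress feature maps saved for the backward pass via PyTorch's
# saved-tensors hook API, combining FP16 precision reduction with a
# sparse index/value encoding.

import torch

def pack(t: torch.Tensor):
    # Encode in the forward pass: keep only non-zero values at half
    # precision, plus the indices needed to reconstruct the dense tensor.
    t = t.detach()
    flat = t.flatten()
    idx = flat.nonzero(as_tuple=False).squeeze(1)
    return (t.shape, t.dtype, flat.numel(), idx, flat[idx].half())

def unpack(packed):
    # Decode in the backward pass: scatter the saved values back into a
    # dense tensor of the original shape and precision.
    shape, dtype, numel, idx, vals = packed
    flat = torch.zeros(numel, dtype=dtype, device=vals.device)
    flat[idx] = vals.to(dtype)
    return flat.view(shape)

# A hypothetical toy model; real workloads are deep CNNs such as those
# evaluated in the paper.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
x = torch.randn(64, 512)

# Every tensor saved for backward inside this context is routed through
# pack/unpack, shrinking the training-time memory footprint at the cost
# of extra encode/decode work.
with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(x).sum()
loss.backward()
```

Note that this sketch runs the encode and decode steps serially on the training critical path; that serialization is precisely the slowdown the abstract identifies, and which EPMC addresses by overlapping the codec work with training through pipeline parallelism and specific-layer parallelism.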


Metadata
Title
EPMC: efficient parallel memory compression in deep neural network training
Authors
Zailong Chen
Shenghong Yang
Chubo Liu
Yikun Hu
Kenli Li
Keqin Li
Publication date
29-08-2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 1/2022
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-06433-5
