PAC-Bayesian framework based drop-path method for 2D discriminative convolutional network pruning

Multidimensional Systems and Signal Processing

Abstract

Deep convolutional neural networks (CNNs) have demonstrated extraordinary power on various visual tasks such as object detection and classification. However, it remains challenging to deploy state-of-the-art models in real-world applications, such as autonomous vehicles, because of their high computational cost. In this paper, to accelerate network inference, we introduce a novel pruning method named Drop-path that reduces the number of parameters of 2D deep CNNs. Given a trained deep CNN, paths of different lengths are pruned by ranking the influence of the neurons in each layer on the probably approximately correct (PAC) Bayesian bound of the model. We argue that keeping the PAC-Bayesian bound invariant is an important factor in preserving the generalization ability of a deep CNN while pruning it as aggressively as possible. To the best of our knowledge, this is the first work to reduce model size based on a generalization error bound. After pruning, we observe that the convolutional kernels themselves become sparse rather than some of them being removed outright. Drop-path is generic and generalizes well to multi-layer and multi-branch models, since the parameter ranking criterion can be applied to any kind of layer and the importance scores can still be propagated. Finally, Drop-path is evaluated on two image classification benchmark datasets (ImageNet and CIFAR-10) with multiple deep CNN models, including AlexNet, VGG-16, GoogLeNet, and ResNet-34/50/56/110. Experimental results demonstrate that Drop-path achieves significant model compression and acceleration with negligible accuracy loss.
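The abstract describes Drop-path at a high level: rank the neurons or weights of each layer by their influence on the model's PAC-Bayesian bound, then drop the least influential ones so that the kernels become sparse rather than being removed. The paper's actual ranking criterion is derived from the PAC-Bayesian framework and is not reproduced in this excerpt; the sketch below is only a minimal, hypothetical illustration of importance-ranked kernel sparsification, in which a simple weight-magnitude proxy stands in for the true PAC-Bayes sensitivity score. The function name drop_path_style_prune and the magnitude proxy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def drop_path_style_prune(kernels, sparsity=0.5, importance=None):
    """Sparsify convolutional kernels by zeroing low-importance weights.

    kernels    : array of shape (out_channels, in_channels, k, k)
    sparsity   : fraction of weights to zero out
    importance : per-weight importance scores of the same shape; defaults to
                 |w|, a hypothetical stand-in for the paper's PAC-Bayes
                 sensitivity criterion, which is not given in this excerpt.
    """
    if importance is None:
        importance = np.abs(kernels)       # proxy score (assumption)
    threshold = np.quantile(importance, sparsity)
    mask = importance > threshold          # keep only the most influential weights
    return kernels * mask, mask

# Toy usage: sparsify 60% of the weights of a random 16x8x3x3 conv layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8, 3, 3))
w_pruned, mask = drop_path_style_prune(w, sparsity=0.6)
print("remaining nonzero fraction:", mask.mean())
```

In the full method, the importance scores would instead reflect each parameter's effect on the generalization error bound and, for multi-layer and multi-branch models, be propagated across layers before pruning.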

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2018YFC0831503), the National Natural Science Foundation of China (Grant No. 61571275), the China Computer Program for Education and Scientific Research (Grant No. NGII20161001), and the Fundamental Research Funds of Shandong University (Grant No. 2018JC040).

Author information

Corresponding author

Correspondence to Mingqiang Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, Q., Tian, X., Yang, M. et al. PAC-Bayesian framework based drop-path method for 2D discriminative convolutional network pruning. Multidim Syst Sign Process 31, 793–827 (2020). https://doi.org/10.1007/s11045-019-00686-z
