PAC-Bayesian framework based drop-path method for 2D discriminative convolutional network pruning

Multidimensional Systems and Signal Processing

Abstract

Deep convolutional neural networks (CNNs) have demonstrated extraordinary power on various visual tasks such as object detection and classification. However, it remains challenging to deploy state-of-the-art models in real-world applications, such as autonomous vehicles, because of their high computational cost. In this paper, to accelerate network inference, we introduce a novel pruning method named Drop-path that reduces the number of parameters of 2D deep CNNs. Given a trained deep CNN, paths of different lengths are pruned by ranking the influence of the neurons in each layer on the probably approximately correct (PAC) Bayesian bound of the model. We argue that keeping the PAC-Bayesian bound invariant is an important factor in preserving the generalization ability of a deep CNN while pruning it as aggressively as possible. To the best of our knowledge, this is the first work to reduce model size based on a generalization error bound. After pruning, we observe that the convolutional kernels themselves become sparse rather than some of them being removed outright. Drop-path is generic and generalizes well to multi-layer and multi-branch models, since the parameter ranking criterion can be applied to any kind of layer and the importance scores can still be propagated. Finally, Drop-path is evaluated on two image classification benchmark datasets (ImageNet and CIFAR-10) with multiple deep CNN models, including AlexNet, VGG-16, GoogLeNet, and ResNet-34/50/56/110. Experimental results demonstrate that Drop-path achieves significant model compression and acceleration with negligible accuracy loss.
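The abstract describes Drop-path at a high level: rank the neurons or weights of each layer by their influence on the model's PAC-Bayesian bound, then drop the least influential ones so that the kernels become sparse rather than being removed. The paper's actual ranking criterion is derived from the PAC-Bayesian framework and is not reproduced in this excerpt; the sketch below is only a minimal, hypothetical illustration of importance-ranked kernel sparsification, in which a simple weight-magnitude proxy stands in for the true PAC-Bayes sensitivity score. The function name drop_path_style_prune and the magnitude proxy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def drop_path_style_prune(kernels, sparsity=0.5, importance=None):
    """Sparsify convolutional kernels by zeroing low-importance weights.

    kernels    : array of shape (out_channels, in_channels, k, k)
    sparsity   : fraction of weights to zero out
    importance : per-weight importance scores of the same shape; defaults to
                 |w|, a hypothetical stand-in for the paper's PAC-Bayes
                 sensitivity criterion, which is not given in this excerpt.
    """
    if importance is None:
        importance = np.abs(kernels)       # proxy score (assumption)
    threshold = np.quantile(importance, sparsity)
    mask = importance > threshold          # keep only the most influential weights
    return kernels * mask, mask

# Toy usage: sparsify 60% of the weights of a random 16x8x3x3 conv layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8, 3, 3))
w_pruned, mask = drop_path_style_prune(w, sparsity=0.6)
print("remaining nonzero fraction:", mask.mean())
```

In the full method, the importance scores would instead reflect each parameter's effect on the generalization error bound and, for multi-layer and multi-branch models, be propagated across layers before pruning.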

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2018YFC0831503), the National Natural Science Foundation of China (Grant No. 61571275), the China Computer Program for Education and Scientific Research (Grant No. NGII20161001), and the Fundamental Research Funds of Shandong University (Grant No. 2018JC040).

Author information

Corresponding author

Correspondence to Mingqiang Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, Q., Tian, X., Yang, M. et al. PAC-Bayesian framework based drop-path method for 2D discriminative convolutional network pruning. Multidim Syst Sign Process 31, 793–827 (2020). https://doi.org/10.1007/s11045-019-00686-z
