Published in: The Journal of Supercomputing 4/2021

04.09.2020

Bayesian neural networks at scale: a performance analysis and pruning study

Authors: Himanshu Sharma, Elise Jennings



Abstract

Bayesian neural networks (BNNs) are a promising method of obtaining statistical uncertainties for neural network predictions, but they come with a higher computational overhead that can limit their practical usage. This work explores the use of high-performance computing with distributed training to address the challenges of training BNNs at scale. We present a performance and scalability comparison of training the VGG-16 and Resnet-18 models on a Cray-XC40 cluster. We demonstrate that network pruning can speed up inference without accuracy loss and provide an open-source software package, BPrune, to automate this pruning. For certain models we find that pruning up to 80% of the network results in only a 7.0% loss in accuracy. With the development of new hardware accelerators for deep learning, BNNs are of considerable interest for benchmarking performance. This analysis of training a BNN at scale outlines the limitations and benefits compared to a conventional neural network.
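For context on the overhead mentioned above, the following is a minimal sketch, not the authors' VGG-16 or Resnet-18 implementation, of a small Bayesian convolutional block built with the TensorFlow Probability Flipout layers cited in the references. Each Flipout layer learns a mean and scale for every weight and registers a KL divergence term, which is the source of the extra compute and memory cost relative to a conventional CNN. The layer sizes and input shape below are hypothetical.

import tensorflow as tf
import tensorflow_probability as tfp

model = tf.keras.Sequential([
    tfp.layers.Convolution2DFlipout(64, kernel_size=3, padding="same",
                                    activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tfp.layers.DenseFlipout(10),
])

# A forward pass on a dummy CIFAR-10-sized batch builds the layers and registers
# one KL divergence term per Bayesian layer in model.losses; during training this
# term is added to the negative log-likelihood to form the variational objective.
logits = model(tf.zeros([8, 32, 32, 3]))
kl = tf.reduce_sum(model.losses)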


Footnotes
1
The communication efficiency is calculated as the ratio of communication time (MPI_WTIME) to elapsed time (which includes MPI_INIT and MPI_FINALIZE). For a 16-node run, the efficiencies of the BNN VGG and Resnet models are 86.81% and 87.59%, respectively, while those of the CNN VGG and Resnet models are 80.26% and 88.59%, respectively. For a 128-node run, the BNN VGG and Resnet communication efficiencies are 91.15% and 94.91%, while the CNN VGG and Resnet efficiencies are 86.99% and 89.11%, respectively.
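As an illustration of this metric, here is a minimal sketch using mpi4py; the timing hooks, buffer, and loop are hypothetical stand-ins and not taken from the paper's Cray training setup. It accumulates time spent in communication calls with MPI_WTIME and divides by the total elapsed time.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
t_start = MPI.Wtime()              # wall-clock reference taken just after MPI init
comm_time = 0.0

grads = np.random.rand(1_000_000)  # stand-in for a flattened gradient buffer
for step in range(100):            # stand-in for training steps
    # ... local forward/backward computation would happen here ...
    t0 = MPI.Wtime()
    comm.Allreduce(MPI.IN_PLACE, grads, op=MPI.SUM)   # gradient aggregation
    comm_time += MPI.Wtime() - t0

elapsed = MPI.Wtime() - t_start
if comm.rank == 0:
    print(f"communication efficiency = {100.0 * comm_time / elapsed:.2f}%")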
 
2
The configuration of the GPU system used at ALCF is as follows: 8x Tesla V100 GPUs with 128 GB total GPU memory, dual 20-core Intel Xeon E5-2698 v4 CPUs at 2.2 GHz, 40,960 NVIDIA CUDA cores, 5120 NVIDIA Tensor cores, 512 GB of 2133 MHz DDR4 LRDIMM system memory, 4x 1.92 TB SSD in RAID 0 for storage, and dual 10 GbE plus 4x IB EDR networking.
 
4
The ratio for the Gaussian prior over weights can be calculated simply as \(|\mu| / \sigma\) [6]. The initial BPrune release supports Gaussian priors; other choices of distribution will be supported in future releases.
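A minimal NumPy sketch of this criterion follows (the function and variable names are hypothetical; BPrune automates this pruning for trained models). Weights whose \(|\mu|/\sigma\) ratio falls below a chosen threshold are masked to zero.

import numpy as np

def prune_mask(mu, sigma, threshold):
    """Return a 0/1 mask keeping weights whose |mu|/sigma exceeds `threshold`."""
    snr = np.abs(mu) / sigma
    return (snr > threshold).astype(mu.dtype)

# Example with randomly generated posterior means and standard deviations
# for a single layer.
mu = np.random.randn(512, 256)
sigma = 0.1 + 0.5 * np.random.rand(512, 256)
mask = prune_mask(mu, sigma, threshold=1.0)
print(f"fraction of weights pruned: {1.0 - mask.mean():.2%}")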
 
References
1.
Neal RM (1995) Bayesian learning for neural networks. Technical report
2.
Williams C (1996) Computing with infinite networks. In: Advances in neural information processing systems, vol 9. MIT Press, Cambridge, pp 295–301
3.
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Comput 4(3):448–472
4.
Hinton G, Van Camp D (1993) Keeping neural networks simple by minimizing the description length of the weights. In: Proceedings of the 6th Annual ACM Conference on Computational Learning Theory. Citeseer
5.
Barber D, Bishop CM (1998) Ensemble learning in Bayesian neural networks. NATO ASI Ser Ser F Comput Syst Sci 168:215–237
6.
Graves A (2011) Practical variational inference for neural networks. In: Advances in neural information processing systems, pp 2348–2356
7.
Hoffman MD, Blei DM, Wang C, Paisley J (2013) Stochastic variational inference. J Mach Learn Res 14(1):1303–1347
8.
10.
Rezende DJ, Mohamed S, Wierstra D (2014) Stochastic backpropagation and approximate inference in deep generative models
11.
Titsias M, Lázaro-Gredilla M (2014) Doubly stochastic variational Bayes for non-conjugate inference. In: International Conference on Machine Learning, pp 1971–1979
12.
13.
Shridhar K, Laumann F, Liwicki M (2019) A comprehensive guide to Bayesian convolutional neural network with variational inference. arXiv preprint arXiv:1901.02731
14.
Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
15.
Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp 1050–1059
16.
Tran D, Dusenberry M, van der Wilk M, Hafner D (2019) Bayesian layers: a module for neural network uncertainty. In: Advances in neural information processing systems, pp 14633–14645
17.
Shazeer N, Cheng Y, Parmar N, Tran D, Vaswani A, Koanantakool P, Hawkins P, Lee H, Hong M, Young C et al (2018) Mesh-tensorflow: deep learning for supercomputers. In: Advances in neural information processing systems, pp 10414–10423
18.
Wen Y, Vicol P, Ba J, Tran D, Grosse R (2018) Flipout: efficient pseudo-independent weight perturbations on mini-batches. arXiv preprint arXiv:1803.04386
19.
Tsyplikhin A (2019) Graphcore delivers 26x performance gains for finance customers
22.
Baydin AG, Shao L, Bhimji W, Heinrich L, Meadows L, Liu J, Munk A, Naderiparizi S, Gram-Hansen B, Louppe G et al (2019) Etalumis: bringing probabilistic programming to scientific simulators at scale. arXiv preprint arXiv:1907.03382
23.
Viebke A, Memeti S, Pllana S, Abraham A (2019) Chaos: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi. J Supercomput 75(1):197–227
24.
ALCF (2019/2020) Xc40 machine overview. Technical report
26.
27.
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
28.
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
29.
Bowman SR, Vilnis L, Vinyals O, Dai AM, Jozefowicz R, Bengio S (2015) Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349
30.
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2017) beta-vae: learning basic visual concepts with a constrained variational framework. ICLR 2(5):6
31.
32.
Liu X, Gao J, Celikyilmaz A, Carin L et al (2019) Cyclical annealing schedule: a simple approach to mitigating kl vanishing. arXiv preprint arXiv:1903.10145
34.
Naesseth CA, Ruiz FJR, Linderman SW, Blei DM (2017) Reparameterization gradients through acceptance–rejection sampling algorithms. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017
35.
Krizhevsky A et al (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer
37.
Loosli G, Canu S, Bottou L (2007) Training invariant support vector machines using selective sampling
38.
39.
Blei DM, Kucukelbir A, McAuliffe JD (2017) Variational inference: a review for statisticians. University of California, Berkeley
40.
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: a probabilistic programming language. J Stat Softw 76(1)
41.
Stan Development Team et al (2017) PyStan: the Python interface to Stan. Version 2.16.0.0
42.
Bingham E, Chen JP, Jankowiak M, Obermeyer F, Pradhan N, Karaletsos T, Singh R, Szerlip P, Horsfall P, Goodman ND (2019) Pyro: deep universal probabilistic programming. J Mach Learn Res 20(1):973–978
43.
Cusumano-Towner MF, Saad FA, Lew AK, Mansinghka VK (2019) Gen: a general-purpose probabilistic programming system with programmable inference. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019. ACM, New York, pp 221–236
44.
Dillon JV, Langmore I, Tran D, Brevdo E, Vasudevan S, Moore D, Patton B, Alemi A, Hoffman M, Saurous RA (2017) Tensorflow distributions. arXiv preprint arXiv:1711.10604
46.
Laanait N, Romero J, Yin J, Young MT, Treichler S, Starchenko V, Borisevich A, Sergeev A, Matheson M (2019) Exascale deep learning for scientific inverse problems. arXiv preprint arXiv:1909.11150
48.
LeCun Y, Denker JS, Solla SA (1990) Optimal brain damage. In: Advances in neural information processing systems, pp 598–605
49.
Giles CL, Omlin CW (1994) Pruning recurrent neural networks for improved generalization performance. IEEE Trans Neural Netw 5(5):848–851
50.
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from www.tensorflow.org. Accessed June 2019
Metadata
Title
Bayesian neural networks at scale: a performance analysis and pruning study
Authors
Himanshu Sharma
Elise Jennings
Publication date
04.09.2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 4/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03401-z
