Published in: International Journal of Parallel Programming 4/2018

03.10.2017

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks

Authors: Yuntao Lu, Chao Wang, Lei Gong, Xuehai Zhou

Abstract

Neural networks are widely used as a powerful representation in research domains such as computer vision, natural language processing, and artificial intelligence. As applications demand better accuracy, the growing number of neurons and synapses makes neural networks both computationally and memory intensive, and consequently difficult to deploy on resource-limited platforms. Sparse methods can remove redundant neurons and synapses, but conventional accelerators cannot benefit from the resulting sparsity. In this paper, we propose an efficient acceleration method for sparse neural networks that compresses synapse weights and processes the compressed structure on an FPGA accelerator. Our method achieves compression ratios of 40% and 20% for synapse weights in convolutional and fully connected layers, respectively. Experimental results demonstrate that our method enables an FPGA accelerator to achieve a 3× speedup over a conventional one.
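The abstract page does not reproduce the paper's on-chip weight encoding, so the following is an illustration only: a minimal Python sketch assuming a CSR-style (compressed sparse row) format, which is a common way to store pruned synapse weights so that computation touches only the retained entries. The names `compress_csr` and `spmv`, and the 20% density in the example, are hypothetical and not taken from the paper.

```python
import numpy as np

def compress_csr(weights, threshold=0.0):
    """Compress a dense weight matrix into CSR-style arrays.

    Keeps only entries whose magnitude exceeds `threshold`;
    returns (values, col_idx, row_ptr).
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in weights:
        keep = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[keep])
        col_idx.extend(keep)
        row_ptr.append(len(values))
    return (np.array(values),
            np.array(col_idx, dtype=np.int64),
            np.array(row_ptr, dtype=np.int64))

def spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply over the compressed structure,
    visiting only the retained (non-pruned) synapses."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# Example: a mostly-zero layer retaining roughly 20% of its weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * (rng.random((64, 64)) < 0.2)
vals, cols, ptr = compress_csr(W)
x = rng.standard_normal(64)
assert np.allclose(spmv(vals, cols, ptr, x), W @ x)
```

On an FPGA, the index arithmetic in `spmv` would typically be realized as dedicated decode logic feeding multiply-accumulate units, which is what lets a sparse accelerator skip pruned synapses entirely rather than multiplying by zero.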


Metadata
Title
SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks
Authors
Yuntao Lu
Chao Wang
Lei Gong
Xuehai Zhou
Publication date
03.10.2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0528-8
