Published in: International Journal of Parallel Programming 4/2018

03.10.2017

SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks

Authors: Yuntao Lu, Chao Wang, Lei Gong, Xuehai Zhou

Abstract

Neural networks are widely used as a powerful representation in research domains such as computer vision, natural language processing, and artificial intelligence. As applications demand better accuracy, the growing number of neurons and synapses makes neural networks both computationally and memory intensive, and consequently difficult to deploy on resource-limited platforms. Sparse methods can remove redundant neurons and synapses, but conventional accelerators cannot benefit from the resulting sparsity. In this paper, we propose an efficient acceleration method for sparse neural networks that compresses synapse weights and processes the compressed structure on an FPGA accelerator. Our method achieves compression ratios of 40% and 20% for synapse weights in convolutional and fully connected layers, respectively. Experimental results demonstrate that our method enables an FPGA accelerator to achieve a 3× speedup over a conventional one.
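The abstract page does not reproduce the paper's on-chip weight encoding, so the following is an illustration only: a minimal Python sketch assuming a CSR-style (compressed sparse row) format, which is a common way to store pruned synapse weights so that computation touches only the retained entries. The names `compress_csr` and `spmv`, and the 20% density in the example, are hypothetical and not taken from the paper.

```python
import numpy as np

def compress_csr(weights, threshold=0.0):
    """Compress a dense weight matrix into CSR-style arrays.

    Keeps only entries whose magnitude exceeds `threshold`;
    returns (values, col_idx, row_ptr).
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in weights:
        keep = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[keep])
        col_idx.extend(keep)
        row_ptr.append(len(values))
    return (np.array(values),
            np.array(col_idx, dtype=np.int64),
            np.array(row_ptr, dtype=np.int64))

def spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector multiply over the compressed structure,
    visiting only the retained (non-pruned) synapses."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# Example: a mostly-zero layer retaining roughly 20% of its weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)) * (rng.random((64, 64)) < 0.2)
vals, cols, ptr = compress_csr(W)
x = rng.standard_normal(64)
assert np.allclose(spmv(vals, cols, ptr, x), W @ x)
```

On an FPGA, the index arithmetic in `spmv` would typically be realized as dedicated decode logic feeding multiply-accumulate units, which is what lets a sparse accelerator skip pruned synapses entirely rather than multiplying by zero.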


Metadata
Title
SparseNN: A Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks
Authors
Yuntao Lu
Chao Wang
Lei Gong
Xuehai Zhou
Publication date
03.10.2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0528-8
