
2019 | Original Paper | Book Chapter

Deep Fusion: A Software Scheduling Method for Memory Access Optimization

Authors: Yimin Zhuang, Shaohui Peng, Xiaobing Chen, Shengyuan Zhou, Tian Zhi, Wei Li, Shaoli Liu

Published in: Network and Parallel Computing

Publisher: Springer International Publishing


Abstract

Deep neural networks (DNNs) are regarded as state-of-the-art artificial intelligence methods across a broad range of applications. However, DNNs are both compute-intensive and memory-intensive, which makes them difficult to deploy in practical scenarios. Because DNNs are well suited to parallel computation, a series of DNN accelerators have been proposed. However, growing on-chip computing capacity and the increasing number of parameters in neural networks make memory access a bottleneck. In this paper, we analyze existing DNN algorithms and observe that the special structure of neural networks gives them two useful characteristics: unilateral directivity and local independence. Based on these characteristics, we propose a general software scheduling method to reduce memory access cost. Experimental results show that our method reduces memory access cost by 32% and achieves a speedup of 1.6x on average on our experimental platform. The best result is for ResNet-50, with a memory access reduction of up to 56% and a speedup of up to 2.62x.
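The abstract does not describe the scheduling algorithm itself, so the following is not the authors' method. It is only a toy sketch of the general idea that keeping intermediate results on chip can cut off-chip memory traffic. It assumes two stacked 1-D convolution layers and counts every element read from or written to off-chip memory. All names (`conv1d`, `layer_by_layer`, `fused`, `tile`) are hypothetical and chosen for illustration.

```python
def conv1d(x, w):
    """Valid 1-D correlation of list x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def layer_by_layer(x, w1, w2):
    """Run the two layers one after the other.

    The intermediate result is written off chip and read back,
    so it is counted twice in the traffic total.
    """
    traffic = len(x)              # read the input
    mid = conv1d(x, w1)
    traffic += len(mid)           # write the intermediate result
    traffic += len(mid)           # read the intermediate result back
    out = conv1d(mid, w2)
    traffic += len(out)           # write the output
    return out, traffic


def fused(x, w1, w2, tile):
    """Compute the output tile by tile through both layers.

    Each output tile depends only on a local input window, so the
    intermediate values for that tile never leave the chip. The price
    is that neighbouring tiles re-read a small overlapping halo.
    """
    halo = (len(w1) - 1) + (len(w2) - 1)
    n_out = len(x) - halo
    out, traffic = [], 0
    for start in range(0, n_out, tile):
        stop = min(start + tile, n_out)
        x_tile = x[start:stop + halo]       # input window for this tile
        traffic += len(x_tile)              # read the window (with halo)
        mid_tile = conv1d(x_tile, w1)       # stays on chip (assumption)
        out_tile = conv1d(mid_tile, w2)
        traffic += len(out_tile)            # write the output tile
        out.extend(out_tile)
    return out, traffic


x = list(range(64))
w1, w2 = [1, 2, 1], [1, 0, -1]
ref_out, ref_traffic = layer_by_layer(x, w1, w2)
fused_out, fused_traffic = fused(x, w1, w2, tile=16)
print(ref_out == fused_out, ref_traffic, fused_traffic)
```

In this toy model both schedules produce the same output, while the fused schedule moves fewer elements because the intermediate layer never makes a round trip to off-chip memory. The figures it prints say nothing about the results reported in the paper, which come from the authors' own platform and networks.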


Metadata
Title
Deep Fusion: A Software Scheduling Method for Memory Access Optimization
Authors
Yimin Zhuang
Shaohui Peng
Xiaobing Chen
Shengyuan Zhou
Tian Zhi
Wei Li
Shaoli Liu
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-30709-7_22
