2019 | OriginalPaper | Chapter

PRTSM: Hardware Data Arrangement Mechanisms for Convolutional Layer Computation on the Systolic Array

Authors: Shuquan Wang, Lei Wang, Shiming Li, Tian Shuo, Shasha Guo, Ziyang Kang, Shuzheng Zhang, Weixia Xu

Published in: Network and Parallel Computing

Publisher: Springer International Publishing

Abstract

A systolic array is an array of processing units that share an internal data flow. Because the 2D systolic array naturally fits the multiply-accumulate (MAC) operation, many groups use it to accelerate deep neural network (DNN) computation. However, the performance of the systolic array is limited by data bandwidth. Some groups address this problem with loop tiling but pay little attention to the pixel reuse potential of the convolutional layer. In this paper, we propose PRTSM (Pixels Reuse with Time and Spatial Multiplexing), a novel method that reuses the pixels of the input feature map through time and spatial multiplexing. It significantly reduces bandwidth pressure and shortens the data preparation time for convolutional layers on the systolic array. We propose three algorithms for this method and implement the corresponding hardware mechanisms on a Xilinx XCVU440 FPGA. Experiments show that our hardware mechanisms reduce off-chip traffic by at least 72.03%. The proposed mechanisms reach a peak performance of 64.034 GOPS at a frequency of 167 MHz.
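To make the "pixel reuse potential" of a convolutional layer concrete, the following is a minimal Python sketch. It is not the PRTSM algorithm, whose details the abstract does not give. It only compares two idealized fetch counts for a single-channel input: loading every sliding window independently, versus loading each covered input pixel exactly once. The function names, the single-channel setting, and the no-padding assumption are illustrative choices, not taken from the paper.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one dimension for a convolution without padding."""
    return (size - kernel) // stride + 1


def loads_without_reuse(h: int, w: int, k: int, stride: int = 1) -> int:
    """Pixels fetched when each k x k window is loaded independently."""
    out_h = conv_output_size(h, k, stride)
    out_w = conv_output_size(w, k, stride)
    return out_h * out_w * k * k


def loads_with_full_reuse(h: int, w: int, k: int, stride: int = 1) -> int:
    """Pixels fetched when every covered input pixel is loaded exactly once."""
    out_h = conv_output_size(h, k, stride)
    out_w = conv_output_size(w, k, stride)
    # Consecutive windows advance by `stride`; when stride exceeds k
    # the windows do not overlap, so each contributes k fresh rows/columns.
    step = min(stride, k)
    covered_h = (out_h - 1) * step + k
    covered_w = (out_w - 1) * step + k
    return covered_h * covered_w


def traffic_reduction(h: int, w: int, k: int, stride: int = 1) -> float:
    """Fraction of pixel fetches avoided by ideal reuse (0.0 means none)."""
    naive = loads_without_reuse(h, w, k, stride)
    reused = loads_with_full_reuse(h, w, k, stride)
    return 1.0 - reused / naive


if __name__ == "__main__":
    # Hypothetical 5 x 5 input with a 3 x 3 kernel and stride 1.
    print(loads_without_reuse(5, 5, 3))    # 81
    print(loads_with_full_reuse(5, 5, 3))  # 25
    print(f"{traffic_reduction(5, 5, 3):.2%}")
```

The gap between the two counts grows with kernel size and shrinks with stride, which is why overlapping windows leave room for reuse. The figures produced by this toy model are upper bounds for an idealized case and should not be compared directly with the measured 72.03% reduction reported above.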

Metadata
Title
PRTSM: Hardware Data Arrangement Mechanisms for Convolutional Layer Computation on the Systolic Array
Authors
Shuquan Wang
Lei Wang
Shiming Li
Tian Shuo
Shasha Guo
Ziyang Kang
Shuzheng Zhang
Weixia Xu
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-30709-7_6
