2020 | OriginalPaper | Book chapter

A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors

Authors: Xin Zhou, Rongchun Li, Peng Zhang, Yuntao Liu, Yong Dou

Published in: Parallel Architectures, Algorithms and Programming

Publisher: Springer Singapore

Abstract

Convolutional neural networks (CNNs) play an important role in many fields. Many applications can now run CNN inference with pre-trained models on mobile devices. The improving performance of embedded processors such as ARM-based CPUs makes it possible to meet real-time processing requirements. In this paper, a pipelining strategy is proposed to accelerate convolutional networks on ARM processors. We implement a \(3\times 3\) convolution with Neon instructions, the single-instruction, multiple-data (SIMD) instructions supported by ARM processors. To reduce pipeline stalls, the issue order of instructions is rearranged according to the out-of-order execution and dual-issue mechanisms of ARM processors. A tiling method is used to increase data reuse: the input feature map is divided into multiple \(6\times 6\) tiles, and the computation within each tile is highly optimized using the proposed pipelining strategy. The proposed method achieves a speedup of 2.88 over gcc-compiled code on the RK3288. The effect of the optimization is measured with a performance profiling tool; cycles and cache misses decrease significantly. The multi-threaded version implemented with OpenMP achieves a speedup of 6.8 over the single-threaded gcc-compiled version.
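
The tiling idea in the abstract can be illustrated with a small scalar sketch in C. It is not the authors' Neon kernel and contains no instruction scheduling; it only shows how a \(6\times 6\) input tile could feed a \(3\times 3\) convolution. The \(4\times 4\) output block per tile follows from assuming stride 1 and no padding, which the abstract does not state. The function names `conv3x3_naive` and `conv3x3_tiled` and the border clipping are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_IN  6                      /* input tile edge, as in the abstract */
#define KSIZE    3                      /* 3x3 kernel */
#define TILE_OUT (TILE_IN - KSIZE + 1)  /* assumption: stride 1, no padding */

/* Reference: direct 3x3 convolution; out has (h-2) x (w-2) elements. */
static void conv3x3_naive(const float *in, int h, int w,
                          const float k[9], float *out)
{
    assert(h >= KSIZE && w >= KSIZE);
    int oh = h - KSIZE + 1, ow = w - KSIZE + 1;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < KSIZE; ++i)
                for (int j = 0; j < KSIZE; ++j)
                    acc += in[(size_t)(y + i) * w + (x + j)] * k[i * KSIZE + j];
            out[(size_t)y * ow + x] = acc;
        }
}

/* Tiled variant: each output block of up to 4x4 values reads one input
 * window of up to 6x6 values, so every loaded input is reused by several
 * neighbouring outputs while it sits in the small local buffer. */
static void conv3x3_tiled(const float *in, int h, int w,
                          const float k[9], float *out)
{
    assert(h >= KSIZE && w >= KSIZE);
    int oh = h - KSIZE + 1, ow = w - KSIZE + 1;
    for (int ty = 0; ty < oh; ty += TILE_OUT)
        for (int tx = 0; tx < ow; tx += TILE_OUT) {
            /* Clip the block at the bottom and right borders. */
            int th = (oh - ty < TILE_OUT) ? oh - ty : TILE_OUT;
            int tw = (ow - tx < TILE_OUT) ? ow - tx : TILE_OUT;

            /* Copy the input window into a local tile buffer. */
            float tile[TILE_IN][TILE_IN];
            for (int i = 0; i < th + KSIZE - 1; ++i)
                for (int j = 0; j < tw + KSIZE - 1; ++j)
                    tile[i][j] = in[(size_t)(ty + i) * w + (tx + j)];

            /* Compute all outputs of this block from the buffered tile. */
            for (int y = 0; y < th; ++y)
                for (int x = 0; x < tw; ++x) {
                    float acc = 0.0f;
                    for (int i = 0; i < KSIZE; ++i)
                        for (int j = 0; j < KSIZE; ++j)
                            acc += tile[y + i][x + j] * k[i * KSIZE + j];
                    out[(size_t)(ty + y) * ow + (tx + x)] = acc;
                }
        }
}
```

Both functions take a row-major `h` x `w` input and write `(h-2)` x `(w-2)` outputs, so the tiled version can be checked against the direct one on the same data. In a Neon implementation the inner block loops would be the part replaced by hand-scheduled SIMD instructions.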

Metadata
Title: A Pipelining Strategy for Accelerating Convolutional Networks on ARM Processors
Authors: Xin Zhou, Rongchun Li, Peng Zhang, Yuntao Liu, Yong Dou
Copyright year: 2020
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-15-2767-8_45
