
2020 | OriginalPaper | Chapter

Processing Systems for Deep Learning Inference on Edge Devices


Abstract

Deep learning models are taking over many artificial intelligence tasks. These models achieve better results but require more computing power and memory. Therefore, training and inference of deep learning models are carried out at cloud centers with high-performance platforms. In many applications, it is more beneficial, or even required, to run inference at the edge, near the source of data or action requests, avoiding the need to transmit the data to a cloud service and wait for the answer. In many scenarios, transmission of data to the cloud is unreliable or even impossible, or has a high latency with uncertainty about the round-trip delay of the communication, which is unacceptable for latency-sensitive applications with real-time decisions. Other factors, like security and privacy of data, force the data to stay at the edge. For all these reasons, inference is migrating partially or totally to the edge. The problem is that deep learning models are quite hungry in terms of computation, memory and energy, which are not available in today's edge computing devices. Therefore, artificial intelligence devices targeting edge computing are being deployed by different companies with different markets in mind. In this chapter we describe the current state of algorithms and models for deep learning and analyze the state of the art of computing devices and platforms to deploy deep learning on the edge. We describe the existing computing devices for deep learning and analyze them in terms of different metrics, like performance, power and flexibility. Different technologies are considered, including GPU, CPU, FPGA and ASIC. We explain the trends in computing devices for deep learning on the edge, what has been researched, and what we should expect from future devices.
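To make the abstract's point about compute and memory hunger concrete, the following back-of-the-envelope sketch counts the parameters, multiply-accumulate operations and weight storage of a single convolutional layer. The layer shape (a 3×3 convolution with 256 input and 256 output channels on a 56×56 feature map) is a hypothetical mid-network example chosen for illustration, not a figure from the chapter.

```python
def conv_cost(h, w, c_in, c_out, k, bytes_per_weight=4):
    """Return (parameters, multiply-accumulates, weight bytes) for a
    stride-1, same-padded k x k convolutional layer on an h x w map."""
    params = k * k * c_in * c_out       # one k x k kernel per (in, out) pair
    macs = params * h * w               # every output pixel reuses all weights
    return params, macs, params * bytes_per_weight

params, macs, weight_bytes = conv_cost(56, 56, 256, 256, 3)
print(f"parameters:   {params:,}")                   # 589,824
print(f"MACs:         {macs:,}")                     # ~1.85 billion
print(f"weights (MB): {weight_bytes / 2**20:.2f}")   # 2.25 MB at fp32
```

One such layer already needs on the order of a billion multiply-accumulates per input, and a full network stacks dozens of them, which is why techniques such as quantization and pruning, and dedicated GPU, FPGA and ASIC accelerators, matter for edge inference.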


Metadata
Title
Processing Systems for Deep Learning Inference on Edge Devices
Author
Mário Véstias
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-44907-0_9
