
2020 | Original Paper | Book Chapter

Processing Systems for Deep Learning Inference on Edge Devices


Abstract

Deep learning models now dominate many artificial intelligence tasks. These models achieve better results but require more computing power and memory. Therefore, training and inference of deep learning models are carried out at cloud centers on high-performance platforms. In many applications, it is more beneficial, or even required, to run inference at the edge, near the source of data or action requests, avoiding the need to transmit the data to a cloud service and wait for the answer. In many scenarios, transmission of data to the cloud is unreliable or impossible, or incurs high latency with uncertainty about the round-trip delay of the communication, which is unacceptable for latency-sensitive applications with real-time decisions. Other factors, such as security and privacy of data, force the data to stay at the edge. For all these reasons, inference is migrating partially or totally to the edge. The problem is that deep learning models are quite hungry in terms of computation, memory and energy, resources that are scarce in today's edge computing devices. Therefore, artificial intelligence devices targeting edge computing are being deployed by different companies with different markets in mind. In this chapter we describe the current state of algorithms and models for deep learning and analyze the state of the art of computing devices and platforms to deploy deep learning on the edge. We describe the existing computing devices for deep learning and analyze them in terms of different metrics, such as performance, power and flexibility. Different technologies are considered, including GPU, CPU, FPGA and ASIC. We explain the trends in computing devices for deep learning on the edge, what has been researched, and what we should expect from future devices.
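The latency argument above can be sketched with a simple budget comparison. The figures below are purely illustrative assumptions (not measurements from the chapter): an edge accelerator may be much slower per inference than a cloud GPU, yet still respond sooner once the network round trip is counted.

```python
# Illustrative latency budget: cloud round trip vs. local edge inference.
# All numbers are hypothetical assumptions for the sake of the example.

def cloud_latency_ms(rtt_ms: float, cloud_infer_ms: float) -> float:
    """Total response time when the input is shipped to a cloud service:
    network round-trip delay plus server-side inference time."""
    return rtt_ms + cloud_infer_ms

def edge_latency_ms(edge_infer_ms: float) -> float:
    """Total response time when inference runs locally on the edge device:
    no network transfer is involved."""
    return edge_infer_ms

# Assumed figures: 80 ms round trip, 5 ms cloud inference,
# 50 ms inference on a (10x slower) edge accelerator.
cloud = cloud_latency_ms(rtt_ms=80.0, cloud_infer_ms=5.0)
edge = edge_latency_ms(edge_infer_ms=50.0)
print(f"cloud: {cloud} ms, edge: {edge} ms, edge responds first: {edge < cloud}")
```

Under these assumed numbers the edge device responds first, even though its raw inference is an order of magnitude slower; moreover, the edge figure is deterministic, while the cloud figure inherits the uncertainty of the round-trip delay.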


Metadata
Title
Processing Systems for Deep Learning Inference on Edge Devices
Authored by
Mário Véstias
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-44907-0_9