Published in: Real-Time Systems 1/2019

18 July 2018

DeepRT: predictable deep learning inference for cyber-physical systems

Authors: Woochul Kang, Jaeyong Chung

Abstract

Deep learning is changing the way computers on mobile and embedded devices see, hear, and understand the world. Systems that deploy deep learning on such devices are expected to perform inference tasks in a timely and energy-efficient manner. Much research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators. However, these approaches provide only ‘best-effort’ performance. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable and resource-constrained environments. In particular, DeepRT applies formal control theory to support Quality-of-Service (QoS) management that dynamically minimizes the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT determines a proper level of compression of deep learning models at runtime according to memory availability and users’ QoS requirements, yielding appropriate trade-offs between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
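
The idea of feedback-controlled QoS management can be illustrated with a minimal sketch. It assumes a single integral-style controller that adjusts a normalized processor frequency so that tardiness (response time divided by deadline) tracks a set point just below 1. The function names, the gain, the set point, and the toy timing model `work / freq` are all illustrative assumptions and are not taken from DeepRT itself.

```python
def next_frequency(freq, response_time, deadline,
                   target=0.9, gain=0.5, f_min=0.2, f_max=1.0):
    """Return the next normalized frequency (hypothetical controller).

    tardiness > target means the task runs too close to (or past) its
    deadline, so the frequency is raised; tardiness < target means there
    is slack, so the frequency is lowered to save energy.
    """
    tardiness = response_time / deadline
    error = tardiness - target
    # Integral-style update, clamped to the allowed frequency range.
    return min(f_max, max(f_min, freq + gain * error))


def simulate(steps, work=0.05, deadline=0.1):
    """Run the loop against a toy plant where response_time = work / freq.

    `work` is the assumed inference time in seconds at full frequency.
    Returns a list of (frequency, tardiness) pairs, one per period.
    """
    freq = 1.0
    trace = []
    for _ in range(steps):
        response_time = work / freq
        trace.append((freq, response_time / deadline))
        freq = next_frequency(freq, response_time, deadline)
    return trace


if __name__ == "__main__":
    for freq, tardiness in simulate(10):
        print(f"freq={freq:.3f} tardiness={tardiness:.3f}")
```

In this toy setting the controller starts at full frequency, observes slack, and lowers the frequency step by step until tardiness settles near the set point. A real runtime would need a model of the actual platform and a gain chosen through system identification.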

Footnotes
1
The details of the experimental setting are discussed in Sect. 4.1. The system power is measured using an external power meter. Only the result for the CaffeNet model is shown, since other models, e.g., LeNet, exhibit very similar behavior.

2
CaffeNet is a variant of AlexNet (Krizhevsky et al. 2012).
 
References
Abe Y, Sasaki H, Kato S, Inoue K, Edahiro M, Peres M (2014) Power and performance characterization and modeling of GPU-accelerated systems. In: 2014 IEEE 28th international parallel and distributed processing symposium, pp 113–122. https://doi.org/10.1109/IPDPS.2014.23
Amert T, Otterness N, Yang M, Anderson JH, Smith FD (2017) GPU scheduling on the NVIDIA TX2: hidden details revealed. In: 2017 IEEE real-time systems symposium (RTSS), pp 104–115
Chen JJ, Kuo CF (2007) Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms. In: 13th IEEE international conference on embedded and real-time computing systems and applications, pp 28–38
Chen W, Wilson J, Tyree S, Weinberger K, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, pp 2285–2294
Chen YH, Krishna T, Emer JS, Sze V (2017) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid State Circuits 52(1):127–138
Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277
Du Z, Fasthuber R, Chen T, Ienne P, Li L, Luo T, Feng X, Chen Y, Temam O (2015) ShiDianNao: shifting vision processing closer to the sensor. In: Proceedings of the 2015 ACM/IEEE 42nd annual international symposium on computer architecture (ISCA). IEEE, New York, pp 92–104
Falcini F, Lami G, Costanza AM (2017) Deep learning in automotive software. IEEE Softw 34(3):56–63
Fu X, Wang X (2011) Utilization-controlled task consolidation for power optimization in multi-core real-time systems. In: Proceedings of the 2011 IEEE 17th international conference on embedded and real-time computing systems and applications (RTCSA), vol 1, pp 73–82
Fu Y, Kottenstette N, Lu C, Koutsoukos XD (2012) Feedback thermal control of real-time systems on multicore processors. In: Proceedings of the tenth ACM international conference on embedded software, EMSOFT ’12. ACM, New York, pp 113–122
Gong Y, Liu L, Yang M, Bourdev L (2014) Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115
Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ (2016) EIE: efficient inference engine on compressed deep neural network. In: Proceedings of the 43rd international symposium on computer architecture, ISCA ’16, pp 243–254
Hellerstein JL, Diao Y, Parekh S, Tilbury DM (2004) Feedback control of computing systems. Wiley IEEE Press, Hoboken
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Ishihara T, Yasuura H (1998) Voltage scheduling problem for dynamically variable voltage processors. In: Proceedings, 1998 international symposium on low power electronics and design. IEEE, New York, pp 197–202
Jaderberg M, Vedaldi A, Zisserman A (2014) Speeding up convolutional neural networks with low rank expansions. In: Proceedings of the British machine vision conference. BMVA Press
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093
Kang W, Son SH, Stankovic JA (2012) Design, implementation, and evaluation of a QoS-aware real-time embedded database. IEEE Trans Comput 61(1):45–59
Kim DHK, Imes C, Hoffmann H (2015) Racing and pacing to idle: theoretical and empirical analysis of energy optimization heuristics. In: 2015 IEEE 3rd international conference on cyber-physical systems, networks, and applications, pp 78–85. https://doi.org/10.1109/CPSNA.2015.23
Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Lane ND, Bhattacharya S, Georgiev P, Forlivesi C, Jiao L, Qendro L, Kawsar F (2016) DeepX: a software accelerator for low-power deep learning inference on mobile devices. In: 2016 15th ACM/IEEE international conference on information processing in sensor networks (IPSN). IEEE, New York, pp 1–12
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Ljung L (1999) Systems identification: theory for the user, 2nd edn. Prentice Hall PTR, Upper Saddle River
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Lu C, Abdelzaher TF, Stankovic JA, Son SH (2001) A feedback control approach for guaranteeing relative delays in web servers. In: RTAS ’01: Proceedings of the seventh real-time technology and applications symposium (RTAS ’01)
Lu C, Stankovic JA, Son SH, Tao G (2002) Feedback control real-time scheduling: framework, modeling, and algorithms. Real Time Syst 23(1–2):85–126
Lu C, Wang X, Gill C (2003) Feedback control real-time scheduling in ORB middleware. In: RTAS ’03: Proceedings of the 9th IEEE real-time and embedded technology and applications symposium. IEEE Computer Society, Washington, DC, p 37
Lu Y, Abdelzaher TF, Saxena A (2004) Design, implementation, and evaluation of differentiated caching services. IEEE Trans Parallel Distrib Syst 15(5):440–452
Mei X, Wang Q, Chu X (2017) A survey and measurement study of GPU DVFS on energy conservation. Digital Commun Netw 3(2):89–100
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
Ovtcharov K, Ruwase O, Kim JY, Fowers J, Strauss K, Chung ES (2015) Toward accelerating deep learning at scale using specialized hardware in the datacenter. In: Hot chips 27 symposium (HCS). IEEE, New York, pp 1–38
Pallipadi V, Starikovskiy A (2006) The ondemand governor. Proc Linux Symp 2:215–230
Parekh S, Gandhi N, Hellerstein J, Tilbury D, Jayram T, Bigus J (2002) Using control theory to achieve service level objectives in performance management. Real Time Syst 23(1–2):127–141
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147
Strang G (2016) Introduction to linear algebra, vol 5. Wellesley-Cambridge Press, Wellesley
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Interspeech, pp 2365–2369
Yao F, Demers A, Shenker S (1995) A scheduling model for reduced CPU energy. In: Proceedings of the 36th annual symposium on foundations of computer science, pp 374–382
Metadata
Title
DeepRT: predictable deep learning inference for cyber-physical systems
Authors
Woochul Kang
Jaeyong Chung
Publication date
18 July 2018
Publisher
Springer US
Published in
Real-Time Systems / Issue 1/2019
Print ISSN: 0922-6443
Electronic ISSN: 1573-1383
DOI
https://doi.org/10.1007/s11241-018-9314-y
