
DeepRT: predictable deep learning inference for cyber-physical systems

Authors: Woochul Kang, Jaeyong Chung

Published in: Real-Time Systems | Issue 1/2019

Published: 18 July 2018


Abstract

In mobile and embedded devices, deep learning is changing the way computers see, hear, and understand the world. When deep learning is deployed on such systems, it is expected to perform inference tasks in a timely and energy-efficient manner. Much research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators. However, these approaches provide only ‘best-effort’ performance on such devices. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable and resource-constrained environments. In particular, DeepRT applies formal control theory to support Quality-of-Service (QoS) management that dynamically minimizes the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT determines a proper level of compression for deep learning models at runtime according to memory availability and users’ QoS requirements, yielding a suitable trade-off between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
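To make the idea of feedback-based QoS management concrete, the following is a minimal sketch of a tardiness feedback loop. It is not DeepRT's controller: the abstract does not specify the control law, so a plain integral controller is used here, and every name, gain, and limit (`TardinessController`, `gain`, the normalized `speed` actuator, the toy plant in `simulate`) is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TardinessController:
    """Hypothetical integral controller; all gains and limits are assumed."""

    target: float = 1.0     # desired tardiness = response time / deadline
    gain: float = 0.3       # integral gain (illustrative, not tuned)
    min_speed: float = 0.2  # lowest normalized processor speed
    max_speed: float = 1.0  # highest normalized processor speed
    speed: float = 1.0      # current normalized processor speed

    def update(self, response_time: float, deadline: float) -> float:
        """Adjust the speed from one measured inference and return it."""
        error = response_time / deadline - self.target  # > 0: task was late
        unclamped = self.speed + self.gain * error
        # Clamping the stored state keeps the controller from winding up.
        self.speed = min(self.max_speed, max(self.min_speed, unclamped))
        return self.speed


def simulate(ctrl: TardinessController, full_speed_time: float,
             deadline: float, steps: int) -> float:
    """Toy plant (assumed): response time scales inversely with speed."""
    for _ in range(steps):
        ctrl.update(full_speed_time / ctrl.speed, deadline)
    return ctrl.speed
```

Under these assumptions, `simulate(TardinessController(), 0.040, 0.100, 30)` settles close to a normalized speed of 0.4, the slowest speed at which a task that takes 40 ms at full speed still meets a 100 ms deadline. A task that cannot meet its deadline even at full speed simply pins the speed at `max_speed`.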

The details of the experiment setting are discussed in Sect. 4.1. The system power is measured using an external power meter. Only the results for the CaffeNet model are shown because other models, e.g., LeNet, exhibit very similar behavior.

CaffeNet is a variant of AlexNet (Krizhevsky et al. 2012).

Abe Y, Sasaki H, Kato S, Inoue K, Edahiro M, Peres M (2014) Power and performance characterization and modeling of gpu-accelerated systems. In: 2014 IEEE 28th international parallel and distributed processing symposium, pp 113–122. https://doi.org/10.1109/IPDPS.2014.23

Amert T, Otterness N, Yang M, Anderson JH, Smith FD (2017) Gpu scheduling on the nvidia tx2: hidden details revealed. 2017 IEEE real-time systems symposium (RTSS), pp 104–115

Caffe Model Zoo (2018) https://github.com/bvlc/caffe/wiki/model-zoo

Chen JJ, Kuo CF (2007) Energy-efficient scheduling for real-time systems on dynamic voltage scaling (dvs) platforms. In: 13th IEEE international conference on embedded and real-time computing systems and applications, pp 28–38

Chen W, Wilson J, Tyree S, Weinberger K, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, pp 2285–2294

Chen YH, Krishna T, Emer JS, Sze V (2017) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid State Circuits 52(1):127–138

Chung J, Shin T (2016) Simplifying deep neural networks for neuromorphic architectures. In: 2016 53rd ACM/EDAC/IEEE design automation conference (DAC), pp 1–6. https://doi.org/10.1145/2897937.2898092

Deng L, Yu D (2014) Deep learning: methods and applications. Technical Report. https://www.microsoft.com/en-us/research/publication/deep-learning-methods-and-applications/

Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277

Du Z, Fasthuber R, Chen T, Ienne P, Li L, Luo T, Feng X, Chen Y, Temam O (2015) Shidiannao: shifting vision processing closer to the sensor. In: Proceedings of the 2015 ACM/IEEE 42nd annual international symposium on computer architecture (ISCA). IEEE, New York, pp 92–104

Falcini F, Lami G, Costanza AM (2017) Deep learning in automotive software. IEEE Softw 34(3):56–63

Fu X, Wang X (2011) Utilization-controlled task consolidation for power optimization in multi-core real-time systems. In: Proceedings of the 2011 IEEE 17th international conference on embedded and real-time computing systems and applications (RTCSA), vol 1, pp 73–82

Fu Y, Kottenstette N, Lu C, Koutsoukos XD (2012) Feedback thermal control of real-time systems on multicore processors. In: Proceedings of the tenth ACM international conference on embedded software, EMSOFT ’12. ACM, New York, pp 113–122

Gong Y, Liu L, Yang M, Bourdev L (2014) Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115

Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge. http://www.deeplearningbook.org

Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ (2016) Eie: efficient inference engine on compressed deep neural network. In: Proceedings of the 43rd international symposium on computer architecture, ISCA ’16, pp 243–254

Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural network with pruning, trained quantization and huffman coding. CoRR abs/1510.00149. http://arxiv.org/abs/1510.00149

Han S, Pool J, Tran J, Dally WJ (2015) Learning both weights and connections for efficient neural networks. In: Proceedings of the 28th international conference on neural information processing systems, NIPS’15. MIT Press, Cambridge, pp 1135–1143. http://dl.acm.org/citation.cfm?id=2969239.2969366

Hellerstein JL, Diao Y, Parekh S, Tilbury DM (2004) Feedback control of computing systems. Wiley-IEEE Press, Hoboken

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

Ishihara T, Yasuura H (1998) Voltage scheduling problem for dynamically variable voltage processors. In: Proceedings, 1998 international symposium on low power electronics and design. IEEE, New York, pp 197–202

Jaderberg M, Vedaldi A, Zisserman A (2014) Speeding up convolutional neural networks with low rank expansions. In: Proceedings of the British machine vision conference. BMVA Press

Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093

Kang W, Chung J (2017) Energy-efficient response time management for embedded databases. Real Time Syst 53(2):228–253. https://doi.org/10.1007/s11241-016-9264-1

Kang W, Son SH, Stankovic JA (2012) Design, implementation, and evaluation of a qos-aware real-time embedded database. IEEE Trans Comput 61(1):45–59

Kim DHK, Imes C, Hoffmann H (2015) Racing and pacing to idle: theoretical and empirical analysis of energy optimization heuristics. In: 2015 IEEE 3rd international conference on cyber-physical systems, networks, and applications, pp 78–85. https://doi.org/10.1109/CPSNA.2015.23

Kim Y, Park E, Yoo S, Choi T, Yang L, Shin D (2015) Compression of deep convolutional neural networks for fast and low power mobile applications. CoRR abs/1511.06530. http://arxiv.org/abs/1511.06530

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

Lane ND, Bhattacharya S, Georgiev P, Forlivesi C, Jiao L, Qendro L, Kawsar F (2016) Deepx: a software accelerator for low-power deep learning inference on mobile devices. In: 2016 15th ACM/IEEE international conference on information processing in sensor networks (IPSN). IEEE, New York, pp 1–12

LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

Liu CL, Layland JW (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61. https://doi.org/10.1145/321738.321743

Ljung L (1999) System identification: theory for the user, 2nd edn. Prentice Hall PTR, Upper Saddle River

Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

Lu C, Abdelzaher TF, Stankovic JA, Son SH (2001) A feedback control approach for guaranteeing relative delays in web servers. In: RTAS ’01: Proceedings of the seventh real-time technology and applications symposium (RTAS ’01)

Lu C, Stankovic JA, Son SH, Tao G (2002) Feedback control real-time scheduling: framework, modeling, and algorithms. Real Time Syst 23(1–2):85–126

Lu C, Wang X, Gill C (2003) Feedback control real-time scheduling in orb middleware. In: RTAS ’03: Proceedings of the 9th IEEE real-time and embedded technology and applications symposium. IEEE Computer Society, Washington, DC, p 37

Lu Y, Abdelzaher TF, Saxena A (2004) Design, implementation, and evaluation of differentiated caching services. IEEE Trans Parallel Distrib Syst 15(5):440–452

Mei X, Wang Q, Chu X (2017) A survey and measurement study of gpu dvfs on energy conservation. Digital Commun Netw 3(2):89–100

Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533

Nvidia TensorRT (2017) https://developer.nvidia.com/tensorrt

Ovtcharov K, Ruwase O, Kim JY, Fowers J, Strauss K, Chung ES (2015) Toward accelerating deep learning at scale using specialized hardware in the datacenter. In: Hot chips 27 symposium (HCS). IEEE, New York, pp 1–38

Pallipadi V, Starikovskiy A (2006) The ondemand governor. Proc Linux Symp 2:215–230

Parekh S, Gandhi N, Hellerstein J, Tilbury D, Jayram T, Bigus J (2002) Using control theory to achieve service level objectives in performance management. Real Time Syst 23(1–2):127–141

Park S, Humphrey MA (2011) Predictable high-performance computing using feedback control and admission control. IEEE Trans Parallel Distrib Syst 22(3):396–411. https://doi.org/10.1109/TPDS.2010.100

Paszke A, Chaurasia A, Kim S, Culurciello E (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis IJCV 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

Stewart J (2018) Self-driving cars use crazy amounts of power, and it’s becoming a problem. Wired. https://www.wired.com/story/self-driving-cars-power-consumption-nvidia-chip/

Strang G (2016) Introduction to linear algebra, vol 5. Wellesley-Cambridge Press, Wellesley

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Interspeech, pp 2365–2369

Yao F, Demers A, Shenker S (1995) A scheduling model for reduced cpu energy. In: Proceedings of the 36th annual symposium on foundations of computer science, pp 374–382

Title: DeepRT: predictable deep learning inference for cyber-physical systems
Authors: Woochul Kang, Jaeyong Chung
Publication date: 18 July 2018
Publisher: Springer US
Published in: Real-Time Systems, Issue 1/2019
Print ISSN: 0922-6443
Electronic ISSN: 1573-1383
DOI: https://doi.org/10.1007/s11241-018-9314-y
