
2024 | Original Paper | Book Chapter

Towards a Flexible Accuracy-Oriented Deep Learning Module Inference Latency Prediction Framework for Adaptive Optimization Algorithms

Authors: Jingran Shen, Nikos Tziritas, Georgios Theodoropoulos

Published in: Intelligent Information Processing XII

Publisher: Springer Nature Switzerland


Abstract

With the rapid development of Deep Learning, more and more applications on the cloud and edge tend to utilize large DNN (Deep Neural Network) models for improved task execution efficiency and decision-making quality. Due to memory constraints, models are commonly optimized using compression, pruning, and partitioning algorithms so that they can be deployed on resource-constrained devices. As conditions on the computational platform change dynamically, the deployed optimization algorithms should adapt their solutions accordingly. To evaluate these solutions frequently and in a timely fashion, RMs (Regression Models) are commonly trained to predict the relevant solution quality metrics, such as the resulting DNN module inference latency, which is the focus of this paper. Existing prediction frameworks specify different RM training workflows, but none of them allows flexible configuration of the input parameters (e.g., batch size, device utilization rate) or of the RMs selected for different modules. In this paper, a deep learning module inference latency prediction framework is proposed, which (i) hosts a set of customizable input parameters to train multiple different RMs per DNN module (e.g., convolutional layer) with self-generated datasets, and (ii) automatically selects a set of trained RMs that yields the highest possible overall prediction accuracy while keeping the prediction time/space consumption as low as possible. Furthermore, a new RM, namely MEDN (Multi-task Encoder-Decoder Network), is proposed as an alternative solution. Comprehensive experiment results show that MEDN is fast and lightweight, and capable of achieving the highest overall prediction accuracy and R-squared value. The Time/Space-efficient Auto-selection algorithm also manages to improve the overall accuracy by 2.5% and the R-squared value by 0.39%, compared to the MEDN single-selection scheme.
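To make the auto-selection idea concrete, the following is a minimal illustrative sketch (not the paper's actual Time/Space-efficient Auto-selection algorithm): for each DNN module, among the candidate RMs whose validation accuracy is within a small tolerance of the best, the cheapest one in terms of prediction time/space cost is chosen. All names, numbers, and the tolerance value are assumptions made up for illustration.

```python
# Hedged sketch of accuracy-first, cost-aware RM selection per DNN module.
# The candidate RM names, scores, and costs below are hypothetical.
from dataclasses import dataclass

@dataclass
class CandidateRM:
    name: str        # e.g. "MEDN", "MLP", "XGBoost" (illustrative)
    accuracy: float  # validation accuracy of its latency predictions
    cost: float      # combined prediction time/space cost (lower is better)

def select_rm(candidates, tolerance=0.005):
    """Pick the most accurate RM; among RMs within `tolerance` of the
    best accuracy, prefer the one with the lowest time/space cost."""
    best_acc = max(c.accuracy for c in candidates)
    near_best = [c for c in candidates if best_acc - c.accuracy <= tolerance]
    return min(near_best, key=lambda c: c.cost)

# One such selection would run per module type (e.g. convolutional layer):
conv_candidates = [
    CandidateRM("MEDN", 0.930, 1.0),
    CandidateRM("MLP", 0.928, 0.4),
    CandidateRM("XGBoost", 0.900, 0.6),
]
print(select_rm(conv_candidates).name)  # MLP: near-best accuracy, cheaper
```

Repeating this choice independently per module is what lets a mixed set of RMs beat any single-selection scheme: a heavier model is only kept where its accuracy edge exceeds the tolerance.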


Metadata
Title
Towards a Flexible Accuracy-Oriented Deep Learning Module Inference Latency Prediction Framework for Adaptive Optimization Algorithms
Authors
Jingran Shen
Nikos Tziritas
Georgios Theodoropoulos
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-57808-3_3