Published in: The Journal of Supercomputing 14/2023

20-04-2023

A machine learning-based resource-efficient task scheduler for heterogeneous computer systems

Authors: Asad Hayat, Yasir Noman Khalid, Muhammad Siraj Rathore, Muhammad Nadeem Nadir


Abstract

Heterogeneous computer systems are becoming mainstream due to their disparate processing and performance capabilities. These systems consist of different types of devices, such as central processing units (CPUs), accelerators, and graphics processing units (GPUs). In a heterogeneous computing environment, if one device has more computing capability, scheduling schemes generally favor it; that device becomes overloaded while the other is underutilized. This load imbalance increases execution time. In this research, we propose a load-balanced task scheduler combined with a machine learning-based device predictor. The device predictor predicts the execution time of a task on both the CPU and the GPU, and the device with the shorter predicted execution time is considered suitable for that task. However, a high fraction of tasks may map to only one type of device, because each task is mapped to the device with the lower predicted execution time even though it could also run on the other device. As a result, one device may become overloaded while the other is underutilized. To solve this load imbalance problem, our solution includes a work-stealing-based task scheduler that allows an idle device to process tasks from another device's queue. In this way, we avoid load imbalance, minimize the overall execution time of tasks, and maximize device utilization and throughput. We evaluate the proposed solution in two stages. First, we measure the error rate of the machine learning predictor using three different algorithms (random forest, gradient boosting, and multiple linear regression) and demonstrate that random forest performs best, with a marginal error rate. Second, we compare the performance of the work-stealing task scheduler with other scheduling alternatives. Our results show that the proposed solution reduces execution time by 65.63%, increases resource utilization by 93.3%, and increases throughput by 65.5% in comparison with baseline scheduling schemes.
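
The interplay between prediction-based mapping and work stealing described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the task names and per-device times are hypothetical values standing in for the output of the device predictor, the function names (assign_by_prediction, simulate) are invented for illustration, and stealing from the tail of the victim's queue is a common work-stealing convention that the abstract does not specify.

```python
from collections import deque

OTHER = {"cpu": "gpu", "gpu": "cpu"}

# Hypothetical predicted execution times (arbitrary units) per task and device.
# In the paper these would come from the machine learning predictor.
TASKS = [
    ("t1", {"cpu": 8, "gpu": 2}),
    ("t2", {"cpu": 6, "gpu": 2}),
    ("t3", {"cpu": 4, "gpu": 2}),
    ("t4", {"cpu": 4, "gpu": 2}),
    ("t5", {"cpu": 1, "gpu": 3}),
]

def assign_by_prediction(tasks):
    """Place each task in the queue of the device with the shorter predicted time."""
    queues = {"cpu": deque(), "gpu": deque()}
    for name, predicted in tasks:
        device = min(predicted, key=predicted.get)
        queues[device].append((name, predicted))
    return queues

def simulate(queues, allow_stealing):
    """Return the makespan (finish time of the last task) for the given queues."""
    queues = {d: deque(q) for d, q in queues.items()}  # work on a copy
    free_at = {"cpu": 0.0, "gpu": 0.0}  # time at which each device becomes idle
    while queues["cpu"] or queues["gpu"]:
        # A device can take work from its own queue, or from the other
        # device's queue when stealing is allowed.
        candidates = [
            d for d in free_at
            if queues[d] or (allow_stealing and queues[OTHER[d]])
        ]
        device = min(candidates, key=lambda d: free_at[d])  # earliest idle device
        if queues[device]:
            _, predicted = queues[device].popleft()
        else:
            _, predicted = queues[OTHER[device]].pop()  # steal from the tail
        free_at[device] += predicted[device]
    return max(free_at.values())

queues = assign_by_prediction(TASKS)
print("GPU queue:", [name for name, _ in queues["gpu"]])
print("CPU queue:", [name for name, _ in queues["cpu"]])
print("makespan without stealing:", simulate(queues, allow_stealing=False))
print("makespan with stealing:   ", simulate(queues, allow_stealing=True))
```

With these hypothetical numbers, four of the five tasks map to the GPU, so the CPU goes idle after one time unit. Allowing the idle CPU to steal one task shortens the makespan from 8.0 to 6.0 units, even though the stolen task runs more slowly on the CPU than it would have on the GPU. A greedy steal like this is not guaranteed to help for every workload; a task that is far slower on the stealing device can lengthen the schedule.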


Metadata
Title: A machine learning-based resource-efficient task scheduler for heterogeneous computer systems
Authors: Asad Hayat, Yasir Noman Khalid, Muhammad Siraj Rathore, Muhammad Nadeem Nadir
Publication date: 20-04-2023
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-023-05266-4
