Published in: The Journal of Supercomputing 10/2018

23 May 2018

E-OSched: a load balancing scheduler for heterogeneous multicores

Authors: Yasir Noman Khalid, Muhammad Aleem, Radu Prodan, Muhammad Azhar Iqbal, Muhammad Arshad Islam



Abstract

In the contemporary multicore era, heterogeneous computing devices have become one of the preferred platforms for executing compute-intensive applications. These heterogeneous machines combine CPUs and GPUs, and OpenCL is regarded as one of the industry standards for programming them. Conventional application scheduling mechanisms allocate most applications to GPUs while leaving the CPU underutilized. This underutilization of slower devices (such as the CPU) often leads to sub-optimal performance of data-parallel applications in terms of load balance, execution time, and throughput. Moreover, scheduling multiple applications on a heterogeneous system further aggravates the performance inefficiency. This paper attempts to overcome these deficiencies with a novel scheduling strategy named OSched. An enhancement of OSched, named E-OSched, is also part of this study. OSched performs resource-aware assignment of jobs to both CPUs and GPUs while ensuring a balanced load. Load balancing is achieved by considering the computational requirements of the jobs and the computing potential of each device. Load-balanced execution is beneficial in terms of lower execution time, higher throughput, and improved utilization. E-OSched reduces main memory contention during the concurrent job execution phase. The mathematical model of the proposed algorithms is evaluated by comparing simulation results with different state-of-the-art scheduling heuristics. The results show that the proposed E-OSched performs significantly better than the state-of-the-art scheduling heuristics, achieving up to 8.09% improved execution time and up to 7.07% better throughput.
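The abstract does not spell out the OSched algorithm, so the following is only a minimal sketch of the general idea it names: matching the computational requirement of each job against the computing potential of each device. It uses a generic greedy earliest-finish-time heuristic, which is an assumption made for illustration and not the authors' method. The names `Device`, `assign_jobs`, and `makespan`, the FLOP counts, and the device throughputs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    flops: float               # assumed sustained throughput, in FLOP per second
    busy_until: float = 0.0    # estimated time at which the device becomes free
    jobs: list = field(default_factory=list)


def assign_jobs(job_flop_counts, devices):
    """Greedy earliest-finish-time assignment (illustrative only, not OSched)."""
    # Place the largest jobs first so that smaller jobs can fill the gaps.
    ordered = sorted(job_flop_counts.items(), key=lambda kv: kv[1], reverse=True)
    for job, work in ordered:
        # Pick the device on which this job is estimated to finish earliest.
        best = min(devices, key=lambda d: d.busy_until + work / d.flops)
        best.busy_until += work / best.flops
        best.jobs.append(job)
    return devices


def makespan(devices):
    """Estimated completion time of the whole batch."""
    return max(d.busy_until for d in devices)


if __name__ == "__main__":
    # Hypothetical workload: FLOP counts per job and FLOPS per device.
    jobs = {"a": 800.0, "b": 400.0, "c": 300.0, "d": 200.0}
    devices = assign_jobs(jobs, [Device("cpu", 100.0), Device("gpu", 400.0)])
    for d in devices:
        print(d.name, d.jobs, d.busy_until)
    print("makespan:", makespan(devices))
```

In this toy workload the slower CPU receives one job instead of sitting idle, and the estimated makespan is lower than it would be if every job ran on the GPU. The sketch ignores data-transfer costs and memory contention, which the abstract says E-OSched addresses.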


Footnotes
1
In this research, the term "job" denotes an OpenCL application that consists of a host program and kernel functions.
 
2
FLOPS = Floating Point Operations Per Second.
 
Metadata
Title
E-OSched: a load balancing scheduler for heterogeneous multicores
Authors
Yasir Noman Khalid
Muhammad Aleem
Radu Prodan
Muhammad Azhar Iqbal
Muhammad Arshad Islam
Publication date
23 May 2018
Publisher
Springer US
Published in
The Journal of Supercomputing, issue 10/2018
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2435-1
