Published in: The Journal of Supercomputing 7/2021

2 January 2021

PEPS: predictive energy-efficient parallel scheduler for multi-core processors

Authors: Zeinab Maghsoud, Hamid Noori, Saadat Pour Mozaffari


Abstract

In multi-core processors, energy efficiency and performance are both essential concerns. Energy-saving techniques usually cost performance, and vice versa. The energy delay product (EDP) is therefore widely used as a metric that captures the trade-off between energy saving and performance. This paper presents a technique for performing work-stealing scheduling in the operating system kernel without any modification to the user-space program. The proposed scheduler, PEPS, uses predictive models to determine at runtime, for any running program, the number of active cores and the processor clock frequency that minimize EDP. Because EDP is a long-term metric, PEPS uses instructions per watt (IPW) within each time frame to select the best configuration: using performance and power prediction models, it finds the most energy-efficient configuration for the next time interval. Because workloads behave differently at runtime, and programs with different degrees of parallelism respond differently, the proposed method uses performance counters to characterize the workload. Compared with the Linux scheduler, the proposed algorithm saves up to 25% energy at the cost of a 7% performance loss. Moreover, it reduces temperature by 24% and improves EDP by 19%.
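The per-interval selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`edp`, `best_config`), the candidate core counts and frequencies, and the two toy models (`toy_ips`, `toy_watts`) are assumptions made purely for illustration, standing in for the performance and power prediction models that the abstract does not detail.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Config = Tuple[int, float]  # (active cores, clock frequency in GHz)


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy delay product: energy multiplied by execution time (lower is better)."""
    return energy_joules * delay_seconds


def best_config(
    core_counts: Iterable[int],
    frequencies_ghz: Iterable[float],
    predict_ips: Callable[[int, float], float],
    predict_watts: Callable[[int, float], float],
) -> Config:
    """Return the (cores, frequency) pair with the highest predicted IPW."""

    def ipw(cfg: Config) -> float:
        cores, freq = cfg
        # Predicted instructions per second divided by predicted power.
        return predict_ips(cores, freq) / predict_watts(cores, freq)

    return max(product(core_counts, frequencies_ghz), key=ipw)


# Hypothetical stand-in models, NOT the models used in the paper.
def toy_ips(cores: int, freq: float) -> float:
    # Amdahl-style speedup with an assumed 80% parallel fraction,
    # scaled linearly with clock frequency.
    speedup = 1.0 / (0.2 + 0.8 / cores)
    return 1e9 * freq * speedup


def toy_watts(cores: int, freq: float) -> float:
    # Assumed static power plus per-core dynamic power growing with freq cubed.
    return 10.0 + cores * 2.0 * freq ** 3


chosen = best_config([1, 2, 4], [1.0, 2.0, 3.0], toy_ips, toy_watts)
print(chosen)
```

With these toy models the search favors many cores at a low frequency, because the assumed dynamic power grows much faster with frequency than throughput does. A real scheduler would replace the toy functions with models driven by performance counters and would repeat the selection for each time interval.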

Metadata
Title
PEPS: predictive energy-efficient parallel scheduler for multi-core processors
Authors
Zeinab Maghsoud
Hamid Noori
Saadat Pour Mozaffari
Publication date
2 January 2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 7/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03562-x
