Published in: The Journal of Supercomputing 1/2014

01.01.2014

Optimization power consumption model of reliability-aware GPU clusters

Written by: Haifeng Wang, Qingkui Chen

Abstract

Power control on reliability-aware GPU clusters with dynamically variable voltage and speed is investigated as a combinatorial optimization problem in two forms: minimizing task execution time under an energy consumption constraint, and minimizing energy consumption under a system reliability constraint. Both problems arise in general multiprocessor computing and real-time multiprocessing systems, where energy consumption and system reliability are both important. These problems, which emphasize the trade-off among performance, power, and reliability, have not been well studied before. In this research, a novel power control model is built based on Model Predictive Control theory. The Maximum Entropy Method is used to determine the partial ordering relation of the control variables and to assess the quality of solutions. Our controller can cap redundant energy consumption by dynamically switching the energy states of the nodes in the GPU cluster. We compare our controller with a control scheme that does not consider system reliability. The experimental results demonstrate that the proposed controller is more reliable and valuable.
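To make the first problem in the abstract concrete, here is a minimal sketch of "minimize execution time under an energy constraint" as a combinatorial search over discrete speed levels. It is not the authors' controller: the function name `min_time_under_energy`, the sequential-task setup, and the toy cost model (time = work / speed, energy = c · work · speed²) are all assumptions made for illustration, and the reliability constraint is not modeled.

```python
from itertools import product


def min_time_under_energy(work, levels, energy_budget, c=1.0):
    """Choose one speed level per task to minimize total execution time
    subject to a total energy budget.

    Hypothetical toy model (not taken from the paper):
      time per task   = w / f
      energy per task = c * w * f**2   (dynamic power assumed to scale as f**3)
    Exhaustive search, so only suitable for very small inputs.
    """
    best = None
    for assignment in product(levels, repeat=len(work)):
        total_time = sum(w / f for w, f in zip(work, assignment))
        total_energy = sum(c * w * f ** 2 for w, f in zip(work, assignment))
        # Keep the fastest assignment that stays within the energy budget.
        if total_energy <= energy_budget and (best is None or total_time < best[0]):
            best = (total_time, total_energy, assignment)
    return best  # None when no assignment satisfies the budget


if __name__ == "__main__":
    # Two tasks, two speed levels, energy budget of 12 units.
    print(min_time_under_energy([4.0, 2.0], (1.0, 2.0), 12.0))
```

With these inputs, only running the smaller task at the higher speed fits the budget, which illustrates the trade-off the abstract describes: faster settings shorten execution time but consume the energy allowance quickly.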

Metadata
Title
Optimization power consumption model of reliability-aware GPU clusters
Authors
Haifeng Wang
Qingkui Chen
Publication date
01.01.2014
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-013-0993-9