An integrated GPU power and performance model

Authors:
Sunpyo Hong

Georgia Institute of Technology, Atlanta, GA, USA

Hyesoon Kim

Georgia Institute of Technology, Atlanta, GA, USA

ACM SIGARCH Computer Architecture News, Volume 38, Issue 3, June 2010, pp. 280–289. https://doi.org/10.1145/1816038.1815998

Published: 19 June 2010

Abstract

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly.

Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement.
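The bandwidth-saturation intuition can be illustrated with a minimal sketch. This is not the IPP model itself: it assumes each active core demands a fixed share of memory bandwidth, and the function names and all numbers are hypothetical values chosen for illustration.

```python
import math


def achieved_bandwidth(active_cores, peak_bw_gbs, per_core_bw_gbs):
    """Aggregate bandwidth: grows linearly with cores until it hits the peak."""
    return min(peak_bw_gbs, active_cores * per_core_bw_gbs)


def saturating_core_count(peak_bw_gbs, per_core_bw_gbs, total_cores):
    """Smallest core count whose combined demand reaches peak bandwidth.

    Assumption (not from the paper): every core requests the same,
    constant bandwidth regardless of how many cores are active.
    """
    if per_core_bw_gbs <= 0:
        # No memory traffic: bandwidth never limits scaling.
        return total_cores
    needed = math.ceil(peak_bw_gbs / per_core_bw_gbs)
    return max(1, min(total_cores, needed))


# Hypothetical device: 30 cores, 140 GB/s peak, 7 GB/s demanded per core.
cores = saturating_core_count(140.0, 7.0, 30)
print(cores, achieved_bandwidth(cores, 140.0, 7.0))
```

Under these assumptions, cores beyond the saturation point add no bandwidth, so a memory-bound application gains no performance from them.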

We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increases in power consumption that result from increases in temperature.
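The idea of combining a predicted execution time with a power estimate to choose the number of running cores can be sketched as follows. This is a toy under stated assumptions, not the paper's empirical model: it assumes power is linear in the number of active cores, that execution time scales inversely with core count until it reaches a memory-bound floor, and it ignores temperature effects. All names and numbers are hypothetical.

```python
def predicted_time(n_cores, compute_work_s, memory_floor_s):
    """Toy time model: compute time shrinks with cores, the memory floor does not."""
    return max(compute_work_s / n_cores, memory_floor_s)


def predicted_power(n_cores, idle_w, per_core_w):
    """Toy power model (assumption): fixed idle power plus a per-core cost."""
    return idle_w + n_cores * per_core_w


def best_core_count(total_cores, compute_work_s, memory_floor_s, idle_w, per_core_w):
    """Return the core count with the lowest predicted energy (power x time)."""
    energies = {
        n: predicted_power(n, idle_w, per_core_w)
        * predicted_time(n, compute_work_s, memory_floor_s)
        for n in range(1, total_cores + 1)
    }
    return min(energies, key=energies.get)


# Hypothetical workload: 60 core-seconds of compute, 3 s memory-bound floor,
# 80 W idle power, 4 W per active core, 30 cores available.
print(best_core_count(30, 60.0, 3.0, 80.0, 4.0))
```

With these illustrative inputs the predicted energy falls as cores are added until the memory floor is reached, then rises, because extra cores draw power without shortening the run.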

With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption and on average 10.99% of that for the five memory bandwidth-limited benchmarks.

Published in
ACM SIGARCH Computer Architecture News Volume 38, Issue 3
ISCA '10
June 2010
508 pages
ISSN:0163-5964
DOI:10.1145/1816038
ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN:9781450300537
DOI:10.1145/1815961
General Chair: André Seznec, INRIA Rennes
Program Chairs: Uri Weiser, Technion; Ronny Ronen, Intel
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CUDA
GPU architecture
analytical model
energy
performance
power estimation
Qualifiers
- research-article