Abstract
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly.
Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement.
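The bandwidth intuition can be illustrated with a minimal sketch. This is not the IPP model itself: it assumes each active core demands a fixed bandwidth, and the function names, parameters, and example numbers are hypothetical.

```python
import math


def saturating_cores(per_core_bw_gbs: float, peak_bw_gbs: float, total_cores: int) -> int:
    """Smallest core count whose aggregate demand reaches peak bandwidth.

    Assumption (not from the paper): every active core demands the same
    bandwidth, so demand grows linearly with the number of active cores.
    """
    if per_core_bw_gbs <= 0:
        # No memory traffic: the application is never bandwidth-bound.
        return total_cores
    needed = math.ceil(peak_bw_gbs / per_core_bw_gbs)
    return max(1, min(total_cores, needed))


def achieved_bandwidth(active_cores: int, per_core_bw_gbs: float, peak_bw_gbs: float) -> float:
    """Aggregate bandwidth actually delivered, capped at the peak."""
    return min(active_cores * per_core_bw_gbs, peak_bw_gbs)


if __name__ == "__main__":
    # Hypothetical numbers: 30 cores, 10 GB/s demanded per core, 140 GB/s peak.
    n = saturating_cores(10.0, 140.0, 30)
    print(n, achieved_bandwidth(n, 10.0, 140.0), achieved_bandwidth(30, 10.0, 140.0))
```

Under these assumptions, cores beyond the saturating count add no delivered bandwidth, which is why they add no performance for a bandwidth-limited application.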
We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increase in power consumption that results from increases in temperature.
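As a rough illustration of how a predicted execution time and a predicted power can be combined to pick a core count, the sketch below minimizes energy (power times time) over the number of active cores. It is a simplified stand-in, not the paper's empirical model: the roofline-style time estimate, the linear power model, all parameter names, and the example values are assumptions, and temperature effects are left out.

```python
def predicted_time(n: int, compute_s: float, memory_s: float) -> float:
    """Assumed time model: compute work divides evenly across n cores,
    while memory time is a floor set by peak bandwidth."""
    return max(compute_s / n, memory_s)


def predicted_power(n: int, idle_w: float, per_core_w: float) -> float:
    """Assumed power model: constant idle power plus a fixed cost per active core."""
    return idle_w + n * per_core_w


def best_core_count(total_cores: int, compute_s: float, memory_s: float,
                    idle_w: float, per_core_w: float) -> tuple[int, float]:
    """Return (core count, energy in joules) with the lowest predicted energy."""
    best_n, best_energy = 1, float("inf")
    for n in range(1, total_cores + 1):
        energy = predicted_power(n, idle_w, per_core_w) * predicted_time(n, compute_s, memory_s)
        if energy < best_energy:
            best_n, best_energy = n, energy
    return best_n, best_energy


if __name__ == "__main__":
    # Hypothetical workload: 14 s of single-core compute, 1 s bandwidth-bound floor.
    print(best_core_count(30, 14.0, 1.0, 80.0, 4.0))
```

With these made-up inputs the minimum falls at the core count where compute time meets the memory floor; adding cores past that point raises power without shortening execution.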
With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and 10.99% on average, for the five memory bandwidth-limited benchmarks.
An integrated GPU power and performance model
ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture