An integrated GPU power and performance model

Published: 19 June 2010

Abstract

GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Performance optimization for multi-core processors has been a challenge for programmers. Furthermore, optimizing for power consumption is even more difficult. Unfortunately, as a result of the high number of processors, the power consumption of many-core processors such as GPUs has increased significantly.

Hence, in this paper, we propose an integrated power and performance (IPP) prediction model for a GPU architecture to predict the optimal number of active processors for a given application. The basic intuition is that when an application reaches the peak memory bandwidth, using more cores does not result in performance improvement.
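The bandwidth-saturation intuition can be illustrated with a minimal sketch. It is not the IPP model itself: it assumes a toy kernel whose memory demand grows linearly with the number of active cores until the peak bandwidth is reached. The function names, parameters, and numbers below are all hypothetical and chosen only for illustration.

```python
import math


def saturating_core_count(peak_bw_gbs, per_core_bw_gbs, total_cores):
    """Smallest number of active cores whose combined demand reaches peak bandwidth.

    All three parameters are hypothetical inputs for this sketch.
    """
    if per_core_bw_gbs <= 0:
        # No memory traffic: bandwidth never limits, so use every core.
        return total_cores
    needed = math.ceil(peak_bw_gbs / per_core_bw_gbs)
    return max(1, min(total_cores, needed))


def delivered_bandwidth(active_cores, peak_bw_gbs, per_core_bw_gbs):
    """Bandwidth actually delivered: linear in core count until it hits the peak."""
    return min(active_cores * per_core_bw_gbs, peak_bw_gbs)


# Hypothetical GPU: 30 cores, 140 GB/s peak, kernel demanding 7 GB/s per core.
best = saturating_core_count(140.0, 7.0, 30)
print(best)
print(delivered_bandwidth(best, 140.0, 7.0), delivered_bandwidth(30, 140.0, 7.0))
```

Under these assumptions, the delivered bandwidth is the same at the saturating core count as with all cores active, so the extra cores would draw power without improving performance.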

We develop an empirical power model for the GPU. Unlike most previous models, which require measured execution times, hardware performance counters, or architectural simulations, IPP predicts execution times to calculate dynamic power events. We then use the outcome of IPP to control the number of running cores. We also model the increase in power consumption that results from increases in temperature.

With the predicted optimal number of active cores, we show that we can save up to 22.09% of runtime GPU energy consumption, and 10.99% on average, for the five memory bandwidth-limited benchmarks.
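As a rough illustration of how such a saving is computed, energy is average power multiplied by runtime, and the saving is the relative difference between two runs. The sketch below uses made-up power and runtime values, not measurements from the paper; the helper names are hypothetical.

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy consumed by a run: average power (W) times runtime (s)."""
    return avg_power_w * runtime_s


def relative_saving(baseline_j, optimized_j):
    """Fraction of the baseline energy saved by the optimized run."""
    return (baseline_j - optimized_j) / baseline_j


# Made-up example: same runtime, lower average power with fewer active cores.
baseline = energy_joules(200.0, 10.0)   # all cores active
optimized = energy_joules(180.0, 10.0)  # reduced core count
print(f"{relative_saving(baseline, optimized):.2%}")
```

When the runtime is unchanged, as in the bandwidth-limited case, the energy saving equals the reduction in average power.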

        • Published in

          ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
          ISCA '10
          June 2010
          508 pages
          ISSN: 0163-5964
          DOI: 10.1145/1816038

          ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
          June 2010
          520 pages
          ISBN: 9781450300537
          DOI: 10.1145/1815961

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article