Fine-Grain Power Breakdown of Modern Out-of-Order Cores and Its Implications on Skylake-Based Systems

Published: 16 December 2016

Abstract

A detailed analysis of power consumption at low system levels is becoming important as a means of reducing both the overall power consumption of a system and its thermal hot spots. This work presents a new power estimation method that exposes the power breakdown of an application running on a modern processor architecture such as the newly released Intel Skylake processor. This work also provides a detailed power and performance characterization report for the SPEC CPU2006 benchmarks, an analysis of the data using side-by-side power and performance breakdowns, and a few interesting case studies.

References

  1. F. Bellosa. 2000. The benefits of event-driven energy accounting in power-sensitive systems. In Proceedings of the 9th Workshop on ACM SIGOPS European Workshop: Beyond the PC: New Challenges for the Operating System. ACM, 37--42.
  2. R. Bertran, M. Gonzalez, X. Martorell, N. Navarro, and E. Ayguade. 2010. Decomposable and responsive power models for multicore processors using performance counters. In Proceedings of the 24th ACM International Conference on Supercomputing. ACM, 147--158.
  3. R. Bertran, M. Gonzàlez, X. Martorell, N. Navarro, and E. Ayguadé. 2013a. Counter-based power modeling methods: Top-down vs. bottom-up. The Computer Journal 56, 2, 198--213.
  4. R. Bertran, M. Gonzalez, X. Martorell, N. Navarro, and E. Ayguade. 2013b. A systematic methodology to generate decomposable and responsive power models for CMPs. IEEE Transactions on Computers 62, 7, 1289--1302.
  5. S. Bhunia and S. Mukhopadhyay (eds.). 2010. Low-power variation-tolerant design in nanometer silicon. Springer-Verlag.
  6. A. Carvalho. 2010. The new linux ‘perf’ tools. Presented at the Linux Kongress, 2010. https://scholar.google.co.il/scholar?q=The+new+linux+perf+Carvalho.&btnG=&hl=en&as_sdt=0%2C5.
  7. H. David, E. Gorbatov, U. R. Hanebutte, R. Khanna, and C. Le. 2010. RAPL: Memory power estimation and capping. In 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED). IEEE, 189--194.
  8. N. Firasta, M. Buxton, P. Jinbo, K. Nasri, and S. Kuo. 2008. Intel AVX: New frontiers in performance improvements and energy efficiency. Intel White Paper.
  9. J. Haj-Yihia, Y. B. Asher, E. Rotem, A. Yasin, and R. Ginosar. 2015. Compiler-directed power management for superscalars. ACM Transactions on Architecture and Code Optimization (TACO) 11, 4, 48.
  10. Jawad Haj-Yihia, Ahmad Yasin, Yosi ben Asher, and Avi Mendelson. 2016. Core Power breakdown tool. https://drive.google.com/open?id=0B3IgzCqRS5Q_ZGN0QVFqaWxxY28.
  11. Intel Corporation. 2014. Intel® 64 and IA-32 Architectures Optimization Reference Manual, Appendix B.1 Intel. (as of August 2014).
  12. Intel Corporation. 2015. “Intel open source”, online: http://download.01.org/perfmon/ [accessed October 8, 2015].
  13. Intel Corporation. 2016a. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1 [accessed January 2016].
  14. Intel Corporation. 2016b. “6th Generation Intel® Processor Family -- Specification update”, online: http://www.intel.com/content/www/us/en/processors/core/desktop-6th-gen-core-family-spec-update.html [accessed August 2016].
  15. C. Isci and M. Martonosi. 2003. Runtime power monitoring in high-end processors: Methodology and empirical data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, 93. IEEE Computer Society.
  16. C. Isci, A. Buyuktosunoglu, C. Y. Cher, P. Bose, and M. Martonosi. 2006. An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 347--358.
  17. A. Kleen. 2015. Toplev manual (pmu-tools), online: https://github.com/andikleen/pmu-tools/wiki/toplev-manual [accessed October 8, 2015].
  18. M. D. Powell, A. Biswas, J. S. Emer, S. S. Mukherjee, B. R. Sheikh, and S. Yardi. 2009. CAMP: A technique to estimate per-structure power at run-time using a few simple parameters. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture. IEEE, 289--300.
  19. E. Rotem, R. Ginosar, C. Weiser, and A. Mendelson. 2014. Energy aware race to halt: A down to EARtH approach for platform energy management. IEEE Computer Architecture Letters 13, 1, 25--28.
  20. E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan. 2012. Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 32, 2, 20--27.
  21. Y. S. Shao and D. Brooks. 2013. ISA-independent workload characterization and its implications for specialized architectures. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS 2013), 245--255.
  22. Y. S. Shao, B. Reagen, G. Y. Wei, and D. Brooks. 2014. Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA), 97--108.
  23. K. Singh, M. Bhadauria, and S. A. McKee. 2009. Real time power estimation and thread scheduling via performance counters. ACM SIGARCH Comput. Architect. News 37, 2, 46--55.
  24. THE GREEN500 SITES. 2013. http://www.green500.org (accessed December 12, 2013).
  25. ThinkPad SMAPI kernel module version 0.40. http://tpctl.sourceforge.net/.
  26. TOP 500 SUPERCOMPUTER SITES. 2013. http://www.top500.org/list/2013/06 (accessed December 12, 2013).
  27. Vasileios Spiliopoulos, Andreas Sembrant, and Stefanos Kaxiras. 2012. Power-sleuth: A tool for investigating your program's power behavior. In Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE.
  28. S. Van den Steen, S. De Pestel, M. Mechri, S. Eyerman, T. Carlson, L. Eeckhout, E. Hagersten, and D. Black-Schaffer. 2015. Micro-architecture independent analytical processor performance and power modeling. In Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2015.
  29. A. Yasin. 2014. A top-down method for performance analysis and counters architecture. Presented at the 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). https://scholar.google.co.il/scholar?q=A+top-down+method+for+performance+analysis+and+counters+architecture.&btnG=&hl=en&as_sdt=0%2C5.

      Reviews

      Karthik S Murthy

      Extreme-scale data centers and supercomputers draw many megawatts of power to function. By today's standards, drawing one megawatt of power roughly costs $1 million; hence, reduced power consumption is critical for these systems. Current and upcoming systems are very efficient in terms of power usage effectiveness (PUE); for example, Intel's home-built data centers run at 1.06 PUE and Facebook's centers run at 1.078 PUE. A PUE this close to 1.0 means that nearly all of the power drawn by these facilities reaches the computing equipment rather than cooling and other overhead, so most of it is spent on application execution, and developers need to shoulder part of the responsibility for achieving power efficiency.

      Rotem et al. [1] show that detailed power modeling is necessary because it is not straightforward to gauge which of two strategies consumes less energy: (1) running a processor at a high frequency to complete the application faster and then putting the processor to sleep, or (2) running a processor at a low frequency for a longer time to execute the application. This paper develops a tool that provides a fine-grained breakdown of the power consumed by different processor and sub-processor domains on the Intel Skylake system.

      Intel VTune helps identify performance bottlenecks by employing the top-down analysis method developed by Ahmad Yasin in 2014 [2]. Top-down analysis is built on the idea that studying performance counters in isolation is not as informative as studying them in groups. These (sub)groups of performance counters, that is, meta-performance counters, are useful in pinpointing whether the performance bottlenecks in the developer's application are frontend-bound or backend-bound, or whether they are due to misspeculation. Similarly, the tool built by the authors helps classify whether the power consumption in an application is frontend-bound, backend-bound, or due to misspeculation.

      To build these meta-performance counters, the authors had to identify a weight for each of the performance counters that make up a meta-performance counter. To do so, they used a set of training microbenchmarks.

      Overall, the paper is very informative and nicely written. The experiments are substantial. However, as the authors admit, these counters were studied for one core and one p-state, which is hardly the case in the wild.

      Online Computing Reviews Service
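The weight-identification step mentioned in the review can be illustrated with a small sketch. This page does not describe the paper's actual events, microbenchmarks, or fitting procedure, so everything below is an assumption made for illustration: a linear model in which measured power is a weighted sum of per-cycle event rates plus a static term, fitted with ordinary least squares. The event names, the training numbers, and the helper functions `solve_linear`, `fit_weights`, and `breakdown` are hypothetical.

```python
# Hypothetical sketch: fit per-event power weights from training microbenchmarks,
# then attribute a workload's estimated power to frontend/backend/misspeculation.
# All names and numbers are illustrative assumptions, not data from the paper.

EVENT_NAMES = ["static", "frontend", "backend", "bad_speculation"]

# One row per training microbenchmark: [constant 1.0, event rates per cycle...].
TRAIN_RATES = [
    [1.0, 0.0, 0.0, 0.0],  # idle loop
    [1.0, 1.0, 0.0, 0.0],  # stresses the frontend only
    [1.0, 0.0, 1.0, 0.0],  # stresses the backend only
    [1.0, 0.0, 0.0, 1.0],  # stresses misspeculation only
    [1.0, 0.5, 0.5, 0.2],  # mixed behaviour
]
# Synthetic "measured" power in watts for each microbenchmark.
TRAIN_POWER = [2.0, 5.0, 7.0, 6.0, 6.8]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_weights(rates, power):
    """Ordinary least squares via the normal equations (A^T A) w = A^T p."""
    k = len(rates[0])
    ata = [[sum(row[i] * row[j] for row in rates) for j in range(k)]
           for i in range(k)]
    atp = [sum(row[i] * p for row, p in zip(rates, power)) for i in range(k)]
    return solve_linear(ata, atp)


def breakdown(weights, rates_row, names):
    """Split estimated power into one contribution per event group."""
    return {name: w * r for name, w, r in zip(names, weights, rates_row)}


weights = fit_weights(TRAIN_RATES, TRAIN_POWER)
workload = [1.0, 0.4, 0.6, 0.1]  # hypothetical application profile
for name, watts in breakdown(weights, workload, EVENT_NAMES).items():
    print(f"{name:16s} {watts:6.2f} W")
```

A model fitted this way is only valid for the configuration it was trained on, which is consistent with the reviewer's caveat that the counters were studied for one core and one p-state.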

      • Published in

        ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
        December 2016
        648 pages
        ISSN: 1544-3566
        EISSN: 1544-3973
        DOI: 10.1145/3012405

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 December 2016
        • Accepted: 1 November 2016
        • Revised: 1 October 2016
        • Received: 1 June 2016

        Qualifiers

        • research-article
        • Research
        • Refereed
