Research article, MICRO conference proceedings. DOI: 10.1145/3123939.3124537

Harnessing voltage margins for energy efficiency in multicore CPUs

Published: 14 October 2017

ABSTRACT

In this paper, we present the first automated system-level analysis of multicore CPUs based on the ARMv8 64-bit architecture (8-core, 28nm X-Gene 2 micro-server by AppliedMicro) when pushed to operate in scaled voltage conditions. We report detailed system-level effects, including silent data corruptions (SDCs), corrected/uncorrected errors, and application/system crashes. Our study reveals large voltage margins (that can be harnessed for energy savings) and also large Vmin variation among the 8 cores of the CPU chip, among 3 different chips (one nominal-rated and two sigma chips), and among different benchmarks.

Apart from the Vmin analysis, we propose a new composite metric (severity) that aggregates the behavior of cores when undervolted and can support system operation and design protection decisions. Our undervolting characterization findings are the first reported analysis for an enterprise-class 64-bit ARMv8 platform, and we highlight key differences from previous studies on x86 platforms. We use the results of the system characterization, along with performance-counter information, to measure the accuracy of prediction models for the behavior of benchmarks running on particular cores. Finally, we discuss how the detailed characterization and the prediction results can be used effectively to support design and system software decisions that harness voltage margins for energy efficiency while preserving correct operation. Our findings show that, on average, a 19.4% energy saving can be achieved without compromising performance, while with a 25% performance reduction the energy saving rises to 38.8%.
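
To give a feel for the size of the reported savings, the following minimal sketch converts an energy saving into the supply-voltage ratio it would imply. It assumes that energy per task scales with the square of the supply voltage (dynamic energy only, static power ignored). That model and the function name `voltage_ratio_for_saving` are illustrative assumptions, not the measurement method of the paper.

```python
import math


def voltage_ratio_for_saving(energy_saving: float) -> float:
    """Return V_scaled / V_nominal implied by a fractional energy saving.

    Assumption (not from the paper): energy per task is proportional to V^2,
    so E_scaled / E_nominal = (V_scaled / V_nominal)^2.
    """
    if not 0.0 <= energy_saving < 1.0:
        raise ValueError("energy_saving must be in [0, 1)")
    return math.sqrt(1.0 - energy_saving)


# The two average savings quoted in the abstract.
for saving in (0.194, 0.388):
    ratio = voltage_ratio_for_saving(saving)
    print(f"{saving:.1%} energy saving -> V/V_nominal of about {ratio:.3f}")
```

Under this simplified model, the quoted savings would correspond to running at roughly 90% and 78% of the nominal supply voltage, respectively. Measured systems also have static power and voltage-regulator losses, so these ratios are only indicative.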

