
Power-performance considerations of parallel computing on chip multiprocessors

Published: 01 December 2005

Abstract

This paper looks at the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, puts together parallel efficiency, granularity of parallelism, and voltage/frequency scaling, to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct detailed simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both the analytical and the experimental models show that parallel computing can bring significant power savings and still meet a given performance target, provided that granularity and voltage/frequency levels are chosen judiciously. The particular choice, however, depends on the application's parallel efficiency curve and the process technology utilized, both of which our model captures. Likewise, the analytical model and the experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause rapid performance degradation beyond a certain number of cores, even for applications with excellent scalability properties. On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.
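The trade-off described in the abstract can be illustrated with a minimal first-order sketch. This is not the paper's analytical model, which is not reproduced on this page; it is a textbook approximation under loudly stated assumptions: dynamic power per core scales as V²·f, supply voltage scales linearly with frequency, static (leakage) power and any minimum-voltage floor are ignored, and performance scales as N·ε(N)·f, where ε(N) is the parallel efficiency. The function names and parameters are hypothetical.

```python
def relative_power_iso_performance(n_cores, efficiency):
    """Dynamic power of n_cores relative to one full-speed core,
    when the n-core run is slowed just enough to match the
    single-core performance.

    Assumptions (not taken from the paper): power per core ~ V^2 * f,
    V scales linearly with f, no static power, no voltage floor.
    """
    if n_cores < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need n_cores >= 1 and 0 < efficiency <= 1")
    # Performance ~ n * efficiency * f, so matching one core needs:
    freq_ratio = 1.0 / (n_cores * efficiency)
    volt_ratio = freq_ratio  # assumed linear V-f relationship
    return n_cores * volt_ratio ** 2 * freq_ratio


def speedup_under_power_budget(n_cores, efficiency, budget=1.0):
    """Speedup over one full-speed core when total dynamic power is
    capped at `budget` (in units of one full-speed core's power).

    Same assumptions as above: total power ~ n * s^3 for a
    frequency scale factor s, with s never exceeding 1.
    """
    if n_cores < 1 or not 0.0 < efficiency <= 1.0 or budget <= 0.0:
        raise ValueError("invalid arguments")
    # Solve n * s^3 = budget for s, capped at nominal frequency.
    scale = min(1.0, (budget / n_cores) ** (1.0 / 3.0))
    return n_cores * efficiency * scale


if __name__ == "__main__":
    for n, eff in [(1, 1.0), (2, 0.9), (4, 0.8), (8, 0.6)]:
        p = relative_power_iso_performance(n, eff)
        s = speedup_under_power_budget(n, eff)
        print(f"N={n} eff={eff:.2f} power={p:.3f} capped_speedup={s:.3f}")
```

Under these assumptions the iso-performance power ratio reduces to 1/(N²·ε³), and the budget-capped speedup to ε·N^(2/3) for a budget of one core. This shows in simplified form why parallel efficiency drives both the attainable power savings and the scalability under a fixed budget. A real process technology adds static power and a voltage floor, which this sketch deliberately omits.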


    • Published in
      ACM Transactions on Architecture and Code Optimization, Volume 2, Issue 4
      December 2005
      116 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/1113841

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
