Power-performance considerations of parallel computing on chip multiprocessors

Authors:
Jian Li, Cornell University, Ithaca, NY
José F. Martínez, Cornell University, Ithaca, NY

ACM Transactions on Architecture and Code Optimization, Volume 2, Issue 4, pp. 397–422. https://doi.org/10.1145/1113841.1113844

Published: 01 December 2005

Abstract

This paper looks at the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, combines parallel efficiency, granularity of parallelism, and voltage/frequency scaling to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct detailed simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both the analytical and the experimental models show that parallel computing can bring significant power savings and still meet a given performance target if granularity and voltage/frequency levels are chosen judiciously. The particular choice, however, depends on the application's parallel efficiency curve and the process technology utilized, both of which our model captures. Likewise, the analytical model and the experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause rapid performance degradation beyond a certain number of cores, even for applications with excellent scalability properties. On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.
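
The trade-off described in the abstract can be illustrated with a rough first-order sketch. This is not the paper's analytical model, which is not reproduced on this page. It assumes the textbook CMOS relation in which dynamic power scales with V² · f, assumes frequency scales linearly with supply voltage (so per-core power scales with f³), and ignores static power and any minimum-voltage floor. The function names `iso_performance_power` and `budgeted_speedup` and the `budget` parameter are hypothetical, introduced only for this illustration.

```python
def _check(n_cores: int, efficiency: float) -> None:
    """Validate inputs shared by both helpers."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")


def iso_performance_power(n_cores: int, efficiency: float) -> float:
    """Dynamic power of n_cores, relative to one core at nominal frequency,
    when the chip is slowed down just enough to match that one core.

    Assumptions (not from the paper): speedup = n_cores * efficiency * f/f1,
    and per-core dynamic power scales as (f/f1)**3.
    """
    _check(n_cores, efficiency)
    # Frequency ratio that makes the parallel run match one nominal core.
    # A value above 1 would mean the cores must run faster than nominal.
    freq_ratio = 1.0 / (n_cores * efficiency)
    return n_cores * freq_ratio ** 3


def budgeted_speedup(n_cores: int, efficiency: float, budget: float = 1.0) -> float:
    """Speedup over one nominal core when total dynamic power is capped.

    `budget` is expressed in units of one core's nominal dynamic power.
    Frequency is never raised above nominal.
    """
    _check(n_cores, efficiency)
    # Total power n_cores * (f/f1)**3 must stay within the budget.
    freq_ratio = min(1.0, (budget / n_cores) ** (1.0 / 3.0))
    return n_cores * efficiency * freq_ratio


if __name__ == "__main__":
    # Perfect parallel efficiency is assumed here purely for illustration.
    for n in (1, 2, 4, 8, 16):
        print(n, iso_performance_power(n, 1.0), budgeted_speedup(n, 1.0))
```

Under these assumptions, four perfectly efficient cores match one nominal core at 1/16 of its dynamic power, while under a one-core power budget the speedup grows only as efficiency · N^(2/3). Both effects are qualitatively in line with the abstract, but the real curves depend on static power and process technology, which this sketch leaves out.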

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures

Published in

ACM Transactions on Architecture and Code Optimization, Volume 2, Issue 4
December 2005
116 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/1113841

Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Voltage/frequency scaling
granularity
parallel efficiency