Abstract
This paper examines the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, combines parallel efficiency, granularity of parallelism, and voltage/frequency scaling to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct detailed simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both the analytical model and the experiments show that parallel computing can bring significant power savings while still meeting a given performance target, provided that granularity and voltage/frequency levels are chosen judiciously. The particular choice, however, depends on the application's parallel efficiency curve and the process technology used, both of which our model captures. Likewise, the analytical model and the experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause rapid performance degradation beyond a certain number of cores, even for applications with excellent scalability properties. On the other hand, our experiments show that, under a limited power budget, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.
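The trade-off between core count and voltage/frequency level can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's analytical model: it assumes dynamic power proportional to V²f, supply voltage scaling linearly with frequency, and no static power or minimum-voltage floor. The function name and parameters are hypothetical, chosen only for illustration.

```python
def relative_dynamic_power(n_cores: int, efficiency: float) -> float:
    """Dynamic power of n_cores, relative to one core at nominal V/f,
    when the parallel run is slowed just enough to match the
    single-core performance target.

    Simplifying assumptions (not taken from the paper):
      * dynamic power per core is proportional to V^2 * f
      * V scales linearly with f
      * static (leakage) power and voltage floors are ignored
    """
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")

    # At nominal frequency the speedup is n_cores * efficiency, so each
    # core's frequency can be reduced by that factor at equal performance.
    freq_scale = 1.0 / (n_cores * efficiency)

    # With V proportional to f, per-core power scales as freq_scale ** 3.
    return n_cores * freq_scale ** 3


if __name__ == "__main__":
    for n, eff in [(1, 1.0), (4, 1.0), (4, 0.8), (2, 0.5)]:
        print(f"N={n}, efficiency={eff}: {relative_dynamic_power(n, eff):.4f}")
```

Under these assumptions the relative power is 1 / (N² ε³), so it falls quickly with core count when parallel efficiency ε stays high, and the saving disappears when efficiency is poor (two cores at 50% efficiency draw twice the power of one core for the same performance).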
Index Terms
- Power-performance considerations of parallel computing on chip multiprocessors