ABSTRACT
Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone: it must be modeled explicitly. This paper analyzes energy consumption of parallel algorithms executed on shared memory multicore processors. Specifically, we develop a methodology to evaluate how energy consumption of a given parallel algorithm changes as the number of cores and their frequency is varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step versus the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected for a wide range of these parameters.
- http://www.greenpeace.org/raw/content/ international/press/reports/make-it-green-cloud-computing.pdf. Greenpeace International, 2010.Google Scholar
- A. Aggarwal and S. Vitter, Jeffrey. The input/output complexity of sorting and related problems. Communications of ACM, 31(9):1116--1127, 1988. Google ScholarDigital Library
- L. Arge, M. T. Goodrich, M. J. Nelson, and N. Sitchinava. Fundamental parallel algorithms for private-cache chip multiprocessors. In SPAA, pages 197--206, 2008. Google ScholarDigital Library
- M. A. Bender, J. T. Fineman, S. Gilbert, and B. C. Kuszmaul. Concurrent cache-oblivious b-trees. In SPAA, pages 228--237, 2005. Google ScholarDigital Library
- B. D. Bingham and M. R. Greenstreet. Computation with energy-time trade-offs: Models, algorithms and lower-bounds. In ISPA, pages 143--152, 2008. Google ScholarDigital Library
- G. E. Blelloch, R. A. Chowdhury, P. B. Gibbons, V. Ramachandran, S. Chen, and M. Kozuch. Provably good multicore cache performance for divide-and-conquer algorithms. In SODA, pages 501--510, 2008. Google ScholarDigital Library
- A. Chandrakasan, S. Sheng, and R. Brodersen. Low-power cmos digital design. IEEE Journal of Solid-State Circuits, 27(4):473--484, 1992.Google ScholarCross Ref
- S. Cho and R. Melhem. Corollaries to Amdahl's Law for Energy. Computer Architecture Letters, 7(1):25--28, 2008. Google ScholarDigital Library
- M. Curtis-Maury, A. Shah, F. Blagojevic, D. Nikolopoulos, B. de Supinski, and M. Schulz. Prediction Models for Multi-dimensional Power-Performance Optimization on Many Cores. In International Conference on Parallel architectures and compilation techniques, pages 250--259, 2008. Google ScholarDigital Library
- R. Ge, X. Feng, and K. Cameron. Performance-Constrained Distributed DVS Scheduling for Scientific Applications on Power-Aware Clusters. In International Conference on Supercomputing. IEEE Computer Society Washington, DC, USA, 2005. Google ScholarDigital Library
- R. Gonzalez and M. A. Horowitz. Energy dissipation in general purpose microprocessors. IEEE Journal of Solid-State Circuits, 31:1277--1284, 1995.Google ScholarCross Ref
- M. T. Goodrich. Communication-efficient parallel sorting. SIAM Journal of Computing, 29(2):416--432, 1999. Google ScholarDigital Library
- C. Isci, A. Buyuktosunoglu, C. Cher, P. Bose, and M. Martonosi. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget. In International Symposium on Microarchitecture, volume 9, pages 347--358, 2006. Google ScholarDigital Library
- R. M. Karp and V. Ramachandran. Parallel algorithms for shared-memory machines. Handbook of Theoretical Computer Science (vol. A): Algorithms and Complexity, pages 869--941, 1990. Google ScholarDigital Library
- V. A. Korthikanti and G. Agha. Analysis of parallel algorithms for energy conservation in scalable multicore architectures. In ICPP, pages 212--219, 2009. Google ScholarDigital Library
- V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin-Cummings Publishing Co., Inc. Redwood City, CA, USA, 1994. Google ScholarDigital Library
- R. E. Ladner and M. J. Fischer. Parallel prefix computation. ACM Journal, 27(4):831--838, 1980. Google ScholarDigital Library
- J. Li and J. Martinez. Power-Performance Considerations of Parallel Computing on Chip Multiprocessors. ACM Transactions on Architecture and Code Optimization, 2(4):1--25, 2005. Google ScholarDigital Library
- J. Li and J. Martinez. Dynamic Power-Performance Adaptation of Parallel Computation on Chip Multiprocessors. In International Symposium on High-Performance Computer Architecture, pages 77--87, 2006.Google Scholar
- A. J. Martin. Towards an energy complexity of computation. Information Processing Letters, 39:181--187, 2001. Google ScholarDigital Library
- M. P. Mills. The internet begins with coal. Green Earth Society, USA, 1999.Google Scholar
- R. Murphy. On the effects of memory latency and bandwidth on supercomputer application performance. IEEE International Symposium on Workload Characterization, pages 35--43, Sept. 2007. Google ScholarDigital Library
- S. Park, W. Jiang, Y. Zhou, and S. Adve. Managing Energy-Performance Tradeoffs for Multithreaded Applications on Multiprocessor Architectures. In ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 169--180. ACM New York, NY, USA, 2007. Google ScholarDigital Library
- X. Wang and S. Ziavras. Performance-Energy Tradeoffs for Matrix Multiplication on FPGA-Based Mixed-Mode Chip Multiprocessors. In International Symposium on Quality Electronic Design, pages 386--391, 2007. Google ScholarDigital Library
Index Terms
- Towards optimizing energy costs of algorithms for shared memory architectures
Recommendations
Energy cost evaluation of parallel algorithms for multiprocessor systems
With the continuous development of hardware and software, Graphics Processor Units (GPUs) have been used in the general-purpose computation field. They have emerged as a computational accelerator that dramatically reduces the application execution time ...
Energy consumption modeling for hybrid computing
Euro-Par'12: Proceedings of the 18th international conference on Parallel ProcessingEnergy efficiency is increasingly critical for embedded systems and mobile devices, where their continuous operation is based on battery life. In order to increase energy efficiency, chip manufacturers are developing heterogeneous CMP chips.
We present ...
Determine energy-saving potential in wait-states of large-scale parallel programs
Energy consumption is one of the major topics in high performance computing (HPC) in the last years. However, little effort is put into energy analysis by developers of HPC applications.
We present our approach of combined performance and energy analysis ...
Comments