Performance/Watt: the new server focus

Author:
James Laudon

Sun Microsystems


ACM SIGARCH Computer Architecture News, Volume 33, Issue 4, November 2005, pp. 5–13, https://doi.org/10.1145/1105734.1105737

Published: 01 November 2005

Abstract

Transaction processing has emerged as the killer application for commercial servers. Most servers are engaged in transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are built from one or more high-performance superscalar processors. However, commercial server applications exhibit high cache miss rates, large memory footprints, and low instruction-level parallelism (ILP), which leads to poor utilization on traditional ILP-focused superscalar processors [11]. In addition, these ILP-focused processors have been primarily optimized to deliver maximum performance by employing high clock rates and large amounts of speculation. As a result, we are now at the point where the performance/Watt of successive generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing. The lack of increase in processor performance/Watt, coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs, is leading to a situation where total cost of server ownership will soon be predominantly determined by power [4]. In this paper, we argue that attacking thread-level parallelism (TLP) via a large number of simple cores on a chip multiprocessor (CMP) leads to much better performance/Watt for server workloads. As a case study, we compare Sun's TLP-oriented Niagara processor against the ILP-oriented dual-core Pentium Extreme Edition from Intel, showing that the Niagara processor has a significant performance/Watt advantage for throughput-oriented server applications.

References

[1] A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-scale Multiprocessors," IEEE Micro, June 1993, pages 48–61.
[2] L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 3–14.
[3] L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.
[4] L. Barroso, "The Price of Performance," ACM Queue, Vol. 3, Number 7, September 2005.
[5] S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay, "High-Performance Throughput Computing," IEEE Micro, May/June 2005, pages 32–45.
[6] J. Clabes, J. Friedrich, and M. Sweet, "Design and Implementation of the POWER5™ Microprocessor," ISSCC Dig. Tech. Papers, Feb. 2004, pages 56–57.
[7] J. D. Davis, et al., "Maximizing CMT Throughput with Mediocre Cores," In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, Sep. 2005, pages 51–62.
[8] J. Dundas and T. Mudge, "Improving Data Cache Performance by Pre-executing Instructions Under a Cache Miss," In Proceedings of the 1997 International Conference on Supercomputing, July 1997, pages 68–75.
[9] M. Hrishikesh, et al., "The Optimal Logic Depth per Pipeline Stage Is 6 to 8 FO4 Inverter Delays," In Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 14–24.
[10] P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way Multithreaded SPARC Processor," IEEE Micro, March/April 2005, pages 21–29.
[11] S. Kunkel, R. Eickemeyer, M. Lip, T. Mullins, "A Performance Methodology for Commercial Servers," IBM Journal of Research and Development, Vol. 44, Number 6, 2000.
[12] J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proceedings of the 6th International Symposium on Architectural Support for Parallel Languages and Operating Systems, October 1994, pages 308–318.
[13] J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, et al., "An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 39–50.
[14] D. Marr, "Hyper-Threading Technology in the Netburst® Microarchitecture," 14th Hot Chips, August 2002.
[15] S. Mukherjee, M. Kontz, and S. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 99–110.
[16] O. Mutlu, H. Kim, J. Stark, and Y. N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," Proceedings of the 9th International Symposium on High Performance Computer Architecture, February 2003.
[17] S. Naffziger, T. Grutkowski, and B. Stackhouse, "The Implementation of a 2-core Multi-Threaded Itanium® Family Processor," IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2005, pages 182–183.
[18] C. Poirier, R. McGowen, C. Bostak, and S. Naffziger, "Power and Temperature Control on a 90nm Itanium®-Family Processor," ISSCC, Feb. 2005, pages 304–305.
[19] Standard Performance Evaluation Corporation, SPEC*, http://www.spec.org, Warrenton, VA.
[20] Transaction Processing Performance Council, TPC-*, http://www.tpc.org, San Francisco, CA.
[21] D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392–403.
[22] T. Vijaykumar, I. Pomeranz, and K. Cheng, "Transient-Fault Recovery Using Simultaneous Multithreading," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 87–98.
[23] "XML Processing Performance in Java and .NET," http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf

Published in

ACM SIGARCH Computer Architecture News Volume 33, Issue 4
Special issue: dasCMP'05
November 2005
130 pages
ISSN: 0163-5964
DOI: 10.1145/1105734
Copyright © 2005 Author
Publisher
Association for Computing Machinery
New York, NY, United States