Performance/Watt: the new server focus

Published: 01 November 2005

Abstract

Transaction processing has emerged as the killer application for commercial servers. Most servers are engaged in transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are built from one or more high-performance superscalar processors. However, commercial server applications exhibit high cache miss rates, large memory footprints, and low instruction-level parallelism (ILP), which lead to poor utilization on traditional ILP-focused superscalar processors [11]. In addition, these ILP-focused processors have been primarily optimized to deliver maximum performance by employing high clock rates and large amounts of speculation. As a result, we are now at the point where the performance/Watt of subsequent generations of traditional ILP-focused processors on server workloads has been flat [4] or even decreasing. The lack of increase in processor performance/Watt, coupled with the continued decrease in server hardware acquisition costs and likely increases in future power and cooling costs, is leading to a situation where total cost of server ownership will soon be predominantly determined by power [4]. In this paper, we argue that attacking thread-level parallelism (TLP) via a large number of simple cores on a chip multiprocessor (CMP) leads to much better performance/Watt for server workloads. As a case study, we compare Sun's TLP-oriented Niagara processor against the ILP-oriented dual-core Pentium Extreme Edition from Intel, showing that the Niagara processor has a significant performance/Watt advantage for throughput-oriented server applications.

References

  1. A. Agarwal, J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D'Souza, and M. Parkin, "Sparcle: An Evolutionary Processor Design for Large-scale Multiprocessors," IEEE Micro, June 1993, pages 48--61.
  2. L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 3--14.
  3. L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing," Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.
  4. L. Barroso, "The Price of Performance," ACM Queue, Vol. 3, Number 7, September 2005.
  5. S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay, "High-Performance Throughput Computing," IEEE Micro, May/June 2005, pages 32--45.
  6. J. Clabes, J. Friedrich, and M. Sweet, "Design and Implementation of the POWER5™ Microprocessor," ISSCC Dig. Tech. Papers, Feb. 2004, pages 56--57.
  7. J. D. Davis, et al., "Maximizing CMT Throughput with Mediocre Cores," Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, Sep. 2005, pages 51--62.
  8. J. Dundas and T. Mudge, "Improving Data Cache Performance by Pre-executing Instructions Under a Cache Miss," Proceedings of the 1997 International Conference on Supercomputing, July 1997, pages 68--75.
  9. M. Hrishikesh, et al., "The Optimal Logic Depth per Pipeline Stage Is 6 to 8 FO4 Inverter Delays," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 14--24.
  10. P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-way Multithreaded SPARC Processor," IEEE Micro, March/April 2005, pages 21--29.
  11. S. Kunkel, R. Eickemeyer, M. Lip, T. Mullins, "A Performance Methodology for Commercial Servers," IBM Journal of Research and Development, Vol. 44, Number 6, 2000.
  12. J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proceedings of the 6th International Symposium on Architectural Support for Parallel Languages and Operating Systems, October 1994, pages 308--318.
  13. J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, et al., "An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors," Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998, pages 39--50.
  14. D. Marr, "Hyper-Threading Technology in the Netburst® Microarchitecture," 14th Hot Chips, August 2002.
  15. S. Mukherjee, M. Kontz, and S. Reinhardt, "Detailed Design and Evaluation of Redundant Multithreading Alternatives," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 99--110.
  16. O. Mutlu, H. Kim, J. Stark, and Y. N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," Proceedings of the 9th International Symposium on High Performance Computer Architecture, February 2003.
  17. S. Naffziger, T. Grutkowski, and B. Stackhouse, "The Implementation of a 2-core Multi-Threaded Itanium® Family Processor," IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2005, pages 182--183.
  18. C. Poirier, R. McGowen, C. Bostak, and S. Naffziger, "Power and Temperature Control on a 90nm Itanium®-Family Processor," ISSCC, Feb. 2005, pages 304--305.
  19. Standard Performance Evaluation Corporation, SPEC*, http://www.spec.org, Warrenton, VA.
  20. Transaction Processing Performance Council, TPC-*, http://www.tpc.org, San Francisco, CA.
  21. D. Tullsen, S. Eggers, and H. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995, pages 392--403.
  22. T. Vijaykumar, I. Pomeranz, and K. Cheng, "Transient-Fault Recovery Using Simultaneous Multithreading," Proceedings of the 29th Annual International Symposium on Computer Architecture, May 2002, pages 87--98.
  23. "XML Processing Performance in Java and .NET," http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf

Published in

ACM SIGARCH Computer Architecture News, Volume 33, Issue 4
Special issue: dasCMP'05
November 2005
130 pages
ISSN: 0163-5964
DOI: 10.1145/1105734

Copyright © 2005 Author

Publisher

Association for Computing Machinery
New York, NY, United States
