ABSTRACT
Computer system designers often evaluate future design alternatives with detailed simulators that strive for functional fidelity (to execute relevant workloads) and performance fidelity (to rank design alternatives). Trends toward multi-threaded architectures, more complex micro-architectures, and richer workloads make authoring detailed simulators increasingly difficult. To manage simulator complexity, this paper advocates decoupled simulator organizations that separate functional and performance concerns. Furthermore, we define an approach, called timing-first simulation, that uses an augmented timing simulator to execute instructions important to performance in conjunction with a functional simulator to ensure correctness. This design simplifies software development, leverages existing simulators, and can model micro-architecture timing in detail.

We describe the timing-first organization and our experiences implementing TFsim, a full-system multiprocessor performance simulator. TFsim models a pipelined, out-of-order micro-architecture in detail, was developed in less than one person-year, and performs competitively with previously published simulators. TFsim's timing simulator implements dynamically common instructions (99.99% of them), while avoiding the vast and exacting implementation efforts necessary to run unmodified commercial operating systems and workloads. Virtutech Simics, a full-system functional simulator, checks and corrects the timing simulator's execution, contributing 18-36% to the overall run-time. TFsim's mostly correct functional implementation introduces a worst-case performance error of 4.8% for our commercial workloads. Some additional simulator performance is gained by verifying functional correctness less often, at the cost of some additional performance error.
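The check-and-correct relationship between the two simulators can be illustrated with a toy model. The following is a minimal sketch, not TFsim or the Simics API: the three-field instruction format, the `rare` opcode, `FLUSH_PENALTY`, and every function name are hypothetical and chosen only for illustration. It assumes the timing model executes common instructions itself, the functional model executes everything, and any mismatch found at a check point is repaired by reloading state from the functional model.

```python
from dataclasses import dataclass, field

FLUSH_PENALTY = 10  # hypothetical cycle cost charged when state is reloaded


@dataclass
class State:
    """Architected state of a toy machine (program counter and registers)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)

    def copy(self):
        return State(self.pc, list(self.regs))


def functional_step(state, program):
    """Reference model: implements every instruction of the toy ISA."""
    op, rd, imm = program[state.pc]
    if op == "addi":
        state.regs[rd] += imm
    elif op == "rare":  # stands in for a complex, rarely executed instruction
        state.regs[rd] = state.regs[rd] * 2 + imm
    state.pc += 1


def timing_step(state, program):
    """Timing model: implements only common instructions, returns a latency."""
    op, rd, imm = program[state.pc]
    if op == "addi":
        state.regs[rd] += imm
    # "rare" is deliberately unimplemented, so the state may now be wrong.
    state.pc += 1
    return 1


def run_timing_first(program, check_interval=1):
    """Run both models in lockstep, checking every check_interval instructions."""
    timing, functional = State(), State()
    cycles = deviations = retired = 0
    while functional.pc < len(program):
        cycles += timing_step(timing, program)   # timing model goes first
        functional_step(functional, program)     # functional model follows
        retired += 1
        if retired % check_interval == 0 and timing != functional:
            timing = functional.copy()           # correct the timing model
            deviations += 1
            cycles += FLUSH_PENALTY
    return functional, cycles, deviations


program = [("addi", 0, 5), ("rare", 0, 1), ("addi", 1, 3)]
final_state, cycles, deviations = run_timing_first(program)
print(final_state.regs, cycles, deviations)
```

Raising `check_interval` in this sketch mirrors the trade-off stated in the abstract: fewer comparisons cost less simulation time, but the timing model can run on incorrect state for longer before it is corrected.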
- Full-system timing-first simulation