Abstract
This paper describes Embra, a simulator for the processors, caches, and memory systems of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS simulation environment, Embra models the processors of a MIPS R3000/R4000 machine faithfully enough to run a commercial operating system and arbitrary user applications. To achieve high simulation speed, Embra uses dynamic binary translation to generate code sequences that simulate the workload. It is the first machine simulator to use this technique. Embra can simulate real workloads, such as multiprocess compiles and the SPEC92 benchmarks running on Silicon Graphics' IRIX 5.3, at speeds only 3 to 9 times slower than native execution of the workload, making Embra the fastest reported complete machine simulator. Dynamic binary translation also gives Embra the flexibility to dynamically control both the simulation statistics reported and the simulation model accuracy with low performance overheads. For example, Embra can customize its generated code to include a processor cache model that allows it to compute the cache misses and memory stall time of a workload. Customized code generation allows Embra to simulate a machine with caches at slowdowns of only a factor of 7 to 20. Most of the statistics generated at this speed match those produced by a slower reference simulator to within 1%. This paper describes the techniques used by Embra to achieve high performance, focusing on the requirements unique to machine simulation, including modeling the processor, memory management unit, and caches. To study Embra's memory system performance, we use the SimOS simulation system to examine Embra itself. We present a detailed breakdown of Embra's memory system performance for two cache hierarchies to understand Embra's current performance and to show that Embra's implementation techniques benefit significantly from the larger cache hierarchies that are becoming available.
Embra has been used for operating system development and testing as well as for studies of computer architecture. In this capacity it has simulated large, commercial workloads including IRIX running a relational database system and a CAD system for billions of simulated machine cycles.
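The two ideas named in the abstract, translating blocks of simulated instructions into host code once and reusing them, and optionally customizing that generated code with a cache model, can be illustrated with a toy sketch. This is not Embra's implementation: the instruction set, the `ToyTranslator` and `DirectMappedCache` classes, and the cache parameters below are all hypothetical, and the "host code" is Python source rather than native machine code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy ISA (not MIPS). Register 0 is never written, so it reads as 0.
PROGRAM: List[Tuple] = [
    ("addi", 1, 0, 0),    # 0: r1 = 0          (index)
    ("addi", 2, 0, 16),   # 1: r2 = 16         (limit)
    ("load", 3, 1),       # 2: r3 = mem[r1]    (loop head)
    ("add", 4, 4, 3),     # 3: r4 = r4 + r3    (running sum)
    ("addi", 1, 1, 1),    # 4: r1 = r1 + 1
    ("bne", 1, 2, 2),     # 5: if r1 != r2 goto 2
    ("halt",),            # 6: stop
]


class DirectMappedCache:
    """Assumed toy cache model: counts accesses and misses only."""

    def __init__(self, num_lines: int = 4, line_words: int = 4) -> None:
        self.num_lines = num_lines
        self.line_words = line_words
        self.tags: List[int] = [-1] * num_lines
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.line_words
        index = line % self.num_lines
        self.accesses += 1
        if self.tags[index] != line:
            self.misses += 1
            self.tags[index] = line


class ToyTranslator:
    """Translates basic blocks to Python functions and caches them by start PC."""

    def __init__(self, program: List[Tuple], memory: List[int],
                 cache: Optional[DirectMappedCache] = None) -> None:
        self.program = program
        self.memory = memory
        self.cache = cache
        self.regs = [0] * 8
        self.tcache: Dict[int, Callable[..., int]] = {}
        self.translations = 0

    def translate(self, pc: int) -> Callable[..., int]:
        """Emit Python source for the block starting at pc and compile it."""
        lines = ["def block(regs, mem, cache):"]
        while True:
            ins = self.program[pc]
            op = ins[0]
            if op == "addi":
                lines.append(f"    regs[{ins[1]}] = regs[{ins[2]}] + {ins[3]}")
            elif op == "add":
                lines.append(f"    regs[{ins[1]}] = regs[{ins[2]}] + regs[{ins[3]}]")
            elif op == "load":
                # Customization point: the cache probe is emitted only
                # when a cache model was requested.
                if self.cache is not None:
                    lines.append(f"    cache.access(regs[{ins[2]}])")
                lines.append(f"    regs[{ins[1]}] = mem[regs[{ins[2]}]]")
            elif op == "bne":
                lines.append(f"    return {ins[3]} if regs[{ins[1]}] != "
                             f"regs[{ins[2]}] else {pc + 1}")
                break
            elif op == "halt":
                lines.append("    return -1")
                break
            else:
                raise ValueError(f"unknown opcode {op!r}")
            pc += 1
        namespace: dict = {}
        exec("\n".join(lines), namespace)
        self.translations += 1
        return namespace["block"]

    def run(self, pc: int = 0) -> None:
        """Dispatch loop: look up the block for pc, translating on a miss."""
        while pc != -1:
            block = self.tcache.get(pc)
            if block is None:
                block = self.translate(pc)
                self.tcache[pc] = block
            pc = block(self.regs, self.memory, self.cache)


if __name__ == "__main__":
    sim = ToyTranslator(PROGRAM, list(range(16)), DirectMappedCache())
    sim.run()
    print(sim.regs[4], sim.translations, sim.cache.misses)
```

In this sketch the loop body is translated once and then re-executed from the translation cache on every iteration, and the same workload can be run with or without the cache probe by passing or omitting the cache model. A real binary translator would emit host machine code and handle details this toy ignores, such as address translation, exceptions, and self-modifying code.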
Embra: fast and flexible machine simulation. SIGMETRICS '96: Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems.