ABSTRACT
This paper presents a novel cycle-approximate performance estimation technique for automatically generated transaction level models (TLMs) of heterogeneous multi-core designs. The inputs are the application's C processes and their mapping to processing units in the platform. The processing unit model consists of a pipelined datapath, a memory hierarchy, and a branch delay model. Using this model, the basic blocks in the C processes are analyzed and annotated with estimated delays. A code generation phase then emits delay-annotated C code and links it with a SystemC wrapper that contains the inter-process communication channels. The generated TLM is compiled and executed natively on the host machine. Our key contribution is an estimation technique that is close to cycle-accurate, can be applied to any multi-core platform, and produces high-speed, natively compiled TLMs. In our experiments, timed TLMs for industrial-scale designs such as an MP3 decoder were automatically generated for 4 heterogeneous multi-processor platforms with up to 5 PEs in under 1 minute. Each TLM simulated in under 1 second, compared to 3--4 hrs of instruction set simulation (ISS) and 15--18 hrs of RTL simulation. Comparison to on-board measurement showed an average error of only 8% in the estimated number of cycles.
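To make the idea of delay-annotated basic blocks concrete, the following is a minimal sketch in C of what such annotated code might look like. It is not the generated code from the paper: the hook `annotate_delay`, the counter `cycle_count`, the example process `sum_positive`, and all per-block cycle values are hypothetical and chosen only for illustration. In an actual TLM the accumulated delay would presumably be passed to the SystemC wrapper as simulated time rather than kept in a plain counter.

```c
#include <assert.h>

/* Hypothetical cycle accumulator. A generated TLM would instead hand
   this delay to the SystemC wrapper as simulated time. */
static unsigned long long cycle_count = 0;

/* Hypothetical annotation hook, called once on entry to each basic block. */
static void annotate_delay(unsigned cycles)
{
    cycle_count += cycles;
}

/* Example application process: sums the positive elements of data[0..n-1].
   The cycle numbers are made up; a real estimator would derive them from
   the processing unit model (pipeline, memory hierarchy, branch delay). */
int sum_positive(const int *data, int n)
{
    int sum = 0;
    int i;

    assert(n >= 0);
    annotate_delay(3);              /* BB0: entry and initialization */
    for (i = 0; i < n; i++) {
        annotate_delay(4);          /* BB1: loop body head, load, compare */
        if (data[i] > 0) {
            annotate_delay(2);      /* BB2: accumulate */
            sum += data[i];
        }
    }
    annotate_delay(1);              /* BB3: exit */
    return sum;
}
```

Under these assumed delays, calling `sum_positive` on the five-element array `{5, -2, 7, 0, 3}` with `cycle_count` starting at zero returns 15 and leaves `cycle_count` at 3 + 5·4 + 3·2 + 1 = 30. Because the annotations are ordinary C statements, the code compiles and runs natively on the host, which is what allows this style of model to simulate much faster than an ISS.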
Index Terms
- Cycle-approximate retargetable performance estimation at the transaction level