Abstract
SIMP is a novel multiple instruction-pipeline parallel architecture. It aims to drastically enhance the performance of SISD processors by exploiting both temporal and spatial parallelism while preserving program compatibility. The degree of performance enhancement achieved by SIMP depends on (i) how multiple instructions are supplied continuously and (ii) how data and control dependencies are resolved effectively. We have devised techniques for both instruction fetch and dependency resolution. The instruction fetch mechanism employs (i) prefetching of multiple instructions with the help of branch prediction, (ii) selective squashing of instructions, and (iii) as a result, multiple conditional modes. The dependency resolution mechanism permits out-of-order execution of a sequential instruction stream. Our out-of-order execution model is based on Tomasulo's algorithm, which has been used in single instruction-pipeline processors. However, it is greatly extended and adapted to multiple instruction pipelining by (i) detecting and identifying multiple dependencies simultaneously, (ii) alleviating the effects of control dependencies with both eager execution and advance execution, and (iii) ensuring a precise machine state in the presence of branches and interrupts. With these techniques, SIMP is one of the most promising architectures for the coming generation of high-speed single processors.
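As a rough illustration of what "detecting and identifying multiple dependencies simultaneously" can mean in a Tomasulo-style scheme, the sketch below tags every instruction in a fetched group and records, for each source register, the tag of the instruction that will produce it. This is not the SIMP hardware design: the names `Instr`, `Issued`, `issue_group`, and `reg_tag` are hypothetical, and the sequential loop only models the result that parallel comparison logic would compute in one cycle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    dest: str               # destination register name (hypothetical format)
    srcs: Tuple[str, ...]   # source register names


@dataclass
class Issued:
    tag: int                             # unique tag naming this result
    src_tags: Tuple[Optional[int], ...]  # None: value is already in the register file


def issue_group(group: List[Instr], reg_tag: Dict[str, int], next_tag: int):
    """Assign tags to a group of instructions issued together.

    reg_tag maps a register to the tag of its latest pending producer.
    Updating it inside the loop lets a later instruction in the same
    group see an earlier one as its producer (intra-group dependency).
    """
    issued = []
    for ins in group:
        # Read-after-write: look up the pending producer of each source.
        src_tags = tuple(reg_tag.get(r) for r in ins.srcs)
        issued.append(Issued(next_tag, src_tags))
        # Renaming the destination removes write-after-write and
        # write-after-read hazards on the architectural register name.
        reg_tag[ins.dest] = next_tag
        next_tag += 1
    return issued, next_tag


# Example group of four instructions with dependencies inside the group.
group = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r2")),   # depends on instruction 0 through r1
    Instr("r1", ("r4", "r5")),   # depends on instruction 1, rewrites r1
    Instr("r6", ("r1",)),        # depends on instruction 2, not 0
]
issued, next_tag = issue_group(group, {}, 0)
for entry in issued:
    print(entry.tag, entry.src_tags)
```

In the example, the fourth instruction waits on tag 2 rather than tag 0, because the register-to-tag map always points at the most recent writer of `r1`.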
SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture
ISCA '89: Proceedings of the 16th Annual International Symposium on Computer Architecture