Abstract
This paper describes the polymorphous TRIPS architecture, which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction-, data-, or thread-level parallelism. To adapt to small- and large-grain concurrency, the TRIPS architecture contains four out-of-order, 16-wide-issue Grid Processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes (ILP, TLP, and DLP), demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
ISCA '03: Proceedings of the 30th Annual International Symposium on Computer Architecture