ABSTRACT
We develop detailed area and energy models for on-chip interconnection networks and describe tradeoffs in the design of efficient networks for tiled chip multiprocessors. Using these detailed models, we investigate how aspects of the network architecture, including topology, channel width, routing strategy, and buffer size, affect performance, area efficiency, and energy efficiency. We simulate the performance of a variety of on-chip networks designed for tiled chip multiprocessors implemented in an advanced VLSI process and compare area and energy efficiencies estimated from our models. We demonstrate that the introduction of a second parallel network can increase performance while improving efficiency, and evaluate different strategies for distributing traffic over the subnetworks. Drawing on insights from our analysis, we present a concentrated mesh topology with replicated subnetworks and express channels that provides a 24% improvement in area efficiency and a 48% improvement in energy efficiency over other networks evaluated in this study.
Index Terms
- Design tradeoffs for tiled CMP on-chip networks