Abstract
Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.
A case for bufferless routing in on-chip networks
ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture