research-article

A case for bufferless routing in on-chip networks

Authors:
Thomas Moscibroda, Microsoft Research, Redmond, WA, USA
Onur Mutlu, Carnegie Mellon University, Pittsburgh, PA, USA

ACM SIGARCH Computer Architecture News, Volume 37, Issue 3, June 2009, pp. 196–207. https://doi.org/10.1145/1555815.1555781

Published: 20 June 2009

Abstract

Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.


Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
    2. Parallel architectures
      1. Interconnection architectures

Published in
ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815
ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN:9781605585260
DOI:10.1145/1555754
General Chair: Steve Keckler, University of Texas at Austin
Program Chair: Luiz André Barroso, Google Inc.
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
memory systems
multi-core
on-chip networks
routing
