research-article

A case for bufferless routing in on-chip networks

Published: 20 June 2009

Abstract

Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.
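The abstract does not spell out the routing algorithms themselves. As a rough illustration of how a router can forward traffic without input/output buffers, the sketch below uses deflection (hot-potato) routing on a 2D mesh, a well-known bufferless technique; it is not taken from the paper. The names `Flit`, `productive_ports`, and `route_cycle`, the oldest-first priority rule, and the restriction to an interior router with four neighbours are all assumptions made for this example.

```python
from dataclasses import dataclass

# Output ports of an interior mesh router and their (dx, dy) offsets.
# Assumption: edge and corner routers (with fewer ports) are not modelled.
PORTS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


@dataclass
class Flit:
    dest: tuple  # (x, y) coordinates of the destination router
    age: int     # cycles since injection; a larger value means an older flit


def productive_ports(here, dest):
    """Return the ports that move a flit one hop closer to its destination."""
    ports = []
    if dest[0] > here[0]:
        ports.append("E")
    if dest[0] < here[0]:
        ports.append("W")
    if dest[1] > here[1]:
        ports.append("N")
    if dest[1] < here[1]:
        ports.append("S")
    return ports


def route_cycle(here, flits):
    """Assign every incoming flit to a distinct output port in one cycle.

    Nothing is stored: a flit that loses arbitration for a productive port
    is deflected to some other free port. Assumption: flits whose
    destination is `here` were already ejected by the caller.
    """
    assert len(flits) <= len(PORTS), "more flits than output ports"
    free = list(PORTS)
    assignment = {}
    # Hypothetical priority rule: older flits choose their port first.
    for i in sorted(range(len(flits)), key=lambda i: -flits[i].age):
        wanted = [p for p in productive_ports(here, flits[i].dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if nothing productive
        free.remove(port)
        assignment[i] = port
    return assignment
```

For example, if two flits arriving at router `(1, 1)` both need to travel east, the older one takes the east port and the younger one is sent out of another port instead of waiting in a buffer. A deflected flit takes a longer path, which fits the abstract's point that performance stays close to buffered routing only at low network utilization, where such conflicts are rare.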


        Published in

        ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
        June 2009
        495 pages
        ISSN: 0163-5964
        DOI: 10.1145/1555815

        ISCA '09: Proceedings of the 36th Annual International Symposium on Computer Architecture
        June 2009
        510 pages
        ISBN: 9781605585260
        DOI: 10.1145/1555754

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
