Published in: Design Automation for Embedded Systems 3-4/2014

1 September 2014

Centaur: a hybrid network-on-chip architecture utilizing micro-network fusion

Authors: Junghee Lee, Chrysostomos Nicopoulos, Hyung Gyu Lee, Jongman Kim


Abstract

The escalating proliferation of multicore chips has accentuated the criticality of the on-chip network. Packet-based networks-on-chip (NoC) have emerged as the de facto interconnect of future chip multi-processors (CMP). On-chip traffic comprises a mixture of data and control messages from the cache coherence protocol. Given the latency-criticality of control messages, in this paper we aim to optimize their delivery times. Instead of treating the on-chip router as a monolithic component, we advocate the introduction of an ultra-low-latency ring-inspired (i.e., utilizing ring primitive building blocks) support micro-network that is optimized for control messages. This μNoC is fused with a throughput-driven conventional NoC router to form a hybrid architecture, called Centaur, which maintains separate data paths and control logic for the two fused networks. Full-system simulation results from a 64-core CMP indicate that the proposed fused Centaur router improves overall system performance by up to 26%, as compared to a state-of-the-art router implementation. Furthermore, hardware synthesis results using commercial 65 nm libraries indicate that Centaur’s area and power overheads are 9% and 3%, respectively, as compared to a baseline router design. More importantly, the new design does not affect the router’s critical path.

Footnotes
1. In Greek mythology, Centaur was a hybrid creature that was part human and part horse. Much like this mythical creature, our proposed router architecture fuses two distinct networks into one entity.

Metadata
Title
Centaur: a hybrid network-on-chip architecture utilizing micro-network fusion
Authors
Junghee Lee
Chrysostomos Nicopoulos
Hyung Gyu Lee
Jongman Kim
Publication date
1 September 2014
Publisher
Springer US
Published in
Design Automation for Embedded Systems / Issue 3-4/2014
Print ISSN: 0929-5585
Electronic ISSN: 1572-8080
DOI
https://doi.org/10.1007/s10617-014-9131-z
