Reconfigurable fault tolerant routing for networks-on-chip with logical hierarchy☆
Introduction
The ongoing technology scaling allows an increasing number of cores to be implemented on a single chip, e.g. Intel’s Xeon Phi Coprocessor [1] or Tilera’s Tile-MX multicore processor [2]. As this scaling trend continues [3], future multiprocessor systems will feature hundreds of cores on a single chip.
The increasing size of on-chip systems poses a new challenge to Networks-on-Chip (NoCs). Mechanisms implemented in an NoC, such as table-based routing, work well for small systems but do not scale to larger ones. One way to cope with these scalability problems is to introduce a hierarchical structure into the NoC. A hierarchical NoC can be obtained by constructing its topology from subnetworks or by segmenting a given topology into logical units. Both subnetworks and logical units allow formerly global mechanisms to be applied locally, thus reducing their complexity. Compared to subnetworks, logical units have the advantage that they can be applied without changing an existing topology. Typically, network nodes (switch + core) are grouped into a logical unit if they are part of the same task and share a spatial relation.
The downside of technology scaling is an increased probability of permanent faults in an NoC, caused by manufacturing inaccuracies [4] or by wear-out effects such as electromigration [5] that emerge during system operation. The failure of links or switches due to permanent faults results in an altered network topology. In such a case, static routing can no longer maintain connectivity between system components. For this reason, it is crucial that the routing is adapted to the new network situation so that packets can circumvent faulty components.
In this paper we present a reconfigurable fault tolerant routing approach based on Up/Down routing [6] for large scale NoCs with logical hierarchy. It enables the routing to be adapted locally within each hierarchical unit in case of permanent faults while deadlock freedom is guaranteed globally. Our approach can be applied to any number of hierarchy levels.
The remainder of the paper is organized as follows: In Section 2 related work is discussed. Section 3 contains a formal introduction to NoC topologies as well as an introduction to Up/Down routing. In Section 4 we present our hierarchical network concept. Our hierarchical routing is presented in Section 5 and the reconfiguration process in Section 6. Evaluation results are discussed in Section 7. Section 8 concludes the paper.
Related work
In this section, we focus on work related to routing reconfiguration. Related work dealing with hierarchical topologies and hierarchical routing is presented in [7].
A reconfigurable scheme for source based routing is presented in [8]. If a source cannot reach a destination, it floods a path request through the network. Each node stores the port via which the request was received in a table. When the request reaches the destination, a packet is sent back to the source using the reverse path
NoC topology
The topology of an NoC can be represented by a directed graph T = (N, C), where N is the set of network nodes and C is the set of unidirectional channels. In a typical fault-free NoC topology, two connected nodes n_i, n_j ∈ N have one bidirectional connection, i.e. c_{i,j}, c_{j,i} ∈ C. We refer to bidirectional connections as links and to unidirectional connections as channels. To distinguish between nodes, each node has a unique ID.
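As an illustration of this graph model, the following minimal Python sketch stores nodes as integer IDs and channels as ordered pairs, so that a link is simply a pair of opposite channels. The class name `Topology`, the helper `mesh`, and the row-major node numbering are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Directed graph of node IDs (N) and unidirectional channels (C)."""
    nodes: set = field(default_factory=set)
    channels: set = field(default_factory=set)  # (i, j) = channel from node i to node j

    def add_link(self, i, j):
        """Add a bidirectional link, i.e. both channels (i, j) and (j, i)."""
        self.nodes.update((i, j))
        self.channels.update({(i, j), (j, i)})

    def has_link(self, i, j):
        """A link exists only while both of its channels are present."""
        return (i, j) in self.channels and (j, i) in self.channels

    def fail_channel(self, i, j):
        """Model a permanent fault of the single channel from i to j."""
        self.channels.discard((i, j))


def mesh(width, height):
    """Build a fault-free 2D mesh; node IDs are assigned row by row (an assumption)."""
    t = Topology()
    for y in range(height):
        for x in range(width):
            n = y * width + x
            t.nodes.add(n)
            if x + 1 < width:
                t.add_link(n, n + 1)
            if y + 1 < height:
                t.add_link(n, n + width)
    return t
```

Removing one channel with `fail_channel` leaves the opposite channel in place, which mirrors the distinction between channel faults and link faults.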
Up/Down routing
In literature, various approaches (e.g. [10], [15], [16])
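For orientation, the classic Up/Down rule from Autonet [6] can be sketched as follows: a breadth-first spanning tree is built from a root node, each channel is labelled "up" if it leads closer to the root (ties broken by lower node ID), and a legal route never takes an up channel after a down channel. The Python code below is a minimal sketch of that flat rule only, not of the hierarchical variant developed in this paper; all function names are hypothetical.

```python
from collections import deque


def bfs_levels(adj, root):
    """Distance of every reachable node from the root of the spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(level, u, v):
    """Channel u -> v is 'up' if v is closer to the root; ties go to the lower ID."""
    return (level[v], v) < (level[u], u)


def is_legal(level, path):
    """A path is legal if no up channel follows a down channel."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(level, u, v):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


def updown_route(adj, level, src, dst):
    """Shortest legal path, found by BFS over (node, gone_down) states."""
    start = (src, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        u, gone_down = state
        if u == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for v in adj[u]:
            up = is_up(level, u, v)
            if up and gone_down:
                continue  # a down -> up turn is forbidden
            nxt = (v, gone_down or not up)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None
```

After a permanent fault, recomputing `bfs_levels` on the altered adjacency and routing again yields paths that respect the rule on the new topology, which is the basic idea behind reconfiguring Up/Down routing.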
Hierarchical network concept
Our approach is based on introducing hierarchical levels to a network topology and organizing network nodes into hierarchical units on each level. For a topology T, h_max hierarchy levels may be defined, where h_max = 0 corresponds to the flat hierarchy network. Each level h with 0 ≤ h ≤ h_max consists of hierarchical units, each composed of a set of connected network nodes. Each node must belong to exactly one unit on each level. On the lowest level, a unit corresponds to a single node.
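One possible way to picture this organization is to tile a 2D mesh with rectangular units, one tile size per level. The Python sketch below assumes a mesh with row-major node IDs and rectangular units; the function names and the tiling scheme are illustrative assumptions, not the paper's definition of units.

```python
def unit_id(x, y, width, unit_w, unit_h):
    """ID of the rectangular unit that contains mesh coordinate (x, y)."""
    units_per_row = -(-width // unit_w)  # ceiling division
    return (y // unit_h) * units_per_row + (x // unit_w)


def build_hierarchy(width, height, unit_sizes):
    """Return hierarchy[h][node] -> unit ID for every level h.

    Level 0 uses 1 x 1 units, so each level-0 unit is a single node.
    unit_sizes lists the (width, height) of units on levels 1, 2, ...
    """
    sizes = [(1, 1)] + list(unit_sizes)
    hierarchy = []
    for unit_w, unit_h in sizes:
        level = {}
        for y in range(height):
            for x in range(width):
                level[y * width + x] = unit_id(x, y, width, unit_w, unit_h)
        hierarchy.append(level)
    return hierarchy


def members(level):
    """Group the nodes of one level by the unit they belong to."""
    units = {}
    for node, unit in level.items():
        units.setdefault(unit, set()).add(node)
    return units
```

Because each level is a plain mapping from node to unit, every node belongs to exactly one unit per level by construction, and rectangular tiles of a mesh are connected.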
The hierarchical
Hierarchical routing
The aim of our hierarchical routing approach is the online adaptation of routing in case of permanent faults in the communication structure. For our hierarchical routing, we consider permanent channel and link faults caused during manufacturing (e.g. bridging) or broken wires caused by aging. Further, we consider the complete failure of a switch. To minimize the time required for adaptation as well as the required communication overhead, the adaptation process is not performed globally in the
Routing reconfiguration process
This section describes the routing reconfiguration process. Note that the test process used to detect faults is outside the scope of this work. We assume that the network contains a test mechanism such as [18], [19], or [20] to detect faults. We further assume that each switch communicates the availability of a channel to its neighbors by means of availability signals. If a channel is available, its availability signal is set to one; otherwise it is set to zero.
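The availability signals can be pictured as one bit per channel. The following Python sketch models them as a dictionary and compares two samples to find channels whose signal dropped from one to zero. How a switch samples the signals and what it does on a drop are assumptions of this example, and both function names are hypothetical.

```python
def availability_signals(channels, faulty):
    """Map each channel (i, j) to 1 if it is available and to 0 if it is faulty."""
    return {c: 0 if c in faulty else 1 for c in channels}


def dropped_channels(before, after):
    """Channels whose signal fell from 1 to 0 between two samples."""
    return sorted(c for c in before if before[c] == 1 and after.get(c, 0) == 0)
```

For instance, with channels `{(0, 1), (1, 0), (1, 2), (2, 1)}` and a fault on `(1, 2)`, only that channel is reported as dropped, while the opposite channel `(2, 1)` keeps its signal at one.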
The reconfiguration process of the routing
Evaluation
We evaluate our reconfigurable routing regarding the required hardware implementation overhead (Section 7.1), the impact of different unit sizes on the network performance (Section 7.2), and the reconfiguration process in presence of faults (Section 7.3).
To evaluate our hierarchical routing, we have taken mesh networks with flat hierarchy as well as hierarchy levels into account. For level we have considered unit sizes s from 2 × 2 to 16 × 8. The size of a level 2 unit refers to the
Conclusion
Adding logical hierarchy to Networks-on-Chip (NoCs) offers significant benefits compared to NoCs with flat organization. In particular, logical hierarchy makes routing tables a feasible design choice. This is achieved by keeping full table entries only for nodes in the same logical network unit and by merging routing information for all other nodes through hierarchical abstraction. Thereby, a routing table for a switch in a 256 node NoC requires less than 20% of the switch’s chip area, whereas
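To give a feel for why merging entries helps, the Python sketch below counts routing table entries under a simplified model: one entry per other node in the same unit, plus one entry per other unit on each higher level. This counting model, the `fanouts` parameter, and the function names are assumptions made for illustration; the sketch does not reproduce the paper's area figures.

```python
from math import prod


def flat_entries(num_nodes):
    """Flat table: one entry for every other node in the network."""
    return num_nodes - 1


def hierarchical_entries(fanouts):
    """Hierarchical table under the simplified counting model.

    fanouts[0] is the number of nodes per level-1 unit; fanouts[k] is the
    number of level-k units grouped into one level-(k+1) unit. Each level
    contributes one entry per sibling (the own node or unit is excluded).
    """
    return sum(f - 1 for f in fanouts)


def network_size(fanouts):
    """Total number of nodes described by a list of fanouts."""
    return prod(fanouts)
```

Under this model, a 256 node network split into 16 units of 16 nodes needs 30 entries per switch instead of 255, and four nested levels with fanout 4 need 12.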
Acknowledgment
This work has been supported by the German Research Foundation (Deutsche Forschungsgemeinschaft - DFG) under grant Ra 1889/4-1.
Gert Schley received the Dipl-Ing (FH) degree in electrical engineering and information technologies in 2007 and the M.S. degree in embedded systems engineering in 2009 from the University of Applied Sciences, Pforzheim, Germany. Since 2009, he has been a Research Scientist with the Embedded Systems Group, University of Stuttgart. His research interests include hierarchical architectures and cross-layer fault tolerance for Network-on-Chip.
References (23)
- Intel Xeon Phi Coprocessor. visited April 2015....
- Tilera Tile-MX Multicore Processor. visited April 2015....
- International Technology Roadmap For Semiconductors. visited April 2015....
- Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro (2005)
- Electromigration and its impact on physical design in future technologies. Proceedings of ACM international symposium on physical design (ISPD) (2013)
- et al. Autonet: a high-speed, self-configuring local area network using point-to-point links. IEEE Journal on Selected Areas in Communications (1991)
- et al. Fault tolerant routing for hierarchically organized networks-on-chip. Proceedings of 23rd euromicro international conference on parallel, distributed and network-based processing (PDP) (2015)
- et al. Topology-agnostic fault-tolerant NoC routing method. Proceedings of design, automation & test in Europe conference (DATE) (2013)
- et al. A deadlock-free dynamic reconfiguration scheme for source routing networks using close up*/down* graphs. IEEE Trans Parallel Distrib Syst (TPDS) (2011)
- et al. Ariadne: Agnostic reconfiguration in a disconnected network environment. Proceedings of IEEE/ACM international conference on parallel architectures and compilation techniques (PACT) (2011)
- Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori. Proceedings of the 20th international parallel and distributed processing symposium (IPDPS)
Ibrahim Ahmed received his Bachelor of Science in electronics engineering from the German University in Cairo in 2012. He received his Master of Science in information technology and embedded systems from the University of Stuttgart in 2014. Since 2015, he has been a PhD student at the University of Toronto. His research interests include FPGA architecture, VLSI and computer architecture.
Muhammad Afzal received the B.Sc. in Electrical Engineering from AJK University, Pakistan, and the M.Sc. in Embedded Systems Engineering from the University of Stuttgart, Germany, in 2009 and 2014, respectively. He contributed to research work on a reconfigurable NoC switch in collaboration with the Embedded Systems Group, University of Stuttgart. Since 2014, he has been working as an Application Engineer at Altium Europe GmbH.
Martin Radetzki is Professor of Embedded Systems Engineering with the University of Stuttgart. He received the Dipl.-Inform. and Dr.-Ing. degrees from the University of Oldenburg, Germany, in 1996 and 2000, respectively. His research interests include modelling and parallel simulation of embedded systems, design of robust systems, and architecture of fault-tolerant networks-on-chip.
☆ Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. M. Ebrahimi.