Reconfigurable fault tolerant routing for networks-on-chip with logical hierarchy

doi:10.1016/j.compeleceng.2016.02.013

Computers & Electrical Engineering

Volume 51, April 2016, Pages 195-206

https://doi.org/10.1016/j.compeleceng.2016.02.013

Highlights

• Adding logical hierarchy to networks-on-chip enables table-based routing without excessive chip area overhead. For a 256 node network, the routing table occupies less than 20% of the switch's area. Thanks to the hierarchical network organization, double the data throughput is achieved compared to a flat network of the same size.
• Table-based routing can be used to implement fault-tolerant routing by reconfiguring table entries. The article shows how table entries can be computed efficiently, and how the reconfiguration process can be organized to function reliably even in the presence of transmission errors.
• With proper choice of logical hierarchy, the reconfiguration process takes less than one third of the time required by Ariadne, the state-of-the-art approach for non-hierarchical networks.
• The additional hardware overhead for fault-tolerant routing table reconfiguration amounts to only 6% of the chip area of a network switch.

Abstract

This paper presents a reconfigurable fault tolerant routing for Networks-on-Chip organized into hierarchical units. In case of link faults or failure of switches, the proposed approach enables the online adaptation of routing locally within each unit while deadlock freedom is globally ensured in the network. Experimental results of our approach for a 16 × 16 network show a speedup by a factor of almost four for routing reconfiguration compared to the state-of-the-art approach. Evaluation with transient faults shows that a dedicated reconfiguration unit enables successful reconfiguration of routing tables even in case of high error probabilities.

Introduction

The ongoing technology scaling allows an increasing number of cores to be implemented on a single chip, e.g. Intel’s Xeon Phi Coprocessor [1] or Tilera’s Tile-MX multicore processor [2]. As this scaling trend continues [3], future multiprocessor systems will feature hundreds of cores on a single chip.

The increasing size of on-chip systems poses a new challenge to Networks-on-Chip (NoCs). Mechanisms implemented in an NoC, such as table-based routing, work well for small systems but do not scale to larger systems. One way to cope with scalability problems is to introduce a hierarchical structure into NoCs. A hierarchical NoC can be obtained by constructing its topology from subnetworks or by segmenting a given topology into logical units. Both subnetworks and logical units enable formerly global mechanisms to be applied locally, thus reducing their complexity. Compared to subnetworks, logical units have the advantage that they can be applied without changing an existing topology. Typically, network nodes (switch + core) are grouped into a logical unit if they are part of the same task and share a spatial relation.

The downside of technology scaling is the increased probability of occurrence of permanent faults in an NoC due to manufacturing inaccuracies [4] or wear-out effects such as electromigration [5] that emerge during system operation. The failure of links or switches due to permanent faults results in an altered network topology. In such a case, static routing can no longer maintain connectivity between system components. For this reason, it is crucial that the routing is adapted to the new network situation to enable packets to circumvent faulty components.

In this paper we present a reconfigurable fault tolerant routing approach based on Up/Down routing [6] for large scale NoCs with logical hierarchy. It enables the routing to be adapted locally within each hierarchical unit in case of permanent faults while deadlock freedom is guaranteed globally. Our approach can be applied to any number of hierarchy levels.

The remainder of the paper is organized as follows: In Section 2 related work is discussed. Section 3 contains a formal introduction to NoC topologies as well as an introduction to Up/Down routing. In Section 4 we present our hierarchical network concept. Our hierarchical routing is presented in Section 5 and the reconfiguration process in Section 6. Evaluation results are discussed in Section 7. Section 8 concludes the paper.

Section snippets

Related work

In this section, we focus on work related to routing reconfiguration. Related work dealing with hierarchical topologies and hierarchical routing is presented in [7].

A reconfigurable scheme for source based routing is presented in [8]. If a source cannot reach a destination, it floods a path request through the network. Each node stores the port via which the request was received in a table. When the request reaches the destination, a packet is sent back to the source using the reverse path

NoC topology

The topology of an NoC can be represented by a directed graph $T = (N, C)$, where $N$ is the set of network nodes and $C$ is the set of unidirectional channels. In a typical fault-free NoC topology, two connected nodes $n_i, n_j \in N$ have one bidirectional connection, i.e. $c_{i,j}, c_{j,i} \in C$. We refer to bidirectional connections as links and to unidirectional connections as channels. To distinguish between different nodes, each node has a unique ID.
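As a minimal sketch of this graph model, the following Python snippet builds a fault-free 2D mesh as a node set and a set of unidirectional channels, and recovers the bidirectional links from it. The mesh shape, the row-major node IDs, and the function names `mesh_topology` and `links` are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Set, Tuple

Node = int                    # unique node ID (row-major numbering is an assumption)
Channel = Tuple[Node, Node]   # unidirectional channel from node i to node j


def mesh_topology(width: int, height: int) -> Tuple[Set[Node], Set[Channel]]:
    """Build a fault-free 2D mesh as a directed graph (nodes, channels)."""
    nodes = {y * width + x for y in range(height) for x in range(width)}
    channels: Set[Channel] = set()
    for y in range(height):
        for x in range(width):
            i = y * width + x
            if x + 1 < width:        # connection to the east neighbour
                j = i + 1
                channels |= {(i, j), (j, i)}
            if y + 1 < height:       # connection to the south neighbour
                j = i + width
                channels |= {(i, j), (j, i)}
    return nodes, channels


def links(channels: Set[Channel]) -> Set[Tuple[Node, Node]]:
    """Return the bidirectional links: node pairs with channels in both directions."""
    return {(i, j) for (i, j) in channels if i < j and (j, i) in channels}


nodes, channels = mesh_topology(4, 4)
```

Removing a single channel from `channels` models a channel fault: the affected node pair then no longer appears in `links(channels)`, although the opposite channel may still be usable.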

Up/Down routing

In literature, various approaches (e.g. [10], [15], [16])

Hierarchical network concept

Our approach is based on introducing hierarchical levels to a network topology and organizing network nodes into hierarchical units on each level. For a topology $T$, $h_{\max}$ hierarchy levels may be defined, where $h = 0$ corresponds to the flat hierarchy network. Each level $0 \le h \le h_{\max}$ consists of hierarchical units, which are composed of a set of connected network nodes. Each node must belong to exactly one unit on each level. On the lowest level $h = 0$ a unit corresponds to a node.
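The requirement that each node belongs to exactly one unit per level can be illustrated with a small Python sketch. It assumes a 2D mesh that is tiled into equally sized rectangular units on one level; the tiling scheme and the names `unit_of` and `partition` are hypothetical and only one possible way to form units.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def unit_of(x: int, y: int, unit_w: int, unit_h: int, units_per_row: int) -> int:
    """Map mesh coordinates (x, y) to the ID of the rectangular unit containing them."""
    return (y // unit_h) * units_per_row + (x // unit_w)


def partition(width: int, height: int,
              unit_w: int, unit_h: int) -> Dict[int, List[Tuple[int, int]]]:
    """Group all nodes of a width x height mesh into unit_w x unit_h units."""
    if width % unit_w or height % unit_h:
        raise ValueError("unit size must evenly divide the mesh dimensions")
    units_per_row = width // unit_w
    units: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for y in range(height):
        for x in range(width):
            units[unit_of(x, y, unit_w, unit_h, units_per_row)].append((x, y))
    return dict(units)


# Example: a 16 x 16 mesh split into 4 x 4 units on one hierarchy level.
units = partition(16, 16, 4, 4)
```

Because every coordinate maps to exactly one unit ID, the partition is disjoint and covers the whole mesh. Rectangular tiles in a mesh are also connected, which matches the requirement that a unit consists of connected nodes.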

The hierarchical

Hierarchical routing

The aim of our hierarchical routing approach is the online adaptation of routing in case of permanent faults in the communication structure. For our hierarchical routing, we consider permanent channel and link faults caused during manufacturing (e.g. bridging) or broken wires caused by aging. Further, we consider the complete failure of a switch. To minimize the time required for adaptation as well as the required communication overhead, the adaptation process is not performed globally in the

Routing reconfiguration process

This section describes the routing reconfiguration process. Please note that the test process to detect faults is out of scope of this work. We assume that the network contains a test mechanism such as [18], [19], or [20] to detect faults. We further assume that each switch communicates the availability of a channel to its neighbors by means of availability signals. If a channel is available, its availability signal is set to one; otherwise it is set to zero.
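The availability signals can be modelled in a few lines of Python. This is only an illustrative sketch: the set of faulty channels is assumed to come from an external test mechanism, and the reachability check is added here to show the effect of a fault on connectivity. It is not the reconfiguration algorithm of the paper, and the names `availability` and `reachable` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Channel = Tuple[int, int]  # unidirectional channel (source node, destination node)


def availability(channels: Iterable[Channel],
                 faulty: Set[Channel]) -> Dict[Channel, int]:
    """Availability signal per channel: 1 if available, 0 if reported faulty."""
    return {c: 0 if c in faulty else 1 for c in channels}


def reachable(start: int, signals: Dict[Channel, int]) -> Set[int]:
    """Nodes reachable from start using only channels whose signal is 1."""
    adjacency: Dict[int, list] = {}
    for (src, dst), signal in signals.items():
        if signal == 1:
            adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Three nodes in a line (0 - 1 - 2); the channel from node 1 to node 2 is faulty.
line = {(0, 1), (1, 0), (1, 2), (2, 1)}
signals = availability(line, faulty={(1, 2)})
```

In this example node 2 can still send to nodes 1 and 0, but node 0 can no longer reach node 2, which shows why a single channel fault already alters the topology seen by the routing.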

The reconfiguration process of the routing

Evaluation

We evaluate our reconfigurable routing regarding the required hardware implementation overhead (Section 7.1), the impact of different unit sizes on the network performance (Section 7.2), and the reconfiguration process in presence of faults (Section 7.3).

To evaluate our hierarchical routing, we have taken mesh networks with flat hierarchy as well as $h_{\max} = 3$ hierarchy levels into account. For level $h = 1$ we have considered unit sizes s from 2 × 2 to 16 × 8. The size of a level 2 unit refers to the

Conclusion

Adding logical hierarchy to Networks-on-Chip (NoCs) offers significant benefits compared to NoCs with flat organization. In particular, logical hierarchy makes routing tables a feasible design choice. This is achieved by having full table entries only for nodes in the same logical network unit, and by merging routing information for other nodes through hierarchical abstraction. Thereby, a routing table for a switch in a 256 node NoC requires less than 20% of the switch’s chip area, whereas

Acknowledgment

This work has been supported by the German Research Foundation (Deutsche Forschungsgemeinschaft - DFG) under grant Ra 1889/4-1.

Gert Schley received the Dipl-Ing (FH) degree in electrical engineering and information technologies in 2007 and the M.S. degree in embedded systems engineering in 2009 from the University of Applied Sciences, Pforzheim, Germany. Since 2009, he has been a Research Scientist with the Embedded Systems Group, University of Stuttgart. His research interests include hierarchical architectures and cross-layer fault tolerance for Network-on-Chip.

References (23)

Intel Xeon Phi Coprocessor. visited April 2015....
Tilera Tile-MX Multicore Processor. visited April 2015....
International Technology Roadmap For Semiconductors. visited April 2015....
Borkar S. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro (2005)
Lienig J. Electromigration and its impact on physical design in future technologies. Proceedings of ACM international symposium on physical design (ISPD) (2013)
Schroeder M. et al. Autonet: a high-speed, self-configuring local area network using point-to-point links. IEEE Journal on Selected Areas in Communications (1991)
Schley G. et al. Fault tolerant routing for hierarchically organized networks-on-chip. Proceedings of 23rd euromicro international conference on parallel, distributed and network-based processing (PDP) (2015)
Wachter E. et al. Topology-agnostic fault-tolerant NoC routing method. Proceedings of design, automation & test in Europe conference (DATE) (2013)
Robles-Gomez A. et al. A deadlock-free dynamic reconfiguration scheme for source routing networks using close up*/down* graphs. IEEE Trans Parallel Distrib Syst (TPDS) (2011)
Aisopos K. et al. Ariadne: Agnostic reconfiguration in a disconnected network environment. Proceedings of IEEE/ACM international conference on parallel architectures and compilation techniques (PACT) (2011)
Mejia A. et al. Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori. Proceedings of the 20th international parallel and distributed processing symposium (IPDPS) (2006)


Ibrahim Ahmed received his Bachelor of Science in electronics engineering from the German University in Cairo in 2012. He received his Master of Science in information technology and embedded systems from the University of Stuttgart in 2014. Since 2015, he has been a PhD student at the University of Toronto. His research interests include FPGA architecture, VLSI and computer architecture.

Muhammad Afzal received the B.Sc. in Electrical Engineering from AJK University, Pakistan, and the M.Sc. in Embedded Systems Engineering from the University of Stuttgart, Germany, in 2009 and 2014, respectively. He contributed to research work on a reconfigurable NoC switch in collaboration with the Embedded Systems Group, University of Stuttgart. Since 2014 he has been working as an Application Engineer at Altium Europe GmbH.

Martin Radetzki is Professor of Embedded Systems Engineering with the University of Stuttgart. He received the Dipl.-Inform. and Dr.-Ing. degrees from the University of Oldenburg, Germany, in 1996 and 2000, respectively. His research interests include modelling and parallel simulation of embedded systems, design of robust systems, and architecture of fault-tolerant networks-on-chip.

^☆: Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. M. Ebrahimi.
