Skip to main content

2014 | Buch

Routing Algorithms in Networks-on-Chip

herausgegeben von: Maurizio Palesi, Masoud Daneshtalab

Verlag: Springer New York

insite
SUCHEN

Über dieses Buch

This book provides a single-source reference to routing algorithms for Networks-on-Chip (NoCs), as well as in-depth discussions of advanced solutions applied to current and next generation, many core NoC-based Systems-on-Chip (SoCs). After a basic introduction to the NoC design paradigm and architectures, routing algorithms for NoC architectures are presented and discussed at all abstraction levels, from the algorithmic level to actual implementation. Coverage emphasizes the role played by the routing algorithm and is organized around key problems affecting current and next generation, many-core SoCs. A selection of routing algorithms is included, specifically designed to address key issues faced by designers in the ultra-deep sub-micron (UDSM) era, including performance improvement, power, energy, and thermal issues, fault tolerance and reliability.

Inhaltsverzeichnis

Frontmatter
Chapter 1. Basic Concepts on On-Chip Networks
Abstract
As the number of cores integrated into a System-on-Chip increases, the role played by the communication system becomes more and more important. The Network-on-Chip design paradigm is today recognised as the most viable communication infrastructure to deal with the scalability issues which characterise the ultra-deep sub-micron silicon era. In this chapter, some of the most important concepts in the context of on-chip networks will be reviewed. Basic concepts including, network topologies, switching techniques, and routing algorithms will be recalled. Such topics represent the conceptual bases exploited by the strategies, the mechanisms, and the methodologies discussed in the subsequent chapters.
Masoud Danashtalab, Maurizio Palesi

Performance Improvement

Frontmatter
Chapter 2. A Heuristic Framework for Designing and Exploring Deterministic Routing Algorithm for NoCs
Abstract
In this chapter, we present a system-level framework for designing minimal deterministic routing algorithms for Networks-on-Chip (NoCs) that are customized for a set of applications. To this end, we first formulate an optimization problem of minimizing average packet latency in the network and then use the simulated annealing heuristic to solve this problem. To estimate the average packet latency we use a queueing-based analytical model which can capture the burstiness of the traffic. The proposed framework does not require virtual channels to guarantee deadlock freedom since routes are extracted from an acyclic channel dependency graph. Experiments with both synthetic and realistic workloads show the effectiveness of the approach. Results show that maximum sustainable throughput of the network is improved for different applications and architectures.
Abbas Eslami Kiasari, Axel Jantsch, Zhonghai Lu
Chapter 3. Run-Time Deadlock Detection
Abstract
Relentless technology downscaling provides a promising prospect to realize heterogeneous system-on-chip (SoC) and homogeneous chip-multiprocessor (CMP) based on the networks-on-chip (NoCs) paradigm with augmented scalability, modularity and performance. In many cases in such systems, scheduling and managing communication resources are the major design and implementation challenges instead of the computing resources. Moreover, adaptive packets routing in such systems are susceptible to routing deadlock, which could lead to performance degradation or system failure. Past research efforts were mainly focused on complex design-time approaches to avoid deadlock or simple heuristic run-time approaches to detect and recover from deadlock. The later, however, consider only local or partial information about the network which may produce substantial false detections, especially with the network close to saturation where blocked packets could be fagged as deadlocked.This chapter studies deadlock detection and recovery strategy in NoCs as opposed to deadlock avoidance. It presents a deadlock detection method that utilizes run-time transitive-closure (TC) computation to discover the existence of deadlock-equivalence sets, which imply loops of requests in NoCs. This detection scheme guarantees the discovery of all true-deadlocks without false alarms in contrast with state-of-the-art approximation and heuristic methods. A distributed TC-network architecture, which couples with the network-on-chip (NoC) infrastructure, is also presented to realize the detection mechanism efficiently. Detailed hardware realization architectures and schematics are also discussed. The captured results based on a cycle-accurate simulator demonstrate the effectiveness of the proposed method. It drastically outperforms timing-based deadlock detection mechanisms by eliminating false detections and, thus, reducing energy wastage in retransmission for various traffic scenarios including real-world application.
Ra’ed Al-Dujaily, Terrence Mak, Fei Xia, Alex Yakovlev, Maurizio Palesi
Chapter 4. The Abacus Turn Model
Abstract
Applications’ traffic tends to be bursty and the locations of hot-spot nodes vary with time. This will dramatically aggregate the blocking problem of wormhole-switched Network-on-Chip (NoC). Most of state-of-the-art traffic balancing solutions, which are based on fully adaptive routing algorithms, may introduce large time/space overhead to routers. On the other hand, partially adaptive routing algorithms, which are time/space efficient, are lack of even or sufficient routing adaptiveness. Due to the lack of a practical model to dynamically keep the routing algorithm deadlock free, most of reconfigurable routing algorithms, which could provide on-demand routing adaptiveness for reducing blocking, are off-line solutions. In this chapter, we will discuss the abacus turn model (AbTM). It was proposed to keep the network deadlock-free by dynamically applying forbidden turns. By this way, the AbTM-based routing algorithms are deadlock-free and on-line reconfigurable. In addition to the AbTM, this chapter also includes a brief discussion about the routing reconfiguration techniques. On detecting the congestion, the reconfigurable routing algorithm first decides the new routing rules. Afterwards, the routing reconfiguration algorithm is exploited to transform the routing function from the old one to the new one. A well-designed routing reconfiguration technique could finish the process in a very short time without blocking and dropping packets. Furthermore, this chapter also briefly discusses the possible solutions to implement reconfigurable routing algorithms, such as table-based solution and LBDR-based solution. Finally, the routing performance will be discussed at the end of this chapter.
Binzhang Fu, Yinhe Han, Huawei Li, Xiaowei Li
Chapter 5. Learning-Based Routing Algorithms for On-Chip Networks
Abstract
In this chapter, we investigate highly adaptive routing algorithms for balancing the traffic over the network based on learning approaches. The proposed methods aim to provide up-to-dated local and global congestion information at each switch. At first, the learning method is applied to a network utilizing the minimal routing. In low traffic loads, minimal methods can achieve optimal performance, while they are inefficient in avoiding hotspots when the traffic load increases. The reason of this inefficiency is that minimal methods can propagate messages through at most two directions at each switch. When the shortest paths are congested, sending more messages through them can deteriorate the congestion condition considerably. In order to address this issue, we present a non-minimal routing algorithm for on-chip networks that provides a wide range of alternative paths between each pair of source and destination switches. Initially, the algorithm determines all permitted turns in the network including 180-degree turns on a single channel without creating cycles. The implementation of the algorithm provides the best usage of all allowable turns to route messages more adaptively in the network. On top of that, for selecting a less congested path, an optimized and scalable learning method is utilized. The learning method is based on local and global congestion information and can estimate the latency from each output channel to the destination region.
Masoumeh Ebrahimi, Masoud Daneshtalab

Multicast Communication

Frontmatter
Chapter 6. Efficient and Deadlock-Free Tree-Based Multicast Routing Methods for Networks-on-Chip (NoC)
Abstract
This chapter presents a new efficient and deadlock free tree-based multicast routing method and concept. The presented deadlock-free multicast routing algorithm can be implemented on a network-on-chip (NoC) router microarchitecture, realizing a mesh planar network topology. The NoC microarchitecture supports both deadlock-free static and efficient adaptive tree-based multicast routing. Multicast packets are routed and scheduled in the NoC by using a flexible multiplexing/interleaving technique with wormhole switching. The flexibility of the proposed multicast routing method is based on a locally managed packet identity (ID-tag) attached to every flit. This concept allows different packets to be interleaved at flit-level in a single buffer pool on the same link. Furthermore, a pheromone tracking strategy presented in this chapter, which is used to reduce communication energy in the adaptive tree-based multicast routing method. The strategy is used to perform efficient spanning trees for the adaptive tree-based multicast routing which are generated at runtime.
Faizal Arya Samman, Thomas Hollstein
Chapter 7. Path-Based Multicast Routing for 2D and 3D Mesh Networks
Abstract
In this chapter, we address how to implement unicast and multicast routing algorithms efficiently in 2D and 3D mesh networks. To do this, we present several partitioning methods for the path-based multicast approach with different levels of efficiency. In path-based methods, a multicast message is routed along a path and the message is transferred to the destinations along this path. Partitioning methods divide the network into several logical partitions and assign destinations to different sets; one set for each partition covering destinations that belong to that partition. Smart partitioning methods must balance the sets and reduce the path length within each partition. All of the partitioning methods can be supported by a deterministic routing algorithm. However, in order to increase the performance, we design a general minimal and adaptive routing algorithm which is based on the Hamiltonian path and can be applied to all partitioning methods. The algorithm is simple and does not require any virtual channel for neither unicast nor multicast messages.
Masoumeh Ebrahimi, Masoud Daneshtalab, Pasi Liljeberg, Juha Plosila, Hannu Tenhunen

Fault Tolerance and Reliability

Frontmatter
Chapter 8. Fault-Tolerant Routing Algorithms in Networks On-Chip
Abstract
As the semiconductor industry advances to the deep sub-micron and nano technology points, the on-chip components are more prone to the defects during manufacturing and faults during system life time. These components include the Networks-on-Chip (NoCs) which are expected to be an important part of the future complex multi-core and many-core chips. As a result, fault tolerant techniques are essential to improve the yield of modern complex chips. In this chapter, we propose a fault-tolerant routing algorithm that keeps the negative effect of faulty components on the NoC power and performance as low as possible. Targeting intermittent faults, we achieve fault tolerance by employing a simple and fast mechanism composed of two processes: NoC monitoring and route adaption. The former keeps the track of the on-chip traffic pattern and faulty links, whereas the latter adapts the packet paths to the current set of faulty components. This mechanism exploits the global information of the state of the NoC components and on-chip traffic pattern and aims to minimize the performance loss and the power overhead imposed by the faulty NoC links and nodes. Experimental results show the effectiveness of the proposed technique, in that it offers lower average message latency and power consumption and a higher reliability, compared to some state-of-the art related work.
Reyhaneh Jabbarvand Behrouz, Mehdi Modarressi, Hamid Sarbazi-Azad
Chapter 9. Reliable and Adaptive Routing Algorithms for 2D and 3D Networks-on-Chip
Abstract
Faults may have undesirable effects on the correct operation of the system or at least the performance. NoC inherently has a potential of being a more reliable infrastructure than buses by providing alternative paths between each pair of source and destination routers. However, this potential cannot be utilized without the support of fault-tolerant routing algorithms. In this chapter, we take a detail view of implementing high-performance fault-tolerant routing algorithms in 2D and 3D mesh networks. The required turn models are discussed and all fault conditions are investigated. As faults may occur links and routers, this chapter investigates both types of faults. Unlike traditional methods in which the performance is degraded significantly in faulty situations, the proposed fault-tolerant routing algorithms perform well in maintaining the performance of a faulty network. This is achieved by using the shortest paths as long as possible to bypass faults while non-minimal paths are used when necessary. The proposed methods can be adjusted to balance between reliability and performance. This chapter gives an extensive knowledge to develop a fault-tolerant routing algorithm based on the characteristics of an NoC.
Masoumeh Ebrahimi

Power/Energy and Thermal Issues

Frontmatter
Chapter 10. Bufferless and Minimally-Buffered Deflection Routing
Abstract
A conventional Network-on-Chip (NoC) router uses input buffers to store in-flight packets. These buffers improve performance, but consume significant power. It is possible to bypass these buffers when they are empty, reducing dynamic power, but static buffer power remains, and when buffers are utilized, dynamic buffer power remains as well. To improve energy efficiency, bufferless deflection routing removes input buffers, and instead uses deflection (misrouting) to resolve contention. Bufferless deflection routing is able to provide similar network performance to conventional buffered routing when the network carries light to moderate traffic, because deflections are relatively rare. However, at high network load, deflections cause unnecessary network hops, wasting power and reducing performance. In order to avoid some deflections and recover some performance, recent work has proposed to add a small buffer which holds only flits that contend with others and would have been deflected. This minimally-buffered deflection (MinBD) router improves performance relative to bufferless deflection routing without incurring the cost of a large buffer, because it can make more efficient use of a small buffer. The result is a router design which is more energy-efficient than prior buffered, bufferless, and hybrid router designs.
Chris Fallin, Greg Nazario, Xiangyao Yu, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu
Chapter 11. A Thermal Aware Routing Algorithm for Application-Specific Network-on-Chip
Abstract
In this chapter, we propose a routing algorithm to reduce the hotspot temperature for application-specific Network-on-chip (NoC). Using the traffic information of the applications as well as the mapping of the tasks onto the NoC, we design a routing scheme which can achieve a higher adaptivity than the generic ones and at the same time distribute the traffic more uniformly across the chip. A set of admissible paths which is deadlock free for all the communications is first obtained. To reduce the hotspot temperature, we propose to route packets using the optimal distribution ratio of the communication traffic among the set of candidate paths. The problem of finding this optimal distribution ratio is formulated as a linear programming (LP) problem and is solved in the offline phase. A router microarchitecture which supports the ratio-based selection policy is also proposed. From the simulation results, the peak energy reduction considering the energy consumption of both the processor elements and routers can be as high as 20% for synthetic traffic and real benchmarks.
Zhiliang Qian, Chi-Ying Tsui

Emerging Technologies

Frontmatter
Chapter 12. Traffic- and Thermal-Aware Routing Algorithms for 3D Network-on-Chip (3D NoC) Systems
Abstract
Three-dimensional Network-on-Chip (3D NoC) has been proposed to solve the complex on-chip communication issues in future 3D multicore systems. However, the thermal problems of 3D NoC are more serious than 2D NoC due to stacking dies. To keep the temperature below a certain thermal limit, many approaches of run-time thermal management were proposed. In this chapter, we will introduce some design concepts of traffic- and thermal-aware routing algorithms, which aim at minimize the performance impact caused by the run-time thermal managements. The investigative approaches can mitigate the design challenges of 3D NoC systems. Without the enhancement of cooling devices, the 3D NoC system can still be thermal-safe. Besides, the advantages of 3D integration are preserved, because the thermal-limited performance back-off is reduced.
Kun-Chih Chen, Chih-Hao Chao, Shu-Yen Lin, An-Yeu (Andy) Wu
Chapter 13. Scalable Architecture for All-Optical Wavelength-Routed Networks-on-Chip
Abstract
Scaling the transistor feature size along with the increase of chip operation frequency leads to the growth of circuit complexity, which makes the design of electrical interconnects increasingly difficult in large chips. Optics provides low power dissipation which remains independent of capacity and distance, as well as wavelength parallelism, ultra-high throughput, and minimal access latencies. Additionally, wavelength routing, bit rate transparency, high-capacity, low propagation loss, and low power dissipation of silicon photonics are attractive for realizing optical Networks-on-Chip (ONoC) in Chip Multi-Processors (CMPs). In this chapter, we propose a new architecture for nanophotonic NoC, named 2D-HERT, which consists of optical data and control planes. The proposed data plane is built upon a new topology and all-optical switches that passively route optical data streams based on their wavelengths. Utilizing wavelength routing method, the proposed deterministic routing algorithm, and Wavelength Division Multiplexing (WDM) technique, the proposed data plane eliminates the need for optical resource reservation at the intermediate nodes. For resolving end-point contention, we propose an all-optical request-grant arbitration architecture which reduces optical losses compared to the alternative arbitration schemes. By performing a series of simulations, we study the efficiency of the proposed architecture, its power and energy consumption, and the data transmission delay.
Somayyeh Koohi, Shaahin Hessabi

Industrial Case Study

Frontmatter
Chapter 14. On Chip Network Routing for Tera-Scale Architectures
Abstract
The emergence of Tera-scale architectures features the interconnection of tens to several hundred general purpose cores to each other and with other IP blocks. The high level requirements of the underlying interconnect infrastructure include low latency, high-throughput, scalable performance, flexible and adaptive routing, support for isolated partitions, fault-tolerance, and support for irregular or partially enabled configurations. This chapter presents the architecture and routing algorithms for supporting these requirements in the overall framework of mesh and torus-based point-to-point interconnect topologies. The requirements and desired attributes for tera-scale interconnects are outlined. This is followed by an overview of the interconnect architecture and micro-architecture framework. The descriptions of various routing algorithms supported are at the heart of the chapter, and include various minimal deterministic and adaptive routing algorithms for mesh and torus networks, a novel load-balanced routing algorithm called pole-routing, and performance-isolation routing in non-rectangular mesh partitions. The implementation aspects of these topics is covered through an overview of the environment for prototyping, debugging, performance evaluation and visualization in the context of specific interconnect configurations of interest. Overall, this chapter aims to illustrate a comprehensive approach in architecting (and micro-architecting) a scalable and flexible on-die interconnect and associated routing algorithms that are applicable to a wide range of applications in an industry setting.
Aniruddha S. Vaidya, Mani Azimi, Akhilesh Kumar
Metadaten
Titel
Routing Algorithms in Networks-on-Chip
herausgegeben von
Maurizio Palesi
Masoud Daneshtalab
Copyright-Jahr
2014
Verlag
Springer New York
Electronic ISBN
978-1-4614-8274-1
Print ISBN
978-1-4614-8273-4
DOI
https://doi.org/10.1007/978-1-4614-8274-1

Neuer Inhalt