
About this book

This book addresses reliability and energy efficiency of on-chip networks using cooperative error control. It describes an efficient way to construct an adaptive error control codec capable of tracking noise conditions and adjusting the error correction strength at runtime. Methods are also presented to tackle joint transient and permanent error correction, exploiting the redundant resources already available on-chip. A parallel and flexible network simulator is also introduced, which facilitates examining the impact of various error control methods on network-on-chip performance.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract
Thanks to rapid advances in semiconductor fabrication technology, billions of transistors can be integrated into a single die [1–5]. Although increasing chip density makes it possible to build systems-on-chip (SoCs) and chip multiprocessors (CMPs) that integrate hundreds or thousands of processing-element and memory cores, several challenges hinder further progress, such as design complexity, high-performance interconnect, and scalable on-chip communication architecture [6–9]. Networks-on-chip (NoCs) have become a promising paradigm for managing the increasing interconnect complexity and facilitating the integration of various intellectual property (IP) cores [10–15].
Qiaoyan Yu, Paul Ampadu

Chapter 2. Existing Transient and Permanent Error Management in NoCs

Abstract
Error control schemes combined with various error control codes are typically employed to handle transient errors. Physical-layer techniques, such as spare wire replacement and split transmission, and network-layer approaches, such as fault-tolerant routing, have been widely investigated for permanent error management. In this chapter, we review state-of-the-art techniques for transient and permanent error management in NoCs.
Qiaoyan Yu, Paul Ampadu

Chapter 3. Adaptive Error Control Coding at Datalink Layer

Abstract
Reliable on-chip communication in a multi-core system-on-chip is one of the most important challenges [1–3]. Networks-on-chip (NoCs) have been proposed to facilitate on-chip communication [4–6]. Within this framework, many coding methods have been examined to handle transient errors in NoC links [6–14]. These works typically assume that the probability of error is very low and use simple error detection codes combined with retransmission to save energy [7,8,15]. Unfortunately, as technology scales deep into the nanometer regime, on-chip communication becomes more susceptible than before to crosstalk, external radiation, and spurious voltage spikes; thus, the number of erroneous bits per flit (flow control unit) is expected to increase [16–18]. As a result, more powerful codes are needed to provide improved resilience against multi-bit errors.
Qiaoyan Yu, Paul Ampadu
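Chapter 3 contrasts simple error detection plus retransmission with stronger codes for multi-bit errors. As an illustration only, the sketch below shows a textbook extended Hamming (8,4) code, a single-error-correcting, double-error-detecting (SEC-DED) code that sits between those two extremes. The 4-bit data word and the names `secded_encode`/`secded_decode` are hypothetical choices for readability; this is not the adaptive codec described in the book, and real flits are much wider.

```python
from typing import List, Optional, Tuple


def secded_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Layout: index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with check bits at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]
    p0 = 0
    for bit in word:
        p0 ^= bit  # overall parity makes the 8-bit word even-weight
    return [p0] + word


def secded_decode(code: List[int]) -> Tuple[Optional[List[int]], str]:
    """Return (data, status): 'ok', 'corrected', or 'double_error_detected'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume a single error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code = list(code)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with nonzero syndrome: detectable, not correctable.
        return None, "double_error_detected"
    return [code[3], code[5], code[6], code[7]], status
```

In a detection-plus-retransmission scheme, a `double_error_detected` result would typically trigger a retransmission request rather than a correction attempt.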

Chapter 4. Transient and Permanent Link Errors Co-Management

Abstract
Transient and permanent errors can be co-managed at the network layer. Ali et al. use end-to-end error detection and retransmission to deal with transient errors, and deterministic rerouting to avoid broken links [1]. Sanusi et al. apply single-error correction and multiple-error detection to packets at the destination, and request flooding if permanent errors are detected [2]. Handling all errors in the network layer, however, increases the burden on that layer.
Qiaoyan Yu, Paul Ampadu

Chapter 5. Dual-Layer Cooperative Error Control for Transient Error

Abstract
Datalink-layer adaptive error control methods were investigated in Chap. 3. In this chapter, we extend error control adaptation to a two-layer approach that coordinates the datalink and network layers to further reduce energy consumption. We employ end-to-end error control in the network interface under low-noise conditions, and strengthen error control in high-noise regions by turning on hop-to-hop error control in the router. Simply combining end-to-end error control with hop-to-hop error control typically results in high energy consumption. Consequently, we apply the concept of a product code to the NoC, performing cross-layer cooperative error control. Another major contribution is a protocol for switching between network-layer ECC and datalink-layer ECC at runtime.
Qiaoyan Yu, Paul Ampadu
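The product-code idea in Chapter 5 can be illustrated with its simplest instance: even parity over each row and each column of a bit matrix, so that a single flipped bit is located at the intersection of the one failing row and the one failing column. Treating each row as one flit and each column as one bit position across a packet is an assumption made for this sketch; the component codes and the layer assignment used in the book may differ, and `encode_product`/`correct_single_error` are hypothetical names.

```python
from typing import List

Matrix = List[List[int]]


def encode_product(data: Matrix) -> Matrix:
    """Append an even-parity bit to each row, then a parity row over every column."""
    rows = [row + [sum(row) % 2] for row in data]
    # The last entry of the parity row is a "check on checks" over the row parities.
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]


def correct_single_error(code: Matrix) -> Matrix:
    """Correct one flipped bit; raise if the error pattern is not a single error."""
    bad_rows = [i for i, row in enumerate(code) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*code)) if sum(col) % 2]
    fixed = [row[:] for row in code]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # The failing row and column intersect at the erroneous bit.
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    elif bad_rows or bad_cols:
        # e.g. two flips in one row leave that row even but make two columns odd.
        raise ValueError("uncorrectable error pattern detected")
    return fixed
```

Each component code here only detects errors on its own; it is the combination of row and column checks that makes single-error correction possible, which is the property a cross-layer product code exploits.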

Chapter 6. A Flexible Parallel Simulator for Networks-on-Chip with Error Control

Abstract
To bridge the gap between NoC simulator implementation and NoC error control exploration, we develop an NoC simulator that facilitates comprehensive investigation of the impact of different error control methods on NoC performance and energy consumption. This chapter introduces the main functionality of the proposed simulator, including plug-and-play error control coding (ECC) insertion and a flexible fault injection environment. Energy estimation and improvements in simulation speed and memory consumption are also analyzed.
Qiaoyan Yu, Paul Ampadu
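A minimal sketch of the two simulator features named in Chapter 6, assuming a Python prototype: codecs share a small `Codec` interface so they can be swapped in and out, and a fault injector flips each codeword bit independently with probability `p_bit`. The interface, the `EvenParity` codec, and the independent-bit-flip error model are all hypothetical illustrations, not the simulator's actual API or fault model.

```python
import random
from typing import Dict, Protocol, Tuple


class Codec(Protocol):
    """Hypothetical plug-in interface: any ECC exposing these members can be inserted."""
    data_bits: int
    codeword_bits: int

    def encode(self, data: int) -> int: ...
    def decode(self, word: int) -> Tuple[int, bool]: ...  # (data, error_detected)


class EvenParity:
    """One even-parity bit placed above the data bits (error detection only)."""

    def __init__(self, data_bits: int) -> None:
        self.data_bits = data_bits
        self.codeword_bits = data_bits + 1

    def encode(self, data: int) -> int:
        parity = bin(data).count("1") % 2
        return data | (parity << self.data_bits)

    def decode(self, word: int) -> Tuple[int, bool]:
        detected = bin(word).count("1") % 2 == 1  # odd weight => error detected
        return word & ((1 << self.data_bits) - 1), detected


def inject_faults(word: int, n_bits: int, p_bit: float, rng: random.Random) -> int:
    """Flip each of the n_bits low-order bits independently with probability p_bit."""
    mask = 0
    for i in range(n_bits):
        if rng.random() < p_bit:
            mask |= 1 << i
    return word ^ mask


def simulate(codec: Codec, p_bit: float, n_flits: int, seed: int = 0) -> Dict[str, int]:
    """Send random flits through a noisy link and classify each outcome."""
    rng = random.Random(seed)
    stats = {"clean": 0, "detected": 0, "silent": 0}
    for _ in range(n_flits):
        data = rng.getrandbits(codec.data_bits)
        received = inject_faults(codec.encode(data), codec.codeword_bits, p_bit, rng)
        decoded, detected = codec.decode(received)
        if detected:
            stats["detected"] += 1
        elif decoded != data:
            stats["silent"] += 1  # undetected (silent) data corruption
        else:
            stats["clean"] += 1
    return stats
```

Comparing `silent` counts across codecs at the same `p_bit` is the kind of experiment a plug-and-play design is meant to make easy; energy and timing models are outside this sketch.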

Chapter 7. Conclusions and Future Directions

Abstract
This book presents a multi-layer solution to address transient and permanent errors in networks-on-chip. The main contributions are as follows: (1) adaptive error control codec design and implementation, (2) transient and permanent error co-management, (3) dual-layer cooperative error control, and (4) flexible and parallel NoC simulator development.
Qiaoyan Yu, Paul Ampadu

Backmatter
