
2012 | Book

Reliability, Availability and Serviceability of Networks-on-Chip

Authors: Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski

Publisher: Springer US


About this book

This book presents an overview of the issues related to the test, diagnosis, and fault tolerance of Network-on-Chip (NoC)-based systems. It is the first book dedicated to the quality aspects of NoC-based systems and will serve as an invaluable reference to the problems, challenges, solutions, and trade-offs related to designing and implementing state-of-the-art on-chip communication architectures.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
The design and manufacturing of integrated circuits is currently based on the integration of a number of pre-designed intellectual property (IP) blocks, or cores, in a single chip. Although reuse has always been present in the design of electronic circuits, this practice has been extended and formalized over the last two decades or so, becoming the new design paradigm of the electronics industry. The reuse of previously designed functional blocks is now the key to designing high-performance circuits with large gate counts in a short time. This design practice is known as core-based or IP-based design, or simply as System-on-Chip (SoC). The main difference between a SoC and a traditional System-on-Board (SoB), which is also built from previously designed parts, is that in the former all cores are synthesized together in a single chip, whereas in the latter each functional block is synthesized and manufactured separately and then mounted on a discrete board. Furthermore, the reusable blocks of a SoC are also known as virtual components, since they are delivered as a description of logic rather than as a manufactured IC; this constitutes another important difference between traditional design methods and core-based systems.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Chapter 2. NoC Basics
Abstract
As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection architectures may prevent these systems from meeting the performance required by many applications. For systems with intensive parallel communication requirements, buses may not provide the required bandwidth, latency, and power consumption. A solution to this communication bottleneck is the use of an embedded switching network, called a Network-on-Chip (NoC), to interconnect the IP modules in SoCs. The NoC design space is considerably larger than that of a bus-based solution, as different routing and arbitration strategies can be implemented, as well as different organizations of the communication infrastructure. In addition, NoCs have an inherent redundancy that helps tolerate faults and deal with communication bottlenecks. This enables the SoC designer to find suitable solutions for different system characteristics and constraints.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Chapter 3. Systems-on-Chip Testing
Abstract
The design cycle of a complex system has greatly improved since the advent of the core-based design paradigm. Nevertheless, as technology evolves, new problems become the focus of attention. Currently, industry seems to be keeping pace in terms of design productivity and time-to-market, but yield, power dissipation, and reliability are still challenges for complex core-based systems-on-chip (SoCs) (Venkatraman et al. 2009).
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Chapter 4. NoC Reuse for SoC Modular Testing
Abstract
In this chapter we cover the first proposed test approaches that reuse the NoC as a Test Access Mechanism (TAM) in a core-based system. First, the basic reuse strategy is presented, including the very few modifications implemented in the network interface and the definition of the test packets that make the test possible. Then, two test scheduling approaches (preemptive and non-preemptive) are discussed. These basic reuse strategies focus on the definition of specific test scheduling algorithms, since the TAM (NoC) architecture and transport capacity are given. The reuse model and the scheduling algorithms presented here assume that stream-like communication can be established through the NoC between the cores under test and the external test sources and sinks. This assumption implies a NoC with guaranteed, fixed bandwidth and latency. Other reuse models (using different test packet models and best-effort (BE) NoCs) are discussed in Chap. 5.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
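To make the notion of non-preemptive test scheduling concrete, here is a minimal sketch under simplifying assumptions that are ours, not the chapter's: each core has a fixed test time, the NoC offers a hypothetical number `num_channels` of independent, conflict-free tester-to-core paths of equal bandwidth (in the spirit of the stream-like, guaranteed-bandwidth assumption above), and a core's test, once started, runs to completion. It uses a generic greedy longest-test-first list-scheduling heuristic; the function name, core names, and cycle counts are invented for illustration.

```python
import heapq


def schedule_non_preemptive(test_times, num_channels):
    """Greedy longest-test-first schedule over identical test channels.

    test_times: dict mapping a (hypothetical) core name to its test time in cycles.
    num_channels: number of independent tester-to-core paths assumed through the NoC.
    Returns (schedule, makespan); schedule maps core -> (channel, start, end).
    """
    # Min-heap of (cycle at which the channel becomes free, channel id).
    free_at = [(0, ch) for ch in range(num_channels)]
    heapq.heapify(free_at)
    schedule = {}
    # Placing the longest tests first tends to balance the load across channels.
    for core, t in sorted(test_times.items(), key=lambda kv: (-kv[1], kv[0])):
        start, ch = heapq.heappop(free_at)  # earliest available channel
        end = start + t  # non-preemptive: the test is never interrupted
        schedule[core] = (ch, start, end)
        heapq.heappush(free_at, (end, ch))
    makespan = max((end for _, _, end in schedule.values()), default=0)
    return schedule, makespan


# Hypothetical system with four cores and two test paths through the NoC.
cores = {"cpu": 500, "dsp": 300, "mem": 400, "io": 200}
schedule, makespan = schedule_non_preemptive(cores, num_channels=2)
print(makespan)  # prints 700
```

With these made-up numbers, `cpu` and `io` share one path and `mem` and `dsp` the other, for a total test time of 700 cycles. A preemptive scheduler could instead split a core's test into several sessions, which this sketch does not model.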
Chapter 5. Advanced Approaches for NoC Reuse
Abstract
The test scheduling approaches discussed in Chap. 4 demonstrated that a NoC can be as cost-effective a TAM as a dedicated bus-based mechanism. Those approaches are, however, based on a single NoC model and on a few assumptions about the NoC, wrappers, and cores. Indeed, guaranteed services (GS) NoCs were assumed, so that the timing constraints of an external tester could be met. Also, all pins at the core interface (functional and test pins) were assumed to be used during test to receive/deliver test data, and the core test frequency was assumed to be equal to the NoC operating frequency. Finally, in those first reuse approaches, the available channel bitwidth may be underutilized for cores with a small test interface. In this chapter those assumptions are revisited, and more recent approaches that consider NoC reuse in more detail are discussed. First, we present alternative test scheduling algorithms and wrapper models that improve channel utilization and consider additional system requirements such as the thermal budget. Then, the characteristics of the NoC communication protocol are taken into account to generate test interfaces for the external tester and test wrappers for the embedded cores. Those wrappers isolate the communication details and aim to use the available NoC bandwidth with no further assumptions. Based on these DfT structures, a test scheduling algorithm for best-effort (BE) NoCs with different topologies is presented.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Chapter 6. Test and Diagnosis of Routers
Abstract
This chapter focuses on testing part of the Network-on-Chip (NoC) infrastructure, discussing strategies to detect and diagnose manufacturing faults in the routers. Test approaches for these NoC building blocks have based their strategies on functional test, scan-based testing, or built-in self-test (BIST). The fault models considered differ from one work to another, both in terms of abstraction level (functional, register-transfer, or logic level) and of covered parts (FIFOs, registers, multiplexers, routing logic). A functional approach is usually preferred, to reduce NoC redesign costs and to provide at-speed testing. However, scan- and BIST-based approaches may be required to improve both fault coverage and test application time. All these approaches complement each other, in the sense that none of them alone can fully cover the faults that may affect the routers of the network.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Chapter 7. Test and Diagnosis of Communication Channels
Abstract
Complementing the previous chapter, this one discusses strategies to detect and diagnose manufacturing faults in the communication channels, so that, together, the two chapters cover the test of the whole Network-on-Chip (NoC) infrastructure. The huge number of interconnects, combined with shrinking chip dimensions, makes the NoC prone to a growing number of wiring faults. The capability of detecting interconnect faults in NoC-based Systems-on-Chip is mandatory for yield improvement. Moreover, fault diagnosis of NoC link wires can help fault-tolerance approaches mitigate the faults and maintain the network service. Fault models (including stuck-at, bridging, delay, and crosstalk faults), interconnect functional test, at-speed interconnect BIST, and interconnect diagnosis are discussed in this chapter.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
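As a rough illustration of interconnect test and diagnosis, the sketch below applies walking-one and walking-zero patterns to a simulated link and reports every wire whose received value ever differs from the value sent. The fault model is a deliberately simple assumption: stuck-at faults on single wires and wired-AND/wired-OR bridges between pairs of wires. Delay and crosstalk faults need at-speed transition patterns and are not modeled. The function names and the injected faults are hypothetical and do not come from the chapter.

```python
def walking_patterns(width):
    """Walking-one followed by walking-zero test vectors for a `width`-wire link."""
    ones = [1 << i for i in range(width)]
    mask = (1 << width) - 1
    zeros = [mask ^ p for p in ones]
    return ones + zeros


def faulty_link(value, width, stuck_at=None, bridges=None):
    """Model a link with hypothetical faults.

    stuck_at: dict wire -> 0 or 1.
    bridges: list of (a, b, kind) with kind "and" (wired-AND) or "or" (wired-OR).
    """
    bits = [(value >> i) & 1 for i in range(width)]
    for a, b, kind in bridges or []:
        # Both bridged wires settle to the AND (or OR) of their driven values.
        v = (bits[a] & bits[b]) if kind == "and" else (bits[a] | bits[b])
        bits[a] = bits[b] = v
    for wire, level in (stuck_at or {}).items():
        bits[wire] = level
    return sum(bit << i for i, bit in enumerate(bits))


def diagnose(width, **faults):
    """Return the set of wires whose received value ever differed from the sent one."""
    suspects = set()
    for p in walking_patterns(width):
        diff = p ^ faulty_link(p, width, **faults)
        suspects |= {i for i in range(width) if (diff >> i) & 1}
    return suspects


# 8-wire link with wire 2 stuck-at-0 and a wired-AND bridge between wires 5 and 6.
print(sorted(diagnose(8, stuck_at={2: 0}, bridges=[(5, 6, "and")])))  # prints [2, 5, 6]
```

Each walking pattern drives one wire to the opposite value of all the others, so every wire is observed at both logic levels and next to neighbors at the opposite level; that is what exposes both the stuck-at wire and the two bridged wires in this toy model.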
Chapter 8. Error Control Coding and Retransmission
Abstract
This part of the book is devoted to on-line Network-on-Chip (NoC) testing strategies, while the previous part is devoted to off-line NoC testing strategies. The main difference is that the former detects run-time faults during the system’s mission mode, while the latter is typically used to detect manufacturing defects while the system is in test mode. This chapter is devoted to on-line fault detection on data transmitted over the NoC. Due to the effect of deep submicron (DSM) technologies on circuit reliability, the designer can no longer assume that the NoC is fault-free during normal operation. Designers therefore add test approaches such as error control coding (ECC), data retransmission, or a combination of both to detect and deal with these run-time faults. The problem is that these approaches have costs in terms of, for instance, silicon area, codec delay, network congestion, and energy consumption. Thus, the challenge for the designer is to find a good trade-off between these costs and the potential reliability benefit of the test approach. This chapter presents the most relevant on-line NoC testing strategies that have been proposed and their results regarding the trade-off between cost and reliability.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
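For readers unfamiliar with error control coding, the sketch below shows a textbook Hamming(7,4) single-error-correcting code, one of the simplest codes that could protect data on a NoC link. It is not a specific scheme from the chapter: real designs protect whole flits, and combined ECC/retransmission schemes typically rely on codes that can also detect uncorrectable errors (for example, an extended Hamming SECDED code) to trigger a resend, which this sketch does not model. A double-bit error would be silently miscorrected here.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as the codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit; return (data_bits, error_position or 0)."""
    c = list(c)  # work on a copy so the caller's word is not modified
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3  # syndrome = 1-based index of the flipped bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a transient fault flipping codeword position 5 on the link
data, pos = hamming74_decode(word)
print(data, pos)  # prints [1, 0, 1, 1] 5
```

The three parity bits add 75% overhead for 4 data bits; longer Hamming codes amortize the parity over wider words, which is one face of the area, delay, and energy trade-off the chapter discusses.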
Chapter 9. Error Location and Reconfiguration
Abstract
This is the second and last chapter of this book devoted to on-line Network-on-Chip (NoC) testing strategies. As mentioned before, the main difference between on-line and off-line tests is that the former detects run-time faults during the system’s mission mode, while the latter is typically used to detect manufacturing defects while the system is in test mode. Compared to the previous chapter, which focuses on link- and router-level techniques, this one presents techniques used at the router, NoC, and system levels. The most used techniques at these levels are fault-tolerant and adaptive routing algorithms, where an alternative path that avoids the defective part of the NoC is found, and fault reconfiguration, where the hardware or the software is reconfigured to mask and isolate the defective block. However, both techniques assume they are able to pinpoint the exact location of a hardware defect. This task alone, called fault location, can be a challenge in itself, since NoCs are scalable and can have hundreds or even thousands of switching elements. As in the previous chapter, the test approaches presented in this chapter also have costs in terms of, for instance, silicon area, network performance, network congestion, and energy consumption. Thus, the challenge for the designer is, again, to find a good trade-off between these costs and the potential reliability benefit of the test approach. However, this trade-off evaluation is typically much more complex at the NoC level than at the link or router level, due to the size of NoCs and the complex data communication patterns of the applications. This chapter presents the most relevant on-line NoC testing strategies at the NoC and system levels and their results in terms of costs and reliability.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
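The idea of fault-tolerant routing, finding an alternative path that avoids a defective part of the network, can be sketched with a breadth-first search over a 2D mesh, assuming that fault location has already identified the set of faulty routers. This is only an illustration of path selection: the mesh topology, the (x, y) coordinates, and the function name are our assumptions. Practical NoC routing algorithms must also guarantee deadlock freedom (for example, through turn restrictions) and are usually implemented as distributed per-router logic rather than as a global search.

```python
from collections import deque


def route_around_faults(width, height, src, dst, faulty):
    """Shortest hop path on a width x height mesh that avoids faulty routers.

    Routers are (x, y) tuples and `faulty` is a set of routers assumed to have been
    located already. Returns the list of routers from src to dst, or None if the
    faults disconnect them.
    """
    if src in faulty or dst in faulty:
        return None
    prev = {src: None}  # visited routers and their BFS predecessor
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk predecessors back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            in_mesh = 0 <= nx < width and 0 <= ny < height
            if in_mesh and nxt not in faulty and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


# 3x3 mesh with router (1, 0) faulty: traffic from (0, 0) to (2, 0) detours.
print(route_around_faults(3, 3, (0, 0), (2, 0), {(1, 0)}))
# prints [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]
```

The detour costs two extra hops compared with the fault-free path, a small instance of the performance cost of fault tolerance mentioned above.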
Chapter 10. Concluding Remarks
Abstract
About a decade ago, networks-on-chip emerged as a potential solution to the intra-chip communication problems arising in complex systems (Guerrier and Greiner 2000), then grew into a major research topic with its own conferences (NoCs 2011; NoCArch 2011), and then became an industrial reality (Karim et al. 2002; Goossens et al. 2005). A large body of work has been proposed on design-oriented features of NoCs, creating an equally large NoC design diversity (Bjerregaard and Mahadevan 2006). Nevertheless, efficient test and reliability approaches are required to turn NoC-based systems into a consolidated industrial reality and to achieve much more challenging designs such as many-core systems. In fact, considerable effort has been made towards economically viable, testable, and reliable NoC-based systems. The increasing interest in the topic motivated the writing of this book, in which we have gathered and organized this large amount of material, summarized the most relevant scientific contributions, and identified some open issues. This final chapter addresses those open issues.
Érika Cota, Alexandre de Morais Amory, Marcelo Soares Lubaszewski
Backmatter
Metadata
Title
Reliability, Availability and Serviceability of Networks-on-Chip
Authors
Érika Cota
Alexandre de Morais Amory
Marcelo Soares Lubaszewski
Copyright Year
2012
Publisher
Springer US
Electronic ISBN
978-1-4614-0791-1
Print ISBN
978-1-4614-0790-4
DOI
https://doi.org/10.1007/978-1-4614-0791-1
