About this book

This book describes methods to address wearout/aging degradation in electronic chips and systems, caused by several physical mechanisms at the device level. The authors introduce a novel technique called accelerated active self-healing, which fixes wearout issues by enabling accelerated recovery. Coverage includes recovery theory, experimental results, implementations, and applications across multiple nodes ranging from planar and FD-SOI to FinFET, based on both foundry-provided models and predictive models.

Presents novel techniques, tested with experiments on real hardware.
Discusses circuit- and system-level wearout recovery implementations, many of which are portable and friendly to the standard design flow.
Provides circuit-architecture-system infrastructures that enable accelerated self-healing for future resilient systems.
Discusses wearout issues at both the transistor and interconnect levels, providing solutions that apply to both.
Includes coverage of resilience aspects of emerging applications such as IoT.

Table of contents

Frontmatter

Overview

Frontmatter

Chapter 1. Introduction to Wearout

Abstract
Over the last decade, CMOS wearout has emerged as one of the most critical threats to circuit performance and system reliability. Among recognized wearout mechanisms, bias temperature instability (BTI) and electromigration (EM) are the two dominant effects, affecting transistors and interconnect, respectively. Conventional flat-guardband or dynamic-margin design approaches address these effects by tolerating them, but they can be both costly and insufficient. Techniques that take advantage of the recovery of these phenomena can be more economical and effective. In this chapter, we present a taxonomy of state-of-the-art BTI and EM mitigation techniques developed across the system hierarchy, followed by an introduction to the concept of accelerated active self-healing, which is addressed throughout the rest of the book.
Xinfei Guo, Mircea R. Stan

Experimental Validations

Frontmatter

Chapter 2. Accelerated and Active Self-healing Techniques for BTI Wearout

Abstract
BTI has long been recognized as a partially reversible wearout effect, but the literature is vague about how much recovery can be achieved under different conditions and what it means for designers to boost the rate and level of BTI recovery. This chapter proposes a series of biologically inspired techniques that effectively accelerate and activate BTI recovery; measurement results with actual hardware demonstrate that even what would be considered irreversible BTI wearout can be almost fully eliminated by employing an internal circadian rhythm for recovery. By taking full advantage of these unique BTI recovery behaviors and running the system in a “refreshed” mode, the design margins that a flat-guardband approach would assign can be significantly reduced, and the average performance can be improved as well. We present the theory, models, experimental demonstration, and potential design benefits of accelerated and active BTI recovery in this chapter.
Xinfei Guo, Mircea R. Stan

Chapter 3. Accelerating and Activating Recovery Against EM Wearout

Abstract
As technology scales into the nano-regime, electromigration (EM) becomes a major threat that causes IR drops on the power delivery network (PDN) and can eventually lead to permanent failures. Conventionally, EM has been constrained by design rules during the physical design phase. In this chapter, we present experimental evidence demonstrating that EM recovery can be accelerated and activated by “reversing” the direction of stress (current, in the case of EM). The recovery mechanism can be employed to relax conservative EM design rules at design time and can potentially fix EM issues at run time, before catastrophic failure occurs. Similar to the BTI case, we demonstrate that EM wearout and recovery can also follow an optimal circadian rhythm, leading to almost complete recovery. The chapter concludes by discussing the implications of EM recovery for potential improvements to chip signoff procedures.
Xinfei Guo, Mircea R. Stan

Implementing Self-healing on Chip

Frontmatter

Chapter 4. Circuit Techniques for BTI and EM Accelerated and Active Recovery

Abstract
In the previous chapters we saw that both BTI and EM recovery can be activated and accelerated; these unique recovery behaviors can benefit future resilient digital systems if they are instrumented on chip. In this chapter, we discuss a set of circuit blocks that implement the required functionality for achieving accelerated and active self-healing on chip. Examples of such portable circuit IP blocks are negative voltage generators, reconfigurable heaters, wearout-aware power gating, bidirectional-current PDNs, and novel types of BTI and EM sensors. We present design details, functionality, and potential costs of each type of circuit. By implementing all or a subset of these circuit IPs, recovery can be enabled on chip with acceptable hardware costs.
Xinfei Guo, Mircea R. Stan

Chapter 5. Active Accelerated Self-healing as a Key Design Knob for Cross-Layer Resilience

Abstract
Cross-layer resiliency is a closer-to-optimal way of maximizing reliability by breaking the abstraction-layer boundaries across the system stack. In this chapter, we discuss how accelerated and active self-healing methods can be effectively applied at different levels in the system hierarchy. Circuit blocks that were presented in the previous chapter serve as the underlying infrastructure for recovery; at the architecture level, unit-level self-healing and intrinsic heat reduce the hardware costs for recovery through architectural opportunities; at the system level, scheduling that follows a certain circadian rhythm can be implemented to deeply heal the circuit. Overall, these techniques can work together and compensate for the trade-offs necessary for recovery.
Xinfei Guo, Mircea R. Stan

Wearout and Recovery in Advanced Technology and Emerging Applications

Frontmatter

Chapter 6. Design and Aging Challenges in FinFET Circuits and Internet of Things (IoT) Applications

Abstract
The advent of FinFETs has extended the CMOS lifeline by a few more technology nodes (5 nm and even 3 nm are now under development), so it is critical for digital circuit designers and researchers to understand some of the fundamental differences between advanced FinFET nodes and older planar devices, along with the associated challenges (e.g., design and aging challenges) in the forthcoming sub-10 nm regime. This chapter consists of two major thrusts. In the first thrust, we present a comprehensive study that compares multiple technology nodes spanning from old planar devices to the most advanced FinFET nodes. This study adds to the FinFET design knowledge base and helps designers gain a thorough understanding of various design challenges. The second thrust mainly looks at the impact of FinFET aging within the context of the Internet of Things (IoT). Through extensive simulations with foundry-provided FinFET aging models, we conclude that aging can severely affect certain categories of IoT applications; hence, this aspect needs to be incorporated in the design cycle to meet the overall system lifetime targets. Several candidate techniques against aging are also presented for designing robust IoT chips that perform faster, consume less power, and last longer than chips designed without these techniques.
Xinfei Guo, Mircea R. Stan

Summary and Closing Remarks

Frontmatter

Chapter 7. Future Directions in Self-healing

Abstract
The evolution of CMOS technology and increased market pressures have set stricter reliability requirements for integrated circuit design. As one of the dominant sources of unreliability, wearout needs to be addressed in a more efficient and effective way. This book has introduced one promising approach that can reverse the effect of wearout through active accelerated recovery techniques. Although our focus has been mainly on digital CMOS circuits, we believe that similar methods can also be applied to emerging technologies and can be integrated with the proposed wearout mitigation infrastructure. In this chapter, we preview several such directions that are inspired by self-healing. We believe that instrumenting recovery can be an effective design dimension for securing the resilience of future electronic systems.
Xinfei Guo, Mircea R. Stan

Backmatter
