Radiation effects in state-of-the-art electronics have become a critical concern that goes beyond the traditional systems operating in harsh environments, such as aviation and space missions. Therefore, there is an increased interest in studying these effects as well as in investigating how to design reliable and fault-tolerant systems that are hardened against radiation. In this chapter, the main stochastic events, known as single-event effects, are presented. Furthermore, the fundamental concepts necessary to understand the problem of radiation effects in electronics are introduced.
2.1 Context and Overview
As presented in the previous chapter, the reliability of electronic circuits is subject to physical damage or functional failures due to the influence of the environment, such as the presence of atmospheric or space radiation [1]. The energy deposition of a single energetic particle in the sensitive areas of a circuit can lead to destructive or nondestructive mechanisms, known as single-event effects. Initially, circuit reliability under radiation was considered a primary concern only in projects developed for military or space applications, due to their harsh environments. Back in 1962, the work developed in [2] was the first study to predict that galactic cosmic radiation could become a threat to circuit design as the technology is scaled down into the nanometer world. Only later, in 1975, were Binder et al. [3] able to identify anomalies in the bit storage of flip-flop circuits used in a satellite system and attribute them to cosmic radiation effects.
Besides the radiation effects observed in space applications, these irregularities in the circuit operation were also identified at sea level as early as 1978 [4]. However, the root cause of these anomalies, which were observed in memory circuits, was associated with the alpha particles emitted by the uranium and thorium naturally present in the package material surrounding the devices. That work used for the very first time the term soft errors to denote the nondestructive radiation effects in electronics, and the term is still largely adopted in the research community. In the following year, Guenzer et al. [5] showed that neutrons and protons can also induce upsets in memory elements when they trigger nuclear reactions within the circuit material. It was in that work that the term single-event upset (SEU) was first adopted to address the bit flips observed in memory circuits, and it has been largely used since then. In the next section, the fundamental SEU mechanisms necessary to understand and investigate their effects on current technologies will be explained in detail.
Initially, most studies focused on radiation effects in memories because of their higher occurrence and, therefore, higher impact on system functionality. Only nearly 10 years after the first observation of upsets in satellites by Binder et al. [3] were transient effects observed in combinational logic circuits by May et al. [6]. Then, during the 1990s, several works started to examine the anomalies in the combinational part of logic circuits, and the topic received growing attention from the radiation effects research community [7]. The work in [8] reported that radiation-induced transients could propagate and upset memory elements such as latches. Although transient effects had been observed since 1984, the term single-event transient (SET) was first adopted only in 1990, by Newberry et al. [9]. Historically, SEUs have been extensively studied in the literature, while SETs were not given as much importance due to the intrinsic masking effects of combinational logic circuits [7]. However, transistor scaling, reduced logic data path depth, and increased operating frequencies have weakened the masking capability of logic circuits at advanced technology nodes [10‐13]. Accordingly, several works started the development of radiation hardening techniques and mitigation schemes to reduce the impact of soft errors, i.e., both SEUs and SETs.
Although early research in radiation effects was predominantly focused on space and military systems, there has been a growing awareness of the impact on ground-level applications. For example, autonomous vehicles are increasingly susceptible to transient radiation effects, which could result in critical malfunctions if appropriate fault-tolerant measures are not implemented to enhance their reliability [45]. As discussed in the previous chapter, particle accelerators present an especially hostile environment for electronics [46]. A notable instance occurred within CERN’s super proton synchrotron (SPS) in 2021, where multiple radiation-induced failures within the injection chain of the large hadron collider (LHC) significantly impacted the overall availability of the accelerator complex [47]. These failures were caused by data corruption in memory elements, i.e., SEU events, in a programmable logic controller (PLC)-based system.
Figure 2.1 illustrates the number of radiation to electronics (R2E) beam dumps that occurred during the LHC physics runs in 2018 and 2024, plotted against the integrated luminosity at the compact muon solenoid (CMS) experiment. In 2018, a substantial number of failures were observed, particularly in power converters and quench protection systems, as a consequence of increased radiation levels caused by adjustments in the nominal operational settings of a collimator in the LHC ring [46].
To improve the overall availability of the accelerator complex, especially with the anticipated increase in integrated luminosity during the high-luminosity LHC (HL-LHC) era, a target failure rate of 0.1 R2E events per fb\({ }^{-1}\) has been set, as depicted in Fig. 2.1. This goal emphasizes the necessity of rigorous radiation monitoring and the comprehensive qualification of electronic systems deployed in the accelerator, ensuring adherence to a robust radiation hardness assurance (RHA) protocol to mitigate radiation-induced disruptions effectively.
Therefore, whether for space systems or ground-level applications, understanding the fundamental mechanisms of radiation effects is essential for designing reliable systems and enhancing their operational availability. In this chapter, the foundational concepts of single-event effects (SEE) in digital circuits will be introduced. These concepts will provide the necessary knowledge for addressing radiation-induced challenges in a wide range of applications, from space exploration to terrestrial technologies.
2.2 Single-Event Upset (SEU)
As mentioned previously, SEUs are characterized by a single particle strike in a memory element leading to a bit flip and, consequently, data corruption. To illustrate how a single particle can induce an SEU, Fig. 2.2 contains the gate representation of the basic element that composes a traditional Static Random Access Memory (SRAM) circuit topology, the cross-coupled inverters. This architecture is the core circuit of a traditional SRAM design in which additional access transistors (not shown in the figure) are used to read and write the logic states, i.e., the bit information.
This positive-feedback circuit architecture is responsible for holding the bit information and therefore works as a memory element. If a particle strikes either of these inverter gates and changes its output signal, the feedback mechanism will hold the incorrect signal and thus change the bit information stored in the circuit, as shown in Fig. 2.2. To observe such effects, an energetic particle needs to hit close to one of the off-state transistors in the circuit (as shown in Fig. 2.3) and deposit sufficient energy. In that case, the electron-hole pairs created by ionization can be collected by the reverse-biased p-n junction of the transistor, and a transient current is observed at its drain terminal.
If the amplitude and pulse width of this transient current are sufficiently large, in other words, if enough energy is deposited by the incident particle in the right place of the device, the output signal of the affected inverter will change and, consequently, the stored data will be corrupted, as illustrated in Fig. 2.2. The minimum collected charge necessary to change the output signal of the circuit is related to its nodal capacitance and is usually known as the critical charge (\(Q_{crit}\)) of the circuit. Thus, the sensitivity of such a circuit depends on the node capacitance of the internal transistors that are responsible for the logic state retention. Accordingly, \(Q_{crit}\) can be expressed with the following simplified equation, where C corresponds to the nodal capacitance and \(V_{DD}\) to the supply voltage of the circuit:
$$\displaystyle \begin{aligned} Q_{crit} \sim 2 V_{DD} C \end{aligned} $$
(2.1)
This is a simplified formula widely used to understand the implications of circuit design and to provide useful insights into the impact of transistor technology on the susceptibility to energetic particles. Together with the threshold Linear Energy Transfer (LET), as described in Chap. 1, the critical charge \(Q_{crit}\) is a very important parameter used to characterize components and estimate their SEE rate.
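As a rough numerical illustration of Eq. 2.1, the short sketch below evaluates the simplified critical charge for two sets of parameters. The function name `critical_charge` and the supply voltages and nodal capacitances are assumptions chosen only for illustration; they are not values taken from any specific technology.

```python
def critical_charge(v_dd: float, c_node: float) -> float:
    """Simplified estimate Q_crit ~ 2 * V_DD * C (Eq. 2.1), in coulombs."""
    return 2.0 * v_dd * c_node


# Hypothetical (V_DD [V], C [F]) pairs, for illustration only.
examples = {
    "larger node (assumed)": (1.2, 1.0e-15),
    "scaled node (assumed)": (0.8, 0.5e-15),
}

for name, (v_dd, c_node) in examples.items():
    q_crit_fc = critical_charge(v_dd, c_node) * 1e15  # convert C to fC
    print(f"{name}: Q_crit ~ {q_crit_fc:.2f} fC")
```

Under these assumed values, lowering both the supply voltage and the nodal capacitance reduces the estimated critical charge, which is the trend the simplified formula is meant to convey.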
Critical charge \(Q_{crit}\) of a circuit is widely used as a measurement of its sensitivity to SEE. In current state-of-the-art technologies, \(Q_{crit}\) is expected to be lower than 1 fC.
Besides the critical charge, another very important concept used to define the SEE sensitivity of a component is the SEE cross-section \(\sigma _{SEE}\), which is a measure of the probability of an SEE occurring in a device. For a given LET, the event cross-section \(\sigma _{SEE}\) is calculated based on the number of observed events \(N_{SEE}\) under a given particle fluence \(\phi \) as shown in Eq. 2.3:
$$\displaystyle \begin{aligned} \sigma_{SEE} = \frac{N_{SEE}}{\phi} \end{aligned} $$
(2.3)
The particle fluence, expressed in particles\(/cm^2\), is the integrated number of particles passing through a unit area perpendicular to the beam direction. The sensitivity of a component is normally expressed as a cross section as a function of the particle LET (or particle energy in the case of protons and neutrons). For ion-induced events, the energy deposition is dependent on the angle of incidence and, therefore, if the incident angle is not perpendicular to the device, the effective LET should be used instead:
$$\displaystyle \begin{aligned} LET_{eff} = \frac{LET}{\cos \theta} \end{aligned} $$
where \(\theta\) is the angle of incidence measured from the normal to the device surface.
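The two quantities above can be sketched numerically as follows. The example assumes the usual definitions, namely the cross section as the ratio of observed events to fluence and the cosine-law effective LET with the tilt angle measured from the surface normal. The function names, the event count, the fluence, and the LET value are hypothetical.

```python
import math


def see_cross_section(n_events: int, fluence: float) -> float:
    """Cross section [cm^2] = observed events / fluence [particles/cm^2]."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return n_events / fluence


def effective_let(let_normal: float, tilt_deg: float) -> float:
    """Cosine-law effective LET; tilt_deg is measured from the surface normal."""
    if not 0 <= tilt_deg < 90:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


# Hypothetical test run: 50 events observed after 1e7 ions/cm^2.
sigma = see_cross_section(50, 1.0e7)
print(f"cross section: {sigma:.1e} cm^2")

# Hypothetical ion with LET = 10 MeV.cm^2/mg, device tilted by 60 degrees.
print(f"effective LET: {effective_let(10.0, 60.0):.1f} MeV.cm^2/mg")
```

The cosine law is a first-order geometric approximation; the sketch makes no claim about its range of validity for a particular device.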
Given the feedback mechanism of such architecture, two additional effects are observed at the circuit level, and they are essential for the failure analysis: the transient propagation delay and the restoring current. As shown in Fig. 2.4, the radiation-induced transient pulse observed in the output signal of the struck inverter propagates to the output of the second inverter with a propagation delay \(T_{pd}\) which is an intrinsic characteristic of the designed circuit considering a given technology node. While the transient pulse propagates to the input of the second inverter, the PMOS transistor in the first inverter is still on, and therefore it provides a restoring current that counteracts the radiation-induced transient current. The strength of the restoring transistor, i.e., its feature size, determines the resulting transient pulse observed in the circuit node. For instance, if the p-type transistor (PMOS device) in Fig. 2.4 is faster in restoring the output voltage than the propagation delay \(T_{pd}\), no SEU will be observed in this memory cell. In addition to the reduced nodal capacitance, in the deeply scaled transistor technologies, the propagation delay tends to be smaller in every new generation [14, 15]. As a result, if no proper hardening approach is considered when adopting advanced technologies, a higher sensitivity to radiation effects should be expected.
Another issue observed when adopting deeply scaled technologies is the increase of the charge sharing effect, discussed in the previous chapter, which can lead to multiple-cell upset (MCU). In other words, with a single particle hit, multiple memory cells are upset due to their close proximity in the physical layout and the carrier diffusion mechanism through the substrate. Figure 2.5 illustrates the impact of scaling in a memory array and the consequent MCU phenomenon. When the affected cells correspond to bits from the same logical word in the memory, the event is called a multiple-bit upset (MBU). The MBU occurrence has an important implication for the efficiency of error correcting codes (ECC) [17], since most of the techniques are designed to correct only single-bit upsets (SBUs). In order to preserve the data integrity considering MCUs, the ECC techniques would require a large number of redundant bits, increasing the memory design complexity and the area overhead.
Another approach widely used along with the ECC techniques is bit interleaving [18]. In an interleaving scheme, the memory cells are placed in such a way that adjacent cells correspond to bits of different logical words. Therefore, if the memory array in Fig. 2.5 adopts bit interleaving, the MCU event would not lead to an MBU and the ECC techniques would be able to correct the SBUs in the different affected words. Although the increased cell density in memories has led to an increased sensitivity to MCUs and MBUs in the latest planar devices, memories using three-dimensional devices such as FinFETs have shown an improved resilience due to the higher substrate doping profiles [19, 20]. In order to suppress short-channel effects (SCEs) such as increased off-state leakage current, higher substrate doping levels are used in FinFET devices, which limit the carrier mobility and therefore reduce the charge sharing effect and the multiple node charge collection. However, it was also shown that in both planar and FinFET SRAMs, the MBU susceptibility has a strong angle dependence, so the device orientation could be critical in determining the efficiency of ECC techniques [21].
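The benefit of bit interleaving can be sketched with a toy column-to-word mapping. The word width, the number of words per physical row, the mapping functions, and the struck columns below are all assumptions made for illustration; real memory layouts differ.

```python
from collections import Counter

WORD_BITS = 8  # bits per logical word (assumed)
N_WORDS = 4    # logical words sharing one physical row (assumed)


def word_of(column: int, interleaved: bool) -> int:
    """Return the logical word that owns a given physical column."""
    if interleaved:
        # Adjacent columns belong to different logical words.
        return column % N_WORDS
    # Without interleaving, each word occupies WORD_BITS adjacent columns.
    return column // WORD_BITS


def upsets_per_word(struck_columns, interleaved: bool) -> Counter:
    """Count how many upset bits land in each logical word."""
    return Counter(word_of(c, interleaved) for c in struck_columns)


# Hypothetical MCU: one particle upsets three adjacent cells in a row.
mcu = [10, 11, 12]
plain = upsets_per_word(mcu, interleaved=False)
inter = upsets_per_word(mcu, interleaved=True)

print("no interleaving, worst word:", max(plain.values()), "upset bits")
print("interleaving,    worst word:", max(inter.values()), "upset bit(s)")
```

In this toy layout, the same three-cell MCU produces a three-bit MBU in one word without interleaving, while with interleaving each affected word sees a single-bit upset that a single-error-correcting code could handle.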
Charge sharing is a predominantly negative effect that increases the sensitivity of advanced technologies. However, design techniques can take it into consideration to propose hardening strategies, as shown in the next chapters.
2.3 Single-Event Functional Interruption (SEFI)
The consequences of an SEU depend on the nature of the corrupted information and the timing of the event. In simpler devices, the error might go unnoticed. However, for complex devices such as processors and Field-Programmable Gate Arrays (FPGAs), an SEU can lead to critical misbehavior. When an SEU occurs in a control or state-holding section of a complex device, it can trigger a more severe consequence known as a single-event functional interruption (SEFI). A SEFI manifests as a complete cessation of operation in the affected circuit, often resulting in a system reset or lockup. Imagine a tiny glitch causing your computer to freeze entirely: that is akin to a SEFI in electronics. Unlike some soft errors that might cause transient hiccups, SEFIs are typically detectable due to the complete halt in operation.
SEFIs are most prevalent in devices with integrated control or state sections, such as modern memories, processors, FPGAs, and Application-Specific Integrated Circuits (ASICs). This is because these devices rely heavily on accurate logic states and control signals, and any disruption caused by an SEU can have a significant impact on functionality. Understanding SEFIs requires familiarity with three key concepts in reliability in electronics:
Fault: A physical defect within a device that can trigger errors under certain conditions. In the context of radiation effects, faults can arise from disruptions in critical circuitry due to ionization or displacement damage caused by particle interactions. For instance, a radiation-induced transient in an off-state transistor is a fault.
Error: An error is the manifestation of a fault and implies a deviation from the intended behavior of a system, for example, a bit flip in a memory element that corrupts critical data.
Failure: A failure occurs when a system or component cannot fulfill its intended function. In the context of SEFI, the failure is marked by a catastrophic loss of functionality in the affected circuit, potentially compromising the integrity of the entire system.
SEUs are known as soft errors because they do not present permanent damage, and they are often silent, i.e., they alter the state of a memory cell without system detection. These undetected errors can propagate through the system, potentially leading to failures. However, if a soft error occurs in a control unit or critical part of the system, it can disrupt operation in a detectable way, allowing the system to take corrective measures and potentially avoid a complete failure. SEFIs, in essence, are a type of fault that manifests as a detectable error, often with the potential for self-correction (through a system reset). This differentiates them from potentially silent soft errors.
2.4 Single-Event Transient (SET)
The basic mechanisms observed in SEUs and introduced in Sect. 2.2 also apply to SETs in logic circuits. In fact, one of the main differences between SETs and SEUs is the type of circuit affected by the particle interaction. Overall, digital logic circuits can be classified into two groups: combinational logic and sequential logic circuits. A combinational logic circuit implements a Boolean logic function which depends only on the current set of inputs of the circuit. For instance, NOT gates (inverters), which implement the logical negation, and AND/OR gates, which implement the logical conjunction/disjunction, are very commonly used combinational logic gates. On the other hand, in a sequential logic architecture, the logic implementation depends not only on the current input of the circuit but also on the previous inputs and outputs. Therefore, "sequential" refers to the sequence of information, which introduces the notion of storage. A simple way to implement this sequential mechanism is by using positive feedback, as shown previously in Fig. 2.2. Besides the SRAM design, this feedback approach is widely used to design different sequential logic circuits used as storage elements, such as registers, latches, and flip-flops. In Fig. 2.6, an example circuit illustrates the sequential logic gates (in this case, the flip-flops) and the combinational logic gates, such as the NAND (not AND), the NOR (not OR), and the inverter.
In contrast to SEU which occurs in the sequential logic part of the circuit, the SET occurs in the combinational one. Therefore, the transient pulse is generated in the output of the logic gate, and it can propagate through the data path until it is latched by a memory element and corrupts the stored bit. However, for this SET pulse to upset a memory element at the end of its data path, it needs to surpass three basic masking capabilities inherent in any combinational circuit: logical masking, electrical masking, and latching-window masking (also known as temporal masking).
2.4.1 Logical Masking Effect
Combinational circuits provide the logical masking effect when the SET event occurs in a logic gate whose output does not determine the output signal of the subsequent logic stage. For instance, a two-input NOR gate has its output determined whenever one of its input signals is evaluated to 1, i.e., whenever one input signal is at logic 1, the output evaluates to logic 0. This phenomenon can be better understood by analyzing the block of combinational logic in Fig. 2.7. A SET event occurs in the first NOR gate, whose output signal was initially evaluated to logic 0. The SET pulse propagates to the next logic stage, which is also a NOR gate. However, this logic stage has already been evaluated to logic 0 due to the input signal provided by the NAND gate. Since the output of the second NOR gate has already been determined by one of its inputs, the SET pulse is not able to change it; hence, it is logically masked and cannot propagate to the next logic stage and reach a memory element, for instance. Although this mechanism is effective, recent technologies have shown a reduction in the logic depth of combinational circuits, and thus the logical masking effect has been reduced [22]. Nevertheless, circuit designers can promote the logical masking effect by introducing more basic logic gates in the design instead of using complex logic gates, as shown in [23].
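The NOR-NOR situation described above can be checked with a few lines of Boolean logic. The sketch below models only the second NOR stage with its two inputs (the struck NOR output and the NAND output); the helper names are hypothetical and the SET is modeled simply as an inversion of the struck node.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def set_is_logically_masked(nor1_out: int, nand_out: int) -> bool:
    """Compare the second NOR output with and without a SET on the first NOR.

    The SET is modeled as a temporary inversion of nor1_out.
    """
    nominal = nor(nor1_out, nand_out)
    disturbed = nor(1 - nor1_out, nand_out)
    return nominal == disturbed


# NAND output at logic 1 already forces the second NOR to 0: SET is masked.
print(set_is_logically_masked(nor1_out=0, nand_out=1))  # True

# NAND output at logic 0 leaves the path sensitized: SET propagates.
print(set_is_logically_masked(nor1_out=0, nand_out=0))  # False
```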
2.4.2 Electrical Masking Effect
The electrical masking effect is another phenomenon that can occur in a combinational circuit and prevent the propagation of a SET pulse. Due to electrical losses, a SET pulse suffers from attenuation of its amplitude and duration, and it might not be able to propagate to a memory element, as observed in Fig. 2.8. The initial SET pulse has its waveform affected by each stage of logic, vanishing near the memory element.
In this case, the propagated SET pulse did not have sufficient amplitude to upset the memory element due to the electrical masking effect. However, it was shown that the transient pulse can not only suffer from attenuation but also experience a broadening effect, the so-called propagation-induced pulse broadening (PIPB) [24, 25]. Similarly to the SEU, the pulse width of the SET depends on the restoring current of the struck circuit and its capacitive load (fan-out). A larger capacitance can increase the critical charge; however, it can also lead to pulse broadening due to the longer time needed to restore the output voltage [26]. Due to its complexity, the PIPB effect is difficult to evaluate and, therefore, it is a significant issue when considering hardening methodologies for SET mitigation [12, 27].
2.4.3 Latching-Window Masking Effect
In the end, if the SET pulse has not been masked logically or electrically, it might still be masked by the latching window of a memory element. This window is composed of the setup time (\(T_{setup}\)) and the hold time (\(T_{hold}\)) around the edge of the clock signal of a flip-flop circuit. If the SET pulse does not arrive during this latching window, it will not be able to induce a bit upset, i.e., a change in the stored bit value. Figure 2.9 illustrates this phenomenon. Due to the high operating clock frequencies in advanced technologies, the latching-window masking effect is expected to be reduced given the short \(T_{setup}\) and \(T_{hold}\) of flip-flop (FF) designs [28].
In summary, the particle strike must induce a SET pulse with sufficient amplitude and duration to propagate through an open logic path and reach a memory element during a clock pulse, enabling the latching of the input value. Thus, as the clock frequency increases and the supply voltage is reduced, the probability of a SET pulse being latched by a memory element increases [29].
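The dependence of latching-window masking on the clock period can be sketched with a simple timing model. The example assumes, as a simplification, that a SET is latched whenever it overlaps the window from \(T_{setup}\) before to \(T_{hold}\) after a clock edge, and that the pulse is shorter than the clock period; the function names and all timing values (in picoseconds) are hypothetical.

```python
def is_latched(pulse_start, pulse_width, clock_edge, t_setup, t_hold):
    """True if the SET overlaps [clock_edge - t_setup, clock_edge + t_hold]."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    pulse_end = pulse_start + pulse_width
    return pulse_start < window_end and pulse_end > window_start


def latched_fraction(pulse_width, t_clk, t_setup, t_hold, steps=1000):
    """Fraction of evenly spaced arrival times within one period that latch.

    Clock edges are placed at multiples of t_clk; pulse_width < t_clk assumed.
    """
    hits = 0
    for i in range(steps):
        start = i * t_clk / steps
        if any(is_latched(start, pulse_width, k * t_clk, t_setup, t_hold)
               for k in (0, 1, 2)):
            hits += 1
    return hits / steps


# Hypothetical values in ps: 100 ps SET, 30 ps setup, 20 ps hold.
slow = latched_fraction(100, t_clk=1000, t_setup=30, t_hold=20)
fast = latched_fraction(100, t_clk=500, t_setup=30, t_hold=20)
print(f"1 GHz clock: {slow:.3f}  2 GHz clock: {fast:.3f}")
```

With these assumed numbers, the latched fraction is roughly 0.15 at the longer period and roughly 0.30 at the shorter one, consistent with the statement that a higher clock frequency increases the probability of latching a SET.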
Masking effects are naturally present in digital logical circuits and their effectiveness depends on several factors such as the transistor technology, circuit design, and operation parameters (clock frequency, supply voltage, temperature, and so on).
2.5 Single-Event Latchup (SEL)
Since the 1970s, latchup has been a well-known reliability issue in bulk Complementary Metal-Oxide-Semiconductor (CMOS) technology due to the parasitic pnpn structure inherently present in this technology, as shown in Fig. 2.10 [30‐32]. In nominal operation, this structure, composed of parasitic bipolar junction transistors (BJTs), is in a high-impedance state and therefore does not interfere with the circuit operation. However, these transistors can be activated externally by (1) electrical stress, known as electrical latchup or transient-induced latchup, or (2) particle radiation, known as single-event latchup (SEL). Unlike the soft errors, such as the SEU and SET discussed previously, SEL can have a destructive effect depending on the duration and magnitude of the induced parasitic current.
This phenomenon occurs when the particle interaction within a circuit activates these parasitic BJTs and thereby creates a low-impedance path between the two power rails, i.e., the power supply and ground. Once this connection is established, a high current is observed in the circuit, which can permanently damage the component. To restore the correct functionality of the circuit and prevent permanent damage, a power cycle is necessary. In Fig. 2.10, the cross-sectional view of a typical CMOS-based inverter design is shown, in which the parasitic BJTs are also illustrated. In the literature, these BJTs are often known as the vertical transistor (VT) and the lateral transistor (LT). The resistors \(R_W\) and \(R_S\) correspond to the well and substrate resistance, respectively. The placement of the well taps and the doping profile of the CMOS structure determine the values of these resistances and therefore the electrical characteristics required to observe a sustainable SEL.
Notice that these BJTs are not designed to be active during the nominal operation of the circuit; however, they are formed due to the interplay between the different junctions, potentials, and nested wells within the CMOS structure itself. Due to their positive feedback connection, the BJTs will only turn off, and the SEL will only be extinguished, when the supply voltage of the circuit is reduced to a level below the so-called holding voltage (\(V_{hold}\)). Figure 2.11 illustrates the case in which a particle strike deposits enough charge to activate the feedback loop of BJTs, leading to a sustained SEL (until the supply voltage is reduced below \(V_{hold}\)), and the case in which the radiation-induced current is not sufficient to turn on the parasitic BJT loop. Besides \(V_{hold}\), another latchup criterion is related to the BJT gain of both lateral and vertical transistors. In order to establish the positive feedback, the product of their gains must exceed unity, i.e., \(\beta _{V} \beta _{L} > 1\) [34].
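The two latchup criteria mentioned above, the gain product and the holding voltage, can be combined in a small first-order check. The function name and the numerical values are hypothetical, and the sketch deliberately ignores the triggering condition itself (whether the deposited charge is enough to turn on the parasitic loop).

```python
def sel_can_be_sustained(beta_v: float, beta_l: float,
                         v_dd: float, v_hold: float) -> bool:
    """First-order check of the two criteria stated in the text.

    - the gain product of the vertical and lateral BJTs exceeds unity;
    - the supply voltage has not been reduced below the holding voltage.
    """
    loop_gain_ok = beta_v * beta_l > 1.0
    supply_ok = v_dd >= v_hold
    return loop_gain_ok and supply_ok


# Hypothetical parameters, for illustration only.
print(sel_can_be_sustained(beta_v=5.0, beta_l=0.5, v_dd=1.2, v_hold=0.9))  # True
print(sel_can_be_sustained(beta_v=5.0, beta_l=0.5, v_dd=0.8, v_hold=0.9))  # False
print(sel_can_be_sustained(beta_v=5.0, beta_l=0.1, v_dd=1.2, v_hold=0.9))  # False
```

The second call mirrors the recovery mechanism described in the text: once the supply is brought below the holding voltage, the latched state can no longer be sustained.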
Together with soft errors, the threat of SEL in electronic components has been a well-known and highly important concern for systems operating in the space environments [36‐39]. Besides the circuit design characteristics, the SEL sensitivity of a given component also depends on the environmental temperature as proven in the literature [33, 35, 40‐43]. The temperature variation modifies the intrinsic characteristics of the devices such as the carrier mobility and the threshold voltage. For example, carrier mobility decreases with the increase in temperature; therefore, the substrate/well resistance increases accordingly. As a consequence of such variation in the substrate and well resistances, the SEL triggering mechanism is also affected. In Fig. 2.12, the SEL sensitivity of an inverter gate designed in 65 nm bulk CMOS is shown as a function of particle LET and temperature of operation. The sensitivity is measured in terms of the SEL cross section which increases with temperature for both low and high LET values. Thus, an increase in temperature can not only increase the saturation SEL cross section but can also lower the threshold LET necessary to trigger the latchup. However, these relationships are not as simple as they might seem. For instance, the temperature dependence of carrier mobility is also a function of the doping concentration: the lower the concentration, the stronger the temperature dependence [44]. It is for this reason that the SEL susceptibility of a given circuit is highly dependent on the layout design itself, but also on the process technology. Therefore, in Chap. 4, the implications of process technologies and mostly design approaches on the overall SEE sensitivity of a circuit are discussed.
2.6 Summary
Radiation effects are no longer an exclusive concern for system designers targeting space or military applications. Due to the advancement of technology, low-energy particles present even at sea level are able to induce failure mechanisms at the device and circuit levels. In this chapter, the stochastic effects known as single-event effects (SEEs) were presented. The most well-known and highly important effects are the soft errors, i.e., the nondestructive effects named single-event upset (SEU) and single-event transient (SET). Their failure signature is strongly related to data corruption, caused either by a direct impact on the memory elements themselves or by a transient pulse which propagates through the data path and is latched by a memory element.
Initially, SETs in digital electronics were less of a concern due to the inherent masking capability of combinational circuits. However, with technology scaling, the effectiveness of these masking effects has diminished, and a higher impact of SETs is observed in today’s electronic technology. Another very relevant failure mechanism in modern technology is the single-event latchup (SEL), a potentially destructive effect: if no action is taken, the electronic components can suffer permanent damage due to the very high current flow between the supply rails.
To characterize and investigate these effects in different systems, simulation codes are widely employed in conjunction with irradiation testing campaigns. Therefore, in the next chapter, we will discuss the physical models used to describe these phenomena and provide an overview of various simulation tools available in the literature. This understanding of SEEs and the associated analysis tools is essential for effectively assessing and mitigating their impact on electronic systems in diverse applications.
Highlights
Single-event effects (SEEs) are stochastic effects caused by single particle interactions, and they can be destructive or nondestructive (soft errors).
A single-event functional interruption (SEFI) is a manifestation of soft errors in complex devices, leading the component to reset, lockup or experience other types of malfunction.
Among several factors, the SEE sensitivity of a circuit is dependent on the transistor technology, supply voltage, and nodal capacitance of transistors.
The threshold LET and the critical charge of a component are important parameters to characterize and estimate its SEE rate.
The multiple-node charge collection due to the charge sharing can pose a threat to the efficiency of error-correcting codes (ECCs) in advanced technologies.
The masking capability of digital combinational circuits has been reduced in deeply scaled transistor technologies.
The single-event latchup (SEL) is a potentially destructive SEE that can damage the component if the supply voltage is not reduced under the so-called holding voltage \(V_{hold}\).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.