Construction of event-tree/fault-tree models from a Markov approach to dynamic system reliability

https://doi.org/10.1016/j.ress.2008.01.008

Abstract

While the event-tree (ET)/fault-tree (FT) methodology is the most popular approach to probabilistic risk assessment (PRA), concerns have been raised in the literature regarding its potential limitations in the reliability modeling of dynamic systems. Markov reliability models can capture the statistical dependencies between failure events that can arise in complex dynamic systems. A methodology is presented that combines Markov modeling with the cell-to-cell mapping technique (CCMT) to construct dynamic ETs/FTs and that addresses the concerns with the traditional ET/FT methodology. The approach is demonstrated using a simple water level control system. It is also shown how the generated ETs/FTs can be incorporated into an existing PRA, so that only the (sub)systems requiring dynamic methods need to be analyzed using this approach while the static model of the rest of the system is still leveraged.

Introduction

While the event-tree (ET)/fault-tree (FT) methodology is by far the most popular approach to probabilistic risk assessment (PRA), concerns have been raised in the literature over the past 25 years regarding its potential limitations in the reliability modeling of dynamic systems. These concerns include:

  • lack of a time element in the ET/FT methodology to represent fault propagation through logic loops, or the possible dependence of the system failure modes on the exact timing of the component failures with respect to the changing magnitudes of the plant process variables [1], [2],

  • treatment of the coupling between the plant physical processes and triggered or stochastic events (e.g., valve openings, pump startups) which could lead to statistical dependence between failure events [3],

  • semi-quantitative modeling of the propagation of system disturbances through a classification of changes in process variables (e.g., small, moderate, large), which may lead to the omission of some failure mechanisms due to inconsistencies in the definition of the allowed ranges for the process variables [3], [4] or due to possibly significant changes in system behavior arising from very small changes in system parameters [5],

  • possible sensitivity of Top Event frequencies to stochastic changes in the system settings [2] or process dynamics.

A more detailed discussion of the possible limitations of the ET/FT approach is given in [6]. A more recent review regarding its applicability to the reliability modeling of digital instrumentation and control systems is given in [7].

Markov reliability models have traditionally been used to account for statistical dependencies between hardware failures [7]. Augmented with the cell-to-cell mapping technique (CCMT), Markov models can also be used to address the concerns with the ET/FT methodology listed above [8], [9]. A challenge in the use of Markov reliability models for nuclear plant PRAs is that, while they have been used to model plant subsystems with statistically dependent failures [10], [11], [12], they cannot be used (nor are they needed) to model the whole plant because of state-space explosion. The state-space explosion issue can be addressed by using Markov models only for the plant subsystems where they are needed (e.g., the steam generator feedwater control system of a pressurized water reactor [12], or the risk modeling of digital instrumentation and control systems [13]); however, there are no mechanized procedures for incorporating the Markov reliability model of a portion of the plant into an existing PRA of the plant that is based on the ET/FT methodology.

While ET interpretation of Markov chains seems to be common in operations research and economics [14], [15], [16], [17], [18], very few attempts to generate ETs or FTs from Markov models have been encountered in the reliability literature [19]. In this paper, we explore a new approach to the generation of failure scenarios and their compilation into dynamic event trees (DETs) or dynamic fault trees (DFTs) from a Markov model of the system, which allows the Markov model to be incorporated into PRAs based on the ET/FT methodology in a mechanized manner. DETs are similar to conventional event trees, except that the branching times are determined by the system simulator through user-specified branching rules and associated probabilities, which are used to generate the possible scenarios following an initiating event and to quantify their likelihood [20], [21], [22], [23]. The branching rules can be used to model the uncertainty in hardware, human, and process behavior. For example, if normal system operation requires a valve to open when the pressure rises above a preset limit, a branching rule could be that the valve either opens or fails to open when the system pressure, as determined by the system simulator, reaches this limit, with each possibility leading to a different scenario. Refs. [22] and [24] show, respectively, how uncertainties in human and process behavior can be modeled using DETs starting from a given initiating event. In that respect, DET generation is based on inductive logic, and the DET has to be regenerated whenever the initiating event conditions change. DFTs also account for the timing of failure events; however, they use deductive logic to identify the event sequences leading to a specified undesirable event (Top Event). They have mostly been used to model hardware/software failure dependencies [25] or system availability when repair is possible [26].
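The valve branching rule described above can be illustrated with a small Java sketch. It assumes a hypothetical "simulator" in which the pressure rises linearly from an initial value p0; the class name `DetBranchingSketch`, the numbers, and the failure-on-demand probability are illustrative assumptions, not values from the paper. The simulator fixes the branching time, and the rule splits the scenario into two branches.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal DET branching sketch. The linear pressure model, setpoint, and
// failure-on-demand probability are HYPOTHETICAL illustration values.
class DetBranchingSketch {

    // One scenario (branch) of the dynamic event tree.
    static final class Scenario {
        final String description;
        final double branchTimeMin;
        final double probability;

        Scenario(String description, double branchTimeMin, double probability) {
            this.description = description;
            this.branchTimeMin = branchTimeMin;
            this.probability = probability;
        }
    }

    // Branching rule: when the simulated pressure p(t) = p0 + rate * t reaches
    // pLimit, the valve either opens (probability 1 - q) or fails to open (q).
    static List<Scenario> branchOnSetpoint(double p0, double ratePerMin,
                                           double pLimit, double q) {
        if (ratePerMin <= 0.0 || p0 >= pLimit) {
            throw new IllegalArgumentException("setpoint must be reached from below");
        }
        double t = (pLimit - p0) / ratePerMin; // branching time from the "simulator"
        List<Scenario> branches = new ArrayList<>();
        branches.add(new Scenario("valve opens", t, 1.0 - q));
        branches.add(new Scenario("valve fails to open", t, q));
        return branches;
    }

    public static void main(String[] args) {
        // A different p0 (initiating event condition) shifts the branching time,
        // which is why the DET has to be regenerated.
        for (Scenario s : branchOnSetpoint(10.0, 0.5, 15.0, 1e-3)) {
            System.out.printf("%-20s t = %.1f min, P = %.3f%n",
                    s.description, s.branchTimeMin, s.probability);
        }
    }
}
```

Each branch would then be simulated further, and later branching rules would split it again, which is how the tree grows from one initiating event.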

While DETs can be generated independently using the probabilistic simulation of the dynamic behavior of the system, as described above, using Markov models to generate DETs has the following advantages over generating DETs independently:

  • Markov models are generally applicable to all possible initial conditions in the discrete state-space range of interest.

  • They can represent the dynamics of systems with control loops in a more compact manner than DETs for both normal and abnormal system operation.

  • Modeling uncertainties, as well as uncertainties in the initial conditions of systems with continuous controlled/monitored variables, can be accounted for using the Markov/CCMT approach [7].

Section 2 describes the example dynamic system used to illustrate the proposed approach, which is presented in Section 4. Section 3 gives an overview of the Markov/CCMT methodology used to construct the Markov reliability model for the example system. Section 5 applies the proposed approach to the example system, and Section 6 illustrates how to incorporate the resulting DETs and DFTs into an existing PRA using the SAPHIRE code [27].

Section snippets

Example dynamic system

In this section we introduce a simple level control system [9] that is often used as a benchmark in the literature; we refer to it as the example system throughout the rest of the paper. The example system, depicted in Fig. 1, consists of a water tank, two water supply units (Units 1 and 2), and one drain unit (Unit 3). Each control unit has a separate level sensor, which we assume to be part of the control unit.

There is one process variable—the level x of the

Markov model and CCMT

The CCMT [28] regards the evolution of the system in time as the probability of transition of the process variables $x_l$ ($l = 1, \dots, L$) between their specified magnitude intervals. These intervals form $L$-dimensional cells $V_j = \{x_l : a_{l,j} \le x_l < b_{l,j};\ j = 1, \dots, J_l;\ l = 1, \dots, L\}$ in the system state space, in a similar fashion to the cells used by finite difference or finite element methods. Once the $V_j$ are specified, the probability $p_{n,j}(k)$ at time $t = k\Delta t$ that $\mathbf{x}(k\Delta t) \equiv [x_1\ x_2\ \cdots\ x_l\ \cdots\ x_L]$ is within $V_j$ and $\alpha(k\Delta t) = \alpha_n$, where $\alpha_n$ is the system
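The cell bookkeeping behind $p_{n,j}(k)$ can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the class `CcmtCells`, the row-major flattening of per-variable intervals into a single cell index $j$, and the $(n, j) \mapsto nJ + j$ numbering of joint Markov states are hypothetical choices, and indices are 0-based in the code.

```java
import java.util.Arrays;

// CCMT-style cell bookkeeping sketch (hypothetical indexing conventions).
// Indices are 0-based here, whereas the text numbers l and j from 1.
class CcmtCells {

    // boundaries[l] holds strictly increasing interval edges for variable x_l;
    // interval i of variable l is [boundaries[l][i], boundaries[l][i + 1]).
    private final double[][] boundaries;

    CcmtCells(double[][] boundaries) {
        this.boundaries = boundaries;
    }

    // Number of intervals J_l for variable l.
    int intervals(int l) {
        return boundaries[l].length - 1;
    }

    // Total number of L-dimensional cells (product of the J_l).
    int cellCount() {
        int count = 1;
        for (int l = 0; l < boundaries.length; l++) {
            count *= intervals(l);
        }
        return count;
    }

    // Interval index of x for variable l, or -1 if x lies outside the
    // partitioned range (such points would need extra, e.g. absorbing, cells).
    int intervalOf(int l, double x) {
        double[] b = boundaries[l];
        if (x < b[0] || x >= b[b.length - 1]) {
            return -1;
        }
        int pos = Arrays.binarySearch(b, x);
        // Exact hit on edge pos starts interval pos; otherwise binarySearch
        // returns -(insertionPoint) - 1 and x lies in interval insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Flat cell index j of the point x = [x_1 ... x_L], or -1 if out of range.
    int cellOf(double[] x) {
        int j = 0;
        for (int l = 0; l < boundaries.length; l++) {
            int i = intervalOf(l, x[l]);
            if (i < 0) {
                return -1;
            }
            j = j * intervals(l) + i; // row-major flattening
        }
        return j;
    }

    // Index of the joint Markov state (component configuration n, cell j),
    // i.e. the state whose probability is p_{n,j}(k).
    int stateIndex(int n, int j) {
        return n * cellCount() + j;
    }

    public static void main(String[] args) {
        // Hypothetical: one process variable with edges -3, -1, 1, 3.
        CcmtCells cells = new CcmtCells(new double[][] {{-3.0, -1.0, 1.0, 3.0}});
        int j = cells.cellOf(new double[] {0.5});
        System.out.println("cell j = " + j + ", joint index for n = 1: " + cells.stateIndex(1, j));
    }
}
```

Flattening $(n, j)$ into one index lets the joint probabilities $p_{n,j}(k)$ be stored as a single vector, which is the form a discrete-time Markov transition matrix acts on.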

Dynamic ET/FT generation

The basic idea of our approach is to use the transition matrix of the Markov model of the system as a graph representation of a finite state machine (a discrete process model of the stochastic dynamic behavior of the system). We can then use this representation and standard search algorithms [29] to explore all possible paths to failure (scenarios) with associated probabilities and to construct dynamic event trees of arbitrary depth. In addition, for each type of failure, we can construct a
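The idea of treating the transition matrix as the graph of a finite state machine and searching it for paths to failure can be sketched as a depth-bounded depth-first search. This is one possible realization, not necessarily either of the paper's algorithms: the class name, the probability cutoff parameter, and the three-state example chain are hypothetical. Each returned path stops at its first entry into a failure state, so the paths are mutually exclusive scenarios.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Depth-bounded DFS over a discrete-time Markov chain transition matrix,
// treated as the graph of a finite state machine (illustrative sketch).
class FailurePathSearch {

    static final class Path {
        final List<Integer> states;
        final double probability;

        Path(List<Integer> states, double probability) {
            this.states = states;
            this.probability = probability;
        }
    }

    // g[i][k] = one-step probability of moving from state i to state k.
    static List<Path> pathsToFailure(double[][] g, int start, Set<Integer> failure,
                                     int maxDepth, double cutoff) {
        List<Path> result = new ArrayList<>();
        List<Integer> prefix = new ArrayList<>();
        prefix.add(start);
        dfs(g, failure, maxDepth, cutoff, prefix, 1.0, result);
        return result;
    }

    private static void dfs(double[][] g, Set<Integer> failure, int maxDepth,
                            double cutoff, List<Integer> prefix, double prob,
                            List<Path> result) {
        int current = prefix.get(prefix.size() - 1);
        if (failure.contains(current)) {           // first passage into a failure state
            result.add(new Path(new ArrayList<>(prefix), prob));
            return;
        }
        if (prefix.size() - 1 >= maxDepth) {       // depth (time-step) bound reached
            return;
        }
        for (int next = 0; next < g[current].length; next++) {
            double p = prob * g[current][next];
            if (p <= 0.0 || p < cutoff) {          // skip impossible or negligible branches
                continue;
            }
            prefix.add(next);
            dfs(g, failure, maxDepth, cutoff, prefix, p, result);
            prefix.remove(prefix.size() - 1);      // backtrack (remove by index)
        }
    }

    public static void main(String[] args) {
        // Hypothetical chain: 0 = normal, 1 = degraded, 2 = failed (absorbing).
        double[][] g = {
            {0.9, 0.1, 0.0},
            {0.0, 0.8, 0.2},
            {0.0, 0.0, 1.0}
        };
        double total = 0.0;
        for (Path path : pathsToFailure(g, 0, Set.of(2), 3, 0.0)) {
            System.out.println(path.states + "  P = " + path.probability);
            total += path.probability;
        }
        System.out.println("P(failure within 3 steps) = " + total);
    }
}
```

With a zero cutoff, the path probabilities sum to the probability of reaching a failure state within `maxDepth` steps; a positive cutoff trades completeness for a smaller tree.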

Analysis of benchmark system

We have implemented a prototype tool (in Java) that allows us to generate the transition matrix of the discrete-time Markov chain modeling the benchmark system, and to use this matrix to explore the dynamic behavior of the system. Here we present the results of our analysis of the sample benchmark system. Note that for the purposes of illustration we make the following assumptions:

  • $H = 3$ m, $L = -3$ m, $h_{sp} = 1$ m, and $l_{sp} = -1$ m (see Fig. 1).

  • $Q = 0.01$ m/min (rate of level change; see Eq. (1)).

  • Units are not repaired;
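As a rough sense of the time scales implied by these parameters, assume (for illustration only, since Eq. (1) is not reproduced in this excerpt) that the level changes at the constant net rate $Q$. The time for the level to travel from a setpoint to the adjacent limit is then

```latex
t_{H} = \frac{H - h_{sp}}{Q} = \frac{3\ \text{m} - 1\ \text{m}}{0.01\ \text{m/min}} = 200\ \text{min},
\qquad
t_{L} = \frac{l_{sp} - L}{Q} = \frac{-1\ \text{m} - (-3\ \text{m})}{0.01\ \text{m/min}} = 200\ \text{min}.
```

If the actual net rate under a given unit configuration differs from $Q$, these times scale inversely with that rate.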

Integration into existing PRAs

The DETs generated may be easily incorporated into an ET/FT code such as SAPHIRE. The DETs may be described as AND event sequences, which can be modeled using fault trees. The fault trees may then be entered into SAPHIRE graphically using the fault tree editor. However, for large models it may be easier to construct a text file containing the fault tree logic and import this file into SAPHIRE. Other ET/FT codes such as CAFTA [30] and RISKMAN [31] have similar features.
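The text-file route can be sketched as follows: each DET sequence becomes an AND gate over its events, and the Top Event is the OR of the sequences leading to it. The one-gate-per-line format (`<gate> <OR|AND> <inputs...>`), the class name, and the event names are hypothetical; this is not SAPHIRE's actual import syntax, which would have to be taken from the SAPHIRE documentation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Writes DET sequences as fault-tree logic in a HYPOTHETICAL plain-text
// format, one gate per line: "<gate> <OR|AND> <input> <input> ...".
// This is NOT SAPHIRE's actual import syntax.
class DetToFaultTreeText {

    // topEvent: name of the Top Event gate.
    // sequences: sequence name -> ordered list of events on that DET path.
    static String toGateText(String topEvent, Map<String, List<String>> sequences) {
        StringBuilder out = new StringBuilder();

        // Top Event = OR over all DET sequences that end in it.
        StringJoiner top = new StringJoiner(" ", topEvent + " OR ", "\n");
        sequences.keySet().forEach(top::add);
        out.append(top);

        // Each sequence = AND of its events. Event timing is assumed to be
        // encoded in the event names (e.g. a time-step suffix).
        for (Map.Entry<String, List<String>> seq : sequences.entrySet()) {
            StringJoiner and = new StringJoiner(" ", seq.getKey() + " AND ", "\n");
            seq.getValue().forEach(and::add);
            out.append(and);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical event names; the "_K<n>" suffix marks the time step.
        Map<String, List<String>> seqs = new LinkedHashMap<>();
        seqs.put("SEQ1", List.of("U1_FAIL_ON_K2", "U3_FAIL_OFF_K4"));
        seqs.put("SEQ2", List.of("U2_FAIL_ON_K1", "U3_FAIL_OFF_K1"));
        System.out.print(toGateText("TOP_HIGH_LEVEL", seqs));
    }
}
```

A `LinkedHashMap` keeps the gate order stable, which makes the generated file easier to diff between model revisions.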

In order to import the

Conclusion

In this paper, we have discussed a new approach to the generation of failure scenarios and their compilation into dynamic event trees and dynamic fault trees from a Markov model of a given system. In particular, we have presented two algorithms for generating DETs from a Markov model of the system and have shown how to construct DFTs by traversing the generated DETs. We have also illustrated the use of these algorithms to analyze the behavior of a simple level control system and how the generated

Acknowledgments

The research presented in this paper was partially supported by a contract from the Idaho National Laboratory (INL). The information and conclusions presented herein are those of the authors and do not necessarily represent the views or positions of the INL. Neither the US Government nor any agency thereof, nor any employee, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use of this information.

References (31)

  • T. Aldemir, Utilization of the cell-to-cell mapping technique to construct Markov failure models for process control systems.

  • T. Aldemir, Computer-assisted Markov failure modeling of process control systems, IEEE Trans Reliab (1987).

  • T.C. Sharma et al., Reliability analysis of large system by Markov techniques.

  • Y.D. Lukic, Nuclear power plant core-protection-calculator reliability analysis.

  • T. Aldemir et al., Dynamic reliability modeling of digital instrumentation and control systems for nuclear reactor probabilistic risk assessments, NUREG/CR-6942 (2007).