Assessment of redundant systems with imperfect coverage by means of binary decision diagrams

https://doi.org/10.1016/j.ress.2007.05.002Get rights and content

Abstract

In this article, we study the assessment of the reliability of redundant systems with imperfect fault coverage. We term fault coverage as the ability of a system to isolate and correctly accommodate failures of redundant elements. For highly reliable systems, such as avionic and space systems, fault coverage is in general imperfect and has a significant impact on system reliability. We review here the different models of imperfect fault coverage. We propose efficient algorithms to assess them separately (as k-out-of-n selectors). We show how to implement these algorithms into a binary decision diagrams engine. Finally, we report experimental results on real life test cases that show on the one hand the importance of imperfect coverage and on the other hand the efficiency of the proposed approach.

Introduction

Computer controlled systems used in life-critical applications often utilize redundancy to meet the required stringent levels of reliability. An increasing interest in the development of highly reliable systems has resulted in an extensive treatment of reliability models for k-out-of-n systems in the literature. Only a much smaller portion of this literature, however, addresses the modeling of imperfect fault coverage (IFC) for these redundant systems [1], [2], [3], [4], [5], [6], [7], [8]. The proper operation of all redundant systems is clearly dependent on the system's ability to detect, isolate, and correctly accommodate failures of the redundant elements. The probability of correctly accomplishing these tasks is termed coverage. For highly reliable systems, even for very high values approaching unity, coverage has a significant effect on system reliability. As a result, appropriate modeling of the effects of coverage is critical to the design of these systems, particularly for those whose operation is life critical.

The redundancy management (RM) process of a redundant system is responsible for the following tasks: (1) the detection of faults among the system's elements, (2) the isolation of a fault to a particular element, and (3) the reconfiguration of the system in such a way to assure the system's proper operation subsequent to a redundant element failure. For systems in which the RM tasks are accomplished under computer control, this process can seldom, perhaps never, be done with perfect certainty. There are two main techniques used as a primary basis for the implementation of a system's RM function; (1) the RM tasks are accomplished using a diagnostic process that is associated with each of the individual redundant elements, this typically takes advantage of a built-in test capability (BIT) incorporated within the redundant element or (2) the RM function may take advantage of some version of mid-value-select voting when there are at least three operational elements within the system's set of redundant elements of a given type. Systems utilizing these two RM techniques require different approaches for the modeling of system reliability. Those utilizing the first type of RM (BIT) are modeled using what we term element level coverage (ELC), while the second RM approach (voting) are modeled using a fault level coverage (FLC) model. For all IFC systems (whether ELC or FLC), an “uncovered” failure results in system failure even though sufficient redundancy may still exist.

In this article, we propose efficient table-based algorithms for k-out-of-n:F systems subject to IFC for both ELC and FLC systems. These algorithms are based on a recursive decomposition inspired from those used for general k-out-of-n:F systems proposed in Ref. [9]. We show that both algorithms have a O(n×k) time complexity. With such a low complexity, virtually any real life system can be assessed within a few seconds. The ELC functions presented in this paper produce results identical to those computed using the SEA algorithm [1]. Similarly, presented FLC functions produce the same results as the RFLC procedure presented in Ref. [7]. In the latter case, the functions presented here achieve a much better complexity.

Voters are in general embedded into broader systems whose failures are conveniently modeled by means of fault trees. It is therefore of interest to be able to encode ELC and FLC models by means of the usual fault tree gates. Although the recursive decomposition we propose makes such an encoding possible, we adopt here a slightly different approach. Namely, we propose to define new types of gates, one for each IFC model. We show that the table-based algorithms can be adapted to the construction of binary decision diagrams (BDDs). Since BDDs have proved to be the most efficient technology to assess fault trees, we end up with a powerful tool to assess general systems with IFC components.

The remainder of this article is organized as follows. Section 2 presents taxonomy for these classes of imperfect fault coverage systems. Section 3 proposes the table-based algorithms to assess these models. It also reports results of small experiments that show the efficiency of these algorithms as well as the impact of the coverage on k-out-of-n:F systems. Section 4 proposes a BDD-based implementation of these models. Finally, Section 5 reports some results on a realistic test case.

Section snippets

Imperfect fault coverage models

In this section, we shall consider the different IFC models, i.e., k-out-of-n:F systems using ELC, FLC, or OLC models. Recall that a k-out-of-n:F system is a system with n redundant, not necessarily identical, components that fail when at least k out of the n components have failed. The taxonomy presented hereafter is mainly borrowed from Myers’ work [7].

Table-based algorithms for ELC and FLC models

In this section, we shall analyze the time complexity of several algorithms to assess ELC and FLC models. We shall use the standard notation O(.). Consider an algorithm that apply on data that can be described using parameters n1, n2, …, nk (where n1, n2, …, nk are integers). Let F(n1, n2, …, nk) be a function from the k parameters to integers. The algorithm is said in O(F(n1, n2, …, nk)) if there exists a constant C such that for all values of the parameters, the number of elementary operations performed

BDDs implementation

Bryant's BDDs [10] are now a well-known and widely used technique [11]. In this section, we give first an encoding of PFC and IFC models by means of classical fault tree gates. Then, we recall briefly the basics of this technique. Finally, we show how to use it to implement PFC, ELC, OLC, and FLC models.

Assessment of a digital flight control system

We created a BDD reliability model for the flight critical portion of a hypothetical aircraft utilizing a quadruple redundant parallel-channel digital flight control system. The principal elements of this system's RM are preformed by the flight control computers using a mid-value-select voting logic to detect failures and to select from among the system's redundant elements. Subsequent to a third failure of a given element type the RM logic detects failures by observing that two remaining

Conclusion

In this article, we reviewed first the taxonomy of imperfect fault coverage (IFC) models proposed by Myers [7]. We gave a formal definition of the three models ELC (element level coverage), FLC (function level coverage) and OLC (one-on-one level coverage). These three models can be seen as three different types of k-out-of-n:F systems.

Then, we derived, from a principle proposed by Dutuit and Rauzy [9], O(n×k) table-based algorithms to compute the unreliability of k-out-of-n:F IFC systems. These

References (15)

  • Y. Dutuit et al.

    New insights in the assessment of k-out-of-n and related systems

    Reliab Eng Syst Saf

    (2001)
  • A. Rauzy

    New algorithms for fault trees analysis

    Reliab Eng Syst Saf

    (1993)
  • S.V. Amari et al.

    A seperable method for incorporating imperfect fault-coverage into combinatorial models

    IEEE Trans Reliab

    (1999)
  • Chang Y-R, Amari SV, Kuo S-Y. Reliability evaluation of multi-state systems subject to imperfect coverage using OBDD....
  • S.V. Amari et al.

    Optimal reliability of systems subject to imperfect fault-coverage

    IEEE Trans Reliab

    (1999)
  • S.A. Doyle et al.

    A combinatorial approach to modeling imperfect coverage

    IEEE Trans Reliab

    (1995)
  • J.B. Dugan

    Fault tree and imperfect coverage

    IEEE Trans Reliab

    (1989)
There are more references available in the full text version of this article.

Cited by (44)

  • Influence of failure propagation on mission abort policy in heterogeneous warm standby systems

    2019, Reliability Engineering and System Safety
    Citation Excerpt :

    While the former type affects only a subset of system components, the latter type can cause the entire system failure. In this work we only focus on the PFs with global effect, which may be caused by imperfect fault coverage (an undetected component fault may propagate throughout the system) [29-32], destructive effects (e.g., a component failure causes fire, overheating, short circuit [33]), malicious attacks (e.g., jamming attacks [34]), and etc. It has been shown by many studies that PFs can make significant contributions to system failures [35-37].

  • Probabilistic competing failure analysis in phased-mission systems

    2018, Reliability Engineering and System Safety
    Citation Excerpt :

    Caballé et al. [19] considered that the intensity of the random shock is a piecewise function of the degradation level. Competing processes of different failure modes or causes have also been investigated in the context of accelerated life tests [20,21], system maintenance policies [22,23], or imperfect fault coverage [24–27] where covered and uncovered (or propagated) component fault modes and their distinct effects are modeled. Different from the above-mentioned works that are concerned with competitions between different types of failure modes of the same component or system, this paper addresses competitions between global failure propagation effect and probabilistic isolation effect caused by the PFD behavior.

  • Reliability of demand-based phased-mission systems subject to fault level coverage

    2014, Reliability Engineering and System Safety
    Citation Excerpt :

    While most of the existing works on PMS analysis did not consider the IFC effect at all [3,5,8,12,14], some researchers studied the reliability of PMS subject to ELC [1,2,4,6]. FLC has been studied for single-phase systems [28,29], however, to the best of our knowledge, no work has been performed to consider the effect of FLC in the reliability analysis of PMS. In this paper, the DB-PMS subject to FLC is first analyzed and multi-valued decision diagram (MDD) is adapted to evaluate the system reliability.

  • Optimal replacement policy for a repairable system with multiple vacations and imperfect fault coverage

    2013, Computers and Industrial Engineering
    Citation Excerpt :

    An efficient method to examine the reliability of phased-mission systems with imperfect fault coverage by introducing multi-mode failures was given by Xing (2007) whereas Myers (2007) did the modeling for the reliability of M-out-of-N system for four different coverage models. In this direction, the analysis of repairable system with imperfect coverage by means of binary decision diagrams and Bayesian approach has been considered by Myers and Rauzy (2008), and Ke, Lee, and Hsu (2008b), respectively while a repairable system with imperfect coverage under the fuzzy environment was discussed by Ke, Huang, and Lin (2008a). Kuniewski, Weide, and Noortwijk (2009) have done sampling inspection for the reliability evaluation of deteriorating systems under imperfect fault detection.

  • Reliability analysis method for redundancy systems based on GO methodology and the universal generation function

    2021, Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability
View all citing articles on Scopus
View full text