Contributed Paper
An integrated model-based approach for real-time on-line diagnosis of complex systems

https://doi.org/10.1016/S0952-1976(97)00054-7

Abstract

Model-based diagnostic programs have been shown to be useful in isolating unpredictable faults in various types of systems. Because many of these systems are complex, the models these programs use to represent monitored systems have traditionally imposed restrictions on domain representations. These restrictions can make it difficult (and often impossible) to model a domain whose behavior is global in nature. Global behavior here means behavior that affects system variables in parts of the system not directly related to the component in question. Analog electrical circuits and hydraulic circuits are just two examples of such global systems. Accurate modelling of the behavior of these global systems is very often essential for obtaining a correct diagnosis. In complex systems such as those typically found in the electrical power-distribution domain, global behavior can be observed when voltages and currents throughout an entire system are affected by local load fluctuations, transient disturbances, faults, or circuit re-configurations, even when these are in remote parts of the circuit. Traditional models used in diagnosis have not been able to easily reflect these global interactions, and as a result, the monitoring and diagnostic capabilities of model-based systems that depend upon such models are significantly degraded. This paper presents an implementation that can correctly simulate power systems and other such complex systems by overcoming the problem of representing global behavior while preserving the diagnostic abilities of structure–function models in model-based reasoning methodologies. It describes the integration of robust models within the conventional device-centered models. These robust models are mathematically accurate system models, normally used in quantitative simulation for the purpose of system analysis. If used within the conventional device-centered models, they can provide the functionality needed in a structure–function model-based diagnostic paradigm, and therefore eliminate the problem of representing global behaviors in diagnosis. This paper further describes a conflict-oriented diagnostic technique used in conjunction with robust models to obtain real-time on-line FDIR (Fault Diagnosis, Isolation, and Recovery).

Introduction

The autonomous operation and control of systems is an important topic in domains where constant supervision of critical values is needed, large amounts of data are processed, and rapid response to various phenomena is a must. One area associated with the autonomous operation and maintenance of systems involves the handling of faulty system parts. Faulty parts must be detected and the system must be restored to an acceptable state through fault diagnosis, isolation, and recovery (FDIR) or by troubleshooting and repair. In those systems that must remain on-line for sustained periods of time before repair is possible, redundancy is usually designed into the system. This enables the system to remain operational by establishing an alternate pathway should the normal pathway contain a faulty component. This action is called recovery. The location and incapacitation of the pathway that contains the fault is called fault isolation. This will minimize potential damage to other devices in the system, and prevent the faulty pathway from interfering with the normal operation of a restored system. Once the system has been restored to normal operation, repair can take place at the convenience of the technician. The act of identifying the faulty component, whether by manual methods (troubleshooting) or automated methods (diagnostic programs), is called diagnosis.

Diagnostic knowledge for complex systems can be found in several forms. The first of these is termed shallow reasoning, and is most often used in reasoning methodologies that are associative in nature. Associative systems rely on experience-based knowledge as typically found in human experts. This expertise is most often represented in the form of rules in traditional rule-based knowledge-based systems. For simple systems, this method can be quite effective. However, as monitored systems become more complex, the large number of knowledge elements (e.g., rules, frames) that must be generated during development and executed at run time can result in a slow and computationally expensive system. Furthermore, associative systems cannot generally adapt to unforeseen situations, as all fault modes must be predefined by the developer within a fault model.

There are numerous diagnostic systems described in the literature which employ associative techniques, and an exhaustive review of these is not warranted here. Nevertheless, some notable ones exist that are closely related to the work described in this paper. One of these is a diagnostic tool called the Fault Recovery and Management Expert System (FRAMES) (Ashworth, 1989; Ashworth and Walls, 1990; Risdesel, 1989; Risdesel et al., 1989). FRAMES is a largely associative system implemented in LISP which performs FDIR on a spacecraft power-distribution system. While it performs isolation and recovery quite capably and quickly, it depends on the interrupting device itself for primary fault detection and isolation.

Gonzalez et al. (1986) developed a purely rule-based system called GenAID to diagnose abnormal conditions in real time for a turbine-generator system. The size of the knowledge base (around 6000–7000 rules) attests to the significant development effort required for such large systems. Nevertheless, the domain of generator diagnostics is not easily defined through first principles, and the real-time requirements of such a system are on the order of minutes, which can accommodate a slow diagnostic paradigm. Furthermore, this system is advisory in nature and does not unilaterally carry out any control functions. GenAID has been in continuous commercial operation since 1985.

Another similar system called the Generator Expert Monitoring System (GEMS) (Lloyd et al., 1989) employs Bayesian Belief networks to represent the same associative knowledge for on-line diagnosis of electrical generators. There are many other diagnostic systems that employ shallow reasoning or fault models (Bau and Brezillon, 1992; Gholdston et al., 1988; Hester, 1986; Paasch and Agogino, 1993; Padalkar et al., 1991; Spier and Liffring, 1989; Sueda and Iwamasa, 1995; Watson et al., 1988).

The serious disadvantages of associative systems described above have led researchers to search for more robust ways to carry out the diagnostic function. This has led to the technique known as the first-principles approach, which is said to use deep knowledge. This approach encompasses what is known as model-based reasoning (MBR), and is often used in encoding the transfer (input-to-output) equations or constraints in structure–behavior models. These structure–behavior models are designed to predict and/or simulate the behavior of the monitored system. The structure–behavior approach has traditionally required an assignment of directionality to its objects, and the use of the principle of locality, which suggests causal relationships between component neighbors. These restrictions give rise to the problem of representing global behaviors. Both associative and first-principles reasoning methodologies employ either a qualitative or a quantitative approach to representing measured values throughout the modelled system. Diagnostic systems with qualitative methodologies (Dvorak and Kuipers, 1991; Ng, 1991; Watson et al., 1988), however, are susceptible to faults that may be undetectable in macro-sensitive representations.

Adamovits and Pagurek (1993) employ a hybrid method in the system called Multiple Fault Diagnostic System (MFDS), also intended to perform FDIR on electric power systems. The authors combine a robust first-principles model of a domain with shallow reasoning diagnostic control structures. Here, global behavior is handled by shallow reasoning procedures during a hypothesis space pruning step which weeds out many of the hypotheses that would have been due to global phenomena (Adamovits and Pagurek, 1993). This step, as it is shallow reasoning, requires fault models. Thus, any global behavior exhibited during the malfunction of the device is handled by shallow reasoning procedures rather than by the robust model. Later, during a testing phase, the robust model is used to determine newly calculated values for global behaviors that occur with changes to the resistive equivalence of the electric circuit due to structural changes from switches opening and closing.
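The way a switch change alters the resistive equivalence of a circuit, and with it the values seen by every other load, can be illustrated with a small quantitative model. The sketch below is not taken from MFDS or from this paper; it assumes a hypothetical single bus fed by an ideal source through one feeder resistance, with purely resistive loads in parallel. The names `bus_voltage` and `load_currents` and all numeric values are illustrative assumptions.

```python
def bus_voltage(v_source, r_feeder, loads):
    """Voltage on a bus fed through r_feeder, with parallel resistive loads.

    loads maps a load name to (resistance_ohms, switch_closed).
    Hypothetical topology chosen only for illustration.
    """
    # Only loads whose switch is closed contribute conductance.
    conductance = sum(1.0 / r for r, closed in loads.values() if closed)
    if conductance == 0.0:
        # No current flows, so there is no drop across the feeder.
        return v_source
    r_equivalent = 1.0 / conductance
    # Voltage divider between the feeder and the equivalent load resistance.
    return v_source * r_equivalent / (r_feeder + r_equivalent)


def load_currents(v_source, r_feeder, loads):
    """Current drawn by each load; open loads draw none."""
    v_bus = bus_voltage(v_source, r_feeder, loads)
    return {
        name: (v_bus / r if closed else 0.0)
        for name, (r, closed) in loads.items()
    }


both_closed = {"load_a": (10.0, True), "load_b": (10.0, True)}
b_opened = {"load_a": (10.0, True), "load_b": (10.0, False)}

before = load_currents(120.0, 1.0, both_closed)
after = load_currents(120.0, 1.0, b_opened)
# load_a was not touched, yet its current differs between the two cases.
print(before["load_a"], after["load_a"])
```

Opening the switch of `load_b` raises the bus voltage and therefore the current through `load_a`, even though `load_a` is neither upstream nor downstream of `load_b`. A strictly local, unidirectional component model has no natural place to record that dependency.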

The resulting implementation is a successful strategy that effectively handles multiple faults. However, the implementation suffers from the problems brought on by fault models, inefficient candidate reduction, and much off-line testing. The time complexity involved is on the order of minutes, with the majority of time being spent in the Prolog-implemented reasoning procedures (Adamovits and Pagurek, 1993).

Another system described in the literature uses both model-based and associative methods to reduce candidates. Xiang and Srihari (1986) use a strategy for diagnostic reasoning that encompasses both shallow and deep reasoning (i.e., reasoning from experience and reasoning from first principles). The approach used in the design of this system is first to diagnose using empirical knowledge and then, if need be, to use a model-based approach. When using the model-based approach, reasoning takes place qualitatively at first; then, if need be, quantitatively.

A fault-identification method that uses a structure-behavior model eliminates the problem of unforeseen faults, since these would be modelled implicitly in the objects or devices themselves, and the knowledge base involved would not grow on the basis of the number of fault patterns of a system, but rather would be based on the number of devices in the system. This is one significant benefit of structure–behavior systems.

Fesq et al. (1992) employ a constraint suspension technique (see Davis, 1984) in developing a system called Marple. When a fault occurs, the offending device propagates the error, and the behavior of a number of objects will then become inconsistent with the modelled system. The lowest component is ignored or suspended and its predecessor (the device from which this component gets its input) is observed. If an inconsistency still exists, then the next predecessor is suspended, and so on, until there are no more inconsistencies between the remaining devices and the model. The last component to be suspended is found to be the fault.
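The upstream suspension procedure described above can be sketched for the simplest case. This is an illustrative simplification, not the Marple implementation: it assumes a single linear chain of components, a sensor on every component output, and a single fault. The function names, component names and gains are hypothetical.

```python
def predict_chain(primary_input, components):
    """Propagate the primary input through each (name, function) in order."""
    values = []
    x = primary_input
    for _name, fn in components:
        x = fn(x)
        values.append(x)
    return values


def locate_fault(primary_input, components, observed, tol=1e-6):
    """Suspend components from the downstream end until the rest agree.

    Returns (culprit, suspended_names); culprit is None when the most
    downstream observation already matches the model.
    """
    predicted = predict_chain(primary_input, components)
    inconsistent = [abs(p - o) > tol for p, o in zip(predicted, observed)]
    suspended = []
    i = len(components) - 1
    # Walk upstream while the observed output still disagrees with the model.
    while i >= 0 and inconsistent[i]:
        suspended.append(components[i][0])
        i -= 1
    culprit = suspended[-1] if suspended else None
    return culprit, suspended


chain = [
    ("regulator", lambda v: v * 0.5),
    ("switch", lambda v: v * 1.0),
    ("cable", lambda v: v * 0.9),
]
# The switch has failed open: everything from the switch onward reads zero.
observed = [50.0, 0.0, 0.0]
culprit, suspended = locate_fault(100.0, chain, observed)
print(culprit, suspended)
```

In this example the cable is suspended first, then the switch; the regulator output agrees with the model, so the switch, the last component suspended, is reported as the fault.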

Suspects are viewed as components that are causally connected upstream of a discrepancy. In the case of many device-centered models, causality is modelled as local behaviors. Behaviors that are non-local (i.e., global) are difficult or impossible to represent in device-centered models. At times these global behaviors do not follow any perceived physical connections, such as in a bridge fault where the actual physical connectivity of a device is accidentally compromised by a foreign influence such as flying solder in a circuit board. In the candidate-generation step, such a multiple-fault scenario would require hypothesis sets that included the two components in order that the diagnoser could reason that the two failed at the same time. In theory, causalities can occur among every possible combination of components. To account for all possibilities, every component in the system should be a suspect. However, this is not computationally acceptable. Therefore, diagnostic tool designers discount small-probability scenarios in order to handle the majority of faults. For example, device-centered models tag the components upstream of a discrepancy as its suspects.

Typically, the output values of modelled components are predicted by propagating primary input values downstream until the final downstream components are met. These end-components are usually sensors. The predicted values for these sensors are compared with the actual measured values. If any of these comparisons reveals a discrepancy, the diagnostic procedure is initiated. Suspects are obtained from the upstream components of the discrepant sensor or sensors. If these upstream components depend upon further upstream components, the further upstream components are also added to the suspect list. These dependencies are checked successively upstream until the primary inputs are reached. This process can be simplified by keeping a dependency record for each predicted sensor value. The combination of the dependency records (structural connectivity of the components) for all discrepant sensor readings forms the basis for a suspect list. The suspect list may then be reduced by combining sets of the dependency records. Intuitively (and as proven in de Kleer et al., 1992), the intersection of the dependency records (assuming more than one discrepant sensor, i.e., more than one dependency record) will contain the culprit (failed component). This is only valid, however, when assuming unidirectional causal models, because a faulty behavior in a particular branch that causes discrepancies in disjoint or higher branches cannot be identified by intersection. This method of generating suspects is termed dependency-based (Hamscher and Davis, 1987).
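The dependency-record bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions: a unidirectional model given as a hypothetical `upstream` map from each node to the nodes that feed it, and a single fault. The component and sensor names are invented for the example.

```python
def dependency_record(node, upstream):
    """All components reachable by tracing upstream from node."""
    record = set()
    stack = list(upstream.get(node, ()))
    while stack:
        component = stack.pop()
        if component not in record:
            record.add(component)
            stack.extend(upstream.get(component, ()))
    return record


def suspects(discrepant_sensors, upstream):
    """Return (all suspects, suspects common to every discrepant sensor)."""
    records = [dependency_record(s, upstream) for s in discrepant_sensors]
    if not records:
        return set(), set()
    union = set().union(*records)
    common = set.intersection(*records)
    return union, common


# Hypothetical two-branch network: source -> bus -> two switches -> sensors.
upstream = {
    "sensor_a": ["switch_1"],
    "sensor_b": ["switch_2"],
    "switch_1": ["bus"],
    "switch_2": ["bus"],
    "bus": ["source"],
}
all_suspects, reduced = suspects(["sensor_a", "sensor_b"], upstream)
print(sorted(all_suspects), sorted(reduced))
```

With both sensors discrepant, the union holds four suspects while the intersection narrows them to the shared `bus` and `source`. The limitation noted in the text is visible here: if a fault in `switch_1` disturbed `sensor_b` through a global effect, the intersection would wrongly exclude `switch_1`.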

A third method, called the conflict-oriented view, is “a more general framework than the intuitive notion of upstream tracing” (Hamscher and Davis, 1987). Predictions of the behavior of components are generated. These predictions are made with the assumption that all of the components are operating correctly. Discrepancies indicate that the assumption of correct operation is false for at least one of the components.

Hamscher and Davis (1987) indicate that “in domains for which components' causal direction is the sole source of dependencies, there is little distinction between the ‘upstream tracing’ and the ‘conflict-oriented’ views”. The benefit of the conflict-oriented view is that it more readily accommodates components that are non-directional in nature because distinctions between inputs and outputs need not be made (Hamscher and Davis, 1987). The disadvantage of the conflict-oriented view is that it requires sensors between all components so that conflicts can be localized at the component level. A possible remedy is to supplement this approach with a fault envisionment (Hamscher and Davis, 1987) method that will allow conflicts to be extrapolated to nearby sensors. Hamscher and Davis (1987), however, note that fault envisionment requires a predefined set of misbehaviors for each component so that simulation may be possible. The potential exponential growth of these predefined misbehaviors is the significant disadvantage of fault envisionment.
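In the conflict-oriented view, each discrepancy yields a conflict: a set of components that cannot all be operating correctly. In Reiter's (1987) formulation, candidate diagnoses are the minimal sets of components that intersect every conflict (minimal hitting sets). The sketch below enumerates them by brute force for a small example. It is an illustration only, not the algorithm used in this paper, and the conflicts shown are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Enumerate minimal sets that share a component with every conflict.

    Brute force over subsets by increasing size; suitable only for
    small examples.
    """
    conflicts = [frozenset(c) for c in conflicts]
    if not conflicts:
        # No conflict means no component needs to be assumed faulty.
        return [frozenset()]
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            # Skip supersets of a hitting set already found (not minimal).
            if any(h <= candidate for h in found):
                continue
            if all(candidate & c for c in conflicts):
                found.append(candidate)
    return found


# Two hypothetical conflicts, one per discrepant sensor reading.
conflicts = [{"bus", "switch_1"}, {"bus", "switch_2"}]
diagnoses = minimal_hitting_sets(conflicts)
print([sorted(d) for d in diagnoses])
```

For these two conflicts the candidates are the single fault `bus` and the double fault `switch_1` with `switch_2`. No input or output direction is assigned to any component, which is the property that makes this view suitable for non-directional components.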

Other work on model-based diagnostics for FDIR in electric power systems was carried out by Lee (1993), Adams (1986) and Blasdel (1987) with varying degrees of success.

Section snippets

On-line, real-time, diagnosis using conflict sets and robust models

The objective of the work described here is to devise, implement and test diagnostic techniques that can enable real-time FDIR without the disadvantages of having to pre-determine all potential faults. Reiter's (1987) conflict set paradigm is utilized to produce a conflict-oriented diagnostic algorithm together with a robust modelling approach, integrated within a structure–behavior model. This system meets the requirements stated above without requiring fault envisionment or other such

The conflict-oriented approach

To incorporate robust models in a device-centered model-based diagnostic reasoner (which would provide the needed speed improvement in isolating faults, and eliminate cumbersome meta-object strategies for representing global behavior) (McKenzie, 1994), a diagnostic method was required that improves upon the constraint suspension method used in KATE, which relies upon time-consuming substitution and recalculation (re-prediction) functions.

Referring to Fig. 1, someone employing constraint

Global scope in FDIR

Dynamic changes in the status of loads (i.e., closed or open switches) cannot be properly represented without the inclusion of global behaviors which are difficult to model in device-centered models. These occur in dynamic systems where altering devices in one area of a model indirectly affects another area of the model, not directly connected to the devices altered. Orthodox modelling of this behavior in unidirectional paradigms results in a circular network of objects. An upstream (reverse)

Testbed system

The system to be monitored is a subsystem of the distribution network in the Space Station Freedom. This was represented by the Space Station Module Power Management And Distribution (SSM-PMAD, or PMAD for short). It consists of a variety of buses and switches, connected to two power sources that supply power to various loads throughout the space station. The hardware testbed is located at the NASA Marshall Space Flight Center (MSFC) in Huntsville, Alabama.

The PMAD is composed of two power

Summary and conclusion

The techniques that form the basis for the research reported here—that of applying a conflict-oriented diagnostic approach and a robust quantitative modelling technique—were incorporated in a prototype system called the Intelligent Power Controller (IPC) (McKenzie, 1994). The IPC had earlier served as the prototype for testing other reasoning techniques, such as constraint suspension, as well as modelling the power system using another technique developed by the authors called meta-objects (

References (33)

  • Davis, R., 1984. Diagnostic reasoning based on structure and behavior. Artificial Intelligence.
  • de Kleer, J., et al., 1992. Characterizing diagnosis and systems. Artificial Intelligence.
  • Reiter, R., 1987. A theory of diagnosis from first principles. Artificial Intelligence.
  • Adamovits, P., Pagurek, B., 1993. Simulation (model) based fault detection and diagnosis of a spacecraft electrical...
  • Adams, T.L., 1986. Model-based reasoning for automated fault diagnosis and recovery planning in space power systems. In...
  • Ashworth, B., 1989. An architecture for automated fault diagnosis. In Proceedings of the 24th IECEC, pp....
  • Ashworth, B., Walls, B., 1990. Autonomous operation of a space station freedom type power testbed. In Proceedings from...
  • Bau, D., et al., 1992. Model-based diagnosis of power-station control systems. IEEE Expert.
  • Blasdel, A.N., 1987. Automated fault handling of a satellite electrical power subsystem using a model-based expert...
  • Dvorak, D., et al., 1991. Process monitoring and diagnosis: a model-based approach. IEEE Expert.
  • Fesq, L.M., Stephan, A., McNamee, L., 1992. Modeling Power Systems for Diagnosis: How Good is Good Enough? Proceedings...
  • Fishwick, P.A., et al., 1991. Qualitative physics: towards the automation of systems problem solving. J. Expt. Theor. Artif. Intell.
  • Gholdston, E.W., Janik, D.F., Lane, G., 1988. A diagnostic expert system for space-based electrical power networks. In...
  • Gonzalez, A., Morris, R.A., McKenzie, F., Carreira, D., Gann, B., 1996. Model-based real-time control of electrical...
  • Gonzalez, A.J., et al., 1986. On-line diagnosis of turbine generators using artificial intelligence. IEEE Transactions on Energy Conversion.
  • Hamscher, W., Davis, R., 1987. Issues in model-based troubleshooting. A.I. Memo 893, Artificial Intelligence Lab.,...

Frederic McKenzie has been a member of the Advanced Distributed Simulation Research Team (ADS RT) since April 1995, serving as P.I. for two interoperability IRAD projects. He holds a Senior Scientist position at SAI Orlando. For two years prior to joining the ADS RT, he had been a member of the SAF Behaviors team for the Close Combat Tactical Trainer (CCTT) project. He obtained a Master of Science in Computer Engineering in 1990, and a Ph.D. in Engineering in 1994 from the University of Central Florida.
