A review of process fault detection and diagnosis: Part I: Quantitative model-based methods
Introduction
The discipline of process control has made tremendous advances in the last three decades with the advent of computer control of complex processes. Low-level control actions such as opening and closing valves, called regulatory control, which used to be performed by human operators, are now routinely performed in an automated manner with the aid of computers with considerable success. With progress in distributed control and model predictive control systems, the benefits to various industrial segments such as chemical, petrochemical, cement, steel, power and desalination industries have been enormous. However, a very important control task in managing process plants still remains largely a manual activity, performed by human operators. This is the task of responding to abnormal events in a process. It involves the timely detection of an abnormal event, diagnosing its causal origins and then taking appropriate supervisory control decisions and actions to bring the process back to a normal, safe, operating state. This entire activity has come to be called Abnormal Event Management (AEM), a key component of supervisory control.
However, this complete reliance on human operators to cope with such abnormal events and emergencies has become increasingly difficult due to several factors. It is difficult due to the broad scope of the diagnostic activity that encompasses a variety of malfunctions such as process unit failures, process unit degradation, parameter drifts and so on. It is further complicated by the size and complexity of modern process plants. For example, in a large process plant there may be as many as 1500 process variables observed every few seconds (Bailey, 1984) leading to information overload. In addition, often the emphasis is on quick diagnosis which poses certain constraints and demands on the diagnostic activity. Furthermore, the task of fault diagnosis is made difficult by the fact that the process measurements may often be insufficient, incomplete and/or unreliable due to a variety of causes such as sensor biases or failures.
Given such difficult conditions, it should come as no surprise that human operators tend to make erroneous decisions and take actions which make matters even worse, as reported in the literature. Industrial statistics show that about 70% of the industrial accidents are caused by human errors. These abnormal events have significant economic, safety and environmental impact. Despite advances in computer-based control of chemical plants, the fact that two of the worst ever chemical plant accidents, namely Union Carbide's Bhopal, India, accident and Occidental Petroleum's Piper Alpha accident (Lees, 1996), happened in recent times is a troubling development. Another major recent incident is the explosion at the Kuwait Petrochemical's Mina Al-Ahmedi refinery in June of 2000, which resulted in about 100 million dollars in damages.
Further, industrial statistics have shown that even though major catastrophes and disasters from chemical plant failures may be infrequent, minor accidents are very common, occurring on a day-to-day basis, resulting in many occupational injuries and illnesses, and costing society billions of dollars every year (Bureau of Labor Statistics, 1998; McGraw-Hill Economics, 1985; National Safety Council, 1999). It is estimated that the petrochemical industry alone in the US incurs approximately 20 billion dollars in annual losses due to poor AEM (Nimmo, 1995). The cost is much higher when one includes similar situations in other industries such as pharmaceutical, specialty chemicals, power and so on. Similar accidents cost the British economy up to 27 billion dollars every year (Laser, 2000).
Thus, here is the next grand challenge for control engineers. In the past, the control community showed how regulatory control could be automated using computers and thereby removing it from the hands of human operators. This has led to great progress in product quality and consistency, process safety and process efficiency. The current challenge is the automation of AEM using intelligent control systems, thereby providing human operators the assistance in this most pressing area of need. People in the process industries view this as the next major milestone in control systems research and application.
The automation of process fault detection and diagnosis forms the first step in AEM. Due to the broad scope of the process fault diagnosis problem and the difficulties in its real-time solution, various computer-aided approaches have been developed over the years. They cover a wide variety of techniques, from the early attempts using fault trees and digraphs, to analytical approaches, to knowledge-based systems and neural networks in more recent studies. From a modelling perspective, there are methods that require accurate process models, semi-quantitative models, or qualitative models. At the other end of the spectrum, there are methods that do not assume any form of model information and rely only on process history information. In addition, given the process knowledge, there are different search techniques that can be applied to perform diagnosis. Such a bewildering array of methodologies and alternatives often poses a difficult challenge to anyone who is not a specialist in these techniques. Some of these ideas seem so far apart from one another that a non-expert researcher or practitioner is often left wondering about the suitability of a method for his or her diagnostic situation. While there have been some excellent reviews in this field in the past, they often focused on a particular branch, such as analytical models, of this broad discipline.
The basic aim of this three part series of papers is to provide a systematic and comparative study of various diagnostic methods from different perspectives. We broadly classify fault diagnosis methods into three general categories and review them in three parts. They are quantitative model based methods, qualitative model based methods, and process history based methods. We review these different approaches and attempt to present a perspective showing how these different methods relate to and differ from each other. While discussing these various methods we will also try to point out important assumptions, drawbacks as well as advantages that are not stated explicitly and are difficult to gather. Due to the broad nature of this exercise it is not possible to discuss every method in all its detail. Hence the intent is to provide the reader with the general concepts and lead him or her on to literature that will be a good entry point into this field.
In the first part of the series, the problem of fault diagnosis is introduced and fault diagnosis approaches based on quantitative models are reviewed. In the following two parts, fault diagnostic methods based on qualitative models and process history data are reviewed. Further, these disparate methods will be compared and evaluated based on a common set of desirable characteristics for fault diagnostic classifiers introduced in this paper. The relation of fault diagnosis to other process operations and a discussion on future directions are presented in Part III.
By way of introduction, we first address the definitions and nomenclature used in the area of process fault diagnosis. The term fault is generally defined as a departure from an acceptable range of an observed variable or a calculated parameter associated with a process (Himmelblau, 1978). This defines a fault as a process abnormality or symptom, such as high temperature in a reactor or low product quality and so on. The underlying cause(s) of this abnormality, such as a failed coolant pump or a controller, is(are) called the basic event(s) or the root cause(s). The basic event is also referred to as a malfunction or a failure. Since one can view the task of diagnosis as a classification problem, the diagnostic system is also referred to as a diagnostic classifier. Fig. 1 depicts the components of a general fault diagnosis framework. The figure shows a controlled process system and indicates the different sources of failures in it. In general, one has to deal with three classes of failures or malfunctions as described below:
In any modelling, there are processes occurring below the selected level of detail of the model. These processes which are not modelled are typically lumped as parameters, and these include interactions across the system boundary. Parameter failures arise when there is a disturbance entering the process from the environment through one or more exogenous (independent) variables. An example of such a malfunction is a change in the concentration of the reactant from its normal or steady state value in a reactor feed. Here, the concentration is an exogenous variable, a variable whose dynamics are not modelled along with those of the process. Another example is the change in the heat transfer coefficient due to fouling in a heat exchanger.
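To make the idea concrete, a parameter drift of the kind just described can be monitored by estimating the parameter online and checking it against an acceptable band. The following is a minimal sketch using recursive least squares on a hypothetical scalar process gain; the model, numbers, forgetting factor and alarm threshold are all illustrative, not from any specific plant.

```python
import numpy as np

# Sketch of fault detection via online parameter estimation (illustrative).
# A scalar process gain theta in y[k] = theta * u[k] + noise is estimated
# with recursive least squares and a forgetting factor; a drift in theta
# (e.g. a heat-transfer coefficient degrading through fouling) is flagged
# when the estimate leaves its acceptable band around the nominal value 2.0.
rng = np.random.default_rng(0)
lam = 0.95                  # forgetting factor (effective memory ~20 samples)
theta_hat, P = 2.0, 1.0     # initial estimate and covariance
alarms = []

for k in range(200):
    theta_true = 2.0 if k < 100 else 1.5     # fault: gain drifts at k = 100
    u = 1.0 + 0.1 * rng.standard_normal()    # excitation around u = 1
    y = theta_true * u + 0.01 * rng.standard_normal()
    # recursive least squares update (scalar form)
    K = P * u / (lam + u * P * u)
    theta_hat += K * (y - theta_hat * u)
    P = (P - K * u * P) / lam
    if abs(theta_hat - 2.0) > 0.25:          # estimate leaves acceptable band
        alarms.append(k)

print(alarms[0])   # first alarm, shortly after the drift begins at k = 100
```

Note the detection delay: with a forgetting factor of 0.95 the estimate approaches the new gain geometrically, so the alarm fires a dozen or so samples after the drift rather than instantly; tightening the band or shortening the memory trades this delay against false alarms from noise.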
Structural changes refer to changes in the process itself. They occur due to hard failures in equipment. Structural malfunctions result in a change in the information flow between various variables. To handle such a failure in a diagnostic system would require the removal of the appropriate model equations and restructuring the other equations in order to describe the current situation of the process. An example of a structural failure would be failure of a controller. Other examples include a stuck valve, a broken or leaking pipe and so on.
Gross errors usually occur with actuators and sensors. These could be due to a fixed failure, a constant bias (positive or negative) or an out-of range failure. Some of the instruments provide feedback signals which are essential for the control of the plant. A failure in one of the instruments could cause the plant state variables to deviate beyond acceptable limits unless the failure is detected promptly and corrective actions are accomplished in time. It is the purpose of diagnosis to quickly detect any instrument fault which could seriously degrade the performance of the control system.
Outside the scope of fault diagnosis are unstructured uncertainties, process noise and measurement noise. Unstructured uncertainties are mainly faults that are not modelled a priori. Process noise refers to the mismatch between the actual process and the predictions from model equations, whereas measurement noise refers to the high-frequency additive component in the sensor measurements.
In this series of review papers, we will provide a review of the various techniques that have been proposed to solve the problem of fault detection and diagnosis. We classify the techniques as quantitative model based, qualitative model based and process history based approaches. Under the quantitative model based approaches, we will review techniques that use analytical redundancy to generate residuals that can be used for isolating process failures. We will discuss residual generation through diagnostic observers, parity relations, Kalman filters and so on. Under the qualitative model based approaches, we review the signed directed graph (SDG), Fault Trees, Qualitative Simulation (QSIM), and Qualitative Process Theory (QPT) approaches to fault diagnosis. Further, we also classify diagnostic search strategies as being topographic or symptomatic searches. Under process history based approaches we will discuss both qualitative approaches such as expert systems and qualitative trend analysis (QTA) techniques and quantitative approaches such as neural networks, PCA and statistical classifiers.
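To give a flavour of the residual-generation idea that underlies the quantitative model-based techniques listed above, here is a minimal sketch of an observer-based scheme for a hypothetical two-state linear plant. The matrices, observer gain and injected fault are illustrative only: the residual stays near zero in normal operation and deviates when a sensor bias appears.

```python
import numpy as np

# Minimal sketch of observer-based residual generation (illustrative numbers).
# Plant: x[k+1] = A x[k] + B u[k],  y[k] = C x[k]  (a stable 2-state system)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5],
              [0.3]])        # observer gain, chosen so (A - L C) is stable

def residuals(u, y):
    """Luenberger observer: r[k] = y[k] - C xhat[k] is near zero under
    normal operation and deviates when a fault enters the measurement."""
    xhat = np.zeros((2, 1))
    r = []
    for uk, yk in zip(u, y):
        rk = yk - (C @ xhat).item()
        r.append(rk)
        xhat = A @ xhat + B * uk + L * rk
    return r

# Simulate the fault-free plant, then inject a sensor bias of +0.5 at k = 50.
x = np.zeros((2, 1))
u, y = [], []
for k in range(100):
    uk = 1.0
    yk = (C @ x).item() + (0.5 if k >= 50 else 0.0)   # bias fault on the sensor
    u.append(uk); y.append(yk)
    x = A @ x + B * uk

r = residuals(u, y)
print(max(abs(v) for v in r[:50]))   # essentially zero before the fault
print(abs(r[50]))                    # jumps when the bias appears
```

Parity relations and Kalman filters, discussed later in this part, generate residuals with the same zero/non-zero logic but through algebraic elimination and stochastic state estimation, respectively.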
We believe that there have been very few articles that comprehensively review the field of fault diagnosis considering all the different types of techniques discussed in this series of review papers. Most review papers, such as the one by Frank, Ding, and Marcu (2000), seem to focus predominantly on model based approaches. For example, in the review by Frank et al., a detailed description of various types of analytical model based approaches is presented. The robustness issues in fault detection, optimized generation of residuals and generation of residuals for nonlinear systems are some of the issues that have been addressed in a comprehensive manner. There are a number of other review articles that fall under the same category. A brief review article that is more representative of all the available fault diagnostic techniques has been presented by Kramer and Mah (1993). This review deals with data validation, rectification and fault diagnosis issues. The fault diagnosis problem is viewed as consisting of feature extraction and classification stages. This view of fault diagnosis has been generalized in our review as the transformations that measurements go through before a final diagnostic decision is attained. The classification stage is examined by Kramer and Mah as falling under three main categories: (i) pattern recognition, (ii) model-based reasoning and (iii) model-matching. Under pattern recognition, most of the process history based methods are discussed; under model-based reasoning most of the qualitative model based techniques are discussed; and symptomatic search techniques using different model forms are discussed under model matching techniques.
Closely associated with the area of fault detection and diagnosis is the research area of gross error detection in sensor data and the subsequent validation. Gross error detection or sensor validation refers to the identification of faulty or failed sensors in the process. Data reconciliation or rectification is the task of providing estimates for the true values of sensor readings. There has been considerable work done in this area and there have also been review papers and books written on this area. Hence, we do not provide a review of this field in this series of papers. However, as mentioned before, fault diagnosis includes sensor failures also in its scope and hence data validation and rectification is a specific case of a more general fault diagnosis problem (Kramer & Mah, 1993).
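For readers unfamiliar with data reconciliation, the standard linear weighted-least-squares projection illustrates the idea: adjust the raw sensor readings as little as possible while forcing them to satisfy the conservation laws. The flowsheet and numbers below are hypothetical, with unit measurement variances assumed for simplicity.

```python
import numpy as np

# Minimal data-reconciliation sketch (illustrative numbers, unit variances).
# Flow measurements y around a splitter must satisfy the mass balance
# F1 - F2 - F3 = 0, written as A x = 0. The reconciled estimate is the
# classical projection  xhat = y - S A' (A S A')^-1 A y,  where S is the
# measurement covariance.
A = np.array([[1.0, -1.0, -1.0]])    # one balance: F1 = F2 + F3
S = np.eye(3)                        # assume equal, independent sensor variances
y = np.array([101.0, 45.0, 53.0])    # raw readings violate the balance by 3

adjustment = S @ A.T @ np.linalg.solve(A @ S @ A.T, A @ y)
xhat = y - adjustment

print(xhat)        # reconciled flows
print(A @ xhat)    # balance residual, now essentially zero
```

With equal variances the imbalance of 3 is spread evenly over the three sensors; weighting S by actual sensor accuracies would instead push most of the adjustment onto the least reliable measurement, which is the basis for gross error detection tests on the adjustments.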
The rest of this first part of the review is organized as follows. In the next section, we propose a list of ten desirable characteristics that one would like a diagnostic system to possess. This list would help us assess the various approaches against a common set of criteria. In Section 3, we discuss the transformations of data that take place during the process of diagnostic decision-making. This discussion lays down the framework for analyzing the various diagnostic approaches in terms of their knowledge and search components. In Section 4, a classification of fault diagnosis methods is provided. In Section 5, diagnosis methods based on quantitative models are discussed in detail.
Desirable characteristics of a fault diagnostic system
In the last section, the general problem of fault diagnosis was presented. In order to compare various diagnostic approaches, it is useful to identify a set of desirable characteristics that a diagnostic system should possess. Then the different approaches may be evaluated against such a common set of requirements or standards. Though these characteristics will not usually be met by any single diagnostic method, they are useful to benchmark various methods in terms of the a priori information
Transformations of measurements in a diagnostic system
To attempt a comparative study of various diagnostic methods it is helpful to view them from different perspectives. In this sense, it is important to identify the various transformations that process measurements go through before the final diagnostic decision is made. Two important components in the transformations are the a priori process knowledge and the search technique used. Hence, one can discuss diagnostic methods from these two perspectives. Also, one can view diagnostic methods based
Classification of diagnostic algorithms
As discussed earlier two of the main components in a diagnosis classifier are: (i) the type of knowledge and (ii) the type of diagnostic search strategy. Diagnostic search strategy is usually a very strong function of the knowledge representation scheme which in turn is largely influenced by the kind of a priori knowledge available. Hence, the type of a priori knowledge used is the most important distinguishing feature in diagnostic systems. In this three part review paper we classify the
Quantitative model-based approaches
This section reviews quantitative model-based fault diagnosis methods. The concept of analytical redundancy is introduced first, followed by a description of discrete dynamic system with linear models. The most frequently used FDI approaches, including diagnostic observers, parity relations, Kalman filters and parameter estimation are outlined. The recent effort of generating enhanced residuals to facilitate the fault isolation procedure is discussed. We will discuss the principles behind these
Conclusions
In this first part of the three part review paper, we have reviewed quantitative model based approaches to fault diagnosis. For the comparative evaluation of various fault diagnosis methods, we first proposed a set of desirable characteristics that one would like the diagnostic systems to possess. This can serve as a common set of criteria against which the different techniques may be evaluated and compared. Further, we provided a general framework for analyzing and understanding various
References (65)
Detecting changes in signals and systems—a survey. Automatica (1988).
Simplification techniques for EKF computations in fault diagnosis—suboptimal gains. Chemical Engineering Science (1998).
Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy—a survey and some new results. Automatica (1990).
Detection and diagnosis of plant failures: the orthogonal parity equation approach. Control and Dynamic Systems (1990).
Generating directional residuals with dynamic parity relations. Automatica (1995).
A new structural framework for parity equation-based failure detection and isolation. Automatica (1990).
Artificial neural network models of knowledge representation in chemical engineering. Computers and Chemical Engineering (1988).
Process fault detection based on modelling and estimation methods—a survey. Automatica (1984).
Combining pattern classification and assumption-based techniques for process fault diagnosis. Computers and Chemical Engineering (1992).
Recent safety and environmental legislation. Trans IChemE (2000).