An improved weighted recursive PCA algorithm for adaptive fault detection
Introduction
PCA is a dimensionality reduction technique used for fault detection that optimally captures the maximum variance of the data in a low-dimensional space, and it has been widely used in process monitoring (McGregor & Kourti, 1995; Iwashita, 1997; Russell et al., 2000; Qin, 2003; Choi & Lee, 2004; Amanian, Salahshoor, Jafari, & Mosallaei, 2007; Jeng, 2010). The large number of observations gathered from sensors and actuators is condensed into a couple of meaningful measures such as the T² and the Q statistics (Iwashita, 1997; Chen, Kruger, Meronk, & Leung, 2004; Chiu & Ling, 2009). Its main advantage is that monitoring proceeds as with univariate charts: a calculated measure is compared with a statistical threshold, and both are displayed in a single plot (Chow, Tan, Tabe, Zhang, & Thornhill, 1999).
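The monitoring idea described above can be sketched in a few lines. The following is a minimal illustration, not the procedure of any cited work: it assumes auto-scaled data, Hotelling's T² computed on the retained scores, and the squared prediction error (commonly called Q) computed on the residual. The function names, the synthetic data and the use of an empirical 99th percentile as control limit are assumptions made only for this example; published schemes usually derive the limits from statistical distributions.

```python
import numpy as np

def fit_pca_monitor(X, n_components):
    """Fit an auto-scaled PCA model on normal-operation data X (samples x variables)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std                      # auto-scaling
    corr = Z.T @ Z / (X.shape[0] - 1)         # correlation matrix of the scaled data
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending variance
    return {
        "mean": mean,
        "std": std,
        "P": eigvecs[:, order[:n_components]],   # retained loadings
        "lam": eigvals[order[:n_components]],    # retained eigenvalues
    }

def monitoring_statistics(model, X):
    """Return (T2, Q) for every row of X using a fitted model."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["P"]
    t2 = np.sum(scores ** 2 / model["lam"], axis=1)   # Hotelling's T2 on retained scores
    residual = Z - scores @ model["P"].T
    q = np.sum(residual ** 2, axis=1)                 # squared prediction error
    return t2, q

# Synthetic normal-operation data: 4 measured variables driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.5, 0.2, 0.0],
                   [0.0, 0.3, 0.8, 1.0]])
X_train = latent @ mixing + 0.05 * rng.normal(size=(500, 4))

model = fit_pca_monitor(X_train, n_components=2)
t2_train, q_train = monitoring_statistics(model, X_train)

# Assumed empirical control limits; a new sample raises an alarm when either is exceeded.
t2_limit = np.percentile(t2_train, 99)
q_limit = np.percentile(q_train, 99)
```

A new sample would be passed through `monitoring_statistics` with the same `model` and flagged when its T² or Q value exceeds the corresponding limit. Because `model` is fixed after training, this sketch has exactly the static behavior that adaptive and recursive schemes aim to overcome.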
Among the drawbacks ascribed to conventional PCA, perhaps the major one is that, once the fault detection models have been structured in the training step, their monitoring schemes remain invariant. This feature becomes a significant disadvantage considering that real industrial processes usually demonstrate slow time-varying behaviors, such as catalyst deactivation, heat exchanger fouling, equipment and sensor aging and process time-drifting (Chen & Liao (2002), Gallagher, Wise, Butler, White, & Barna (1997), Wold (1994), Yingwei, Shuai, & Yongdong (2012)). As a consequence, false alarms will eventually occur unless the underlying statistical structure is updated.
When processes exhibit slowly varying changes, adaptive/recursive approaches are more suitable to address the false alarm issue. On the other hand, when processes exhibit several different operating conditions, multimode monitoring approaches should be implemented (Zhiqiang, Zhihuan & Furong, 2013). Other monitoring methods that tackle both time-varying and multimode processes are found in the literature, such as the just-in-time-learning model proposed by Cheng and Chiu (2005), the adaptive local model proposed by Ge and Song (2008), and the external analysis combined with ICA (Independent Component Analysis) proposed by Kano, Hasebe, Hashimoto and Ohno (2004). An overview of these methods is given in Fig. 1.
Many adaptive approaches have been developed based on PCA and Partial Least Squares (PLS) algorithms, as these techniques account for the most successful industrial implementations. Alkaya and Eker (2011) proposed a PCA-based variance-sensitive fault detection algorithm combined with a dynamic threshold, mitigating false alarms caused by time-drifting behaviors by following the statistic trend and adjusting the detection threshold. Accounting for time-varying behavior and variable autocorrelation is a key feature of a robust fault detection scheme and one of the most important development branches in the fault detection field. Some approaches combine univariate techniques like Exponentially Weighted Moving Average (EWMA) and Cumulative Sum (CUSUM) charts with PCA; for example, Wold (1994) discussed the use of EWMA filters in conjunction with PCA and PLS. Ku, Storer, and Georgakis (1995) proposed a modification to PCA that includes time-lagged information to handle the temporal correlation among the process variables; this method is considered a dynamic version of conventional PCA, hence dynamic PCA (DPCA) (Weihua & Qin, 2001; Maravelakis & Castagliola, 2009; Rato & Reis, 2013).
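The time-lagged augmentation behind DPCA can be illustrated with a short sketch. It only shows how each sample may be extended with its previous samples before an ordinary PCA is fitted; the function name `lagged_matrix` and the ordering of the lagged blocks within a row are assumptions for this example, not taken from the cited work.

```python
def lagged_matrix(samples, lags):
    """Augment each sample with its `lags` previous samples.

    `samples` is a list of equally long lists (one per time step).
    Row t of the result is x(t), x(t-1), ..., x(t-lags) concatenated,
    so the first `lags` time steps produce no row.
    """
    if lags < 0:
        raise ValueError("lags must be non-negative")
    rows = []
    for t in range(lags, len(samples)):
        row = []
        for k in range(lags + 1):
            row.extend(samples[t - k])   # append the sample observed k steps earlier
        rows.append(row)
    return rows

# Two variables observed at four time steps, augmented with one lag.
series = [[1, 10], [2, 20], [3, 30], [4, 40]]
augmented = lagged_matrix(series, lags=1)
# augmented == [[2, 20, 1, 10], [3, 30, 2, 20], [4, 40, 3, 30]]
```

Fitting PCA on `augmented` instead of `series` lets the model capture correlation between a variable and its own past values, at the cost of multiplying the number of columns by `lags + 1`.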
Wang, Kruger, and Irwin (2005) presented a fast moving window PCA approach to improve the monitoring efficiency of time-varying processes, and Liu, Kruger, and Littler (2009) developed a moving window kernel PCA for non-linear time-varying processes. Rigopoulos, Arkun, Kayihan and Hanezyc (1996) used a similar window scheme to identify significant modes in a simulated paper machine profile. Rannar, MacGregor and Wold (1997) used a hierarchical PCA for adaptive batch monitoring in a similar way to EWMA-based PCA. For industrial processes with multiple modes, different multimode approaches for process monitoring have been developed, such as the real-time monitoring approach proposed by Hwang and Han (1999). Moreover, when monitoring the transition period between two different operation modes is a requirement, soft modeling algorithms offer an alternative for performing fault detection (Choi et al., 2005; Yu & Qin, 2008; Ge & Song, 2010).
Besides adaptive approaches, there are also solutions relying on the periodic incorporation of new process data, thus recursively updating the statistical fault detection model. Dayal and MacGregor (1997) developed a recursive exponentially weighted PLS method for adaptive control in industrial processes, and Wang, Kruger and Lennox (2003) built a recursive PLS (RPLS) model for adaptive monitoring in complex industrial processes. Naik, Yin, Ding and Zhang (2010) proposed algorithms for the recursive identification of parity-based fault detection systems that update their eigenstructure after every new measurement, which improves fault detection performance against frequent shifts in operating point or parameter variations. Qin (1998a, 1998b) proposed several RPLS algorithms for both off-line and on-line process modeling, allowing adaptation to process changes and handling a large number of data samples. These algorithms include a block-wise RPLS with a moving window and a forgetting factor adaptation scheme, and an off-line block-wise RPLS used to reduce computation time and computer memory usage in PLS regression and cross-validation.
Like adaptive approaches, recursive techniques are developed based on PCA to take advantage of its widespread implementation. Jeng (2010) proposed a recursive PCA (RPCA) algorithm based on a rank-one matrix update. This algorithm mean-centers the data; however, it does not auto-scale them, and thus neglects the effect of changes in the standard deviations of the process variables. Besides, the update is made after every new measurement (sample by sample), which makes it inconvenient because of the large number of FLOPs (floating-point operations) required. Weihua, Yue, Valle-Cervantes, and Qin (2000) proposed two PCA-based algorithms using a rank-one modification and a Lanczos tridiagonalization, respectively. After a computational complexity assessment, Weihua et al. concluded that the algorithm based on rank-one modification is less demanding. The rank-one algorithm carries out an auto-scaling operation to account for changes in the standard deviations of the process variables; nevertheless, it requires two spectral decompositions to update the eigenstructure. In addition, the formulas used to update the covariance matrix and standard deviations may be improved in order to lower their complexity. This algorithm also features a forgetting factor to weight current and new datasets.
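To make the role of a forgetting factor concrete, the sketch below blends the stored means and standard deviations with those of a new block of data. It is a generic exponentially weighted update written for illustration: it is not the set of formulas of Weihua et al. (2000), Jeng (2010), or the algorithm proposed in this paper. The function name `update_mean_std` and the convention that `forgetting` is the weight given to the old model are assumptions.

```python
def update_mean_std(old_mean, old_std, block, forgetting):
    """Blend stored per-variable statistics with a new block of samples.

    `forgetting` in [0, 1] is the weight of the old model; the new block
    receives weight 1 - forgetting. `block` is a list of samples, each a
    list with one value per variable.
    """
    if not 0.0 <= forgetting <= 1.0:
        raise ValueError("forgetting must lie in [0, 1]")
    n = len(block)
    m = len(old_mean)
    new_mean, new_std = [], []
    for j in range(m):
        block_mean = sum(row[j] for row in block) / n
        mean_j = forgetting * old_mean[j] + (1.0 - forgetting) * block_mean
        shift = mean_j - old_mean[j]
        # Mean squared deviation of the block about the UPDATED mean.
        block_ms = sum((row[j] - mean_j) ** 2 for row in block) / n
        # The old variance is re-referenced to the updated mean before blending.
        var_j = forgetting * (old_std[j] ** 2 + shift ** 2) + (1.0 - forgetting) * block_ms
        new_mean.append(mean_j)
        new_std.append(var_j ** 0.5)
    return new_mean, new_std

# One variable, old model N(0, 1), new block {1, 3}, equal weighting.
mean, std = update_mean_std([0.0], [1.0], [[1.0], [3.0]], forgetting=0.5)
# mean == [1.0]; std[0] ** 2 is 2.0 up to rounding
```

With `forgetting=1.0` the model ignores the new block entirely, and with `forgetting=0.0` it is replaced by the block statistics. Intermediate values control how quickly the auto-scaling parameters follow slow process drift.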
In this paper, a new weighted adaptive recursive PCA-based algorithm (WARP) is developed. A comparison between the proposed algorithm and other recursive PCA-based algorithms (Weihua, Yue, Valle-Cervantes, & Qin, 2000; Jeng, 2010) is carried out in terms of false alarm rate, misdetection rate, detection delay and computational complexity. The paper is organized as follows: in Section 2, the background on conventional PCA is presented. The recursive formulas proposed for PCA are developed in Section 3. In Section 4, the computational complexity of the proposed recursive formulas to update means, standard deviations and the eigenstructure is compared to the computational complexity of two sets of recursive formulas found in the literature. Section 5 assesses the performance of all algorithms with a benchmark process, while Section 6 contains a second validation with real process data from a regional natural gas pipeline. Finally, Section 7 presents the conclusions and future work related to this investigation.
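The three detection indicators named above can be computed from a sequence of alarm flags once the fault onset is known. The sketch below uses common working definitions, which are assumptions here since this excerpt does not state the exact formulas used in the comparison: false alarm rate as the fraction of pre-fault samples flagged, misdetection rate as the fraction of post-onset samples not flagged, and detection delay as the number of samples between fault onset and the first alarm. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(alarms, fault_start):
    """Return (false_alarm_rate, misdetection_rate, detection_delay).

    `alarms` holds one boolean per sample; `fault_start` is the index of
    the first faulty sample. The delay is None when no alarm follows the
    fault onset.
    """
    normal = alarms[:fault_start]
    faulty = alarms[fault_start:]
    false_alarm_rate = sum(normal) / len(normal) if normal else 0.0
    missed = sum(1 for flag in faulty if not flag)
    misdetection_rate = missed / len(faulty) if faulty else 0.0
    detection_delay = next((i for i, flag in enumerate(faulty) if flag), None)
    return false_alarm_rate, misdetection_rate, detection_delay

# Eight samples, fault introduced at index 4.
flags = [False, True, False, False, False, False, True, True]
far, mdr, delay = detection_metrics(flags, fault_start=4)
# far == 0.25, mdr == 0.5, delay == 2
```

Some studies count a detection only after several consecutive alarms; that variant would change the delay and misdetection figures and is not covered by this sketch.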
Section snippets
Conventional PCA algorithm
Historical process data corresponding to normal operation are arranged in a matrix whose dimensions are given by the number of process variables being measured and the number of samples. Variables with null variance or missing signal problems are removed, which reduces the number of variables retained in the matrix. The means of the remaining variables are collected in a vector.
The standard deviations of the variables are contained in:
The data
Recursive formulas proposed for PCA
A new set of recursive equations is derived to update the conventional PCA statistical model by incorporating the newly available information contained in a specific online process dataset. A weighting (or learning) factor is introduced to assign “importance” to the new online dataset about to be included. The notation used throughout this section is presented below.
Computational complexity
Computational complexity is assessed in terms of the FLOPs consumed by all operations required to update the eigenstructure. This assessment is performed on the proposed algorithm and on two sets of recursive formulas found in the literature: one proposed by Jeng (2010) and another proposed by Weihua et al. (2000). The results of the comparison are summarized in Table 1. As a remark, the computational complexity of the three sets has been expressed according to the notation presented
Performance assessment
As may be observed in Fig. 4, the algorithm proposed in this paper has the lowest computational complexity compared to Jeng (2010) and Weihua, Yue, Valle-Cervantes, and Qin (2000), consuming between 7 and 20 times fewer FLOPs within the evaluated span of process variables. It is expected to achieve even better performance in this indicator for more complex processes with a larger number of variables. This reduction of computational demand is of special significance considering that a
Validation with real process data: natural gas transmission pipeline
A complementary validation of the proposed technique (WARP) alongside a reference technique (Weihua et al.) is performed using a real process dataset from a regional natural gas transmission pipeline. This dataset was also used by Torres, Posada, Garcia and Sanjuan (2012) in a previous research study on fault detection in natural gas pipelines.
Conclusions
A literature review showed that among the drawbacks ascribed to conventional PCA, perhaps the major one is that, once the fault detection models have been structured in the training step, their monitoring schemes remain invariant. This feature becomes a significant disadvantage considering that real industrial processes usually demonstrate slow time-varying behaviors. A new weighted recursive PCA-based algorithm (WARP) was developed in order to address the rise of false alarms in process
References (48)
- et al. Variance sensitive adaptive threshold-based PCA method for fault detection and diagnosis. ISA Transactions (2011).
- et al. Dynamic process fault monitoring based on neural network and PCA. Journal of Process Control (2002).
- et al. Nonlinear process monitoring using JITL-PCA. Chemometrics and Intelligent Laboratory Systems (2005).
- et al. A hybrid approach based on Hotelling statistics for automated visual inspection of display blemishes in LCD panels. Expert Systems with Applications (2009).
- et al. Nonlinear dynamic process monitoring based on dynamic kernel PCA. Chemical Engineering Science (2004).
- et al. Recursive exponentially weighted PLS and its applications to adaptive control and prediction. Journal of Process Control (1997).
- et al. Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Engineering Practice (2008).
- et al. Maximum-likelihood mixture factor analysis model and its application for process monitoring. Chemometrics and Intelligent Laboratory Systems (2010).
- et al. Real-time monitoring for a process with multiple operating modes. Control Engineering Practice (1999).
- Asymptotic null and nonnull distribution of Hotelling’s T2-statistic under the elliptical distribution. Journal of Statistical Planning and Inference (1997).