
25.10.2017 | Original article | Issue 6/2017 | Open Access

# Adaptive Change Detection for Long-Term Machinery Monitoring Using Incremental Sliding-Window

Journal:
Chinese Journal of Mechanical Engineering > Issue 6/2017
Authors:
Teng Wang, Guo-Liang Lu, Jie Liu, Peng Yan
Important notes
Supported by National Natural Science Foundation of China (Grant Nos. 61403232, 61327003), Shandong Provincial Natural Science Foundation of China (Grant No. ZR2014FQ025), and Young Scholars Program of Shandong University, China (YSPSDU, 2015WLJH30).

## 1 Introduction

Detection of structural changes in an operational process is a major goal of machinery monitoring; it enables the solution of many practical problems, ranging from early fault detection and safety protection to other process-control problems. Existing works are mainly based on a retrospective analysis of a data stream composed of numerical condition monitoring (CM) variables such as vibration, sound, and power consumption. Standard retrospective change detection relies mainly on estimating the logarithm of the likelihood ratio between two distributions [1, 2]. This strategy argues that detecting a change can be converted into detecting the parameter difference between the two distributions before and after the change point. Consequently, retrospective change detection aims to estimate this parameter difference using likelihood-ratio statistics, and a change decision can be made by performing a null-hypothesis test with a threshold. Many effective tools for this goal, such as the cumulative sum metric (CUSUM) [3–6], the geometric moving average (GMA) [7, 8], and the generalized likelihood ratio test (GLRT) [9–12], have been widely used. For example, Willersrud et al. [12] developed the GLRT for efficient downhole drilling washout detection with the multivariate t-distribution; Reñones et al. [6] used CUSUM analysis for multi-tooth machine tool fault detection. Although these methods have been experimentally demonstrated to be effective in various fields, they require data after the change point, so a large detection delay is an essential limitation in real applications [13].
On the other hand, real-time change detection aims to detect changes as soon as possible after they occur. This requirement is crucial in many real-life scenarios such as security monitoring [14, 15], health care [16, 17], automated factories [18, 19], and the machine operation monitoring studied in this paper. In real-time change detection, each time a datum arrives, the algorithm evaluates to what extent that datum is likely to be a change point by means of a certain measuring score [20], which does not require any input data after the change time. Real-time approaches have succeeded in many practical applications (e.g., wind turbine condition monitoring [21], driver vigilance monitoring [22]) and are thus promising [23].
The goal of this paper is to further advance this line of research on real-time detection methods. More specifically, our main contributions are twofold. The first contribution is to apply the martingale-based framework proposed in our recent article [24] to long-term machine monitoring by combining it with an incremental sliding-window strategy. The basic idea of the original martingale framework is to learn a statistical regularity directly from already observed data and then detect possible change(s) by using a martingale to test exchangeability, i.e., by investigating how much each datum deviates from the regularity. That framework, however, only works for at-most-one-point change detection and is thus unsuitable for long-term monitoring applications containing multiple changes. In this paper, we introduce an incremental sliding-window strategy to solve this problem.
Recall that the threshold value used for change decision making is a key factor in detection accuracy. A potential weakness of the majority of existing algorithms, e.g., Refs. [5–11], is that they need either human instruction/intervention or off-line cross-validation to determine the threshold value before operation, which largely limits them in real applications. Another contribution of this paper is therefore a new adaptive threshold for change decision making. In particular, instead of a fixed threshold value, we introduce an alternative factor: a fixed global parameter that controls the coarse-to-fine level of detection (see Section 3.2 for details). Using this factor, the threshold value can be adaptively computed from the already observed data at each step of change decision.
Besides these methodological extensions, we also conducted validations on an experimental setup to investigate the effectiveness and superiority of the method for change detection with large datasets. For more details, please see Section 4.
The rest of the paper is structured as follows. Section 2 outlines the martingale framework for machine monitoring. Section 3 formulates the problems addressed in this paper and presents our proposed methods, followed by experimental results in Section 4. Section 5 concludes the paper and outlines future work.

## 2 Martingale Based Change Detection for Machinery Monitoring

Assuming that we have collected a data stream of numerical CM variables from an operational process of a machine, i.e., $$X = \{x_{1}, \ldots, x_{i}, \ldots, x_{n}\}$$, where $$x_{i}$$, $$i \in \{1,2,\ldots,n\}$$, is the variable value at time $$i$$, three points are provided in the following to support the use of martingales for change detection:
(1)
Changes are detected by testing the null hypothesis that all $$n$$ strangeness values $$s_{1}, s_{2}, \ldots, s_{n}$$ (corresponding to $$x_{1}, x_{2}, \ldots, x_{n}$$, respectively) are exchangeable in the index, through the corresponding exchangeability martingale $$M_{1}, M_{2}, \ldots, M_{n}$$, where $$M_{n}$$ is a measurable function of $$s_{1}, s_{2}, \ldots, s_{n}$$ satisfying
$$M_{n} = E(M_{n + 1} |M_{1} ,M_{2} , \ldots ,M_{n} ).$$
(1)

(2)
The following Doob’s inequality [ 25] can be used for rejecting this null hypothesis for a large value of $$M_{n}$$:
$$P(\exists n: M_{n} \ge \lambda ) \le \frac{1}{\lambda }.$$
(2)

(3)
This (exchangeability) martingale is constructed from a p-value, i.e., the probability of obtaining a test statistic at least as extreme as the one actually observed; the p-value is in turn obtained from a strangeness value appropriately determined in each specific application.

On the basis of the above three points, the outline of performing martingale for change detection is described as follows (see Ref. [ 25] for more details):
Step 1: The randomized power martingale (RPM) [ 26] is constructed from the computed $$s_{1} ,s_{2} , \ldots ,s_{n}$$ by
$$M_{t} = \prod\limits_{i = 1}^{t} ( \varepsilon \hat{p}_{i}^{\varepsilon - 1} ),t \in \{ 1,2, \ldots ,n\} ,$$
(3)
where $$\varepsilon \in (0, 1)$$ and the $$\hat{p}_{i}$$ are values computed from the p-value function:
$$\hat{p}_{i}(\{s_{1}, \ldots, s_{i-1}\}, s_{i}) = \frac{\#\{j: s_{j} > s_{i}\} + \theta_{i}\,\#\{j: s_{j} = s_{i}\}}{i},$$
(4)
where $$\#\{\bullet\}$$ is a counting function, $$\theta_{i}$$ is a random value drawn from the uniform distribution on [0, 1], and $$j \in \{1,2,\ldots,i-1\}.$$
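As an illustration, the randomized p-value of Eq. (4) can be sketched in Python as follows (a minimal sketch; following the standard conformal convention we count the new strangeness value itself among the ties so the p-value is strictly positive, and the optional `theta` argument is our addition for reproducibility):

```python
import random

def randomized_p_value(strangeness_history, s_new, theta=None):
    """p-hat_i of Eq. (4): fraction of strangeness values exceeding the
    new one, with ties broken by a uniform random theta_i.

    The new value is counted among the ties (standard conformal
    convention), so the result is always strictly positive."""
    if theta is None:
        theta = random.random()          # theta_i ~ Uniform[0, 1]
    i = len(strangeness_history) + 1     # index of the new datum
    greater = sum(1 for s in strangeness_history if s > s_new)
    equal = 1 + sum(1 for s in strangeness_history if s == s_new)
    return (greater + theta * equal) / i
```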
Step 2: Based on Doob's inequality, the following hypothesis test is then performed for each $$t \in \{1,2,\ldots,n\}$$:
\begin{aligned} &H_{0}: \;\text{no change}: \quad 0 < M_{t} < \lambda, \\ &H_{A}: \;\text{change occurs}: \quad M_{t} \ge \lambda. \end{aligned}
(5)
That is, if the martingale value $$M_{t}$$ exceeds a predefined threshold $$\lambda$$, $$H_{A}$$ in Eq. (5) is satisfied, i.e., a change occurs at time $$t$$. Otherwise, the martingale test under $$H_{0}$$ continues to operate as long as $$0 < M_{t} < \lambda$$.
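Steps 1 and 2 can be combined into a short sketch of the martingale test of Eqs. (3) and (5); the values ε = 0.92 and λ = 10 below are illustrative defaults of ours, not the paper's settings:

```python
def martingale_test(p_values, epsilon=0.92, lam=10.0):
    """Randomized power martingale of Eq. (3) with the test of Eq. (5).

    Returns the 1-based index of the first detected change, or None if
    H_0 (no change) holds throughout."""
    M = 1.0
    for t, p in enumerate(p_values, start=1):
        M *= epsilon * p ** (epsilon - 1.0)   # multiplicative update, Eq. (3)
        if M >= lam:                          # H_A of Eq. (5): change at time t
            return t
    return None
```

Small p-values (surprising data) inflate the martingale toward the threshold, while unremarkable p-values keep it bounded.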

## 3 Problem Formulation and Proposed Scheme

Section 2 outlined the martingale test for change detection. Two problems have to be considered further for long-term machinery monitoring:
(1)
How to deal with multi-change detection in long-term monitoring?

(2)
Is it possible to adaptively compute the threshold value when making change decision?

In the following, we discuss these two problems in detail and present our proposed schemes.

### 3.1 Change Detection Using Incremental Sliding-Window

Problem 1. As shown in Eq. (3) and Eq. (5), $$M_{t}$$ can be processed sequentially with a fixed-length-$$L$$ sliding window over the given data stream, testing all possible change candidates $$t \in \{1,2,\ldots,n\}$$. This process, however, may be unsuitable for long-term monitoring applications. A key feature of real machine operations is temporal variation: one operation can last a long time or only a few seconds. Hence, it is difficult to use a fixed-length sliding window to capture transitions (i.e., changes from one operational state to another) in long-term monitoring. More specifically, a small $$L$$ causes over-detection, while a large $$L$$ causes a large delay. To overcome this problem, we combine the martingale with an incremental sliding-window strategy [27] to design a real-time change detection algorithm for Eq. (5).
Proposed scheme: By virtue of incremental sliding-window, the length L can be automatically updated depending on whether a change is detected or not at time t:
\begin{aligned} &\text{If } t \text{ is not a change}: \quad n_{t+1} = n_{t}, \quad L_{t+1} = L_{t} + \Delta L, \\ &\text{If } t \text{ is a change}: \quad n_{t+1} = t, \quad L_{t+1} = L_{1}, \end{aligned}
(6)
where $$n_{t}$$ is the starting time of the current martingale computation and $$L_{t}$$ is the length of the corresponding sliding window at time $$t$$. The process starts with $$n_{1} = 1$$ and $$L_{1} = 1$$, and ends when $$n_{t} + L_{t} > n$$, where $$n$$ is the length of the given data stream in off-line applications, or at a pre-defined stopping time in on-line applications. It is worth mentioning that $$\Delta L$$ is the increment step used to update the sliding window; it was set to 1 in the following experiments.
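A minimal sketch of the incremental sliding-window loop of Eq. (6), using 0-based indices and leaving the single-change detector as a pluggable function (the toy detector in the test below is ours, for illustration only):

```python
def incremental_window_detect(stream, detect_in_window, delta_L=1):
    """Multi-change detection with the incremental sliding window of Eq. (6).

    `detect_in_window(window)` is any single-change detector (e.g., the
    martingale test) that returns True when the newest point in the
    window is judged to be a change."""
    changes = []
    n_t, L_t = 0, 1                # 0-based start index; L_1 = 1
    while n_t + L_t <= len(stream):
        window = stream[n_t:n_t + L_t]
        t = n_t + L_t - 1          # index of the newest point in the window
        if detect_in_window(window):
            changes.append(t)
            n_t, L_t = t, 1        # restart from the change: n_{t+1} = t, L = L_1
        else:
            L_t += delta_L         # grow the window: L_{t+1} = L_t + delta_L
    return changes
```

Whenever a change is flagged, the window resets and restarts from the change point; otherwise it keeps growing, so short and long operations alike can be covered.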

### 3.2 Adaptive Threshold for Change Detection

Problem 2. When making a change decision by testing the null hypothesis in Eq. (5), the threshold $$\lambda$$ is essential, as it balances detection precision and recall (defined in Section 4.3). In general, the value of $$\lambda$$ is pre-defined empirically or confirmed by prior estimation. It is, however, often difficult to find the optimal value in real-world applications. To address this problem, unlike existing works that directly use the original monitored variables for change detection (e.g., Refs. [6–12, 25]), we utilize the Hilbert space embedding of distributions (HED, also called the kernel mean or mean map) to map the original data $$\{x_{i}\}$$, $$i \in \{1,2,\ldots,n\}$$, into a reproducing kernel Hilbert space (RKHS) (see Figure 1). Without going into details, the idea of using the HED for change detection is straightforward: the probability distribution is represented as an element of an RKHS, and changes can then be detected using a well-behaved smoothing kernel function whose values are small on data belonging to the same pattern and large on data from different patterns.
Proposed scheme: Inspired by Ref. [28], probability distributions can be embedded in an RKHS. At the center of the HED are the mean mapping functions:
\begin{aligned} \mu(P_{x}) &= E\left[k(\{x_{i}\})\right], \\ \mu(\{x_{i}\}) &= \frac{1}{t}\sum\limits_{i = 1}^{t} k(x_{i}), \end{aligned}
(7)
where $$\{x_{i}\}$$, $$i = 1,2,\ldots,t$$, are assumed to be i.i.d. samples from the distribution $$P_{x}$$. Under mild conditions, $$\mu(P_{x})$$ (likewise $$\mu(\{x_{i}\})$$) is an element of the Hilbert space. The mapping $$\mu(P_{x})$$ is attractive because each data point $$x_{i}$$ has a one-to-one correspondence with it. Thus, we can use the function norm $$s(\mu(P_{x}), k(x_{t}))$$ (instead of $$s(P\{x_{1}, x_{2}, \ldots, x_{t-1}\}, x_{t})$$ used in Ref. [1]) to quantify the strangeness value $$s_{t}$$ for $$x_{t}$$. We do not need access to the actual distribution; finite samples suffice to estimate the embedding.
Lemma 1. As long as the Rademacher average [29], which measures the "size" of a class of real-valued functions with respect to a probability distribution, is well behaved, the finite-sample error converges to zero; thus finite samples empirically approximate $$\mu(P_{x})$$ (see Ref. [28] for more details).
The success of kernel methods largely depends on the choice of the kernel function $$k$$, which is chosen according to domain knowledge or from universal kernels. In this paper, we employ the widely used Gaussian radial basis function (RBF) kernel:
$$k(x_{i} ) = \exp \left( { - \frac{1}{{2\sigma^{2} }}||x_{i} - \bar{x}||^{2} } \right),$$
(8)
where $$\bar{x}$$ and $$\sigma$$ are the sample mean and standard deviation of the data stream $$\{x_{1}, x_{2}, \ldots, x_{i}\}.$$ We next construct $$s_{t}$$ to measure the strangeness of $$x_{t}$$ with respect to the past data stream up to time $$t-1$$, i.e., $$\{x_{1}, x_{2}, \ldots, x_{t-1}\}$$ in the RKHS, as
$$s_{t} = s(\mu (p_{x} ),k(x_{t} )) = \;|k(x_{t} ) - k_{t - 1}^{c} |,$$
(9)
where $$k_{t-1}^{c}$$ is the kernel center of the data stream and $$\left|\bullet\right|$$ is a distance metric. It is worth mentioning that in real engineering scenarios the CM variables are often composed of multidimensional values measured by multiple sensors at each time instant. We thus use the Mahalanobis distance [30] to compute the strangeness $$s_{t}$$, taking correlations between variables into account so that different patterns in each dimension can be identified and analyzed [30]:
$$s_{t} = \sqrt{(k(x_{t}) - k_{t-1}^{c})^{\prime}\, \Sigma^{-1}\, (k(x_{t}) - k_{t-1}^{c})},$$
(10)
where $$\Sigma$$ is the covariance matrix.
Since we use the RBF kernel of Eq. (8), an outlying data point can be identified when $$s_{t} \ge \alpha \cdot \sigma$$, where $$\alpha$$ is a fixed global factor controlling the confidence level of detection and $$\sigma$$ is the standard deviation computed from the existing data (that is, an adaptive threshold).
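The kernel strangeness of Eqs. (8)–(10) and this outlier rule can be sketched in one dimension as follows (our simplification: in 1-D the Mahalanobis distance of Eq. (10) reduces to an absolute distance normalised by the standard deviation of the past kernel values, so the normalised strangeness is compared directly against α rather than forming the full martingale decision of Eq. (11)):

```python
import numpy as np

def mahalanobis_strangeness(history, x_t):
    """Strangeness of Eqs. (8)-(10), 1-D sketch: map the data through the
    RBF kernel of Eq. (8), then take the variance-normalised distance of
    k(x_t) from the kernel centre k^c_{t-1}."""
    h = np.asarray(history, dtype=float)
    x_bar, sigma = h.mean(), h.std() + 1e-12      # sample mean/std, Eq. (8)
    k = lambda x: np.exp(-((x - x_bar) ** 2) / (2.0 * sigma ** 2))
    k_h = k(h)
    k_center = k_h.mean()                         # kernel centre of past data
    sigma_k = k_h.std() + 1e-12                   # 1-D analogue of Sigma in Eq. (10)
    return abs(k(x_t) - k_center) / sigma_k

def adaptive_flag(history, x_t, alpha=3.0):
    """Adaptive rule in the spirit of Section 3.2: since the strangeness is
    already variance-normalised, flag a change when it exceeds alpha."""
    return mahalanobis_strangeness(history, x_t) >= alpha
```

No fixed threshold appears: the scale is re-estimated from the observed data at every step, and only the global confidence factor α is set in advance.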
Based on this fact, kernelized change decision can be made by re-writing Eq. ( 5) as follows,
\begin{aligned} &\text{If } 0 < M_{t} < \alpha \cdot K \cdot \sigma_{t-1}: \quad \text{no change}, \\ &\text{If } M_{t} \ge \alpha \cdot K \cdot \sigma_{t-1}: \quad \text{change occurs}, \end{aligned}
(11)
where $$K$$ is a projection coefficient from the RKHS to the martingale space and $$\sigma_{t-1}$$ is computed adaptively from the data stream up to time $$t-1$$. In practice, the employed Gaussian function is often standardized to the normal Gaussian (i.e., $$\mu = 0$$ and $$\sigma = 1$$). Figure 2 gives the typical confidence levels corresponding to different values of $$\alpha$$ in a Gaussian distribution. $$K$$ can thus be fixed as $$K \approx 2.17$$ by off-line estimation, performed as follows: (a) given a set of data streams containing changes, define the detection accuracy $$q = N/P$$, where $$N$$ is the number of correctly detected changes and $$P$$ is the total number of detected changes, and set a threshold value $$\lambda^{*}$$ that guarantees perfect accuracy, i.e., $$q = 100\%$$; (b) decrease $$\lambda^{*}$$ gradually as long as $$q$$ does not decrease; (c) once $$q$$ decreases, compute $$K$$ by
$$K\; = \;\frac{{\lambda^{*} }}{5 \cdot \sigma }$$
since five standard deviations guarantee that all changes can be detected.
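The three-step off-line estimation of K can be sketched as follows (a hypothetical calibration harness: `detect_at` stands for any detector parameterised by a threshold, and streams producing no detections are counted as perfectly accurate; both are our assumptions, not details given in the paper):

```python
def estimate_K(detect_at, labeled_streams, sigma, lam_start=100.0, step=0.5):
    """Off-line estimation of the projection coefficient K, per the text:
    start from a threshold lambda* with perfect accuracy q = N/P = 100%,
    decrease it as long as q stays perfect, then set K = lambda*/(5*sigma).

    `detect_at(stream, lam)` returns detected change indices; each stream
    is paired with the set of true change indices."""
    def accuracy(lam):
        correct = total = 0
        for stream, true_changes in labeled_streams:
            detections = detect_at(stream, lam)
            total += len(detections)
            correct += sum(1 for d in detections if d in true_changes)
        return 1.0 if total == 0 else correct / total  # no detections: treat as perfect

    lam = lam_start
    while lam - step > 0 and accuracy(lam - step) >= 1.0:
        lam -= step                 # step (b): decrease lambda* while q = 100%
    return lam / (5.0 * sigma)      # step (c): K = lambda* / (5 * sigma)
```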

## 4 Experimental Verification

In this section, we demonstrate the effectiveness and superiority of the proposed adaptive change-detection algorithm for long-term machine monitoring. Experiment I and Experiment II are conducted to answer the following two questions:
(1)
Will the proposed incremental sliding-window be more suitable than the fixed-length sliding-window for long-term machine monitoring?

(2)
Can the adaptive detection algorithm be effective for change detection and how does it perform with large datasets?

### 4.1 Experimental Setup

Figure 3 shows the experimental setup, where various motor speed inputs were applied to simulate changes in real machinery operations. We used sound as the tested CM signal due to its broad applicability in implementation. The CM sound signals were acquired by a microphone mounted on the gearbox and then sent through acquisition equipment to a PC.
Figure 4 shows a data stream of collected CM sound signals in which three changes, indicating transitions from one operational state to another, were annotated by a human instructor.

### 4.2 Experiment I: Performance of Incremental Sliding-Window

In our method, we propose to use an incremental sliding-window, instead of a fixed-length sliding window, for long-term machine monitoring. To evaluate the effectiveness and superiority of this strategy for change detection, threshold values from 1 to 20 in steps of 1 were tested. Figure 5 shows three detection cases using the incremental sliding-window on the testing data stream (shown in Figure 4). The performance clearly differs greatly for different values of $$\lambda$$, with the best performance achieved at $$\lambda = 7$$: a smaller value of $$\lambda$$ brings more false alarms, while a larger value incurs a larger detection delay. This again demonstrates the important role of the threshold value $$\lambda$$ in change detection. Since $$\lambda = 7$$ has been shown to succeed in change detection on the testing data, we compare this performance with that of a fixed-length-$$L$$ sliding-window detector. From Figure 6, which shows the results for the three cases $$L$$ = {3000, 5000, 7000}, we observe that the small length $$L = 3000$$ causes over-detection (i.e., more false alarms) and the large length $$L = 7000$$ causes a large detection delay; good detection performance tends to be achieved at $$L = 5000$$. Note, however, that it is often difficult to fix the length $$L$$ of the sliding window given the great temporal variation of CM signals collected from real machine operations.
Overall, taking the results in Figure 5 and Figure 6 together, the following observations were made:
• The fixed-length sliding-window martingale requires two parameters, $$\lambda$$ and $$L$$, demanding a more complicated prior estimation before use;
• The incremental sliding-window martingale requires only one parameter, $$\lambda$$.
Both observations motivate the adaptive threshold given in Section 3.2, which is evaluated in the following.

### 4.3 Experiment II: Performance of Adaptive Threshold

This section evaluates the proposed adaptive threshold for machine monitoring. Since Section 4.2 demonstrated the superiority of the incremental sliding-window for long-term machine monitoring, we test the adaptive threshold only in combination with the incremental sliding-window.
Figure 7 shows the results: all changes in the testing CM data (the same data shown in Figure 4) were successfully detected without any false alarms when $$\alpha$$ was set to 3.0, whereas more false alarms appeared as the threshold decreased. In addition, given the projection coefficient $$K$$ from the RKHS to the martingale space and the fixed global confidence level $$\alpha$$, the threshold value is computed/adjusted adaptively from the standard deviation of the past data at each step of change decision making, as in Eq. (11); in other words, the threshold is not fixed over the whole process, unlike in many existing works [7–11, 25]. All of these observations are consistent with the analysis in Section 3.2. Moreover, to verify the effectiveness of the adaptive threshold on large datasets, we collected data streams containing 90 changes in total. The performance evaluation is based on two retrieval performance indicators, precision and recall, defined respectively as
\begin{aligned} Precision &= \frac{{\text{Number}}\;{\text{of}}\;{\text{Correct}}\;{\text{Detections}}\;{\text{of}}\;{\text{Changes}}}{{\text{Number}}\;{\text{of}}\;{\text{Detections}}\;{\text{of}}\;{\text{Changes}}}, \\ Recall &= \frac{{\text{Number}}\;{\text{of}}\;{\text{Correct}}\;{\text{Detections}}\;{\text{of}}\;{\text{Changes}}}{{\text{Number}}\;{\text{of}}\;{\text{True}}\;{\text{Changes}}}. \end{aligned}
Precision is the probability that a detection is actually correct, i.e., corresponds to a true change. Recall is the probability that a true change is detected.
In addition, we also use a single performance indicator $$F_{1}$$ defined as
$$F_{1} = \frac{2 \times Precision \times Recall}{Precision + Recall}.$$
$$F_{1}$$ is the harmonic mean of precision and recall, so a high value of $$F_{1}$$ indicates a good balance between the two.
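These three indicators can be computed as follows (a sketch; the matching rule, pairing a detection with a true change within `tolerance` samples, is our assumption, as the paper does not state one):

```python
def evaluate(detected, true_changes, tolerance=0):
    """Precision, recall, and F1 for change detection.

    A detection is 'correct' if it lies within `tolerance` samples of a
    true change (an assumed matching convention)."""
    correct = sum(1 for d in detected
                  if any(abs(d - t) <= tolerance for t in true_changes))
    precision = correct / len(detected) if detected else 0.0
    hit = sum(1 for t in true_changes
              if any(abs(d - t) <= tolerance for d in detected))
    recall = hit / len(true_changes) if true_changes else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```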
Figure 8 shows the detection performance for $$\alpha \in$$ {0.92, 1.84, 2.30, 2.50, 2.75, 3.00, 3.22, 3.69, 4.15, 4.61, 5.00}. Figure 8(a) shows that the detection precision increases with $$\alpha$$ and reaches its best performance when $$\alpha > 3$$. For the recall shown in Figure 8(b), the proposed method always achieves a perfect performance of 100%, meaning that all true changes were successfully detected for the tested values of $$\alpha$$. These results are seen even more clearly in Figure 8(c), where $$F_{1}$$ achieves its best performance when $$\alpha > 3$$. None of these results is surprising, because three standard deviations contain approximately 99.8% of the data points of a Gaussian distribution (as shown in Figure 2). Considering that a smaller $$\alpha$$ brings a smaller detection delay, as shown in Figure 7, $$\alpha = 3$$ is recommended when using the proposed method in real applications.

## 5 Conclusions

In this paper, we have extended our recent work [25] to long-term machine monitoring with two proposed schemes: 1) an incremental sliding-window to solve the problem of multi-change detection; and 2) an adaptive threshold for change decision making. Experimental results on an experimental setup demonstrated the success of the proposed method for multi-change detection in long-term monitoring. We conclude that the improved algorithm is feasible for a new generation of long-term machine monitoring systems. Future work will continue to verify the capability of the improved algorithm in detecting a wider range of changes during machine operation, to make it ready for commercial exploitation.
In addition, considering that detection delay is one of the essential aspects when designing a detection method, another line of future work is to extract informative features to represent the raw collected data for modeling, in order to further decrease the detection delay of our method.
