25.10.2017 | Original Article | Issue 6/2017 | Open Access
Adaptive Change Detection for Long-Term Machinery Monitoring Using Incremental Sliding-Window
Journal: Chinese Journal of Mechanical Engineering > Issue 6/2017
Notes
Supported by National Natural Science Foundation of China (Grant Nos. 61403232, 61327003), Shandong Provincial Natural Science Foundation of China (Grant No. ZR2014FQ025), and Young Scholars Program of Shandong University, China (YSPSDU, 2015WLJH30).
1 Introduction
Detection of structural changes in an operational process is a major goal of machinery monitoring, enabling the solution of many practical problems ranging from early fault detection and safety protection to other process control problems. Existing works are mainly based on a retrospective analysis of a data stream composed of numerical condition monitoring (CM) variables, such as vibration, sound, and power consumption. The basic idea of standard retrospective change detection relies on estimating the logarithm of the likelihood ratio between two distributions [1, 2]. This strategy argues that the detection of a change can be converted into the detection of the parameter difference between the two distributions before and after the change point. As a consequence, retrospective change detection aims to estimate this parameter difference of distributions based upon likelihood ratio statistics; a change decision can then be made by performing a null hypothesis test with a threshold. Many effective tools for this goal, such as the cumulative sum metric (CUSUM) [3–6], the geometric moving average (GMA) [7, 8] and the generalized likelihood ratio test (GLRT) [9–12], have been widely used. For example, Willersrud et al. [12] developed the GLRT for efficient downhole drilling washout detection with the multivariate t-distribution; Reñones et al. [6] used CUSUM analysis for multi-tooth machine tool fault detection. Although these methods have been experimentally demonstrated to be effective in various fields, a large detection delay, caused by the requirement of data after the change point, is an essential limitation for real applications [13]. On the other hand, real-time change detection aims to detect changes as soon as possible after a change occurs. This requirement is crucial in many real-life scenarios, such as security monitoring [14, 15], health care [16, 17], automated factories [18, 19], and the machine operation monitoring studied in this paper. In real-time change detection, each time a datum is input, the detector evaluates to what extent the input datum is likely to be a change point by a certain type of measuring score [20], without needing any input data after the change time. Real-time approaches have succeeded in many practical applications (e.g., wind turbine condition monitoring [21], driver vigilance monitoring [22]), and are thus promising [23].
The goal of this paper is to further advance this line of real-time detection methods. More specifically, our main contributions are twofold. The first contribution is to apply a martingale-based framework proposed in our recent article [24] to long-term machine monitoring by combining it with an incremental sliding-window strategy. The basic idea of the original martingale framework is to learn a statistical regularity directly from the already observed data, and then detect possible change(s) by investigating how much each datum deviates from this regularity, using a martingale to test exchangeability. That framework, however, only works for at-most-one-point change detection, and is thus unsuitable for cases containing multiple changes in long-term monitoring applications. In this paper, we introduce an incremental sliding-window strategy to solve this problem.
Recall that the threshold value for change decision making is a key factor in detection accuracy. A potential weakness of the majority of existing algorithms, e.g., Refs. [5–11], is that they need either human instruction/intervention or an offline cross-validation to set the threshold value before operation, which largely limits them in real applications. Another contribution of this paper is to develop a new adaptive threshold for change decision making. In particular, instead of a fixed threshold value, we introduce an alternative factor: a fixed global parameter used to control the coarse-to-fine level of detection (see Section 3.2 for details). Using this factor, at each step of the change decision, the threshold value can be adaptively computed from the already observed data.
Besides these methodological extensions, we also conducted validations on an experimental setup to investigate the effectiveness and superiority of the method for change detection with large datasets. For more details, see Section 4.
The rest of the paper is structured as follows. Section 2 presents the outline of the martingale framework for machine monitoring. Section 3 formulates the problems addressed in this paper and presents our proposed methods, followed by experimental results in Section 4. Section 5 concludes the paper and outlines future work.
2 Martingale Based Change Detection for Machinery Monitoring
Assume that we have collected a data stream composed of numerical CM variables from an operational process of a machine, i.e., \(X = \{x_{1}, \ldots, x_{i}, \ldots, x_{n}\}\), where \(x_{i}\), \(i \in \{1, 2, \ldots, n\}\), is the variable value at time \(i\). Three points are provided in the following to support the use of the martingale for change detection:
(1)
Changes are detected by testing the null hypothesis that all \(n\) strangeness values \(s_{1}, s_{2}, \ldots, s_{n}\) (which correspond to \(x_{1}, x_{2}, \ldots, x_{n}\), respectively) are exchangeable in the index, through the corresponding exchangeability martingale \(M_{1}, M_{2}, \ldots, M_{n}\), where \(M_{n}\) is a measurable function of \(s_{1}, s_{2}, \ldots, s_{n}\) satisfying
$$M_{n} = E(M_{n + 1} \mid M_{1}, M_{2}, \ldots, M_{n}).$$
(1)
(2)
The following Doob's inequality [25] can be used to reject this null hypothesis for a large value of \(M_{n}\):
$$P(\exists n: M_{n} \ge \lambda ) \le \frac{1}{\lambda }.$$
(2)
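As an aside (our illustration, not from the paper), the bound in Eq. (2) can be checked empirically: under the null hypothesis the randomized p-values are uniform on (0, 1], so the product martingale later given in Eq. (3) has unit expectation, and the fraction of simulated runs that ever reach \(\lambda\) should stay near or below \(1/\lambda\).

```python
import random

def doob_violation_rate(n_runs=2000, n_steps=200, eps=0.92, lam=20.0, seed=0):
    """Fraction of null runs in which the martingale ever reaches lam.

    Under exchangeable (here: i.i.d.) data, the randomized p-values are
    uniform on (0, 1], so M_t = prod eps * p_i^(eps - 1) is a martingale
    with E[M_t] = 1; Doob's inequality bounds the crossing rate by 1/lam.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        m = 1.0
        for _ in range(n_steps):
            p = 1.0 - rng.random()          # uniform on (0, 1]
            m *= eps * p ** (eps - 1.0)
            if m >= lam:                    # martingale crossed the threshold
                hits += 1
                break
    return hits / n_runs

rate = doob_violation_rate()
```

With these settings the empirical crossing rate is small, consistent with the bound \(1/\lambda = 0.05\).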
(3)
This (exchangeability) martingale is constructed from a p-value, the probability of obtaining a test statistic at least as extreme as the one actually observed; the p-value is in turn obtained from a strangeness value appropriately determined in each specific application.
On the basis of the above three points, the outline of performing the martingale test for change detection is described as follows (see Ref. [25] for more details):
Step 1: The randomized power martingale (RPM) [26] is constructed from the computed \(s_{1}, s_{2}, \ldots, s_{n}\) by
$$M_{t} = \prod\limits_{i = 1}^{t} \left( \varepsilon \hat{p}_{i}^{\varepsilon - 1} \right), \quad t \in \{1, 2, \ldots, n\},$$
(3)
where \(\varepsilon \in (0,\;1)\) and the \(\hat{p}_{i}\) are values computed from the p-value function
$$\hat{p}_{i} (\{ s_{1}, \ldots, s_{i - 1} \}, s_{i}) = \frac{\# \{ j: s_{j} > s_{i} \} + \theta_{i} \# \{ j: s_{j} = s_{i} \}}{i},$$
(4)
where \(\#\{ \bullet \}\) is a counting function, \(\theta_{i}\) is a random value drawn from the uniform distribution on [0, 1], and \(j \in \{1, 2, \ldots, i\}\) (including \(j = i\) keeps the p-value strictly positive).
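Eqs. (3) and (4) can be sketched in a few lines (a minimal illustration, not the authors' code; function names are ours, and the current strangeness value is counted among the ties so the p-value stays strictly positive):

```python
import random

def p_value(strangeness, s_new, theta):
    """Randomized p-value of Eq. (4): rank of s_new among observed values.

    `strangeness` holds s_1..s_{i-1}; s_new is s_i. The tie count includes
    s_new itself, so the result is always in (0, 1].
    """
    history = strangeness + [s_new]
    greater = sum(1 for s in history if s > s_new)
    equal = sum(1 for s in history if s == s_new)
    return (greater + theta * equal) / len(history)

def randomized_power_martingale(strangeness_stream, eps=0.92, seed=0):
    """Running RPM values M_1..M_n per Eq. (3)."""
    rng = random.Random(seed)
    past, m, out = [], 1.0, []
    for s in strangeness_stream:
        theta = 1.0 - rng.random()              # uniform on (0, 1]
        m *= eps * p_value(past, s, theta) ** (eps - 1.0)
        out.append(m)
        past.append(s)
    return out
```

A change is then declared at the first \(t\) with \(M_t \ge \lambda\), per Eq. (5): a run of unusually large strangeness values yields small p-values, and each small p-value multiplies the martingale by a factor greater than one.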
Step 2: Based on Doob's inequality, the following hypothesis test is then performed for each \(t \in \{1, 2, \ldots, n\}\):
$$\begin{aligned} H_{0}:\;\text{no change}: \quad 0 < M_{t} < \lambda, \\ H_{A}:\;\text{change occurs}: \quad M_{t} \ge \lambda. \end{aligned}$$
(5)
That is, if the martingale value \(M_{t}\) exceeds a predefined threshold \(\lambda\), \(H_{A}\) in Eq. (5) is satisfied, i.e., a change occurs at time \(t\). Otherwise, \(H_{0}\) holds and the martingale test continues to operate as long as \(0 < M_{t} < \lambda\).
3 Problem Formulation and Proposed Scheme
In Section 2, we provided the outline of the martingale test for change detection. Two problems have to be further considered for long-term machinery monitoring:
(1)
How to deal with multi-change detection in long-term monitoring?
(2)
Is it possible to adaptively compute the threshold value when making a change decision?
In the following, we discuss these two problems in detail and present our proposed schemes.
3.1 Change Detection Using Incremental SlidingWindow
Problem 1. As shown in Eq. (3) and Eq. (5), \(M_{t}\) can be sequentially processed with a fixed-length-\(L\) sliding window over the given data stream, and all possible change candidates \(t \in \{1, 2, \ldots, n\}\) are tested. This process, however, may be unsuitable for long-term monitoring applications. A key feature of real machine operations is temporal variation, i.e., one operation can last for a long time or only a few seconds. Hence, it is difficult for a fixed-length-\(L\) sliding window to capture transitions (i.e., changes from one operational state to another) in long-term monitoring. More specifically, a small length \(L\) causes over-detection, while a large length \(L\) causes a large detection delay. To overcome this problem, we combine the martingale with an incremental sliding-window strategy [27] to design a real-time change detection algorithm for Eq. (5).
Proposed scheme: By virtue of the incremental sliding window, the length \(L\) can be automatically updated depending on whether a change is detected at time \(t\):
$$\begin{aligned} &\text{If } t \text{ is not a change}: \quad n_{t + 1} = n_{t}, \quad L_{t + 1} = L_{t} + \Delta L, \\ &\text{If } t \text{ is a change}: \quad n_{t + 1} = t, \quad L_{t + 1} = L_{1}, \end{aligned}$$
(6)
where \(n_{t}\) is the starting time when computing the current martingale and \(L_{t}\) is the length of the corresponding sliding window at time \(t\). The process starts with \(n_{1} = 1\) and \(L_{1} = 1\), and ends when \(n_{t} + L_{t} > n\), where \(n\) is the length of the given data stream in offline applications, or at a predefined stopping time in online applications. Here, it is worth mentioning that \(\Delta L\) is the increment used to grow the sliding window; it was set to 1 in the following experiments.
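The update rule of Eq. (6) can be wrapped around any per-window change test; in the sketch below (ours, with 0-based indexing), `is_change` is a caller-supplied placeholder for the martingale test of Eq. (5):

```python
def incremental_window_detection(stream, is_change, delta_L=1):
    """Scan `stream` with the incremental sliding window of Eq. (6).

    `is_change(window)` is any change test applied to the current window
    (in the paper, the martingale test of Eq. (5)). Returns the detected
    change times as 0-based indices.
    """
    n = len(stream)
    n_t, L_t, L_1 = 0, 1, 1
    changes = []
    while n_t + L_t <= n:
        t = n_t + L_t - 1                  # current end of the window
        window = stream[n_t:n_t + L_t]
        if is_change(window):
            changes.append(t)
            n_t, L_t = t, L_1              # restart the window at the change
        else:
            L_t += delta_L                 # no change: grow the window
    return changes
```

For example, with a toy mean-shift predicate that flags a point far from the rest of its window, the single jump in `[0]*20 + [10]*20` is detected at index 20.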
3.2 Adaptive Threshold for Change Detection
Problem 2. When making a change decision by testing the null hypothesis in Eq. (5), the threshold \(\lambda\) is essential, as it balances the detection precision and recall (their definitions are given in Section 4.3). In general, the value of \(\lambda\) is predefined empirically or confirmed by a prior estimation. It is, however, often difficult to find the optimal value in real-world applications. To address this problem, unlike existing works that directly use the original monitored variables for change detection (e.g., Refs. [6–12, 25]), we utilize the Hilbert space embedding of distributions (HED, also called the kernel mean or mean map) to map the original data \(\{x_{i}\}\), \(i \in \{1, 2, \ldots, n\}\), into a reproducing kernel Hilbert space (RKHS) (see Figure 1). Without going into details, the idea of using the HED for change detection is straightforward: the probability distribution is represented as an element of an RKHS, and a change can then be detected by using a well-behaved smoothing kernel function whose values are small on data belonging to the same pattern and large on data from different patterns.
Proposed scheme: Inspired by Ref. [28], probability distributions can be embedded in an RKHS. At the core of the HED are the mean mapping functions
$$\begin{aligned} \mu (P_{x}) &= E(k(\{ x_{i} \})), \\ \mu (\{ x_{i} \}) &= \frac{1}{t}\sum\limits_{i = 1}^{t} k(x_{i}), \end{aligned}$$
(7)
where \(\{x_{i}\}\), \(i = 1, 2, \ldots, t\), are assumed to be i.i.d. samples from the distribution \(P_{x}\). Under mild conditions, \(\mu(P_{x})\) (similarly \(\mu(\{x_{i}\})\)) is an element of the Hilbert space. The mapping \(\mu(P_{x})\) is attractive because each data point \(x_{i}\) has a one-to-one correspondence with it. Thus, we can use the function norm \(s(\mu (P_{x}), k(x_{t}))\) (instead of \(s(P\{ x_{1}, x_{2}, \ldots, x_{t - 1} \}, x_{t})\) used in Ref. [1]) to quantify the strangeness value \(s_{t}\) for \(x_{t}\). We do not need access to the actual distribution \(P_{x}\); finite samples suffice to estimate its embedding.
Lemma 1. As long as the Rademacher average [29], which measures the "size" of a class of real-valued functions with respect to a probability distribution, is well behaved, the finite-sample error converges to zero; thus finite samples empirically approximate \(\mu(P_{x})\) (see Ref. [28] for more details).
The success of kernel methods largely depends on the choice of the kernel function \(k\), which is chosen according to domain knowledge or from universal kernels. In this paper, we employ the widely used Gaussian radial basis function (RBF) kernel
$$k(x_{i}) = \exp \left( - \frac{1}{2\sigma^{2}} \left\| x_{i} - \bar{x} \right\|^{2} \right),$$
(8)
where \(\bar{x}\) and \(\sigma\) are the sample mean and standard deviation of the data stream \(\{ x_{1}, x_{2}, \ldots, x_{i} \}\). We next construct \(s_{t}\) to measure the strangeness of \(x_{t}\) with respect to the past data stream up to time \(t - 1\), i.e., \(\{ x_{1}, x_{2}, \ldots, x_{t - 1} \}\), in the RKHS, as
$$s_{t} = s(\mu (P_{x}), k(x_{t})) = \left\| k(x_{t}) - k_{t - 1}^{c} \right\|,$$
(9)
where \(k_{t - 1}^{c}\) is the kernel center of the data stream and \(\left\| \bullet \right\|\) is a distance metric. Here, it is worth mentioning that in real engineering scenarios the CM variables often comprise multi-dimensional values measured from multiple sensors at each time instant; we thus use the Mahalanobis distance [30] to compute the strangeness \(s_{t}\), taking correlations between variables into account so that different patterns in each dimension can be identified and analyzed [30]:
$$s_{t} = \sqrt{(k(x_{t}) - k_{t - 1}^{c})^{\prime}\, \Sigma^{-1}\, (k(x_{t}) - k_{t - 1}^{c})},$$
(10)
where \(\Sigma\) is the covariance matrix.
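Eqs. (8)–(9) can be sketched for a one-dimensional stream as follows (our illustrative reading: the kernel center \(k_{t-1}^{c}\) is taken as the running mean of the past kernelized points, and the Mahalanobis form of Eq. (10) reduces to an absolute difference in one dimension):

```python
import math

def rbf_strangeness(stream):
    """Strangeness values per Eqs. (8)-(9) for a 1-D data stream.

    Each point is kernelized against the running sample mean and standard
    deviation (Eq. (8)); its strangeness is the distance from the running
    kernel center (Eq. (9)). The multivariate case would replace the
    absolute value with the Mahalanobis distance of Eq. (10).
    """
    s, history, kernels = [], [], []
    for x in stream:
        history.append(x)
        mean = sum(history) / len(history)
        var = sum((v - mean) ** 2 for v in history) / len(history)
        sigma = math.sqrt(var) or 1.0          # guard for a constant prefix
        k_x = math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))
        if kernels:
            center = sum(kernels) / len(kernels)   # running kernel center
            s.append(abs(k_x - center))
        else:
            s.append(0.0)
        kernels.append(k_x)
    return s
```

On a stream of identical values followed by an outlier, the strangeness stays at zero and then spikes near one at the outlier, which is exactly what feeds the p-value of Eq. (4).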
Since we use the RBF kernel given in Eq. (8), an isolated data point can be certified if \(s_{t} \ge \alpha \cdot \sigma\), where \(\alpha\) is a fixed global factor controlling the confidence level of detection and \(\sigma\) is the standard deviation computed from the existing data (that is, an adaptive threshold). Based on this fact, the kernelized change decision can be made by rewriting Eq. (5) as follows:
$$\begin{aligned} \text{If}\;\; 0 < M_{t} < \alpha \cdot K \cdot \sigma_{t - 1}: \;\;\text{no change}, \\ \text{If}\;\; M_{t} \ge \alpha \cdot K \cdot \sigma_{t - 1}: \;\;\text{change occurs}, \end{aligned}$$
(11)
where \(K\) is a projection coefficient of the data from the RKHS to the martingale space and \(\sigma_{t - 1}\) can be computed adaptively from the data stream up to time \(t - 1\). In real implementations, the employed Gaussian function is often standardized to a normal Gaussian (i.e., \(\mu = 0\) and \(\sigma = 1\)). Figure 2 gives typical confidence levels corresponding to different \(\alpha\) in a Gaussian distribution. Thus \(K\) can be fixed as \(K \approx 2.17\) by an offline estimation, made as follows: (a) given a set of data streams containing changes, we first define the detection accuracy as \(q = N/P\), where \(N\) is the number of correctly detected changes and \(P\) is the total number of detected changes, and set a threshold value \(\lambda^{*}\) that guarantees a perfect accuracy, i.e., \(q = 100\%\); (b) then, we decrease the value of \(\lambda^{*}\) gradually as long as \(q\) does not decrease; (c) once \(q\) decreases, \(K\) is computed by
$$K = \frac{\lambda^{*}}{5 \cdot \sigma},$$
since five standard deviations guarantee that all changes can be detected.
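The decision rule of Eq. (11) then amounts to comparing each martingale value against a threshold recomputed at every step (a sketch under our reading that \(\sigma_{t-1}\) is the standard deviation of the data stream observed so far; the defaults \(\alpha = 3.0\) and \(K = 2.17\) follow the paper's values):

```python
import math

def adaptive_change_decisions(data, martingale_values, alpha=3.0, K=2.17):
    """Apply the adaptive test of Eq. (11) along a stream.

    At each time t, flag a change when M_t >= alpha * K * sigma_{t-1},
    with sigma_{t-1} the standard deviation of the data seen so far; the
    threshold is therefore recomputed at every step rather than fixed.
    """
    decisions, past = [], []
    for x, m in zip(data, martingale_values):
        if len(past) >= 2:
            mu = sum(past) / len(past)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in past) / len(past))
        else:
            sigma = 0.0                    # too little history: no decision
        threshold = alpha * K * sigma
        decisions.append(sigma > 0 and m >= threshold)
        past.append(x)
    return decisions
```

In contrast to the fixed-\(\lambda\) test of Eq. (5), only the global factor \(\alpha\) must be chosen in advance; the threshold itself tracks the observed data.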
4 Experimental Verification
In this section, we aim to demonstrate the effectiveness and superiority of the proposed adaptive change detection algorithm for long-term machine monitoring. Experiment I and Experiment II are conducted to answer the following two questions:
(1)
Will the proposed incremental sliding window be more suitable than the fixed-length sliding window for long-term machine monitoring?
(2)
Can the adaptive detection algorithm be effective for change detection and how does it perform with large datasets?
4.1 Experimental Setup
Figure 3 shows the experimental setup, where various motor speed inputs were applied to simulate changes in real machinery operations. We used the sound signal as the tested CM signal due to its broad applicability in practice. The CM sound signals were acquired by a microphone mounted on the gearbox and then sent through acquisition equipment to a PC.
Figure 4 shows a data stream of collected CM sound signals, in which three changes indicating transitions from one operational state to another were labeled by a human instructor.
4.2 Experiment I: Performance of Incremental SlidingWindow
In our method, we propose to use an incremental sliding window, instead of a fixed-length sliding window, for long-term machine monitoring. To evaluate the effectiveness and superiority of this strategy for change detection, a set of threshold values from 1 to 20 at an interval of 1 was checked. In particular, Figure 5 shows three detection cases using the incremental sliding window on the testing data stream (shown in Figure 4). It is apparent that the performance differs greatly with different values of \(\lambda\); the best performance is achieved when \(\lambda = 7\). Specifically, a smaller value of \(\lambda\) brings more false alarms, while a larger value incurs a larger detection delay. This again demonstrates the important role of the threshold value \(\lambda\) in change detection. In the following, since \(\lambda = 7\) has been shown to succeed on the testing data, we compare this performance with that of a fixed-length-\(L\) sliding-window detection. From Figure 6, which shows the results for the three cases \(L = \{3000, 5000, 7000\}\), we observe that the small length \(L = 3000\) causes over-detection (i.e., more false alarms) and the large length \(L = 7000\) causes a large detection delay; good detection performance is achieved at \(L = 5000\). Note, however, that it is often difficult to fix the window length \(L\), considering the great temporal variations of CM signals collected from real machine operations.
Overall, taking the results in Figure 5 and Figure 6 together, the following observations were made:
- The fixed-length sliding-window martingale requires more parameters, i.e., \(\lambda\) and \(L\), for change detection, and thus a more complicated prior estimation before usage;
- With the incremental sliding-window martingale, only one parameter, \(\lambda\), is required, which motivates extending the change detection with an adaptive threshold.
Both observations inspire the adaptive threshold given in Section 3.2, which is evaluated in the following.
4.3 Experiment II: Performance of Adaptive Threshold
In this section, we evaluate the proposed adaptive threshold for machine monitoring. Note that, since Section 4.2 has demonstrated the superiority of the incremental sliding window for long-term machine monitoring, here we only test the performance of the adaptive threshold combined with the incremental sliding window.
Figure 7 shows the results: all changes in the testing CM data (the same data shown in Figure 4) are successfully detected without any false alarms when \(\alpha\) is set to 3.0, but as \(\alpha\) decreases, more false alarms are produced. In addition, it is also observed that, based on the projection coefficient \(K\) from the RKHS to the martingale space and the global fixed confidence level \(\alpha\), the threshold value is computed and adjusted adaptively according to the standard deviation of the past data at each step of change decision making, as described in Eq. (11); in other words, the threshold value is not fixed over the whole process, in contrast to many existing works [7–11, 25]. All of these findings are consistent with the analysis in Section 3.2. Moreover, to verify the effectiveness of the adaptive threshold on large datasets, we collected streams containing 90 changes in total for evaluation. The performance evaluation is based on two retrieval performance indicators, precision and recall, defined respectively as
$$\begin{aligned} Precision &= \frac{\text{Number of Correct Detections of Changes}}{\text{Number of Detections of Changes}}, \\ Recall &= \frac{\text{Number of Correct Detections of Changes}}{\text{Number of True Changes}}. \end{aligned}$$
Precision is the probability that a detection is actually correct, i.e., a true change.
Recall is the probability that the detection recognizes a true change.
In addition, we also use a single performance indicator
\(F_{1}\) defined as
$$F_{ 1} \; = \;\frac{ 2\times Recall \times Precision}{Recall + Precision}.$$
Apparently, \(F_{1}\) is the harmonic mean of precision and recall, and a high value of \(F_{1}\) ensures a reasonably high balance between precision and recall.
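The three indicators can be computed as follows (a sketch; the matching `tolerance` parameter is our assumption, since the paper does not state how a detection is matched to a true change):

```python
def evaluate_detection(detected, true_changes, tolerance=0):
    """Precision, recall and F1 for a set of detected change times.

    A detection counts as correct when it lies within `tolerance` samples
    of some true change time (tolerance=0 means exact matches only).
    """
    correct = sum(
        1 for d in detected
        if any(abs(d - t) <= tolerance for t in true_changes)
    )
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(true_changes) if true_changes else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, detecting changes at times {10, 20, 35} against true changes {10, 20, 30} with zero tolerance gives precision, recall, and \(F_1\) all equal to 2/3.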
Figure 8 shows the detection performance for different values of \(\alpha \in\) {0.92, 1.84, 2.30, 2.50, 2.75, 3.00, 3.22, 3.69, 4.15, 4.61, 5.00}. Specifically, Figure 8(a) shows that with an increasing value of \(\alpha\), the detection precision increases and achieves the best performance when \(\alpha > 3\). On the other hand, for the recall shown in Figure 8(b), our proposed method always obtains a perfect performance of 100%, meaning that all true changes are successfully detected for every tested value of \(\alpha\). These results can be observed more clearly in Figure 8(c), where \(F_{1}\) achieves the best performance when \(\alpha > 3\). None of these results is surprising, because three standard deviations cover approximately 99.7% of the data points in a Gaussian distribution (as shown in Figure 2). Considering that a smaller \(\alpha\) brings a smaller detection delay, as shown in Figure 7, \(\alpha = 3\) is thus recommended when using the proposed method in real applications.
5 Conclusions
In this paper, we have extended our recent work [25] to long-term machine monitoring, with two proposed schemes: 1) using the incremental sliding window to solve the problem of multi-change detection; and 2) developing an adaptive threshold for change decision making. Experimental results on a laboratory setup demonstrated the success of the proposed method in multi-change detection for long-term monitoring. We thus conclude that the improved algorithm is feasible for a new generation of long-term machine monitoring systems. Further work will continue verifying the capability of the improved algorithm in detecting a wider range of changes during machine operation, to make it ready for commercial exploitation.
In addition, considering that detection delay is one of the essential aspects when designing a detection method, another line of future work is to extract informative features representing the raw collected data, in order to further decrease the detection delay of our method.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.