2.1 Anomaly detection procedure
Conventional signal frequency spectrum analysis is built on the Fourier transform (FT), which is a global transform. However, the FT provides no time localization and therefore cannot capture subtle, time-varying changes in the frequency spectrum. The wavelet transform (WT) may be viewed as an extension of the traditional FT with adjustable window locations and sizes. Compared with Fourier-based analyses, which use global sine and cosine functions as bases, the basis wavelets are local functions, each defined by two parameters: its scale (relating to frequency) and its position (relating to time). One drawback of the WT is that its frequency resolution is quite poor in the high-frequency region. The wavelet packet transform (WPT) is an extension of the WT that provides a complete level-by-level decomposition. The WPT enables the extraction of features from signals that combine stationary and nonstationary characteristics, with an arbitrary time-frequency resolution.
The wavelet packet transform of a time-domain signal $x(t)$ can be calculated using a recursive filter-decimation operation. After the signal $x(t)$ is decomposed into $j$ levels and the $2^j$ node signals are reconstructed as $x_j^i(t)$, the signal $x(t)$ can be expressed as

$$x(t) = \sum_{i=1}^{2^j} x_j^i(t) \qquad (1)$$
The node signal energies $E_j^i$ can be defined as

$$E_j^i = \int_{-\infty}^{+\infty} \left[ x_j^i(t) \right]^2 \, dt \qquad (2)$$
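As an illustration of Equations 1 and 2, the following minimal Python sketch computes the level-$j$ node signal energies with the PyWavelets library. The wavelet family (`db4`), the decomposition level, and the test signal are assumptions made for the example; for an orthogonal wavelet, the energy of a node's coefficients equals the energy of the reconstructed node signal, so the coefficients are used directly.

```python
import numpy as np
import pywt  # PyWavelets

def node_energies(x, wavelet="db4", level=3):
    """Decompose x into 2**level wavelet packet nodes and
    return the energy E_j^i of each node (Equation 2)."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # Nodes at the deepest level, ordered by frequency band.
    nodes = wp.get_level(level, order="freq")
    # For an orthogonal wavelet, coefficient energy equals the
    # energy of the reconstructed node signal.
    return np.array([np.sum(n.data ** 2) for n in nodes])

# Example: a noisy two-tone test signal (assumed for illustration).
t = np.linspace(0, 1, 1024, endpoint=False)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
x += 0.05 * np.random.randn(t.size)
E = node_energies(x)          # one energy value per node
print(E / E.sum())            # node energy ratios
```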
According to the theory of the WPT, each node signal contains the information of the original signal within a specific time-frequency window. Hence, Equation 2 shows that the node signal energy $E_j^i$ is the energy stored in the corresponding frequency band. When an anomaly occurs in the original signal, its frequency components change accordingly. Thus, anomaly detection can be achieved by monitoring the changing trend of $E_j^i$.
The wavelet packet components with small energy magnitudes are easily masked by measurement noise. Thus, in this paper, instead of observing the node signal energies individually, the criterion for anomaly detection (ADC) is designed as

$$\mathrm{ADC} = \sum_{i=1}^{m} \frac{\left| p_j^i - \bar{p}_j^i \right|}{\bar{p}_j^i} \qquad (3)$$
where $p_j^i = E_j^i / \sum_{i=1}^{2^j} E_j^i$ is the ratio of the node signal energy to the total signal energy and $\bar{p}_j^i$ is the reference baseline of the node signal energy ratio, obtained as the mean value over several successive baseline measurements. In order to suppress the noise effect, only the first $m$ dominant nodes are retained. Anomalies in the signal affect the wavelet node signal energies and consequently alter the criterion ADC. However, anomalies are not the only factor that influences the criterion; it is also affected by measurement noise. It is therefore essential to establish threshold values for the anomaly detection criterion so that anomalies can be separated from measurement noise with a large probability [11]. In this paper, the threshold values are fixed on the basis of statistical process control (SPC).
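A minimal sketch of the ADC computation follows, assuming the relative-deviation form of Equation 3 reconstructed above and reusing `node_energies` (and the imports and test signal) from the earlier listing; the value of $m$ and the synthetic baseline records are assumptions for illustration.

```python
def adc(x, baseline_ratios, m=4, wavelet="db4", level=3):
    """Anomaly detection criterion (Equation 3): relative deviation
    of the m dominant node energy ratios from their baseline."""
    E = node_energies(x, wavelet, level)
    p = E / E.sum()                          # current energy ratios
    # Indices of the m dominant nodes, ranked by baseline ratio.
    dom = np.argsort(baseline_ratios)[::-1][:m]
    return np.sum(np.abs(p[dom] - baseline_ratios[dom])
                  / baseline_ratios[dom])

# Assumed: several anomaly-free records of the same process
# (synthetic two-tone signals here, matching the earlier example).
baseline_signals = [np.sin(2 * np.pi * 50 * t)
                    + 0.5 * np.sin(2 * np.pi * 120 * t)
                    + 0.05 * np.random.randn(t.size)
                    for _ in range(20)]
ratios = []
for s in baseline_signals:
    E = node_energies(s)
    ratios.append(E / E.sum())
baseline = np.mean(ratios, axis=0)           # baseline ratios
print(adc(x, baseline))                      # small if x is anomaly-free
```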
SPC was proposed by W.A. Shewhart; statistical process control charts describe the output characteristics of a process in a coordinate graph. The technique was originally proposed to monitor manufacturing processes so as to reduce variation and build quality into the product. A control chart includes a center line, an upper control limit (UCL), and a lower control limit (LCL). For anomaly detection applications, the core of the technique is to establish control limits that enclose, with a large probability, the variation of the extracted criterion due to measurement noise.
Assume that $m$ sets of time-domain signals are measured continuously under the same status; in other words, no anomalies occur during the measurements. According to Equations 1, 2, and 3, a total of $p$ ADCs can be acquired using the average node energies as the reference baseline. Furthermore, the mean value $\mu_{\mathrm{ADC}}$ and the standard deviation $S_{\mathrm{ADC}}$ of the $p$ ADCs can be obtained. On the basis of SPC theory, an X-bar control chart is used to determine the threshold values of the ADC. Suppose that the $p$ ADCs are divided into subgroups of size $q$. Then, the one-sided $1 - \alpha$ upper confidence limit for the average ADC of a subgroup can be defined as

$$\mathrm{UCL}_{\alpha} = \mu_{\mathrm{ADC}} + Z_{\alpha} \frac{S_{\mathrm{ADC}}}{\sqrt{q}} \qquad (4)$$

where $Z_{\alpha}$ is the value of the standard normal distribution with zero mean and unit variance such that the cumulative probability is $100(1 - \alpha)\%$. The level $\mathrm{UCL}_{\alpha}$ can be regarded as the threshold value of the criterion. Therefore, if no anomalies occur, the average ADC of a subsequent subgroup falls below $\mathrm{UCL}_{\alpha}$ with a high probability of $100(1 - \alpha)\%$. Conversely, when the average ADC of a subgroup exceeds the limit, an anomaly is indicated. It should be noted, however, that SPC rests on the statistical principle of hypothesis testing, so two types of testing errors (false alarms and missed detections) are possible. Usually, the confidence limit can be improved by increasing the sizes $p$ and $q$. From the anomaly detection procedure, it can be seen that no training data are required to construct a mathematical model for anomaly detection. That is to say, the proposed algorithm belongs to unsupervised anomaly detection and can operate online.
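The following sketch illustrates Equation 4 and the subgroup test: it estimates $\mu_{\mathrm{ADC}}$ and $S_{\mathrm{ADC}}$ from anomaly-free ADC values, computes $\mathrm{UCL}_{\alpha}$, and flags each subgroup whose average ADC exceeds it. The significance level and subgroup size are assumptions for the example.

```python
import numpy as np
from scipy.stats import norm

def ucl(adc_baseline, q=5, alpha=0.01):
    """One-sided upper control limit for subgroup means (Equation 4)."""
    mu = np.mean(adc_baseline)           # mean of the p baseline ADCs
    s = np.std(adc_baseline, ddof=1)     # standard deviation S_ADC
    z = norm.ppf(1.0 - alpha)            # Z_alpha of the standard normal
    return mu + z * s / np.sqrt(q)

def monitor(adc_stream, limit, q=5):
    """Yield True for each subgroup whose mean ADC exceeds UCL_alpha."""
    for k in range(0, len(adc_stream) - q + 1, q):
        yield np.mean(adc_stream[k:k + q]) > limit

# Usage (assumed data): `adc_baseline` holds p anomaly-free ADC values,
# `adc_new` holds ADCs from subsequent measurements.
#   limit = ucl(adc_baseline)
#   alarms = list(monitor(adc_new, limit))
```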
2.2 Compressive sensing theory
Although the proposed algorithm can overcome noise interference and achieve anomaly detection with high probability, it relies on the complete data, which greatly limits its application in the big data field. Compressive sensing (CS) theory overcomes the limitation of the Nyquist sampling theorem and can acquire and compress data simultaneously. The theory thus provides a feasible basis for applying the proposed anomaly detection algorithm in the big data field.
For a signal $x \in \mathbb{R}^N$, it can be expressed as

$$x = \Phi \theta \qquad (5)$$
where $\Phi$ is the $N \times N$ orthonormal transform basis and $\theta$ is the expansion coefficient vector under that basis. If the signal $x$ is $K$-sparse, that is, only $K$ elements of the vector $\theta$ are nonzero and $K$ is far less than $N$, then according to compressive sensing theory the signal $x$ can be collected with a small set of nonadaptive linear measurements, described as follows [12-18]:

$$y = \Psi x = \Psi \Phi \theta \qquad (6)$$

where $\Psi$ is an $M \times N$ random measurement matrix with $M < N$. Here, the pair $(\Phi, \Psi)$ must satisfy the incoherence restriction.
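A minimal sketch of the measurement step in Equation 6 follows, using an assumed Gaussian random matrix for $\Psi$ (one common choice that satisfies the incoherence restriction with high probability) and, for simplicity, the identity as the sparsifying basis $\Phi$; the sizes $N$, $M$, and $K$ are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 512, 128, 10                   # assumed sizes, with M < N

# Sparse coefficient vector theta: K nonzero entries out of N.
theta = np.zeros(N)
support = rng.choice(N, K, replace=False)
theta[support] = rng.standard_normal(K)

Phi = np.eye(N)                          # identity basis for simplicity
Psi = rng.standard_normal((M, N)) / np.sqrt(M)  # Gaussian measurement matrix

x = Phi @ theta                          # Equation 5
y = Psi @ x                              # Equation 6: M compressive measurements
```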
When the above condition holds, the expansion coefficients $\theta$ can be reconstructed by solving the following $l_0$-norm constrained optimization problem:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_0 \quad \text{subject to} \quad y = \Psi \Phi \theta \qquad (7)$$
where the $\|\theta\|_0$ norm counts the number of nonzero components of $\theta$. However, solving Equation 7 is both numerically unstable and NP-complete. Instead of solving the $l_0$ minimization problem, nonadaptive CS theory seeks to solve the 'closest possible' tractable minimization problem, i.e., the $l_1$ minimization:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_1 \quad \text{subject to} \quad y = \Psi \Phi \theta \qquad (8)$$
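Equation 8 (basis pursuit) can be cast as a linear program by splitting $\theta = u - v$ with $u, v \ge 0$, so that $\|\theta\|_1 = \mathbf{1}^\top (u + v)$. A minimal sketch using `scipy.optimize.linprog`, continuing the variables from the previous listing:

```python
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||theta||_1 s.t. A @ theta = y as a linear program,
    with theta = u - v and u, v >= 0 (Equation 8)."""
    M, N = A.shape
    c = np.ones(2 * N)                   # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])            # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None),
                  method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

theta_hat = basis_pursuit(Psi @ Phi, y)
print(np.allclose(theta_hat, theta, atol=1e-4))  # recovery check
```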
If $x$ is $K$-sparse in the orthonormal basis, then only $M = O(K \log(N/K))$ random measurements need to be collected to recover the signal by an $l_1$-norm algorithm. Many recovery algorithms based on linear programming, convex relaxation, and greedy strategies have been proposed to solve Equation 8, such as matching pursuit (MP), orthogonal matching pursuit (OMP), StOMP, subspace pursuit (SP), and CoSaMP. Finally, the reconstructed signal can be given by $\hat{x} = \Phi \hat{\theta}$.
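As an illustration of the greedy strategies named above, here is a bare-bones orthogonal matching pursuit (OMP) sketch, assuming the sparsity level $K$ is known and reusing $\Psi$, $\Phi$, $y$, and $K$ from the earlier listing; production use would rely on a tested library implementation (e.g., scikit-learn's `OrthogonalMatchingPursuit`).

```python
def omp(A, y, K, tol=1e-10):
    """Greedy OMP: pick the column of A most correlated with the
    residual, then re-fit by least squares on the chosen support."""
    M, N = A.shape
    residual = y.copy()
    support = []
    theta_hat = np.zeros(N)
    for _ in range(K):
        # Column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the current support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    theta_hat[support] = coef
    return theta_hat

theta_omp = omp(Psi @ Phi, y, K)
x_hat = Phi @ theta_omp              # reconstructed signal per x̂ = Φθ̂
```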
According to compressive sensing theory, the acquired low-dimensional signal contains the main features of the original signal, provided that an appropriate measurement matrix is used. The frequency component features are therefore preserved in the gathered low-dimensional signal, and the proposed anomaly detection procedure for big data can be carried out in the compressed domain.