1 Introduction
- Applicability of multi-threading for similarity search over multiple time-series streams.
- Incremental update of the envelopes of new-coming time-series subsequences so that these envelopes can be immediately used in a lower bounding function.
2 Background
2.1 Dynamic Time Warping
2.2 Techniques to speedup Dynamic Time Warping
- Constraints This technique limits the number of cells evaluated in the accumulated cost matrix. Figure 3b depicts a Sakoe–Chiba band [16] that prevents pathological warping paths, in which a data point in one time-series sequence matches too many data points of another, as in Fig. 1b. The Sakoe–Chiba band constrains the warping window to the area defined by two lines parallel to the diagonal. Keogh and Ratanamahatana [10] showed that restricting the size of the warping window not only speeds up computation, because only part of the accumulated cost matrix needs computing, but also tightens the lower bounding property. The Sakoe–Chiba band works well in domains where an optimal warping path is expected to stay close to the diagonal of the accumulated cost matrix. The constraint works poorly if the time series record events that start and stop at very different times, because the warping path can then stray far from a linear warping path and nearly the entire matrix must be evaluated to find an optimal warping path.
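The banded computation described above can be sketched as follows. This is a generic Python illustration, not the paper's implementation; the function name and the squared-distance/square-root conventions are our own choices. Only cells with \(|i - j| \le r\) are filled:

```python
import math

def dtw_band(x, y, r):
    """DTW distance restricted to a Sakoe-Chiba band of half-width r:
    only cells of the accumulated cost matrix with |i - j| <= r are
    evaluated, so each row costs about 2r + 1 cells instead of m."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return math.sqrt(cost[n][m])
```

With r = 0 and equal-length sequences the function degenerates to the Euclidean distance, and with r ≥ max(n, m) it evaluates the full matrix.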
- Lower bounding This technique uses cheap-to-compute lower bounding functions to reduce the number of DTW distance computations when searching for the time-series sequence most similar to a given time-series pattern. Let F be a function of dimension reduction or feature extraction of a time-series sequence. A lower bounding function \(d_F\) satisfies the lower bounding property
$$\begin{aligned} d_F(F(C), F(Q)) \le DTW(C, Q). \end{aligned}$$(3)
The efficiency of \(d_F\) is evaluated in terms of time complexity and pruning power. The pruning power of \(d_F\) is its competence for the early detection of unpromising sequences, so that the naive DTW calculation is not applied to them in the post-processing phase. Let g be the number of unpromising sequences that \(d_F\) identifies, and G the total number of sequences examined in a similarity search. The pruning power of \(d_F\) is
$$\begin{aligned} \frac{100 \times g}{G}~\%. \end{aligned}$$(4)
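As a concrete instance of property (3), the sketch below implements a minimal version of the LB_\(_{Keogh}\) lower bound from [10]; the function name and the plain-list representation are our own assumptions. An envelope is built around C under a Sakoe–Chiba band of half-width r, and only the points of Q that leave the envelope contribute to the bound:

```python
import math

def lb_keogh(q, c, r):
    """LB_Keogh lower bound of DTW(q, c) under a Sakoe-Chiba band of
    half-width r. An upper/lower envelope is built around c; each point
    of q outside the envelope contributes its squared distance to the
    nearest envelope side, so the result never exceeds DTW(q, c)."""
    total = 0.0
    for i, qi in enumerate(q):
        win = c[max(0, i - r): i + r + 1]   # band-restricted window of c
        lo, hi = min(win), max(win)
        if qi > hi:
            total += (qi - hi) ** 2
        elif qi < lo:
            total += (lo - qi) ** 2
    return math.sqrt(total)
```

In a search, a sequence whose bound already exceeds the best-so-far distance is pruned without any DTW call; the fraction of sequences discarded this way is exactly the pruning power of (4).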
- Early abandoning This technique is based on a comparison of distances with a threshold \(\varepsilon \). Similarity search over two time-series sequences C and Q checks f(i, j) in formula (1) against the condition \(f(i, j) \le \varepsilon \). If the value of f exceeds \(\varepsilon \), then C is not similar to Q and the computation of DTW(C, Q) is stopped immediately. Works that use this technique to accelerate the DTW calculation include [11, 18]. Notice that early abandoning with a threshold \(\varepsilon \) can also be used in the calculation of the Euclidean distance. Moreover, the LB_\(_{Keogh}\) lower bound can be used for early abandoning of the DTW calculation as follows. While the classical DTW calculation is incrementally computed from left to right over two sequences Q and C (e.g. from 1 to k), if the partial DTW accumulation plus the LB_\(_{Keogh}\) contribution from k + 1 to n exceeds \(\varepsilon \), then the naive DTW calculation is aborted right away, since the sum
$$\begin{aligned} DTW(Q_{1 : k}, C_{1 : k}) + LB\__{Keogh}(Q_{k + 1 : n}, C_{k + 1 : n}) \end{aligned}$$
is a lower bound of \(DTW(Q_{1 : n}, C_{1 : n})\). Hence, during the calculation of \(LB\__{Keogh}(Q, C)\), an array of cumulative bounds is obtained from the lower bounding function. The kth element of this array is \(LB\__{Keogh}(Q_{k : n}, C_{k : n})\).
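The cumulative-bound scheme above can be sketched as follows. This is our own minimal Python illustration, not the paper's implementation, and it works in squared space: the threshold eps_sq and the return value are squared distances, whereas the text compares with \(\varepsilon \) directly.

```python
import math

def cumulative_bounds(q, c, r):
    """cb[k] accumulates the squared LB_Keogh contributions of the
    suffix positions k..n-1, so cb[k] lower-bounds the squared
    DTW(q[k:], c[k:]); cb[n] = 0. Computed right to left."""
    n = len(q)
    cb = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        win = c[max(0, i - r): i + r + 1]
        lo, hi = min(win), max(win)
        d = 0.0
        if q[i] > hi:
            d = (q[i] - hi) ** 2
        elif q[i] < lo:
            d = (lo - q[i]) ** 2
        cb[i] = cb[i + 1] + d
    return cb

def dtw_early_abandon(q, c, r, eps_sq):
    """Banded DTW that is abandoned as soon as the cheapest partial
    accumulation after row i, plus the suffix bound cb[i], exceeds the
    squared threshold eps_sq. Returns the squared DTW distance, or
    None if abandoned (c cannot be within the threshold of q)."""
    n = len(q)
    cb = cumulative_bounds(q, c, r)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        best = math.inf
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
            best = min(best, cur[j])
        # Any full warping path crosses row i, so its prefix cost is at
        # least best; best + cb[i] therefore lower-bounds the full
        # squared DTW, and exceeding eps_sq lets us abort early.
        if best + cb[i] > eps_sq:
            return None
        prev = cur
    return prev[n]
```

Running cumulative_bounds right to left is what makes cb[k] the suffix contribution from position k onward, matching the array of cumulative bounds described above.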
2.3 Data normalization
2.4 Typical tasks of similarity search in streaming time series
- Best-so-far search: Finding the NX that is most similar to NY, that is, the NX for which DTW(NX, NY) is smallest. The smallest value recorded until time tick n is the best-so-far value, and X is the best-so-far subsequence of Y.
- Range search: Given a threshold \(\varepsilon \), finding every NX such that \({DTW(NX, NY) \le \varepsilon }\). Notice that \(\varepsilon \) is also referred to as a range radius of Y. Because similar subsequences are likely to overlap, Range search is modified into Disjoint query: given all overlapping resultant NXs, Disjoint query chooses the one with the smallest DTW(NX, NY).
- k-nearest neighboring (k-NN) search: Given a positive integer k, finding a set kS of k NXs similar to NY such that if \(NX^{\prime }\notin kS\), then \(\forall NX \in kS, DTW(NX, NY) \le DTW(NX^{\prime }, NY)\). Note that if \(k = 1\), this search type becomes best-so-far search.
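The three tasks differ only in what is retained from the stream of computed DTW distances. A minimal sketch of the retention logic for k-NN search follows; the function name and the (id, distance) input format are our own assumptions, and with k = 1 it reduces to best-so-far search:

```python
import heapq

def knn_over_stream(pairs, k):
    """Keep the k smallest distances seen so far over a stream of
    (subsequence_id, distance) pairs. A max-heap (distances negated)
    holds the current k best, so each new candidate is either rejected
    or replaces the current worst in O(log k)."""
    heap = []
    for sid, d in pairs:
        if len(heap) < k:
            heapq.heappush(heap, (-d, sid))
        elif d < -heap[0][0]:          # better than the current worst
            heapq.heapreplace(heap, (-d, sid))
    return sorted((-nd, sid) for nd, sid in heap)
```

Range search would instead keep every pair with distance at most \(\varepsilon \), with a post-pass to resolve overlaps for the Disjoint query.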
3 Related work
Notation | Meaning |
---|---|
T | A static time series |
s | A big section of T |
q | A time-series pattern for query |
nq | The normalized pattern of q |
q.l | The length of q |
q.r | The width r of the Sakoe–Chiba band of q |
\(E_s\) | The envelope of s constructed with q.r |
q.bsf | The best-so-far value of q over T |
c | A new-coming subsequence of S, corresponding with q |
coef | The normalization coefficients of c |
nc | The normalized pattern of c |
cb1, cb2 | Two arrays of cumulative bounds |
4 The proposed methods
4.1 Incremental data normalization
- Incremental min–max normalization In the beginning, an ascending numeric array is created from the data points of X with Quicksort, so \(x_{\mathrm{min}}\) is the first element and \(x_{\mathrm{max}}\) is the last element of the ordered array. When a new data point arrives, the oldest data point of X is deleted from the array, and the new data point is inserted into it. The deletion and insertion must preserve the ascending order of the array, so Binary search is used to find the element to delete and the position at which to insert the new data point. As Quicksort is carried out only once, when the array of new-coming data points first becomes full at the beginning of the similarity search, and Binary search is invoked for every new-coming data point afterward, the time complexity of incremental min–max normalization is \(\mathcal {O}(\log (n))\).
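A sketch of the scheme just described, using Python's bisect module for the binary searches (the class name is ours). The \(\mathcal {O}(\log (n))\) figure covers the searches; the underlying array shifts on deletion and insertion are linear, but cheap in practice:

```python
from bisect import bisect_left, insort
from collections import deque

class IncrementalMinMax:
    """Sliding-window min-max bookkeeping: a sorted copy of the window
    is maintained so that each slide needs one binary-search deletion
    and one binary-search insertion instead of a full re-sort."""

    def __init__(self, window):
        self.window = deque(window)       # arrival order
        self.ordered = sorted(window)     # sorted once at start-up

    def slide(self, x_new):
        x_old = self.window.popleft()
        self.window.append(x_new)
        del self.ordered[bisect_left(self.ordered, x_old)]  # binary search
        insort(self.ordered, x_new)                         # binary search

    def min_max(self):
        return self.ordered[0], self.ordered[-1]

    def normalize(self):
        lo, hi = self.min_max()
        span = (hi - lo) or 1.0           # guard against a constant window
        return [(x - lo) / span for x in self.window]
```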
- Incremental z-score normalization Let us define
$$\begin{aligned} x^2 = \sum _{i=1}^{n}x_i^2. \end{aligned}$$(11)
Equation (10) can then be expressed as
$$\begin{aligned} \sigma ^2=\frac{x^2}{n}-\mu ^2. \end{aligned}$$(12)
At first, Eq. (9) is used to compute \(\mu \) and Eq. (11) is used to compute \(x^2\). Next, when there is a new-coming data point \(x_{n+1}\) deriving from the evolution of X, we compute
$$\begin{aligned} \mu _{\mathrm{new}} = \mu +\frac{x_{n + 1} - x_1}{n} \end{aligned}$$(13)
$$\begin{aligned} \text {and} \quad x^2_{\mathrm{new}} = x^2 + x_{n + 1}^2 - x_1^2. \end{aligned}$$(14)
Therefore, we do not need to compute \(\mu _{\mathrm{new}}\) and \(x^2_{\mathrm{new}}\) from scratch. Note that the time complexity of incremental z-score normalization is higher than that of incremental min–max normalization, since costlier arithmetic operators are involved: the square to compute \(x^2_{\mathrm{new}}\) and the square root to compute \(\sigma \). Notice that, because floating-point error accumulates in the implementation of incremental z-score normalization, \(\mu _{\mathrm{new}}\) and \(x^2_{\mathrm{new}}\) are fully recalculated by Eqs. (9) and (11), respectively, once every 100,000 incoming data points of a time-series stream to flush out any accumulated error.
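The running-sum updates of Eqs. (13) and (14) and the periodic exact refresh can be sketched as follows; the class name and the placement of the 100,000-point reset counter are our own choices:

```python
import math
from collections import deque

class IncrementalZScore:
    """Sliding-window mean and sum of squares maintained with O(1)
    updates per new point (Eqs. 13 and 14); sigma follows from
    Eq. (12). Every RESET new points both sums are recomputed exactly
    to flush accumulated floating-point error."""

    RESET = 100_000

    def __init__(self, window):
        self.window = deque(window)
        self.n = len(window)
        self.seen = 0
        self._recompute()

    def _recompute(self):
        self.mu = sum(self.window) / self.n         # mean, as in Eq. (9)
        self.x2 = sum(x * x for x in self.window)   # Eq. (11)

    def slide(self, x_new):
        x_old = self.window.popleft()
        self.window.append(x_new)
        self.seen += 1
        if self.seen % self.RESET == 0:
            self._recompute()                       # periodic exact refresh
        else:
            self.mu += (x_new - x_old) / self.n         # Eq. (13)
            self.x2 += x_new * x_new - x_old * x_old    # Eq. (14)

    def sigma(self):
        # max(..., 0.0) guards the tiny negative values floating-point
        # error can produce before a refresh.
        return math.sqrt(max(self.x2 / self.n - self.mu ** 2, 0.0))  # Eq. (12)
```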
4.2 Problem definition
Notation | Meaning |
---|---|
QSet | The set of patterns |
q.ep | The range radius of q |
q.RSet | The range set of q |
S | A streaming time series |
\(T_n\) | The new-coming data value at time point n of S |
\(T_{n-1}\) | The new-coming data value at time point \(n - 1\) of S |
\(E_c\) | The envelope of c |
4.3 SUCR-DTW
4.4 ESUCR-DTW
5 Experimental evaluation
5.1 Evaluation of SUCR-DTW
No. | Time-series file | Length |
---|---|---|
1 | Data.txt to demonstrate UCR Suite | 1,000,000 |
2 | Revised CinC_ECG_torso_TEST | 2,261,820 |
3 | Revised InlineSkate_TEST | 1,035,100 |
4 | Revised NonInvasiveFatalECG_Thorax1TEST | 1,271,250 |
5 | Revised uWaveGestureLibrary_X_TEST | 1,128,330 |
Total points | 6,695,500 |
Method | LB_\(_{Kim}\) (%) | LB_\(_{Keogh}\) (%) | Reversed LB_\(_{Keogh}\) (%) |
---|---|---|---|
Pattern set 1 | |||
SUCR-DTW | 54.66 | 34.67 | 9.95 |
UCR-DTW | 54.67 | 34.66 | 9.90 |
Pattern set 2 | |||
SUCR-DTW | 55.45 | 34.41 | 9.44 |
UCR-DTW | 55.45 | 34.40 | 9.39 |
Pattern set 3 | |||
SUCR-DTW | 51.47 | 39.17 | 8.69 |
UCR-DTW | 51.47 | 39.17 | 8.65 |
5.2 Evaluation of ESUCR-DTW
No. | Time-series file | Length |
---|---|---|
1 | Revised Adiac_TEST | 68,816 |
2 | Revised Adiac_TRAIN | 68,640 |
3 | Revised FISH_TEST | 81,025 |
4 | Revised FISH_TRAIN | 81,025 |
5 | Revised MedicalImages_TEST | 75,240 |
Total points | 374,746 |
\({(\alpha , \beta )}\) | # of same similar subsequences |
---|---|
(1, 1) | 8 |
(2, 2) | 6 |
(3, 3) | 9 |
(4, 4) | 4 |
(5, 5) | 2 |
6 Conclusions and future work
- SUCR-DTW has the same precision as UCR-DTW and runs faster than the original UCR-DTW without multi-threading.
- The envelopes incrementally updated in SUCR-DTW are tighter than those constructed once in UCR-DTW.
- Similar subsequences obtained by ESUCR-DTW are better than those obtained by SUCR-DTW, yet the former spends more time than the latter.
- With regard to ESUCR-DTW, the best-so-far subsequences obtained with incremental z-score normalization rarely coincide with those obtained with incremental min–max normalization.
- ESUCR-DTW often returns similar subsequences longer than the patterns.
- In addition, ESUCR-DTW using incremental z-score normalization is slower than ESUCR-DTW using incremental min–max normalization.
- Best-so-far values obtained by ISPRING are less than or equal to those obtained by ESUCR-DTW, yet ISPRING often returns best-so-far subsequences of unreasonable lengths; that is, many similar subsequences obtained by ISPRING are too short in comparison with the lengths of the patterns.