Published in: GeoInformatica 1/2024

Open Access 08-06-2023

Rotation invariant GPS trajectory mining

Authors: Maximilian Leodolter, Claudia Plant, Norbert Brändle



Abstract

Mining of GPS trajectories of moving vehicles and devices can provide valuable insights into urban systems, planning and operational applications. Understanding object motion often requires that the spatial-temporal matching of trajectories be invariant to shifting, scaling and rotation. To this end, Procrustes analysis can transform one data set of a trajectory to represent another data set as closely as possible. We propose a novel shift-scale-rotation invariant Procrustes distance metric based on the Kabsch algorithm, which calculates the optimal rotation matrix by minimizing the root-mean-square deviation between two paired sets of points of trajectories or trajectory segments. We present two novel runtime-efficient algorithms based on this distance metric: 1) the sliding-shifting-scaling-Kabsch-rotation (S3KR) algorithm for detecting recurring short query patterns in longer motion trajectories, and 2) a novel time series subsequence clustering algorithm to group GPS trajectory data and discover prototypical patterns. We demonstrate the potential of our proposed sliding Procrustes analysis algorithms by applying them to real-world GPS trajectories collected in urban and rural areas from different transport modes, as well as to nautical GPS trajectories. We also demonstrate that our methods outperform the state of the art in accuracy and runtime on synthetic and real-world data.

1 Introduction

The increasingly wide use of mobile devices and sensors leads to a flood of machine-generated trajectory data about moving people, vehicles, vessels and other objects. A plethora of techniques for analyzing motion trajectory data has been developed in different domains to take advantage of this new wealth of motion data, e.g. [25]. Understanding motion often requires that the spatial-temporal matching of trajectories be invariant to shifting, scaling and rotation.
Traditional time series distances lack the capability to discover rotation invariant similarities, and therefore require some form of preprocessing of the time series [3, 19, 24]. As such preprocessing methods are sensitive to noise (as we demonstrate in Section  4.1), we propose a novel distance method based on Procrustes analysis [5] and demonstrate that it outperforms traditional methods from literature.
Procrustes analysis can be used to transform one set of data (e.g. a trajectory or shape) to represent another set of data as closely as possible. The Kabsch algorithm [6] solves the mathematical problem of finding the rotation matrix which minimizes the distance of the rotated shape to another shape. In order to perform shape comparisons which are invariant to translation/shift, scale and rotation for each time point of a long trajectory, a runtime efficient sliding algorithm is essential.
A disadvantage of traditional Procrustes analysis is that it is computationally too expensive for exhaustive time series analysis. We tackle this complexity problem with a new distance measure – derived from Procrustes analysis – which is suitable for integration in a runtime efficient algorithm. We also demonstrate that our new distance measure finds similarity structures in a time series database identical to traditional Procrustes analysis.
We propose a novel efficient sliding technique for Procrustes analysis based on the Kabsch algorithm, denoted as the S3KR algorithm (sliding-shifting-scaling-Kabsch-rotation). S3KR compares a query trajectory q with all sub-trajectories x of the long trajectory, as illustrated in Fig. 1: The left part of Fig. 1 shows the color-coded GPS trajectory of a car driving eastwards, where the colors along the trajectory indicate the distance to the short query trajectory q (a sharp left curve). In order to calculate the distance value for a point of the trajectory, we extract the sub-trajectory x starting at the point where the length of x equals the length of q. We shift, scale and rotate x to achieve y, and measure the distance of y to q. Yellow colors indicate low distance to the query trajectory, i.e. a sharp left turn. Conversely, black indicates large distance to the query trajectory, such as the right turn at the beginning of the trajectory.
The S3KR algorithm relies on a novel rotation invariant distance measure which is also a metric. Having the appealing features of a metric, notably the triangle inequality, means that this distance measure can be used in data structures enabling efficient k-NN or range queries, such as, for example, M-Trees [2]. We also introduce an algorithm for clustering segments of trajectories to discover typical structures and prototypical patterns in GPS trajectories. This algorithm relies on our proposed distance metric and the sliding algorithm S3KR.
We summarize our contributions: (1) a new rotation-scaling-shifting invariant Procrustes distance metric (see Section  3), (2) a sliding algorithm for efficient scanning of long time series (see Sections 3.2 and 3.3), (3) an algorithm based on spectral clustering for time series subsequence clustering (see Section  3.4), (4) experiments on synthetic random walks and trajectories representing handwritten characters (see Section 4), as well as two case studies on real-world urban and maritime GPS trajectories (see Section 5). We compare our methods with traditional time series distances and related methods from literature developed for similar problems. Our experiments demonstrate that our novel distance measure outperforms traditional time series distances on trajectory mining problems.
Table 1
Notation
x, y, q, f: multivariate time series \(\in \textbf{R}^{n\times m}\)
R: rotation matrix
\(x_{ij}\), \(x_{i.}\): the j-th or all dimensions of x at time i
\(x^T\): transposed matrix
\(\tau \), \(\gamma \), \(\pi \): shift, scaling and rotation operator
\(\cdot \): matrix/scalar multiplication
\(\mu \), \(\sigma \), s: mean, standard deviation, scaling factor
H: matrix on which R depends
\(\{y\}_{i,n}\): segment of y of length n, starting at time i
||.||: Euclidean norm

2 Procrustes analysis

In order to make this paper self-contained and to introduce our notation (Table 1) we briefly recapitulate the essentials of Procrustes analysis. We denote the mean-shift operator \(\tau _{\mu }: \textbf{R}^{n\times m} \rightarrow \textbf{R}^{n\times m}\) as
$$\begin{aligned} \tau _{\mu }(x_{ij}) := x_{ij} - \mu _{j}, \end{aligned}$$
(1)
where \(\tau _{\mu }\) is the typical shift operator in Procrustes analysis such that the shifted x has zero mean in all dimensions. For a multivariate time series x we notate the scaling operator \(\gamma : \textbf{R}^{n\times m} \rightarrow \textbf{R}^{n\times m}\) as
$$\begin{aligned} \gamma (x_{ij}) := x_{ij} \cdot (\sum _j \sigma _j^2)^{-^{1}/{2}} = x_{ij} \cdot s. \end{aligned}$$
(2)
The scaling factor s is equal to the inverse of the root mean squared deviation of a trajectory to its mean, thus providing information about the extension of x in all dimensions.
Given two \(n \times m\)-dimensional time series x and y, Kabsch’s algorithm [6] solves the constrained orthogonal Procrustes problem [5] to find the rotation matrix R minimizing the Euclidean norm \(||x - y\cdot R||\) by:
$$\begin{aligned} \begin{array}{ll} H=x^T \cdot y & \in \textbf{R}^{m\times m}\\ H=U \cdot D \cdot V^T & \text {SVD (Singular Value Decomposition)}\\ d=\det (V \cdot U^T) & \text {determinant}\\ E=\text {diag}(1, \dots , 1,\ \text {sign}(d)) & \text {diagonal matrix of } m-1 \text { ones and } \text {sign}(d)\\ R=V \cdot E \cdot U^T & \end{array} \end{aligned}$$
(3)
Since we want to preserve the direction of a trajectory, we concentrate on rotation matrices only, and therefore do not allow R to be a reflection matrix. Otherwise, a GPS trajectory representing a left turn could be reflected to perfectly fit a right turn.
For x and y of equal lengths we define the rotation operator \(\pi : \textbf{R}^{n\times m} \times \textbf{R}^{n\times m} \rightarrow \textbf{R}^{n\times m}\):
$$\begin{aligned} \pi (x,\ y) := y \cdot R, \end{aligned}$$
(4)
where R is the rotation matrix according to (3).
The distance function \(\delta \) which is invariant to scaling, shifting and rotation is defined as \(\delta : \textbf{R}^{n\times m} \times \textbf{R}^{n\times m} \rightarrow \textbf{R}_+\):
$$\begin{aligned} \delta (x,\ y) := ||\tau _{\mu }( \gamma (x)) \ - \ \pi \big (\tau _{\mu } ( \gamma (x)),\ \tau _{\mu }(\gamma (y))\big )||. \end{aligned}$$
(5)
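The following NumPy sketch illustrates how (3)-(5) can be implemented. It is a minimal illustration under the notation above, not the authors' implementation; the function names are ours and the sample standard deviation (as later used in (12)) is assumed for the scaling factor.

```python
import numpy as np

def kabsch_rotation(x, y):
    """Rotation matrix R minimizing ||x - y R|| without reflections, cf. (3)."""
    H = x.T @ y                                   # cross-product matrix, shape (m, m)
    U, D, Vt = np.linalg.svd(H)                   # H = U diag(D) V^T
    V = Vt.T
    d = np.sign(np.linalg.det(V @ U.T))
    E = np.diag([1.0] * (H.shape[0] - 1) + [d])   # forces det(R) = +1 (proper rotation)
    return V @ E @ U.T

def procrustes_delta(x, y):
    """Shift-, scale- and rotation-invariant distance delta(x, y), cf. (5)."""
    def normalize(z):
        z = z - z.mean(axis=0)                              # mean shift tau_mu, cf. (1)
        return z / np.sqrt(z.var(axis=0, ddof=1).sum())     # scaling gamma, cf. (2)
    xn, yn = normalize(x), normalize(y)
    R = kabsch_rotation(xn, yn)                             # rotate y towards x, cf. (4)
    return np.linalg.norm(xn - yn @ R)
```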

3 Sliding Procrustes analysis

Performing distance calculations between many time series segments is runtime expensive. In the following we propose slight adaptations of the operators from Section 2 which enable the sliding algorithm to reduce the number of required operations and thus save calculation time.

3.1 Operators for sliding distance

We accompany the introduction of the computation operators with an illustration of two GPS trajectories in Fig. 2a and b, which are two segments of the original trajectory presented in Fig. 1. Before we calculate the distance between these two trajectories x and y, we shift both of them to the origin. We define the zero-shift operator (that shifts a time series such that it starts in the origin) \(\tau _0: \textbf{R}^{n\times m} \rightarrow \textbf{R}^{n\times m}\) for a multivariate time series x as follows:
$$\begin{aligned} \tau _0(x_{ij}) := x_{ij} - x_{0j}. \end{aligned}$$
(6)
Figure 2c) depicts the shifted trajectory segments \(\tau _0(x)\) and \(\tau _0(y)\). Both shifted segments start at the origin. Next we apply \(\gamma \) (2) to scale the trajectories – Fig. 2d) shows \(\gamma (\tau _0(.))\) for x and y. The shapes remain almost identical to those from Fig. 2c), but the extensions in both dimensions have changed, and also the scales of the axes have changed. We remark that, since \(\sigma \) is invariant to translation, \(\gamma \) and \(\tau _0\) commute. It therefore makes no difference whether \(\gamma \) is applied first and followed by \(\tau _0\), or vice versa. Before measuring the distance between x and y, we unify their orientation in a final step. We rotate both segments to minimize their deviation from one and the same reference time series (in this example we set \(f_{i.}=(0, 1) \ \forall \ i\), cf. (7)). Figure 2e) visualizes the rotated trajectory segments (after scaling and shifting) and finally shows the similarity of x and y, meaning that both segments are left turns. To achieve the results from Fig. 2e) we applied a rotation function defined as follows: for a time series y and a fixed reference time series f, we define the rotation function \(\pi ^f: \textbf{R}^{n\times m} \rightarrow \textbf{R}^{n\times m}\) which rotates y to minimize \(||f-y \cdot R||\):
$$\begin{aligned} \pi ^f(y) := y \cdot R, \end{aligned}$$
(7)
where R is the rotation matrix according to (3) for f instead of x. Consequently, for a comparison of two time series x and y, \(\pi ^f\) rotates x and y independently of each other, but dependent on the reference f. Section 3.3 discusses the choice of f and how this contributes to reducing the runtime complexity of the rotation from linear to constant.
Next we calculate the distance by summing up the squared lengths of the dashed lines in Fig. 2f), which is the Euclidean distance of the shifted, scaled and rotated time series. Formally, we define the shift-scaling-rotation invariant distance metric \(\delta ^f: \textbf{R}^{n\times m} \times \textbf{R}^{n\times m} \rightarrow \textbf{R}_+\):
$$\begin{aligned} \delta ^f(x,\ y) := ||\pi ^f\big ( \tau _0 (\gamma (x))\big ) \ - \ \pi ^f \big ( \tau _0 ( \gamma (y))\big )||. \end{aligned}$$
(8)
Appendix B proves that \(\delta ^f\) is a metric. Hence, \(\delta ^f\) is symmetric, and \(\delta ^f(x,y) = 0 \Leftrightarrow \) x and y are equal after scaling, shifting and rotating to fit f. Further \(\delta ^f\) fulfills the triangle inequality. Section 4.3 demonstrates how this is beneficial for standard data mining tasks as a nearest neighbor search.
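Analogously, a minimal sketch of \(\delta ^f\) with the zero-shift operator and a fixed reference f could look as follows (reusing kabsch_rotation from the sketch in Section 2; again an illustration with our own function names, not the runtime-optimized implementation of Section 3.3):

```python
import numpy as np

def delta_f(x, y, f):
    """Shift-, scale- and rotation-invariant metric delta^f w.r.t. a fixed reference f, cf. (6)-(8)."""
    def transform(z):
        z = z - z[0]                                     # zero-shift tau_0, cf. (6)
        z = z / np.sqrt(z.var(axis=0, ddof=1).sum())     # scaling gamma, cf. (2)
        return z @ kabsch_rotation(f, z)                 # rotation pi^f towards f, cf. (7)
    return np.linalg.norm(transform(x) - transform(y))

# A possible reference as used in Section 3.3: zeros in the first, ones in the second dimension,
# e.g. f = np.tile([0.0, 1.0], (n, 1)) for segments of length n.
```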

3.2 Sliding distance: s3kr

The sliding distance s3kr enables efficient sliding pattern recognition (e.g. k-NN and \(\epsilon \)-queries) and efficient comparison of time series of different lengths. To compare a shorter query q with segments of a longer time series y, s3kr executes this in a sliding fashion and calculates \(\delta ^f\) for every point of time. If \(q \in \textbf{R}^{n_q\times m}\) and \(y \in \textbf{R}^{n_y\times m}\) are two time series of different lengths \(n_q < n_y\), we measure the distance between segments of y, \(\{y\}_{i,n_q}\), and q by sliding a window of length \(n_q\) along y. The result is a vector of distances. We define the sliding-shifting-scaling-Kabsch-rotation function s3kr: \(\textbf{R}^{n_q\times m} \times \textbf{R}^{n_y\times m} \rightarrow \textbf{R}^{n_y-n_q+1}_+\):
$$\begin{aligned} \text {s3kr}(q,\ y)_i := \delta ^f(q, \{y\}_{i,n_q} ). \end{aligned}$$
(9)
Compared to a traditional distance computation, s3kr does not return a single distance value, but a vector of distances. Each entry of the distance vector represents the distance \(\delta ^f\) of q to a segment of y. Figure 1 shows a color coded GPS trajectory, where a GPS position is colored according to the distance between q (top right in Fig. 1) and the segment starting at this position. The vector of distances (bottom right in Fig. 1) is the result of s3kr.
To find the section of the longer time series that best fits the shorter time series q, we are only interested in the minimum of the vector of distances. For this purpose we define the minimum distance as min_s3kr: \(\textbf{R}^{n_q\times m} \times \textbf{R}^{n_y\times m} \rightarrow \textbf{R}_+\)
$$\begin{aligned} \text {min\_s3kr}(q,\ y) := \text {min}(s3kr(q,\ y)). \end{aligned}$$
(10)
When one is only interested in finding the segment with minimum distance (1-NN, nearest neighbor search), it is easily possible to identify multiple unnecessary computation steps. To elaborate this, we first present an example to discuss early abandoning with a fixed threshold \(\epsilon \), and then show the adjustments for a 1-NN search. Consider the example of two univariate time series \(x = (9, 1, 2, 4, 5, 9, 1, 2, 10)\) and \(q = (1, 2, 3)\) and an \(\epsilon \text {-query}\) looking for segments of x with a distance to q smaller than \(\epsilon =3\). The first segment is \(\{x\}_{1,3} = (9, 1, 2)\) and has a Euclidean distance to q of \(\sqrt{8^2+1^2+1^2} = \sqrt{66} > \epsilon \). This segment has a distance larger than the threshold \(\epsilon =3\) and is therefore not relevant. The question is whether it is possible to recognize this at an early stage before having finished the entire distance calculation. Here we want to stress the monotonicity of the Euclidean distance for time series: The Euclidean distance of two segments of two univariate time series x and y is always smaller or equal to the distance of the same segments extended by any number of observations \((k\ge 0)\):
$$\begin{aligned} d_{\text {E}}(\{x\}_{i,n}, \{y\}_{j,n}) = \sqrt{\sum _{m=1}^{n} (x_{i+m} - y_{j+m})^2} \le \sqrt{\sum _{m=1}^{n+k} (x_{i+m} - y_{j+m})^2} = d_{\text {E}}(\{x\}_{i,n+k}, \{y\}_{j,n+k}) \qquad \forall \ i,j,n,k. \end{aligned}$$
(11)
The same applies to multivariate time series, and to the squared Euclidean distance. It is therefore also possible to compare the squared Euclidean distance with \(\epsilon ^2\), and one can observe that already the first element in the sum of the distance calculation is above the threshold: \(8^2>3^2=\epsilon ^2\). Hence due to (11) one can already abandon the calculation for index \(i=1\) after having only compared the first elements of \(\{x\}_{1,3}\) and q. In fact, for this example we only need to finish the distance calculation for \(\{x\}_{i,3}\) where \(i = \) 2, 3 and 7.
For the 1-NN search the threshold \(\epsilon \) stores the distance to the nearest neighbor found so far, and \(\epsilon \) is updated as soon as a segment is found which has a smaller distance than all segments evaluated so far – in this example after finishing the distance calculation for \(\{x\}_{2,3}\), \(\epsilon ^2=1\). Consequently, in this example, we can early abandon the distance calculations for all remaining segments after hitting the threshold \(\epsilon =1\). For example, for \(\{x\}_{3,3}\) the distance calculation already hits \(\epsilon =1\) after the first element. The same applies for \(\{x\}_{i,3}\) for all \(i > 2\) except for \(i=7\). \(\{x\}_{7,3} = (1, 2, 10)\), so the first two elements are identical to q, and therefore the distance calculation must complete to recognize that the distance exceeds \(\epsilon \).
Early abandoning can help to drastically reduce the number of computation operations. Since \(\delta ^f\) is a metric, and applies \(d_{\text {E}}\) after shifting, scaling and rotation, (11) also applies for \(\delta ^f\). Section 3.3 and Algorithm 2 give details about our implementation of early abandoning for \(\delta ^f\).
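The following univariate sketch illustrates the early-abandoning idea of (11) for a plain squared Euclidean 1-NN search. It is only meant to make the mechanism concrete; the actual DELTA and MIN_S3KR procedures additionally fold the shifting, scaling and rotation operators into the accumulation loop, and the function names here are ours.

```python
import numpy as np

def min_sliding_sqdist(q, x):
    """1-NN search of query q among all length-len(q) segments of x with early abandoning."""
    nq, best_d2, best_i = len(q), np.inf, -1
    for i in range(len(x) - nq + 1):
        d2 = 0.0
        for k in range(nq):                  # accumulate squared differences
            d2 += (x[i + k] - q[k]) ** 2
            if d2 >= best_d2:                # (11): the sum can only grow, so abandon this segment
                break
        else:                                # loop completed: new nearest neighbor found
            best_d2, best_i = d2, i
    return best_i, np.sqrt(best_d2)

# Example from the text (0-based indices): q = (1, 2, 3), x = (9, 1, 2, 4, 5, 9, 1, 2, 10)
# min_sliding_sqdist(np.array([1, 2, 3]), np.array([9, 1, 2, 4, 5, 9, 1, 2, 10]))
# returns (1, 1.0), i.e. the segment (1, 2, 4) starting at the second position is the 1-NN.
```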

3.3 Sliding algorithm: S3KR

Comparison of the GPS trajectory x and the query time series q in Fig. 1 raises the following questions: Which segments in x are similar to q, and which segments in x are dissimilar to q? And how can we quantify these similarities? Answering both questions requires calculating distances between the separate segments of x and q. To get a similarity measure independent of scale, shift and rotation, we apply \(\delta ^f\). Hence, calculating s3kr(q, x) answers the questions. The naive brute force approach to calculate s3kr(q, x) is to extract the set of trajectory segments from x, each of the same length as \(n_q\) (the length of q) and starting at the indices i for \(i\in 1...n_x-n_q+1\). Then we calculate the distance \(\delta ^f\) of all these segments to q. In a brute force approach we calculate all these distances independently from each other, so we need to determine the parameters to shift, scale and rotate the segments independently from previous distance calculations, even though there might be a relation between the parameters of consecutive segments. Also, the naive approach does not apply early abandoning to reduce the number of computed operations. The brute force approach would require too much computation time when applying the method to big data sets or in a real time scenario where the analysis needs to be completed before new data is recorded. A more efficient algorithm to calculate s3kr(q, x) would enable a broader applicability.
We propose the algorithms S3KR (described in detail in Algorithm 1) and DELTA (described in Algorithm 2) to calculate s3kr by efficiently making use of the relations of the parameters for scale, shift and rotation. The procedure DELTA is called inside the algorithm S3KR with the rotation matrix R and scaling factor s as input parameters, which are updated inside of the S3KR algorithm. Algorithm 2 can calculate both \(\delta \) and \(\delta ^f\), dependent on the input parameter R. The former is achieved if R rotates x to fit q, and the latter if R rotates x to fit the reference f. q is supposed to be already scaled and shifted (and rotated to fit f), see Line 3 in Algorithm 1. In the following we walk through the algorithms in detail and discuss how we apply the following strategies for the respective components of the algorithms to accelerate the calculation of \(\delta ^f\) and s3kr:
(a)
Incremental update of the dimension means and variances,
 
(b)
Selection of a reference f,
 
(c)
Replacement of the SVD by the exact expression of the singular values (for \(m=2\)),
 
(d)
Early abandoning and decoupling of the calculation of H and R (see (3)) from the actually scaled and shifted time series.
 
Figure 3 shows results of runtime experiments, where we compare the adjusted components of the algorithm S3KR to the brute force computation. We simulate synthetic trajectories of variable lengths, and q is simulated to be of length \(\frac{n_x}{10}\). A code sketch illustrating strategies (a) and (c) follows at the end of this subsection.
a)
Incremental update: The sliding window algorithm s3kr benefits from incrementally updating the scaling factor for \(\gamma \). This is possible since the scaling factor is a function of the standard deviations per dimension, \(\sigma _j\). Having \(\sigma _{i,j}\), then \(\sigma _{i+1,j}\) is the square root of \(\sigma _{i+1,j}^2\):
$$\begin{aligned} \begin{aligned}&\mu _{i+1,j} = \mu _{i,j} + \frac{(x_{i+n,j} - x_{i,j})}{n}\\ {}&\sigma _{i+1,j}^2 = \sigma _{i,j}^2 + \frac{(x_{i+n,j}^2 - x_{i,j}^2)}{n-1} + (\mu _{i,j}^2 - \mu _{i+1,j}^2)\frac{n}{n-1}. \end{aligned} \end{aligned}$$
(12)
Figure 3a demonstrates that incrementally updating (red line) \(\mu \) and \(\sigma \) is up to four times faster than the brute force method (dashed black).
 
b)
Selecting a reference f: To demonstrate the benefit of f, we start with rotating the scaled and shifted version of y to fit f. We need the matrix H, where \(H = f^T \cdot \gamma ( \tau _0(y))\) (see (3)):
$$\begin{aligned} H_{ij} = \sum _{k=1}^n f_{ki} \cdot \big (\gamma (\tau _0(y))\big )_{kj} = \underbrace{\Big (\sum _{l=1}^m \sigma _l^2\Big )^{-1/2}}_{s} \Big ( \sum _{k=1}^n f_{ki} \cdot y_{kj} - y_{1j} \cdot \sum _{k=1}^n f_{ki} \Big ) \ \Rightarrow \ H = s \big ( f^T y - (f^T {\textbf {1}} ) y_{1.} \big ), \end{aligned}$$
(13)
where \(\textbf{1}\) is the column vector of ones and \(y_{1.}\) is the row vector of the first row of y, i.e. the observation at time \(i=1\). Next, we set f equal to the matrix of zeros in all but the last dimension, and ones in the last dimension. For \(m=2\), (13) reduces to
$$\begin{aligned} H = s \cdot n_q \begin{bmatrix} 0 & 0 \\ \mu _{i,1} -y_{i1} & \mu _{i,2} -y_{i2} \end{bmatrix} \end{aligned}$$
(14)
where \(\mu _{i,1}\) is the mean of all \(y_{j1} \ \forall \ i \le j \le i + n_q -1\), and \(\mu \) is updated incrementally via (12). Calculating H via (14) instead of (13) saves time since no cross-products of q and the segment of y are necessary, which reduces the runtime complexity for updating H from \(O(n_q)\) to O(1) per sliding step. Figure 3b shows the actual effect on the algorithm by comparing the runtime for the cross product (black dashed) to the update of H via (14) (in red). Since the number of sliding steps increases with the length of x (x-axis), the red line shows a linear increase even though the complexity of a single sliding step is O(1).
 
c)
Replace the SVD: For the popular case of \(m=2\) (e.g. 2-dimensional GPS trajectories) we derive the rotation angle as described in Section 4.6 of [5], by making use of the exact expression of the singular values of H in (3). Further we make use of the function atan2 to return unambiguous rotation angles for the full range of 0 to 360 degrees.
$$\begin{aligned} \begin{aligned} \alpha&= \text {atan2}(H_{1,2}-H_{2,1},\ \ H_{1,1} + H_{2,2})\\ R&= \begin{bmatrix} \cos {\alpha } & -\sin {\alpha } \\ \sin {\alpha } & \cos {\alpha } \end{bmatrix}. \end{aligned} \end{aligned}$$
(15)
Figure 3c illustrates that in our experiments this calculation of R is up to 60 times faster than the brute force method which derives R according to (7).
 
d)
Early abandoning: We come back to the example at the beginning of Section 3.3 and ask the slightly adjusted question, where to find in Fig. 1 a) all segments having a distance smaller than a given threshold \(\epsilon \) to q (an \(\epsilon \text {-query}\)), or similarly b) just the segment with the smallest distance (the 1-nearest neighbor search, 1-NN). Again, the brute force approach is to calculate all distances, and in a second step, for a) to decide which distances are smaller than \(\epsilon \), and for b) to apply a linear search to find the minimum. To answer the \(\epsilon \text {-query}\) and the 1-NN search efficiently, we propose the algorithm S3KR (Alg. 1), which saves computation time by abandoning as much as possible of the calculation process of distances larger than \(\epsilon \). Section 3.2 already discussed early abandoning (see (11) and the accompanying example) and next we discuss how Algs. 2 and 3 apply early abandoning. For calculating the distance of q and a segment of y – \(\delta ^f(q, \{y\}_{i, n_q})\) – we need to shift, scale and rotate only the segment of y, since \(\pi ^f(\gamma (\tau _0(q)))\) is performed only once \(\forall i\) (see Line 3 of Alg. 1). The rotation matrix R depends on H, which is the cross product of f and \(\gamma ( \tau _0(y))\) (see (3)). However, (14) showed that we can express H as a function of f, y and s, independent of the explicitly shifted and scaled segment \(\tau _0(\gamma (y))\). Consequently we can calculate \(R^* = \text {argmin}_R d_E(f,\ \tau _0(\gamma (y)) \cdot R)\) before actually scaling and shifting y. Only the scaling factor s is required, which is updated incrementally at O(1) cost, see (12). This makes it possible to plug the operators \(\tau _0\), \(\gamma \) and \(\pi ^f\) directly into the implementation of the distance calculation, implemented as a for loop iterating over the time index (see Line 4, Alg. 2). As soon as the accumulated distance hits the threshold \(\epsilon \), the for loop breaks and further unnecessary scaling, shifting and rotation calculations are abandoned. This decoupling of the rotation matrix calculation from the scaling and shifting of the time series helps the algorithm S3KR to save computation time, since unnecessary shifting, scaling and rotation operations are abandoned. With the help of early abandoning of DELTA inside of S3KR we can considerably accelerate k-NN searches and \(\epsilon \)-queries, where all distances smaller than \(\epsilon \) are returned. Algorithm 3 details MIN_S3KR, which is the algorithm to calculate min_s3kr(), i.e. to answer the 1-NN search. The algorithm is similar to S3KR, but updates the threshold \(\epsilon \) depending on the best index detected so far (see Line 12, Alg. 3). With minor adaptations it can also answer the k-NN query: adjust the threshold management with regard to the latest (to guarantee no overlap) and the k best distances detected so far. For details about this threshold management we refer to [10].
 
Figure 3d compares the absolute runtimes of S3KR (red solid) and a brute force method (black dashed) which does not take advantage of the discussed acceleration methods a)-c). S3KR is 6 to 45 times faster than the brute force method. The figure also shows runtimes for the 1-NN search by MIN_S3KR that benefits evidently from early abandoning, and needs less than 10 ms for the nearest neighbor search with \(n_x = 3000\).
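To make strategies (a) and (c) concrete, the following sketch shows the incremental mean and variance update (12), the construction of H for the fixed reference via (14), and the closed-form rotation (15). It is a simplified illustration with our own function names, not the full S3KR/DELTA implementation from Algorithms 1 and 2.

```python
import numpy as np

def update_mean_var(mu, var, x_out, x_in, n):
    """Slide the window of length n by one step: x_out leaves, x_in enters; cf. (12).
    Works elementwise, so mu, var, x_out and x_in may be per-dimension arrays."""
    mu_new = mu + (x_in - x_out) / n
    var_new = var + (x_in**2 - x_out**2) / (n - 1) + (mu**2 - mu_new**2) * n / (n - 1)
    return mu_new, var_new

def H_for_reference(s, n_q, mu, y_start):
    """Cross-product matrix H for the reference f (zeros in the first, ones in the second
    dimension), expressed without touching the shifted and scaled segment; cf. (14).
    mu holds the windowed means, y_start the first point of the current segment."""
    return s * n_q * np.array([[0.0, 0.0],
                               [mu[0] - y_start[0], mu[1] - y_start[1]]])

def rotation_2d(H):
    """Closed-form rotation matrix for m = 2, replacing the SVD; cf. (15)."""
    alpha = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s],
                     [s,  c]])
```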

3.4 TSS clustering with s3kr

Equipped with the presented sliding algorithm S3KR, we propose a clustering algorithm that applies S3KR to find clusters of shift-scaling-rotation invariant segments of a trajectory.
Figure 4a) motivates this by showing a trajectory y forming two spirals. To reveal structure and repetitive patterns in y we want to cluster the segments of y. But when we want to cluster the overlapping time series subsequences (TSS) of the trajectory y, the clustering easily becomes meaningless (as discussed in [7]), since the closest neighbor of a segment is often the trivial neighbor with a high overlap, starting one time index earlier or later. Hence, we propose an algorithm that avoids these pitfalls by setting the distances of overlapping segments equal to infinity. This ensures that overlapping segments can only end up in the same cluster if they are both close enough to a third segment that has no overlap with either of the two. To avoid calculating meaningless cluster representatives due to overlapping subsequences, we propose the following algorithm (a code sketch follows below) to cluster a time series y of length m, with a subsequence length n and a number of clusters k:
1.
Distance Matrix: Apply s3kr to calculate the distance matrix D of all pairwise distances \(\delta ^f\) of all segments of y. So we get \(D_{ij} = \delta ^f(\{y\}_{i, n}, \{y\}_{j, n})\).
 
2.
Set \(D_{ij} = \infty \) where \(|i-j| < n\). D now represents a weighted graph where all subsequences are connected unless they overlap.
 
3.
Perform Spectral Clustering [23]: Two overlapping segments can only end up in the same cluster if they are close enough to a third segment that does not overlap with either. Consider the segments of y in Fig. 4c) (here \(m = 1000\) and \(n = 30\)) that start at the first and second index. Both describe the movement of a left turn, but \(D_{1,2}\) is set to \(\infty \). Since both have a small distance to segments starting after index 31, they end up in the same cluster, describing a left turn.
 
4.
Get cluster representatives: Say a cluster c consists of segments starting at the indices \(i\in I^c\). According to step 3, a cluster can contain overlapping segments. However, the representation of a cluster benefits from presenting only non-overlapping segments, instead of showing redundant information. We define the degree vector for a cluster c, \(g^c \in \textbf{R}^{m - n + 1} \) for \(l \in (1, ...\ m-n+1)\) as:
$$\begin{aligned} g_l^c = {\left\{ \begin{array}{ll} \infty , & \text {if } l\notin I^c,\\ \dfrac{1}{\text {number of elements in } I^c \text { where } |l-j| \ge n}\ \sum \limits _{\begin{array}{c} j\in I^c \\ |l-j| \ge n \end{array}} D_{lj}, & \text {if } l\in I^c. \end{array}\right. } \end{aligned}$$
(16)
So, \(g_l^c\) is the mean of the distances of the segment starting at index l, \(\{y\}_{l, n}\), to all segments \(\{y\}_{j, n}\) in the same cluster c, where \(j\in I^c\) and the segments \(\{y\}_{l, n}\) and \(\{y\}_{j, n}\) have no overlap. Then \(g^c\) is the vector of average cluster-internal distances. We call those segments that represent local minima of \(g^c\) the characteristic segments of c, because these segments represent the cluster best. In detail, a segment \(\{y\}_{l, n}\) is a characteristic segment \((seg^{char})\) of cluster c if \(g^c_l = \min _{j:\ |l-j| < n} g^c_j\). The condition \(|l-j| < n\) guarantees that characteristic segments do not overlap.
 
It is sometimes desirable to have a single representative time series for each cluster. In such cases, we average the \(seg^{char}\) of each cluster. Figure 4b) depicts the average of the \(seg^{char}\) for each cluster in that example, and together with Fig. 4c) emphasizes that the two spirals look identical at first glance, but are of different directions, and are therefore assigned to clusters 3 and 4. The first cluster describes the straight connection line of the two spirals, and the second cluster represents the transition between the others.
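A compact sketch of steps 1 to 4 could look as follows. It assumes a delta_f implementation such as the one sketched in Section 3.1, uses scikit-learn's spectral clustering on a precomputed affinity matrix (a Gaussian kernel over \(\delta ^f\), with affinity 0 standing in for distance \(\infty \)), and is meant as an illustration rather than the runtime-optimized variant that relies on S3KR.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def tss_cluster(y, n, k, delta_f, f):
    """Cluster all overlapping length-n subsequences of trajectory y into k clusters."""
    segs = [y[i:i + n] for i in range(len(y) - n + 1)]
    N = len(segs)
    # Step 1: pairwise distance matrix of all segments
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = delta_f(segs[i], segs[j], f)
    # Step 2: forbid trivial matches of overlapping subsequences
    overlap = np.abs(np.subtract.outer(np.arange(N), np.arange(N))) < n
    A = np.exp(-D**2 / (2 * np.median(D[~overlap])**2))   # similarity for spectral clustering
    A[overlap] = 0.0                                       # distance "infinity" -> zero affinity
    np.fill_diagonal(A, 1.0)
    # Step 3: spectral clustering on the precomputed affinity matrix
    labels = SpectralClustering(n_clusters=k, affinity='precomputed').fit_predict(A)
    # Step 4: characteristic segments = local minima of the average within-cluster distance g^c
    characteristic = {}
    for c in range(k):
        idx = np.where(labels == c)[0]
        g = np.full(N, np.inf)
        for l in idx:
            others = idx[np.abs(idx - l) >= n]             # non-overlapping cluster members
            if len(others):
                g[l] = D[l, others].mean()
        characteristic[c] = [l for l in idx
                             if np.isfinite(g[l]) and g[l] == g[max(0, l - n + 1):l + n].min()]
    return labels, characteristic
```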
With the spectral approach, our algorithm assigns segments that are not directly connected to a common cluster if sufficient indirect connections via other segments exist. A density-based clustering algorithm such as DBSCAN would achieve a similar result, indeed there are strong theoretical relationships between spectral and density-based methods [17]. However, unlike centroid-based methods, DBSCAN does not detect representative points for each cluster. We obtain representative points from the degree distribution of the similarity matrix. Those representative points greatly facilitate the interpretation of the result.
To cluster the subsequences of a set of time series we apply the former algorithm on each single time series, collect all the \(seg^{char}\) of all trips, and cluster these again with spectral clustering. Again, if desired, we can average the segments without any concern about overlap, since the \(seg^{char}\) are defined to have no overlap.
Apart from the example in Fig. 4, we also apply this TSS clustering (a) to a single GPS time series in Section 5.1 and (b) to a set of GPS time series in Section 5.2, and demonstrate that it detects meaningful clusters and cluster centers in both applications.

4 Experimental results

Section 3 elaborates the rotation-invariant distance metric \(\delta ^f\), the sliding algorithm S3KR and finally the TSS algorithm, which build on each other. Before we apply \(\delta ^f\) embedded in S3KR or TSS, we test the key component \(\delta ^f\) itself. In detail, the experiments in the following sections demonstrate the following:
  • Sections 4.1 & 4.2: The Procrustes distances are rotation-invariant, more robust to noise, and faster than the methods from the literature.
  • Section 4.3: Since \(\delta ^f\) is a metric, we can build an M-Tree data structure with \(\delta ^f\) as distance metric. For a nearest neighbor search in a set of trajectories of equal lengths, the M-Tree outperforms the brute force search.
  • Section 4.4: Applied on a benchmark dataset of trajectories of varying lengths, the proposed sliding algorithm s3kr performs better, in terms of accuracy and runtime, compared to the methods from literature.
Sections 4.1 and 4.2 evaluate and demonstrate the capabilities of the Procrustes distances in an isolated setting, whereas Sections 4.3, 4.4 and 5 demonstrate the Procrustes distances embedded in an M-Tree, a sliding algorithm and a clustering algorithm.
In Sections 4.1, 4.2 and 4.3 we simulate trajectories. In Section 4.4 we make use of an open source benchmark data set. The case studies in Sections 5.1 and 5.2 demonstrate how to apply our methods to real-world (maritime and urban) GPS trajectories. We selected this elaborate set of experiments to demonstrate the benefits of our proposed methods.
We compare our proposed methods with the following methods from literature applied on trajectory data or time series data.
z-norm: The z-normalization (as e.g. applied in [12, 16]) combined with the Euclidean distance \(d_E\) or Dynamic Time Warping (DTW) is not rotation invariant. However, it serves here as a baseline method to point out the challenges of discovering rotation invariant patterns in trajectories.
arc-angle: The projection of the coordinates of a trajectory into the space of arc-length and angle ([3]) is a typical approach to measure shift and rotation independent similarities. In the sense of a fair comparison we include a normalization step (according to (2)) to make it scale invariant.
shape-sgnt: The shape signature is a 1-dimensional representation of 2-dimensional shapes, by measuring distances of all points of the shape to its center. This method is often applied in shape recognition [24], and is rotation and shift invariant. We z-normalise the 1-dimensional representation to achieve scale invariance.
cMass: The work by [19] presents a shift-scale-rotation invariant distance measure for trajectories that transfers the 2-dimensional longitude-latitude time series into the arc-angle space, interpolates the angles to be equidistantly sampled, and uses DTW to measure the distance. This work proposes three different methods to calculate the angles: ’Exact’, ’Relative’ and ’cMass’. We evaluated all three of these in our experiments in Section 4.1 with varying window size parameters for DTW. Since cMass outperforms the other two by far, and also performs best in the original work [19], we compare our methods with cMass in the remaining experiments. We set the maximum number of iterations for the modulo normalisation algorithm (described in [19]) to 10 for the runtime experiments, and to 100 for the remaining experiments.
We combine the methods z-norm, arc-ang, shape-sgnt and all three variations of [19] with \(d_E\) and DTW – with the Sakoe Chiba window [15] of 5, 10, 25 and 100% of the time series length. It is worth noting that DTW with a window size parameter equal to 0 is equivalent to applying \(d_E\). Our initial experiment in Section 4.1 analyses the performance of all these methods dependent on the window size parameter. In the remaining experiments we apply the window sizes that showed the best performance in the initial experiment. Table 2 gives an overview of the used parameters across all the experiments in this section and the case studies in Section 5.
We used a standard laptop computer with 2.8 GHz and 16 GB RAM for the runtime comparisons. For a fast DTW computation we applied the R package IncDTW [9], and for cMass we adjusted the local distance and lower bound method as described in [19].
Table 2
Overview of the parameters used in the experiments and case studies presented in the various sections
 
Parameter | Section 4.1 | Section 4.2 | Section 4.3 | Section 4.4 | Section 5.1 | Section 5.2
k | 3 | – | – | – | 4 | 9
\(\sigma \) | \(\{0, 0.2, 0.4, 0.6, 0.8, 1\}\) | – | 1 | – | – | –
ws | \(\{0, 5, 10, 20, None\}\) in % | \(ws^*\) | \(ws^*\) | \(ws^*\) | – | –
k is the number of clusters for the clustering algorithm. \(\sigma \) is the standard deviation of added noise for simulating synthetic trajectories. ws is the Sakoe Chiba [15] window size parameter for the DTW algorithm. \(ws^*\) is the set of ws values which show the best performance for the benchmark methods in Section 4.1. The respective sections provide more details about the choice of the parameters.

4.1 Clustering synthetic random walks

We simulate three 2-dimensional random walks of 50 observations as cluster centers. We copy each of the cluster centers 20 times and randomly modify the copies by adding white noise, scaling by a random positive factor, shifting by a random 2-dimensional vector and rotating by a random angle between 0 and 360 degrees. Appendix A gives further details about the simulation of random trajectories. Then we calculate the pairwise distance matrices and cluster these with PAM [14] (with \(k=3\), so searching for 3 centers). To keep it straightforward, we set k equal to the number of clusters we simulated and focus the presentation on the Procrustes distance performance. For each combination of the standard deviation of the added noise, and whether random walks are rotated or not, we repeat the whole experiment 30 times to exclude the possibility of a method’s superiority by chance.
We compare our proposed methods with baseline methods that rely on DTW. The performance, as well as the runtime of DTW, heavily depend on the window size parameter (ws). When comparing two time series x and y, ws defines the maximum number of observations that can be matched from x to y.
To guarantee a thorough and fair comparison with the baseline methods, we first evaluate for which values of ws the baseline methods perform best. Figure 5 plots the normalized mutual information (NMI) versus the standard deviation of added white noise, for rotated (right) and not-rotated (left) random walks. Since we are interested in detecting rotation invariant similarities, we select the best-in-class based on the performance on the right hand side of Fig. 5 (cMass with \(ws = 10\), arc-ang with \(ws = 5\), znorm with \(ws=None\), shape-sgnt with \(ws=0\), i.e. the Euclidean distance). For the rest of this work we apply the methods with these values for ws.
We also tested \(\delta ^f\) in combination with \(\tau _{\mu }\). Figure 6 depicts that this variation hardly identifies any structure in the data. We concluded that the rotation towards a reference time series that starts in the origin only shows meaningful results when both trajectories also start at the origin, which \(\tau _0\) guarantees, but \(\tau _{\mu }\) does not.
Finally, Fig. 7 presents the comparison of our proposed methods with the baseline methods, where the applied ws parameters are the best-in-class from Fig. 5.
For a deeper visual inspection of the performance of the distance measures in Fig. 7 – independent of a clustering algorithm – we also plot the pairwise distance matrices. Figure 8 shows the pairwise distance matrices for all trajectories from one of the 30 repetitions, where the trajectories were rotated randomly and noise was added with a standard deviation of 0.6. One can easily identify the clear structure of the three clusters for \(\delta ^f\) and \(\delta \), but hardly for the shape-signature or any of the others.
This experiment demonstrates (a) that the Procrustes distances seem to be more robust to noise and clearly outperform the others, and (b) that traditional time series distance measures which ignore the rotation of trajectories are incapable of measuring similarities independently of rotation.

4.2 Runtime comparison

Section 3.3 discusses via Fig. 3d the runtimes for the sliding algorithm S3KR (with \(\delta ^f\) or \(\delta \)), and compares these with the brute force alternative and MIN_S3KR. Here we complement these experiments and compare the runtimes of the distance measures only, i.e. without the sliding algorithm. We simulate 2-dimensional random walks of equal lengths, ranging from 50 to 1000, and apply the same methods that showed the best results in Section 4.1, in particular with respect to the Sakoe Chiba window size parameter for DTW. Figure 9 shows that \(\delta ^f\) and \(\delta \) are of similar speed, and outperform the other methods by a factor of 10 to 470.

4.3 Classifying synthetic random walks with an M-Tree

One main advantage of our proposed distance measure \(\delta ^f\) is that it is a metric and therefore fulfills the triangle inequality. In this experiment we demonstrate how we can make use of this beneficial characteristic. Similar to the previous example, we simulate 2-dimensional random walks (1000 random walks, each of length 50 observations, standard deviation of noise \(= 1\), belonging to 10 different groups) and classify these with a 1-NN classifier in a 10-fold cross validation. The results in Table 3 demonstrate that the Procrustes distances outperform the others in accuracy and runtime. But most importantly, we can also fulfill this task by building an M-Tree [2], which is a special data structure for range queries and nearest neighbor searches. Building an M-Tree requires as input a set of observations (in our experiment the set of all trajectories from 9 out of the 10 folds, so 900 trajectories) and a distance function fulfilling the triangle inequality, \(\delta ^f\). This allows performing a nearest neighbor search without comparing with all data in the tree. We set the maximum node size of the M-Tree to 3. Classifying a single trajectory then takes on average only 71 distance calculations to step down the tree and find the nearest neighbor. In comparison, without the M-Tree (as performed for all but the first column in Table 3) we need to calculate the distances to all trajectories in the training data, so 900. This saves a considerable amount of computation time, which is also why the combination \(\delta ^f\) & M-Tree is more than 10 times faster than \(\delta \). It is worth mentioning that for each distance calculation only one rotation is necessary (compare Eq. 8), since the time series spanning the tree are already rotated to fit f. Building the M-Tree involves a certain amount of work, since it requires multiple distance calculations to set up this data structure. In this experiment we measured 4.4 seconds to set up the M-Tree for one of the 10 folds, i.e. a database of 900 trajectories. However, this overhead is negligible when classifying a trajectory based on an already existing M-Tree. We conclude that \(\delta ^f\) shows a prediction accuracy just as good as, or even slightly better than, \(\delta \), and moreover that being a metric is a major advantage for performing a range query or nearest neighbor search efficiently.
Table 3
1st row: Average F1 scores (%) for classifying the trajectories with a 1-NN classifier and 10-fold cross validation. 2nd row: Average runtime (in milliseconds) for calculating all distances required to classify a single trajectory
 
 | \(\delta ^f\) & M-Tree | \(\delta ^f\) | \(\delta \) | cMass [19] | z-norm [12, 16] | shape-sgnt [24] | arc-ang [3]
F1 | 0.98 | 0.98 | 0.97 | 0.25 | 0.85 | 0.84 | 0.26
runtime (ms) | 20 | 225 | 218 | 3757 | 574 | 669 | 3248
The bold entries emphasise the best results in the presented comparison

4.4 Recognizing handwritten symbols

To compare our methods with the distance measures proposed by [19] on a benchmark trajectory data set, we downloaded time series data from the UCI repository consisting of 2858 trajectories describing the movement of a pen when writing 20 different symbols [21]. These trajectories are similar to traditional GPS trajectories. The trajectories consist of the two dimensions longitude and latitude, and are of different lengths, ranging from 60 to 182 observations. We built a 10-fold cross validation 1-NN classifier based on the same distance measures as in the previous experiment on synthetic data (Section 4.1), but applied in a sliding fashion, taking the minimum of all distances if the two compared time series differ in length. Table 4 summarizes the F1 scores and shows that our proposed Procrustes distance with fixed reference f and zero-shift operator outperforms the others.
We also measured the computation times for calculating all required distances in this experiment. The second row of Table 4 shows that our methods outperform the methods from literature in terms of runtime by a factor of 1.6 to 100.
Table 4
1st row: Average F1 scores (%) for classifying the handwritten symbols with a 1-NN classifier and 10-fold cross validation. 2nd row: Average runtime (in milliseconds) for calculating all distances required in a sliding algorithm for comparing two time series. The algorithm MIN_S3KR is applied to compute the distances \(\delta ^f\) and \(\delta \)
 
 | \(\delta ^f\) | \(\delta \) | cMass [19] | z-norm [12, 16] | shape-sgnt. [24] | arc-ang. [3]
F1 (%) | 89.40 | 81.63 | 80.02 | 57.35 | 31.67 | 36.80
Runtime (ms) | 0.3 | 0.4 | 5.8 | 0.5 | 6.2 | 30.6
The bold entries emphasise the best results in the presented comparison
The problem statement of this experiment differs slightly from the one in Section 4.3. The trajectories in Section 4.3 are all of the same length, which they are not in this experiment. Consequently, here we apply another distance measure: two trajectories are similar if we find a good match of the shorter trajectory in the longer one. For this reason we apply min_s3kr. This experiment showcases the sliding algorithm and its capability of detecting similar patterns.
Applying an M-Tree here would be possible, albeit with some complications: Consider two trajectories x and y, and say x is longer than y, so \(n_x > n_y\). For the comparison of x and y we would need to build an M-Tree (we notate this M-Tree as \(M_{xy}\)) out of all the segments of x of length \(n_y\), \(\{x\}_{.,n_y}\). Similar to Section 4.3 this would require the overhead costs of building \(M_{xy}\), before finding the most similar segment of x to y, and their distance. Contrary to Section 4.3, here the overhead costs would not be negligible, because in general \(M_{xy}\) is of no further use for comparing x with other trajectories. For comparing x to another trajectory z, a new M-Tree \(M_{xz}\) would be required out of the segments \(\{x\}_{.,n_z}\) if \(n_x> n_z\), or out of the segments \(\{z\}_{.,n_x}\) if \(n_x < n_z\). Only if \(n_z=n_y\) (which is unlikely for real world GPS trajectories) could the former M-Tree \(M_{xy}\) be reused for comparing x and z.
We conclude that there are use cases for both approaches (M-Tree and sliding algorithm), and that our proposed methods can help to efficiently perform rotation invariant pattern recognition in both situations.

5 Case studies

The following two case studies demonstrate how the TSS algorithm helps to cluster GPS trajectories and detect clusters of similar (rotation-invariant) trajectory segments. These clusters help to gain higher level knowledge and better understand the data.

5.1 Nautical GPS trajectories

To showcase the algorithm for clustering time series subsequences (see Section 3.4), we analyzed historical GPS tracks from the AIS database. Figure 10 shows an approximately 50 km extract of a GPS track of a dredger ship in front of the Danish coast, northwest of Copenhagen. The record (shown in the left part) starts at the green marker, and ends at the purple marker. The right part of Fig. 10 zooms in on the area where the ship is dredging.
We applied the TSS clustering algorithm from Section 3.4 on the approx. 2000 overlapping segments of 500 meters length each. We set k (as we also did in Section 5.2) by visual inspection to get the full spectrum of distinct movement patterns in the data. Figure 10 shows that (a) the biggest cluster 1 describes a straight pattern and is mainly formed by the ship going to the purple marker after finishing the dredging, (b) clusters 2 and 3 describe a left curve and a left u-turn, (c) cluster 4 is small and consists of only N=3 \(seg^{char}\), and (d) the ship makes only one hard right turn, which is the dashed orange line in cluster 3.
The clustering presented in Fig. 10 achieves a value of 92% for the share of the between-sum-of-squares divided by the total-sum-of-squares \((\frac{bss}{tss})\). This means a considerable amount of the total variation in the data is explained by the clustering.
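For reference, the \(\frac{bss}{tss}\) share can be computed as in the following sketch. It assumes the clustered segments are flattened into fixed-length vectors after the shift-scale-rotation normalization; the helper is ours and not the evaluation code used for the reported numbers.

```python
import numpy as np

def bss_tss_ratio(segments, labels):
    """Share of the total variation explained by the clustering: bss / tss = (tss - wss) / tss."""
    X = np.asarray(segments, dtype=float)     # one flattened, normalized segment per row
    labels = np.asarray(labels)
    tss = ((X - X.mean(axis=0)) ** 2).sum()   # total sum of squares
    wss = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
              for c in np.unique(labels))     # within-cluster sum of squares
    return (tss - wss) / tss
```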
Applying the TSS algorithm requires the calculation of many time series distances, best performed in a sliding approach, as the algorithm S3KR proposes. To point out the benefit of S3KR compared to a brute force approach, we performed this experiment with both approaches and measured the computation time. The calculated distances are identical. S3KR needs 0.8 seconds, whereas the brute force approach needs 51.4 seconds, so S3KR is about 65 times faster. This points out the major benefit of S3KR compared to a naive method, especially for analysing large databases of GPS trajectory data.

5.2 Urban GPS trajectories

In the final experiment we demonstrate the algorithm for clustering time series subsequences of a set of time series (see Section 3.4). We recorded GPS tracks while travelling by train (6 trips) and tram/light-rail (9 trips) in Vienna, Austria and Zurich, Switzerland. These 15 trips have a total length of ca. 185 km, split into 18200 segments of 200 m each. We aim to discover distinct maneuver patterns rather than to split the data by transport mode, which is why we clustered the train trips and the tram trips separately. Figure 11 compares the results. For both modes of transport we set \(k=9\). The upper row shows the cluster centers for train in red, and the lower row those for tram in green. As expected, some of the clusters are characterized by straight movements. The prototypical patterns for tram show more variation in turning movements, in particular turning angles of higher degree than for the train. This is reasonable since the rails for trains are typically shaped differently from those for trams, which are embedded in the road network. As in Section 5.1 we also measured the ratios \(\frac{bss}{tss}\) for the clustering results in Fig. 11, which are 81% for train and 76% for tram. This is in line with what the visual inspection of Fig. 11 shows. We see slightly more variation within some of the clusters for tram, which means the within-sum-of-squares \(wss = tss-bss\) is higher for tram; consequently less of the total variation is explained by the clustering, and \(\frac{bss}{tss}\) is smaller for tram than for train. The results of this experiment can help a transport mode classifier to distinguish transport modes based on recognizing transport mode specific driving maneuvers.

6 Related work

The unique characteristic of our method is to detect scale-shift-rotation invariant similarities of subsequences of trajectories. Even though much work has been developed for trajectory analysis [25], only a few works discuss rotation invariance, and to the best of our knowledge none in combination with an algorithm for sliding pattern recognition (1-NN and \(\epsilon \)-queries) or subsequence clustering. In Section 4 we compare our proposed methods with the most related methods from the literature [3, 19, 24] and a baseline method for time series analysis [12, 16]. We outperform these in accuracy and computation time in our experiments. In the following we give further details about these methods.
The works [3, 19] both present approaches to project 2-dimensional trajectories into the arc-angle space. Vlachos et al. [19] propose three different algorithms to calculate the angles: ’Exact’, ’Relative’ and ’cMass’. Their results showed the best performance for cMass, which we also observed in our experiments in Section 4. Feuerhake [3] applies the method to identify similar trajectories of ball sport athletes, splitting trajectories into disjoint segments, contrary to our sliding approach. As Procrustes analysis originates from analysing shapes, we also compared our methods with the shape signature, a method often applied in shape recognition, as e.g. in [24]. A shape signature is a representation of a multivariate time series as the one-dimensional time series of the Euclidean distances of the original observations to a center point, typically the mean or gravity center.
In the literature [12, 16] it is popular to apply z-normalization before performing time series clustering or classification based on distance measures such as the Euclidean distance or DTW. Section 4 demonstrates how this baseline method performs well on traditional time series, but struggles with recognizing similarities independently of rotation.
More recent works such as [1] focus on queries in databases, and [11] discusses different distance measures for trajectories, but they neither discuss sliding approaches nor how to cluster overlapping segments of trajectories. Also, [18] compare different similarity measures for trajectories after scaling, shifting and rotating so that the first and final points of the trajectories are fixed to each other. This step is not described in sufficient detail, so we could not include their method in our experiments.
Gawde and Pawar [4] present a method to represent a trajectory by multiple non-overlapping and non-intersecting polygons, which is hardly compatible with our sliding approach for the use case of analysing long trajectories as e.g. Fig. 10 shows. Apart from the polygon representation, [4] also apply turning functions to represent the polygons in the arc-angle space, similar to [19], which we included in our experiments in Section 4. Yu et al. [22] elaborate on the clustering of contemporary trajectories and therefore only define a similarity measure for trajectories whose recordings overlap in time, which clearly deviates from our methods. Vochten et al. [20] propose a generalized demonstration of motion trajectories in the field of robotics. They search for similar trajectories to a query trajectory by solving an optimal control problem, formulated as a non-convex NLP problem.
We propose a sliding algorithm that runtime-efficiently calculates the Euclidean distance after transformations to be rotation-scaling-shift invariant. Replacing the Euclidean distance by another distance measure would require replacing it in formulas (5), (8) and (9). Applying specifically DTW – a very popular distance measure for time series analysis – would limit the applicability of our proposed sliding algorithm, and thus of the TSS clustering algorithm, since the runtime complexity of the Euclidean distance is O(n), while for DTW it is between \(O(n^2)\) and \(O(n\times w)\), where n is the time series length and w is the window size parameter (see e.g. [15]). Further, when applying DTW, our proposed sliding algorithm would require adjusted logic for early abandoning similar to [8] or [13]. Finally, the sliding algorithm would need major adjustments to integrate the rotation-scaling-shifting transformation and a runtime-efficient incremental computation of the DTW, as e.g. presented by [10]. We conclude that in general our proposed method can be adjusted to integrate other time series distance measures, but consider this out of the scope of this paper and leave it open for future work.
We summarize that our approach of sliding Procrustes analysis is novel, and we have compared our methods to the most similar ones from the literature.

7 Conclusion

We proposed a novel shift-scale-rotation invariant distance metric for motion trajectories based on Procrustes analysis. Further, we presented the novel algorithm S3KR, which slides the distance calculation along a longer trajectory and detects characteristic recurring motion patterns. Based on the distance metric and the sliding algorithm, we also presented a novel time series subsequence clustering algorithm based on spectral clustering. We demonstrated that our methods outperform the benchmarks on synthetic data and a UCI benchmark trajectory data set, in accuracy (8% to 58%) and runtime (by a factor of up to 470). Also, we demonstrated the benefit of our proposed shift-scale-rotation invariant distance measure being a metric by applying it in an M-Tree [2] and accelerating the search by a factor of more than 10 compared to the brute force search. Finally, we applied our methods to GPS trajectories recorded when travelling by car, train, tram, or ship, to cluster these data and discover prototypical patterns for different transport modes. Our methods can be applied to spatial-temporal trajectory data from various domains for an enhanced understanding of motion content and to gain deep information about typical recurring maneuvers, and thus support applications such as activity recognition, transport mode identification, automated ticketing, the analysis of athletes and many more.

Declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Appendix A: Trajectory simulation

For the experiments in Section 4 we simulate 2-dimensional trajectories. The goal is to simulate random walks that resemble motion patterns recorded by a GPS device carried by a person or vehicle in a real-world scenario. The following formulas achieve this by defining the angle that updates the orientation of the trajectory depending on its own past. The simulated trajectory x is then randomly modified by adding white noise, a random rotation, scaling and translation. The standard deviation \(\sigma _u\) of the white noise u proved to be a crucial parameter in our experiments, as discussed in Section 4.1.
$$\begin{aligned} x_0&= \begin{pmatrix} 0\\ 0\end{pmatrix}, \quad x_i = x_{i-1} + \begin{pmatrix} \cos \beta _i\\ \sin \beta _i \end{pmatrix} \cdot |\text {arclength}_i| \\ \alpha _0&= \beta _0 = 0 \\ \alpha _i&= \alpha _{i-1} + z_i\\ \beta _i&= \beta _{i-1} + \alpha _i\\ y_i&= \begin{pmatrix} \cos \gamma & -\sin \gamma \\ \sin \gamma & \cos \gamma \end{pmatrix} \cdot \begin{pmatrix} x_{i1} + u_{i1}\\ x_{i2} + u_{i2}\end{pmatrix} \cdot |\text {scalefac}| + \begin{pmatrix} \text {trans}_1\\ \text {trans}_2\end{pmatrix} \\ z_i,\ \text {arclength}_i,\ \text {trans}_i \text { and scalefac}&\sim N(0,1)\\ u_i&\sim N(0, \sigma _u)\\ \gamma&\sim U(1, 360) \end{aligned}$$
(17)
Figure 12 shows two examples of simulated trajectories, and their random modifications according to (17) with \(\sigma _u = 0.6\).
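A minimal Python sketch of this simulation is given below (illustrative only; function and variable names as well as the choice of random number generator are our own assumptions, not part of the released code):

```python
import numpy as np

def simulate_trajectory(n, sigma_u=0.6, rng=None):
    """Sketch of the random walk of Eq. (17): the orientation angle beta_i integrates
    the random increments alpha_i; the clean trajectory x is then perturbed by white
    noise u and randomly rotated, scaled and translated to obtain y."""
    if rng is None:
        rng = np.random.default_rng()
    z = rng.normal(0.0, 1.0, n)                      # angle increments z_i
    arclength = np.abs(rng.normal(0.0, 1.0, n))      # |arclength_i|
    alpha = np.cumsum(z)                             # alpha_i = alpha_{i-1} + z_i
    beta = np.cumsum(alpha)                          # beta_i  = beta_{i-1} + alpha_i
    steps = np.column_stack((np.cos(beta), np.sin(beta))) * arclength[:, None]
    x = np.vstack(([0.0, 0.0], np.cumsum(steps, axis=0)))   # x_0 = (0, 0)

    # random modification: white noise, rotation by gamma, scaling, translation
    u = rng.normal(0.0, sigma_u, x.shape)
    gamma = np.deg2rad(rng.uniform(1.0, 360.0))
    R = np.array([[np.cos(gamma), -np.sin(gamma)],
                  [np.sin(gamma),  np.cos(gamma)]])
    scalefac = np.abs(rng.normal(0.0, 1.0))
    trans = rng.normal(0.0, 1.0, 2)
    y = (x + u) @ R.T * scalefac + trans
    return x, y

x, y = simulate_trajectory(200, sigma_u=0.6)
```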

Appendix B: \(\delta ^f\) is a metric

Theorem 1
\(\delta ^f\) is a semimetric, i.e., for any x, y, z it fulfills: \(\delta ^f(x,y) \ge 0\), \(\delta ^f(x, y) = \delta ^f(y,x)\), and \(\delta ^f(x,y)\le \delta ^f(x, z) + \delta ^f(z,y)\).
Proof
Since the Euclidean distance \(d_E\) (induced by the Euclidean norm ||.||) is a metric, \(\delta ^f(x, y) \ge 0\) holds for any x and y. The symmetry follows directly from the definition. Let \(x^* = \gamma (\tau (x))\):
$$\begin{aligned} \begin{aligned} \delta ^f(x,\ y)&\le \delta ^f(x,\ z) + \delta ^f(z,\ y)\\ d_E(\underbrace{x^* \cdot R}_{a},\ \underbrace{y^* \cdot S}_{b})&\le d_E(\underbrace{x^* \cdot R}_{a},\ \underbrace{z^* \cdot P}_{c}) + d_E(\underbrace{z^* \cdot P}_{c},\ \underbrace{y^* \cdot S}_{b}). \end{aligned} \end{aligned}$$
Since a, b and c are in \(\textbf{R}^{n\times m}\) and independent of each other, and since \(d_E\) is a metric, the triangle inequality holds for \(\delta ^f\). \(\square \)
The last statement does not hold for the distance function \(\delta \), since the rotation matrix S on the left-hand side, which rotates y to fit x, would differ from the rotation matrix on the right-hand side, which rotates y to fit z. Compared to \(\delta \), \(\delta ^f\) thus has the major advantage of fulfilling the triangle inequality. Consequently, \(\delta ^f\) can be used to build database structures such as an M-Tree [2], which enables fast k-NN and range queries9.
It follows from the definition that \(\delta ^f(x,y) = 0\) induces an equivalence relation, and that in the space of the resulting equivalence classes \(\delta ^f\) is a metric. So \(\delta ^f(x,y) = 0 \Leftrightarrow \) x and y are equal after scaling, shifting and rotating to fit f.
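The following Python sketch illustrates \(\delta ^f\) and numerically checks the symmetry and the triangle inequality (it is not a proof and not our released implementation; the helper names are illustrative, and the sketch assumes a mean-shift \(\tau \) and Frobenius-norm scaling \(\gamma \), which may differ from the exact operators defined earlier):

```python
import numpy as np

def normalize(x):
    """Shift to the centroid and scale to unit Frobenius norm (assumed tau and gamma)."""
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x)

def kabsch_rotation(a, f):
    """Rotation matrix R such that a @ R best fits f (Kabsch algorithm via SVD [6])."""
    U, _, Vt = np.linalg.svd(a.T @ f)
    d = np.sign(np.linalg.det(U @ Vt))               # guard against reflections
    return U @ np.diag([1.0] * (len(Vt) - 1) + [d]) @ Vt

def delta_f(x, y, f):
    """Fixed-reference distance: rotate both normalized inputs towards f, then compare."""
    xs, ys, fs = normalize(x), normalize(y), normalize(f)
    return np.linalg.norm(xs @ kabsch_rotation(xs, fs) - ys @ kabsch_rotation(ys, fs))

# numerical check of symmetry and the triangle inequality on random trajectories
rng = np.random.default_rng(0)
f, x, y, z = (rng.normal(size=(50, 2)) for _ in range(4))
assert np.isclose(delta_f(x, y, f), delta_f(y, x, f))
assert delta_f(x, y, f) <= delta_f(x, z, f) + delta_f(z, y, f) + 1e-12
```

Because both arguments are rotated towards the same fixed reference f, the comparison reduces to an ordinary Euclidean distance between fixed matrices, which is exactly why the triangle inequality holds for \(\delta ^f\) but not for \(\delta \).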
Footnotes
1
The benefit of \(\tau _0\) will become clear in Section 4, where we compare the performance of the zero-shift and the mean-shift operator.
 
2
We used a standard laptop computer with 2.8 GHz and 16GB RAM for the runtime comparisons.
 
3
For calculating \(\delta (q, y)\) – instead of \(\delta ^f(q, y)\) – the derivation of H is analogous to (13), but we substitute f by q, and \(y_{1.}\) by the vector of column means.
 
4
All data and code to reproduce our results presented in Sections 4 and 5 are available at: https://tinyurl.com/d7jrestk
 
5
To be precise, \(ws=5\) denotes that the window size equals 5% of the length of x, \(ws=0\) is identical to calculating the Euclidean distance, and \(ws =\) 'None' means that the alignment of x and y is not constrained by a window at all. We refer to [15] or [10] for a detailed discussion of ws.
 
6
We applied the implementation at https://github.com/tburette/mtree, which follows the original paper [2].
 
8
AIS: automatic identification system, https://www.dma.dk. Contains data from the Danish Maritime Authority, used in accordance with the conditions for the use of Danish public data.
 
9
Also called \(\epsilon \)-queries.
 
Literature
1. Ali ME, Eusuf SS, Abdullah K, Choudhury FM, Culpepper JS, Sellis T (2018) The maximum trajectory coverage query in spatial databases. Proceedings of the VLDB Endowment 12(3):197–209
2. Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd VLDB
3. Feuerhake U (2016) Recognition of repetitive movement patterns – the case of football analysis. ISPRS International Journal of Geo-Information 5(11):208
4. Gawde G, Pawar J (2018) Similarity search of time series trajectories based on shape. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, pp 340–343
5. Gower JC, Dijksterhuis GB et al (2004) Procrustes problems, vol 30. Oxford University Press on Demand
6. Kabsch W (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 32(5):922–923
7. Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. KAIS 8(2):154–177
8. Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowledge and Information Systems 7(3):358–386
9. Leodolter M (2017) IncDTW: Incremental Calculation of Dynamic Time Warping. R package version 1(1):4
10. Leodolter M, Plant C, Brändle N (2021) IncDTW: an R package for incremental calculation of dynamic time warping. Journal of Statistical Software 99:1–23
11. Pelekis N, Kopanakis I, Marketos G, Ntoutsi I, Andrienko G, Theodoridis Y (2007) Similarity search in trajectory databases. In: 14th International Symposium on Temporal Representation and Reasoning (TIME’07), IEEE, pp 129–140
12. Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2012) Searching and mining trillions of time series subsequences under dynamic time warping. In: 18th ACM SIGKDD, pp 262–270
13. Rath TM, Manmatha R (2002) Lower-bounding of dynamic time warping distances for multivariate time series. University of Massachusetts Amherst Technical Report MM 40:1–4
14. Reynolds AP, Richards G, de la Iglesia B, Rayward-Smith VJ (2006) Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms 5(4):475–504
15. Sakoe H, Chiba S (1978) Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Acoust Speech Signal Process 26(1):43–49
16.
17. Schubert E, Hess S, Morik K (2018) The relationship of DBSCAN to matrix factorization and spectral clustering. In: Proceedings of LWDA, pp 330–334
18. Toohey K, Duckham M (2015) Trajectory similarity measures. Sigspatial Special 7(1):43–50
19. Vlachos M, Gunopulos D, Das G (2004) Rotation invariant distance measures for trajectories. In: Proceedings of the 10th ACM SIGKDD, ACM, pp 707–712
20. Vochten M, De Laet T, De Schutter J (2019) Generalizing demonstrated motion trajectories using coordinate-free shape descriptors. Robotics and Autonomous Systems 122:103291
21. Williams BH, Toussaint M, Storkey AJ (2006) Extracting motion primitives from natural handwriting data. In: ICANN, Springer, pp 634–643
22. Yu Q, Luo Y, Chen C, Chen S (2019) Trajectory similarity clustering based on multi-feature distance measurement. Applied Intelligence 49(6):2315–2338
23. Zelnik-Manor L, Perona P (2005) Self-tuning spectral clustering. In: Advances in Neural Information Processing Systems, pp 1601–1608
24. Zhang D, Lu G (2004) Review of shape representation and description techniques. Pattern Recognition 37(1):1–19
Metadata

Title: Rotation invariant GPS trajectory mining
Authors: Maximilian Leodolter, Claudia Plant, Norbert Brändle
Publication date: 08-06-2023
Publisher: Springer US
Published in: GeoInformatica, Issue 1/2024
Print ISSN: 1384-6175
Electronic ISSN: 1573-7624
DOI: https://doi.org/10.1007/s10707-023-00495-4
