Introduction
Limitations of the existing work
Contributions
-
We introduce the notion of a Constricted Spatio-Temporal Sequential (CSTS) pattern that constitutes concise representation of all ST sequential patterns.
-
We thoroughly analyze theoretical properties of CSTS patterns. Specifically, we show that the set of CSTS patterns is a subset of the set of closed ST sequential patterns and that each CSTS pattern is also a closed ST sequential pattern. Moreover, we show that given the set of Participation Index (PI-)strong CSTS patterns one can obtain the set of all PI-strong ST sequential patterns and approximate participation index of each of them with an approximation margin of \(\pm ~\varepsilon\).
-
We offer a new algorithm called CSTS-Miner that discovers PI-strong CSTS patterns. CSTS-Miner applies an introduced MAX-Tree structure for more efficient candidate patterns generation. The proposed MAX-Tree is generated in two main phases of CSTS-Miner: the first phase called “top-down”, in which all PI-strong ST sequential patterns are generated using the breadth-first strategy, and the second phase called “bottom-up”, which calculates PI-strong CSTS patterns. We also offer the analysis of computational and memory complexity of CSTS-Miner.
-
We experimentally compare the results obtained with the CSTS-Miner algorithm with three other state-of-the-art algorithms discovering ST sequential patterns: the adapted version of STS-Miner [9], STBFM [11], CSTPM [10], which discover PI-strong ST sequential patterns. Moreover, we also compare CSTS-Miner with the CST-SPMiner algorithm [12], which discovers PI-strong closed ST sequential patterns. Similarly to our CSTS patterns, closed spatio-temporal sequential patterns discovered by CST-SPMiner also constitute a concise representation of all ST sequential patterns. For the purpose of experiments, we selected and preprocessed two crime events datasets: the Pittsburgh Police Incident Blotter Dataset and the Boston Crime Incident Reports Dataset. As we show, CSTS-Miner discovers much fewer redundant patterns than the other selected algorithms. Specifically, as the results of the experiments confirm, CSTS-Miner provides up to 60% fewer patterns compared to the STS-Miner, STBFM and CSTPM algorithms and up to 40% fewer patterns compared to the CST-SPMiner algorithm.
-
We provide experimental comparison of the effectiveness and efficiency of the above-mentioned STS-Miner, CSTPM, STBFM and CST-SPMiner algorithms discovering (closed) spatio-temporal sequential patterns. In our comparison, we showed the number of discovered patterns and execution time of each of algorithms for the same parameters of participation index threshold and neighborhood specification. Our implementations (prepared in the C++ language) of the selected algorithms (STS-Miner, STBFM, CSTPM, CST-SPMiner) as well as the proposed CSTS-Miner algorithm are available at the GitHub repository.1
-
Eventually, we provide examples of interesting crime-related patterns discovered by CSTS-Miner from the Pittsburgh Police Incident Blotter Dataset.
Structure
Related work
-
STS-Miner [9]—the first algorithm offered for the discovery of ST sequential patterns. STS-Miner uses the depth-first patterns generation strategy. In this work, we adapted STS-Miner to use the participation index measure instead of the sequence index measure introduced in [9]. Thus, the adapted version of STS-Miner is capable of discovering PI-strong ST sequential patterns.
-
STBFM [11]—the algorithm discovering PI-strong ST sequential patterns by means of the breadth-first pattern generation strategy. Maciąg and Bembenik [11] introduced a structure called SP-Tree that allows to efficiently generate candidate patterns using their first and second parent patterns and a children list of the first parent pattern. In addition, [11] presented the two variations of STBFM that can discover top-K PI-strong ST sequential patterns. The experiments of [11] showed that STBFM is capable of discovering some interesting crime-related patterns.
-
CST-SPMiner [12]—the algorithm discovering closed PI-strong ST sequential patterns. CST-SPMiner applies the breadth-first candidate patterns generation strategy to obtain all PI-strong closed ST sequential patterns. For each obtained PI-strong ST sequential pattern \(\overrightarrow{s}\), CST-SPMiner determines if this pattern is closed or not using a check condition verifying if \(\overrightarrow{s}\) is a closure pattern of any of its parent patterns.
-
CSTPM [10]—the algorithm discovering Cascading Spatio-Temporal Patterns (CSTP). CSTP patterns consist not only of ST sequential patterns but also of cascades of event types. In our work, to directly compare CSTPM with the proposed CSTS-Miner, we adapted the CSTPM algorithm for the discovery of ST sequential patterns rather than cascading ST patterns.
Basic notions
ST sequential patterns
-
A—Vandalism,
-
B—Robbery,
-
C—Simple assault,
-
D—Arson,
-
E—Aggravated assault.
-
\({\textbf{I}}(\overrightarrow{s}, 1) = {\textbf{D}}(A) = \{a_1, a_2\}\);
-
\({\textbf{I}}(\overrightarrow{s}, 2) = \bigcup \limits _{a \in {\textbf{I}}(\overrightarrow{s},1)} {\textbf{N}}(a, B) = \{b_1, b_2, b_3, b_4\}\);
-
\({\textbf{I}}(\overrightarrow{s}, 3) = \bigcup \limits _{b \in {\textbf{I}}(\overrightarrow{s}, 2)} {\textbf{N}}(b, C) = \{c_1, c_2, c_3 \}\);
-
\(PR(\overrightarrow{s}, 1)= 1\),
-
\(PR(\overrightarrow{s}, 2)= \frac{4}{8} = 0.5\) (that is half of all Robbery event instances occur in neighborhoods of Vandalism event instances),
-
\(PR(\overrightarrow{s}, 3)= \frac{3}{8} = 0.375\) (only three instances of Simple assault event type occur in neighborhoods of event instances of the set \({\textbf{I}}(\overrightarrow{s}, 2)\).
Pattern length | Patterns set |
---|---|
\(L_1 (F)\) | A(1), B(1), C(1), D(1), E(1) |
\(L_2\) | \(A \rightarrow B(0.5)\), \(B \rightarrow B(0.625)\), \(B \rightarrow C(0.5)\), \(B \rightarrow D(1)\), \(C \rightarrow E(0.8)\), \(E \rightarrow C(0.5)\) |
\(L_3\) | \(A \rightarrow B \rightarrow B(0.25)\), \(A \rightarrow B \rightarrow C(0.375)\), \(A \rightarrow B \rightarrow D(0.5)\), \(B \rightarrow B \rightarrow C(0.5)\), \(B \rightarrow B \rightarrow D(0.33)\), \(B \rightarrow C \rightarrow E(0.375)\), \(C \rightarrow E \rightarrow C(0.25)\) |
\(L_4\) | \(A \rightarrow B \rightarrow B \rightarrow C(0.25)\), \(A \rightarrow B \rightarrow C \rightarrow E(0.375)\), \(B \rightarrow B \rightarrow C \rightarrow E(0.375)\), \(B \rightarrow C \rightarrow E \rightarrow C(0.25)\) |
\(L_5\) | \(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E(0.25)\), \(A \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\), \(B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\) |
\(L_6\) | \(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\) |
Closed ST sequential patterns
-
\(A \rightarrow B \rightarrow B(0.25)\),
-
\(C \rightarrow E \rightarrow C(0.25)\),
-
\(A \rightarrow B \rightarrow B \rightarrow C(0.25)\),
-
\(B \rightarrow C \rightarrow E \rightarrow C(0.25)\),
-
\(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E(0.25)\),
-
\(A \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\),
-
\(B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\).
-
\(A \rightarrow B \rightarrow C \rightarrow E(0.375)\),
-
\(B \rightarrow B \rightarrow C \rightarrow E(0.375)\).
Pattern length | Patterns set |
---|---|
\(L_1 (F)\) | A(1), C(1), E(1) |
\(L_2\) | \(B \rightarrow B(0.625)\), \(B \rightarrow D(1)\), \(C \rightarrow E(0.8)\), \(E \rightarrow C(0.5)\) |
\(L_3\) | \(A \rightarrow B \rightarrow D(0.5)\), \(B \rightarrow B \rightarrow C(0.5)\), \(B \rightarrow B \rightarrow D(0.33)\) |
\(L_4\) | \(A \rightarrow B \rightarrow C \rightarrow E(0.375)\), \(B \rightarrow B \rightarrow C \rightarrow E(0.375)\) |
\(L_5\) | \(A \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\) |
\(L_6\) | \(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\) |
Discovery of constricted ST sequential patterns
Motivation
Elementary notions
-
\(B \rightarrow B\),
-
\(A \rightarrow B \rightarrow B\),
-
\(B \rightarrow B \rightarrow C\),
-
\(B \rightarrow B \rightarrow D\),
-
\(A \rightarrow B \rightarrow B \rightarrow C\),
-
\(B \rightarrow B \rightarrow C \rightarrow E\),
-
\(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E\),
-
\(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E \rightarrow C\).
-
\(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\),
-
\(A \rightarrow B \rightarrow B \rightarrow C \rightarrow E(0.25)\),
-
\(B \rightarrow B \rightarrow C \rightarrow E \rightarrow C(0.25)\),
-
\(A \rightarrow B \rightarrow B \rightarrow C(0.25)\),
-
\(B \rightarrow B \rightarrow C \rightarrow E(0.375)\),
-
\(B \rightarrow C \rightarrow E \rightarrow C(0.25)\),
-
\(A \rightarrow B \rightarrow B(0.25)\),
-
\(B \rightarrow B \rightarrow C(0.5)\),
-
\(B \rightarrow C \rightarrow E(0.375)\),
-
\(A \rightarrow B(0.5)\), \(B \rightarrow C(0.5)\), \(E \rightarrow C(0.5)\).
Theoretical properties of CSTS patterns
-
for the parameter \(\varepsilon = 0\) the set of all CSTS patterns is equivalent to the set of all closed ST sequential patterns (Lemma 1);
-
each CSTS pattern is a closed ST sequential pattern regardless of the value of the approximation margin \(\varepsilon\) (Lemma 2);
-
the set of CSTS patterns is a subset of the set of closed ST patterns for any value of the approximation margin parameter \(\varepsilon\) (Theorem 2);
-
it can be verified if an ST sequential pattern \(\overrightarrow{s}\) is PI-strong given the set of PI-strong CSTS patterns and how to approximate PI value of \(\overrightarrow{s}\) (Lemma 3);
-
one can approximate the value of PI of \(\overrightarrow{s}\) with an error no greater than \(\pm ~\varepsilon\) (Lemma 4).
-
the PI value of \(\overrightarrow{s_1}\) being a CSTS pattern of \(\overrightarrow{s}\) equals PI value of \(\overrightarrow{s}\),
-
CSTS pattern \(\overrightarrow{s_1}\) is a maximal supersequence of \(\overrightarrow{s}\).
-
For \(\varepsilon = 0.5\), pattern \(B \rightarrow B \rightarrow C \rightarrow E(PI = 0.375)\) is a CSTS pattern (as it is \(\varepsilon\)-constricted maximal supersequence of e.g. pattern \(C \rightarrow E(PI = 0.8)\)). At the same time pattern \(B \rightarrow B \rightarrow C \rightarrow E\) is also a closed ST sequential pattern (as follows from Table 2).
-
For \(\varepsilon = 0\), the set of CSTS patterns is the same as the set of closed ST sequential patterns. Thus, each closed ST sequential pattern presented in Table 2 is also a CSTS pattern (e.g. pattern \(B \rightarrow D (PI =1)\) is a closed ST sequential pattern and CSTS patterns derived from singleton patterns \(B (PI =1)\) and \(D (PI =1)\)).
-
underestimate the exact PI value of \(\overrightarrow{s}\) by less than \(PI(\overrightarrow{s}) - \varepsilon\), and
-
overestimate the exact PI value of \(\overrightarrow{s}\) by more than \(PI(\overrightarrow{s}) + \varepsilon\).
Constricted ST sequential patterns miner
Notation | Description |
---|---|
\({\textbf{D}}\) | A dataset of spatio-temporal event instances |
\({\textbf{F}}\) | A set of event types |
\(\overrightarrow{s}\) | An ST sequential pattern |
\({\textbf{N}}(e, F)\) | Neighborhood of event instance e with respect to event type \(F \in {\textbf{F}}\) |
\({\textbf{I}}(\overrightarrow{s},i)\) | A set of event instances supporting i-th element of \(\overrightarrow{s}\) |
\(PR(\overrightarrow{s}, i)\) | Participation ratio of i-th element of \(\overrightarrow{s}\) |
\(PI(\overrightarrow{s})\) | Participation index of pattern \(\overrightarrow{s}\) |
\(\text {children}(\overrightarrow{s})\) | Children patterns of \(\overrightarrow{s}\) |
\(\text {parent}_1(\overrightarrow{s})\) | First parent of pattern \(\overrightarrow{s}\) |
\(\text {parent}_2(\overrightarrow{s})\) | Second parent of pattern \(\overrightarrow{s}\) |
\({\mathcal {C}}^{max}(\overrightarrow{s})\) | A set of \(\varepsilon\)-constricted maximal supersequences of \(\overrightarrow{s}\) |
\(\mathcal{R}\mathcal{C}(\overrightarrow{s})\) | Set of ST sequential patterns, for which \(\overrightarrow{s}\) is a \(\varepsilon\)-constricted maximal supersequence |
\(PI_{min}\) | Participation index threshold of discovered patterns |
\(\varepsilon\) | Approximation margin |
“Top-down” phase of the CSTS-miner algorithm
“Bottom-up” phase of the CSTS-miner algorithm
-
Either \(C^{max}(\overrightarrow{s_i}) = 0\) or the PI value of \(\overrightarrow{s}\) equals the PI values of \(C^{max}(\overrightarrow{s_i})\) patterns. In any case, \(\overrightarrow{s}\) is inserted to \(C^{max}(\overrightarrow{s_i})\) and \(\overrightarrow{s_i}\) is inserted to \(\mathcal{R}\mathcal{C}^{max}(\overrightarrow{s_i})\).
-
Alternatively, if the PI value of \(\overrightarrow{s}\) is greater than the PI values of patterns in \(C^{max}(\overrightarrow{s_i})\), then \(\overrightarrow{s_i}\) is removed from the \(\mathcal{R}\mathcal{C}^{max}\) lists of all patterns in \(C^{max}(\overrightarrow{s_i})\) and \(C^{max}(\overrightarrow{s_i})\) is set to be empty. Next, \(\overrightarrow{s}\) is added to \(C^{max}(\overrightarrow{s_i})\) and \(\overrightarrow{s_i}\) is added to \(\mathcal{R}\mathcal{C}^{max}(\overrightarrow{s})\). These operations fulfill the second condition of Definition 10.
CSTS-miner complexity analysis
Experiments
Selected datasets
Pittsburgh police incident blotter dataset
Parameter | Value |
---|---|
No. of incident types (|F|) | 236 |
No. of incident instances (|D|) | 72 867 |
Avg. no. of incidents per type | 309 |
Median no. of incidents per type | 26 |
Std. of no. of incidents per type | 784 |
Min. no. of instances in a type | 1 |
Max. no. of instances in a type | 6201 |
Boston crime incident reports dataset
Parameter | Value |
---|---|
No. of incident types (|F|) | 26 |
No. of incident instances (|D|) | 40 545 |
Avg. no. of incidents per type | 1559 |
Median no. of incidents per type | 527 |
Std. of no. of incidents per type | 2176 |
Min. no. of instances in a type | 1 |
Max. no. of instances in a type | 8575 |
Parameter | Value |
---|---|
No. of incident types (|F|) | 10 |
No. of incident instances (|D|) | 896 |
Avg. no. of incidents per type | 90 |
Median no. of incidents per type | 72 |
Std. of no. of incidents per type | 83 |
Min. no. of instances in a type | 1 |
Max. no. of instances in a type | 245 |
Experimental setup
Results of the experiments
-
For the Pittsburgh Police Incident Blotter Dataset: R = 350 m, T = 11,520 (8 days), \(PI_{min}\) = \(\{0.33, 0.32, \dots , 0.25\}\), \(\varepsilon\) = \(\{0.025, 0.05, 0.075\}\).
-
For the complete Boston Crime Incidents Report Dataset: R = 300 m, T = 5760 min (4 days), \(PI_{min}\) = \(\{0.055, 0.05, \dots , 0.015\}\), \(\varepsilon\) = \(\{0.05, 0.1, 0.15\}\).
-
For the reduced Boston Crime Incidents Report Dataset: R = 500 m, T = 43,200 min (30 days), \(PI_{min} = \{0.01, 0.0095, \dots , 0.005\}\), \(\varepsilon = \{0.05, 0.1, 0.15\}\).
R = 350 meters, T = 11520 (8 days) | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
\(PI_{min}\) | PI-strong ST seq. patterns | PI-strong closed ST seq. patterns | PI-strong CSTS patterns | |||||||||||
STS-Miner [9] | CSTPM [10] | STBFM [11] | CST-SPMiner [12] | CSTS-Miner \((\varepsilon = 0.025)\) | CSTS-Miner \((\varepsilon = 0.05)\) | CSTS-Miner \((\varepsilon = 0.075)\) | ||||||||
Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | |
0.33 | 26 | 796 | 9 | 796 | 9 | 796 | 9 | 657 | 9 | 594 | 9 | 576 | 9 | 556 |
0.32 | 31 | 1002 | 9 | 1002 | 9 | 1002 | 9 | 819 | 9 | 712 | 9 | 681 | 9 | 666 |
0.31 | 39 | 1320 | 9 | 1320 | 9 | 1320 | 9 | 1065 | 9 | 883 | 9 | 827 | 9 | 802 |
0.3 | 56 | 1990 | 9 | 1990 | 9 | 1990 | 9 | 1573 | 9 | 1161 | 9 | 1050 | 9 | 997 |
0.29 | 116 | 4371 | 11 | 4371 | 10 | 4371 | 10 | 3323 | 10 | 2235 | 10 | 2018 | 10 | 1932 |
0.28 | 269 | 10,434 | 17 | 10,434 | 12 | 10,434 | 12 | 7741 | 12 | 4516 | 12 | 4206 | 12 | 4104 |
0.27 | 898 | 35,166 | 100 | 35,166 | 19 | 35,166 | 19 | 25,446 | 19 | 14,956 | 20 | 14,182 | 22 | 13,939 |
0.26 | – | – | 1658 | 143,666 | 47 | 143,666 | 47 | 100,235 | 51 | 60,977 | 67 | 58,907 | 99 | 58,519 |
0.25 | – | – | – | – | 206 | 1,073,917 | 206 | 683,860 | 253 | 515,128 | 712 | 509,232 | 1744 | 508,492 |
R = 300 meters, T = 5760 (4 days) | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
\(PI_{min}\) | PI-strong ST seq. patterns | PI-strong closed ST seq. patterns | PI-strong CSTS patterns | |||||||||||
STS-Miner [9] | CSTPM[10] | STBFM [11] | CST-SPMiner [12] | CSTS-Miner \((\varepsilon = 0.1)\) | CSTS-Miner \((\varepsilon = 0.05)\) | CSTS-Miner \((\varepsilon = 0.15)\) | ||||||||
Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | |
0.055 | 69 | 3982 | 4 | 3982 | 3 | 3982 | 3 | 3346 | 3 | 2472 | 3 | 2232 | 3 | 2170 |
0.05 | 92 | 5386 | 4 | 5386 | 3 | 5386 | 4 | 4484 | 3 | 3265 | 3 | 2977 | 3 | 2913 |
0.045 | 129 | 7550 | 6 | 7550 | 4 | 7550 | 4 | 6248 | 4 | 4410 | 4 | 4060 | 4 | 3991 |
0.04 | 199 | 12,014 | 9 | 12,014 | 5 | 12,014 | 5 | 9899 | 5 | 6713 | 5 | 6292 | 5 | 6211 |
0.035 | 350 | 22,191 | 23 | 22,191 | 6 | 22,191 | 6 | 17,732 | 6 | 11,210 | 6 | 10,663 | 6 | 10,576 |
0.03 | 676 | 43,327 | 77 | 43,327 | 8 | 43,327 | 8 | 33,954 | 9 | 21,091 | 8 | 20,449 | 8 | 20,328 |
0.025 | – | – | 440 | 102,731 | 14 | 102,731 | 14 | 79,701 | 14 | 48,775 | 14 | 47964 | 14 | 47797 |
0.02 | – | – | – | – | 33 | 367,400 | 32 | 282,156 | 33 | 168,511 | 33 | 167,361 | 34 | 167,139 |
0.015 | – | – | – | – | 145 | 2,819,490 | 142 | 2,040,303 | 160 | 1,173,939 | 176 | 1,172,294 | 190 | 1,171,955 |
R = 500 meters, T = 43200 (30 days) | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
\(PI_{min}\) | PI-strong ST seq. patterns | PI-strong closed ST seq. patterns | PI-strong CSTS patterns | |||||||||||
STS-Miner [9] | CSTPM [10] | STBFM [11] | CST-SPMiner [12] | CSTS-Miner \((\varepsilon = 0.05)\) | CSTS-Miner \((\varepsilon = 0.1)\) | CSTS-Miner \((\varepsilon = 0.15)\) | ||||||||
Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | Time | # Patterns | |
0.01 | 6 | 34,102 | 46 | 34,102 | 0 | 34,102 | 0 | 9987 | 9 | 8629 | 13 | 8580 | 17 | 8575 |
0.0095 | 6 | 34,102 | 46 | 34,102 | 0 | 34,102 | 0 | 9987 | 9 | 8629 | 13 | 8580 | 17 | 8575 |
0.009 | 6 | 34,102 | 46 | 34,102 | 0 | 34,102 | 0 | 9987 | 9 | 8629 | 13 | 8580 | 17 | 8575 |
0.0085 | 6 | 34,102 | 46 | 34,102 | 0 | 34,102 | 0 | 9987 | 9 | 8629 | 13 | 8580 | 17 | 8575 |
0.008 | 27 | 141,860 | 1055 | 141,860 | 0 | 141,860 | 0 | 51,219 | 217 | 44,228 | 318 | 44,165 | 393 | 44,165 |
0.0075 | 27 | 141,860 | 1048 | 141,860 | 0 | 141,860 | 0 | 51,219 | 215 | 44,228 | 324 | 44,165 | 393 | 44,165 |
0.007 | 27 | 141,860 | 1050 | 141,860 | 0 | 141,860 | 0 | 51,219 | 216 | 44,228 | 318 | 44,165 | 393 | 44,165 |
0.0065 | 27 | 141,860 | 1099 | 141,860 | 0 | 141,860 | 0 | 51,219 | 216 | 44,228 | 317 | 44,165 | 393 | 44,165 |
0.006 | 30 | 161,238 | 1394 | 161,238 | 0 | 161,238 | 0 | 56,812 | 277 | 47,407 | 405 | 47,347 | 487 | 47,342 |
0.0055 | 30 | 161,238 | 1381 | 161,238 | 0 | 161,238 | 0 | 56,812 | 278 | 47,407 | 405 | 47,347 | 487 | 47,342 |
0.005 | 43 | 228,285 | 2787 | 228,285 | 1 | 228,285 | 1 | 76,894 | 369 | 65,967 | 530 | 65,903 | 624 | 65,899 |