Mining frequent trajectory patterns in spatial–temporal databases

doi:10.1016/j.ins.2009.02.016

Information Sciences

Volume 179, Issue 13, 13 June 2009, Pages 2218-2231

https://doi.org/10.1016/j.ins.2009.02.016

Abstract

In this paper, we propose an efficient graph-based mining (GBM) algorithm for mining the frequent trajectory patterns in a spatial–temporal database. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we traverse the mapping graph in a depth-first search manner to mine all frequent trajectory patterns in the database. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension in a small number of TI-lists. Moreover, it utilizes the adjacency property to reduce the search space. Therefore, our proposed method can efficiently mine the frequent trajectory patterns in the database. The experimental results show that it outperforms the Apriori-based and PrefixSpan-based methods by more than one order of magnitude.

Introduction

With advances in tracking technologies and the rapid improvement in location-based services, a large amount of spatial–temporal data has been collected in databases [8]. As a result, mining implicit and useful patterns in such databases has attracted increasing attention recently. The entities in a database that change their locations over time are defined as “moving objects”, where the movement (i.e., the trajectory) of an object can be described as a sequence of spatial coordinates, each of which is associated with a time stamp.

Finding frequently repeated trajectory patterns can help us analyze and predict the movements of objects. Adding temporal information to the patterns makes many applications that currently consider only spatial information more realistic and practical. For example, using temporal information to understand customers’ movement (trajectory) patterns can help us make appropriate recommendations or deliver mobile advertisements to them. In ecology, analyzing animals’ movement routes can help us better understand their behavior or detect an unusual phenomenon or disaster, especially if some animals’ movement patterns change abruptly.

Therefore, in this paper, we propose an efficient algorithm that takes into account both spatial and temporal attributes to mine the frequent trajectory patterns. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we apply the proposed graph-based mining (GBM) algorithm to traverse the mapping graph and mine the frequent trajectory patterns in a depth-first search manner. Since our proposed algorithm uses the adjacency property to reduce the search space, we only need to extend a trajectory to the adjacent neighborhoods of the last location of the trajectory pattern. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension to a small number of TI-lists. Thus, it is more efficient than the Apriori-based and PrefixSpan-based algorithms.

The main contributions of this study are summarized as follows: (1) We propose a spatial–temporal pattern to describe a trajectory, where both the spatial and temporal attributes are simultaneously considered. (2) By using the mapping graph, the GBM algorithm can localize support counting and pattern extension to a small number of TI-lists. (3) The GBM algorithm exploits the adjacency property to mine frequent trajectory patterns and reduce the search space. (4) We propose an efficient mining algorithm for discovering frequent trajectory patterns without candidate generation.

The remainder of this paper is organized as follows: Section 2 discusses the related work. Section 3 describes the preliminary concepts and the problem definitions used throughout the paper. Section 4 presents the proposed algorithm in detail. Section 5 evaluates the performance of our approach. Finally, Section 6 presents concluding remarks and future work.

Section snippets

Related work

Mining frequent itemsets is one of the fundamental problems in the area of data mining. An itemset is considered frequent if its support is not less than a user-specified minimum support threshold, where the support of an itemset is defined as the percentage of transactions in the database that contain the itemset. Many itemset mining algorithms [1], [15], [17], [24], [36], [38], [39] have been proposed for mining the frequent itemsets in a database. Agrawal et al. [1] proposed an Apriori
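As a small illustration of the support definition above, the following sketch computes the support of an itemset as the fraction of transactions that contain it and compares it with a minimum support threshold. The function name `support`, the toy transactions, and the threshold value are assumptions made for illustration; they do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Return the fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


# Toy database (hypothetical data, not from the paper).
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "d"}),
    frozenset({"a", "b", "c", "d"}),
]

min_support = 0.5  # user-specified threshold (assumed value)

# An itemset is frequent if its support is not less than the threshold.
is_frequent = support({"a", "c"}, transactions) >= min_support
```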

Preliminaries and problem definitions

A reference space of size e × e is a 2-dimensional square whose length and width are both equal to e, where e is an integer and e ≥ 1. The x- and y-coordinates of a point g_i in the reference space are denoted by g_i · x and g_i · y, respectively, where g_i · x and g_i · y are integers, 1 ≤ g_i · x ≤ e, and 1 ≤ g_i · y ≤ e. Consider a spatial–temporal database D = {T₁, T₂, … , T_u}, where T_i is the trajectory of a moving object, T_i = 〈(x₁, y₁, t₁), (x₂, y₂, t₂), … , (x_m, y_m, t_m)〉, (x_j, y_j) is
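One possible in-memory representation of these definitions is sketched below. It assumes that each trajectory element (x_j, y_j, t_j) is a point of the reference space with an attached time stamp; the names `Point`, `Trajectory`, and `in_reference_space`, as well as the sample data, are hypothetical and serve only to make the notation concrete.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    x: int  # integer x-coordinate
    y: int  # integer y-coordinate
    t: int  # time stamp associated with the location


Trajectory = List[Point]


def in_reference_space(p: Point, e: int) -> bool:
    """Check 1 <= x <= e and 1 <= y <= e for an e x e reference space."""
    return 1 <= p.x <= e and 1 <= p.y <= e


# Hypothetical database D with two trajectories in a 4 x 4 reference space.
e = 4
D: List[Trajectory] = [
    [Point(1, 1, 0), Point(2, 1, 1), Point(2, 2, 2)],
    [Point(3, 4, 0), Point(4, 4, 1)],
]

# True when every recorded location lies inside the reference space.
all_inside = all(in_reference_space(p, e) for T in D for p in T)
```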

The proposed method

In this section, we discuss the proposed GBM (graph-based mining) algorithm for mining the frequent patterns. As mentioned earlier, the proposed algorithm comprises two phases. First, we transform all trajectories in the database into a mapping graph. For each vertex in the mapping graph, we use a data structure, called a Trajectory Information list (TI-list), to record all trajectories that pass through the vertex. Then, we apply the proposed mining algorithm to find all frequent patterns. We
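The two phases can be sketched roughly as follows. This is a simplified illustration, not the GBM algorithm as published: it assumes each trajectory has already been reduced to a sequence of location labels, it ignores time stamps, it treats a TI-list as a set of (trajectory id, position) pairs, and it counts support as the number of distinct trajectories. The function name `mine` and the sample database are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Occ = Set[Tuple[int, int]]  # (trajectory id, position of the pattern's last cell)


def mine(database: List[List[str]], min_count: int) -> Dict[Tuple[str, ...], int]:
    # Phase 1 (simplified): one scan builds adjacency sets and TI-lists.
    graph: Dict[str, Set[str]] = defaultdict(set)
    ti: Dict[str, Occ] = defaultdict(set)
    for tid, traj in enumerate(database):
        for pos, cell in enumerate(traj):
            ti[cell].add((tid, pos))
            if pos + 1 < len(traj):
                graph[cell].add(traj[pos + 1])

    frequent: Dict[Tuple[str, ...], int] = {}

    def count(occ: Occ) -> int:
        # Support = number of distinct trajectories containing the pattern.
        return len({tid for tid, _ in occ})

    def dfs(pattern: Tuple[str, ...], occ: Occ) -> None:
        frequent[pattern] = count(occ)
        # Adjacency property: only neighbours of the last cell are tried.
        for nxt in sorted(graph.get(pattern[-1], ())):
            # Localized support counting: join with a single TI-list.
            ext = {(tid, pos + 1) for tid, pos in occ if (tid, pos + 1) in ti[nxt]}
            if count(ext) >= min_count:
                dfs(pattern + (nxt,), ext)

    # Phase 2 (simplified): depth-first extension from each frequent vertex.
    for cell in sorted(ti):
        if count(ti[cell]) >= min_count:
            dfs((cell,), ti[cell])
    return frequent


# Hypothetical database of three trajectories over locations A to D.
database = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
patterns = mine(database, min_count=2)
```

In this toy run, extending the pattern (A, B) only inspects the TI-lists of C and D, the two successors of B in the graph, rather than every location in the database.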

Performance evaluation

We conducted experiments to compare the proposed method with the modified MP (Moving Pattern mining) algorithm [9] (hereafter MP) and the modified PrefixSpan algorithm (hereafter PrefixSpan).

The original PrefixSpan algorithm is designed to mine sequential patterns from sequence databases, in which time is not explicit, rather than trajectory patterns from trajectory databases, in which each location carries a time stamp. Thus, the modified PrefixSpan algorithm needs to consider how to embed the time constraint into the

Conclusions and future work

In this paper, we have proposed an efficient graph-based mining (GBM) algorithm for mining frequent trajectory patterns. The algorithm comprises two phases. In the first phase, we transform all trajectories in the database into a mapping graph. Then, we create a TI-list for each vertex to record all trajectories that pass through the vertex. In the second phase, using the information recorded in TI-lists, the proposed algorithm traverses the mapping graph in a depth-first search manner to mine

Acknowledgements

The authors are grateful to the anonymous referees for their helpful comments and suggestions. This research was supported in part by the National Science Council of the Republic of China under Grant No. NSC 97-2410-H-002-117.

References (41)

Y.L. Chen et al., Constraint-based sequential pattern mining: the consideration of recency and compactness, Decision Support Systems (2006)

D.Y. Choi, Personalized local internet in the location-based mobile web search, Decision Support Systems (2007)

T. Hu et al., Discovery of maximum length frequent itemsets, Information Sciences (2008)

Y. Huang et al., Mining maximal hyperclique pattern: a hybrid search strategy, Information Sciences (2007)

A.J.T. Lee et al., Mining spatial association rules in image databases, Information Sciences (2007)

A.J.T. Lee et al., Mining association rules with multi-dimensional constraints, The Journal of Systems and Software (2006)

A.J.T. Lee et al., Efficient data mining for calling path patterns in GSM networks, Information Systems (2003)

C.Y. Wang et al., Flexible online association rule mining based on multidimensional pattern relations, Information Sciences (2006)

J.X. Yu et al., A false negative approach to mining frequent itemsets from high speed transactional data streams, Information Sciences (2006)

U. Yun, Efficient mining of weighted interesting patterns with a strong weight and/or support affinity, Information Sciences (2007)

U. Yun, A new framework for detecting weighted sequential patterns in large sequence databases, Knowledge-Based Systems (2008)

R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings...

J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using a bitmap representation, in: Proceedings of...

A. Brilingaite, C.S. Jensen, N. Zokaite, Enabling routes as context in mobile services, in: Proceedings of the 12th...

H. Cao, N. Mamoulis, D.W. Cheung, Mining frequent spatio-temporal sequential patterns, in: Proceedings of the 5th IEEE...

H. Cao, N. Mamoulis, D.W. Cheung, Discovery of collocation episodes in spatiotemporal data, in: Proceedings of the...

H. Cao et al., Discovery of periodic patterns in spatiotemporal sequence, IEEE Transactions on Knowledge and Data Engineering (2007)

J.D. Chung, O.H. Paek, J.W. Lee, K.H. Ryu, Temporal moving pattern mining for location-based service, in: Proceedings...

M. Garofalakis et al., Mining sequential patterns with regular expression constraints, IEEE Transactions on Knowledge and Data Engineering (2002)

F. Giannotti, M. Nanni, D. Pedreschi, F. Pinelli, Mining sequences with temporal annotations, in: Proceedings of the...
