1 Introduction
2 Semantic trajectory anonymity protection algorithm
-
Step 1 Semantic trajectory modeling: The algorithm preprocesses the raw data and extracts spatiotemporal sequences, important spatial points (starting points, end points and stay points), velocities and motion modes. In other words, the raw data acquired are transformed into semantic trajectories as defined in Definition 1.
-
Step 2 Sensitive area construction: The sensitive point is processed based on the k-anonymity model, eventually forming a coverage area that contains k − 1 POI points of a similar type to the sensitive point. The coverage area is referred to as the sensitive area.
-
Step 3 Trajectory ambiguity: Trajectory ambiguity is performed according to the users’ motion modes, the road network topologies and the road weights in the sensitive area. The targets of ambiguity mainly include the start–end points and the stay points. The ambiguity methods can be divided into two types, trajectory segment pruning and trajectory segment addition.
-
Step 4 K-anonymity set construction: A similarity comparison is performed to form an anonymity set that contains the other k − 1 trajectories with the highest similarity.
2.1 Semantic trajectory modeling
2.2 Sensitive area construction
2.3 Trajectory ambiguity
2.3.1 Start–end point ambiguity
-
The first step is to calculate the sensitive area.
-
A trajectory can be directly pruned when the trajectory contains the start–end point, involves the motion mode of walking and satisfies the following two conditions:
-
There are trajectory segments that contain different motion modes in the sensitive area.
-
The sensitive point is at least 300 m (200–500 m) away from the end point of the trajectory section that contains the starting point of the trajectory, or the starting point of the trajectory section that contains the end point of the trajectory. The remaining trajectory segments will form new semantic trajectories.
-
-
When the conditions are not satisfied, it is necessary to recalculate the weights of roads in the sensitive area and select a point in the road with the lowest weight as the new start–end point (the start–end point should be at least 300 m away from a sensitive point). The point will be combined with the starting point or ending point of the original trajectory segment in the sensitive area to form a new trajectory segment, eventually forming a new semantic trajectory.
2.3.2 Stay point ambiguity
-
The length of stay exceeds the threshold \( \Delta t \).
-
The length of stay does not exceed the threshold \( \Delta t \).
3 Trajectory set construction based on the K-anonymity model
3.1 Spatial distance measurement
3.2 Temporal distance measurement
3.3 Spatial–temporal distance measurement
3.4 Semantic distance measurement
3.5 Semantic trajectory similarity measurement
-
Step 1 Perform noise reduction on the two trajectories A and B (the trajectory segments merely include the start–end point, velocity outlier, stay point and road network node).
-
Step 2 Interpolate the various points in trajectories A and B into a third trajectory, to eliminate the impacts of different sampling sizes, reference locations and strategies.
-
Step 3 Calculate the spatial distance between the two trajectories by using Eq. 5.
-
Step 4 Calculate the temporal distance between the two trajectories by using Eq. 7.
-
Step 5 Calculate the spatiotemporal similarity between the two trajectories by using Eq. 9.
-
Step 6 Calculate the semantic similarity between the two trajectories for the start–end and stay points.
-
Step 7 Calculate the semantic similarity between the two trajectories for motion modes.
-
Step 8 Obtain the similarity between the two trajectories.