
A novel ANP-PSO framework for clustering transportation modes from GPS tracking data

  • Open Access
  • 07.03.2026


Abstract

This article presents a novel ANP-PSO framework for clustering and classifying transportation modes from GPS tracking data. The study addresses the challenge of accurately detecting transportation modes without extensive labeled data and proposes a two-stage process that combines unsupervised K-means clustering with a hybrid ANP-PSO method. The framework uses multi-criteria decision-making and metaheuristic optimization to improve classification accuracy while reducing the need for labeled data. The results demonstrate the effectiveness of the proposed algorithm, which achieves an accuracy of over 92% in classifying transportation modes. The study compares the hybrid ANP-PSO method with other established approaches and highlights its superior performance. The article concludes with a discussion of the framework's practical application in transport planning and intelligent transportation systems, as well as potential directions for future research.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

Transportation plays a pivotal role in our daily lives and involves many choices, such as selecting the most suitable mode of transport (Feng and Timmermans 2013; Gong et al. 2020; Moiseeva et al. 2010). Well-informed decisions regarding transportation modes are essential for optimizing travel efficiency and effectiveness (Chan et al. 2024; Feng and Timmermans 2016; Sadeghian et al. 2024a). This need has prompted the extensive adoption of the Analytic Hierarchy Process (AHP) within the transportation domain as a robust decision-making technique (Saaty 1988; Sipahi and Timor 2010; Vaidya and Kumar 2006). However, one limitation of AHP is that it assumes a hierarchical, one-way relationship between decision elements and thereby overlooks their interdependencies (Sipahi and Timor 2010). As transportation systems become increasingly multimodal and data-intensive, it is essential to explicitly model the interdependencies among GPS-derived mobility features (Axhausen and Gärling 1992; Batty et al. 2012).
To address this, the Analytic Network Process (ANP) was introduced, enabling the consideration of interdependencies among all transportation decision elements, such as criteria, sub-criteria, and options (Saaty et al. 2013; Saaty and Vargas 2006; Sipahi and Timor 2010). ANP provides a more comprehensive framework for decision-making in transportation, particularly in the context of transport mode detection (Sipahi and Timor 2010). While ANP has been valuable in transportation decision-making, it has not been employed in direct detection of transport modes. Moreover, the introduction of global positioning systems (GPS) has affected transportation by offering precise location tracking and the ability to identify human behaviors and transportation modes (Dabiri et al. 2019; Sauerländer-Biebl et al. 2017; Stopher et al. 2008). Combining ANP with GPS-based mobility data does not directly alter traffic conditions or transportation system performance. Rather, this integration provides a structured analytical framework for understanding travel behavior and transportation mode choices, thereby supporting evidence-based transport planning and policy development (Łukawska et al. 2023; Sadeghian et al. 2024b; Shakibaei et al. 2021). By enabling more accurate identification of transportation modes and travel patterns, such approaches can inform the design and evaluation of strategies aimed at reducing congestion, improving public transport efficiency, and promoting sustainable mobility.
The extraction of valuable travel information from GPS devices has been the subject of extensive research (Gong et al. 2014; Łukawska et al. 2023; Sadeghian et al. 2021; Salas et al. 2022; Shakibaei et al. 2021). However, accurately determining transportation modes solely from GPS data poses a significant challenge that remains largely unsolved (Mesaric et al. 2024; Molloy et al. 2023; Rieser-Schüssler and Axhausen 2013). The detection of transportation modes requires a more sophisticated analytical approach than those used so far (Callefi et al. 2022; Lin et al. 2013; Tian et al. 2023; Wang et al. 2017). The methods developed to detect transportation modes in GPS data range from rule-based approaches to advanced machine learning algorithms (Laffitte et al. 2019; Lin and Hsu 2014; Militão et al. 2025; Rashidi et al. 2012; Sadeghian et al. 2021). One major problem with many of these methods is that they rely heavily on large labeled datasets for training and classification (Asci and Guvensan 2019; Wang et al. 2018). As a result, rather small datasets have been used, which fails to take full advantage of the large volumes in which GPS data can be collected. Methods that generalize well and remain effective when applied to large-scale GPS datasets are therefore much needed (Dabiri and Heaslip 2018; Markos and Yu 2020). This article therefore aims to develop a novel method to classify transport modes from a large raw GPS dataset.
Integrating Particle Swarm Optimization (PSO) into the methodology for transportation mode detection is intended to improve the accuracy and efficiency of the clustering model. PSO is an optimization algorithm used to fine-tune key elements of the data analysis process (Singh and Singh 2023; Zheng et al. 2022). It enables the precise assignment of weights to the various features and the establishment of classification thresholds, addressing the complexity of human mobility patterns in transportation mode detection (Zheng et al. 2022). Including PSO allows the model to adapt to diverse transportation behaviors and optimizes its performance (Eberhart and Kennedy 1995). The capability of PSO to efficiently explore the solution space and refine critical parameters is particularly valuable when dealing with unlabeled data (Rani 2023).
This integration of ANP, GPS data, and PSO represents a novel and promising approach for transportation mode detection studies. It offers both the adaptability and precision necessary for accurately classifying transportation modes, with substantial implications for optimizing transportation systems, informing urban planning, and advancing sustainability initiatives. One of the key challenges in this area of research has been the limited availability of large, high-quality datasets, which often restricts the generalizability and scalability of proposed methods. In contrast, this study addresses that limitation by applying the model to the extensive MOBIS dataset, which includes GPS tracking data from over 21,000 participants and comprises more than one million recorded trips (Molloy et al. 2023). While the core of our method is defined as an ANP-PSO hybrid, K-means clustering plays a critical preliminary role. Specifically, K-means is used to group the raw GPS data into initial clusters, each representing a distinct transportation mode. These clusters then serve as the foundation for further optimization using the ANP-PSO framework. The combination of K-means clustering and the ANP-PSO hybrid enables more accurate and scalable classification of transportation modes, forming the comprehensive approach presented in this study.
The rest of this paper is organized as follows. “Literature review” section provides a comprehensive review of the existing literature on transportation mode detection using GPS data, highlighting the challenges and limitations of previous approaches. “Methodology” section presents the methodology and framework of our proposed clustering model, including the integration of multi-criteria decision-making, network analysis, and the PSO algorithm. “Numeric results” section describes the experimental setup and discusses the results obtained from applying the model to GPS tracking data. Finally, “Conclusion” concludes the paper, summarizing the key contributions and outlining directions for future research.

Literature review

Transportation mode detection plays a crucial role in various domains, including urban planning, traffic management, and intelligent transportation systems (Gong et al. 2014; Lari and Golroo 2015). Accurately identifying the transportation mode of individuals can provide valuable insights into travel behavior, mode choices, and the utilization of transportation infrastructure (Lyons 2018; Yazdizadeh et al. 2021). In recent years, the use of Global Positioning Systems (GPS) has increased significantly in the field of transportation, especially for data collection, enabling researchers to extract valuable travel information and analyze transportation mode patterns (McGowen and McNally 2007; Sadeghian et al. 2021).
The extraction of transportation mode information from GPS data has been a topic of extensive research (Sadeghian et al. 2022; Wang et al. 2017). Researchers have developed various approaches and algorithms to accurately determine transportation modes based on GPS data (Li et al. 2020; Molloy et al. 2022). One commonly used approach for transportation mode detection is clustering, where GPS data points are grouped into distinct clusters representing different transportation modes (Dabiri et al. 2020; Yao et al. 2023). The selection of relevant features plays a critical role in the clustering process (Gao et al. 2021). The features such as average speed, mean speed, total distance, total time, average acceleration, mean bearing, maximum speed, and maximum acceleration were chosen based on existing standards and served as input variables for the clustering and classification process (Bachir et al. 2018; Sadeghian et al. 2022). The K-means algorithm was then used to cluster data points based on these features. The results demonstrated the effectiveness of this approach in successfully grouping the data points into different clusters corresponding to various transportation modes (Dabiri et al. 2020; Dabiri and Heaslip 2018).
While clustering approaches have achieved considerable success in transportation mode detection, accurately determining the transportation mode based on distance measurements can be challenging (Sadeghian and Mojarrad 2025). To overcome this limitation, researchers have explored the integration of multi-criteria decision-making techniques into the classification process (Saaty and Vargas 2006; Yazdizadeh et al. 2021). The ANP method considers the complex relationships between decision elements by replacing the hierarchical structure with a network structure (Saaty et al. 2013). It allows for the evaluation of the relative importance of each feature and captures the interdependencies among them. The weights assigned to the features can significantly influence the classification results. The PSO algorithm, on the other hand, optimizes the classification thresholds to achieve the best classification performance (Golshan et al. 2025; Singh and Singh 2023). By combining ANP and PSO, the hybrid algorithm has demonstrated improved accuracy in classifying transportation modes (Singh and Singh 2023; Zheng et al. 2022).
Several studies have also explored the use of machine learning algorithms for transportation mode detection. Machine learning algorithms, such as support vector machines (SVM), random forests, and neural networks, have been employed to classify transportation modes based on GPS data (Sadeghian et al. 2021). These algorithms can capture complex patterns and relationships in the data, allowing for accurate classification (Sadeghian et al. 2022; Yao et al. 2023). However, the success of machine learning relies heavily on the availability of labeled training datasets (Sadeghian et al. 2021). One common challenge in transportation mode detection is the heterogeneity of GPS data and the presence of noisy or missing data points (Stenneth et al. 2011; Stopher et al. 2005). Several studies have addressed this challenge by employing data preprocessing techniques, such as data cleaning, outlier removal, and interpolation (Gong et al. 2014; Sauerländer-Biebl et al. 2017). These preprocessing steps help improve the quality and reliability of the GPS data, leading to more accurate transportation mode detection.
In summary, it becomes evident that the field of transportation mode detection using GPS data has witnessed extensive exploration. Various methodologies, encompassing clustering, multi-criteria decision-making, and machine learning algorithms, have been harnessed to accurately ascertain transportation modes. However, the review distinctly identifies a research gap in the utilization of hybrid ANP-PSO algorithms in this domain, particularly concerning fully unlabeled datasets. This study aims to address this void and build upon prior works. Amid the discourse, challenges such as data heterogeneity and missing data were explored, emphasizing the pivotal role of data preprocessing strategies. Notably, the review underscores the significance of judicious feature selection and data quality in elevating the effectiveness of transportation mode detection methodologies.

Methodology

Rationale

The ANP method offers a robust approach for decision-making by considering the complex relationships and feedback effects among decision elements. Figure 1 presents the workflow of the proposed ANP–PSO framework. The figure highlights the integration of feature extraction, ANP-based weighting, and PSO optimization for transportation mode classification.
Fig. 1
Transport mode classification framework diagram
By utilizing a network structure and the supermatrix method, ANP overcomes the limitations of the AHP and provides a more comprehensive analysis of interdependencies in decision-making problems. The structure and difference between the AHP and ANP are shown in Fig. 2.
Fig. 2
Structure of the AHP (a), the ANP (b), and displaying a classification problem with three attributes in the ANP method (c)
As can be seen, in the Analytic Network Process, communication between components can be bidirectional, and each component can also have feedback. A classification problem can be considered as a single-component system consisting of n attributes that are interconnected in a network. Figure 2 illustrates this: panel (a) shows the structure of the AHP with one-way communication, panel (b) shows the ANP with bidirectional communication and feedback, and panel (c) shows a classification problem with three attributes within the ANP framework. This concept is adaptable to more complex scenarios. The arcs represent the interdependence and mutual influence of the criteria, and the loops indicate the feedback effect. These mutual relationships and feedback affect the overall decision-making process.
The ANP consists of the following three stages (Moalagh and Ravasan 2013). The first stage is problem modeling and structuring: the problem needs to be clearly articulated and logically organized, for example, by employing a systematic framework such as a decision network. The second stage is pairwise comparisons and priority vectors: the relative importance of each criterion with respect to the control criterion is obtained from a pairwise comparison matrix. Pairwise comparisons express the relationships between elements within a component, and a relative priority vector is derived from each pairwise comparison matrix. All local priority vectors are calculated in the same way, as is the relative importance of the criteria with respect to the objective of the decision problem. The third stage is the formation of the supermatrix: the relative priority vectors are substituted into the appropriate columns of a supermatrix, and the global priority weights of each criterion are obtained from the limit supermatrix. The limit supermatrix is obtained by iteratively multiplying the supermatrix by itself until it converges to a steady state with identical columns.
In the Analytic Network Process, pairwise comparisons are used to quantify the relative influence of one element on another. Let \(\text{C}=\left\{{\text{c}}_{1},{\text{c}}_{2},\dots ,{\text{c}}_{\text{n}}\right\}\) denote a set of criteria (or features). A pairwise comparison matrix \(\text{A}\in {\mathbb{R}}^{\text{n}\times \text{n}}\) is defined such that each element \({\text{a}}_{\text{ij}}\) represents the relative importance of the criterion \({\text{c}}_{\text{i}}\) with respect to the criterion \({\text{c}}_{\text{j}}\), Eq. (1):
$$\text{A}=\left[\begin{array}{cccc}1& {a}_{12}& \cdots & {a}_{1n}\\ {a}_{21}& 1& \cdots & {a}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {a}_{n1}& {a}_{n2}& \cdots & 1\end{array}\right]$$
(1)
where \({\text{a}}_{\text{ij}}>0\) and \({\text{a}}_{\text{ij}}=\frac{1}{{\text{a}}_{\text{ji}}}\). In classical ANP, these values are typically elicited from expert judgment; in this study, they are treated as decision variables optimized through PSO.
Given a pairwise comparison matrix \(\text{A}\), the priority vector \(\text{p}=({\text{p}}_{1},{\text{p}}_{2},\dots ,{\text{p}}_{\text{n}}{)}^{\text{T}}\) represents the relative weights of the criteria and is obtained as the normalized principal eigenvector of \(\text{A}\), Eq. (2):
$$\text{A}p={\lambda }_{\text{max}}p$$
(2)
where \({\uplambda }_{\text{max}}\) is the maximum eigenvalue of \(\text{A}\), and the elements of \(\text{p}\) satisfy \(\sum\nolimits_{{{\text{i}} = 1}}^{{\text{n}}} {{\text{p}}_{{\text{i}}} } = 1\). These priority vectors populate the corresponding columns of the ANP supermatrix.
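As an illustration of Eq. (2), the following minimal sketch computes the priority vector of a pairwise comparison matrix with NumPy. The function name `priority_vector` and the 3 × 3 example matrix are assumptions introduced here for illustration; they are not taken from the study.

```python
import numpy as np

def priority_vector(A):
    """Return (p, lambda_max): the normalized principal eigenvector of A
    and the corresponding largest eigenvalue, as in Eq. (2)."""
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)           # index of lambda_max
    p = np.abs(eigvecs[:, k].real)        # principal eigenvector, sign removed
    return p / p.sum(), eigvals[k].real   # normalize so that sum(p) = 1

# Hypothetical, perfectly consistent matrix built from known weights:
# a_ij = w_i / w_j, so that a_ij = 1 / a_ji and a_ii = 1.
w = np.array([0.6, 0.3, 0.1])
A = w[:, None] / w[None, :]
p, lam_max = priority_vector(A)
```

For a perfectly consistent matrix such as this one, the recovered priority vector equals the weights used to build the matrix and the largest eigenvalue equals n.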
The ANP supermatrix aggregates all local priority vectors into a single matrix that captures the interdependencies among criteria. By arranging the priority vectors into their appropriate columns, a weighted supermatrix is formed. The global priority weights of the criteria are then obtained by computing the limit supermatrix, which is derived by iteratively multiplying the weighted supermatrix by itself until convergence is achieved. Under the standard ANP assumptions that the supermatrix is column-stochastic, irreducible, and primitive, this iterative process converges to a steady-state matrix in which all columns are identical, in accordance with the Perron–Frobenius theorem.
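The limit supermatrix computation described above can be sketched as repeated matrix multiplication. This is a minimal illustration rather than the implementation used in the study; the function name `limit_supermatrix`, the tolerance, and the 3 × 3 column-stochastic example matrix are assumptions.

```python
import numpy as np

def limit_supermatrix(W, tol=1e-12, max_iter=10_000):
    """Multiply the weighted supermatrix by itself until its powers stop changing."""
    W = np.asarray(W, dtype=float)
    M = W.copy()
    for _ in range(max_iter):
        M_next = M @ W                          # next power of W
        if np.max(np.abs(M_next - M)) < tol:
            return M_next
        M = M_next
    raise RuntimeError("supermatrix powers did not converge")

# Hypothetical weighted supermatrix: every column sums to one and all
# entries are positive, so the powers converge.
W = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.3],
              [0.2, 0.3, 0.4]])
L = limit_supermatrix(W)
weights = L[:, 0]                               # any column gives the global weights
```

Because all columns of the limit matrix coincide, any one of them can be read off as the global priority vector.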
The control criterion defines the context under which pairwise comparisons are performed and determines the dependency structure of the ANP network. In the present study, the control criterion is the transportation mode classification objective, under which all GPS-derived features are assumed to be mutually interdependent and evaluated with respect to their contribution to distinguishing transportation modes. Unlike traditional ANP applications that rely on subjective expert assessments, this study adopts a data-driven formulation in which the pairwise comparison matrices and resulting priority vectors are parameterized and optimized automatically within the PSO framework.

Classification method considering the weights of indicators in the ANP approach

This section describes how the Analytic Network Process (ANP) is operationalized for transportation mode classification. In the proposed framework, each GPS trip is represented by an n-dimensional feature vector, and ANP is used to assign relative importance weights to these features. These weights are then used to compute a weighted classification score, which forms the basis for transportation mode detection.
Let x = (x1,x2,…,xn) denote the feature vector corresponding to a single GPS trip (i.e., one GPS trajectory), where each component xi represents the value of the i-th GPS-derived mobility feature (e.g., speed, distance, acceleration). The corresponding ANP-derived weight vector is denoted by w = (w1,w2,…,wn), where each weight reflects the relative importance of a feature as determined by the ANP supermatrix.
In the classification problem based on the ANP, the goal is to determine the appropriate class for each n-dimensional input pattern x. Each input pattern is treated as an option in ANP, and its value is calculated using Eq. (3):
$$ u(x) = \sum\limits_{i = 1}^{n} {w_{i} x_{i} } $$
(3)
In this equation, the feature values xi are normalized between zero and one, because the features of a sample are measured on different scales. The utility value u(x) represents a weighted aggregation of trip characteristics and serves as a one-dimensional score summarizing the mobility pattern of a trip. Higher values of u(x) indicate greater similarity to faster or longer-distance transportation modes, while lower values correspond to slower modes such as walking. To make the features comparable, the normalized values given by Eq. (4) are used in place of the raw values xi in Eq. (3):
$$ x_{i}{^\prime} = \frac{{x_{i} - mi_{i} }}{{ma_{i} - mi_{i} }} $$
(4)
This normalization ensures that all features contribute comparably to the weighted aggregation, preventing variables with larger numerical ranges from dominating the classification score. In Eq. (4), xi′ is the normalized value, xi is the raw value of the i-th feature, and mai and mii are the maximum and minimum values of the i-th feature, which are determined from the available training and testing data. Because Eq. (4) yields \(0\le {x}_{i}^{\prime}\le 1\), and because the weights are non-negative and satisfy \(\sum {w}_{i}=1\), it follows that \(0\le \text{u}(\text{x})\le 1\), where x = (x1, x2, …, xn).
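The normalization of Eq. (4) and the weighted score of Eq. (3) can be sketched as follows. The feature columns, the trip values, and the weight vector are hypothetical and serve only to illustrate the calculation; the guard for constant features is an added assumption to avoid division by zero.

```python
import numpy as np

def min_max_normalize(X):
    """Eq. (4): scale each feature (column) of X to the interval [0, 1]."""
    X = np.asarray(X, dtype=float)
    mi = X.min(axis=0)
    ma = X.max(axis=0)
    span = np.where(ma > mi, ma - mi, 1.0)   # assumption: constant features map to 0
    return (X - mi) / span

def utility(X_norm, w):
    """Eq. (3): weighted sum of normalized features, one score per trip."""
    return X_norm @ np.asarray(w, dtype=float)

# Hypothetical trips; columns: average speed, total distance, maximum acceleration
X = np.array([[ 4.0,  1.0, 0.5],
              [15.0,  5.0, 1.0],
              [60.0, 30.0, 3.0]])
w = np.array([0.5, 0.3, 0.2])                # hypothetical weights summing to one
u = utility(min_max_normalize(X), w)
```

With non-negative weights that sum to one, every score lies between zero and one: the trip that is smallest on all features scores 0 and the trip that is largest on all features scores 1.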
Different thresholds can be set to determine the classes according to the desired criterion and are included in the representation of solutions in the metaheuristic algorithm. For example, in a two-class problem with a single threshold on u(x), if the value u(x) of an option is less than the threshold, the pattern belongs to class 1; otherwise, it belongs to class 2. To generalize this principle to an m-class problem, m-1 thresholds are defined:
  • If \(u(x) < Th_{1}\), the pattern belongs to class 1.
  • If \(Th_{i-1} \le u(x) < Th_{i}\) for i = 2, …, m-1, the pattern belongs to class i.
  • If \(u(x) \ge Th_{m-1}\), the pattern belongs to class m.
In the proposed framework, these thresholds are not predefined heuristically but are treated as decision parameters. Their values are optimized using the Particle Swarm Optimization algorithm described in “Particle swarm optimization algorithm”, jointly with the ANP feature weights, to minimize classification error. The next section will explain how the PSO algorithm can be used to determine the parameters of a classification problem using the ANP method based on the pattern data. These data include the parameters related to the elements of the super matrix and the thresholds. Thus, the ANP-based weighting scheme provides a structured feature aggregation mechanism, while PSO is responsible for identifying the optimal combination of feature weights and classification thresholds.
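The threshold rule for an m-class problem can be sketched with a sorted search over the thresholds. The function name `classify` and the example values are hypothetical. Scores exactly equal to a threshold are assigned to the upper class here, which is an assumption about how ties are resolved.

```python
import numpy as np

def classify(u, thresholds):
    """Map scores u in [0, 1] to classes 1..m using m-1 increasing thresholds."""
    th = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(th) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    # searchsorted counts how many thresholds are <= u; adding 1 gives the class.
    return np.searchsorted(th, u, side="right") + 1

u = np.array([0.05, 0.30, 0.62, 0.95])       # hypothetical utility scores
labels = classify(u, [0.2, 0.5, 0.8])        # three thresholds, four classes
```

In this example, the four scores fall into classes 1, 2, 3, and 4, respectively.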

Particle swarm optimization algorithm

Particle swarm optimization (PSO) is a swarm-based stochastic optimization technique proposed by Eberhart and Kennedy (1995). It is inspired by the behavior of fish schools and bird flocks and can be viewed as combining collective group experience with the individual experience that individuals use in their decision-making. Richardson and Boyd (1985) examined the human decision-making process and developed the concept of individual learning and cultural transmission, according to which humans use two important types of information in their decisions: first, their own experiences and experiments, and second, the experiences and experiments of other individuals.
This theory later became the basis for the development of the PSO algorithm by Eberhart and Kennedy (1995). The algorithm uses a set of particles that form a swarm and move through the solution space in search of the best solution. Each particle determines its own movement path in the search space based on its individual experiences and the experiences of other particles. Each particle keeps track of its best position (pbest) and its value, as well as the best overall position of the swarm (gbest) and its value found so far, and then updates its movement path based on this information.
If we assume that each particle moves in an n-dimensional space, we represent the position vector of particle i as xi = (xi,1, xi,2, …, xi,n) ∈ ℜn, and the velocity vector as vi = (vi,1, vi,2, …, vi,n) ∈ ℜn. After each iteration of the algorithm, we update the particle's velocity and position using Eqs. (5) and (6):
$$v_{i}(k+1)=\omega \times v_{i}(k)+c_{1}\times r_{1}\times \left(pbest_{i}-x_{i}(k)\right)+c_{2}\times r_{2}\times \left(gbest_{i}-x_{i}(k)\right)$$
(5)
$$x_{i}(k+1)=x_{i}(k)+v_{i}(k+1)$$
(6)
In Eq. (5), "ω" represents the inertia component, "c1" represents the cognitive component, and "c2" represents the social learning component. Additionally, "r1" and "r2" are random numbers between 0 and 1. pbesti is the personal best (the best solution found by particle i), and gbesti is the best solution found by particle i’s neighbours. The inertia component is a memory that retains the previous movement direction and prevents sudden changes in that direction. In Eq. (6), xi (k + 1) is the updated position of particle i at iteration k + 1, xi (k) is the current position of particle i at iteration k, and vi (k + 1) is the updated velocity calculated in the previous step.
The coefficients c1 and c2 control the velocity and position updates of particles within the swarm. Specifically, c1 reflects a particle’s self-confidence, while c2 represents its confidence in other particles. If c2 > 0 and c1 = 0, the entire swarm is attracted towards a single point, so that the swarm behaves like one stochastic explorer. Conversely, if c1 > 0 and c2 = 0, particles become independent explorers, each performing a local search. Excessively large values of these coefficients lead to oscillatory particle movement, whereas smaller values create smoother paths. Typically, suitable values for c1 and c2 are determined empirically. In a 1998 article, Kennedy demonstrated that the sum of c1 and c2 should be less than 4 to ensure convergence; otherwise, velocities and positions tend towards infinity.
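A single update of Eqs. (5) and (6) for a whole swarm can be sketched as below. The function name `pso_step` and the parameter values are hypothetical. The sketch draws a separate random number for each particle and dimension and uses one swarm-wide best position, both of which are common choices but assumptions here.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, omega, c1, c2, rng):
    """Apply Eq. (5) and Eq. (6) once to all particles (rows of x)."""
    r1 = rng.random(x.shape)                 # random factors in [0, 1)
    r2 = rng.random(x.shape)
    v_new = omega * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (5)
    x_new = x + v_new                                                   # Eq. (6)
    return x_new, v_new

rng = np.random.default_rng(0)
x = rng.random((5, 3))                       # 5 particles in a 3-dimensional space
v = np.zeros_like(x)                         # initial velocities set to zero
pbest = x.copy()                             # personal bests start at the positions
gbest = x[0]                                 # hypothetical: particle 0 is the best so far
x_new, v_new = pso_step(x, v, pbest, gbest, omega=0.7, c1=1.5, c2=1.5, rng=rng)
```

In this first step the inertia and cognitive terms vanish (zero velocity, pbest equal to the position), so every particle moves only towards gbest and particle 0 stays where it is.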
The cognitive component evaluates the efficiency of the particle relative to its previous performances. Similar to memory, the cognitive component keeps track of the position where the particle performed better. The effect of this component is to encourage the particle to return to its previous best position, similar to individuals’ tendency to return to a location or situation that satisfied them more in the past.
The social learning component evaluates the efficiency of the particle relative to a group of particles or the entire swarm. Conceptually, the social component acts as a criterion that particles strive to achieve. The effect of this component is to attract each particle towards the best position found by its neighbours or the whole swarm. Figure 3 schematically illustrates the change in the particle’s trajectory in the PSO algorithm.
Fig. 3
Particle Movement in the PSO algorithm
During each iteration, the fitness of each particle must be assessed against the optimization objective. Here, classification performance is quantified using the root-mean-square error (RMSE), which gauges the average detection error and is computed using Eq. (7):
$$RMSE=\sqrt{\frac{1}{nTr\times nOut}\sum_{i=1}^{nTr}\sum_{j=1}^{nOut}{\left({t}_{i,j}-{y}_{i,j}\right)}^{2}}$$
(7)
In Eq. (7), nTr represents the number of training samples, nOut signifies the number of outputs, and ti,j and yi,j stand for the derived and reported outputs, respectively, corresponding to the ith training sample and jth output. For instance, when the reported travel mode for the ith training sample is “walk,” yi,1 equals 1. Similarly, if the derived travel mode for the ith training sample is “bus,” ti,1 and ti,3 are set to 0 and 1, respectively. The fitness function is defined in Eq. (8):
$$fitness=\frac{1}{1+RMSE}$$
(8)
This fitness value serves as a metric to gauge the performance of the proposed algorithm. After assessing the initial swarm, the individual best performance for each particle (referred to as “personal best”) and the best performance among neighbouring particles (referred to as “local best”) are retained.
If the fitness of a particle surpasses its personal best, the personal best is updated with that particle’s position. Likewise, if a particle’s fitness outperforms the local best, the local best is updated to match the particle’s current position. This iterative process continues until the user-defined criterion is met, which could involve reaching a specific RMSE threshold or completing a designated number of iterations. In this algorithm, the most important parameters are the particle count, velocity components, and inertia weight.
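The RMSE of Eq. (7) and the fitness of Eq. (8) can be sketched for one-hot encoded outputs as follows. The column order (walk, bike, bus) and the four example trips are hypothetical; only the positions of “walk” (first output) and “bus” (third output) follow the example given above.

```python
import numpy as np

def rmse(t, y):
    """Eq. (7): root-mean-square error over all nTr x nOut entries."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((t - y) ** 2)))

def fitness(t, y):
    """Eq. (8): fitness in (0, 1], equal to 1 when derived and reported modes agree."""
    return 1.0 / (1.0 + rmse(t, y))

# Rows are trips; hypothetical column order: walk, bike, bus
y = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]])   # reported modes
t = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]])   # derived modes
score = fitness(t, y)
```

Here one of the four trips is misclassified, which changes two of the twelve one-hot entries, so the RMSE equals the square root of 2/12.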
A higher inertia weight accompanies a broader search space with sudden changes, while a lower inertia weight focuses the search in a narrower and more concentrated region. The optimal value of inertia weight depends on the problem and the dependent velocity components. In fact, these three parameters are selected simultaneously. In the past, the inertia weight was considered a constant value for all iterations. However, nowadays, these parameters are defined as variables. These methods typically start with higher initial inertia weight values, gradually reducing them over time. As a result, particles engage in exploration during the initial stages and gradually shift towards exploitation of desired regions. The adjusted inertia weight for each iteration can be determined using Eq. (9).
$$ \omega = \omega_{\max } - \frac{{\omega_{\max } - \omega_{\min } }}{{iter_{\max } }} \times iter $$
(9)
In the above equation, ωmax and ωmin are the maximum and minimum inertia weight values, which are input parameters of the algorithm and should be determined by the user. ’iter’ represents the current iteration of the algorithm, while itermax indicates the maximum number of iterations.
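The linearly decreasing inertia weight of Eq. (9) amounts to a one-line function. The default values 0.9 and 0.4 used below are commonly seen in the PSO literature but are assumptions here, not values reported for this study.

```python
def inertia_weight(it, iter_max, w_max=0.9, w_min=0.4):
    """Eq. (9): decrease the inertia weight linearly from w_max to w_min."""
    return w_max - (w_max - w_min) / iter_max * it

# Inertia weights at the start, middle, and end of a hypothetical 100-iteration run
schedule = [inertia_weight(it, 100) for it in (0, 50, 100)]
```

The weight starts at ωmax, which favours exploration, and reaches ωmin at the final iteration, which favours exploitation.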
Furthermore, the initial conditions and termination criteria of the algorithm should be specified. The particle positions vector can be randomly generated or initialized using heuristic methods, and the initial particles must be assigned values that ensure sufficient diversity. The initial velocity vector is commonly set to zero, but if it is randomly initialized, the values should be very small.

Representation of solutions in the particle swarm optimization algorithm

In the PSO metaheuristic algorithm, each solution, which is treated as a particle, is a string of \(n^{2}+m-1\) elements. The \(n^{2}\) elements correspond to the entries of the super matrix built over the n features, and the \(m-1\) elements correspond to the thresholds defined for the m desired classes, as explained above. To generate the \(n^{2}\) elements corresponding to the weights of feature importance, we first generate n random numbers between zero and one for each row of matrix W in Eq. (1). Then, because Eq. (2) requires the numbers in each row to sum to one, we calculate the sum of the random numbers generated for each row of matrix W and divide the numbers in that row by this sum. In this way, the \(n^{2}\) elements of matrix W in Eq. (1) are obtained. Then, as explained, by raising the matrix W to successive powers \(W^{2k+1}\) until it converges to a constant matrix, the feature weights, i.e., the values of wi for each i, are determined. The thresholds corresponding to different classes also need to be determined. It is evident that the threshold for the i-th class is smaller than the threshold for the (i−1)-th class.
Moreover, considering that \(0\le \text{u}(\text{x})\le 1\), to generate the part of the string related to class thresholds in the initial solution, we can generate \(m-1\) random numbers between zero and one and, after sorting these numbers in descending order, assign them in order as \(Th_{1}, Th_{2}, \ldots, Th_{m-1}\). The initial solutions of the problem can therefore be created as strings whose entries are all continuous variables between zero and one.
With the solution representation specified, the PSO algorithm is implemented to determine the parameters of the classification model (Fig. 4).
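The construction of an initial particle described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the small sizes n = 3 and m = 4, the positivity offset, and the use of repeated squaring (instead of the odd powers mentioned in the text) to approximate the limit matrix are all assumptions made for this example.

```python
import random


def random_particle(n, m, rng):
    """One particle: n*n row-normalised weights followed by m-1 sorted thresholds."""
    weights = []
    for _ in range(n):
        # Small offset keeps every entry strictly positive (assumption, not from the text).
        row = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(row)
        weights.extend(value / total for value in row)
    thresholds = sorted((rng.random() for _ in range(m - 1)), reverse=True)
    return weights + thresholds


def limit_weights(flat_weights, n, tol=1e-12, max_squarings=60):
    """Square W until its entries stop changing, then read the weights off one row."""
    w = [flat_weights[i * n:(i + 1) * n] for i in range(n)]
    for _ in range(max_squarings):
        sq = [[sum(w[i][k] * w[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        # Renormalise rows so rounding error does not accumulate across squarings.
        sq = [[v / sum(row) for v in row] for row in sq]
        change = max(abs(sq[i][j] - w[i][j]) for i in range(n) for j in range(n))
        w = sq
        if change < tol:
            break
    return w[0]


rng = random.Random(0)
n, m = 3, 4  # illustrative sizes only
particle = random_particle(n, m, rng)
feature_weights = limit_weights(particle[:n * n], n)
print(len(particle), [round(v, 3) for v in feature_weights])
```

For a strictly positive matrix whose rows sum to one, the powers converge to a matrix with identical rows, so reading the first row is sufficient in this sketch.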
Fig. 4
Pseudocode of a PSO using a ring topology
In the proposed implementation, each particle in the PSO population represents a candidate solution consisting of ANP feature weights and classification threshold values. The population size was set to 100 particles, as reported in Table 2. The personal best (pbesti) corresponds to the best solution previously found by a particle, while the global best (gbest) represents the best solution found across the entire population. These solutions are evaluated using the RMSE-based fitness function, and particle positions are updated iteratively according to Eqs. (6) and (7).
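For orientation, a compact PSO loop with a ring neighbourhood is sketched below. It uses the standard velocity and position update, which is assumed here to correspond to Eqs. (6) and (7); the placeholder objective `sphere`, the small swarm size, the clamping of positions to [0, 1], and all function names are assumptions for this example and do not reproduce the RMSE-based fitness of the study.

```python
import random


def pso_ring(fitness, dim, n_particles=20, iters=100,
             w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimise `fitness` over [0, 1]^dim with a ring-topology PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]  # zero initial velocity
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    for it in range(iters):
        w = w_max - (w_max - w_min) / iters * it  # linearly decreasing inertia
        for i in range(n_particles):
            # Local best: best personal best among the particle and its two ring neighbours.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            lbest = pbest[min(ring, key=lambda j: pbest_fit[j])]
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (lbest[d] - pos[i][d]))
                # Keep positions inside [0, 1], the range of weights and thresholds.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
    best = min(range(n_particles), key=lambda j: pbest_fit[j])
    return pbest[best], pbest_fit[best]


def sphere(x):
    """Placeholder objective with its minimum at (0.5, ..., 0.5)."""
    return sum((v - 0.5) ** 2 for v in x)


best_pos, best_fit = pso_ring(sphere, dim=3)
print([round(v, 3) for v in best_pos], best_fit)
```

In the ring topology each particle is only informed by its immediate neighbours, which slows the spread of the best solution through the swarm compared with a single global best.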

Numeric results

Input dataset

The dataset used in this study is drawn from the MOBIS:COVID-19 research project conducted by ETH Zürich (Molloy et al. 2023). This dataset was collected across Switzerland during the COVID-19 pandemic to monitor and analyze how individual travel behavior and transportation choices changed in response to public health interventions and societal restrictions. The data was acquired through a smartphone application called Catch-my-Day, which passively recorded GPS trajectories and sensor data from thousands of volunteer participants over an extended period. Each participant’s device continuously captured time-stamped geolocation information, allowing for the reconstruction of detailed daily mobility patterns and the inference of transportation modes.
The original dataset comprised a total of 1,230,196 trips, distributed across multiple transportation modes: Car (606,770), Walk (566,812), Bus (21,334), LightRail (16,400), Bicycle (13,718), Train (2,996), RegionalTrain (928), Tram (846), Subway (199), Plane (134), and Ebicycle (59). Although the MOBIS dataset also contains a substantial number of light rail, tram, subway/metro, and regional rail trips, these modes were not combined into the “Train” category and were excluded from the present analysis. This decision was made because different rail-based public transport modes often exhibit highly overlapping speed, acceleration, and distance profiles at the trip level, which increases classification ambiguity when relying solely on GPS-derived features. For this study, we focused on the five primary transportation modes Car, Walk, Bus, Bicycle, and Train as they represent the most common and relevant categories for analysis. This filtering resulted in a refined dataset comprising 1,211,630 trips, which provided a more focused and meaningful basis for evaluating transportation mode detection methodologies.
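The trip counts quoted above can be checked with a short calculation. The dictionary simply restates the per-mode figures from the text; the variable names are illustrative.

```python
trips = {
    "Car": 606_770, "Walk": 566_812, "Bus": 21_334, "LightRail": 16_400,
    "Bicycle": 13_718, "Train": 2_996, "RegionalTrain": 928, "Tram": 846,
    "Subway": 199, "Plane": 134, "Ebicycle": 59,
}
selected = ("Car", "Walk", "Bus", "Bicycle", "Train")

total = sum(trips.values())
kept = sum(trips[mode] for mode in selected)

print(total)  # 1230196
print(kept)   # 1211630
print(round(100 * kept / total, 2))  # share of trips retained, in percent
```

The five retained modes therefore cover roughly 98.5% of all recorded trips.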
The MOBIS dataset contains both validated and non-validated trip records, as documented by Molloy et al. (2023). In this study, GPS trajectory data and derived movement features (e.g., speed, distance, acceleration, and bearing) were used as the sole inputs for the K-means clustering stage. This step was entirely unsupervised and did not rely on transportation mode labels generated by the MotionTag system or user validation. For the subsequent ANP–PSO hybrid classification, transportation mode labels were used only for calibration, balancing, and performance evaluation. Specifically, a balanced subset was constructed using trips for which transportation mode had been validated by users at least once, following MOBIS documentation that approximately 85% of participants provided validation. This strategy was adopted to mitigate known reliability issues of automated mode detection, particularly for bus trips, while preserving the unsupervised nature of the clustering process. MotionTag-inferred modes were not used as ground truth for clustering or optimization, but only as a reference for evaluating classification performance.
To prepare the data for transport mode classification, several preprocessing steps were applied. First, entries from users with invalid GPS recordings, such as trips with a maximum speed of zero, were removed to ensure the dataset’s reliability. Further cleaning was performed by filtering out trips with an average speed below one meter per second, as these typically represented stationary behavior or sensor noise. The GPS trajectory data was then aligned with the labeled transport legs provided in the dataset to ensure consistency between inferred and reported modes.
From each trajectory, a set of numerical features was derived to support mode detection. These include average speed, mean speed, total distance traveled, total trip duration, average acceleration, mean directional bearing, maximum speed, and maximum acceleration. These variables were normalized and used as input for the Analytic Network Process (ANP) combined with Particle Swarm Optimization (PSO). The ANP framework was used to model the interrelationships and relative importance of the extracted features, while PSO served as a heuristic optimization technique to identify the most effective feature weights and classification thresholds. Unlike other studies that rely heavily on manual labeling or unsupervised clustering such as k-means, this approach benefits from the semi-labeled nature of the MOBIS dataset, enabling more robust model training and evaluation. In this study, the MOBIS dataset is referred to as semi-labelled in the sense that transportation mode labels are available but are not uniformly validated at the trip level. Mode labels are primarily inferred automatically by the MotionTag system, and only a subset of trips are explicitly validated by users. While approximately 85% of participants validated at least one trip, not all individual trip records are confirmed, and the reliability of automatic mode detection varies across modes. For this reason, the dataset is not treated as fully labelled ground truth in the present analysis. The combination of rich sensor data, rigorous cleaning procedures, and a hybrid classification methodology provides a comprehensive and scalable solution for detecting transportation modes in real-world mobility datasets.

K-means classification

In order to classify and identify transportation modes accurately, the K-means algorithm was employed as a clustering technique using selected features from the data. The features considered for clustering included average speed, mean speed, total distance, total time, average acceleration, mean bearing, maximum speed, and maximum acceleration. These features were selected based on common practice in the transportation mode detection literature, where GPS-derived variables such as speed, distance, time, acceleration, and bearing are consistently shown to be discriminative for distinguishing between walking, cycling, and motorized modes (Dabiri et al. 2020; Sadeghian et al. 2024a). Such features have been widely used in both supervised and unsupervised mode detection approaches and are considered robust indicators of travel behavior.
Before applying the K-means algorithm, the data underwent preprocessing and sorting procedures. Additionally, new features such as maximum speed and acceleration were extracted from the data to enhance the clustering process. The K-means algorithm was then applied using Python, iteratively clustering the data for a total of 10,000 iterations.
Table 1 presents the cluster centers obtained from the K-means clustering method. Each cluster is associated with a specific transportation mode. Cluster 1 is identified as the walking cluster, Cluster 2 as the bike cluster, Cluster 3 as the bus cluster, Cluster 4 as the car cluster, and Cluster 5 as the train cluster. These assignments were made based on the characteristic features and patterns exhibited by the data points within each cluster.
Table 1
Cluster centers in data clustering using the K-means method
| Cluster | Average speed (m/s) | Mean speed (m/s) | Total distance (m) | Total time (s) | Average acceleration (m/s²) | Average bearing (degree/min) | Maximum speed (m/s) | Maximum acceleration (m/s²) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Walk | 4.37 | 4.11 | 2426.30 | 494.95 | −0.003 | 26.13 | 6.68 | 1.06 |
| Bike | 5.86 | 6.57 | 19,530.76 | 3253.03 | −0.011 | 160.49 | 19.94 | 3.85 |
| Bus | 6.48 | 7.54 | 17,437.24 | 2456.71 | −0.026 | 201.86 | 19.33 | 3.78 |
| Car | 7.31 | 7.96 | 20,200.63 | 2453.35 | −0.011 | 113.90 | 23.83 | 4.13 |
| Train | 8.60 | 9.65 | 14,470.75 | 1321.88 | −0.0071 | 259.26 | 17.18 | 3.43 |
While it is possible to determine the transportation mode of a new trip by calculating the distance between its features and the cluster centers, further accuracy improvement is desired. To achieve this, the classification method described in “Methodology” section is employed. By utilizing this classification method, new data points can be assigned to their respective clusters, ensuring a more precise classification of transportation modes.
The classification method works by leveraging the information gathered during the clustering process. The new data points are evaluated based on their feature values, and their distances to the cluster centers are calculated. The cluster with the minimum distance to the new data point is then selected as the likely transportation mode for that trip. By incorporating the classification method alongside the clustering approach, the accuracy of the data mining method for transportation mode detection is significantly enhanced.
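The minimum-distance assignment described above can be sketched with the cluster centers from Table 1. The min-max scaling over the five centers is an assumption made here so that total distance does not dominate the Euclidean distance; the text does not state which normalization was applied, and the example trip is hypothetical.

```python
import math

# Cluster centers from Table 1, in the column order of that table.
CENTERS = {
    "Walk": (4.37, 4.11, 2426.30, 494.95, -0.003, 26.13, 6.68, 1.06),
    "Bike": (5.86, 6.57, 19530.76, 3253.03, -0.011, 160.49, 19.94, 3.85),
    "Bus": (6.48, 7.54, 17437.24, 2456.71, -0.026, 201.86, 19.33, 3.78),
    "Car": (7.31, 7.96, 20200.63, 2453.35, -0.011, 113.90, 23.83, 4.13),
    "Train": (8.60, 9.65, 14470.75, 1321.88, -0.0071, 259.26, 17.18, 3.43),
}

# Per-feature minimum and range across the centers (assumed scaling).
_columns = list(zip(*CENTERS.values()))
LOW = [min(col) for col in _columns]
SPAN = [max(col) - min(col) for col in _columns]


def scale(values):
    """Rescale a feature vector using the range spanned by the cluster centers."""
    return [(v - lo) / span for v, lo, span in zip(values, LOW, SPAN)]


def nearest_mode(trip):
    """Return the mode whose scaled center is closest to the scaled trip."""
    scaled_trip = scale(trip)
    return min(CENTERS, key=lambda mode: math.dist(scaled_trip, scale(CENTERS[mode])))


# Hypothetical short, slow trip.
example_trip = (4.5, 4.2, 2500.0, 520.0, -0.002, 30.0, 7.0, 1.1)
print(nearest_mode(example_trip))
```

A trip that coincides with a cluster center is always assigned to that center's mode, since its distance to that center is zero.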
Some extreme values observed in Table 1, particularly for maximum speed and maximum acceleration, reflect the inherent noise and positional uncertainty associated with raw GPS measurements. Although preliminary data cleaning was performed by removing trips with invalid recordings (e.g., zero maximum speed and unrealistically low average speed), no trajectory-level smoothing or point-wise outlier filtering was applied prior to feature extraction. As a result, maximum speed and acceleration values represent upper-bound GPS-derived estimates rather than continuous physical motion and should be interpreted accordingly.
In conclusion, the combination of the K-means algorithm for initial clustering and the classification method described in “Methodology” section provides an effective approach for accurately detecting transportation modes. The K-Means clustering algorithm achieved an accuracy of approximately 50% on the full dataset. Although this number may appear low, it is important to note that the dataset is both large and imbalanced. Most of the trips are labeled as Car (606,770) and Walk (566,812), while other modes such as Bus (21,334), Bicycle (13,718), and Train (2,996) have much fewer instances. Given this imbalance and the unsupervised nature of K-Means, the 50% accuracy indicates that the algorithm was able to identify meaningful structure in the data. Furthermore, the algorithm demonstrated computational efficiency, completing its execution in 128.12 s. This result provides a useful baseline and demonstrates the effectiveness of clustering as a first step in transportation mode classification. The K-means algorithm helps to identify distinct patterns in the data and supports the classification process by grouping similar trips. Additionally, the clustering results are used to generate initial threshold values for the ANP-PSO algorithm, helping to improve the accuracy of feature weighting and classification. This combined approach offers a reliable method for detecting transportation modes and can support applications in transport planning, urban mobility studies, and intelligent transport systems.
Since the original dataset was imbalanced, with a disproportionate number of trips across different transportation modes, a balanced subset was created to ensure fair and effective training. Specifically, 2,996 trips were randomly selected for each mode, resulting in a balanced dataset that was used as input for the ANP-PSO algorithm. This approach helped to prevent bias toward overrepresented classes and improved the reliability of the classification results.
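Drawing the same number of trips per mode can be sketched as below. The function name, the `(trip_id, mode)` record layout, and the toy data are assumptions for illustration; in the study the sample size per mode was 2,996.

```python
import random
from collections import defaultdict


def balanced_sample(trips, per_mode, seed=0):
    """Randomly pick `per_mode` trip ids for every mode.

    `trips` is an iterable of (trip_id, mode) pairs. random.sample raises
    ValueError if a mode has fewer than `per_mode` trips.
    """
    by_mode = defaultdict(list)
    for trip_id, mode in trips:
        by_mode[mode].append(trip_id)
    rng = random.Random(seed)
    return {mode: rng.sample(ids, per_mode) for mode, ids in by_mode.items()}


# Toy data: the smallest class determines the sample size.
toy = ([(f"car-{i}", "Car") for i in range(50)]
       + [(f"walk-{i}", "Walk") for i in range(40)]
       + [(f"train-{i}", "Train") for i in range(5)])
subset = balanced_sample(toy, per_mode=5)
print({mode: len(ids) for mode, ids in subset.items()})
```

Sampling without replacement keeps every selected trip unique within its mode.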

Data classification using the ANP-PSO hybrid algorithm

The hybrid ANP-PSO algorithm, as detailed in “Methodology” section, was implemented in Python to classify transportation modes based on the designated variables. Within this approach, updating the solution vector ‘x’ is essential to explore potential solutions more comprehensively. The iterative update of travel features aims to enhance the transport mode detection process by optimizing the weightings assigned to distinctive features and refining the classification thresholds. This dynamic adaptation of ‘x’ permits the algorithm to fine-tune its parameters for each cluster, ensuring a more accurate and tailored outcome.
The implementation unfolds as follows: commencing with the application of the K-means algorithm to cluster GPS data into distinct transportation modes, the ANP-PSO hybrid algorithm is subsequently employed on each cluster individually. This segment-wise application of ANP-PSO is geared towards determining specific weights for selected features and individualized classification thresholds corresponding to each transportation mode. The algorithm encompasses a configuration of 64 elements intertwined with the ANP super matrix, encapsulating 8 variables (average speed, mean speed, total distance, total time, average acceleration, average bearing, maximum speed, and maximum acceleration), in connection with 5 variables linked to classification thresholds. The algorithm’s parameters, employed during the initial analysis, are delineated in Table 2, offering a comprehensive insight into the algorithm’s functioning and configuration.
Table 2
Parameters used in the combined ANP and PSO algorithm
| Parameter | ωmax | ωmin | c1 | c2 | Number of iterations | Max number of particles |
| --- | --- | --- | --- | --- | --- | --- |
| Value | 0.9 | 0.4 | 2 | 2 | 300 | 100 |
Table 2 shows the values assigned to the parameters used in the combined ANP and PSO algorithm. The parameter ωmax represents the maximum inertia weight, ωmin represents the minimum inertia weight, c1 and c2 represent the cognitive and social learning factors, respectively. The number of iterations represents the maximum number of iterations allowed for the algorithm, and the maximum number of particles represents the number of particles in the PSO process.
Figure 5 shows the convergence behavior of the PSO algorithm over 300 iterations. The fitness function value steadily increases, indicating that the algorithm is successfully improving its solution quality with each iteration. This upward trend reflects PSO’s ability to effectively explore the solution space and optimize model parameters for transportation mode detection. The consistent improvement in fitness values supports the final classification accuracy of over 92.4%, demonstrating the robustness of the algorithm and the relevance of the selected features.
In this context, a higher fitness function value indicates better convergence and more accurate classification. As shown in Fig. 5, the fitness value rises steadily throughout the iterations, confirming that the algorithm is approaching optimal parameter settings. The increasing fitness trend is a strong indicator of the algorithm’s learning progress and its ability to produce accurate results. The classification variables of the algorithm include the weights assigned to each feature and the threshold values used in the classification process. These weights and thresholds can be utilized for classifying new data, enabling accurate transportation mode detection. Furthermore, the execution details of the algorithm, such as the solving time, are presented. The execution time of the algorithm was 336,635 s, which reflects good classification performance but low time efficiency owing to the large dataset.
Fig. 5
Convergence and accuracy improvement trend of the proposed algorithm
To further explore the performance of the hybrid algorithm, the confusion matrix was calculated. The classifier’s capability is evaluated using recall and precision, whose definitions are given in detail by Forman (2003). As demonstrated in Table 3, the highest recall is attained for the walk mode, at 93.68%. This outcome was anticipated, considering the conspicuous differences in various features, such as average speed and travel distance, between walk and other modes. Notably, the recall for points associated with bus and bike modes is comparatively lower amongst the four modes. Regarding precision, the lowest value is observed for bus segments at 88.47%, likely due to the intermediate nature of bus-related speed features, which overlap with those of both slower modes (e.g., bicycling) and faster modes like car, leading to greater classification ambiguity.
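As a reminder of how these metrics are derived from a confusion matrix, the following sketch computes recall, precision, and F1 per class. It assumes rows hold the true class and columns the predicted class; the three-class matrix is a made-up toy example, not data from this study.

```python
def per_class_metrics(matrix, labels):
    """Recall, precision and F1 per class; rows = true class, columns = predicted."""
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        row_total = sum(matrix[k])                   # all points of the true class
        col_total = sum(row[k] for row in matrix)    # all points predicted as the class
        recall = true_positive / row_total if row_total else 0.0
        precision = true_positive / col_total if col_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"recall": recall, "precision": precision, "f1": f1}
    return metrics


# Toy confusion matrix (hypothetical counts).
toy_labels = ["walk", "bike", "car"]
toy_matrix = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]
for label, values in per_class_metrics(toy_matrix, toy_labels).items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

Recall is read along a row and precision along a column, which is the layout in which these two quantities are typically reported alongside a confusion matrix.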
Table 3
Confusion matrix of hybrid algorithms
| | Walk | Bike | Car | Bus | Train | Total Points | Total trips | Recall (%) | F1 Score (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Walk | 47,320,984 | 2,793,393 | 0 | 399,056 | 0 | 50,513,434 | 2996 | 93.68 | 92.94 |
| Bike | 2,620,890 | 50,924,898 | 873,630 | 437,315 | 436,315 | 55,293,049 | 2996 | 92.1 | 91.74 |
| Car | 806,437 | 1,241,888 | 48,260,712 | 2,069,813 | 21,487 | 52,400,339 | 2996 | 91.8 | 92.84 |
| Bus | 369,376 | 369,376 | 1,846,881 | 43,062,724 | 1,108,128 | 46,756,487 | 2996 | 92.1 | 90.24 |
| Train | 193,176 | 386,352 | 579,528 | 2,704,468 | 45,041,855 | 48,905,381 | 2996 | 90.1 | 94.31 |
| Precision (%) | 92.93 | 91.4 | 93.6 | 88.47 | 95.83 | 234,611,173 | 14,980 | | |
The model leverages a two-step process, combining K-means clustering with the ANP-PSO hybrid method. The K-means clustering step effectively forms initial clusters of transportation modes without relying on labelled data, using only raw GPS data. Subsequently, the ANP-PSO hybrid method refines these clusters to achieve a high level of classification accuracy. Through this approach, we have enhanced the model's performance compared to traditional methods. The results show an accuracy of over 92.4%, surpassing previous studies using alternative techniques. This enhancement is attributed to the integration of unsupervised clustering and a hybrid approach that leverages multi-criteria decision-making and meta-heuristic optimization. It demonstrates the feasibility of accurately classifying transportation modes while minimizing dependence on labelled data.

Comparison with previous studies

We compared the ANP-PSO hybrid method with the Convolutional AutoEncoder (Markos and Yu 2020), the Modetect algorithm (Lin et al. 2013), the Expectation Maximization algorithm (Patterson et al. 2003), and masked autoregressive flow (MAF) (Dutta and Patra 2023). These methods were selected due to their relevance in transportation mode detection and their use of raw GPS datasets. By addressing both accuracy and execution time, the ANP-PSO hybrid method achieved the best results, with an accuracy of 92.4%, showcasing its effectiveness and practicality relative to these established approaches (Table 4).
Table 4
Performance comparison between algorithms found in some seminal and related studies
| Study | Model | Dataset & Size | Modes detected | Method overview | Accuracy (%) |
| --- | --- | --- | --- | --- | --- |
| This study | ANP–PSO hybrid method | Large-scale MOBIS GPS trajectories | Walk, bike, car, bus, train | Unsupervised clustering + ANP criteria weighting + PSO optimization | 92.4 |
| Markos and Yu (2020) | Convolutional AutoEncoder (CAE) | Geolife dataset | Walk, bike, car, bus, train | Classification via CNN-based representation learning | ~80 |
| Lin et al. (2013) | Modetect algorithm | Two datasets were used (10 users), and one user over 10 months | Walk, bike, car, bus | Kolmogorov–Smirnov–test–based unsupervised classifier | ~74 |
| Dutta and Patra (2023) | MAF-based method | Geolife dataset (182 users, 73 users labeled) | Walk, bike, car, bus, train | Combines Masked AutoRegressive Flow | ~68 |
| Patterson et al. (2003) | EM algorithm | 29 trips from a single participant | Walk, car, bus | Expectation–Maximization clustering with GIS context | ~58 |
As shown in Table 4, the model used by Patterson et al. (2003) obtained 58% accuracy with the Expectation Maximization algorithm, considering velocity and the standard deviation of velocity for Walk, Car, and Bus modes. The model applied by Lin et al. (2013) achieved 74% accuracy by incorporating diverse GPS input variables to classify Walk, Car, Bus, and Bike modes. Our study surpassed the others with an accuracy of 92.4% by combining clustering with multi-criteria decision-making. Markos and Yu (2020) propose an unsupervised deep learning approach to identify transportation modes from unlabeled GPS trajectory data using a Convolutional AutoEncoder with an integrated clustering layer. The method achieves 80.5% clustering accuracy on the Geolife dataset, demonstrating the potential of unsupervised learning for transportation mode detection without relying on labeled data. Dutta and Patra (2023) propose an unsupervised learning approach for transportation mode detection that addresses the limitations of supervised methods, such as the reliance on scarce labeled data. By using point-level features and masked autoregressive flow (MAF) to estimate probability densities, followed by K-means clustering, the method identifies transportation modes and is reported to outperform traditional techniques.
The ANP-PSO hybrid method classified Walk, Bike, Car, Bus, and Train modes using GPS input variables such as average speed, mean speed, total distance, total time, average acceleration, mean bearing, maximum speed, and maximum acceleration. The comparison highlights the efficiency of diverse approaches and the potential of our clustering-based method for accurate transportation mode prediction from GPS trajectories. Our study's use of transportation-specific input variables and multi-criteria decision-making provided a distinct advantage in achieving higher accuracy. The results showcase promising avenues for future research in GPS trajectory analysis and transportation mode classification. The comparison presented in Table 4 is based on performance figures reported in the respective original studies and is intended to provide contextual insight rather than a direct, controlled benchmark. The listed methods were evaluated on different datasets with varying sizes, sampling frequencies, preprocessing pipelines, and parameter settings. The reported accuracies of approaches such as convolutional autoencoders and expectation–maximization models depend strongly on how GPS trajectories are segmented, how model parameters are tuned, and how labels are defined or inferred in each study. The present study did not re-implement these baseline methods on the MOBIS dataset; therefore, differences in accuracy should not be interpreted as arising solely from algorithmic superiority, but also from dataset characteristics and experimental configurations.
This study is one of the first to mention and evaluate the execution time of a transportation mode classification algorithm alongside its accuracy. The proposed hybrid ANP-PSO algorithm achieved an accuracy of over 92% in the final iteration, demonstrating strong performance in classifying transportation modes based on GPS trajectory data. The total execution time of the algorithm was approximately 336,652 s, corresponding to about 93.5 h (nearly four days) of computing time. This runtime reflects the execution of the K-means clustering and ANP–PSO optimization stages on trip-level feature representations and does not include preliminary data cleaning or raw GPS filtering. As the framework operates on aggregated trip-level features, the computational cost is primarily influenced by the number of trips and feature dimensionality, while the GPS sampling frequency affects runtime indirectly through feature extraction. This extended runtime is largely due to the algorithm’s complexity and the use of a significantly larger dataset compared to previous studies. By including execution time in the evaluation, this study offers a more comprehensive analysis of the algorithm’s practical applicability, particularly in scenarios where real-time or time-sensitive processing is required.
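The conversion of the reported runtime into hours and days is a one-line calculation; the variable names below are illustrative.

```python
runtime_seconds = 336_652  # total execution time reported above

runtime_hours = runtime_seconds / 3600
runtime_days = runtime_hours / 24

print(round(runtime_hours, 1), round(runtime_days, 2))
```

This gives about 93.5 h, or just under four days of computing time.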
Comparing this with other algorithms that mainly focus on accuracy highlights the unique strength of our approach in balancing both computational efficiency and classification accuracy. The results showed that the proposed algorithm is well-suited for real-world scenarios, where rapid and reliable transportation mode detection is important.

Conclusion

In conclusion, the study presents a novel approach for clustering and classifying transportation modes based on raw GPS data. Recognizing the challenge of accurately classifying transportation modes without extensive labeled data, a two-step process is devised to embody these principles. Initially, the unsupervised K-means clustering technique is applied, relying solely on raw GPS data to form initial clusters representing various transportation modes. Then, the ANP-PSO hybrid method is employed to further refine classification without the need for labeled data, leveraging multi-criteria decision-making and meta-heuristic optimization. By combining clustering with the hybrid approach, this method improves classification accuracy while reducing the need for labeled data. This makes it very useful in real-world situations where labeled data is scarce or expensive to obtain, demonstrating both its practicality and effectiveness.
The results showed that the proposed hybrid algorithm achieved high accuracy in classifying transportation modes. The convergence trend of the algorithm showed continuous improvement, indicating the effectiveness of the algorithm in exploring the solution space and refining the classification results. With an accuracy rate exceeding 92% in the final iteration, the proposed algorithm showed its capability to accurately identify transportation modes based on GPS data. Furthermore, the execution time of the algorithm was measured to assess its computational efficiency. This indicated that the proposed approach could be applied in real-time scenarios, providing efficient and timely transportation mode detection.
This approach contributes significantly to the field of transportation mode detection, offering a comprehensive and effective method for accurately clustering and classifying modes from GPS data. In practical terms, it has applications in transportation planning and intelligent transportation systems by providing real-time insights into transportation mode usage. Such information can optimize transportation services, allocate resources efficiently, and improve overall system performance. Additionally, this method supports various applications in transportation research, including travel behavior analysis, policy impact assessment, and alternative mode evaluation, thereby opening new opportunities for data-driven decision-making.
The proposed ANP–PSO classification framework is formulated as a deterministic classification and optimization procedure, in which transportation modes are assigned based on a weighted aggregation score and optimized threshold values. The model does not rely on probabilistic behavioral assumptions or utility-maximization theory, but instead determines feature weights and classification boundaries through a meta-heuristic optimization process. This formulation allows flexibility in handling complex feature interactions and weakly labelled data without imposing distributional assumptions.
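The deterministic decision rule described above can be illustrated with a small sketch. The weights, thresholds, feature values, and generic class labels are hypothetical, and the comparison direction (the first class is assigned when the score reaches the highest threshold) is an assumption consistent with thresholds sorted in descending order.

```python
def aggregate_score(features, weights):
    """Weighted sum of normalised features; lies in [0, 1] if the weights sum to one."""
    return sum(w * x for w, x in zip(weights, features))


def classify(score, thresholds, classes):
    """Assign a class from descending thresholds (len(thresholds) == len(classes) - 1)."""
    for threshold, label in zip(thresholds, classes):
        if score >= threshold:
            return label
    return classes[-1]  # score below every threshold


# Hypothetical values for illustration only.
weights = (0.5, 0.3, 0.2)
features = (0.9, 0.5, 0.1)       # already normalised to [0, 1]
thresholds = (0.75, 0.5, 0.25)   # Th1 > Th2 > Th3
classes = ("class_1", "class_2", "class_3", "class_4")

score = aggregate_score(features, weights)
print(round(score, 2), classify(score, thresholds, classes))
```

Because the rule is a fixed comparison against optimized thresholds, the same feature vector always receives the same class, which is what makes the procedure deterministic.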
Future research can extend the proposed framework in several directions. In the present study, the classification focused on five primary transportation modes (walk, bike, car, bus, and train) in order to demonstrate the effectiveness of the ANP–PSO hybrid method on a set of dominant and clearly distinguishable modes. Although the MOBIS dataset also contains substantial numbers of light rail, tram, metro, and regional rail trips, these modes were excluded due to their strong feature overlap with other rail-based modes and the increased classification ambiguity they introduce at the trip-level feature scale.
An important avenue for future work is the extension of the framework to finer-grained public transport distinctions. This may include hierarchical classification schemes that first separate road-based and rail-based modes, followed by more detailed differentiation between tram, light rail, metro, regional rail, and long-distance train services. Such extensions could further benefit from the integration of contextual information such as transit network topology, stop locations, or timetable data. Expanding the framework in this direction would enhance its applicability for detailed public transport planning and multimodal mobility analysis.

Acknowledgements

We would like to express our sincere gratitude to Professor Kay W. Axhausen and Dr. Daniel Heimgartner from ETH Zurich for providing us with access to the MOBIS dataset. Their support and the availability of this high-quality dataset were essential to the success of our research on transportation mode detection.

Declarations

Conflict of interest

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Paria Sadeghian

is a Lecturer at Dalarna University, Sweden. She received her PhD in microdata analysis from Dalarna University. Her research focuses on transportation mode detection, urban mobility analytics, travel behavior modeling, and data-driven approaches to mobility analysis. Her work integrates advanced data analytics and machine learning techniques to better understand and model human mobility patterns.

Johan Håkansson

is a full professor of microdata analysis at Dalarna University, Sweden. His research interests are urban and regional mobility, last-mile logistics, location analysis, and data analytics.

Title: A novel ANP-PSO framework for clustering transportation modes from GPS tracking data
Authors: Paria Sadeghian, Johan Håkansson
Publication date: 7 March 2026
Publisher: Springer US
Published in: Transportation
Print ISSN: 0049-4488
Electronic ISSN: 1572-9435
DOI: https://doi.org/10.1007/s11116-026-10739-5