Short-term speed predictions exploiting big data on large urban road networks

doi:10.1016/j.trc.2016.10.019

Transportation Research Part C: Emerging Technologies

Volume 73, December 2016, Pages 183-201

https://doi.org/10.1016/j.trc.2016.10.019

Highlights

• A consistent method for short-term speed predictions by using raw floating car data on large networks is presented.
• A Bayesian network and a neural network are compared with a seasonal ARMA model.
• A double star model reflecting time-space correlation allows for modular implementation.
• The variable accuracy of speed measures from floating car data is taken into account.
• A supervisor framework is envisaged to integrate different models.

Abstract

Big data from floating cars supply a frequent, ubiquitous sampling of traffic conditions on the road network and provide great opportunities for enhanced short-term traffic predictions based on real-time information on the whole network. Two network-based machine learning models, a Bayesian network and a neural network, are formulated with a double star framework that reflects time and space correlation among traffic variables and that, because of its modular structure, is suitable for automatic implementation on large road networks. Among different mono-dimensional time-series models, a seasonal autoregressive moving average model (SARMA) is selected for comparison. The time-series model is also used in a hybrid modeling framework to provide the Bayesian network with an a priori estimate of the predicted speed, which is then corrected by exploiting the information collected on other links. A large floating car data set from a sub-area of the road network of Rome is used for validation. To account for the variable accuracy of the speed estimated from floating car data, a new error indicator is introduced that relates the accuracy of prediction to the accuracy of measurement. Validation results highlighted that the spatial architecture of the Bayesian network is advantageous in standard conditions, where a priori knowledge is more significant, while mono-dimensional time series proved more valuable in the few cases of non-recurrent congestion observed in the data set. The results suggest introducing a supervisor framework that selects the most suitable prediction depending on the detected traffic regime.

Introduction

Fast and accurate predictions of future traffic conditions are a crucial requirement for reliable Intelligent Transportation Systems (ITS) applications devoted to traffic management and traveler information, whose intelligence lies in their capability to foresee future states of the system and identify the most appropriate actions to undertake. Advances in Information and Communication Technologies (ICT) are currently making available an unprecedented amount of traffic measurements from the road network, which are a premise for introducing new models and methods for traffic prediction (Shi and Abdel-Aty, 2015). Traditional traffic monitoring systems are based on fixed measurement stations that detect flows, occupancy and possibly speed. Collected data are then transmitted to the traffic control center, where they are processed to derive short-term predictions. The relatively high investment and maintenance costs of fixed monitoring systems have been one of the most relevant factors limiting full ITS deployment, although efficient algorithms for optimizing sensor locations were developed (Cipriani et al., 2006).

The availability of Floating Car Data (FCD), obtained by tracking GPS-enabled vehicles and mobile devices, opens new perspectives for developing novel prediction models. FCD provide a pervasive tool to explore the road network and obtain information on theoretically any point of the network (Fusco et al., 2015) and, in the near future, to perform self-organizing monitoring (Baiocchi et al., 2015). The very detailed road graphs developed for on-board navigators would require equally detailed estimates of present and future traffic conditions; however, a suitable trade-off between the reliability and the accuracy of traffic estimates and predictions should be investigated. The main drawback of FCD is that information is collected from only a sample of vehicles that send their current positions and speeds. Thus, FCD provide ubiquitous but partial information, which requires a supplementary effort to process the data and combine measurements collected at different points and instants. Moreover, while the sampling rule is usually specified, the actual sampling rate on each road link is unknown, so that the reliability of the measurements is variable and difficult to estimate, except on the few links equipped with fixed traffic counting stations. Furthermore, on links not traveled by equipped vehicles, data are missing altogether. In recent years, several private companies have started collecting and selling real-time speed data from different sources, including floating car data. The aggregate measures supplied by private providers are usually paired with a qualitative confidence value, which precludes a rigorous estimation of the statistical significance of the data. Although accuracy appears to have improved since the earliest independent evaluation (Kim and Coifman, 2014), the reliability of traffic measurements is still a crucial issue for studies on short-term prediction methods that use floating car data.

The huge amount of data collected in real time on the road network also requires efficient analysis methods to extract the most useful information embedded in such time–space big data.
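Because the number of probe readings per link and per interval varies, the accuracy of each aggregated speed varies too. The sketch below shows one simple way to make that explicit: raw point speeds are grouped by link and time interval, and each group is summarized by its count, mean, sample variance and the standard error of the mean. It is an illustration under stated assumptions, not the paper's preprocessing: the record layout `(link_id, timestamp_s, speed_kmh)`, the prior map-matching of points to links, the 5-minute window and the function name are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def aggregate_point_speeds(records, interval_s=300):
    """Group raw point speeds by (link, time interval) and summarize them.

    records: iterable of (link_id, timestamp_s, speed_kmh) tuples, assumed to be
    already map-matched to links. interval_s is a hypothetical 5-min window.
    """
    groups = defaultdict(list)
    for link_id, t, v in records:
        groups[(link_id, int(t // interval_s))].append(v)

    summary = {}
    for key, speeds in groups.items():
        n = len(speeds)
        # sample variance and standard error need at least two readings
        var = variance(speeds) if n > 1 else float("nan")
        se = sqrt(var / n) if n > 1 else float("nan")
        summary[key] = {"n": n, "mean": mean(speeds), "var": var, "se": se}
    return summary
```

A link/interval with a single reading gets `nan` for its variance and standard error, which flags that its accuracy cannot be estimated from the data alone.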

Considerable interest in machine learning methods has arisen in recent years in the literature on big data analysis, and many network-based approaches, such as neural networks and Bayesian networks, have been proposed with the aim of exploiting existing correlations among measurements collected at different time intervals and on different links of the network. Specifically, Bayesian networks, which combine a graph structure with the Bayesian computation of posterior probabilities from a priori estimates, seem to offer a sound methodology for formulating short-term predictions from the pervasive sampling of traffic performance provided by floating car data.
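The prior-to-posterior step can be illustrated in its simplest form: one continuous variable, a Gaussian prior and a Gaussian measurement. The sketch below is that textbook linear-Gaussian special case, not the paper's Bayesian network; the Gaussian assumption and the function name `gaussian_update` are ours. One natural (assumed) choice for the measurement variance is the squared standard error of the mean of the point speeds observed in the interval, so that sparsely sampled intervals pull the estimate less.

```python
def gaussian_update(prior_mean, prior_var, obs_mean, obs_var):
    """Posterior of a Gaussian prior N(prior_mean, prior_var) after observing
    a Gaussian measurement N(obs_mean, obs_var) of the same quantity."""
    if obs_var == float("inf"):
        return prior_mean, prior_var  # no usable measurement: keep the prior
    gain = prior_var / (prior_var + obs_var)  # weight given to the measurement
    post_mean = prior_mean + gain * (obs_mean - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# A prior of 40 km/h (variance 16) and a measurement of 30 km/h (variance 16)
# are equally reliable, so the posterior mean lies halfway between them.
post_mean, post_var = gaussian_update(40.0, 16.0, 30.0, 16.0)
```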

In this paper, we investigate the potential of these methods to produce accurate short-term traffic predictions by exploiting floating car data collected ubiquitously on the network from a number of probe vehicles that is large in absolute terms but represents a relatively small fraction of the traffic flow on each link of the road network.

Two main approaches can be identified to perform short-term traffic predictions: explicit and implicit traffic modeling. The explicit approach is based on mathematical models that represent the interactions between the physical variables describing traffic phenomena. Traffic on freeways is usually modeled by macroscopic continuous models that discretize in time and space the partial differential equations describing traffic dynamics. Traffic on urban road networks requires dynamic traffic assignment models that simulate the complex dynamic interactions between drivers’ trip choices, vehicular congestion and road performance on the network.

Applying traffic models to real-time short-term prediction requires recursive methods that can be implemented online. The rolling horizon method exploits current traffic measurements to update the trip demand estimate at every short time interval and runs a new traffic simulation, which covers a longer time interval and holds until a new update is available. Relevant examples are Dynasmart-X (Mahmassani et al., 2005) and Dynamit (Ben-Akiva et al., 2012). State-space models formulate the dynamic evolution of all traffic variables on the road network based on available real-time traffic measurements in a probabilistic setting (Muñoz et al., 2003). Typical applications to real-time short-term prediction involve the linear approximation of non-linear macroscopic traffic models, which leads to the extended Kalman filter formulation (Stathopoulos and Karlaftis, 2003, Wang and Papageorgiou, 2005), although other approximation methods such as the particle filter (Mihaylova et al., 2007) and Newtonian relaxation (Herrera and Bayen, 2010) have been developed. The switching-mode model, which can be thought of as a combination of the hidden Markov model and the linear state-space model (Sun et al., 2003), was introduced to reproduce possible transitions from one discrete traffic state to another, namely the free-flow and congestion states that characterize the cell transmission model (Daganzo, 1994). A more complex architecture implements artificial neural networks to derive density values and determine transitions between traffic states on the linearized triangular fundamental diagram (Celikoglu, 2014).
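The recursive predict/update cycle that state-space methods share can be sketched with the simplest possible case: a scalar linear Kalman filter tracking one link speed modeled as a random walk. This is deliberately simpler than the extended Kalman filter cited above (no non-linear traffic model is linearized here), and the random-walk dynamics, noise variances and function name are assumptions for illustration. Missing observations, common with sparse FCD, are handled by skipping the update step.

```python
def kalman_speed_filter(observations, x0, p0, q, r):
    """Scalar linear Kalman filter for a speed state following a random walk.

    observations: sequence of measured speeds, with None for missing intervals.
    x0, p0: initial state estimate and its variance.
    q: process noise variance; r: measurement noise variance.
    Returns a list of (estimate, variance) pairs, one per interval.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # predict: x_t = x_{t-1} + w_t with w_t ~ N(0, q), so only p grows
        p = p + q
        if z is not None:
            # update: blend prediction and measurement with the Kalman gain
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates
```

With no measurement, the estimate is carried forward and its variance grows by `q`, which mirrors how confidence should decay on links that probe vehicles have not recently traversed.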

The implicit approach derives dynamic relationships directly from time series of observed data and is therefore usually called the data-driven approach. Although we acknowledge that explicit models have superior interpretation capabilities with respect to implicit models and can be applied to generate control and information strategies that prevent system over-reaction (Ben-Akiva, 1985), we also recognize that they require a huge effort to achieve adequately accurate calibration on a large urban network. On the other hand, the enormous amount of available data on urban mobility makes implicit models a valuable alternative: they are easier to implement and open to integration with explicit models within a hybrid rolling horizon framework, in which an explicit model forecasts traffic states over a horizon of a few hours and an implicit model adjusts these prior forecasts on the basis of real-time measurements and supplies posterior short-term predictions. Thus, in this paper we focus on studying suitable structures of data-driven models to exploit the time-space information embedded in floating car big data, and on testing the accuracy of the resulting short-term predictions.

Data-driven methods for short-term traffic forecasting are the subject of a vast literature, which was the focus of a recent special issue of this journal (Zhang, 2014). We refer to the papers by Vlahogianni et al. (2014) and Oh et al. (2015) for a complete review of the state of the art, and we focus here on the following issues: (i) the relevance of capturing time-space correlation for short-term traffic forecasting in urban road networks through implicit models; (ii) the opportunities and concerns that arise from variable point traffic measurements collected by sparsely sampled vehicles on the whole road network; and (iii) the generalization capability of probabilistic graphical models with respect to different congestion patterns.

Although the majority of previous studies performed independent forecasting for each single monitored road section (Cai et al., 2016), several attempts have been made to capture spatial correlation between traffic variables on the road network: by extending time-series models to multivariate form (Kamarianakis and Prastacos, 2005, Chandra and Al-Deek, 2009, Guo et al., 2014, Mai et al., 2015, Li et al., 2015a), and through implicit prediction models that include a network structure, such as artificial neural networks (Fusco and Gori, 1996, Dougherty and Cobbett, 1997, Zhang, 2000, Zhu et al., 2014, Ma et al., 2015a, Ma et al., 2015b), Bayesian networks (Sun et al., 2006, Castillo et al., 2008, Hofleitner et al., 2012, Chen et al., 2015) and deep architecture models (Lv et al., 2015). Several authors devised hybrid methods that combine different techniques and use multiple predictors (among others: Zhang, 2003, Zheng et al., 2006, van Hinsbergen et al., 2009, Wang et al., 2014). Chen et al. (2012) performed a systematic comparison of different methods for short-term prediction on a single loop sensor and found that Bayesian networks and artificial neural networks are effective and efficient prediction models, although traffic breakdowns can be identified but not accurately predicted. Other authors exploited the spatial-temporal correlation among traffic measurements to address the complementary problem of estimating missing data, applying either a tensor-based method (Tan et al., 2013) or kernel probabilistic principal component analysis (Li et al., 2013). Recently, Lv et al. (2015) pointed out that traffic prediction models are still unsatisfactory for many real-world applications and rethought the traffic flow prediction problem based on big traffic data.
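The basic idea behind the multivariate extensions listed above, letting a link's next speed depend on the recent speeds of other links, can be sketched as a linear autoregression with neighbor lags fitted by ordinary least squares. This is a generic illustration, not one of the cited models nor the paper's Bayesian or neural network; the single lag, the choice of neighbor columns and the function names are hypothetical, and the speed matrix is assumed to have no missing values.

```python
import numpy as np


def fit_spatiotemporal_ar(speeds, target, neighbors):
    """Fit v_target[t+1] = c + a * v_target[t] + sum_j b_j * v_j[t] by OLS.

    speeds: array of shape (T, L), one column per link, one row per interval.
    target: column index of the link to predict.
    neighbors: column indices of the (assumed) correlated links.
    Returns the coefficient vector [c, a, b_1, ..., b_k].
    """
    cols = [target] + list(neighbors)
    X = np.column_stack([np.ones(len(speeds) - 1), speeds[:-1, cols]])
    y = speeds[1:, target]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_next(coef, current, target, neighbors):
    """One-step prediction for the target link from the current speed vector."""
    cols = [target] + list(neighbors)
    return coef[0] + coef[1:] @ current[cols]
```

A nonzero coefficient on a neighbor column is the spatial correlation that independent single-link models cannot exploit.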

With reference to the second issue mentioned above, interest in using FCD for traffic prediction has increased in recent years. The first studies were based on data collected by special fleets such as taxis (see Castro-Neto et al. (2009) for a review) or by vehicles equipped with GPS specifically for the traffic experiment (Herrera et al., 2010, Bucknell and Herrera, 2014). Other studies on short-term traffic prediction from FCD used synthetic data to estimate the penetration rate of vehicles needed to obtain accurate predictions (Deng et al., 2013). Feng et al. (2014) analyzed vehicle trajectories tracked in the NGSIM experiment and developed a Bayesian method that estimates the probability distribution of travel times across vehicles, taking into account synthetic GPS data and signal setting parameters, to identify the prevailing traffic conditions in real time. Ye et al. (2012) studied a method to accommodate data recorded at irregular intervals, which exploits information from adjacent links. Among the studies based on field data, Kim and Coifman (2014) analyzed aggregated information provided by the INRIX company against loop detector measurements on 44 links and highlighted that they do not appear to reflect the latency with respect to reference measurements or the occurrence of repeated reported speeds. Schneider et al. (2010) compared the effectiveness and accuracy of floating car studies with those achievable by Bluetooth technology. Patire et al. (2015) discussed the opportunities and challenges related to the use of non-aggregated point-speed GPS data and developed a data fusion method to exploit raw probe data in addition to fixed sensor counts.

As for the capability of prediction models to generalize to different congestion patterns, almost all studies, with the exception of Guo et al. (2014), applied short-term prediction models to a selected data set covering a suitable time interval and assessed their performance over the whole period, without inspecting the reliability of predictions under heavy congestion. Many authors looked at the problem from a different perspective and tried to improve traffic prediction by adapting the model framework to different traffic states. Two main approaches can be identified: a clustering approach, which classifies traffic states on the basis of either the observed time-series pattern (Cai et al., 2016) or the fundamental diagram (Celikoglu and Silgu, 2016, Antoniou et al., 2013), and a regime switching approach, again based on either the time-series pattern (Cetin and Comert, 2006, Kamarianakis et al., 2012) or the fit to the fundamental diagram (Dunne and Ghosh, 2012). Charle et al. (2010) addressed a rather different problem, namely route travel time reliability, and analyzed the historical spatial correlations between travel times on nearby links. Their perspective highlights the significance of long-term effects in identifying recurrent congestion conditions, on which short-term variations are superimposed. A reliable historical estimate is especially significant when dealing with FCD, whose real-time sampling rate is often low as well as unknown, which reduces the reliability of predictions founded on short-term series only. So far, few studies have been based on large real FCD data sets, although such data sets are needed to address the reliability of FCD-based traffic forecasting methods with respect to the reliability of the measurements. Hofleitner et al. (2012) used individual FCD collected by 500 cars in a specific experiment; Cai et al. (2016) used a data set of space mean speeds collected on 30 road segments over 20 weekdays. Data were suitably preprocessed to fill missing data and eliminate abnormal values, and filtered to obtain smoothed data. In a very recent paper (Fusco et al., 2016), we compared different network-based short-term forecasting models on a 10-month series of aggregated measurements obtained from FCD, and we proposed a model structure conceived to perform forecasts on large networks by exploiting speed estimates on all the links where they are available.

This paper aims to provide a consistent method for short-term speed prediction on large networks based on raw floating car data, and presents a modeling framework that implements some well-known network-structured prediction models. The paper also focuses on the issues that the literature review revealed to be worthy of further examination: the reliability of traffic measurements collected at random points of the road network, and the suitability of different prediction models for different traffic conditions, such as free flow and recurrent and non-recurrent congestion. Our guiding principle is that the nature of traffic congestion implies that artificial intelligence methods must be transportation-inspired.

We introduce different architectures of machine learning models based on different levels of exploration of the road network, in order to capture possible spatial correlations among traffic measurements taken on different links. In contrast to our previous study (Fusco et al., 2016), where we used the historical average speed as the a priori estimate, here we adhere more closely to the Bayesian approach and seek the best possible a priori estimate based on previous observations. Thus, we formulate a hybrid modeling framework that integrates the best a priori estimate based on time correlation, provided by a consolidated Seasonal ARIMA model, with the spatial correlation estimated through a Bayesian network. Unlike our previous paper and other works in the literature, with the exception of Patire et al. (2015), we deal with the issues and advantages of using raw data from individual cars. While Patire et al. focus on sampling and penetration rates and present a data fusion framework to integrate floating car data and fixed-point measurements, we introduce here a consistent method designed to use disaggregated raw data. We specify the model variables so as to exploit all the available information for traffic estimation. Specifically, the variance of individual speeds and the number of measurements in each time interval on each link are considered to account for the time-varying accuracy of the measurements. However, no flow measurement is assumed, because the number of counts available is very often insufficient to obtain an accurate estimate within a reasonable time interval. We also enhance the validation method by introducing specific error indicators that relate the accuracy of the prediction models to the accuracy of the measurements.
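The paper's own error indicators are defined in its methodology section and are not reproduced in this excerpt. As a hypothetical illustration of the general idea, relating prediction error to measurement accuracy, the sketch below scales each absolute prediction error by the standard error of the measured mean speed, then reports the average scaled error and the share of predictions falling within a z-multiple of that standard error. The indicator, the default z = 1.96 and the function name are assumptions, not the authors' definitions.

```python
import math


def measurement_relative_errors(preds, obs_means, obs_ses, z=1.96):
    """Hypothetical indicator: prediction error scaled by the standard error
    of the measured mean speed in the same link/interval.

    Returns (mean scaled error, share of predictions within z standard errors).
    Intervals with unknown or zero standard error are skipped.
    """
    scaled = []
    for p, m, se in zip(preds, obs_means, obs_ses):
        if se is None or not math.isfinite(se) or se <= 0:
            continue  # measurement accuracy unknown: cannot scale the error
        scaled.append(abs(p - m) / se)
    if not scaled:
        return float("nan"), float("nan")
    mean_scaled = sum(scaled) / len(scaled)
    share_within = sum(e <= z for e in scaled) / len(scaled)
    return mean_scaled, share_within
```

A scaled error near or below 1 means the prediction is about as close to the measured mean as the sampling noise of that measurement allows, so a plain error cannot distinguish a poor model from a poorly sampled link.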

Unlike most studies in the literature, we assess model performance under different traffic congestion conditions. Fig. 1 provides a flow-chart of the problems arising from sparse floating car data, the specific procedures implemented to address each of them and the corresponding solutions that make up our method. It also highlights the main advantages of this method with respect to the state of the art: the variable selection addresses a fundamental issue of sparse floating car data, namely their variable sampling rate, and allows the accuracy of observed data to be considered in both the model structure and the prediction results; the double-star network structure of the forecasting models allows an easy modular implementation of the procedure even on very large networks, as well as parallel computation that nevertheless preserves possible spatial correlations among links; and the hybrid model formulation with a priori autoregressive predictions allows an easy extension of the model to integrate a supervisor mechanism that selects the best forecasting model based on estimated traffic conditions.
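Modularity of this kind rests on giving each link its own small set of input links, so that per-link models can be built and run independently (and in parallel) while still seeing nearby traffic. The exact double-star topology is specified in the paper's methodology and not in this excerpt; as a stand-in assumption, the sketch below takes each link's inputs to be all links within a fixed number of hops in the link adjacency graph, found by breadth-first search. The adjacency format, the two-hop default and the function name are hypothetical.

```python
from collections import deque


def links_within_hops(adjacency, source, max_hops=2):
    """Return the set of links reachable from `source` in at most `max_hops`
    steps of the link adjacency graph, excluding `source` itself.

    adjacency: dict mapping link_id -> iterable of adjacent link_ids.
    """
    depth = {source: 0}
    queue = deque([source])
    while queue:
        link = queue.popleft()
        if depth[link] == max_hops:
            continue  # do not expand beyond the neighborhood radius
        for nxt in adjacency.get(link, ()):
            if nxt not in depth:
                depth[nxt] = depth[link] + 1
                queue.append(nxt)
    return {link for link in depth if link != source}


# Each link gets an independent input set, so per-link models can be
# trained separately, e.g. one worker per link.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
model_inputs = {link: links_within_hops(adj, link) for link in adj}
```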

In contrast with other papers in the literature that adapt the model framework to different traffic states, such a supervisor exploits only individual point-speed observations, so it does not require flow measurements. Moreover, it does not seek to estimate traffic states but to detect the occurrence of anomalous conditions, and then relaxes relationships based on recurrent observations. Finally, while only limited tests using GPS-equipped floating car data have been presented in the traffic prediction literature so far, we present a large numerical experiment conducted on a big data set of about 300,000 single point-speed observations collected on a wide portion of an urban street network (120 links), selected as a sub-network of a large city, Rome.
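A supervisor of the kind envisaged here needs only point speeds: it compares recent observations with the a priori (recurrent) estimates and, if they diverge strongly, stops trusting relationships learned from recurrent data. The sketch below is a hypothetical rule of that type, not the authors' design: it averages the relative deviation over recent intervals and, above an assumed 30% threshold, returns the time-series forecast instead of the Bayesian network forecast, consistent with the abstract's finding that mono-dimensional time series did better under non-recurrent congestion. The threshold, inputs and function name are assumptions.

```python
def supervise(recent_obs, recent_prior, bn_forecast, ts_forecast, threshold=0.3):
    """Hypothetical supervisor rule.

    recent_obs: observed mean speeds for the last few intervals (None if missing).
    recent_prior: a priori (recurrent) speed estimates for the same intervals.
    Returns (selected forecast, anomalous flag).
    """
    pairs = [(o, p) for o, p in zip(recent_obs, recent_prior) if o is not None and p > 0]
    if not pairs:
        return bn_forecast, False  # no evidence of anomaly: keep the default model
    rel_dev = sum(abs(o - p) / p for o, p in pairs) / len(pairs)
    anomalous = rel_dev > threshold
    return (ts_forecast if anomalous else bn_forecast), anomalous
```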

The rest of the paper is organized as follows. Section 2 presents the methodology proposed for short-term forecasting, describes the state-of-the-art methods selected for reference and introduces the error indicators chosen for comparing different methods. Section 3 illustrates the experimental application to a suitable subarea of the road network of Rome, for which the data set was available, and presents and discusses the results of the different prediction methods under different traffic conditions. Conclusions and suggestions for further research are reported in Section 4.

Section snippets

Time series analysis

Autoregressive Integrated Moving Average (ARIMA) is one of the most consolidated methods for time-series forecasting; it is used in various fields and was introduced in freeway traffic forecasting in the late 1970s by Ahmed and Cook (1979). For stationary time series, the forecast provided by the Autoregressive Moving Average (ARMA) model is a linear combination of past observations (the autoregressive, AR, terms) and past forecast errors (the moving average, MA, terms), weighted by coefficients estimated from the data. In
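The one-step ARMA forecast recursion can be sketched directly once the coefficients are known (estimating them, for example by maximum likelihood, is a separate step not shown). The function below is an illustrative sketch with a hypothetical name; it treats pre-sample deviations and errors as zero, a common simplification. A multiplicative seasonal model with period s expands into an ARMA with a longer, sparse coefficient list: for example, (1 − φB)(1 − ΦB^s) gives AR coefficients φ at lag 1, Φ at lag s and −φΦ at lag s + 1, which can be passed as `phi` with zeros elsewhere.

```python
def arma_one_step_forecasts(y, phi, theta, mu=0.0):
    """One-step-ahead forecasts of a stationary ARMA(p, q) process.

    y: observed series; phi: AR coefficients for lags 1..p;
    theta: MA coefficients for lags 1..q; mu: process mean.
    Pre-sample deviations and errors are treated as zero.
    Returns len(y) + 1 forecasts: forecasts[t] predicts y[t] from y[:t],
    and the last entry predicts the next, not yet observed, value.
    """
    p, q = len(phi), len(theta)
    forecasts, errors = [], []
    for t in range(len(y) + 1):
        # AR part: weighted past deviations from the mean
        ar = sum(phi[i] * (y[t - 1 - i] - mu) for i in range(p) if t - 1 - i >= 0)
        # MA part: weighted past one-step forecast errors
        ma = sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        f = mu + ar + ma
        forecasts.append(f)
        if t < len(y):
            errors.append(y[t] - f)  # innovation used by later MA terms
    return forecasts
```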

Data set

The study area is composed of the primary urban road network of the EUR district in the southern part of Rome, depicted in 0. The complete data set included one month of raw Floating Car Data obtained from a fleet of about 100,000 GPS-equipped private vehicles, corresponding to about 2.5% of the whole vehicular fleet of the city. Every data point, recorded at a rate of 1 reading every 2 min, reports the individual position and speed, the state of the engine (turned on, turned off,

Performance analysis under different traffic congestion conditions

In order to assess model performance under different traffic patterns, we divided traffic conditions into two groups: recurrent traffic conditions, i.e. the traffic pattern normally observed on a link, and non-recurrent traffic conditions, which can be defined as a strong and sudden deviation from the standard situation.
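One simple way to operationalize the "strong deviation" part of this definition is a per-interval test against historical statistics for the same time of day. The sketch below flags an interval as non-recurrent when the observed speed falls more than k historical standard deviations below the historical mean; the rule, the default k = 2 and the function name are hypothetical and not the paper's criterion, and the "sudden" aspect of the definition is not modeled here.

```python
def label_conditions(observed, historical_mean, historical_std, k=2.0):
    """Hypothetical labeling of intervals as recurrent or non-recurrent.

    An interval is non-recurrent when the observed speed lies more than
    k historical standard deviations below the historical mean for that
    time of day. Intervals without observations are labeled None.
    """
    labels = []
    for v, m, s in zip(observed, historical_mean, historical_std):
        if v is None:
            labels.append(None)  # no floating car observation in this interval
        elif s > 0 and (m - v) / s > k:
            labels.append("non-recurrent")
        else:
            labels.append("recurrent")
    return labels
```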

Conclusions

The paper dealt with the problem of providing reliable short-term forecasts on urban road traffic networks by exploiting ubiquitous big data composed of individual point speeds from a large fleet of private cars. In order to reflect the transportation nature of the problem, the topology of the road network was taken into account by different network-based models, namely a Bayesian Network (BN) and a Neural Network (NN), trained to reproduce the spatial-temporal correlation between traffic

References (69)

C. Antoniou et al.
Dynamic data-driven local traffic state estimation and prediction
Transp. Res. Part C Emerg. Technol.
(2013)

A. Baiocchi et al.
Vehicular ad-hoc networks sampling protocols for traffic monitoring and incident detection in intelligent transportation systems
Transp. Res. Part C Emerg. Technol.
(2015)

M. Ben-Akiva
Dynamic network equilibrium research
Transp. Res. Part A Gen.
(1985)

M. Ben-Akiva et al.
A dynamic traffic assignment model for highly congested urban networks
Transp. Res. Part C Emerg. Technol.
(2012)

C. Bucknell et al.
A trade-off analysis between penetration rate and sampling frequency of mobile sensors in traffic state estimation
Transp. Res. Part C Emerg. Technol.
(2014)

P. Cai et al.
A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting
Transp. Res. Part C Emerg. Technol.
(2016)

E. Castillo et al.
Predicting traffic flow using Bayesian networks
Transp. Res. Part B Methodol.
(2008)

M. Castro-Neto et al.
Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions
Expert Syst. Appl.
(2009)

C. Chen et al.
The retrieval of intra-day trend and its influence on traffic prediction
Transp. Res. Part C Emerg. Technol.
(2012)

C. Chen et al.
Bayesian network-based formulation and analysis for toll road utilization supported by traffic information provision
Transp. Res. Part C Emerg. Technol.
(2015)

C.F. Daganzo
The cell transmission model: A dynamic representation of highway traffic consistent with the hydrodynamic theory
Transp. Res. Part B Methodol.
(1994)

J. de Oña et al.
Analysis of traffic accidents on rural highways using latent class clustering and Bayesian networks
Accid. Anal. Prev.
(2013)

W. Deng et al.
Traffic state estimation and uncertainty quantification based on heterogeneous data sources: a three detector approach
Transp. Res. Part B Methodol.
(2013)

M.S. Dougherty et al.
Short-term inter-urban traffic forecasts using neural networks
Int. J. Forecast.
(1997)

Y. Feng et al.
Probe vehicle based real-time traffic monitoring on urban roadways
Transp. Res. Part C Emerg. Technol.
(2014)

J. Guo et al.
Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification
Transp. Res. Part C Emerg. Technol.
(2014)

J.C. Herrera et al.
Incorporation of Lagrangian measurements in freeway traffic state estimation
Transp. Res. Part B Methodol.
(2010)

J.C. Herrera et al.
Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment
Transp. Res. Part C Emerg. Technol.
(2010)

A. Hofleitner et al.
Arterial travel time forecast with streaming data: a hybrid approach of flow modeling and machine learning
Transp. Res. Part B Methodol.
(2012)

Y. Kamarianakis et al.
Space–time modeling of traffic flow
Comput. Geosci.
(2005)

S. Kim et al.
Comparing INRIX speed data against concurrent loop detector stations over several months
Transp. Res. Part C Emerg. Technol.
(2014)

L. Li et al.
Efficient missing data imputing for traffic flow by considering temporal and spatial dependence
Transp. Res. Part C Emerg. Technol.
(2013)

L. Li et al.
Robust causal dependence mining in big data network and its application to traffic flow predictions
Transp. Res. Part C Emerg. Technol.
(2015)

T. Ma et al.
Nonlinear multivariate time–space threshold vector error correction model for short term traffic state prediction
Transp. Res. Part B Methodol.
(2015)

X. Ma et al.
Long short-term memory neural network for traffic speed prediction using remote microwave sensor data
Transp. Res. Part C Emerg. Technol.
(2015)

L. Mihaylova et al.
Freeway traffic estimation within particle filtering framework
Automatica
(2007)

S. Oh et al.
Short-term travel-time prediction on highway: a review of the data-driven approach
Transp. Rev.
(2015)

A.D. Patire et al.
How much GPS data do we need?
Transp. Res. Part C Emerg. Technol.
(2015)

B.L. Smith et al.
Comparison of parametric and nonparametric models for traffic flow forecasting
Transp. Res. Part C Emerg. Technol.
(2002)

A. Stathopoulos et al.
A multivariate state space approach for urban traffic flow modeling and prediction
Transp. Res. Part C Emerg. Technol.
(2003)

H. Tan et al.
A tensor-based method for missing traffic data completion
Transp. Res. Part C Emerg. Technol.
(2013)

C.P.I. van Hinsbergen et al.
Bayesian committee of neural networks to predict travel times with confidence intervals
Transp. Res. Part C Emerg. Technol.
(2009)

E.I. Vlahogianni et al.
Short-term traffic forecasting: where we are and where we’re going
Transp. Res. Part C Emerg. Technol.
(2014)

Y. Wang et al.
Real-time freeway traffic state estimation based on extended Kalman filter: a general approach
Transp. Res. Part B Methodol.
(2005)

Cited by (132)

A fundamental diagram based hybrid framework for traffic flow estimation and prediction by combining a Markovian model with deep learning
2024, Expert Systems with Applications
Accurate traffic congestion estimation and prediction are critical building blocks for smart trip planning and rerouting decisions in transportation systems. Over the decades, there have been many studies focusing on traffic congestion estimation and prediction with different statistical approaches (e.g., Markov chain) and machine learning models (e.g., clustering, Bayesian networks, and artificial neural networks). However, there is a lack of a unified framework to address the mechanisms of different models and integrate the advantages of different methods through combinations. This paper introduces the FD-Markov-LSTM model, a hybrid interpretable approach that combines the fundamental diagram (FD), Markov chain, and long short-term memory (LSTM). The aim is to estimate and predict traffic states by integrating statistical data in both congested and uncongested scenarios. The FD-Markov-LSTM model leverages the FD to identify hierarchical traffic states and utilizes the Markov process to capture the probabilistic transitions between these states. We employ the LSTM model to further capture the residual time series produced by the Markov chain model (assuming a memoryless property) to enhance the estimation and prediction performance. The proposed model's accuracy in estimating and predicting traffic flow is evaluated using empirical data from three case studies conducted in Beijing and Los Angeles. The results highlight a significant improvement in accuracy compared to classical benchmark models such as the Markov model, ARIMA model, k-Nearest Neighbor model, Random Forest model, and LSTM. Specifically, the FD-Markov-LSTM model achieves reductions of over 39% in mean absolute error, 35% in root mean squared error, and 7.4% in mean absolute percentage error. These results clearly demonstrate that the FD-Markov-LSTM model outperforms the benchmark models, enabling more precise predictions of traffic flow.
Short-term urban rail transit passenger flow forecasting based on fusion model methods using univariate time series
2023, Applied Soft Computing
Global urbanization has made the urban rail transit system an essential service for a growing population. To help urban rail transit stations design optimal operational plans, previous studies have devoted extensive efforts to passenger flow forecasting, especially short-term predictions. By considering the complex pattern of passenger flow, previous research investigated the feasibility of machine learning (ML) methods on different data features and found the limited application of a single ML method. Based on the dynamic historical passenger flow data at an urban rail station, this study proposes an ML-fusion strategy to enhance prediction accuracy, including data aggregation, time series forecasting model selection, and fusion model strategy. First, this study aggregates the data into working days, weekends, and hourly time series for single model development. Based on the predictive performance of single model development, this study selects XGBoost, AdaBoost, and LightGBM from the widely used ML-method pool. To overcome prediction errors caused by the discrepancy between characteristics of passenger flow and single prediction models, the proposed ML-fusion model combines single forecasting models with dynamically predicted passenger flow to enhance the accuracy and efficiency of the prediction. Based on the experimental results, the mean absolute error is 1.54, and the regression coefficient is 0.99, which is in close agreement with unity, which validates that the proposed ML-fusion method has displayed superiority over all other single models tested both in accuracy and stability.
Bibliometric methods in traffic flow prediction based on artificial intelligence
2023, Expert Systems with Applications
Artificial intelligence (AI) technologies are increasingly applied to traffic flow prediction (TFP) to enhance prediction accuracy. This study utilizes bibliometric methods and network analysis measures to gain insights into the research status, development process, opportunities, and challenges of AI-based TFP research based on the literature data retrieved from the Web of Science core collection. The study first conducts basic statistical analysis of all papers. Subsequently, cooperation network analysis is conducted to identify the most productive countries/territories, institutions, and authors, the cooperative relationships, and the formed research communities. Co-citation network analysis is then employed to identify publications that have made outstanding contributions to the AI-based TFP field. Finally, the main path analysis of the paper citation network is used to analyze the knowledge diffusion process, while the keyword co-occurrence analysis is conducted to reveal the evolution characteristics of the research topics. Based on the bibliometric analysis results, we gain insights into the opportunities and challenges in this field from the perspectives of data, models, and applications, and provide pertinent suggestions for future research. Overall, this study can assist researchers in capturing the state-of-the-art and research directions in AI-based TFP.
Meta-heuristic aggregate calibration of transport models exploiting data collected in mobility
2023, Case Studies on Transport Policy
The wide diffusion of data collected in mobility has produced an unprecedented amount of information about people's mobility behavior. On the one hand, the availability of big data from multiple sources makes it possible to calibrate complex models with a large number of parameters; on the other hand, the dimension of the problem increases, and computational efficiency becomes an important issue. The paper presents a general methodology for the aggregate calibration of transport system models that exploits data collected in mobility jointly with other data sources within a multi-step optimization procedure based on metaheuristic algorithms. The methodology is applied to two real large-scale case studies in two different contexts. The first concerns the aggregate calibration for updating a national strategic 4-step demand model in use in a large European country; the second concerns the calibration of link and node performance functions implemented in a traffic network model of a city of around 3 million inhabitants. The results demonstrate that the aggregate calibration methodology significantly improves the estimates of the earlier models. They also show that the errors are of the same order of magnitude as the intrinsic variation of the data collected in the field.
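The calibration idea can be sketched on a single link. The abstract names neither the performance function nor the metaheuristic, so this sketch assumes the widely used BPR link performance function, t = t0 · (1 + α (v/c)^β), and a plain simulated-annealing search over α and β that minimizes the squared error against observed travel times. The observations are synthetic, and the step sizes, bounds, and cooling schedule are arbitrary illustrative choices, not the paper's multi-step procedure.

```python
import math
import random
from typing import List, Tuple


def bpr_time(t0: float, flow: float, capacity: float, alpha: float, beta: float) -> float:
    """BPR link travel time (assumed performance function)."""
    return t0 * (1.0 + alpha * (flow / capacity) ** beta)


def sse(params: Tuple[float, float], obs: List[Tuple[float, float]],
        t0: float, capacity: float) -> float:
    """Sum of squared errors between modeled and observed travel times."""
    alpha, beta = params
    return sum((bpr_time(t0, v, capacity, alpha, beta) - t) ** 2 for v, t in obs)


def anneal(obs, t0, capacity, start=(1.0, 1.0), iters=5000, temp0=1.0, seed=0):
    """Simulated annealing over (alpha, beta); returns the best parameters found."""
    rng = random.Random(seed)
    current, current_cost = start, sse(start, obs, t0, capacity)
    best, best_cost = current, current_cost
    for k in range(iters):
        temp = temp0 * (1.0 - k / iters) + 1e-9   # linear cooling
        # Random perturbation, clipped to illustrative bounds
        cand = (min(max(current[0] + rng.gauss(0.0, 0.05), 0.01), 5.0),
                min(max(current[1] + rng.gauss(0.0, 0.1), 1.0), 10.0))
        cand_cost = sse(cand, obs, t0, capacity)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Synthetic observations generated with alpha = 0.15, beta = 4 (no noise)
t0, capacity = 10.0, 1000.0
obs = [(v, bpr_time(t0, v, capacity, 0.15, 4.0)) for v in (200, 500, 800, 1000, 1200)]
params, cost = anneal(obs, t0, capacity)
print(params, cost)
```

Accepting some worse moves early, while the temperature is high, is what lets annealing escape local minima; on a real network, the same loop would wrap a full assignment run per candidate, which is why the abstract stresses computational efficiency.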
Traffic congestion patterns in the urban road network: (Dammam metropolitan area)
2023, Ain Shams Engineering Journal
Traffic congestion is a significant problem affecting the sustainable development of urban traffic. Analyzing congestion and forecasting future traffic patterns are important to prevent it. The main aim of this study is to identify the most congested areas of the road network and determine how they relate to driver demand. The study uses Floating Car Data to detect traffic congestion and to assess the degree to which the observed congestion clusters are a meaningful representation of congestion patterns within the wider urban road network. Statistical calculations were carried out to determine the correlation between clusters, on which the conclusions are based. The findings show that this approach can effectively identify traffic congestion patterns in the urban road network. The analyses of traffic congestion behaviour show that congestion is more severe and widespread during the evening rush hours than in the morning. Overall, the results can be used to develop a framework to describe potential traffic issues and a system for predicting congestion.
Speed data collection methods: a review
2023, Transportation Research Procedia
Various studies have focused on a wide range of techniques to detect traffic flow characteristics, such as speed and travel times. A key aspect in obtaining a statistically significant data set is observing and recording driver behaviour in the real world.
To collect traffic data, traditional methods of traffic measurement, such as detection stations, radar guns, or video cameras, have been used over the years. More innovative methods rely on probe vehicles equipped with GPS devices and/or cameras, which allow continuous surveys along the entire road route.
Point-based devices provide information on the entire flow, but only at the section where they are installed and only in the time domain; probe-vehicle data, by contrast, cover both the temporal and spatial domains but ignore the overall traffic conditions. Of course, the data collected must come from samples that are representative of the user population in both number and composition.
The paper reviews the most widely used methods for speed data collection, highlighting the advantages and disadvantages of each experimental approach. The comparison indicates the most suitable survey method to adopt depending on the research and investigation to be performed.

Highlights

Abstract

Introduction

Time series analysis

Data set

Performance analysis under different traffic congestion conditions

Conclusions
