Traffic prediction is the task of determining the number and type of vehicles, or some other traffic-related metric, at a certain point in time. In addition to predicting the short-term evolution of traffic, prediction can be used to estimate traffic in the distant future based on the trends found in historical traffic data, which is a critical capability for traffic simulators that need to spawn a realistic number of vehicles under the prevailing situation. Such a prediction system needs to depend on the characteristics of the situation and not on the preceding traffic flow. This work presents a deep learning based prediction pipeline that uses a Long Short Term Memory (LSTM) network to map temporal, weather and traffic accident data into traffic flow, predicting the flow over multiple timesteps from non-traffic inputs only. Traffic data can then be produced from independent data such as weather forecasts and be used in other applications. As far as we know, no previous traffic predictor combines as many input variables to predict traffic flow with vehicle type information. To make the event-based traffic accident dataset compatible with time series data, a novel preprocessing step based on the power law decay phenomenon is added. Quantitative experiments show that the proposed preprocessing step and optimized hyperparameters improve the accuracy of the predictor on multiple metrics compared to a model without accident information. In two established statistical evaluation metrics, Mean Absolute Error and Mean Squared Error, the improvement was over \(20 \%\) for certain vehicle types.
Abbreviations
LSTM: Long Short Term Memory
UN: United Nations
ARIMA: Autoregressive Integrated Moving Average
GAN: Generative Adversarial Network
LWR: Lighthill-Whitham-Richards
HA: Historical Average
SARIMA: Seasonal Autoregressive Integrated Moving Average
CNN: Convolutional Neural Network
CGAN: Conditional Generative Adversarial Network
RNN: Recurrent Neural Network
TMS: Traffic Measurement System
MAE: Mean Absolute Error
MSE: Mean Squared Error
\(R^2\): Coefficient of Determination
sMAPE: symmetric Mean Absolute Percentage Error
RAE: Relative Absolute Error
GEH: Geoffrey E. Havers
1 Introduction
With the expected increase in road traffic and traffic congestion in the future [1], urban development methods that improve general livability and traffic efficiency while reducing the costs are a necessity. According to the European Environment Agency, transportation contributes to 25% of the total greenhouse gas emissions in the European Union, with the majority (70%) of the emissions coming from road traffic [2]. This means that efficient management of traffic congestion is vital for quality of life in urban areas.
Traffic simulations are used in traffic research for various purposes including traffic modeling, planning and urban development. Traffic simulations have been developed since the 1950s [3] to support traffic and urban planning as well as to provide data for simulations in other domains, such as air quality research [4]. At present, simulations providing recommendations for planning future smart cities are seen as a path towards achieving some of the UN's Sustainable Development Goals [5], and computer science research in various domains is therefore actively looking for novel approaches. Some examples of the most recent innovations for traffic simulations include computer vision solutions for learning drivers' behaviour at intersections [6] and Reinforcement Learning solutions to control self-driving vehicles [7] or automated traffic lights [8].
Traffic simulation and the accompanying traffic flow data are valuable tools for studying the relationship between emissions and air quality. Traffic flow data plays a crucial role in air quality simulations and research by providing inputs for understanding and assessing the impact of traffic-related emissions. Air quality models utilize traffic flow data to determine the driving patterns, fleet composition and congestion levels, which are essential inputs for estimating emissions of pollutants from vehicles. Traffic flow data allows researchers to characterize the contribution of traffic-related emissions to overall air pollution in specific areas. Furthermore, traffic models can simulate future scenarios, which can be used to model the future air quality spatially in newly planned city neighborhoods.
Traffic simulations can be divided into macroscopic and microscopic, based on their scale and focus. Macroscopic simulations model traffic in an ensemble, as total traffic flow and traffic density [9]. They are suitable for situations where the goal is to simulate the traffic flow of a selected area [10]. On the other hand, microscopic traffic simulations aim to simulate the behaviour of each individual vehicle [11].
Traffic prediction is an essential part of traffic simulations [12]. Traffic prediction methods infer the number of vehicles spawned into the traffic flow at specific geographical locations [13]. The simplest traffic prediction approach is to compute statistical parameters from historical traffic data and then to model the predicted traffic as following a common distribution, such as the Poisson [14] or uniform distribution [15, 16]. Traditionally, statistical time series analysis models, such as Autoregressive Integrated Moving Average (ARIMA) [17], and Bayesian networks [18] have been used for more realistic traffic prediction. Traffic prediction has also been done by modeling the traffic flow using fluid dynamics [10]. At present, traffic prediction methods are largely based on machine learning, specifically deep neural networks, since they are able to learn complex relations within large datasets, which makes them suitable for various tasks [12]. The most widely used deep learning methods include Long Short Term Memory networks (LSTMs) and Generative Adversarial Networks (GANs) [19], and most recently combinations of the two [20].
Our goal is to develop a simulator that is able to spawn a realistic number of vehicles into the system under a situation that we define based on a selected scenario. The situation is defined by, for example, the time of day, the weather and recent incidents, but not by any knowledge of the preceding traffic situation. Our prediction model therefore needs to estimate the traffic flow based on the trends found in the historical traffic data, meaning that it depends only on the characteristics of the situation [21, 22]. This allows our method to be combined with a microscopic traffic simulator such as SUMO to keep producing traffic flow data while the simulation is running. For our needs, current deep learning based traffic predictors have limitations. Most of them are based on models that need past traffic data also at the inference phase. This means that predicting traffic flow for a specific scenario always requires data from the scenario in question, so the prediction is not generalizable. Conditional GANs enable predicting traffic flow without using traffic as an explanatory variable. However, this approach remains largely unexplored [23]. Another limitation is that the prediction methods do not generally differentiate between vehicle classes (e.g. trucks, vans and buses), meaning that the traffic flow is depicted as a single variable [12]. In many applications this is enough, but different vehicle classes have characteristics that affect their behaviour, such as speed limits and emission profiles, which are necessary parts of an accurate traffic simulation.
Another main limitation of the existing traffic prediction methods is that even though external conditions, such as weather [24] and accidents [25], affect the traffic flow, they are not used as part of the traffic prediction. Most deep learning based methods use only past traffic flow data as an input variable to predict future traffic flow. Fusing instances of different data types is challenging [26]. For instance, traffic accidents are event-based whereas weather and traffic data are time dependent. However, with preprocessing, traffic accident data can be presented as a time series in which the effect of accidents on traffic flow decreases over time following a power law decay, using a method based on the inverse correlation between accidents and traffic flow [27].
The ultimate goal of our research is to provide planning solutions that improve the local air quality, accessibility and traffic efficiency by developing novel artificial intelligence methods that consider population structures and traffic flows jointly with state-of-the-art air pollution modelling. The planning solution requires a tool as a basis for simulating the different phenomena. The tool must be able to simulate the traffic situation of a planned, not yet existing urban environment and therefore requires a mechanism for estimating a realistic number of vehicles in the area. The goal of this research is thus to develop a traffic prediction method to serve as the basis of our simulation tool. The method should be able to 1) predict the traffic flow, i.e. the number of vehicles in multiple vehicle classes, for different scenarios by using only variables related to selected situational and environmental characteristics, specifically weather, temporal and accident data, without using previous traffic flow data as an input during the inference phase, and 2) fuse multi-modal data to an extent not done before, with a power law based preprocessing step for traffic accident data. To this end, we have developed a deep learning based, namely LSTM based, traffic prediction framework that maps multi-source, multivariate input data to traffic flow data with multiple vehicle classes. LSTM was selected due to its ability to map sequences to sequences, which makes it suitable for handling time series data [28]. As input to our network, we used data imported from different sources and therefore developed a feature engineering process for fusing data points with different resolutions and characteristics. The input data consists of the following variables: weather, accident and temporal attributes. Historical traffic data was used only at the training phase as ground truth. Fusing most of the data points was quite straightforward, but modeling the impact of accidents on traffic flow required more sophisticated methods and was done using a power law distribution [29].
We evaluated the performance of our traffic predictor experimentally. The analysis shows that the predictor is able to learn to model traffic flow based on the time, weather and traffic accident information. Additional comparative experiments show that the performance of the traffic prediction model is improved by using the power law based traffic accident feature presented in this paper. The paper is organized as follows. First, in Section 2, we discuss related research. Section 3 then presents the traffic prediction pipeline, and Section 4 presents the experiments and related results. Finally, in Section 6 we conclude the work and propose future research.
2 Related Research
The selection of the traffic prediction model is determined by the application and its objectives. Early studies on traffic prediction employed models based on fluid dynamics, assuming that a sufficiently large number of vehicles can be regarded as fluid streams [10]. Additionally, continuum models have been utilised for traffic flow modeling. In order to make the vehicle model more realistic, higher-order models, such as the Lighthill-Whitham-Richards (LWR) model [30, 31], have been employed to incorporate the effects of inertia and driver anticipation on vehicle speed [10]. However, modeling the traffic as a fluid flow presupposes the presence of a large number of vehicles in the simulated road section, which may not always be realistic. To address this issue, non-continuum models were presented [32], where each vehicle is treated as a dynamical system in itself. Nevertheless, in non-continuum models every vehicle is treated as an identical particle, meaning that there is no differentiation among vehicle classes [9].
In addition to previously mentioned ARIMA, other statistical models have been used. Those include multivariate linear regression [33], Historical Average (HA) [34], Seasonal Autoregressive Integrated Moving Average (SARIMA) [35, 36] and probabilistic models like Bayesian networks [18]. However, the statistical models impose certain limitations as they rely on the assumptions of stationarity and linearity of the datasets, which may not hold true for traffic flow data.
Most modern traffic prediction methods use deep learning. The intended output of a prediction algorithm is some desired traffic metric, such as flow, density, volume or speed [40]. The methods generally make short-term predictions, using recent traffic data to estimate the situation in the near future, mainly at a horizon of minutes or hours. Usually the models use previous traffic data to predict future timesteps. For example, [37] predicts traffic for up to 45 minutes using past traffic data as inputs. Some previous methods have also used only non-traffic related variables to predict traffic metrics like congestion [21] or speed and volume [22]. LSTM networks are generally suitable for mapping different types of sequential input data into other sequences, like video frames into captions [38]. This suggests that they could also be used for predicting multi-step traffic flow data using input variables from other sources. GANs [39] can also be used to learn to generate representative samples without past traffic data as input. In [20], a GAN is used to generate traffic scenarios by combining a bidirectional LSTM network and a Convolutional Neural Network (CNN) as the generator and discriminator, respectively. The generator is trained with only historical traffic flow data. In addition to the limited training information, the generation of scenarios cannot be controlled, which makes it impossible to consider specific circumstances, like time of day or season. In contrast, [19] uses a conditional GAN (CGAN) which is fed with auxiliary information to condition the output to generate traffic flow for different days of the week. However, the generative model is still trained using only traffic flow data.
Some previous methods combine traffic data with other input sources, like weather conditions or accidents. In [41] the weather status is encoded into one of three different states (sunny, rainy, snowy) and used as an input to an LSTM-model with traffic data to predict congestion. Traffic and accident data were fused in a Mixture Deep LSTM network to forecast traffic speed in peak-hour and post-accident conditions [42]. Similarly, hand-crafted, multidimensional traffic incident data has been used to forecast the impact on speed, volume and occupancy [43]. However, none of the existing methods combine both weather and accident features. The data output by most of the previous methods is one dimensional and does not differentiate between different vehicle types, although exceptions for this observation exist such as [44].
Our model maps temporal, weather and traffic accident data to traffic flow, using historical traffic data with vehicle types only for training, making it possible to predict traffic data with multiple vehicle classes for various scenarios without using past traffic flow data during the inference phase. Inclusion of data from multiple sources is shown to improve the accuracy of the prediction.
The novelty of our research is in the development of a traffic prediction framework that processes multi-source, multivariate input data in an LSTM deep neural network. The goal is to predict traffic flow at a single reference point without needing past traffic flow data during inference, by mapping various non-traffic data into traffic flow data. To our knowledge, our method is the first traffic flow prediction algorithm to process data from such a wide range of sources. As input to our network, we use data imported from various sources and therefore developed a feature engineering process for fusing data points with different resolutions and characteristics. Including the accident data is especially complex, and for this purpose we modeled it using a power law function. The input data of the resulting LSTM framework consists of the following variables: weather, accident and temporal attributes.
3 Methodology
Our novel traffic prediction pipeline maps three multimodal input variables (time, weather and traffic accident information) into traffic flow data with multiple vehicle classes. As the traffic slowdown and congestion due to accidents are not linear functions of time, we model the effect using a novel input feature creation approach based on a power law. Our method predicts traffic flow data for one spatiotemporal point, namely a point on the road entering our simulation area at a certain time epoch. The traffic prediction framework is based on a multilayer LSTM network.
Here, we first discuss the main functionalities of the preprocessing and combining of the inputs of our predictor followed by a description of the prediction architecture with the LSTM network. Then we present the feature engineering step with the main emphasis on the power law function.
3.1 Traffic Prediction Architecture with an LSTM Model Core
The traffic prediction pipeline, presented in Fig. 1, consists of 1) feature engineering the multimodal raw data, 2) concatenating the resulting features into an input data structure, 3) scaling the input data values into the predictor input and 4) generating the output data via mapping the sequential input data \(({\textbf {X}})\) into sequential output data \(({\textbf {Y}})\) in an LSTM network.
The multimodal raw input data consists of weather data, traffic accident data and temporal data. We denote these datasets as \({\textbf {C}}\), \({\textbf {A}}\) and \({\textbf {T}}\) respectively, where \({\textbf {C}} \in \mathbb {R}^{n \times c}\), \({\textbf {A}} \in \mathbb {R}^{n \times 1}\) and \({\textbf {T}} \in \mathbb {R}^{n \times k}\). Values in \({\textbf {C}}\) are nonnegative real numbers, whereas the values of \({\textbf {A}}\) and \({\textbf {T}}\) are nonnegative integers. The columns hold the different features in the datasets, namely c weather features and k time features, and the n rows are the timesteps composing the temporal extent of the data.
First, if there are missing values in \({\textbf {C}}\), linear interpolation is applied to each column, which yields the matrix \({\textbf {C}}^*\). Linear interpolation is suitable for filling missing values when there are no large blocks of missing data, as was the case for our training dataset. The values in matrix \({\textbf {A}}\) are the time durations since the last traffic accident in minutes. A power function with a negative exponent is applied to each element in \({\textbf {A}}\) so that its value decays according to a power law, as explained in Section 3.2. As a result we get the processed matrix \({\textbf {A}}^*\), where the values are nonnegative real numbers. Each column of matrix \({\textbf {T}}\) contains temporal data at a different resolution (year, month, day, hour, minute). After preprocessing, the feature matrices are combined horizontally to get the input matrix \({\textbf {X}}\):
\[ {\textbf {X}} = \left[ {\textbf {C}}^* \;\; {\textbf {A}}^* \;\; {\textbf {T}} \right] \in \mathbb {R}^{n \times m}, \]
where \(m=c+1+k\). The specified number of timesteps defines how many rows of the input \(({\textbf {X}})\) are used to predict the same number of rows of the output \(({\textbf {Y}})\). The LSTM network used as the predictor is trained with a selected number (s) of timesteps, meaning that the input matrix needs to be reshaped to have dimensions \(s \times m\).
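The fusion step described here can be illustrated with a minimal standard-library sketch. The function names `interpolate_linear` and `fuse`, the list-based data layout, and the handling of gaps at the start or end of a column (copying the nearest observed value) are assumptions made for illustration and are not taken from the original pipeline.

```python
from typing import List, Optional, Sequence


def interpolate_linear(column: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed values.

    Assumption: gaps at the very start or end copy the nearest observed value.
    """
    known = [i for i, v in enumerate(column) if v is not None]
    if not known:
        raise ValueError("column has no observed values")
    filled = list(column)
    for i, value in enumerate(column):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = column[right]
        elif right is None:
            filled[i] = column[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = column[left] + weight * (column[right] - column[left])
    return filled


def fuse(weather_cols: Sequence[List[Optional[float]]],
         accident_feature: Sequence[float],
         time_rows: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build X = [C* A* T]: one row per timestep with m = c + 1 + k columns."""
    c_star = [interpolate_linear(list(col)) for col in weather_cols]
    n = len(accident_feature)
    return [
        [col[t] for col in c_star] + [accident_feature[t]] + list(time_rows[t])
        for t in range(n)
    ]


# One weather column (c = 1), one accident column, five time columns (k = 5).
X = fuse(
    weather_cols=[[1.0, None, 3.0]],
    accident_feature=[0.5, 0.2, 0.1],
    time_rows=[(2019, 1, 1, 0, 0), (2019, 1, 1, 0, 15), (2019, 1, 1, 0, 30)],
)
```

Each row of `X` then has \(m = 1 + 1 + 5 = 7\) entries, and the missing weather value in the second row is filled with the midpoint of its neighbours.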
All of the values \((x_{tj})\) in the input data, where \(t \in \{1,2,...,s\}\) is the timestep and j is the column, are scaled to the range [0, 1] using a MinMax scaler. This is a common procedure when training neural networks, as it speeds up convergence to a stable final solution [45]. MinMax scaling transforms each value \((x_{tj})\) into its scaled counterpart \((x_{scaled})\) as
\[ x_{scaled} = \frac{x_{tj} - x_{\text {min,j}}}{x_{\text {max,j}} - x_{\text {min,j}}}, \]
where \(x_{\text {min,j}}\) is the smallest value and \(x_{\text {max,j}}\) is the largest value in the jth column of the training dataset. A complete row of the input is denoted \(( {\textbf {x}}_t)\). Similarly, one row of the output is denoted \(( {\textbf {h}}_t)\). For each input matrix consisting of s rows \({\textbf {x}}_t\), the output is a matrix of s rows \({\textbf {h}}_t\), where \(|{\textbf {h}}_t| = v_c\) and \(v_c\) is the number of different vehicle classes, which depends on their occurrence in the data. As a result, the predictor maps input \(({\textbf {X}})\) to output \(({\textbf {Y}})\) as
\[ f: {\textbf {X}} \in \mathbb {R}^{s \times m} \mapsto {\textbf {Y}} \in \mathbb {R}^{s \times v_c}. \]
The prediction method uses a multilayer LSTM network, which was originally presented in [28].
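As a rough illustration of the scaling and reshaping steps, the sketch below fits the per-column minima and maxima on training rows only and then cuts the scaled rows into blocks of s timesteps. The helper names and the choice of non-overlapping windows (dropping an incomplete final window) are assumptions; the text does not state how the windows were formed.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum, computed on the training data only."""
    columns = list(zip(*train_rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def min_max_scale(rows: Sequence[Sequence[float]],
                  mins: Sequence[float],
                  maxs: Sequence[float]) -> List[List[float]]:
    """Apply x_scaled = (x - min_j) / (max_j - min_j) column by column."""
    return [
        # Assumption: a constant column (max == min) is mapped to 0.0.
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def make_windows(rows: Sequence[Sequence[float]], s: int) -> List[List[Sequence[float]]]:
    """Cut the rows into non-overlapping s x m blocks (assumed windowing scheme)."""
    return [list(rows[i:i + s]) for i in range(0, len(rows) - s + 1, s)]


rows = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [2.0, 15.0]]
mins, maxs = fit_min_max(rows)
scaled = min_max_scale(rows, mins, maxs)
windows = make_windows(scaled, s=2)
```

Fitting the scaler on the training split and reusing `mins` and `maxs` at inference keeps test-time inputs on the same scale, although values outside the training range can then fall outside [0, 1].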
3.1.1 Long Short Term Memory Network
Long Short-Term Memory (LSTM) [28] is a type of Recurrent Neural Network (RNN). LSTM networks can be used for prediction and classification tasks similarly to conventional feed-forward neural networks, but they are also suitable for handling sequential datasets, such as time series data. The LSTM model was developed as an improvement over the regular RNN in order to avoid the problem of exploding and vanishing gradients, which makes it suitable for long-term predictions.
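The LSTM architecture is as follows. For each input (\({\textbf {x}}_t\)) at timestep (t), the LSTM applies the functions
\[ \begin{aligned} {\textbf {i}}_t &= \sigma ({\textbf {W}}_{ii} {\textbf {x}}_t + {\textbf {b}}_{ii} + {\textbf {W}}_{hi} {\textbf {h}}_{t-1} + {\textbf {b}}_{hi}) \\ {\textbf {f}}_t &= \sigma ({\textbf {W}}_{if} {\textbf {x}}_t + {\textbf {b}}_{if} + {\textbf {W}}_{hf} {\textbf {h}}_{t-1} + {\textbf {b}}_{hf}) \\ {\textbf {g}}_t &= \tanh ({\textbf {W}}_{ig} {\textbf {x}}_t + {\textbf {b}}_{ig} + {\textbf {W}}_{hg} {\textbf {h}}_{t-1} + {\textbf {b}}_{hg}) \\ {\textbf {o}}_t &= \sigma ({\textbf {W}}_{io} {\textbf {x}}_t + {\textbf {b}}_{io} + {\textbf {W}}_{ho} {\textbf {h}}_{t-1} + {\textbf {b}}_{ho}) \\ {\textbf {c}}_t &= {\textbf {f}}_t \odot {\textbf {c}}_{t-1} + {\textbf {i}}_t \odot {\textbf {g}}_t \\ {\textbf {h}}_t &= {\textbf {o}}_t \odot \tanh ({\textbf {c}}_t) \end{aligned} \]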
where \({\textbf {h}}_t\) and \({\textbf {c}}_t\) are hidden and cell states at time t, respectively, \({\textbf {h}}_{t-1}\) is the hidden state at the previous timestep \(t-1\) (or the initial hidden state \({\textbf {h}}_0\) at time 0), and \({\textbf {i}}_t\), \({\textbf {f}}_t\), \({\textbf {g}}_t\) and \({\textbf {o}}_t\) are the input, forget, cell, and output gates, respectively. The gates use the sigmoid function, marked as \(\sigma \), to control the information flow. \({\textbf {W}}\) and \({\textbf {b}}\) are the weights and biases to be learned, while \(\odot \) is the Hadamard product. The LSTM network architecture is presented in Fig. 2.
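To make the gate structure concrete, the following sketch performs one LSTM timestep in plain Python. It is an illustration only: the `params` layout is hypothetical, and the input-side and hidden-side biases of each gate are merged into a single bias vector, which is equivalent to keeping two separate biases and adding them.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM timestep; params[gate] = (W_x, W_h, b) for gate in i, f, g, o."""
    pre = {}
    for gate in ("i", "f", "g", "o"):
        W_x, W_h, b = params[gate]
        from_input = affine(W_x, x_t, b)
        from_hidden = affine(W_h, h_prev, [0.0] * len(b))
        pre[gate] = [a + r for a, r in zip(from_input, from_hidden)]
    i_t = [sigmoid(z) for z in pre["i"]]    # input gate
    f_t = [sigmoid(z) for z in pre["f"]]    # forget gate
    g_t = [math.tanh(z) for z in pre["g"]]  # candidate cell values
    o_t = [sigmoid(z) for z in pre["o"]]    # output gate
    # c_t = f_t * c_{t-1} + i_t * g_t (element-wise)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # h_t = o_t * tanh(c_t) (element-wise)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Two input features, one hidden unit, all weights and biases zero.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero_gate for gate in ("i", "f", "g", "o")}
h_t, c_t = lstm_step([1.0, 2.0], h_prev=[0.0], c_prev=[1.0], params=params)
```

With all-zero parameters every sigmoid gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved, which gives a quick sanity check of the update rule.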
The core of the LSTM is the cell state \({\textbf {c}}_t\), which is updated through the input sequence. Another state that is maintained and updated through the input sequence is the hidden state \({\textbf {h}}_t\), which is the output of the model as presented before and is calculated for every \({\textbf {x}}_t\). The hidden state is also passed on to compute the next value in the sequence. Different gates control the flow of inputs through the network. The forget gate and the input gate control how much weight the cell state of the previous step receives compared to the input of the current timestep when the new cell state is calculated. The output gate controls how much of the memory in the cell state \({\textbf {c}}_t\) is used to calculate the output.
Multi-layer LSTM, a stable technique for challenging sequence prediction problems, works similarly to the single layer LSTM presented in Fig. 3. The multi-layer, or stacked, LSTM is an extension of the original LSTM model, which comprises a single hidden LSTM layer followed by a standard feedforward output layer. The stacked model instead has multiple hidden LSTM layers, where each layer contains multiple memory cells and the upper layers provide a sequence output rather than a single value output to the layer below. The outputs, namely the hidden states \({\textbf {h}}_{t}\), of the previous layer are used as the elements of the input sequence \({\textbf {x}}_t\) for the next layer. Multi-layer LSTMs have the option of using a method called dropout, which is a popular technique for improving the training performance of neural networks [47]. When dropout is used, each hidden state element \({\textbf {h}}_t\) from the previous layer is multiplied by a Bernoulli random variable before being used as input to the next layer. These random variables are 0 with a probability that is set as a hyperparameter. Using dropout essentially sets a random portion of the hidden state elements passed between layers to zero during training, making the model less likely to overfit.
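The masking applied between stacked layers can be sketched as follows. The function name is hypothetical, and rescaling the surviving elements by \(1/(1-p)\) ("inverted dropout") is an assumption borrowed from common deep learning frameworks; the text itself only describes the multiplication by a Bernoulli variable.

```python
import random
from typing import List


def dropout(hidden: List[float], p: float, rng: random.Random) -> List[float]:
    """Zero each hidden-state element with probability p before the next layer.

    Assumption: surviving elements are rescaled by 1 / (1 - p) so that the
    expected value of every element is unchanged during training.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    # rng.random() < p happens with probability p, so that element is dropped.
    return [h / keep if rng.random() >= p else 0.0 for h in hidden]


rng = random.Random(0)
masked = dropout([1.0] * 1000, p=0.25, rng=rng)
```

At inference time the mask is normally switched off, which corresponds to calling the function with `p=0.0`.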
3.2 Traffic Accident Modeling
Traffic accidents cause traffic slowdown and congestion [27, 49, 50], meaning that there is a negative correlation between traffic accidents and traffic flow. Our method assumes that traffic accidents happen in the surroundings of our traffic prediction point and that, as a result, fewer vehicles pass the point over each time interval, so the number of vehicles predicted should be decreased accordingly. The effect of an accident on traffic congestion, and thereby on speed, is large immediately after the accident and then gradually decreases. The recovery duration from an event creating traffic congestion follows a distinct power law distribution. Zhang et al. [29] defined the recovery duration as the time from the median traffic speed dropping below a set threshold until it again rises above it.
A power law describes the relationship between two quantities where a relative change in one leads to a proportional relative change in the other, meaning that one quantity varies as a power of the other [48]. Power-law behavior has been identified in various natural and man-made systems. Power-law decay f(t) refers to a phenomenon where a quantity or effect decreases over time according to a power law. The quantity over time is
\[ f(t) = C t^{-\beta }, \]
where t is the time in minutes from the occurrence of the phenomenon, \(\beta \) is the power law exponent and C is a normalizing constant to ensure that \(\int f(t) dt = 1\).
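A small sketch can show how an event list becomes a time-series feature with this decay. Everything specific here is an assumption for illustration: the function names, the clamp of t to at least one minute (the power function diverges at t = 0), the value 0 before the first accident, and the exponent used in the example, which is not the value estimated in the paper.

```python
import bisect
import math
from typing import List, Sequence


def minutes_since_last_accident(step_times: Sequence[float],
                                accident_times: Sequence[float]) -> List[float]:
    """Minutes elapsed since the most recent accident at each timestep.

    Both inputs are minutes from a common origin; accident_times must be sorted.
    Timesteps before the first accident get math.inf.
    """
    gaps = []
    for t in step_times:
        idx = bisect.bisect_right(accident_times, t)
        gaps.append(t - accident_times[idx - 1] if idx > 0 else math.inf)
    return gaps


def accident_feature(minutes: float, beta: float, c: float = 1.0,
                     t_min: float = 1.0) -> float:
    """Power-law decay f(t) = C * t**(-beta), with t clamped to t_min (assumption)."""
    t = max(float(minutes), t_min)
    return c * t ** (-beta)


steps = [0, 15, 30, 45, 60]          # 15-minute grid
gaps = minutes_since_last_accident(steps, accident_times=[15])
feature = [accident_feature(g, beta=1.0) for g in gaps]  # beta=1.0 is illustrative
```

The feature is 0 before the accident, jumps to its maximum in the step where the accident occurs, and then decays as 1/15, 1/30, 1/45 over the following steps.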
3.3 Exponent Value Estimation
The power law exponent \(\beta \) does not have a theoretical value and must be estimated experimentally. We used our traffic accident and flow datasets to fit the traffic recovery and estimate \(\beta \). For fitting, we chose the traffic congestion observations related to the times of the respective traffic accidents near our traffic flow measurement point. We selected traffic accidents that had happened within a two-kilometer radius of the traffic measuring point, regardless of the road direction, and identified traffic congestion events within two hours after the selected accidents. More information on the traffic accident dataset can be found in Section 4.1. A traffic congestion event was noted when the median speed of the traffic dropped below a threshold of 20 kilometers per hour. Traffic recovery time was defined as the time in minutes from the congestion event until the traffic speed was again over the threshold. The traffic recovery times (t) were gathered into a dataset. The value of C was set to 1.
The power law exponent \(\beta \) may be estimated from the traffic recovery data using maximum-likelihood estimation [29]. If the recovery data is incomplete, logarithmic binning of data may be used [51]. Fortunately, our experimental data was complete and we were able to fit it and estimate the power law exponent using maximum-likelihood [52]. This estimated value was then used to create the traffic accident feature as detailed in Section 4.3.
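For illustration, a closed-form maximum-likelihood estimate of a power law exponent can be computed as below. This is a sketch under stated assumptions rather than the fitting procedure used in the paper: it assumes the recovery times follow a continuous density proportional to \(t^{-\beta }\) for \(t \ge t_{min}\), for which the estimator is \(\hat{\beta } = 1 + n / \sum _i \ln (t_i / t_{min})\), and it takes the smallest observation as \(t_{min}\) unless one is supplied.

```python
import math
from typing import Optional, Sequence


def fit_power_law_exponent(recovery_minutes: Sequence[float],
                           t_min: Optional[float] = None) -> float:
    """Continuous power-law MLE: beta = 1 + n / sum(ln(t_i / t_min)).

    Assumption: observations below t_min (default: the smallest positive
    observation) are outside the power-law regime and are discarded.
    """
    data = [float(t) for t in recovery_minutes if t > 0]
    if not data:
        raise ValueError("need at least one positive recovery time")
    if t_min is None:
        t_min = min(data)
    tail = [t for t in data if t >= t_min]
    log_sum = sum(math.log(t / t_min) for t in tail)
    if log_sum <= 0.0:
        raise ValueError("all observations equal t_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Tiny hand-checkable case: ln(1/1) + ln(e/1) = 1, so beta = 1 + 2/1 = 3.
beta_hat = fit_power_law_exponent([1.0, math.e], t_min=1.0)
```

The standard error of this estimator is approximately \((\hat{\beta } - 1)/\sqrt{n}\), so the estimate is only meaningful when enough recovery events are available.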
4 Experiments
The performance of our traffic prediction pipeline was evaluated via experimentation discussed in this section. We will first present our data and the preprocessing steps, especially related to traffic accidents, metrics used for assessing the performance, model’s parameters and finally the results.
4.1 Datasets and Preprocessing
Our predictor was trained and evaluated with a dataset consisting of multimodal data. The data contained traffic flow, weather and traffic accidents collected in Helsinki, Finland over the period from 2011 to the end of 2019. Both traffic flow and weather data are publicly available from a repository provided by the city of Helsinki. The traffic data was originally collected from Traffic Measurement System (TMS) stations by a Finnish organization called Fintraffic [53]. The set we acquired initially contained data for ten years, 2011 - 2021. However, we judged the traffic data for 2020 and 2021 to be anomalous because of the Covid-19 pandemic lockdowns and decided to leave those years out.
Traffic flow here is the number of vehicles passing a single point on a multi-lane road, traveling south towards the Helsinki city center, accumulated over 15 minutes. The measuring point is located along a long road that runs from north to south towards the center of Helsinki. In addition to this large main road, the area included multiple small side roads going east and west. The area also had a traffic circle at the south end, with roads exiting from the area, as shown in Fig. 4. The TMS station was located at the north end of the road. The raw dataset contained one row for each vehicle passing the measuring point. A description of the raw data can be seen in Table 1. The vehicles were divided into seven classes. To transform the raw data into a time series dataset, the number of vehicles of each type passing the measuring point during each 15-minute measuring period was summed. The data included 63 days of faulty data, which were left out. The final traffic dataset contained data for 309408 measurement periods.
The seven vehicle classes were 1) passenger cars or vans, 2) trucks without a trailer, 3) buses, 4) trucks with a semi-trailer, 5) trucks with a trailer, 6) passenger cars with a trailer and 7) passenger cars with a mobile home. Out of all vehicles, \( 94.4 \% \) were passenger cars or vans. The percentages of the other vehicle types in the data can be seen in Table 2.
Table 1 Features in the raw TMS dataset

| Feature name | Description | Range |
|---|---|---|
| Id | Id of the TMS station | 151 |
| Year | year | 2011, 2012, ..., 2020 |
| Day | day | 1,2,...,365 |
| Hour | hour | 0,1,...,23 |
| Min | minute | 0,1,...,59 |
| Sec | second | 0,1,...,59 |
| Hundreth | hundredth of a second | 0,1,...,99 |
| Length | length of the vehicle | [0, 25.4] |
| Lane | lane of the vehicle | 1, 2, 3, 4 |
| Direction | direction of the vehicle | 1, 2 |
| Type | type of the vehicle | 1, 2, 3,...,7 |
| Speed | speed of the vehicle | [0, ] |
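The aggregation from one row per vehicle to 15-minute counts per vehicle class can be sketched as follows. The tuple layout `(year, day, hour, minute, vehicle_type)` is a simplified, hypothetical view of the raw features, and filtering by lane or direction is omitted because the coding of those fields is not described.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int, int, int, int]  # (year, day, hour, minute, vehicle_type)
Slot = Tuple[int, int, int, int]         # (year, day, hour, start minute of period)


def aggregate_counts(records: Iterable[Record],
                     num_classes: int = 7,
                     period_min: int = 15) -> Dict[Slot, List[int]]:
    """Count vehicles of each class in every measuring period.

    Periods without any vehicle do not appear in the result; a complete time
    series would need those periods added with all-zero counts.
    """
    counts: Dict[Slot, Counter] = defaultdict(Counter)
    for year, day, hour, minute, vehicle_type in records:
        slot = (year, day, hour, (minute // period_min) * period_min)
        counts[slot][vehicle_type] += 1
    return {
        slot: [per_class[k] for k in range(1, num_classes + 1)]
        for slot, per_class in sorted(counts.items())
    }


raw = [
    (2011, 1, 8, 3, 1),   # passenger car at 08:03
    (2011, 1, 8, 14, 1),  # passenger car at 08:14
    (2011, 1, 8, 14, 3),  # bus at 08:14
    (2011, 1, 8, 15, 2),  # truck at 08:15, which starts the next period
]
flow = aggregate_counts(raw)
```

Each value in `flow` is one output row with \(v_c = 7\) entries, ordered by vehicle type 1 to 7.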
From the traffic flow dataset we also formed time features. The time features considered were year, month, day of week, hour and minute. It was critical to include these features, since the time of day and the day of the week have substantial effects on traffic flow. Our traffic flow data followed clear patterns: it had morning and afternoon peaks and was very different during weekends.
The traffic accidents dataset contained accidents that had happened in the city of Helsinki between the start of 2011 and the end of 2019. Each row in the dataset corresponded to one accident and contained the event's coordinates and time (year, month, day, hour). The occurrence of each accident was recorded at a resolution of one hour. The dataset contains 20 385 unique accidents.
Our weather data was taken from an open online repository managed by the Finnish Meteorological Institute. Weather information from multiple measuring stations in Finland is available for download. Our weather data contained seven different variables, detailed in Table 3. The data was captured in Helsinki, Finland over the period starting from January 2011 and ending at the end of December 2019. Each chosen variable in the dataset had missing values. Since the missing values were few and sparse, linear interpolation was used to fill them in [54].
Table 2 TMS vehicle types

| Type | Explanation | n | % |
|---|---|---|---|
| 1 | passenger car or van | 35 622 137 | 94.4 |
| 2 | truck without trailer | 885 642 | 2.3 |
| 3 | buses | 808 548 | 2.1 |
| 4 | truck and semi-trailer | 180 042 | 0.5 |
| 5 | truck and trailer | 37 381 | 0.1 |
| 6 | passenger car and trailer | 131 285 | 0.3 |
| 7 | passenger car and mobile home | 32 667 | 0.1 |
| - | all vehicle types | 37 697 702 | 100 |
Table 3 Features in the weather dataset

| Feature name | Unit | Range | Missing values % |
|---|---|---|---|
| Air pressure | hectopascal (hPa) | [964.0, 1055.8] | 0.218 |
| Humidity | percentage (%) | [13, 100] | 0.217 |
| Rain | millimeter/15min | [0, 9.13] | 0.334 |
| Snow depth | centimeters (cm) | [0, 74] | 0.361 |
| Temperature | degree celsius | [-26.0, 32.65] | 0.204 |
| Visibility | meters (m) | | 0.993 |
| Wind speed | meters/second | [0, 16.2] | 0.592 |
The original weather dataset included data averaged over 10 minute intervals, and it included data for two variables, temperature and rain intensity. To make the data compatible with the traffic flow dataset, it was converted to 15 minute intervals by average interpolation. For example, to get the temperature value (in degrees Celsius) for 08:15, the average of the temperatures at 08:10 and 08:20 was taken. The rain intensity expressed the magnitude of rain in an hour at a constant rate, and after converting the data to the correct intervals its unit was mm/15min.
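The average interpolation step can be illustrated with a minimal Python sketch. The function name `to_15_min` and the dictionary-based data layout are hypothetical, and copying the value directly where a 15 minute timestamp coincides with a 10 minute one (at :00 and :30) is an assumption; the text only specifies the averaging case.

```python
from datetime import datetime, timedelta

def to_15_min(series):
    """Convert a 10-minute series to a 15-minute grid.

    series: dict mapping datetime (on a 10-minute grid) to a value.
    Timestamps at :15 and :45 get the mean of the two neighbouring
    10-minute values; timestamps at :00 and :30 are copied (assumption).
    """
    out = {}
    start, end = min(series), max(series)
    t = start.replace(minute=0, second=0, microsecond=0)
    while t <= end:
        if t >= start:
            if t in series:
                out[t] = series[t]
            else:
                before = t - timedelta(minutes=5)
                after = t + timedelta(minutes=5)
                if before in series and after in series:
                    out[t] = (series[before] + series[after]) / 2
        t += timedelta(minutes=15)
    return out

# Hypothetical temperatures (degrees Celsius) on a 10-minute grid.
temps = {
    datetime(2019, 1, 1, 8, 0): 1.0,
    datetime(2019, 1, 1, 8, 10): 2.0,
    datetime(2019, 1, 1, 8, 20): 4.0,
    datetime(2019, 1, 1, 8, 30): 5.0,
}
resampled = to_15_min(temps)
```

Here the value for 08:15 is the mean of the 08:10 and 08:20 readings, as in the example given in the text.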
4.2 Metrics
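The performance of our predictor was evaluated using three accuracy metrics and a test dataset sampled from our preprocessed data. The accuracy metrics were Mean Absolute Error (MAE), Mean Squared Error (MSE) and coefficient of determination (\(R^2\)), which are defined as
$$\begin{aligned} \text {MAE} = \frac{1}{n}\sum _{i=1}^n \left| y_i - \hat{y}_i\right| , \end{aligned}$$
$$\begin{aligned} \text {MSE} = \frac{1}{n}\sum _{i=1}^n \left( y_i - \hat{y}_i\right) ^2, \end{aligned}$$
$$\begin{aligned} R^2 = 1 - \frac{\sum _{i=1}^n \left( y_i - \hat{y}_i\right) ^2}{\sum _{i=1}^n \left( y_i - \overline{y}\right) ^2}, \end{aligned}$$
where \(\overline{y} = \frac{1}{n}\sum _{i=1}^n y_i\), \(y_i\) is the ground truth number of vehicles, \(\hat{y}_i\) is the number of vehicles output by the predictor and n is the number of rows of the dataset that is used to calculate the metric. For Mean Absolute Error and Mean Squared Error a lower value is better, whereas for the coefficient of determination a higher value is better.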
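As an illustration, the three accuracy metrics can be computed from their standard textbook definitions with a few lines of plain Python. This is a minimal sketch; the function names are ours and it is not the evaluation code used in the experiments.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y_i - y_hat_i|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot
```

Note that \(R^2\) becomes negative when the predictor performs worse than always predicting the mean \(\overline{y}\), which can happen for vehicle types with very low counts.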
To evaluate the generalizability of our predictor, we used three additional metrics: symmetric Mean Absolute Percentage Error (sMAPE), Relative Absolute Error (RAE) [55] and the GEH-statistic. These metrics are not as commonly used as MAE, MSE or \(R^2\), but they allow prediction accuracy to be compared across different datasets. They are defined as
where m and o are the predicted and ground truth hourly traffic counts. A lower value is better for each of these three metrics. The GEH-statistic is calculated for each hour of predicted traffic flow separately. The prediction for an hour's traffic is considered good if the value of the GEH-statistic is less than 5 [56]. Therefore, the proportion of GEH-statistics under 5 is recorded as the metric for the overall performance of the predictor over the entire test dataset.
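The sketch below shows how these metrics could be computed in Python. The GEH and RAE functions follow their commonly used definitions. For sMAPE several variants exist in the literature; the one shown (mean of \(2|\hat{y}_i - y_i| / (|y_i| + |\hat{y}_i|)\), not multiplied by 100) is an assumption and may differ from the variant used in this work. Treating an hour with zero predicted and zero observed vehicles as a perfect match is likewise an assumption.

```python
import math

def geh(m, o):
    """GEH statistic for one hour (m: predicted, o: observed hourly count)."""
    if m + o == 0:
        return 0.0  # assumption: both counts zero counts as a perfect match
    return math.sqrt(2 * (m - o) ** 2 / (m + o))

def geh_pass_rate(predicted, observed, limit=5.0):
    """Proportion of hours whose GEH statistic is below the limit."""
    values = [geh(m, o) for m, o in zip(predicted, observed)]
    return sum(v < limit for v in values) / len(values)

def rae(y_true, y_pred):
    """Relative Absolute Error: total error relative to a mean predictor."""
    mean = sum(y_true) / len(y_true)
    return (sum(abs(y - p) for y, p in zip(y_true, y_pred))
            / sum(abs(y - mean) for y in y_true))

def smape(y_true, y_pred):
    """One common sMAPE variant (assumed form, see lead-in)."""
    terms = [0.0 if y == p == 0 else 2 * abs(p - y) / (abs(y) + abs(p))
             for y, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)
```

For example, a prediction of 110 vehicles against 100 observed gives a GEH value below 1, while 200 against 100 gives a value above 8 and would count as a failed hour.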
4.3 Experimenting the Effect of Modelling Traffic Accidents
To evaluate the effect of using power law based traffic accident modelling, we ran the predictor with four different input datasets produced by our processing pipeline: 1) traffic accidents modelled using the power law, 2) binary accident modelling, 3) summed accident modelling, and 4) accident data excluded. The binary and summed accident models are very simple and were used to assess whether the more advanced power law based model is warranted. The predicted traffic flows were compared with ground truth data and evaluated using the metrics described above.
We also wanted to evaluate the effect of limiting the accident information to local accidents only, instead of using accidents from the whole city area. The motivation behind this test was to see whether congestion due to distant accidents (even at a range of 20 kilometers) would be resolved without affecting our measurement point and would therefore have no effect on the learning process. Previous research on the impact of traffic accidents has used only local accidents [57, 58]. Therefore, we ran the predictor twice with each test dataset, first with all accidents recorded in the city area and then with only the local accidents that happened on the road with the measurement point.
4.3.1 Datasets
A simple way to include traffic accident information is to use a binary feature that has the value 1 for hours in which at least one accident has happened and 0 otherwise. A similar feature is the summed one, whose value is the number of accidents that happened during the same time interval.
The power law based accident modeling was done using power law decay described in Eq. 3.2. The exponent \(\beta \) was estimated to be 1.73 using the method presented in 3.3.
The accident feature was set as
$$\begin{aligned} f(t) = {\left\{ \begin{array}{ll} 0,\quad & t < 15\\ t^{-\beta },\quad & 15 \le t \end{array}\right. } \end{aligned}$$
(18)
where t is time after the accident in minutes.
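The three accident features can be sketched in Python as follows. The power law function implements Eq. 18 for a single accident with the reported \(\beta = 1.73\). The helper `binary_and_summed` and the use of integer hour indices are hypothetical simplifications; how the contributions of several overlapping accidents are combined in the power law feature is not covered by this sketch.

```python
from collections import Counter

BETA = 1.73  # exponent reported in the text

def power_law_feature(minutes_since_accident, beta=BETA):
    """Eq. 18: zero for the first 15 minutes, then t**(-beta)."""
    if minutes_since_accident < 15:
        return 0.0
    return minutes_since_accident ** (-beta)

def binary_and_summed(accident_hours, all_hours):
    """Build the binary and summed features for a list of hour indices.

    accident_hours: hour index of each recorded accident (hypothetical layout).
    all_hours: every hour index of the time series, in order.
    """
    counts = Counter(accident_hours)
    summed = [counts.get(h, 0) for h in all_hours]
    binary = [1 if c > 0 else 0 for c in summed]
    return binary, summed

# Two accidents in hour 3 and one in hour 5 of a six-hour series.
binary, summed = binary_and_summed([3, 3, 5], range(6))
```

Unlike the binary and summed features, the power law feature keeps a decaying, non-zero value in the time steps that follow an accident, so the model can see that an accident happened recently.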
The predictor was trained five times for each dataset and the performance metrics Mean Absolute Error (MAE), Mean Squared Error (MSE) and coefficient of determination (\(R^2\)) were calculated. To visualise the results, a 1.5 IQR value boxplot was used. In the boxplot, the median value is marked as a line inside the box. The borders of the box mark the first and third quartiles. The whiskers of the boxplot extend to 1.5 times the inter-quartile range and points outside them are considered outliers. In order to compare the models, both the median and the variability, i.e. the height of the box and the whiskers need to be considered.
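The boxplot quantities described above can be computed as in the following sketch. The five MAE values are hypothetical and only stand in for five training runs; the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since plotting libraries differ slightly in how they interpolate quartiles.

```python
from statistics import quantiles

def boxplot_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers for a k * IQR boxplot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

# Hypothetical MAE values from five training runs (not from the experiments).
stats = boxplot_stats([21.0, 21.5, 21.7, 22.0, 30.0])
```

In this made-up example the run with MAE 30.0 lies beyond the upper fence and is drawn as an outlier, while the box spans 21.5 to 22.0.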
Results are plotted in Fig. 5. When the input data included the binary accident feature, the predictor did not perform any better than with data that had no accident feature: the median values for MAE (24.8 and 25.0), MSE (39.1 and 39.0) and \(R^2\) (0.8 and 0.8) were almost the same. With the summed accident feature, the median MAE (22.5) and MSE (36.2) over the training runs were lower, the median \(R^2\) (0.8) was higher, and the variance over the training runs was lower than for the previous two datasets on all metrics. The data with accidents modelled using the power law resulted in the lowest errors (21.7 for MAE, 34.5 for MSE) and the highest \(R^2\) (0.8), while also having the lowest variance.
To test the effect of including only very local accidents, we repeated the tests using events that had happened on the same road as our traffic measuring point. Results are plotted in Fig. 6. The results show that excluding non-local accidents did not improve the results in any significant way. The median values for the model with summed local accidents were 24.3 for MAE (compared to 22.5 for a model trained with all accidents), 38.4 for MSE (compared to 36.2) and 0.8 for \(R^2\) (compared to 0.8). The median values for the model using the power law based feature with only local accidents were 21.9 for MAE (compared to 21.7 for a model trained with all accidents), 34.6 for MSE (compared to 34.5) and 0.8 for \(R^2\) (compared to 0.8).
Based on these experiments, we concluded that modelling the effect of traffic accidents with the power law produced the most realistic traffic flow predictions. The traffic accident features used during the experiments are listed in Table 4.
4.4 Model and Training Parameter Selection
Our multilayer LSTM network was implemented using the PyTorch library [46]. After preprocessing the data and selecting the power law based accident feature, the final model hyperparameters were found via experimentation. Since the performance and accuracy of LSTM models are influenced by the chosen hyperparameters, it is necessary to search for the parameters that give the best model. In addition to evaluating our method with the optimized LSTM parameters, we evaluated it using the PyTorch library’s default ones; both are shown in Table 5. The choice of the number of layers and neurons depends on the dimensionality of the data along with other concerns. Our dataset was divided into training and test datasets with an 80/20 % split. Since our dataset is a time series, the data must not be shuffled before splitting. The test dataset therefore consists of the temporally last 20 % of the original dataset, meaning that it contains data mostly from 2019.
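A chronological split of this kind can be sketched in a few lines of Python. The function name is hypothetical, and the rows are assumed to be already sorted by time.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows without shuffling: the last part is the test set."""
    n_test = round(len(rows) * test_fraction)
    split = len(rows) - n_test
    return rows[:split], rows[split:]

# Ten time-ordered samples: the first eight train, the last two test.
train, test = chronological_split(list(range(10)))
```

Keeping the order intact ensures that the model is always evaluated on data that lies after everything it was trained on.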
4.5 Results
Since the goal of our research was to develop a prediction method that can predict new traffic flow scenarios from situational and environmental input parameters, and not to replicate any historical traffic instance, comparing the output to the distribution of our reference data was not ideal. Nevertheless, using accuracy metrics to compare traffic flow predictions with historical traffic flow allowed us to show that the predictor performance improved when traffic accident information was used.
In Tables 6, 7 and 8, different implementations of the traffic predictor are compared. The comparison is between a model that does not use traffic accident information as input, a model that uses traffic accident information, and a model that uses traffic accident information and LSTM hyperparameters optimized for our specific data and setup. Each model was trained 20 times and the metrics were averaged. This was done because the training of neural networks is stochastic, so a single model could perform unusually well just by chance. The comparison is done for three different accuracy metrics, and the results are shown for all seven vehicle classes.
Table 4
Accident features used during the experiments to find the best accident feature

| Accident feature | Explanation |
| --- | --- |
| summed accidents | The feature is the sum of accidents that have happened during each hour, otherwise 0 |
| binary accidents | The feature is 1 during each hour that there has been an accident, 0 otherwise |
| summed accidents (local) | Same as ‘summed accidents’, but uses only local accidents |
| power law | Based on a power law distribution with beta = 1.73 |
| power law (local) | Same as ‘power law’, but uses only local accidents |
Table 5
Hyperparameter settings

| Hyperparameter | Default model | Optimized model |
| --- | --- | --- |
| Number of hidden layers | 3 | 2 |
| Number of neurons | 512, 256, 64 | 64, 32 |
| Dropout rate | 0.2 | 0.5 |
| Length of output sequence | 8 | 8 |
| Optimizer | Adam | Adam |
| Loss | MSE | MSE |
| Epochs | 100 | 100 |
| Batch size | 100 | 100 |
| Learning rate | 0.001 | 0.001 |
| Weight decay | 1e-06 | 1e-06 |
| Train/test ratio | 20% | 20% |
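To make the optimized settings in Table 5 concrete, the following PyTorch sketch builds a two-layer LSTM with 64 and 32 hidden units, dropout 0.5, the Adam optimizer and MSE loss. It is only an illustration under stated assumptions, not the architecture used in this work: the class name `TrafficLSTM`, the input width of 13 features (5 time, 7 weather, 1 accident), the linear output head and the per-step output of 7 vehicle types are all assumed for the example.

```python
import torch
from torch import nn

class TrafficLSTM(nn.Module):
    """Hypothetical two-layer LSTM following the optimized column of Table 5."""

    def __init__(self, n_features=13, hidden_sizes=(64, 32),
                 dropout=0.5, n_vehicle_types=7):
        super().__init__()
        # Two separate LSTM layers because the layer widths differ.
        self.lstm1 = nn.LSTM(n_features, hidden_sizes[0], batch_first=True)
        self.lstm2 = nn.LSTM(hidden_sizes[0], hidden_sizes[1], batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(hidden_sizes[1], n_vehicle_types)

    def forward(self, x):
        out, _ = self.lstm1(x)
        out, _ = self.lstm2(self.dropout(out))
        # One count per vehicle type for every time step in the sequence.
        return self.head(self.dropout(out))

model = TrafficLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-6)
loss_fn = nn.MSELoss()

# One random batch: 100 sequences, 8 time steps, 13 input features.
batch = torch.randn(100, 8, 13)
target = torch.randn(100, 8, 7)
prediction = model(batch)
loss = loss_fn(prediction, target)
loss.backward()
optimizer.step()
```

With 8 steps of 15 minutes each, one output sequence covers a two hour block of traffic flow.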
Table 6
Mean Absolute Error (MAE, lower is better)

| Vehicle type | Default model, no accidents | Default model, with accidents | Optimized model, with accidents | Improvement % |
| --- | --- | --- | --- | --- |
| 1 | 27.3545 | 25.7138 | 19.7180 | 27.9170 |
| 2 | 1.2344 | 1.3323 | 1.0841 | 12.1810 |
| 3 | 1.0010 | 0.9659 | 1.0412 | -4.0179 |
| 4 | 0.9297 | 0.9372 | 0.9221 | 0.8209 |
| 5 | 0.0968 | 0.0977 | 0.0970 | -0.1338 |
| 6 | 0.3392 | 0.3437 | 0.3333 | 1.7622 |
| 7 | 0.1223 | 0.1384 | 0.0833 | 31.8914 |
Table 7
Mean Squared Error (MSE, lower is better)

| Vehicle type | Default model, no accidents | Default model, with accidents | Optimized model, with accidents | Improvement % |
| --- | --- | --- | --- | --- |
| 1 | 41.5009 | 41.247 | 32.9188 | 20.6794 |
| 2 | 1.8380 | 2.1577 | 1.7551 | 4.5101 |
| 3 | 1.3425 | 1.3262 | 1.4272 | -6.3135 |
| 4 | 1.3208 | 1.3350 | 1.3100 | 0.8211 |
| 5 | 0.3451 | 0.3463 | 0.3453 | -0.0482 |
| 6 | 0.6559 | 0.6727 | 0.653 | 0.4501 |
| 7 | 0.3651 | 0.3841 | 0.3124 | 14.4247 |
Tables 6, 7 and 8 show that the prediction model using the power law accident feature and optimized hyperparameters performed best according to the three accuracy metrics used. This was the case for most of the vehicle classes across all the accuracy metrics. The exceptions were vehicle types 3 (busses) and 5 (truck and trailer), but the difference was negligible. Averaged over all vehicle types, using traffic accident information and optimized hyperparameters resulted in a \(10.0 \%\) improvement in MAE over the default model that did not use accident information. The biggest improvement among the most common vehicle types was in vehicle type 1 (passenger car or van), where the improvement was \(27.9 \%\). On average, MSE improved by \(4.9 \%\) and \(R^2\) improved by \(3.1 \%\). For MSE and \(R^2\), the improvements in vehicle type 1 were \(20.7 \%\) and \(9.1 \%\) respectively.
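The improvement percentages for MAE can be reproduced from the values in Table 6 with the sketch below. The relative-change formula (baseline minus optimized, divided by baseline) is our assumption; it matches the reported MAE column up to small differences caused by recomputing from rounded table values.

```python
# MAE per vehicle type (1..7), copied from Table 6.
mae_default_no_accidents = [27.3545, 1.2344, 1.0010, 0.9297, 0.0968, 0.3392, 0.1223]
mae_optimized_with_accidents = [19.7180, 1.0841, 1.0412, 0.9221, 0.0970, 0.3333, 0.0833]

def improvement_percent(baseline, new):
    """Relative reduction of an error metric, in percent (positive = better)."""
    return (baseline - new) / baseline * 100.0

improvements = [
    improvement_percent(b, o)
    for b, o in zip(mae_default_no_accidents, mae_optimized_with_accidents)
]
mean_improvement = sum(improvements) / len(improvements)
```

Because the mean is taken over per-type percentages, rare vehicle types such as type 7 weigh as much in the average as passenger cars, even though their absolute errors are far smaller.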
In Table 9 we compare our optimized LSTM-based traffic predictor to two well-known baseline statistical methods: multivariate linear regression and historical average. For multivariate linear regression, we fitted the model on our training dataset and predicted the traffic flow counts of our test dataset. For historical average, we calculated the mean traffic flow count for each vehicle type at each possible 15-minute time interval in the training dataset and used those means as the predictions for the test dataset. Our predictor outperforms the two baselines when predicting the number of passenger cars (class 1), which comprise most of the vehicles in the dataset. The Mean Squared Error for this class is especially high for both linear regression and historical average. Based on this comparison, predicting traffic flow is too complex a task for simple statistical methods.
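The historical average baseline can be sketched as follows. The row layout `(slot, vehicle_type, count)` and the function names are hypothetical, and identifying a 15-minute interval by its time of day alone (for example "08:15") is an assumption; the interval key could equally include the day of the week.

```python
from collections import defaultdict

def fit_historical_average(train_rows):
    """Mean count per (slot, vehicle_type) over the training data.

    train_rows: iterable of (slot, vehicle_type, count) tuples, where slot
    identifies the 15-minute interval (assumed here to be the time of day).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for slot, vehicle_type, count in train_rows:
        sums[(slot, vehicle_type)] += count
        counts[(slot, vehicle_type)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def predict_historical_average(model, slot, vehicle_type, default=0.0):
    """Predict the stored mean; fall back to a default for unseen keys."""
    return model.get((slot, vehicle_type), default)

# Hypothetical training rows for vehicle type 1.
ha_model = fit_historical_average([
    ("08:15", 1, 100),
    ("08:15", 1, 120),
    ("08:30", 1, 90),
])
```

Such a baseline ignores weather and accidents entirely, so any gain of the LSTM over it reflects information carried by those additional inputs and by longer-term trends.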
Table 8
Coefficient of determination (\(R^2\), higher is better)

| Vehicle type | Default model, no accidents | Default model, with accidents | Optimized model, with accidents | Improvement % |
| --- | --- | --- | --- | --- |
| 1 | 0.8057 | 0.8086 | 0.8786 | 9.0523 |
| 2 | 0.6534 | 0.5220 | 0.6840 | 4.6789 |
| 3 | 0.0948 | 0.1153 | -0.0226 | -24.8024 |
| 4 | -0.5649 | -0.5989 | -0.5379 | 6.2010 |
| 5 | -0.0855 | -0.0923 | -0.0866 | 1.3650 |
| 6 | 0.0867 | 0.0397 | 0.0949 | 9.3823 |
| 7 | -0.4746 | -0.6405 | -0.0767 | 16.1548 |
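Table 9
Comparison between linear regression, historical average, and LSTM

| Vehicle type | Linear Regression MAE | Linear Regression MSE | Linear Regression \(R^2\) | Historical Average (HA) MAE | Historical Average (HA) MSE | Historical Average (HA) \(R^2\) | LSTM, optimized model MAE | LSTM, optimized model MSE | LSTM, optimized model \(R^2\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 38.0992 | 2997.8955 | 0.6645 | 31.6062 | 2580.8309 | 0.7112 | 19.7180 | 32.9188 | 0.8786 |
| 2 | 1.7734 | 5.1984 | 0.4668 | 1.6521 | 6.1171 | 0.3726 | 1.0841 | 1.7551 | 0.6840 |
| 3 | 1.0928 | 1.9157 | 0.0387 | 1.2651 | 2.8723 | -0.4414 | 1.0412 | 1.4272 | -0.0226 |
| 4 | 0.9411 | 1.5805 | -0.4160 | 0.9322 | 1.7533 | -0.5708 | 0.9221 | 1.3100 | -0.5379 |
| 5 | 0.1780 | 0.1036 | 0.0561 | 0.0968 | 0.1191 | -0.0855 | 0.0970 | 0.3453 | -0.0866 |
| 6 | 0.4257 | 0.3835 | 0.1858 | 0.3804 | 0.4612 | 0.0209 | 0.3333 | 0.653 | 0.0949 |
| 7 | 0.2003 | 0.0904 | 0.0023 | 0.0833 | 0.0976 | -0.0765 | 0.0833 | 0.3124 | -0.0767 |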
To analyse the errors, we compared the distribution of the traffic flow data in the test set with the distribution of the traffic flow output by the predictor using the accident feature and optimized LSTM hyperparameters. The vehicle classes were summed together. General statistics of the historical traffic data and the predictor output are given in Table 10, and a graphical comparison of the distributions is shown in Fig. 7. The comparisons show that the two distributions are quite similar but not identical. The main differences are that the historical traffic flow data has a longer tail and a larger number of time intervals with a very low number of vehicles. Another difference is a spike in the histogram of the predicted data between 100 and 200 vehicles, whereas the histogram of the real data is smoother in this range. The spike arises because the predictor output has a larger number of time intervals in which the vehicle count is concentrated in this range, meaning that the predictor tends to output similar traffic numbers while real traffic numbers are more variable. These issues are caused by a certain degree of averaging in the traffic prediction method: the outputs for similar situations are uniform, whereas in reality the situations can vary.
Table 10
Comparing historical traffic data and respective predictor output

| | Historical traffic data | Predictor output |
| --- | --- | --- |
| mean | 100.971 | 104.349 |
| variance | 9984.457 | 8992.414 |
| standard deviation | 99.922 | 94.828 |
| minimum | 0 | 0 |
| maximum | 510 | 451 |
To further analyse the errors of the predictor output, we calculated the difference between the real traffic flow and the predictor output for each time interval in the test dataset. These errors were calculated for all vehicle types (Table 2). Error histograms can be seen in Fig. 8. The errors for all vehicle types are concentrated close to zero and distributed smoothly around the peak, with a small number of larger errors. This suggests that the prediction errors are not systematically biased and are instead caused by noise. The larger errors are likely outliers, as the test dataset contained some cases where the number of vehicles suddenly dropped to zero in the middle of the day. These drops were caused by measurement errors in the original traffic flow dataset.
We have additionally provided a few examples of our predictor’s performance in Fig. 9, where we have plotted both the traffic flow taken from the historical data used to evaluate our prediction method and the traffic flow that was predicted using the information related to those days as inputs. Traffic flow was predicted in two hour blocks.
To test the generalizability of our framework to other locations, we selected three other locations in Helsinki where suitable data was available and trained our framework on the data from these locations. We followed all of the preprocessing steps detailed earlier in this work for each location. The locations can be seen in Fig. 10, where the original location for this study is marked with 0 and the other locations are marked as 1, 2 and 3. In Table 11 we compare the accuracy at the other locations to the accuracy of the predictor trained on the original location chosen for this study. Each metric is calculated for the total number of vehicles predicted. Notice that the value of MAE for the original location is slightly different than it was in Tables 6 or 9. This is because the metrics are calculated as the average accuracy over multiple trained models to reduce the effect of stochasticity in training the LSTM model.
The mean absolute error for the original location is 18.9948, while for the other locations it is 63.8007, 40.5363 and 39.3445 respectively. However, mean absolute errors are not directly comparable, because traffic counts vary greatly between areas. For this reason we also use other metrics: symmetric Mean Absolute Percentage Error (sMAPE), Relative Absolute Error (RAE) and the GEH-statistic. For sMAPE the values for the different locations were 0.2327, 0.2420, 0.2281 and 0.2453 respectively, while for RAE the values were 0.2307, 0.2563, 0.2122 and 0.2291. Since the values for sMAPE and RAE were in a similar range across all the locations, the predictor appears to perform similarly when trained on the different datasets. For the GEH-statistic, we recorded the proportion of predicted hours where the calculated statistic was under 5, which is considered the limit for a successful prediction. For the original location, the proportion of GEH-statistics under 5 was 0.8518, while for the other locations the proportion was 0.5364, 0.5952 and 0.6108, meaning that for all locations most of the hourly predicted intervals had a GEH-statistic less than 5. When considering the GEH-statistic and the other metrics, it should be noted that our framework predicts traffic flow based on non-traffic input, which makes the prediction considerably harder. These metrics suggest that our framework is able to learn to predict traffic flow in different locations, meaning that it generalizes to other environments as long as suitable training data is available.
Table 11
LSTM trained on datasets from different areas

| Metric | Original | Location 1 | Location 2 | Location 3 |
| --- | --- | --- | --- | --- |
| MAE | 18.9948 | 63.8007 | 40.5363 | 39.3445 |
| sMAPE | 0.2327 | 0.2420 | 0.2281 | 0.2453 |
| RAE | 0.2307 | 0.2563 | 0.2122 | 0.2291 |
| GEH | 0.8518 | 0.5364 | 0.5952 | 0.6108 |
5 Discussion
This study presented a novel pipeline for predicting traffic flow data with multiple vehicle classes from raw, multi-source input variables using an LSTM-based deep learning model. Historical traffic flow data was used as ground truth during training of the model. To include the effect of traffic accidents on traffic flow, a novel method of using power law decay to transform the traffic accident data into a form compatible with the other time series inputs was presented.
The main differences between our approach and other traffic prediction methods are that our predictor is based on non-traffic variables rather than on traffic that has occurred earlier, and that accident information is used as one of the inputs. Predicting traffic flow by mapping temporal, weather and traffic accident data into traffic flow data is an approach that has so far been rather underrepresented in traffic prediction research. Due to the lack of research that uses this type of input and output combination, comparing our results to previous work is challenging. By investigating the relationship between traffic flow data and non-traffic variables, we hope to further the larger field of Intelligent Transportation Systems research.
As seen in Fig. 9, our traffic predictor is able to produce traffic that closely corresponds to historical traffic data. This suggests that a machine learning method, such as the LSTM chosen for our framework, is able to learn the traffic patterns in the area of our study. This is partly due to the high flexibility of machine learning models, and partly due to the fact that the traffic follows certain patterns that depend on the time of day and the weekday.
Our study was conducted in Helsinki, Finland, which is located in northern Europe. Compared to many large cities around the world, traffic in Helsinki is relatively sparse. This needs to be taken into account when attempting to use this model for predicting traffic in different environments. All of our experiments were done on data based in Helsinki, since that is where we were able to access the suitable data.
Even though we have conducted experiments with our framework on multiple areas and datasets, our study is limited in that our method has been tested only in the single city from which we acquired our training datasets. However, the framework presented in this work should be able to predict traffic flow in other environments as long as suitable training data is available. Since our framework is based on the LSTM architecture, which has been shown to be applicable to various sequence-to-sequence tasks in different fields, our traffic prediction method can reasonably be expected to be applicable to predicting traffic in other environments as well.
6 Conclusions
This work presented a traffic prediction framework for predicting traffic flow from non-traffic input variables. As far as we know, no other LSTM-based traffic prediction method has attempted to map multiple input variables into traffic flow data with multiple vehicle classes without also using past traffic data during the inference phase. After adding the traffic accident information and the optimized LSTM hyperparameters, the predictor was able to predict accurate traffic flow for the area from which the training dataset was collected. Based on further experiments, our prediction framework is also able to learn to predict traffic flow in other locations if training data is available.
Our future work includes optimizing the prediction framework to make the predictor output distribution match the original data better and developing new methods to compare traffic flow data quantitatively. Our framework could also be extended to predict traffic flow at multiple locations simultaneously.
Acknowledgements
This work was partly supported by the Academy of Finland Flagship program: Finnish Center for Artificial Intelligence FCAI, the Academy of Finland project 332177 Sustainable urban development emerging from the merger of cutting-edge Climate, Social and Computer Sciences (CouSCOUs) and the University of Helsinki.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors consent to publish.
Competing Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.