Published in: Arabian Journal for Science and Engineering 10/2020

Open Access | 04.05.2020 | Research Article - Civil Engineering

Extraction of Vehicle Turning Trajectories at Signalized Intersections Using Convolutional Neural Networks

Authors: Osama Abdeljaber, Adel Younis, Wael Alhajyaseen

DOI: https://doi.org/10.1007/s13369-020-04546-y
Abstract

This paper develops a convolutional neural network (CNN)-based tool that automatically detects left-turning vehicles (right-hand traffic rule) at signalized intersections and extracts their trajectories from recorded video. The proposed tool uses a region-based CNN trained on a limited number of video frames to detect moving vehicles. Kalman filters are then used to track the detected vehicles and extract their trajectories. The proposed tool achieved an acceptable accuracy level when verified against manually extracted trajectories, with an average error of 16.5 cm. Furthermore, the trajectories extracted using the proposed vehicle tracking method were used to demonstrate the applicability of the minimum-jerk principle to reproduce variations in the vehicles' paths. The effort presented in this paper can be regarded as a step toward maximizing the potential use of deep learning in traffic safety applications.

1 Introduction

Road traffic safety is increasingly an issue of global concern. Recently, it has been estimated that 1.4 million people die and 73.25 million are disabled annually as a result of road traffic injuries worldwide [1]. Globally, the estimated annual cost of deaths, injuries, and disabilities due to road crashes is approximately 518 billion dollars, which makes up around 1.5% of the gross national product of middle-income countries [1]. Intersections are recognized as the most complex locations within a highway system, in which conflicts are easily generated and traffic crashes are thus more likely to occur [2]. Although intersections constitute a small part of highway systems, intersection-related crashes account for over 50% of all crashes in urban areas and 30% of those in rural regions [2]. Intersections are therefore deemed crash-prone locations due to the large number of conflict points between traffic streams moving in different directions. Turning traffic plays a major role in the safety performance of intersections because turning maneuvers are usually characterized by significant variations in paths and speeds depending on the driver's targeted exit lane, the driver's instinctive judgment, the intersection geometry, and other factors [3]. As left-turning vehicles (right-hand traffic rule) pass the stop line into the intersection zone, their driving paths often vary considerably, generally leading to serious conflicts and unsmooth driving, which in turn degrades traffic safety [4]. Therefore, analyzing the trajectories of turning vehicles is required in order to improve the safety performance of signalized intersections.
Two approaches are classically implemented to evaluate the safety performance of intersections, namely post- and preimplementation assessments. The former involves collecting data after implementing the countermeasure, while the latter enables engineers to predict the safety performance at the planning stage and is thus more feasible [5]. Simulation tools are deemed promising since they provide more flexibility and opportunity to achieve reliable preimplementation safety assessments and thus overcome the limitations associated with postimplementation assessments. However, current simulation software tools are still oversimplified, and therefore, the resulting safety assessments at intersections are not sufficiently reliable [6, 7]. Recently, driving simulators and virtual reality systems have been emerging as new tools to study road user behavior in combination with microscopic simulation tools [8–10]. These advanced tools are rapidly replacing traditional traffic safety assessment techniques. However, realistic representation of vehicle trajectories (including path, speed, and acceleration profiles) is essential in such applications for a reliable safety assessment. Furthermore, the availability of realistic and accurate models for the trajectories of turning traffic is critically needed for the motion planning of autonomous vehicles.
The majority of vehicle path models available in the literature have been developed based on trajectories that were manually extracted from recorded videos. The process of manual trajectory extraction, however, can be tiresome and time-consuming since it requires tracking every single vehicle frame by frame. This becomes particularly problematic when a large number of trajectories is required for building an accurate vehicle trajectory model. Alternatively, automatic trajectory extraction techniques have been proposed [11–13]. Yet, most previously developed automatic trajectory extraction approaches require background subtraction to detect moving vehicles. This process is significantly vulnerable to factors such as light and shadow conditions, occlusion by obstacles and other vehicles, and the camera's position and viewing angle [14]. In view of that, the effort presented in this paper aims at developing an effective tool for the automatic trajectory extraction of turning vehicles; the proposed tool relies on convolutional neural networks (CNNs) to perform the vehicle detection task.
The motivation for using CNNs in this work is twofold:
1.
CNNs have recently become the de facto standard for computer vision and pattern recognition, having achieved state-of-the-art performance in challenging tasks such as handwriting recognition, classification of large image archives, and face segmentation. In the context of traffic engineering, successful applications of CNNs have been reported, including flow speed prediction [15], traffic density measurement [16, 17], pavement distress detection [18], road crack detection [19], and detection of traffic signs [20–23] or pedestrians [24, 25].
2.
CNNs operate directly on raw video frames without requiring prior image preprocessing or background subtraction.
In this paper, the proposed vehicle tracking tool is used for the automatic extraction of left-turning vehicle trajectories at a signalized intersection located in Doha City, State of Qatar. The extracted trajectories are then compared to their manually extracted counterparts in order to demonstrate the accuracy of the CNN-based approach. After that, a minimum-jerk-based method is used to model the variations in the vehicles' trajectories (paths and speed profiles). Monte Carlo simulations are then conducted to generate a large number of simulated trajectories using the proposed minimum-jerk model. Finally, in order to verify this model, the distribution of the simulated paths is compared to that of the actual extracted trajectories.

2 Literature Review

2.1 Modeling of Vehicle Turning Trajectory

Several studies have been conducted in the past few decades to understand the turning behavior of vehicles at signalized intersections. In general, significant characteristics concerning the intersection layout and the turning vehicle have been highlighted [26–29]. As an example, Alhajyaseen et al. [30] underlined that the path of a turning vehicle is strongly related to the intersection angle and the vehicle's type and speed. However, it is generally agreed that the turning maneuver is a more complex phenomenon whose variability also depends on highly dynamic factors [31, 32]. For instance, the turning behavior was observed to depend on inter- and intra-subject factors concerning drivers, such as the perception of the traffic environment, information processing, and the ability to react correctly and to cooperate with others [33, 34]. Other factors, such as the waiting time of the turning vehicles [35], the relative speed of the vehicles in conflict [36], and gaps [37], were also observed to affect the decision behavior of turning vehicles.
Since understanding the complex mechanism of turning vehicles' paths is crucial for effective traffic control and safety assessment at intersections, a reliable simulation model is required so that the variations of the turning vehicles are reproduced with high resolution. Classically, the vehicle's turning path was simulated via one-dimensional models [38–40]. In such simulations, a set of lane-based models is defined, in which the longitudinal and lateral movements are separately represented by a car-following model and a lane-changing model, respectively. Despite their simplicity and ease of integration into decision-making approaches, one-dimensional models are unable to accurately reproduce the variations of turning trajectories [41].
As an alternative to traditional one-dimensional trajectory models, the two-dimensional model has emerged as a viable simulation technique for vehicle turning paths as it breaks the lane-based assumption. Accordingly, the longitudinal and lateral movements are simulated simultaneously, and therefore, the characteristics of the turning paths are reliably reflected [42]. However, these approaches produce only the path of the turning vehicles, without the speed and acceleration profiles [30], which are usually estimated using other independent models. In this context, a microscopic simulation model that generates vehicle turning trajectories was developed by Tan et al. [43] using separate models for the path (an Euler spiral-based approximation method) and the speed profile. More recently, Wei et al. [44] established a left-turning vehicle path model by extracting trajectories from recorded videos and analyzing their distributions, velocities, and flow-changing characteristics. On the basis of this effort, the same authors [44] proposed the idea of setting left-turn guide lines at signalized intersections, which was verified as an effective tool to reduce traffic conflicts and improve traffic efficiency. Also, Ma et al. [45] proposed a three-phase (i.e., plan-decision-action) model to estimate the vehicle's path at mixed-flow intersections. However, combining different models for the turning path and the speed profile does not ensure spatial and temporal consistency between the location and the speed of turning vehicles. In an attempt to overcome this limitation, Dias et al. [7, 46] applied the concept of minimum jerk to fit the trajectories of manually tracked free-flowing turning vehicles at signalized intersections in Japan. The proposed approach simultaneously estimates the temporal and spatial profiles of vehicle turning maneuvers with acceptable accuracy. Yet, the same authors [7, 46] did not discuss the limitations of their approach or the procedure for generating the maneuvers of turning vehicles in microsimulation.

2.2 Automatic Trajectory Extraction

As an alternative to manual trajectory extraction, researchers have proposed several methods for semiautomatic and automatic tracking of turning vehicles. For instance, Shirazi and Morris [47] proposed a semiautomatic technique to extract vehicle trajectories from traffic footage. This method first requires manually identifying the locations of the vehicles in each video frame. After that, vehicle tracking is performed using a detection-track mapping matrix which utilizes nearest global matching. Yet, despite producing accurate results, semiautomatic techniques are considered laborious since they initially require some steps to be performed manually [48–50].
Automatic extraction techniques, on the other hand, are deemed more promising since they provide swift results with minimal manpower involved. In this context, Hsieh et al. [11] proposed an automatic vehicle tracking method which implements a background subtraction technique for vehicle detection along with a Kalman filter for tracking. Similarly, Shirazi and Morris [51] used a Gaussian mixture model to detect vehicles at signalized intersections together with a Kalman filter for trajectory extraction. Apeltauer et al. [12] developed another automated method for trajectory extraction in which vehicles are detected using a two-stage classifier trained on multi-block local binary pattern (MB-LBP) features. This method also requires applying background subtraction in order to generate the foreground mask. Also, Khan et al. [13] developed a comprehensive framework for automatic trajectory extraction of vehicles from traffic footage acquired by unmanned aerial vehicles (UAVs). This framework involves video preprocessing and stabilization, vehicle detection and tracking, and ultimately, management of the extracted trajectories. Similar to [11, 12], Khan et al. [13] carried out vehicle detection using a background subtraction algorithm.

3 CNNs and R-CNNs

3.1 Convolutional Neural Networks (CNNs)

CNNs are deep, biologically inspired feed-forward artificial neural networks (ANNs) whose design is based on a core model of the mammalian visual cortex. One of the most attractive features of CNNs is their ability to classify images regardless of scale and orientation [52]. A typical CNN designed to deal with 28 × 28 pixel RGB images is depicted in Fig. 1. The structure consists of alternating convolutional and sub-sampling layers followed by a number of multilayer perceptron (MLP) layers (i.e., fully connected layers). Each convolutional layer contains a number of filters (neurons) having a specific kernel size (\( k_{x} = k_{y} = 5 \) in this illustration). These filters are responsible for extracting particular features, called feature maps, from the input image. The convolutional filters are basically matrices of size \( \left( k_{x}, k_{y} \right) \) containing certain values referred to as weights. At each neuron, a 2D convolution is performed between the input image and the filter's weights. The output of this operation is processed by an activation function and then passed to the next sub-sampling layer, which decimates the feature maps extracted by the previous convolutional layer by a predefined sub-sampling factor. As shown in Fig. 1, after being processed by a sufficient number of convolutional and sub-sampling layers, the input image is reduced to a 1D array. This array is then processed by the MLP layers, resulting in an output vector that represents the classification result.
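As a concrete illustration of the structure just described, the alternating convolutional and sub-sampling stages of Fig. 1 can be written as a MATLAB (Deep Learning Toolbox) layer stack. This is a minimal sketch with illustrative filter counts and a two-class output, not the network used later in this paper:

```matlab
% Minimal sketch of the CNN structure in Fig. 1 (illustrative only)
layers = [
    imageInputLayer([28 28 3])                % 28 x 28 RGB input image
    convolution2dLayer(5, 6)                  % 5 x 5 kernels (kx = ky = 5)
    reluLayer                                 % activation function
    averagePooling2dLayer(2, 'Stride', 2)     % sub-sampling by a factor of 2
    convolution2dLayer(5, 12)
    reluLayer
    averagePooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(2)                    % MLP (fully connected) layer
    softmaxLayer
    classificationLayer];                     % output: classification result
```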
The process of computing the weights of the convolutional filters and MLP layers is known as CNN training. Before carrying out the training process, it is necessary to define the CNN's structure in advance. This includes the number of convolutional and MLP layers, the kernel size and sub-sampling factor at each stage, and the number of neurons in each layer. Such hyperparameters are usually picked by trial and error since there is no systematic approach for determining the optimal CNN structure within an acceptable computational time [16]. Afterward, the weights in both the convolutional and MLP layers are initialized randomly. A large set of images is then used to train the CNN with a back-propagation (BP) algorithm. The objective of this training process is to adjust the CNN weights iteratively until the sum of squared errors between the target values and the CNN output is minimized. The BP operation is not explained in this paper for brevity; the interested reader is referred to [53] for more details about training CNNs.
Instead of training a new CNN starting from randomly generated weights, researchers often apply a technique called transfer learning in which a pretrained CNN is fine-tuned to learn a new task. Networks such as AlexNet [54] and GoogLeNet [55] are commonly used as a starting point in deep learning applications. Previous studies have shown that this approach is faster and more efficient than training CNNs from scratch [56].

3.2 Regions with CNNs (R-CNNs)

It must be noted here that CNNs are only designed to classify the input image into a number of predefined classes; they cannot detect and localize specific objects within the image. Therefore, CNNs alone cannot satisfy the main requirement of this study, which is to detect and track vehicles automatically. To bridge the gap between image classification and object detection, Girshick et al. [57] proposed a method called regions with CNNs (R-CNN). As shown in Fig. 2, this method consists of three components: (1) a region proposal algorithm that generates a large number of candidate detections, (2) a large CNN that extracts features from each proposal, and (3) linear support vector machines (SVMs) that process the extracted features and classify each candidate region.

3.3 Data Collection and CNN Training

The south approach of the Lekhwair signalized intersection in Doha City, State of Qatar, was videotaped for two and a half hours. The video was recorded at a frame rate of 30 fps and a resolution of 3840 × 2160 pixels. The same video was used in the current study for both the R-CNN training and the trajectory extraction operations.
The images required for training the R-CNN were acquired by randomly selecting 26 frames of the video. To reduce the computational time required for training, the images were cropped to the region around the middle of the intersection, and the resolution was reduced to 1920 × 1080 pixels. For each image, vehicles were manually labeled with bounding boxes, yielding a dataset with the coordinates of 536 vehicles in total. Based on the images and the bounding-box dataset, the R-CNN training was carried out using the “trainRCNNObjectDetector” function available in the MATLAB Computer Vision System Toolbox. AlexNet was used as a starting point for this deep learning task. Details about the architecture of this network can be found in [54].
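The exact training call is not reported in the paper; the following MATLAB sketch indicates how such a training step can be set up. The table name, file name, and hyperparameters are illustrative assumptions:

```matlab
% Sketch of the R-CNN training step (illustrative, not the authors' exact code).
% vehicleDataset: table whose first column holds image file names and whose
% second column holds the M-by-4 [x y w h] bounding boxes labeled in each image.
net = alexnet;                          % pretrained AlexNet as the starting point

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...            % illustrative hyperparameters
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10);

rcnn = trainRCNNObjectDetector(vehicleDataset, net, options, ...
    'NegativeOverlapRange', [0 0.3]);   % low-overlap proposals act as negatives

% Detect vehicles in a new (cropped) frame
frame = imread('frame_0001.png');       % hypothetical file name
[bboxes, scores] = detect(rcnn, frame);
```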

4 Trajectory Extraction Using R-CNN

A MATLAB tool was developed to utilize the R-CNN trained in Sect. 3.3 for vehicle tracking. This tool processes the video frames at a user-defined sampling rate and uses the R-CNN to detect the vehicles. The output of the R-CNN is a set of bounding boxes enclosing each vehicle in the processed frame. The location of a vehicle was defined here as the centroid of its bounding box. Once a vehicle is detected, the tool constructs a Kalman filter to track the location of this vehicle in the subsequent frames until it leaves the intersection area. Using Kalman filters to track the vehicles is necessary to reduce trajectory noise and to enable the tool to associate multiple vehicles with their correct tracks. The tool operates in two modes (Fig. 3): (1) tracking of multiple vehicles and (2) tracking of a single vehicle. The first mode allows the user to simultaneously track all moving vehicles in the video (Fig. 3a), while the second mode involves tracking a single vehicle picked by the user (Fig. 3b). The advantage of the second mode is that it requires significantly less computational time and effort than the first mode, since the R-CNN only processes the region surrounding the vehicle of interest. The vehicle tracking process explained in the current section is illustrated in Fig. 4.
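The paper does not report the Kalman filter settings; the sketch below outlines the single-vehicle tracking loop under an assumed constant-velocity motion model with illustrative noise parameters. For brevity, it associates each frame's first detection with the track, whereas the actual tool associates multiple vehicles with their correct tracks:

```matlab
% Sketch of the Kalman tracking loop (assumptions: constant-velocity model,
% illustrative noise values, simplified detection-to-track association).
reader   = VideoReader('intersection.mp4');     % hypothetical file name
frame    = readFrame(reader);
bboxes   = detect(rcnn, frame);                 % initial R-CNN detection
centroid = bboxes(1,1:2) + bboxes(1,3:4)/2;     % bounding-box centroid

kf = configureKalmanFilter('ConstantVelocity', centroid, ...
    [25 10], [10 5], 10);    % estimate error, motion noise, measurement noise

trajectory = centroid;
while hasFrame(reader)
    frame = readFrame(reader);
    predict(kf);                                % predict the vehicle location
    bboxes = detect(rcnn, frame);
    if ~isempty(bboxes)
        c = bboxes(1,1:2) + bboxes(1,3:4)/2;
        c = correct(kf, c);                     % fuse detection with prediction
        trajectory(end+1,:) = c;                %#ok<AGROW>
    end
end
```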

4.1 Transformation from Image to Real-World Coordinates

The trajectories generated by the aforementioned approach describe the locations of moving vehicles in image coordinates (pixels) with respect to time. In order to map the trajectories to real-world coordinates, it is necessary to define the homography matrix corresponding to this transformation. To do so, four points must be known in both real-world and image coordinates. A coefficient matrix \( \mathbf{A} \) is first assembled as follows:
$$ \mathbf{A} = \begin{bmatrix} -x_{1} & -y_{1} & -1 & 0 & 0 & 0 & x_{1} x_{w,1} & y_{1} x_{w,1} & x_{w,1} \\ 0 & 0 & 0 & -x_{1} & -y_{1} & -1 & x_{1} y_{w,1} & y_{1} y_{w,1} & y_{w,1} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -x_{4} & -y_{4} & -1 & 0 & 0 & 0 & x_{4} x_{w,4} & y_{4} x_{w,4} & x_{w,4} \\ 0 & 0 & 0 & -x_{4} & -y_{4} & -1 & x_{4} y_{w,4} & y_{4} y_{w,4} & y_{w,4} \end{bmatrix} $$
(1)
where \( \left( x_{w,1}, y_{w,1} \right), \ldots, \left( x_{w,4}, y_{w,4} \right) \) are the real-world coordinates of four points, no three of which are collinear, and \( \left( x_{1}, y_{1} \right), \ldots, \left( x_{4}, y_{4} \right) \) are the image coordinates (in pixels) of the same four points. The homography matrix \( \mathbf{H} \) is then obtained as the eigenvector of \( \mathbf{A}^{\mathrm{T}} \mathbf{A} \) associated with its smallest eigenvalue, reshaped into a 3 × 3 matrix. Next, the homography matrix can be used to map any point from image coordinates \( \left( x_{i}, y_{i} \right) \) to world coordinates \( \left( x_{w}, y_{w} \right) \) as follows:
$$ \begin{bmatrix} c\,x_{w} & c\,y_{w} & c \end{bmatrix}^{\mathrm{T}} = \mathbf{H} \begin{bmatrix} x_{i} & y_{i} & 1 \end{bmatrix}^{\mathrm{T}} $$
(2)
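A minimal MATLAB sketch of this direct linear transform, with imgPts and worldPts as illustrative 4 × 2 arrays of corresponding points:

```matlab
% Assemble the coefficient matrix A of Eq. (1) from four point correspondences
A = zeros(8, 9);
for k = 1:4
    x  = imgPts(k,1);    y  = imgPts(k,2);     % image coordinates (pixels)
    xw = worldPts(k,1);  yw = worldPts(k,2);   % real-world coordinates
    A(2*k-1,:) = [-x -y -1  0  0  0  x*xw  y*xw  xw];
    A(2*k,  :) = [ 0  0  0 -x -y -1  x*yw  y*yw  yw];
end

% Homography = eigenvector of A'A with the smallest eigenvalue, reshaped 3 x 3
[V, D]   = eig(A' * A);
[~, idx] = min(diag(D));
H = reshape(V(:, idx), 3, 3)';

% Map an image point (xi, yi) to world coordinates via Eq. (2)
p  = H * [xi; yi; 1];
xw = p(1) / p(3);    % divide out the scale factor c
yw = p(2) / p(3);
```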

4.2 Verification of the Proposed CNN Tool

In order to verify its accuracy, the proposed tool was used to extract the trajectories of 18 free-flowing left-turning vehicles. The same vehicles were also tracked manually in order to establish the ground-truth trajectories. The manual trajectory extraction was performed by tracking each vehicle at 0.5-s intervals, drawing a bounding box around the vehicle in each frame. The position of a manually tracked vehicle at a given time was taken as the centroid of its bounding box. The manual trajectories were then transformed into real-world coordinates as explained in Sect. 4.1. A comparison between the automatically and manually extracted trajectories is shown in Fig. 5. The error between the trajectories extracted by the proposed tool and their manually extracted counterparts was calculated at each point of the trajectories. The error \( E_{p} \) was defined as the distance between an automatically extracted point and the corresponding manually extracted one according to the following equation:
$$ E_{p} = \sqrt {\left( {x_{\text{a}} - x_{\text{m}} } \right)^{2} + \left( {y_{\text{a}} - y_{\text{m}} } \right)^{2} } $$
(3)
where \( x_{\text{a}} \) and \( y_{\text{a}} \) are the coordinates of the automatically extracted point, and \( x_{\text{m}} \) and \( y_{\text{m}} \) are those of the associated manually extracted point. The error distribution of the points corresponding to the 18 trajectories is shown in Fig. 6. The average error across all points of the trajectories is 16.5 cm, with a standard deviation of 11.8 cm. These results support the ability of the proposed tool to automatically track vehicles with an acceptable level of accuracy.

5 Trajectory Analyses

The proposed CNN-based tool was used to automatically extract the trajectories of 44 left-turning free-flowing vehicles (i.e., unimpeded by traffic or pedestrians) from the recorded video. The extracted trajectories are shown in image coordinates (Fig. 7a) as well as in real-world coordinates (Fig. 7b).

5.1 Minimum-Jerk Method

Originally, the principle of minimum jerk was proposed in the mid-1980s by Flash and Hogan [58] to describe the planar movements of the human arm, after which the use of this method has gained more popularity and general acceptance. Successful applications of the minimum-jerk method have been reported in various contexts including human goal-oriented locomotion [59], robot-limb movements [60], autonomous vehicle maneuvers [61, 62], driver-following behavior [63], and more recently a preliminary application for modeling the trajectory of turning vehicles [7, 46].
In principle, the minimum-jerk model suggests that drivers tend to optimize the smoothness of turning maneuvers by minimizing the time integral of the jerk. Thus, according to [58], the cost function to be minimized can be given as:
$$ J = \frac{1}{2} \int_{0}^{t_{f}} \left( \left( \frac{\mathrm{d}^{3} x}{\mathrm{d}t^{3}} \right)^{2} + \left( \frac{\mathrm{d}^{3} y}{\mathrm{d}t^{3}} \right)^{2} \right) \mathrm{d}t $$
(4)
where \( t_{f} \) is the time taken by the turning vehicle to cross the intersection.
As indicated by Flash and Hogan [58], the solution of the minimization problem given in Eq. (4) can be obtained in the time domain as a set of fifth-order polynomials for \( x \) and \( y \) as follows:
$$ x\left( t \right) = a_{0} + a_{1} t + a_{2} t^{2} + a_{3} t^{3} + a_{4} t^{4} + a_{5} t^{5} $$
(5)
$$ y\left( t \right) = b_{0} + b_{1} t + b_{2} t^{2} + b_{3} t^{3} + b_{4} t^{4} + b_{5} t^{5} $$
(6)
where \( a_{i} \) and \( b_{i} \) (\( i = 0,1, \ldots ,5 \)) are unknown coefficients to be obtained using twelve boundary conditions corresponding to the x- and y-components of location, velocity, and acceleration at the initial and final points of the vehicle's trajectory. By applying the boundary conditions corresponding to the x-direction to Eq. (5), the following system of equations is obtained:
$$ x_{0} = a_{0} $$
(7a)
$$ v_{x0} = a_{1} $$
(7b)
$$ a_{x0} = 2a_{2} $$
(7c)
$$ x_{f} = a_{0} + a_{1} t_{f} + a_{2} t_{f}^{2} + a_{3} t_{f}^{3} + a_{4} t_{f}^{4} + a_{5} t_{f}^{5} $$
(7d)
$$ v_{xf} = a_{1} + 2a_{2} t_{f} + 3a_{3} t_{f}^{2} + 4a_{4} t_{f}^{3} + 5a_{5} t_{f}^{4} $$
(7e)
$$ a_{xf} = 2a_{2} + 6a_{3} t_{f} + 12a_{4} t_{f}^{2} + 20a_{5} t_{f}^{3} $$
(7f)
where \( x_{0} \), \( v_{x0} \), and \( a_{x0} \) are the position, velocity, and acceleration, respectively, in the x-direction at the starting point of the turning maneuver, and \( x_{f} \), \( v_{xf} \), and \( a_{xf} \) are those corresponding to the end point. Likewise, applying the boundary conditions corresponding to the y-direction yields a system of equations similar to that of Eq. (7) but in terms of the coefficients \( b_{i} \left( {i = 0,1, \ldots ,5} \right) \) and the parameters \( y_{0} \), \( v_{y0} \), \( a_{y0} \), \( y_{f} \), \( v_{yf} \), and \( a_{yf} \).
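Since Eq. (7) is linear in the unknown coefficients, it can be solved directly once the boundary conditions and \( t_{f} \) are known. A minimal MATLAB sketch for the x-direction (the y-direction is identical with the \( b_{i} \) coefficients):

```matlab
% Solve the x-direction system of Eq. (7) for the coefficients a0..a5,
% given the boundary values x0, vx0, ax0, xf, vxf, axf and the duration tf.
M = [1  0   0     0       0        0;
     0  1   0     0       0        0;
     0  0   2     0       0        0;
     1  tf  tf^2  tf^3    tf^4     tf^5;
     0  1   2*tf  3*tf^2  4*tf^3   5*tf^4;
     0  0   2     6*tf    12*tf^2  20*tf^3];
a = M \ [x0; vx0; ax0; xf; vxf; axf];

% Evaluate the resulting quintic x(t) of Eq. (5)
t = linspace(0, tf, 100);
x = polyval(flip(a), t);    % polyval expects coefficients in descending powers
```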

5.2 Identification of the Starting and Ending Points of the Turning Maneuver

In order to compute the coefficients \( a_{i} \) and \( b_{i} \left( {i = 0,1, \ldots ,5} \right) \) corresponding to each of the extracted trajectories, it is necessary to identify the x- and y-components of position, velocity, and acceleration at the points at which the vehicle starts and ends its turning maneuver, along with the time taken during the maneuver \( t_{f} \). Once these values are known, the two systems of equations (described in Sect. 5.1) can be easily solved to obtain the coefficients \( a_{i} \) and \( b_{i} \) corresponding to the trajectory’s turning curve.
Therefore, it is necessary to accurately identify the two points along the trajectory at which the turning maneuver starts and ends. To do so, we utilize the spline-fitting method presented in [30]. According to this method, the trajectory of a left-turning vehicle at a signalized intersection can be represented by a spline consisting of five segments. The spline starts with a straight line followed by an Euler spiral whose curvature varies almost linearly with a gradient of \( 1/A_{1}^{2} \). This spiral is followed by a circular segment with a curvature of \( 1/R_{\min} \). The end of the spline consists of another Euler spiral, with a nearly linear curvature profile having a gradient of \( -1/A_{2}^{2} \), followed by a straight line. As shown in Fig. 8, four main locations define the beginning and the end of each Euler spiral and the circular segment. These locations are the points of discontinuity along the curvature profile of the vehicle's path. The points of interest here are points 1 and 4 in Fig. 8, which represent the starting and ending points of the turning maneuver.
A MATLAB code was written to fit the aforementioned spline to each of the extracted trajectories in order to identify the four points of curvature discontinuity along the vehicles’ turning paths. The code applies the nonlinear programming solver “fmincon” available in MATLAB Optimization Toolbox to compute the optimal location of the four key points (described in Fig. 8) so that the error between the tracked path and the fitted spline is minimized. Four constraints were imposed to enforce continuity of the fitted spline at the four points. Also, another four constraints were applied to make sure that there are no sudden jumps in the curvature profile at the key points. The fitting of the two Euler spirals was conducted according to the approach proposed in [64]. Figure 9 displays four examples of splines fitted to their corresponding automatically extracted paths.
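The optimization setup is described above only in outline; the skeleton below indicates how such a constrained spline fit can be posed with fmincon. The parameter vector and the two helper functions are schematic assumptions, not the authors' code:

```matlab
% Skeleton of the constrained spline fit (schematic assumptions throughout).
% p packs the decision variables, e.g., the positions of the four key points
% along the path plus the spiral/circle parameters A1, A2, and Rmin.
objective = @(p) splineFitError(p, trackedPath);  % hypothetical helper: error
                                                  % between the tracked path
                                                  % and the five-segment spline

p0 = initialGuess(trackedPath);                   % hypothetical helper
opts = optimoptions('fmincon', 'Algorithm', 'sqp', 'Display', 'off');

pOpt = fmincon(objective, p0, [], [], [], [], [], [], ...
    @continuityConstraints, opts);  % hypothetical nonlinear constraints:
                                    % spline continuity and no curvature
                                    % jumps at the four key points
```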

5.3 Statistical Analysis

After identifying the two points of interest along each of the extracted trajectories (as explained in Sect. 5.2), the parameters \( x_{0} \), \( v_{x,0} \), \( a_{x,0} \), \( x_{f} \), \( v_{x,f} \), \( a_{x,f} \), \( y_{0} \), \( v_{y,0} \), \( a_{y,0} \), \( y_{f} \), \( v_{y,f} \), \( a_{y,f} \), along with \( t_{f} \), were computed for each trajectory. Figure 10 displays the probability distributions of these 13 parameters. As shown in the figure, a normal distribution was fitted to each parameter. A one-sample Kolmogorov–Smirnov test (at the 95% confidence level) did not reject the hypothesis that each of the 13 parameters follows a normal distribution with the mean and standard deviation given in Table 1.
Table 1
Mean (μ) and standard deviation (σ) of the parameters' distributions

Parameter               μ          σ
\( x_{0} \) (m)         35.7       3.27
\( x_{f} \) (m)         − 29.4     4.29
\( v_{x,0} \) (m/s)     − 9.68     1.22
\( v_{x,f} \) (m/s)     − 6.57     0.77
\( a_{x,0} \) (m/s²)    1.03       1.46
\( a_{x,f} \) (m/s²)    − 0.971    1.17
\( y_{0} \) (m)         − 49.3     2.74
\( y_{f} \) (m)         − 45.06    3.33
\( v_{y,0} \) (m/s)     8.98       1.16
\( v_{y,f} \) (m/s)     − 7.45     1.04
\( a_{y,0} \) (m/s²)    − 1.44     1.22
\( a_{y,f} \) (m/s²)    − 0.708    1.36
\( t_{f} \) (s)         7.72       0.74
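As a sketch of this fitting and testing step in MATLAB (Statistics and Machine Learning Toolbox), where tf_samples is an assumed vector holding the 44 observed values of one parameter:

```matlab
% Fit a normal distribution to one parameter and apply a one-sample
% Kolmogorov-Smirnov test at the 5% significance level.
pd = fitdist(tf_samples, 'Normal');    % estimates mu and sigma

% kstest compares against the standard normal, so standardize first
z = (tf_samples - pd.mu) / pd.sigma;
h = kstest(z);                         % h = 0: normality is not rejected
```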

5.4 Comparison Between Simulated and Observed Trajectories

Monte Carlo simulation with 500 trials was conducted using the developed models. In each trial, random values of \( x_{0} \), \( v_{x,0} \), \( a_{x,0} \), \( x_{f} \), \( v_{x,f} \), \( a_{x,f} \), \( y_{0} \), \( v_{y,0} \), \( a_{y,0} \), \( y_{f} \), \( v_{y,f} \), \( a_{y,f} \), and \( t_{f} \) were generated according to the normal distributions described in Fig. 10 and Table 1. The resulting parameters were then used to compute the coefficients which determine the shape of the trajectory (\( a_{i} \) and \( b_{i} \)) by solving the two systems of equations described in Sect. 5.1. The coefficients were then used as per Eqs. (5) and (6) to obtain the simulated paths shown in Fig. 11a.
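A minimal sketch of one such Monte Carlo trial, where mu and sigma are assumed 13-element vectors holding the Table 1 statistics in the order listed there:

```matlab
% One Monte Carlo trial of the minimum-jerk trajectory model.
% Order of mu/sigma: [x0 xf vx0 vxf ax0 axf y0 yf vy0 vyf ay0 ayf tf].
s  = normrnd(mu, sigma);    % sample all 13 parameters
tf = s(13);

bx = [s(1); s(3); s(5); s(2); s(4); s(6)];     % x0 vx0 ax0 xf vxf axf
by = [s(7); s(9); s(11); s(8); s(10); s(12)];  % y0 vy0 ay0 yf vyf ayf

M = [1  0   0     0       0        0;
     0  1   0     0       0        0;
     0  0   2     0       0        0;
     1  tf  tf^2  tf^3    tf^4     tf^5;
     0  1   2*tf  3*tf^2  4*tf^3   5*tf^4;
     0  0   2     6*tf    12*tf^2  20*tf^3];
a = M \ bx;    b = M \ by;    % quintic coefficients of Eqs. (5) and (6)

t = linspace(0, tf, 200);
x = polyval(flip(a), t);      % one simulated turning path
y = polyval(flip(b), t);
```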
The distributions of the simulated paths and those of the observed trajectories were analyzed and compared along three selected cross sections (drawn in Fig. 11a). Figure 11b–d depicts a comparison between the observed and simulated distributions. A Kolmogorov–Smirnov test (performed at the 95% confidence level) revealed that the simulated distributions at the three cross sections are not significantly different from their observed counterparts. Furthermore, the comparison shown in Fig. 12 indicates reasonable agreement between the observed and simulated speed and acceleration profiles, which supports the reliability of the proposed model in generating accurate and realistic vehicle turning maneuvers.
Finally, Fig. 13 provides a concise summary of the overall procedure followed to develop and validate the proposed minimum-jerk-based trajectory model, starting from trajectory extraction and ending with the use of Monte Carlo simulations to generate trajectories of the turning vehicles.

6 Conclusions and Future Recommendations

In this paper, a CNN-based tool was developed for the automatic extraction of vehicle trajectories. In order to test the proposed tool, video data were collected at a signalized intersection located in Doha City, State of Qatar. Several trajectories were extracted both manually and automatically. The average error between the manually and automatically extracted trajectory paths was 16.5 cm, which demonstrates the accuracy of the proposed method. A minimum-jerk-based approach was used to statistically model the variations in left-turning vehicle trajectories including paths and speed profiles. The minimum-jerk approach was found to be effective and reliable in producing realistic turning maneuvers. Monte Carlo simulation was conducted to verify the statistical model by comparing the simulated and actual trajectories.
Finally, the effort presented in this paper can be regarded as a step forward toward maximizing the potential use of deep learning in traffic safety applications. However, in order to further improve the applicability of the proposed methods, the following recommendations can be considered in future studies:
  • The R-CNN used in this work was trained using images taken from a single intersection with specific geometric characteristics and surrounding conditions. Therefore, the proposed R-CNN can only be used to accurately track vehicles at this particular intersection. Training the network using a larger set of images that are collected from multiple intersections is required to generate a more versatile R-CNN.
  • The computational efficiency of the proposed tool can be improved by optimizing the structure of the R-CNN. Also, updated versions of the standard R-CNN used in this work (i.e., Fast R-CNN [65] or Faster R-CNN [66]) can be implemented to minimize the required computational time.
  • The trajectory models developed in this study are based on a limited number of trajectories (N = 44) extracted from a single intersection. A larger number of trajectories obtained from several intersections is essential to gain deeper insight into the behavior of drivers at signalized intersections. Furthermore, the proposed trajectory model assumes that the start and end points of the turning trajectory are known; accordingly, probabilistic models need to be developed to estimate the distributions of these points (i.e., the start and end of the turning path) as functions of the vehicle entry speed and the intersection geometry.

Acknowledgements

Open access funding provided by Linnaeus University. This publication was made possible by the NPRP award [NPRP 9-360-2-150] from Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the authors. Special thanks are due to Dr. Deepti Muley for the support in collecting the video records used in the current paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
3.
Ma, W.; Yang, X.: Coordination design of left movements of signalized intersections group. J. Tongji Univ. 36, 1507–1511 (2008)
7.
Dias, C.; Iryo-Asano, M.; Oguchi, T.: Concurrent prediction of location, velocity and acceleration profiles for left turning vehicles at signalized intersections. In: 土木計画学研究発表会・講演集 (Proceedings of Infrastructure Planning, JSCE), pp. 3054–3062 (2016)
10.
Helmer, T.; Wang, L.; Kompass, K.; Kates, R.: Safety performance assessment of assisted and automated driving by virtual experiments: stochastic microscopic traffic simulation as knowledge synthesis. In: IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, pp. 2019–2023 (2015)
12.
13.
Khan, M.A.; Ectors, W.; Bellemans, T.; Janssens, D.; Wets, G.: Unmanned aerial vehicle-based traffic analysis: methodological framework for automated multivehicle trajectory extraction. Transp. Res. Rec. J. Transp. Res. Board 2626, 25–33 (2017). https://doi.org/10.3141/2626-04
24.
Szarvas, M.; Yoshizawa, A.; Yamamoto, M.; Ogata, J.: Pedestrian detection with convolutional neural networks. In: IEEE Intelligent Vehicles Symposium, Proceedings, pp. 224–229 (2005)
26.
Reed, M.: Intersection kinematics: a pilot study of driver turning behavior obscuration by a-pillars. Report No. UMTRI-2008-54, University of Michigan, Ann Arbor, Industry Affiliation Program for Human Factors in Transportation Safety (2008)
27.
Stover, V.G.; Koepke, F.J.: Transportation and Land Development, pp. 1–239. Prentice-Hall, Englewood Cliffs (1988)
28.
Stover, V.G.: Issues relating to the geometric design of intersections. In: Proceedings of 8th International Conference on Access Management (2008)
34.
Sun, R.: Cognition and Multi-agent Interaction: From Cognitive Modeling to Social Simulation. Cambridge University Press, Cambridge (2005)
40.
Gipps, P.G.: A model for the structure of lane-changing decisions. Transp. Res. Part B Methodol. 20, 403–414 (1986)
41.
Huang, W.; Fellendorf, M.; Schönauer, R.: Social force based vehicle model for two-dimensional spaces. In: Transportation Research Board 91st Annual Meeting, pp. 1–16 (2012)
43.
47.
Salvo, G.; Caruso, L.; Scordo, A.: Gap acceptance analysis in an urban intersection through a video acquired by an UAV. In: Recent Advances in Civil Engineering and Mechanics, pp. 199–205. WSEAS Press (2014)
48.
Salvo, G.; Caruso, L.; Scordo, A.: Gap acceptance analysis in an urban intersection through a video acquired by an UAV. In: Recent Advances in Civil Engineering and Mechanics, pp. 199–205. WSEAS Press (2014)
50.
Barmpounakis, E.N.; Vlahogianni, E.I.; Golias, J.C.: Extracting kinematic characteristics from unmanned aerial vehicles. In: Transportation Research Board 95th Annual Meeting, p. 16 (2016)
53.
Kiranyaz, S.; Ince, T.; Gabbouj, M.: Real-time patient-specific ECG classification by 1-D convolutional neural networks. IEEE Trans. Biomed. Eng. 63, 664–675 (2016)
54.
Krizhevsky, A.; Sutskever, I.; Hinton, G.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
56.
Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014). https://doi.org/10.1109/CVPR.2014.222
57.
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
58.
Flash, T.; Hogan, N.: The coordination of arm movements: an experimentally confirmed mathematical model. J. Neurosci. 5, 1688–1703 (1985)
61.
Lo Bianco, C.G.; Romano, M.: Bounded velocity planning for autonomous vehicles. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pp. 685–690. IEEE (2005)
63.
Hiraoka, T.; Kunimatsu, T.; Nishihara, O.; Kumamoto, H.: Modeling of driver following behavior based on minimum-jerk theory. In: Proceedings of 12th World Congress ITS (2005)
65.
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
