Utilizing support vector machine in real-time crash risk evaluation
Highlights
► We introduce support vector machine models to perform real-time crash risk evaluation.
► SVM models with different kernel functions were compared with logistic regression models.
► Hierarchical Bayesian logistic regression models were used to capture unobserved heterogeneity.
► Extension analyses of the explanatory variables used in the SVM models have been conducted.
Introduction
Active Traffic Management (ATM) has recently been emerging in the US and Europe, and its key control strategies, such as variable speed limits (VSL), have been recognized as improving traffic safety (Mirshahi et al., 2007, Pan et al., 2010, Chang et al., 2011). The implemented systems were mostly designed to lower crash risk by reducing speed variation. Moreover, more advanced proactive crash prediction models have shown promising effects on reducing crash occurrence when combined with a VSL system (Abdel-Aty et al., 2006, Lee et al., 2006b, Lee and Abdel-Aty, 2008). In these studies, sophisticated real-time crash risk evaluation models were estimated to predict crash occurrence probabilities from real-time traffic data. Crash risk is evaluated from the real-time traffic data and, once it reaches a certain threshold, the VSL control system is triggered to smooth the traffic flow and improve traffic safety. The real-time crash risk evaluation models try to identify the “crash precursor conditions” by comparing the traffic conditions preceding crashes with randomly selected non-crash cases. Previous studies have used both traditional statistical models and artificial intelligence models. Matched case–control logistic regression was one of the most widely employed traditional statistical models (Abdel-Aty et al., 2004, Abdel-Aty et al., 2007, Lee et al., 2006a), while artificial neural network models (Pande and Abdel-Aty, 2006a, Pande et al., 2011) were another popular modeling technique. More recently, as Bayesian inference became popular, Bayesian logistic regression models have been used in real-time crash risk evaluation studies (Ahmed et al., in press-a, Ahmed et al., in press-b).
Although previous real-time crash risk evaluation models have proven capable of differentiating between crash and non-crash cases, they have some limitations. Logistic regression models assume a linear relationship between the dependent and independent variables, while neural network models work as a black box and may have over-fitting issues. Support vector machine (SVM), a pattern classifier based on statistical learning theory (Vladimir and Vapnik, 1995), was introduced in this study to formulate the real-time crash risk evaluation model. Data from a 15-mile mountainous freeway (I-70) in Colorado were used in this study. Thanks to the Remote Traffic Microwave Sensor (RTMS) radars installed along the freeway, real-time traffic data (speed, occupancy and volume) were captured and matched with the historical crash data.
The data were split into training and scoring datasets. The training dataset was used to estimate the models and the scoring dataset was used to test the predictive power of the different models. Because SVM models cannot select significant variables themselves, and using all the variables as input would make the model cumbersome, a classification and regression tree (CART) was first estimated to select the most significant contributing variables. Then, based on the chosen explanatory variables, three candidate Bayesian logistic regression models were estimated, accounting for different levels of unobserved heterogeneity. Next, SVM models with a radial-basis kernel function and a linear kernel function were estimated and compared with the Bayesian logistic regression model. Comparisons were based on the areas under the ROC curves (AUC). Moreover, SVM models without the variable selection procedure were also estimated and investigated. Furthermore, the scoring datasets were divided into different sample sizes to test the effect of sample size on the models' prediction abilities. Finally, sensitivity analyses were conducted to reveal the effects of the explanatory variables. Fig. 1 presents the flowchart of the main modeling procedures for this study.
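The modeling procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: it uses scikit-learn, synthetic data from `make_classification` in place of the I-70 data, an ordinary (non-Bayesian) logistic regression as the baseline, and a hypothetical cutoff `top_k` for how many variables the tree-based selection keeps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crash / non-crash cases (NOT the I-70 data):
# 18 candidate variables, roughly one crash per four non-crash cases.
X, y = make_classification(
    n_samples=1282, n_features=18, n_informative=5, n_redundant=2,
    weights=[0.8, 0.2], random_state=0,
)

# Training set to estimate the models, scoring set to compare them.
X_train, X_score, y_train, y_score = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: a CART-style tree ranks the candidate variables by importance.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
top_k = 4  # hypothetical number of variables to keep
selected = np.argsort(tree.feature_importances_)[::-1][:top_k]

# Step 2: candidate classifiers, each fitted on the selected variables only.
# The logistic model here is a plain one, used as a simple baseline.
models = {
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# Step 3: compare the models by AUC on the scoring set.
auc = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    scores = model.decision_function(X_score[:, selected])
    auc[name] = roc_auc_score(y_score, scores)

for name, value in auc.items():
    print(f"{name}: AUC = {value:.3f}")
```

Dropping the `[:, selected]` indexing gives the variant without variable selection, and refitting on subsets of the training rows is one way to probe sample-size effects.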
Section snippets
Background
Support vector machine (SVM) models have been employed in some aspects of transportation research studies. Yuan and Cheu (2003) introduced SVM in incident detection and they compared the results from SVM models with the multi-layer feed forward neural network (MLFNN) and probabilistic neural network models. They concluded that SVM models provided a lower misclassification rate, higher correct detection rate and lower false alarm rate. Later on, Chen et al. (2009) also constructed SVM models to
Data preparation
The 15-mile mountainous freeway is located on I-70 in Colorado; the studied segment starts at Mile Marker (MM) 205 and ends at MM 220. Two datasets were used in this study: (1) crash data from October 2010 to October 2011 provided by the Colorado Department of Transportation (CDOT) and (2) real-time traffic data detected by 30 RTMS radars. In total, 265 crashes were documented and matched with the traffic data, along with 1017 non-crash cases that were matched with the crash cases. The RTMS
Support vector machine
Support vector machine was originally designed based on statistical learning theory and structural risk minimization. The algorithm tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary. For the binary classification problem in this study (crash and non-crash), given the training data $(x_i, y_i)$, $i = 1, \dots, N$, assuming that $y_i = 1$ for the crash cases and $y_i = -1$ for the non-crash cases; $x_i$ represent the matrix for explanatory
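For orientation, the textbook soft-margin SVM can be written as below. This is a sketch in standard notation that is assumed here rather than taken from the paper: class labels $y_i \in \{+1, -1\}$, hyperplane parameters $w$ and $b$, slack variables $\xi_i$, penalty parameter $C$, feature map $\phi$ with kernel $K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$, and RBF width $\gamma$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, N \\[6pt]
\text{linear kernel:}\quad & K(x_i, x_j) = x_i^{\top} x_j \\
\text{RBF kernel:}\quad & K(x_i, x_j) = \exp\bigl(-\gamma \lVert x_i - x_j \rVert^{2}\bigr), \qquad \gamma > 0
\end{aligned}
```

The slack variables let some points fall inside the margin or on the wrong side of the hyperplane, and $C$ controls how heavily such violations are penalized relative to the margin width.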
Variable selection
Because SVM models lack the capability to select significant variables from the 18 explanatory variables, a classification and regression tree (CART) was estimated to perform the variable selection. CART models have frequently been used in traffic safety studies for their classification capability. For example, Chang and Wang (2006) utilized a CART model to analyze crash injury severity and Chang and Chen (2005) employed a CART model to predict crash frequency. Moreover, Kuhnert et al.
Conclusion
Active Traffic Management (ATM) concepts are gaining momentum around the world. Improving traffic safety is expected to be a major component of ATM. Thus, efficient and accurate real-time crash prediction models are required. Previous studies on this topic adopted both traditional statistical techniques (logistic regression models) and artificial neural network techniques. Due to limitations of these models (linear functional forms and over-fitting problems), SVM models have been
References (38)
- et al. (2006). Evaluation of variable speed limits for real-time freeway safety improvement. Accident Analysis and Prevention.
- et al. (2005). Data mining of tree-based models to analyze freeway accident frequency. Journal of Safety Research.
- et al. (2006). Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accident Analysis and Prevention.
- et al. (2009). Construct support vector machine ensemble to detect traffic incident. Expert Systems with Applications.
- et al. (2000). Combining non-parametric models with logistic regression: an application to motor vehicle injury data. Computational Statistics and Data Analysis.
- et al. (2006). Evaluation of variable speed limits to improve traffic safety. Transportation Research Part C.
- et al. (2008). Predicting motor vehicle crashes using support vector machine models. Accident Analysis and Prevention.
- et al. (2012). Using support vector machine models for crash injury severity analysis. Accident Analysis and Prevention.
- et al. (2010). The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transportation Research Part A.
- et al. (2006). Assessment of freeway traffic parameters leading to lane-change related collisions. Accident Analysis and Prevention.
- Incident detection using support vector machines. Transportation Research Part C: Emerging Technologies.
- Bayesian random effect models incorporating real-time weather and traffic data to investigate mountainous freeway hazardous factors. Accident Analysis and Prevention.
- Crash risk assessment using intelligent transportation systems data and real-time intervention strategies to improve safety on freeways. Journal of Intelligent Transportation Systems.
- Predicting freeway crashes from loop detector data by matched case–control logistic regression. Transportation Research Record: Journal of the Transportation Research Board.
- ITS field demonstration: integration of variable speed limit control and travel time estimation for a recurrently congested highway.
- Forecasting shared-use vehicle trips with neural networks and support vector machines. Transportation Research Record: Journal of the Transportation Research Board.