Elsevier

Accident Analysis & Prevention

Volume 51, March 2013, Pages 252-259
Utilizing support vector machine in real-time crash risk evaluation

https://doi.org/10.1016/j.aap.2012.11.027

Abstract

Real-time crash risk evaluation models will likely play a key role in Active Traffic Management (ATM). Models have been developed to predict crash occurrence in order to proactively improve traffic safety. Previous real-time crash risk evaluation studies mainly employed logistic regression and neural network models, which suffer from a linear functional form and over-fitting, respectively. Moreover, these studies mostly focused on estimating the models and barely investigated their predictive abilities. In this study, the support vector machine (SVM), a recently proposed statistical learning model, was introduced to evaluate real-time crash risk. The data were split into a training dataset (used for developing the models) and scoring datasets (used for assessing the models’ predictive power). A classification and regression tree (CART) model was developed to select the most important explanatory variables and, based on the results, three candidate Bayesian logistic regression models were estimated, accounting for different levels of unobserved heterogeneity. SVM models with different kernel functions were then developed and compared to the Bayesian logistic regression model. Model comparisons based on the area under the ROC curve (AUC) demonstrated that the SVM model with the radial-basis kernel function outperformed the others. Moreover, several extension analyses were conducted to evaluate the effect of sample size on the SVM models’ predictive capability, the importance of variable selection before developing SVM models, and the effects of the explanatory variables in the SVM models. Results indicate that (1) a smaller sample size would enhance the SVM model’s classification accuracy, (2) a variable selection procedure is needed prior to SVM model estimation, and (3) explanatory variables have identical effects on crash occurrence in the SVM models and the logistic regression models.

Highlights

► We introduce support vector machine models to perform real-time crash risk evaluation.
► SVM models with different kernel functions were compared with logistic regression models.
► Hierarchical Bayesian logistic regression models were used to capture unobserved heterogeneity.
► Extension analyses about the explanatory variables used in the SVM models have been conducted.

Introduction

Active Traffic Management (ATM) has recently been emerging in the US and Europe, and its key control strategies, such as variable speed limits (VSL), have been recognized as beneficial to traffic safety (Mirshahi et al., 2007, Pan et al., 2010, Chang et al., 2011). The implemented systems were mostly designed to lower crash risk by reducing speed variation. Moreover, more advanced proactive crash prediction models have shown promising effects on reducing crash occurrence when combined with the VSL system (Abdel-Aty et al., 2006, Lee et al., 2006b, Lee and Abdel-Aty, 2008). Within these studies, sophisticated real-time crash risk evaluation models were estimated to emulate crash occurrence probabilities from real-time traffic data. Crash risk is evaluated with real-time traffic data and, once it reaches a certain threshold, the VSL control system is triggered to smooth the traffic flow and improve traffic safety. The real-time crash risk evaluation models try to identify the “crash precursor conditions” by comparing the traffic conditions at the time of crashes with those of randomly selected non-crash cases. In previous studies, both traditional statistical models and artificial intelligence models have been utilized. Matched case–control logistic regression was one of the widely employed traditional statistical models (Abdel-Aty et al., 2004, Abdel-Aty et al., 2007, Lee et al., 2006a), while artificial neural network models (Pande and Abdel-Aty, 2006a, Pande et al., 2011) were another popular modeling technique. More recently, as Bayesian inference became popular, Bayesian logistic regression models have been used in real-time crash risk evaluation studies (Ahmed et al., in press-a, Ahmed et al., in press-b).

Although previous real-time crash risk evaluation models have proven capable of differentiating between crash and non-crash cases, they have some limitations. Logistic regression models assume a linear relationship between the log-odds of the outcome and the independent variables, while neural network models work as a black box and may have over-fitting issues. The support vector machine (SVM), a newly introduced pattern classifier based on statistical learning theory (Vladimir and Vapnik, 1995), was adopted in this study to formulate the real-time crash risk evaluation model. Data from a 15-mile mountainous freeway (I-70) in Colorado were used in this study. Thanks to the Remote Traffic Microwave Sensor (RTMS) radars installed along the freeway, real-time traffic data (speed, occupancy and volume) were captured and matched with the historical crash data.

The data were split into training and scoring datasets. The training dataset was used to estimate the models and the scoring dataset to test the predictive power of the different models. Because SVM models cannot select significant variables themselves, and using all the variables as input would make the model cumbersome, a classification and regression tree (CART) was first estimated to select the most significant contributing variables. Based on the chosen explanatory variables, three candidate Bayesian logistic regression models were then estimated, accounting for different levels of unobserved heterogeneity. SVM models with radial-basis and linear kernel functions were then estimated and compared to the Bayesian logistic regression model. Comparisons were based on the areas under the ROC curves (AUC). Moreover, SVM models without the variable selection procedure were also estimated and investigated. Furthermore, the scoring datasets were divided into different sample sizes to test the effect of sample size on the models’ predictive abilities. Finally, sensitivity analyses were conducted to reveal the effects of the explanatory variables. Fig. 1 presents the flowchart of the main modeling procedures for this study.
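The AUC-based comparison on a scoring dataset can be illustrated with a minimal sketch. It assumes the pairwise (Mann-Whitney) form of the AUC and the label coding 1 = crash, -1 = non-crash; the function name `auc`, the two score lists and all numeric values are hypothetical toy inputs, not data or results from the study.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    labels: 1 for a crash case, -1 for a non-crash case (assumed coding).
    scores: model output, where a higher value means higher estimated risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one crash and one non-crash case")
    # Count crash/non-crash pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scoring-set labels and outputs of two candidate models.
y_scoring = [1, 1, -1, -1, -1, 1, -1, -1]
model_a = [0.9, 0.7, 0.4, 0.2, 0.6, 0.8, 0.1, 0.3]
model_b = [0.6, 0.3, 0.5, 0.2, 0.7, 0.8, 0.1, 0.4]

auc_a = auc(y_scoring, model_a)
auc_b = auc(y_scoring, model_b)
print(f"AUC model A: {auc_a:.3f}, AUC model B: {auc_b:.3f}")
```

Under this measure the model with the larger AUC ranks crash cases above non-crash cases more consistently, which is the sense in which the candidate models are compared.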

Section snippets

Background

Support vector machine (SVM) models have been employed in several areas of transportation research. Yuan and Cheu (2003) introduced SVM for incident detection and compared the results from SVM models with those of multi-layer feed-forward neural network (MLFNN) and probabilistic neural network models. They concluded that SVM models provided a lower misclassification rate, a higher correct detection rate and a lower false alarm rate. Later on, Chen et al. (2009) also constructed SVM models to

Data preparation

The 15-mile mountainous freeway is located on I-70 in Colorado, and the studied segment starts at Mile Marker (MM) 205 and ends at MM 220. Two datasets were used in this study: (1) crash data from October 2010 to October 2011 provided by the Colorado Department of Transportation (CDOT) and (2) real-time traffic data detected by 30 RTMS radars. There were 265 crashes documented and matched with the traffic data, and 1017 non-crash cases were matched with the crash cases. The RTMS

Support vector machine

The support vector machine was originally designed based on statistical learning theory and structural risk minimization. The algorithm tries to find a separating hyperplane by minimizing the distance of misclassified points to the decision boundary. For the binary classification problem in this study (crash and non-crash), given the training data $(x_1, y_1), \ldots, (x_i, y_i)$, $y_i \in \{1, -1\}$, assuming that $y_i = 1$ for the crash cases and $y_i = -1$ for the non-crash cases; $x_i$ represents the matrix for explanatory
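For reference, the standard soft-margin SVM from the general literature can be written as below, together with the linear and radial-basis kernels. This is the textbook form and not necessarily the notation used in this paper. The symbols $w$, $b$, $\xi_i$, $C$, $\phi$, $\alpha_i$ and $\gamma$ are assumptions introduced here; only $x_i$ and $y_i \in \{1, -1\}$ come from the text.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \\[4pt]
K(x_i, x_j) &= \phi(x_i)^{\top}\phi(x_j), \\
K_{\text{linear}}(x_i, x_j) &= x_i^{\top}x_j, \\
K_{\text{RBF}}(x_i, x_j) &= \exp\bigl(-\gamma\lVert x_i - x_j\rVert^{2}\bigr),
  \qquad \gamma > 0, \\
f(x) &= \operatorname{sign}\Bigl(\sum_{i}\alpha_i\, y_i\, K(x_i, x) + b\Bigr).
\end{aligned}
\]
```

Here $\xi_i$ measures how far a point falls on the wrong side of its margin, $C$ trades margin width against those violations, and $f(x)$ assigns a new observation to the crash ($+1$) or non-crash ($-1$) class.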

Variable selection

Because SVM models cannot select significant variables from the 18 explanatory variables, a classification and regression tree (CART) was estimated to perform the variable selection. CART models have frequently been used in traffic safety studies for their classification capability. For example, Chang and Wang (2006) utilized a CART model to analyze crash injury severity and Chang and Chen (2005) employed a CART model to predict crash frequency. Moreover, Kuhnert et al.
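As a rough illustration of how a CART-style procedure can rank candidate variables, the sketch below scores each variable by the largest Gini impurity decrease achievable with a single binary split. This is a one-level simplification under stated assumptions, not the procedure used in the study. The variable names `avg_speed` and `occupancy`, the helper names and all values are hypothetical toy inputs.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = crash, -1 = non-crash)."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == 1) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_gain(values: Sequence[float],
                    labels: Sequence[int]) -> Tuple[float, float]:
    """Return (largest Gini decrease, threshold) for splits 'value <= threshold'."""
    parent = gini(labels)
    n = len(labels)
    best_gain, best_thr = 0.0, float("nan")
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2.0
        left = [y for v, y in zip(values, labels) if v <= thr]
        right = [y for v, y in zip(values, labels) if v > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


# Hypothetical toy observations: three crash and three non-crash cases.
crash = [1, 1, 1, -1, -1, -1]
toy_variables = {
    "avg_speed": [30.0, 35.0, 32.0, 55.0, 60.0, 58.0],
    "occupancy": [10.0, 20.0, 15.0, 12.0, 18.0, 22.0],
}

# Rank variables by their best single-split impurity decrease.
ranking = sorted(
    ((name, *best_split_gain(vals, crash)) for name, vals in toy_variables.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain, thr in ranking:
    print(f"{name}: gain={gain:.3f} at threshold {thr}")
```

A full CART grows the tree recursively and aggregates impurity decreases over all splits. The single-split version only shows why a variable that separates crash from non-crash cases cleanly would be retained as SVM input.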

Conclusion

Active Traffic Management (ATM) concepts are gaining momentum around the world. Improving traffic safety is expected to be a major component of ATM. Thus, efficient and accurate real-time crash prediction models are required. Previous studies on this topic adopted both traditional statistical techniques (logistic regression models) and artificial neural network techniques. Due to limitations of these models (linear functional forms and over-fitting problems), SVM models have been

References (38)

  • F. Yuan et al. Incident detection using support vector machines. Transportation Research Part C: Emerging Technologies (2003)
  • R. Yu et al. Bayesian random effect models incorporating real-time weather and traffic data to investigate mountainous freeway hazardous factors. Accident Analysis and Prevention (2013)
  • M. Abdel-Aty et al. Crash risk assessment using intelligent transportation systems data and real-time intervention strategies to improve safety on freeways. Journal of Intelligent Transportation Systems (2007)
  • M. Abdel-Aty et al. Predicting freeway crashes from loop detector data by matched case–control logistic regression. Transportation Research Record: Journal of the Transportation Research Board (2004)
  • Ahmed, M., Abdel-Aty, M., Yu, R. Assessment of the interaction between crash occurrence, mountainous freeway geometry,...
  • Ahmed, M., Abdel-Aty, M., Yu, R. A Bayesian updating approach for real-time safety evaluation using AVI data. Journal...
  • Breslow, N., Day, N., 1980. Statistical Methods in Cancer Research, vol. 1. The Analysis of Case–Control Studies...
  • G. Chang et al. ITS field demonstration: integration of variable speed limit control and travel time estimation for a recurrently congested highway
  • R. Cheu et al. Forecasting shared-use vehicle trips with neural networks and support vector machines. Transportation Research Record: Journal of the Transportation Research Board (2006)