Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks
Research highlights
▶ Bayesian Networks are usefully applied in the domain of traffic accident modeling. ▶ BNs are used to classify traffic accidents according to their injury severity. ▶ BN inference identifies the variables associated with KSI (killed or seriously injured). ▶ The key variables for KSI were accident type, age, lighting and number of injuries.
Introduction
The number of traffic accidents and their effects, mainly human fatalities and injuries, justify the importance of analyzing the factors that contribute to their occurrence. Identifying the factors that significantly influence the injury severity of traffic accidents has been the main objective of many previous studies. These factors usually fall into one or more of the following groups: driver characteristics, highway characteristics, vehicle characteristics, accident characteristics and atmospheric factors (Kopelias et al., 2007, Chang and Wang, 2006).
Regression analysis has been widely used to determine the factors that contribute to a specific injury severity. The most commonly used regression models in traffic injury analysis are the logistic regression model and the ordered probit model (Al-Ghamdi, 2002, Milton et al., 2008, Bédard et al., 2002, Yau et al., 2006, Yamamoto and Shankar, 2004, Kockelman and Kweon, 2002). However, most of the regression models used to model traffic injury severity have their own model assumptions and pre-defined underlying relationships between dependent and independent variables (i.e. linear relations between the variables) (Chang and Wang, 2006). If these assumptions are violated, the model could lead to erroneous estimations of the likelihood of severe injury.
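As a point of reference, the binary logistic regression model has the standard form shown below, where p is the probability that an accident results in the severe outcome (for instance KSI) and x_1, ..., x_k are the explanatory variables. The right-hand side is linear in the coefficients β_j, which is the kind of pre-defined relationship referred to above. This is the generic textbook form, not a model fitted in this paper or in the cited studies.

```latex
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
p = \frac{1}{1 + \exp\!\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_j\right)}
```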
Gregoriades (2007) highlighted the value of using Bayesian Networks (BNs) to model traffic accidents and argued that traffic accidents should not be treated as a deterministic assessment problem. Instead, researchers should model the uncertainties involved in the factors that can lead to road accidents. He listed a number of candidate approaches for modeling uncertainty, such as Bayesian probability.
BNs make it easy to describe accidents that involve many interdependent variables. The structure of the network and the relationships between the variables can be learned from accident data, so BNs do not require any pre-defined relationship between dependent and independent variables.
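In standard notation (not taken from the paper), a BN over variables X_1, ..., X_n is a directed acyclic graph in which each variable is conditionally independent of its non-descendants given its parents Pa(X_i). The joint distribution then factorizes into one conditional probability table per node, so learning a BN from data amounts to learning the graph and these tables:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```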
The three main advantages of BNs are bi-directional induction, incorporation of missing variables and probabilistic inference. Using BNs, it is relatively easy to discover the underlying patterns in data, to investigate the relationships between variables and to make predictions based on these relationships. In a study by Ozbay and Noyan (2006), incident data were collected from incident clearance survey forms to understand incident clearance characteristics and then used to develop incident duration prediction models. The researchers modeled incident clearance durations using BNs and were able to represent the stochastic nature of incidents.
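To make bi-directional inference concrete, the sketch below builds a hypothetical two-node BN, Lighting → Severity, and answers queries by enumerating the joint distribution. The variable names and every probability are made-up placeholders, not estimates from the DGT data; exact enumeration is only practical for tiny networks, and real BN software uses more efficient inference algorithms.

```python
from itertools import product

# Hypothetical two-node BN: Lighting -> Severity.
# All probabilities are made-up placeholders, NOT estimates from the DGT data.
P_LIGHTING = {"day": 0.7, "night": 0.3}
P_SEVERITY_GIVEN_LIGHTING = {
    "day": {"SI": 0.9, "KSI": 0.1},
    "night": {"SI": 0.75, "KSI": 0.25},
}


def joint(lighting: str, severity: str) -> float:
    """Joint probability via the BN factorization P(L) * P(S | L)."""
    return P_LIGHTING[lighting] * P_SEVERITY_GIVEN_LIGHTING[lighting][severity]


def query(target: str, value: str, evidence: dict) -> float:
    """P(target = value | evidence), computed by enumerating all joint configurations."""
    numerator = 0.0
    denominator = 0.0
    for lighting, severity in product(P_LIGHTING, ("SI", "KSI")):
        assignment = {"lighting": lighting, "severity": severity}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue  # configuration contradicts the evidence
        p = joint(lighting, severity)
        denominator += p
        if assignment[target] == value:
            numerator += p
    return numerator / denominator


# Predictive (cause -> effect): P(KSI | night)
print(query("severity", "KSI", {"lighting": "night"}))
# Marginal with no evidence: P(KSI)
print(query("severity", "KSI", {}))
# Diagnostic (effect -> cause): P(night | KSI)
print(query("lighting", "night", {"severity": "KSI"}))
```

With these placeholder numbers the predictive query P(KSI | night) is 0.25, while the diagnostic query P(night | KSI) is about 0.52: observing a severe outcome raises the belief that the accident happened at night above its prior of 0.3.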
Studies using BNs to analyze traffic accident injury severity are scarce. Simoncic (2004) constructed a BN model of injury severity in two-car accidents. The BN was built using several variables, and the Most Probable Explanation (MPE), i.e. the most probable configuration of values for all the variables in the BN, was calculated as an indication of the quality of the estimated BN. The results pointed out that BNs could be applied in road accident modeling, and some improvements, such as using more variables and larger datasets, were recommended. Although this study highlighted the possibility of using BNs to model traffic accidents, its results were based on building only one possible network, and the performance of the Bayesian classifier was not measured.
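The MPE can be illustrated on a toy network. The sketch below assumes a hypothetical three-node BN (Lighting and AccidentType both pointing to Severity) with made-up probabilities, and finds the MPE, the single joint configuration with the highest probability that is consistent with the evidence, by brute-force enumeration. It is not Simoncic's network, and enumeration only scales to very small networks.

```python
from itertools import product

# Hypothetical BN: Lighting -> Severity <- AccidentType.
# All probabilities are made-up placeholders, NOT estimates from any real dataset.
LIGHTING = {"day": 0.7, "night": 0.3}
ACCIDENT_TYPE = {"collision": 0.6, "run_off": 0.4}
P_KSI = {  # P(Severity = KSI | Lighting, AccidentType)
    ("day", "collision"): 0.10,
    ("day", "run_off"): 0.20,
    ("night", "collision"): 0.25,
    ("night", "run_off"): 0.40,
}


def joint(lighting: str, accident_type: str, severity: str) -> float:
    """P(L) * P(A) * P(S | L, A), the BN factorization of the joint."""
    p_ksi = P_KSI[(lighting, accident_type)]
    p_severity = p_ksi if severity == "KSI" else 1.0 - p_ksi
    return LIGHTING[lighting] * ACCIDENT_TYPE[accident_type] * p_severity


def mpe(evidence: dict):
    """Return (configuration, joint probability) maximizing the joint, given the evidence."""
    best, best_p = None, -1.0
    for l, a, s in product(LIGHTING, ACCIDENT_TYPE, ("SI", "KSI")):
        config = {"lighting": l, "accident_type": a, "severity": s}
        if any(config[k] != v for k, v in evidence.items()):
            continue  # skip configurations that contradict the evidence
        p = joint(l, a, s)
        if p > best_p:
            best, best_p = config, p
    return best, best_p


print(mpe({}))                   # unconstrained MPE
print(mpe({"severity": "KSI"}))  # MPE once a KSI outcome is observed
```

With these placeholder numbers the unconstrained MPE is (day, collision, SI), whereas conditioning on severity = KSI yields (day, run_off, KSI), which shows how the most probable configuration can shift once evidence is entered. The returned probability is the unnormalized joint, not the posterior.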
The scope of this paper is to validate the possibility of using BNs to classify traffic accidents according to their injury severity, and to find the BN with the best classification performance and the best graphical representation, in order to identify the relevant variables that affect the injury severity of traffic accidents.
This paper is organized as follows. Section 2 presents the data used and briefly reviews the concept of BNs and Bayesian learning; it also presents the methods used for preprocessing and evaluating the data and, finally, briefly describes inference. Section 3 presents and discusses the results. Section 4 gives a summary and conclusions.
Accident data
Accident data for rural highways in the province of Granada (South of Spain) covering three years (2003–2005) were obtained from the Spanish General Traffic Directorate (DGT). The total number of accidents obtained for this period was 3302. The data were first checked for questionable records, and those found to be unrealistic were screened out. Only rural highways were considered in this study; data related to intersections were not included, since intersections have their own specific
Results and discussion
Table 2 shows the results obtained from building BNs with the hill climbing search method and three different score metrics (BDe, MDL and AIC), evaluated on both the training and the test set. Two-thirds of the original dataset were used to train the BNs and the remaining third to test them.
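Score-based structure learning by hill climbing can be sketched as follows. This is a minimal illustration, assuming fully observed discrete data stored as dictionaries with string variable names; it is not the software used by the authors. Only two scores are implemented, both as penalized log-likelihoods: MDL subtracts (log N)/2 per free parameter and AIC subtracts 1 per free parameter. BDe, which requires a Dirichlet prior, is omitted. The function names and the `max_parents` limit are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def local_score(data, child, parents, arity, penalty):
    """Penalized log-likelihood of one node given its parent set (MDL or AIC)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    ll = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    k = q * (arity[child] - 1)                # free parameters in the node's CPT
    weight = 0.5 * math.log(n) if penalty == "MDL" else 1.0  # otherwise AIC
    return ll - weight * k


def creates_cycle(parents, src, dst):
    """True if adding src -> dst closes a directed cycle, i.e. dst is an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def hill_climb(data, variables, penalty="MDL", max_parents=2):
    """Greedy search over add / delete / reverse arc moves, starting from the empty graph."""
    arity = {v: len({row[v] for row in data}) for v in variables}
    parents = {v: () for v in variables}
    score = {v: local_score(data, v, (), arity, penalty) for v in variables}

    def gain(node, new_parents):
        return local_score(data, node, new_parents, arity, penalty) - score[node]

    while True:
        best_delta, best_move = 1e-9, None  # accept only strict improvements
        for src, dst in permutations(variables, 2):
            moves = []
            if src in parents[dst]:
                without = tuple(p for p in parents[dst] if p != src)
                moves.append({dst: without})  # delete src -> dst
                trial = {**parents, dst: without}
                if len(parents[src]) < max_parents and not creates_cycle(trial, dst, src):
                    moves.append({dst: without, src: tuple(sorted(parents[src] + (dst,)))})  # reverse
            elif len(parents[dst]) < max_parents and not creates_cycle(parents, src, dst):
                moves.append({dst: tuple(sorted(parents[dst] + (src,)))})  # add src -> dst
            for move in moves:
                delta = sum(gain(node, new) for node, new in move.items())
                if delta > best_delta:
                    best_delta, best_move = delta, move
        if best_move is None:
            return parents
        for node, new_parents in best_move.items():
            parents[node] = new_parents
            score[node] = local_score(data, node, new_parents, arity, penalty)


# Toy data: B copies A, C is independent of both (placeholder values, not accident data).
toy = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 3} for i in range(240)]
print(hill_climb(toy, ["A", "B", "C"], penalty="MDL"))
```

Because both scores decompose over nodes, each candidate move only requires rescoring the one or two nodes whose parent sets change. Since (log N)/2 exceeds 1 for all but tiny datasets, MDL penalizes extra parameters more heavily than AIC and tends to produce sparser networks on the same data.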
Ten different training/test dataset schemes were used to analyze the effect of swapping the training and test datasets. Table 2 shows the average and the standard
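The ten training/test schemes can be sketched as repeated random holdout: shuffle the records, keep two-thirds for training and one-third for testing, evaluate, and report the mean and standard deviation over the repetitions. This is a minimal sketch that assumes unstratified random splits and a user-supplied `evaluate(train, test)` function; the paper's exact partitioning procedure is not described in this excerpt, and the majority-class baseline is a hypothetical stand-in for a trained BN.

```python
import random
import statistics
from collections import Counter


def repeated_holdout(records, evaluate, n_repeats=10, train_fraction=2 / 3, seed=0):
    """Mean and sample standard deviation of evaluate(train, test) over random splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    scores = []
    for _ in range(n_repeats):
        shuffled = list(records)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


def majority_class_accuracy(train, test):
    """Hypothetical baseline: predict the most frequent training label for every test record."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(label == majority for _, label in test) / len(test)


# Toy (features, label) records; the labels are placeholders, not DGT data.
toy = [({"id": i}, "KSI" if i % 5 == 0 else "SI") for i in range(300)]
mean_acc, sd_acc = repeated_holdout(toy, majority_class_accuracy)
print(f"accuracy = {mean_acc:.3f} +/- {sd_acc:.3f}")
```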
Limitations of the study
Before the conclusions, some limitations of the study should be pointed out:
- The need for large datasets when working with Bayesian networks, and the effect that an imbalanced dataset (slightly injured versus killed or seriously injured) has on both sensitivity and specificity.
- The data collection is based on the standard traffic police report used in Spain. Therefore, the variable cause of the accident (CAU) was determined based on the judgement and experience of the traffic police. However, a different person might have
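The effect of class imbalance on sensitivity and specificity can be illustrated with a minimal sketch that treats KSI as the positive class. The 90/10 class split and the trivial classifier below are hypothetical, chosen only to show why overall accuracy alone can be misleading; they are not the proportions in the study's data.

```python
def sensitivity_specificity(y_true, y_pred, positive="KSI"):
    """Sensitivity (recall of the positive class) and specificity (recall of the negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical imbalanced test set: 90 slightly injured (SI), 10 KSI.
y_true = ["SI"] * 90 + ["KSI"] * 10
y_pred = ["SI"] * 100  # a classifier that never predicts KSI
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(accuracy, sens, spec)  # 0.9 0.0 1.0
```

Such a classifier reaches 90% accuracy and perfect specificity while detecting none of the KSI accidents, which is why both measures are reported rather than accuracy alone.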
Summary and conclusions
This paper uses BNs to analyze traffic accident data in order to validate the ability of this data-mining technique to classify traffic accidents according to their injury severity, and to identify the significant factors that are associated with KSI in traffic accidents.
Traffic accident data were obtained from the DGT for a period of three years (2003–2005) for Granada (Spain). Three BNs were built using three different score metrics: BDe, MDL and AIC.
Several indicators have been used in order
Acknowledgements
The authors are grateful to the Spanish General Directorate of Traffic (DGT) for supporting this research and offering all the resources available to them. The authors appreciate the reviewers' comments and their efforts to improve the paper.
References (26)
Analysis of driver injury severity levels at multiple locations using ordered probit models. Journal of Safety Research (2003).
Using logistic regression to estimate the influence of accident factors on accident severity. Accident Analysis and Prevention (2002).
The independent contribution of driver, crash, and vehicle characteristics to driver fatalities. Accident Analysis and Prevention (2002).
Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accident Analysis and Prevention (2006).
Diagnosis of breast cancer using Bayesian networks: a case study. Computers in Biology and Medicine (2007).
Injury severity analysis of accidents involving young male drivers in Great Britain. Journal of Safety Research (2008).
Driver injury severity: an application of ordered probit models. Accident Analysis and Prevention (2002).
On the classification performance of TAN and general Bayesian networks. Journal of Knowledge-Based Systems (2009).
Highway accident severities and the mixed logit model: an exploratory empirical analysis. Accident Analysis and Prevention (2008).
Estimation of incident clearance times using Bayesian Networks approach. Accident Analysis and Prevention (2006).
Classification tree to identify severe and moderate injuries in young and middle aged adults. Artificial Intelligence in Medicine.
Age and gender patterns in motor vehicle crash injuries: importance of type of crash and occupant role. Accident Analysis and Prevention.
Bivariate ordered-response probit model of driver's and passenger's injury severities in collisions with fixed objects. Accident Analysis and Prevention.
¹ These authors have contributed equally to this work.