Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks

doi:10.1016/j.aap.2010.09.010

Accident Analysis & Prevention

Volume 43, Issue 1, January 2011, Pages 402-411

https://doi.org/10.1016/j.aap.2010.09.010

Abstract

Several factors contribute to injury severity in traffic accidents, such as driver characteristics, highway characteristics, vehicle characteristics, accident characteristics, and atmospheric factors. This paper shows the possibility of using Bayesian Networks (BNs) to classify traffic accidents according to their injury severity. BNs can make predictions without prior assumptions about the relationships between variables, and they provide graphical representations of complex systems with interrelated components. This paper presents an analysis of 1536 accidents on rural highways in Spain, where 18 variables representing the aforementioned contributing factors were used to build 3 different BNs that classified the severity of accidents into slightly injured and killed or seriously injured (KSI). The variables that best identify the factors associated with a killed or seriously injured accident (accident type, driver age, lighting and number of injuries) were identified by inference.

Research highlights

▶ Bayesian Networks are usefully applied in the domain of traffic accident modeling.
▶ BNs are used for classifying traffic accidents according to their injury severity.
▶ BN inference identifies variables associated with KSI (killed or seriously injured).
▶ The key variables for KSI were accident type, age, lighting and number of injuries.

Introduction

The number of traffic accidents and their effects, mainly human fatalities and injuries, justify the importance of analyzing the factors that contribute to their occurrence. Identifying the factors that significantly influence the injury severity of traffic accidents has been the main objective of many previous studies. The injury severity of a traffic accident is usually affected by one or more of the following factors: driver characteristics, highway characteristics, vehicle characteristics, accident characteristics and atmospheric factors (Kopelias et al., 2007; Chang and Wang, 2006).

Regression analysis has been widely used to determine the factors that contribute to a specific injury severity. The most commonly used regression models in traffic injury analysis are the logistic regression model and the ordered probit model (Al-Ghamdi, 2002; Milton et al., 2008; Bédard et al., 2002; Yau et al., 2006; Yamamoto and Shankar, 2004; Kockelman and Kweon, 2002). However, most of the regression models used to model traffic injury severity rely on their own model assumptions and on pre-defined underlying relationships between dependent and independent variables (i.e. linear relations between the variables) (Chang and Wang, 2006). If these assumptions are violated, the model could lead to erroneous estimates of the likelihood of severe injury.

Gregoriades (2007) highlighted the value of using Bayesian Networks (BNs) to model traffic accidents and argued that traffic accidents should not be treated as a deterministic assessment problem. Instead, researchers should model the uncertainties involved in the factors that can lead to road accidents. He listed a number of candidate approaches for modeling uncertainty, such as Bayesian probability.

BNs make it easy to describe accidents that involve many interdependent variables. The relationships among the variables and the structure of the network can be learned from accident data. BNs do not require any pre-defined relationships between dependent and independent variables.

The three main advantages of BNs are bi-directional induction, incorporation of missing variables and probabilistic inference. By using BNs, it is relatively easy to discover the underlying patterns of data, to investigate the relationships between variables and to make predictions using these relationships. Incident data used in a study by Ozbay and Noyan (2006) were collected from incident clearance survey forms to understand incident clearance characteristics and then used to develop incident duration prediction models. The researchers modeled the incidents’ clearance durations using BNs and were able to represent the stochastic nature of incidents.
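The bi-directional reasoning mentioned above can be illustrated with a minimal sketch. The two-node network (lighting influencing severity) and every probability in it are hypothetical values chosen for illustration; they are not taken from the paper's data or from its learned networks.

```python
# Hypothetical two-node network: Lighting -> Severity.
# All probabilities below are invented for illustration only.
p_lighting = {"daylight": 0.7, "dark": 0.3}            # P(Lighting)
p_ksi_given_lighting = {"daylight": 0.35, "dark": 0.55}  # P(KSI | Lighting)


def p_ksi():
    """Marginal P(KSI) when lighting is not observed (summed out)."""
    return sum(p_lighting[l] * p_ksi_given_lighting[l] for l in p_lighting)


def p_lighting_given_ksi(lighting):
    """Diagnostic direction: P(Lighting | KSI) via Bayes' rule."""
    return p_lighting[lighting] * p_ksi_given_lighting[lighting] / p_ksi()


# Predictive direction (cause to effect): read directly from the table.
print("P(KSI | dark)      =", p_ksi_given_lighting["dark"])
# Unobserved lighting is handled by marginalization.
print("P(KSI)             =", round(p_ksi(), 4))
# Diagnostic direction (effect to cause): computed by Bayes' rule.
print("P(dark | KSI)      =", round(p_lighting_given_ksi("dark"), 4))
```

The same conditional probability table supports reasoning from cause to effect and from effect to cause, and a variable that is not observed is simply summed out.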

Studies that use BNs to analyze traffic accident injury severity are scarce. A two-car accident injury severity model was constructed using BNs (Simoncic, 2004). A BN was built using several variables, and the Most Probable Explanation (MPE), that is, the most probable configuration of values for all the variables in the BN, was calculated to serve as an indication of the quality of the estimated BN. The results indicated that BNs could be applied in road accident modeling, and some improvements, such as using more variables and larger datasets, were recommended. Although this study highlighted the possibility of using BNs to model traffic accidents, the results were based on building only one possible network, without measuring the performance of the Bayesian classifier.
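As a rough illustration of the Most Probable Explanation, the sketch below enumerates every joint configuration of a small network and returns the one with the highest probability. The three-variable structure, the variable names and all probabilities are hypothetical; they are not the network or the values reported by Simoncic (2004). Exhaustive enumeration is only practical for very small networks.

```python
from itertools import product

# Hypothetical network: AccidentType -> Severity <- Lighting.
# All numbers are invented for illustration only.
p_type = {"head_on": 0.2, "run_off_road": 0.8}
p_light = {"daylight": 0.7, "dark": 0.3}
p_ksi = {  # P(Severity = KSI | AccidentType, Lighting)
    ("head_on", "daylight"): 0.6,
    ("head_on", "dark"): 0.7,
    ("run_off_road", "daylight"): 0.3,
    ("run_off_road", "dark"): 0.5,
}


def joint(acc_type, light, severity):
    """Joint probability from the chain rule of the network."""
    p = p_ksi[(acc_type, light)]
    p_sev = p if severity == "KSI" else 1.0 - p
    return p_type[acc_type] * p_light[light] * p_sev


def most_probable_explanation(evidence=None):
    """Return the most probable full configuration consistent with evidence.

    The returned probability is the joint probability of that configuration,
    not a probability conditioned on the evidence.
    """
    evidence = evidence or {}
    best_config, best_p = None, -1.0
    for acc_type, light, sev in product(p_type, p_light, ("SI", "KSI")):
        config = {"type": acc_type, "lighting": light, "severity": sev}
        if any(config[k] != v for k, v in evidence.items()):
            continue  # skip configurations that contradict the evidence
        p = joint(acc_type, light, sev)
        if p > best_p:
            best_config, best_p = config, p
    return best_config, best_p


print(most_probable_explanation())
print(most_probable_explanation({"severity": "KSI"}))
```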

The aim of this paper is to validate the possibility of using BNs to classify traffic accidents according to their injury severity, and to find the best BN classification performance along with the best graphical representation, in order to identify the relevant variables that affect the injury severity of traffic accidents.

This paper is organized as follows. Section 2 presents the data used and briefly reviews the concept of BNs and Bayesian learning. The methods used for preprocessing and evaluating the data are also presented; finally a brief description of inference is presented. In Section 3, the results and their discussion are presented. In Section 4, summary and conclusions are given.

Section snippets

Accident data

Accident data were obtained from the Spanish General Traffic Directorate (DGT) for rural highways in the province of Granada (southern Spain) for three years (2003–2005). The total number of accidents obtained for this period was 3302. The data were first checked for questionable records, and those found to be unrealistic were screened out. Only rural highways were considered in this study; data related to intersections were not included, since intersections have their own specific

Results and discussion

Table 2 shows the results obtained from building BNs using the hill climbing search method and three different score metrics (BDe, MDL and AIC) using both the training and the test set to validate the results. From the original dataset, 2/3 of the data was held for training the BNs and the other 1/3 was used for testing them.
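The 2/3 training and 1/3 test split described above, repeated over several partitions and summarized by the mean and standard deviation of each indicator, might be sketched as follows. Several parts are assumptions: the records are synthetic (not DGT data), the partitions are drawn by random shuffling, the indicators shown are accuracy, sensitivity and specificity, and a naive Bayes classifier (the simplest BN structure) stands in for the networks learned by hill climbing.

```python
import random
import statistics
from collections import Counter, defaultdict


def make_toy_accidents(n, rng):
    """Generate hypothetical (features, label) records; not real accident data."""
    records = []
    for _ in range(n):
        lighting = rng.choice(["daylight", "dark"])
        acc_type = rng.choice(["head_on", "run_off_road", "rear_end"])
        p_ksi = 0.25 + (0.25 if lighting == "dark" else 0.0)
        p_ksi += 0.3 if acc_type == "head_on" else 0.0
        label = "KSI" if rng.random() < p_ksi else "SI"
        records.append(({"lighting": lighting, "type": acc_type}, label))
    return records


def train_naive_bayes(train):
    """Stand-in classifier: naive Bayes with Laplace smoothing."""
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    values = defaultdict(set)            # feature -> set of observed values
    for feats, label in train:
        for f, v in feats.items():
            value_counts[(f, label)][v] += 1
            values[f].add(v)

    def predict(feats):
        best_label, best_score = None, -1.0
        for label, c in class_counts.items():
            score = c / len(train)
            for f, v in feats.items():
                score *= (value_counts[(f, label)][v] + 1) / (c + len(values[f]))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def evaluate(predict, test, positive="KSI"):
    """Accuracy, sensitivity and specificity with KSI as the positive class."""
    tp = fn = tn = fp = 0
    for feats, label in test:
        pred = predict(feats)
        if label == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(test),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def repeated_holdout(records, n_repeats=10, train_fraction=2 / 3, seed=0):
    """Repeat a random train/test split; return (mean, std) per indicator."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        shuffled = records[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        predict = train_naive_bayes(shuffled[:cut])
        results.append(evaluate(predict, shuffled[cut:]))
    return {
        m: (statistics.mean([r[m] for r in results]),
            statistics.stdev([r[m] for r in results]))
        for m in results[0]
    }


data = make_toy_accidents(1536, random.Random(42))
for metric, (mean, std) in repeated_holdout(data).items():
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when one severity class is much rarer than the other, since accuracy alone can hide poor detection of the minority class.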

Ten different schemes of training/testing datasets were used to analyze the effect of swapping training and test datasets. Table 2 shows the average and the standard

Limitations of the study

Before conclusions, some limitations should be pointed out:

• The need for large datasets when working with Bayesian networks, and the effect that an imbalanced dataset (slightly injured versus killed or seriously injured) has on both sensitivity and specificity.
• The data collection is based on the standard traffic police report used in Spain. The variable cause of the accident (CAU) was therefore determined and judged based on the experience of the traffic police. However, a different person might have

Summary and conclusions

This paper uses BNs to analyze traffic accident data in order to validate the ability of this data-mining technique to classify traffic accidents according to their injury severity, and to identify the significant factors that are associated with KSI in traffic accidents.

Traffic accident data was obtained from the DGT for a period of three years (2003–2005) for Granada (Spain). Three BNs were built using three different score metrics: BDe, MDL and AIC.
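To make the role of a score metric concrete, the sketch below scores two candidate structures on a toy dataset using the common textbook forms of AIC (log-likelihood minus the number of free parameters K) and MDL (log-likelihood minus (K/2) ln N). These forms, the function names and the data are assumptions for illustration; the excerpt does not give the exact definitions used in the paper, and BDe is omitted for brevity.

```python
import math
from collections import Counter


def log_likelihood_and_params(data, structure, cardinality):
    """Maximum-likelihood log-likelihood and free-parameter count.

    structure maps each node to a tuple of its parents;
    cardinality maps each node to its number of states.
    """
    log_lik, n_params = 0.0, 0
    for node, parents in structure.items():
        joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
        parent_counts = Counter(tuple(r[p] for p in parents) for r in data)
        for (pa, _value), n in joint.items():
            log_lik += n * math.log(n / parent_counts[pa])
        n_parent_configs = math.prod(cardinality[p] for p in parents)
        n_params += (cardinality[node] - 1) * n_parent_configs
    return log_lik, n_params


def aic_score(data, structure, cardinality):
    ll, k = log_likelihood_and_params(data, structure, cardinality)
    return ll - k


def mdl_score(data, structure, cardinality):
    ll, k = log_likelihood_and_params(data, structure, cardinality)
    return ll - 0.5 * k * math.log(len(data))


def rows(n_dark_ksi, n_dark_si, n_day_ksi, n_day_si):
    """Build a hypothetical dataset from four cell counts."""
    cells = [("dark", "KSI", n_dark_ksi), ("dark", "SI", n_dark_si),
             ("daylight", "KSI", n_day_ksi), ("daylight", "SI", n_day_si)]
    return [{"lighting": l, "severity": s} for l, s, n in cells for _ in range(n)]


cardinality = {"lighting": 2, "severity": 2}
no_edge = {"lighting": (), "severity": ()}
with_edge = {"lighting": (), "severity": ("lighting",)}

dependent = rows(40, 10, 10, 40)  # severity clearly depends on lighting
for name, structure in (("no edge", no_edge), ("lighting -> severity", with_edge)):
    print(name,
          round(aic_score(dependent, structure, cardinality), 2),
          round(mdl_score(dependent, structure, cardinality), 2))
```

A search procedure such as hill climbing would repeatedly propose adding, removing or reversing an edge and keep the change whenever the chosen score improves. Both penalties discourage edges that do not improve the fit enough to justify their extra parameters.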

Several indicators have been used in order

Acknowledgements

The authors are grateful to the Spanish General Directorate of Traffic (DGT) for supporting this research and offering all the resources that are available to them. The authors appreciate the reviewers’ comments and effort in order to improve the paper.

References (26)

M. Abdel-Aty
Analysis of driver injury severity levels at multiple locations using ordered probit models
Journal of Safety Research
(2003)
A.S. Al-Ghamdi
Using logistic regression to estimate the influence of accident factors on accident severity
Accident Analysis and Prevention
(2002)
M. Bédard et al.
The independent contribution of driver, crash, and vehicle characteristics to driver fatalities
Accident Analysis and Prevention
(2002)
L.Y. Chang et al.
Analysis of traffic injury severity: an application of non-parametric classification tree techniques
Accident Analysis and Prevention
(2006)
N. Cruz-Ramírez et al.
Diagnosis of breast cancer using Bayesian networks: a case study
Computers in Biology and Medicine
(2007)
R.C. Gray et al.
Injury severity analysis of accidents involving young male drivers in Great Britain
Journal of Safety Research
(2008)
K.M. Kockelman et al.
Driver injury severity: an application of ordered probit models
Accident Analysis and Prevention
(2002)
M.G. Madden
On the classification performance of TAN and general Bayesian networks
Journal of Knowledge-Based Systems
(2009)
J.C. Milton et al.
Highway accident severities and the mixed logit model: an exploratory empirical analysis
Accident Analysis and Prevention
(2008)
K. Ozbay et al.
Estimation of incident clearance times using Bayesian Networks approach
Accident Analysis and Prevention
(2006)
L.J. Scheetz et al.
Classification tree to identify severe and moderate injuries in young and middle aged adults
Artificial Intelligence in Medicine
(2009)
D.R. Tavris et al.
Age and gender patterns in motor vehicle crash injuries: importance of type of crash and occupant role
Accident Analysis and Prevention
(2001)
T. Yamamoto et al.
Bivariate ordered-response probit model of driver's and passenger's injury severities in collisions with fixed objects
Accident Analysis and Prevention
(2004)


¹: These authors have contributed equally to this work.
