Background
Proposed framework
Data preprocessing
Clustering algorithm
Distance measure
Association rules
Interestingness measures
Data set description
S. no. | Attribute | Code | Value | Total | Critical | Non-critical |
---|---|---|---|---|---|---|
1 | NOV: number of injury | 1 | 1 injury | 5932 | 689 | 5243 |
1 | NOV: number of injury | 2 | 2 injuries | 2598 | 451 | 2147 |
1 | NOV: number of injury | +2 | >2 injuries | 3044 | 114 | 2930 |
2 | AGE: age | CHL | <18 years | 988 | 268 | 720 |
2 | AGE: age | YNG | 18–30 years | 5954 | 654 | 5300 |
2 | AGE: age | ADL | 30–60 years | 3045 | 165 | 2880 |
2 | AGE: age | SNR | >60 years | 1587 | 167 | 1420 |
3 | GND: gender | M | Male | 8625 | 952 | 7673 |
3 | GND: gender | F | Female | 2949 | 302 | 2647 |
4 | TOD: time of day | T1 | [0–4] | 678 | 45 | 633 |
4 | TOD: time of day | T2 | [4–8] | 1032 | 164 | 868 |
4 | TOD: time of day | T3 | [8–12] | 1358 | 258 | 1100 |
4 | TOD: time of day | T4 | [12–16] | 1972 | 126 | 1846 |
4 | TOD: time of day | T5 | [16–20] | 3768 | 245 | 3523 |
4 | TOD: time of day | T6 | [20–24] | 2766 | 416 | 2350 |
5 | MON: month | WNT | Winter | 2822 | 325 | 2497 |
5 | MON: month | SPR | Spring | 2787 | 312 | 2475 |
5 | MON: month | SMR | Summer | 3144 | 368 | 2776 |
5 | MON: month | ATM | Autumn | 2821 | 249 | 2572 |
6 | LOR: lighting on road | DLT | Day light | 3850 | 268 | 3582 |
6 | LOR: lighting on road | DUS | Dusk | 3203 | 429 | 2774 |
6 | LOR: lighting on road | RLT | Road light | 1665 | 126 | 1539 |
6 | LOR: lighting on road | NLT | No light | 2856 | 431 | 2425 |
7 | ROF: roadway feature | INT | Intersection | 3526 | 374 | 3152 |
7 | ROF: roadway feature | SLP | Slope | 1157 | 212 | 945 |
7 | ROF: roadway feature | CUR | Curve | 2827 | 266 | 2561 |
7 | ROF: roadway feature | UNK | Unknown | 4064 | 402 | 3662 |
8 | RTY: road type | HIW | Highway | 6032 | 785 | 5247 |
8 | RTY: road type | NHW | Non-highway | 5542 | 469 | 5073 |
9 | ASV: accident severity | CR | Critical | 1254 | 1254 | 0 |
9 | ASV: accident severity | NC | Non-critical | 10320 | 0 | 10320 |
10 | ARA: area around | AGL | Agriculture land | 1984 | 289 | 1695 |
10 | ARA: area around | MAR | Market | 2069 | 145 | 1924 |
10 | ARA: area around | COL | Colony | 3250 | 119 | 3131 |
10 | ARA: area around | FOR | Forest | 1165 | 267 | 898 |
10 | ARA: area around | HIL | Hill area | 2354 | 345 | 2009 |
10 | ARA: area around | HOS | Hospital | 752 | 89 | 663 |
11 | TOA: type of accident | TWH | Two wheeler | 3688 | 194 | 3494 |
11 | TOA: type of accident | THW | Three wheeler | 255 | 55 | 200 |
11 | TOA: type of accident | MVH | Multi-vehicular | 855 | 64 | 791 |
11 | TOA: type of accident | VFH | Vehicle fall height | 2132 | 398 | 1734 |
11 | TOA: type of accident | VRO | Vehicle roll over | 1356 | 129 | 1227 |
11 | TOA: type of accident | PH | Pedestrian hit | 1580 | 265 | 1315 |
11 | TOA: type of accident | NM | Non-motorized | 254 | 12 | 242 |
11 | TOA: type of accident | MC | Multi-casualty | 364 | 16 | 348 |
11 | TOA: type of accident | FO | Fixed object/divider hit | 987 | 121 | 866 |
11 | TOA: type of accident | OT | Others | 103 | 0 | 103 |
Results and discussion
Cluster analysis
Cluster 1 (C1)
Cluster 2 (C2)
Cluster 3 (C3)
Cluster 4 (C4)
Cluster 5 (C5)
Cluster 6 (C6)
Cluster | Cluster description | Count | Size (%) |
---|---|---|---|
1 | Two wheeler accidents on road intersections and curves near colonies and markets | 3181 | 27.48 |
2 | Two wheeler accidents on highways near hill, forest and agricultural land areas | 1772 | 15.31 |
3 | All fall height accidents with two or more injuries | 1928 | 16.66 |
4 | Multiple vehicle accidents and fixed object hit accidents in no light condition | 1394 | 12.04 |
5 | Pedestrian hit cases | 1746 | 15.08 |
6 | Vehicle roll-over accidents | 1553 | 13.42 |
Association rule mining
Rule no. | Rule body | Support | Confidence | Lift |
---|---|---|---|---|
Cluster 1 | | | | |
1 | {HIW, INT, COL} → {1} | 0.54 | 0.75 | 5.24 |
2 | {HIW, CUR} → {1, DUS} | 0.45 | 0.86 | 4.47 |
3 | {NHW, INT, MAR} → {+2, DLT} | 0.65 | 0.61 | 2.31 |
4 | {NHW, INT} → {COL} | 0.38 | 0.52 | 2.63 |
5 | {HIW, MAR} → {+2} | 0.35 | 0.55 | 1.54 |
6 | {NHW, COL} → {+2} | 0.58 | 0.65 | 1.26 |
7 | {INT, COL} → {NLT} | 0.62 | 0.66 | 1.23 |
8 | {MAR, DUS} → {INT, T5} | 0.36 | 0.54 | 1.20 |
9 | {HIW, HOS} → {T6} | 0.54 | 0.84 | 1.14 |
10 | {MAR, T6} → {HIW} | 0.47 | 0.69 | 1.11 |
Cluster 2 | | | | |
11 | {HIW, SLP} → {HIL} | 0.63 | 0.80 | 3.16 |
12 | {HIW, NLT} → {FOR} | 0.56 | 0.74 | 3.14 |
13 | {HIW, AGL} → {+2} | 0.40 | 0.68 | 2.75 |
14 | {HIW} → {AGL} | 0.54 | 0.75 | 2.71 |
15 | {FOR, T6} → {CUR, NLT} | 0.56 | 0.74 | 1.98 |
16 | {CUR} → {HIL, T2} | 0.69 | 0.71 | 1.95 |
17 | {HIL, CUR} → {+2} | 0.36 | 0.65 | 1.65 |
18 | {AGL, CUR} → {T5} | 0.42 | 0.58 | 1.35 |
19 | {YNG, HIL} → {T3, CUR} | 0.45 | 0.64 | 1.23 |
20 | {FOR, UNK} → {NLT} | 0.39 | 0.46 | 1.15 |
Cluster 3 | | | | |
21 | {HIW, HIL} → {+2, CR} | 0.78 | 0.90 | 3.18 |
22 | {CUR, HIL, ADL} → {HIW} | 0.64 | 0.85 | 2.93 |
23 | {HIW, HIL, +2} → {CR} | 0.85 | 0.95 | 2.87 |
24 | {HIW, CUR} → {HIL} | 0.82 | 0.88 | 2.58 |
25 | {FOR} → {NC} | 0.78 | 0.65 | 1.78 |
26 | {NHW} → {FOR} | 0.45 | 0.50 | 1.76 |
27 | {HIL, ADL} → {CR} | 0.42 | 0.65 | 1.64 |
28 | {T2, HIL} → {NLT} | 0.39 | 0.46 | 1.30 |
29 | {CUR} → {HIL, T5} | 0.46 | 0.80 | 1.25 |
30 | {HIL, SLP} → {HIW} | 0.35 | 0.74 | 1.22 |
Cluster 4 | | | | |
31 | {HIW, NLT} → {INT, T6} | 0.65 | 0.78 | 4.85 |
32 | {HIW, CUR} → {NLT, T1} | 0.78 | 0.85 | 3.81 |
33 | {NLT} → {INT} | 0.74 | 0.70 | 3.77 |
34 | {HIW, DAY} → {INT, NC} | 0.70 | 0.84 | 2.73 |
35 | {NHW, NLT} → {SLP} | 0.40 | 0.65 | 3.38 |
36 | {T6, AGL} → {NLT, CR} | 0.55 | 0.65 | 2.56 |
37 | {CUR, T1, HIW} → {FOR} | 0.36 | 0.46 | 2.16 |
38 | {FOR} → {NLT, HIW} | 0.54 | 0.74 | 1.98 |
39 | {RLT, MAR} → {INT, NC} | 0.57 | 0.66 | 1.80 |
40 | {HIW} → {NLT, AGL} | 0.56 | 0.84 | 1.65 |
Cluster 5 | | | | |
41 | {NHW, MAR} → {YNG} | 0.48 | 0.87 | 3.22 |
42 | {COL, INT} → {NHW, DUS} | 0.56 | 0.92 | 3.11 |
43 | {INT, NLT} → {AGL, T2} | 0.58 | 0.74 | 2.31 |
44 | {HIW, INT} → {MAR} | 0.38 | 0.46 | 2.11 |
45 | {T3, DAY} → {NC, INT} | 0.65 | 0.69 | 2.05 |
46 | {NLT, INT} → {CR, T6} | 0.39 | 0.82 | 1.95 |
47 | {HOS, NLT} → {T6} | 0.54 | 0.81 | 1.80 |
48 | {MAR} → {NC, DAY} | 0.46 | 0.64 | 1.78 |
49 | {RLT, INT} → {MAR} | 0.51 | 0.79 | 1.50 |
50 | {AGL, ADL} → {NLT} | 0.36 | 0.78 | 1.45 |
Cluster 6 | | | | |
51 | {FOR, SLP} → {NLT} | 0.45 | 0.65 | 3.62 |
52 | {AGL, DUS} → {CUR} | 0.35 | 0.54 | 3.58 |
53 | {FOR} → {NLT, UNK} | 0.65 | 0.78 | 2.46 |
54 | {HIW, T2} → {AGL} | 0.58 | 0.81 | 2.42 |
55 | {AGL, UNK} → {NHW} | 0.64 | 0.85 | 2.12 |
56 | {UNK} → {DAY, COL} | 0.58 | 0.65 | 1.94 |
57 | {RLT, UNK} → {NC, MAR} | 0.45 | 0.75 | 1.68 |
58 | {ADL, AGL} → {UNK} | 0.64 | 0.68 | 1.42 |
59 | {AGL, INT} → {NC} | 0.39 | 0.75 | 1.35 |
60 | {FOR} → {CUR} | 0.40 | 0.68 | 1.34 |
Entire data set (EDS) | | | | |
61 | {HIW, INT} → {NC, MAR} | 0.40 | 0.75 | 5.45 |
62 | {FOR, NLT, M} → {HIW, TWH} | 0.52 | 0.65 | 4.35 |
63 | {HIW, NC} → {AGL, DAY} | 0.64 | 0.79 | 4.25 |
64 | {NHW, T5} → {NC, COL} | 0.38 | 0.65 | 4.36 |
65 | {HIW, HIL} → {NLT, NC} | 0.42 | 0.74 | 3.89 |
66 | {HIW, YNG} → {NC, TWH} | 0.46 | 0.65 | 3.48 |
67 | {NHW, MAR} → {M, NC} | 0.47 | 0.65 | 3.26 |
68 | {HIW, UNK} → {NC, TWH} | 0.56 | 0.69 | 2.98 |
69 | {NHW, T6, UNK} → {NC} | 0.65 | 0.62 | 2.46 |
70 | {NHW, COL} → {NC, M} | 0.39 | 0.74 | 2.15 |
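
The support, confidence and lift columns can be read against the usual textbook definitions of these measures. The sketch below is illustrative only: it assumes the standard definitions (support as the fraction of records containing every item of the rule, confidence as that support divided by the support of the rule body, lift as confidence divided by the support of the rule head), and the rule table may follow a different convention for support. The five records are hypothetical and are not drawn from the data set; the function names are likewise assumptions made for this example.

```python
from typing import FrozenSet, Iterable, List

Record = FrozenSet[str]


def support(records: List[Record], items: Iterable[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    wanted = frozenset(items)
    hits = sum(1 for r in records if wanted <= r)
    return hits / len(records)


def confidence(records: List[Record], body: Iterable[str], head: Iterable[str]) -> float:
    """support(body and head together) / support(body)."""
    body, head = frozenset(body), frozenset(head)
    return support(records, body | head) / support(records, body)


def lift(records: List[Record], body: Iterable[str], head: Iterable[str]) -> float:
    """confidence(body -> head) / support(head); 1.0 means independence."""
    return confidence(records, body, head) / support(records, head)


# Hypothetical accident records using the attribute codes of the paper.
records = [
    frozenset({"HIW", "SLP", "HIL"}),
    frozenset({"HIW", "SLP", "HIL"}),
    frozenset({"HIW", "CUR", "FOR"}),
    frozenset({"NHW", "INT", "MAR"}),
    frozenset({"NHW", "INT", "COL"}),
]

body, head = {"HIW", "SLP"}, {"HIL"}
rule_support = support(records, body | head)
rule_confidence = confidence(records, body, head)
rule_lift = lift(records, body, head)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f} lift={rule_lift:.2f}")
```

A lift above 1 indicates that the rule body and head occur together more often than they would if they were independent, which is why the rules in each group are listed in roughly decreasing order of lift.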
- Only two wheeler accidents are identified in the EDS rules that satisfy the minimum support of 30 %; the other accident types remain hidden.
- Rules for the EDS do not reveal the specific impact of road features on accidents: they show only that intersections are accident prone for every accident type, whereas the rules for the clusters show that this likelihood varies from cluster to cluster.
- Forming clusters before rule generation yields rules that are mainly associated with the cluster in question, whereas the rules for the EDS show only associations common to all accident types, which are not interesting.
- The unknown road feature appears frequently in the EDS rules, but after cluster analysis its impact appears to be associated with only a few clusters.
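
The first observation can be checked against the accident-type (TOA) counts in the data set description. The sketch below assumes that support is the fraction of all records containing an item; under that assumption, no rule that contains an accident type can have higher support than the accident type on its own, so any type below the 30 % threshold cannot appear in an EDS rule. The variable names are chosen for this example only.

```python
# Accident-type counts copied from the TOA rows of the data set description.
toa_counts = {
    "TWH": 3688,  # two wheeler
    "THW": 255,   # three wheeler
    "MVH": 855,   # multi-vehicular
    "VFH": 2132,  # vehicle fall height
    "VRO": 1356,  # vehicle roll over
    "PH": 1580,   # pedestrian hit
    "NM": 254,    # non-motorized
    "MC": 364,    # multi-casualty
    "FO": 987,    # fixed object/divider hit
    "OT": 103,    # others
}

MIN_SUPPORT = 0.30  # minimum support threshold stated in the text

total = sum(toa_counts.values())
supports = {code: count / total for code, count in toa_counts.items()}
frequent = [code for code, s in supports.items() if s >= MIN_SUPPORT]

# List the accident types from most to least frequent.
for code, s in sorted(supports.items(), key=lambda kv: kv[1], reverse=True):
    status = "kept" if s >= MIN_SUPPORT else "pruned"
    print(f"{code}: support={s:.3f} ({status})")
```

With these counts only TWH clears the threshold on the entire data set, which is consistent with two wheeler accidents being the only accident type visible in the EDS rules.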