1 Introduction
2 Related work
3 The proposed data mining system
3.1 Data aggregation
3.2 Preprocessing
Field number | Description |
---|---|
1 | Age |
2 | Sex |
3 | BMI |
4 | Inherit |
5 | SBP |
6 | DBP |
7 | Education |
8 | FBS |
9 | 2hpp-BS |
10 | TC |
11 | LDL |
12 | HDL |
13 | TG |
14 | BUN/CR |
15 | Activity |
16 | Dispeplimi |
17 | Eye problem |
18 | High pressure |
19 | Dialyze |
20 | Heart problem |
21 | Stroke |
22 | Foot ulcers |
23 | DKA |
24 | Smoke |
3.2.1 Data cleaning (DC)
Category | Body mass index (kg/m\(^2\)) |
---|---|
Underweight | < 18.5 |
Normal weight | [18.5; 25) |
Overweight | [25; 30] |
Obese | > 30 |
Category | Systolic blood pressure (mmHg) | Diastolic blood pressure (mmHg) |
---|---|---|
Low blood pressure | < 90 | < 60 |
Normal | [90; 120) | [60; 80) |
The risk of high blood pressure | [120; 140] | [80; 90] |
High blood pressure | > 140 | > 90 |
Blood sugar test | Normal (mg/dL) | Pre-diabetic (mg/dL) | Diabetic (mg/dL) |
---|---|---|---|
Fasting | [70; 100] | [100; 130] | > 127 |
2 h after meal | < 140 | [140; 200] | > 200 |
Category | Cholesterol level (mg/dL) |
---|---|
Normal | < 200 |
At risk | [200; 240] |
High cholesterol | > 240 |
Category | HDL (mg/dL) |
---|---|
Low values | < 40 |
Average values | [40; 60] |
Normal | > 60 |
Category | LDL (mg/dL) |
---|---|
Normal | < 100 |
Values close to normal | [100; 130) |
At the beginning of the danger | [130; 160) |
At risk | [160; 190] |
Very high risk level | > 190 |
Category | Triglyceride (mg/dL) |
---|---|
Normal | < 150 |
Values close to normal | [150; 200) |
Dangerous levels | [200; 500) |
Very high risk level | > 500 |
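The category tables above amount to simple range lookups. As a minimal sketch, the LDL table can be encoded as follows; the function name `ldl_category` is hypothetical and is not taken from the paper, and the interval bounds are copied from the table as printed.

```python
def ldl_category(ldl_mg_dl: float) -> str:
    """Map an LDL value (mg/dL) to the category listed in the LDL table.

    Hypothetical helper for illustration only; the bounds follow the table:
    < 100, [100; 130), [130; 160), [160; 190], > 190.
    """
    if ldl_mg_dl < 100:
        return "Normal"
    if ldl_mg_dl < 130:
        return "Values close to normal"
    if ldl_mg_dl < 160:
        return "At the beginning of the danger"
    if ldl_mg_dl <= 190:  # the table closes this interval at 190
        return "At risk"
    return "Very high risk level"


if __name__ == "__main__":
    for value in (95, 100, 159.9, 190, 191):
        print(value, "->", ldl_category(value))
```

The other tables could be handled the same way, one function per table, as long as each interval endpoint is treated as open or closed exactly as the table states.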
3.2.2 Data transformation (DT)
3.3 Proposed classification method
Field | Description |
---|---|
Dispeplimi | An increase in blood lipids |
Eye problem | Eye problem |
High pressure | High blood pressure |
Dialyze | Dialysis history |
Heart problem | Heart problems |
Stroke | Stroke |
Foot ulcers | Diabetic foot ulcer |
DKA | Diabetic coma |
3.3.1 Grey wolf optimizer (GWO)
- Generate an initial population of wolves from a set of random solutions,
- Calculate the corresponding objective value for each wolf,
- Choose the three best wolves and save them as alpha, beta, and delta,
- Update the positions of the rest of the population (omega wolves) using the equations given in [40],
- Update the parameters a, A, and C,
- Return to the second step if the stopping criterion is not satisfied,
- Return the position and score of the alpha solution as the best solution.
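The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the update equations of [40] are not reproduced in this section, so the sketch uses the commonly published GWO rules (a decreasing linearly from 2 to 0, A = 2a·r1 − a, C = 2·r2, and the new position taken as the mean of the three leader-guided positions). The `sphere` objective, the bounds, and all parameter values are hypothetical, and every wolf is moved in each iteration, as common implementations do.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(objective, dim, n_wolves=20, n_iter=200, lo=-10.0, hi=10.0, seed=0):
    """Minimal grey wolf optimizer sketch (minimization)."""
    rng = random.Random(seed)
    # Generate the initial population from random solutions.
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    for t in range(n_iter):
        # Evaluate every wolf and keep the three best as alpha, beta, delta.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        # Parameter a decreases linearly from 2 towards 0.
        a = 2.0 * (1.0 - t / n_iter)
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - w[d])
                    guided += leader[d] - A * dist
                # Average the three guided positions and clip to the bounds.
                pos.append(min(hi, max(lo, guided / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    # The best wolf of the final population plays the role of alpha.
    best = min(wolves, key=objective)
    return best, objective(best)


if __name__ == "__main__":
    position, score = gwo(sphere, dim=2)
    print(position, score)
```

A fixed iteration count stands in for the stopping criterion here; any other criterion could replace the `for t in range(n_iter)` loop condition.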
3.3.2 Improved GWO using weighted adaptive median filter
3.3.3 Applying filter at each step of the GWO implementation
- \(P=0.1\) when agent(i) is \(X_{\alpha }\)
- \(P=0.2\) when agent(i) is \(X_{\beta }\)
- \(P=0.3\) when agent(i) is \(X_{\delta }\)
- \(P=0.4\) when agent(i) is \(X_{\omega }\)

- \(weight(j) = 4\) when window(j) is \(X_{\alpha }\)
- \(weight(j) = 3\) when window(j) is \(X_{\beta }\)
- \(weight(j) = 2\) when window(j) is \(X_{\delta }\)
- \(weight(j) = 1\) when window(j) is \(X_{\omega }\)
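To make the role of the weights concrete, the sketch below computes a weighted median over a window of values. It rests on an assumption that this section does not confirm: each weight is treated as a repetition count, so a value belonging to \(X_{\alpha }\) is counted four times and a value belonging to \(X_{\omega }\) once. The names `WEIGHTS` and `weighted_median` and the sample window are hypothetical.

```python
# Weights from the list above, keyed by the rank of the window entry.
WEIGHTS = {"alpha": 4, "beta": 3, "delta": 2, "omega": 1}


def weighted_median(values, ranks):
    """Median of the window after repeating each value weight(rank) times.

    Assumption: the weights act as repetition counts, which is one common
    way to define a weighted median filter.
    """
    expanded = []
    for value, rank in zip(values, ranks):
        expanded.extend([value] * WEIGHTS[rank])
    expanded.sort()
    n = len(expanded)
    mid = n // 2
    if n % 2:
        return expanded[mid]
    return (expanded[mid - 1] + expanded[mid]) / 2


if __name__ == "__main__":
    window = [1.0, 5.0, 9.0, 20.0]
    ranks = ["alpha", "beta", "delta", "omega"]
    print(weighted_median(window, ranks))
```

With these weights the result is pulled towards the values of the higher-ranked wolves, while a single outlying omega value has little influence.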
3.3.4 Preventing the improved GWO from getting stuck in local optima
3.3.5 Feature selection
4 Experimental results and discussion
Feature \ Diabetic complication | Increase blood lipid | Eye problem | High blood pressure | Dialysis history | Heart problems | Stroke | Diabetic foot ulcer | Diabetic coma |
---|---|---|---|---|---|---|---|---|
Age | 0.3476 | 0.4090 | 0.5183 | 0.3525 | 0.6142 | 0.1574 | 0.6354 | 0.1962 |
Sex | 0.2946 | 0.2068 | 0.1653 | 0.9535 | 0.8275 | 0.2466 | 0.1311 | 0.5397 |
BMI | 0.9583 | 0.7772 | 0.0095 | 0.9988 | 0.7025 | 0.7926 | 0.1762 | 0.8739 |
Inherit | 0.6989 | 0.9995 | 0.3814 | 0.1835 | 0.5176 | 0.3391 | 0.9725 | 0.4856 |
SBP | 0.8754 | 0.6344 | 0.1338 | 0.7600 | 0.4164 | 0.7215 | 0.4866 | 0.1218 |
DBP | 0.1390 | 0.1888 | 0.0891 | 0.6073 | 0.1953 | 0.8372 | 0.9278 | 0.7392 |
Education | 0.0920 | 0.0582 | 0.0373 | 0.0033 | 0.1344 | 0.2010 | 0.9968 | 0.5065 |
FBS | 0.4739 | 0.1898 | 0.3180 | 0.3105 | 0.7651 | 0.6029 | 0.2356 | 0.8864 |
2hpp-BS | 0.8580 | 0.9092 | 0.5889 | 0.6691 | 0.2965 | 0.1517 | 0.5593 | 0.5515 |
TC | 0.2877 | 0.7195 | 0.7506 | 0.4329 | 0.9505 | 0.0547 | 0.2578 | 0.9427 |
LDL | 0.2854 | 0.7660 | 0.6323 | 0.2741 | 0.8186 | 0.7312 | 0.0073 | 0.6617 |
HDL | 0.2668 | 0.6144 | 0.3132 | 0.2197 | 0.9569 | 0.2779 | 0.1230 | 0.7680 |
TG | 0.5718 | 0.6771 | 0.0153 | 0.5283 | 0.7770 | 0.0930 | 0.8608 | 0.7347 |
BUN/CR | 0.6923 | 0.4011 | 0.2842 | 0.0802 | 0.3140 | 0.4216 | 0.7524 | 0.6533 |
Activity | 0.5928 | 0.8015 | 0.1400 | 0.8081 | 0.0898 | 0.9237 | 0.4764 | 0.1142 |
Dispeplimi | 0.9952 | 0.0124 | 0.0127 | 0.0366 | 0.0953 | 0.0926 | 0.0442 | 0.2836 |
Eye problem | 0.0945 | 0.9869 | 0.0241 | 0.0855 | 0.1660 | 0.0843 | 0.2006 | 0.2713 |
High pressure | 0.0810 | 0.0089 | 0.9651 | 0.0348 | 0.0865 | 0.0867 | 0.0026 | 0.0700 |
Dialyze | 0.0397 | 0.2932 | 0.2262 | 0.9874 | 0.1794 | 0.0763 | 0.2470 | 0.0349 |
Heart problem | 0.0171 | 0.0162 | 0.1728 | 0.0506 | 0.9715 | 0.0265 | 0.0818 | 0.0924 |
Stroke | 0.1676 | 0.1267 | 0.0882 | 0.0131 | 0.0535 | 0.9770 | 0.0134 | 0.0049 |
Foot ulcers | 0.0904 | 0.0099 | 0.0148 | 0.0517 | 0.0721 | 0.0395 | 0.9499 | 0.0956 |
DKA | 0.0579 | 0.0902 | 0.0811 | 0.0678 | 0.0628 | 0.1652 | 0.1525 | 0.9754 |
Smoke | 0.2796 | 0.5537 | 0.7837 | 0.5844 | 0.1163 | 0.4912 | 0.3710 | 0.6156 |
4.1 Prediction of health complication
4.1.1 Increased blood lipids complication
4.1.2 Eye problem complication
4.1.3 High blood pressure complication
4.1.4 Dialysis history complication
4.1.5 Heart attack complications
4.1.6 Stroke complications
4.1.7 Diabetes foot ulcer complication
4.1.8 Diabetes coma complication
4.2 Evaluation and comparison of proposed method based on machine learning algorithms
Diabetes complications | Proposed method | Decision tree | Simple Bayes | MLP NN |
---|---|---|---|---|
Increase blood lipid | 96.0 | 94.0 | 81.0 | 92.0 |
Eye problem | 94.0 | 91.0 | 73.0 | 85.0 |
High blood pressure | 92.0 | 89.0 | 69.0 | 78.0 |
Dialysis history | 97.0 | 95.0 | 93.0 | 94.0 |
Heart problems | 95.0 | 89.0 | 72.0 | 86.0 |
Stroke | 96.0 | 93.0 | 90.0 | 94.0 |
Diabetic foot ulcer | 96.0 | 94.0 | 82.0 | 93.0 |
Diabetic coma | 97.0 | 93.0 | 89.0 | 94.0 |
4.2.1 Increased blood lipids complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1160 | 66 | 261 | 12 | 0.99 | 0.96 | 0.99 | 0.80 | 0.97 | Complication | Proposed method |
261 | 12 | 1160 | 66 | 0.80 | 0.96 | 0.80 | 0.99 | 0.87 | No complication | |
1228 | 66 | 185 | 20 | 0.98 | 0.95 | 0.98 | 0.74 | 0.96 | Complication | Decision tree |
185 | 20 | 1228 | 66 | 0.74 | 0.90 | 0.74 | 0.98 | 0.81 | No complication | |
1076 | 111 | 140 | 172 | 0.86 | 0.91 | 0.86 | 0.56 | 0.88 | Complication | Simple Bayes |
140 | 172 | 1076 | 111 | 0.56 | 0.45 | 0.56 | 0.86 | 0.50 | No complication | |
1217 | 92 | 159 | 31 | 0.98 | 0.93 | 0.98 | 0.63 | 0.95 | Complication | MLP NN |
159 | 31 | 1217 | 92 | 0.63 | 0.84 | 0.63 | 0.98 | 0.72 | No complication |
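The derived columns in these tables follow from the four confusion-matrix counts. As a minimal sketch, the standard definitions can be written as below; the function name `binary_metrics` is hypothetical, and the sample counts are taken from the first row of the table above. Recall and sensitivity are the same quantity, TP / (TP + FN), so the sketch returns a single value for both.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)          # identical to sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Counts from the "Complication" row of the proposed method.
    metrics = binary_metrics(tp=1160, fp=66, tn=261, fn=12)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Swapping TP with TN and FP with FN gives the metrics for the opposite class, which is how the "No complication" rows relate to the "Complication" rows.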
4.2.2 Eye problem complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1161 | 71 | 256 | 11 | 0.97 | 0.94 | 0.98 | 0.78 | 0.97 | Complication | Proposed method |
256 | 11 | 1161 | 71 | 0.78 | 0.96 | 0.78 | 0.99 | 0.86 | No complication | |
1080 | 85 | 282 | 52 | 0.95 | 0.93 | 0.95 | 0.77 | 0.94 | Complication | Decision tree |
282 | 52 | 1080 | 85 | 0.77 | 0.84 | 0.77 | 0.95 | 0.80 | No complication | |
946 | 220 | 147 | 186 | 0.84 | 0.81 | 0.84 | 0.40 | 0.82 | Complication | Simple Bayes |
147 | 186 | 946 | 220 | 0.40 | 0.44 | 0.40 | 0.84 | 0.42 | No complication | |
1082 | 172 | 195 | 50 | 0.96 | 0.86 | 0.96 | 0.53 | 0.91 | Complication | MLP NN |
195 | 50 | 1082 | 172 | 0.53 | 0.80 | 0.53 | 0.96 | 0.64 | No complication |
4.2.3 High blood pressure complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
863 | 104 | 516 | 16 | 0.98 | 0.89 | 0.98 | 0.86 | 0.93 | Complication | Proposed method |
516 | 16 | 863 | 104 | 0.83 | 0.97 | 0.83 | 0.98 | 0.90 | No complication | |
787 | 86 | 541 | 85 | 0.90 | 0.89 | 0.90 | 0.83 | 0.92 | Complication | Decision tree |
541 | 85 | 787 | 86 | 0.86 | 0.86 | 0.86 | 0.90 | 0.86 | No complication | |
659 | 260 | 367 | 213 | 0.76 | 0.72 | 0.76 | 0.59 | 0.74 | Complication | Simple Bayes |
367 | 213 | 659 | 260 | 0.59 | 0.63 | 0.59 | 0.76 | 0.61 | No complication | |
739 | 201 | 426 | 133 | 0.85 | 0.79 | 0.85 | 0.68 | 0.82 | Complication | MLP NN |
426 | 133 | 739 | 201 | 0.68 | 0.76 | 0.68 | 0.85 | 0.72 | No complication |
4.2.4 Dialysis history complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1441 | 29 | 27 | 2 | 0.96 | 0.97 | 0.98 | 0.48 | 0.99 | Complication | Proposed method |
27 | 2 | 1441 | 29 | 0.48 | 0.93 | 0.48 | 1.00 | 0.64 | No complication | |
1386 | 65 | 29 | 19 | 0.95 | 0.96 | 0.97 | 0.31 | 0.97 | Complication | Decision tree |
29 | 19 | 1386 | 65 | 0.31 | 0.60 | 0.31 | 0.99 | 0.41 | No complication | |
1390 | 93 | 1 | 15 | 0.92 | 0.92 | 0.97 | 0.19 | 0.96 | Complication | Simple Bayes |
1 | 15 | 1390 | 93 | 0.01 | 0.06 | 0.01 | 0.99 | 0.02 | No complication | |
1390 | 72 | 22 | 15 | 0.78 | 0.95 | 0.94 | 0.23 | 0.97 | Complication | MLP NN |
22 | 15 | 1390 | 72 | 0.23 | 0.59 | 0.23 | 0.99 | 0.34 | No complication |
4.2.5 Heart attack complications
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1166 | 69 | 258 | 6 | 0.98 | 0.94 | 0.95 | 0.83 | 0.97 | Complication | Proposed method |
258 | 6 | 1166 | 69 | 0.79 | 0.98 | 0.79 | 0.99 | 0.87 | No complication | |
787 | 86 | 541 | 85 | 0.90 | 0.90 | 0.90 | 0.79 | 0.90 | Complication | Decision tree |
541 | 85 | 787 | 86 | 0.86 | 0.86 | 0.86 | 0.90 | 0.86 | No complication | |
958 | 237 | 111 | 193 | 0.83 | 0.80 | 0.83 | 0.32 | 0.82 | Complication | Simple Bayes |
111 | 193 | 958 | 237 | 0.32 | 0.37 | 0.32 | 0.83 | 0.34 | No complication | |
1075 | 147 | 201 | 76 | 0.93 | 0.88 | 0.93 | 0.58 | 0.91 | Complication | MLP NN |
201 | 76 | 1075 | 147 | 0.58 | 0.73 | 0.58 | 0.93 | 0.64 | No complication |
4.2.6 Stroke complications
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1431 | 43 | 19 | 6 | 0.98 | 0.97 | 0.97 | 0.31 | 0.98 | Complication | Proposed method |
19 | 6 | 1431 | 43 | 0.31 | 0.76 | 0.31 | 1.00 | 0.44 | No complication | |
1370 | 77 | 25 | 27 | 0.91 | 0.95 | 0.94 | 0.25 | 0.96 | Complication | Decision tree |
25 | 27 | 1370 | 77 | 0.25 | 0.48 | 0.25 | 0.98 | 0.32 | No complication | |
1343 | 95 | 7 | 54 | 0.96 | 0.93 | 0.96 | 0.17 | 0.95 | Complication | Simple Bayes |
7 | 54 | 1343 | 95 | 0.07 | 0.11 | 0.07 | 0.96 | 0.09 | No complication | |
1384 | 82 | 20 | 13 | 0.93 | 0.94 | 0.93 | 0.20 | 0.97 | Complication | MLP NN |
20 | 13 | 1384 | 82 | 0.20 | 0.61 | 0.20 | 0.99 | 0.30 | No complication |
4.2.7 Diabetes foot ulcer complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1377 | 42 | 76 | 4 | 0.98 | 0.97 | 0.99 | 0.64 | 0.98 | Complication | Proposed method |
76 | 4 | 1377 | 42 | 0.64 | 0.95 | 0.64 | 0.97 | 0.77 | No complication | |
1313 | 74 | 83 | 29 | 0.98 | 0.95 | 0.98 | 0.53 | 0.96 | Complication | Decision tree |
83 | 29 | 1313 | 74 | 0.53 | 0.74 | 0.53 | 0.98 | 0.62 | No complication | |
1174 | 106 | 51 | 168 | 0.87 | 0.92 | 0.87 | 0.32 | 0.90 | Complication | Simple Bayes |
51 | 168 | 1174 | 106 | 0.32 | 0.23 | 0.32 | 0.87 | 0.27 | No complication | |
1321 | 93 | 64 | 21 | 0.98 | 0.93 | 0.98 | 0.41 | 0.96 | Complication | MLP NN |
64 | 21 | 1321 | 93 | 0.41 | 0.75 | 0.41 | 0.98 | 0.53 | No complication |
4.2.8 Diabetes coma complication
TP | FP | TN | FN | Recall | Precision | Sensitivity | Specificity | F-measure | Class | Method |
---|---|---|---|---|---|---|---|---|---|---|
1451 | 31 | 10 | 7 | 0.97 | 0.95 | 0.98 | 0.24 | 0.98 | Complication | Proposed method |
10 | 7 | 1451 | 31 | 0.24 | 0.59 | 0.24 | 1.00 | 0.34 | No complication | |
1378 | 75 | 13 | 33 | 0.94 | 0.91 | 0.97 | 0.15 | 0.96 | Complication | Decision tree |
13 | 33 | 1378 | 75 | 0.15 | 0.28 | 0.15 | 0.98 | 0.19 | No complication | |
1288 | 70 | 18 | 123 | 0.91 | 0.93 | 0.91 | 0.20 | 0.93 | Complication | Simple Bayes |
18 | 123 | 1288 | 70 | 0.20 | 0.13 | 0.20 | 0.91 | 0.16 | No complication | |
1387 | 79 | 9 | 24 | 0.96 | 0.94 | 0.93 | 0.10 | 0.96 | Complication | MLP NN |
9 | 24 | 1387 | 79 | 0.10 | 0.27 | 0.10 | 0.98 | 0.15 | No complication |
4.3 Experimental evaluation on UCI dataset
Code | Meaning |
---|---|
33 | Regular insulin dose |
34 | NPH insulin dose |
35 | Ultra Lente insulin dose |
48 | Unspecified blood glucose measurement |
57 | Unspecified blood glucose measurement |
58 | Pre-breakfast blood glucose measurement |
59 | Post-breakfast blood glucose measurement |
60 | Pre-lunch blood glucose measurement |
61 | Post-lunch blood glucose measurement |
62 | Pre-supper blood glucose measurement |
63 | Post-supper blood glucose measurement |
64 | Pre-snack blood glucose measurement |
65 | Hypoglycemic symptoms |
66 | Typical meal ingestion |
67 | More-than-usual meal ingestion |
68 | Less-than-usual meal ingestion |
69 | Typical exercise activity |
70 | More-than-usual exercise activity |
71 | Less-than-usual exercise activity |
72 | Unspecified special event |
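The event codes above can be attached to raw records with a small lookup. The sketch below assumes that each record is a tab-separated line of date, time, code, and value; this layout is not stated in this section and should be checked against the dataset files. The names `CODE_LABELS` and `parse_record` are hypothetical, the sample line is made up for illustration, and only a few of the codes are included to keep the example short.

```python
# Subset of the code list above; extend with the remaining codes as needed.
CODE_LABELS = {
    33: "Regular insulin dose",
    34: "NPH insulin dose",
    58: "Pre-breakfast blood glucose measurement",
    65: "Hypoglycemic symptoms",
}


def parse_record(line):
    """Split one record and attach the description of its event code.

    Assumption: fields are tab-separated in the order date, time, code, value.
    """
    date, time, code, value = line.rstrip("\n").split("\t")
    code = int(code)
    return {
        "date": date,
        "time": time,
        "code": code,
        "label": CODE_LABELS.get(code, "Unknown code"),
        "value": value,
    }


if __name__ == "__main__":
    # Made-up sample line, not taken from the dataset.
    print(parse_record("04-21-1991\t9:09\t58\t100"))
```

The value is kept as a string because the section does not say whether every record carries a numeric value.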