Introduction
Major report statistics from various health organisations
- In 2018, the American Diabetes Association's Standards of Medical Care in Diabetes [2] released a report on the "Classification and diagnosis of diabetes", which covers the classification of diabetes, diabetes care, treatment goals, diagnostic criteria with test ranges and risk values, and the risks associated with diabetes.
- In 2017, the Global Report on Diabetes by the World Health Organization [8] described the burden of diabetes, its risk factors and its complications. It also gives information on preventing diabetes in people at high risk and on managing diabetes at an early stage with essential medicines.
Diagnosis levels of diabetes (Fig. 2)
- A1C test: a blood test that reflects a person's blood glucose over recent months. The ranges for the various categories are recorded in the table. It is recommended for diagnosing diabetes and prediabetes.
- FPG test: the fasting plasma glucose test, used to detect prediabetes and diabetes.
- OGT: the oral glucose tolerance test, a blood test used to diagnose prediabetes, diabetes and gestational diabetes.
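The ranges themselves are not reproduced in this section. As an illustration only, the following sketch assumes the widely cited ADA cut-offs (A1C 5.7 % and 6.5 %, FPG 100 and 126 mg/dL, 2-h OGTT 140 and 200 mg/dL); the names `THRESHOLDS` and `classify` are hypothetical and are not taken from the paper.

```python
# Assumed ADA cut-offs, not taken from this paper:
# (lower bound for prediabetes, lower bound for diabetes)
THRESHOLDS = {
    "a1c": (5.7, 6.5),       # percent
    "fpg": (100.0, 126.0),   # mg/dL, fasting plasma glucose
    "ogtt": (140.0, 200.0),  # mg/dL, 2-h value of the oral glucose tolerance test
}


def classify(test: str, value: float) -> str:
    """Map a single test result to 'normal', 'prediabetes' or 'diabetes'."""
    prediabetes_from, diabetes_from = THRESHOLDS[test]
    if value >= diabetes_from:
        return "diabetes"
    if value >= prediabetes_from:
        return "prediabetes"
    return "normal"


if __name__ == "__main__":
    for test, value in [("a1c", 6.0), ("fpg", 130.0), ("ogtt", 120.0)]:
        print(test, value, classify(test, value))
```

A lookup table keeps the three tests in one place, so a different guideline could be swapped in by editing the bounds alone.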
Effects of diabetes
Data mining and classification
Literature survey
Implementation methods
Decision tree
Naïve Bayesian
- P(c|x) is the posterior probability of class (target) given predictor (attribute).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood which is the probability of predictor given class.
- P(x) is the prior probability of predictor.
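These four quantities are linked by Bayes' theorem, P(c|x) = P(x|c) P(c) / P(x). A naive Bayes classifier additionally assumes that the predictors are independent given the class, so P(x|c) becomes a product of per-predictor likelihoods. The sketch below shows that calculation; the function name `posterior` and all probabilities in the example are hypothetical values chosen for illustration, not figures from the data set.

```python
from math import prod


def posterior(priors, likelihoods):
    """Return P(c|x) for every class c.

    priors:      {class: P(c)}
    likelihoods: {class: [P(x_i|c) for each predictor x_i]}
    """
    # Numerator of Bayes' theorem, with P(x|c) factorised over predictors
    joint = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # P(x) is the sum of the numerators over all classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical priors and per-predictor likelihoods for one patient
priors = {"diabetic": 0.35, "non_diabetic": 0.65}
likelihoods = {"diabetic": [0.8, 0.6], "non_diabetic": [0.2, 0.3]}
post = posterior(priors, likelihoods)

if __name__ == "__main__":
    print(post)
```

The predicted class is the one with the largest posterior, which here is `max(post, key=post.get)`.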
Support vector machine
Random forest
K nearest neighbour (KNN)
Data set description
Sl. no | Attribute | Description |
---|---|---|
1 | Age | Age of a person |
2 | Gender | Male or female |
3 | Plasma glucose fasting | – |
4 | Plasma glucose post prandial | – |
5 | Pregnancy | Pregnancy count of women |
6 | Blood glucose level | 2-h plasma glucose concentration in an oral glucose tolerance test |
7 | Blood pressure | Diastolic blood pressure (mm Hg) |
8 | Skin thickness | Triceps skin fold thickness (mm) |
9 | Insulin | 2-h serum insulin (µU/ml) |
10 | BMI (body mass index) | Body mass index (weight in kg/(height in m)²) |
11 | DPF | Diabetes pedigree function |
12 | Serum creatinine | Level of creatinine in the blood |
13 | Serum sodium | Sodium content in blood |
14 | Serum potassium | Potassium content in blood |
15 | HbA1c | Hemoglobin A1c (glycated hemoglobin) level in blood |
Modified approach
Experimental results
Sl. no | Classification technique | Accuracy (%) | Correctly classified | Incorrectly classified |
---|---|---|---|---|
1 | SVM | 77.73 | 597 | 171 |
2 | Random forest | 75.39 | 579 | 189 |
3 | NB | 73.48 | 169 | 61 |
4 | Decision tree | 73.18 | 562 | 206 |
5 | KNN | 63.04 | 145 | 85 |

SVM confusion matrix

Accuracy = 77.73 | True non-diabetic | True diabetic | Class precision |
---|---|---|---|
Pred. non-diabetic | 153 | 56 | 73.21 |
Pred. diabetic | 115 | 444 | 79.43 |
Class recall | 57.09 | 88.80 | |

Random forest confusion matrix

Accuracy = 75.39 | True non-diabetic | True diabetic | Class precision |
---|---|---|---|
Pred. non-diabetic | 89 | 10 | 89.90 |
Pred. diabetic | 179 | 490 | 73.24 |
Class recall | 33.21 | 98.00 | |

NB confusion matrix

Accuracy = 73.48 | True non-diabetic | True diabetic | Class precision |
---|---|---|---|
Pred. non-diabetic | 49 | 30 | 62.03 |
Pred. diabetic | 31 | 120 | 79.47 |
Class recall | 61.25 | 80.00 | |

Decision tree confusion matrix

Accuracy = 73.18 | True non-diabetic | True diabetic | Class precision |
---|---|---|---|
Pred. non-diabetic | 71 | 9 | 88.75 |
Pred. diabetic | 197 | 491 | 71.37 |
Class recall | 26.49 | 98.20 | |

KNN confusion matrix

Accuracy = 63.04 | True non-diabetic | True diabetic | Class precision |
---|---|---|---|
Pred. non-diabetic | 37 | 42 | 46.84 |
Pred. diabetic | 43 | 108 | 71.52 |
Class recall | 46.25 | 72.00 | |
Discussion
Comparison of classification technique
No. | Algorithm | Sensitivity % | Specificity % | Positive likelihood ratio | Negative likelihood ratio | Disease prevalence % | Positive predictive value % | Negative predictive value % | Accuracy % |
---|---|---|---|---|---|---|---|---|---|
1 | SVM | 57.09 | 88.80 | 5.10 | 0.48 | 34.90 | 73.21 | 79.43 | 77.73 |
2 | Random forest | 33.21 | 98.00 | 16.60 | 0.68 | 34.90 | 89.90 | 73.24 | 75.39 |
3 | NB | 61.25 | 80.00 | 3.06 | 0.48 | 34.78 | 62.03 | 79.47 | 73.48 |
4 | Decision tree | 26.49 | 98.20 | 14.72 | 0.75 | 34.90 | 88.75 | 71.37 | 73.18 |
5 | KNN | 46.25 | 72.00 | 1.65 | 0.75 | 34.78 | 46.84 | 71.52 | 63.04 |
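Every column of this comparison can be derived from the four cells of the corresponding confusion matrix. The sketch below recomputes the SVM row. It assumes that the first class of each confusion matrix (the column with 268 cases for SVM) is treated as the positive class; that reading is inferred from the reported numbers rather than stated in the text, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Derive the comparison-table measures from confusion-matrix counts.

    tp/fn: positive-class cases predicted positive/negative
    fp/tn: negative-class cases predicted positive/negative
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": 100 * sensitivity,
        "specificity": 100 * specificity,
        # Likelihood ratios are plain ratios, not percentages
        "positive_lr": sensitivity / (1 - specificity),
        "negative_lr": (1 - sensitivity) / specificity,
        "prevalence": 100 * (tp + fn) / total,
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        "accuracy": 100 * (tp + tn) / total,
    }


# SVM confusion matrix, first class taken as positive (assumption)
svm = diagnostic_metrics(tp=153, fn=115, fp=56, tn=444)

if __name__ == "__main__":
    for name, value in svm.items():
        print(f"{name}: {value:.2f}")
```

The positive likelihood ratio is undefined when specificity is exactly 100 %, so a classifier with no false positives would need separate handling.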
Results of modified approach for selection of attributes
Attributes | Correlation value |
---|---|
Age | 1.837 |
Gender | 1.788 |
Plasma glucose fasting | 2.464 |
Plasma glucose post prandial | 0.464 |
Pregnancy | 0.798 |
Blood glucose level | 1.789 |
Blood pressure | 2.332 |
Skin thickness | 2.004 |
Insulin | 1.664 |
BMI (body mass index) | 1.456 |
DPF | 1.555 |
Serum creatinine | 0.389 |
Serum sodium | 2.203 |
Serum potassium | 1.963 |
HbA1c | 0.466 |