Introduction
-
Type 1: This type accounts for 10% of people with diabetes; these patients are insulin-dependent.
-
Type 2: This type affects 90% of people with diabetes. In this type, the body produces some insulin, but not enough to meet all of the body's needs. Type 2 diabetes is the most common form and is usually asymptomatic or has only mild symptoms. As a result, a person with diabetes can remain unaware of the condition for years while it causes irreparable damage to various parts of the body. A summary of these injuries can be seen in Fig. 2 [3].
-
Data: This challenge includes the availability of accurate, high-quality data; data collection and sharing; data privacy and security; the integration of data from heterogeneous sources; and data access and storage.
-
Cleaning and preprocessing data: This challenge includes selecting the correct data, cleaning the data, selecting and extracting features, reducing dimensions, removing noise, converting data, and integrating data.
-
Diagnosis and prediction techniques: This challenge includes general and global methods, clinical and public usability, the evaluation of existing approaches on recent data sets, robust software tools, the development of online extraction tools and real-time forecasting, suitable model selection, the integration of models from different fields, and efficiency and accuracy.
Related works
Research highlights
Materials and methods
Exchange Market Algorithm (EMA)
Binarizing the algorithm
Function | Transfer function | Transfer function in EMA |
---|---|---|
S1 | \(T\left(x\right)=\frac{1}{1+{\mathrm{e}}^{-2x}}\) | \(T\left({BEMA}_{i}^{d}(t)\right)=\frac{1}{1+{\mathrm{e}}^{-2{BEMA}_{i}^{d}(t)}}\) |
S2 | \(T\left(x\right)=\frac{1}{1+{\mathrm{e}}^{-x}}\) | \(T\left({BEMA}_{i}^{d}(t)\right)=\frac{1}{1+{\mathrm{e}}^{-{BEMA}_{i}^{d}(t)}}\) |
S3 | \(T\left(x\right)=\frac{1}{1+{\mathrm{e}}^{(-\frac{x}{2})}}\) | \(T\left({BEMA}_{i}^{d}(t)\right)=\frac{1}{1+{\mathrm{e}}^{(-\frac{{BEMA}_{i}^{d}(t)}{2})}}\) |
S4 | \(T\left(x\right)=\frac{1}{1+{\mathrm{e}}^{(-\frac{x}{3})}}\) | \(T\left({BEMA}_{i}^{d}(t)\right)=\frac{1}{1+{\mathrm{e}}^{(-\frac{{BEMA}_{i}^{d}(t)}{3})}}\) |
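
The four S-shaped transfer functions in the table differ only in the factor that scales the input inside the exponent (2, 1, 1/2, and 1/3). The following is a minimal Python sketch of them, not the authors' implementation. The names `SCALES`, `transfer`, and `binarize` are hypothetical, and the rule that sets a bit to 1 when a uniform random draw falls below T(x) is an assumption based on the common convention for S-shaped transfer functions; it is not stated in the table.

```python
import math
import random

# Scale factor applied to x inside the exponent for each function in the table.
SCALES = {"S1": 2.0, "S2": 1.0, "S3": 0.5, "S4": 1.0 / 3.0}


def transfer(x, kind="S2"):
    """Return T(x) = 1 / (1 + exp(-k * x)) for the chosen S-shaped function."""
    z = SCALES[kind] * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Algebraically equivalent form that avoids overflow for very negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def binarize(position, kind="S2", rng=random):
    """Map a continuous position vector to bits.

    Assumption (not taken from the table): bit d becomes 1 when a uniform
    random draw in [0, 1) is smaller than T(position[d]).
    """
    return [1 if rng.random() < transfer(x, kind) else 0 for x in position]


if __name__ == "__main__":
    for name in SCALES:
        print(name, round(transfer(1.0, name), 4))
    print(binarize([-3.0, 0.0, 3.0], "S1", random.Random(0)))
```

A steeper function such as S1 pushes T(x) toward 0 or 1 faster as |x| grows, so the resulting bits are less random for the same continuous position than with S4.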
Proposed method
-
Phase 1: Loading, normalizing, and splitting data. In this phase, the data are loaded first and then normalized using the min-max method in Eq. (14). Normalization places the data in a specific range. Skipping this step lowers the rate of correct diagnosis in the objective function. Normalization lets the algorithm examine the various dimensions on a comparable scale, so that no single dimension has more effect than the others. $$Z=\frac{X-\mathrm{min}(x)}{\mathrm{max}\left(x\right)-\mathrm{min}(x)}$$ (14) After normalization, the data are divided into training and test sets at 80% and 20%, respectively.
-
Phase 2: Optimal selection of sigmoid function type, optimal feature selection, and reduction of data dimension. In this phase, the best type of sigmoid function is selected to binarize the algorithm, so as to achieve the best feature selection (reducing data volume and increasing detection accuracy) on the diabetes dataset. First, the algorithm generates a set of initial solutions. Then, following the algorithm process described in the second part (using Eqs. (1) to (12)), optimization is performed to select the optimal type of sigmoid function and the optimal features. Finally, a small number of important and practical features remain, which reduces the data dimension as far as possible without discarding critical information.
-
Phase 3: Training, testing, and classification. After the data volume is reduced using the optimally selected features, several classifiers (SVM, KNN, and NB) are trained on the data and then tested. The criteria presented in Eqs. (15) to (18) are used to evaluate the performance of the algorithm in the optimal feature selection step.
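
The first phase in the list above (min-max normalization by Eq. (14), followed by an 80%/20% split) can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the shuffle before splitting and its seed are assumptions (the text does not say how records are assigned to each set), and a constant column is mapped to 0 to avoid division by zero.

```python
import random


def min_max_normalize(rows):
    """Scale every column to [0, 1] using Z = (X - min) / (max - min), Eq. (14)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # Assumption: a constant column (max == min) is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


def split_train_test(rows, labels, train_fraction=0.8, seed=0):
    """Split records into training and test sets (80%/20% by default).

    Assumption: records are shuffled with a fixed seed before splitting.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx], [labels[i] for i in train_idx],
        [rows[i] for i in test_idx], [labels[i] for i in test_idx],
    )


if __name__ == "__main__":
    data = [[0, 10], [5, 20], [10, 30], [2, 14], [8, 26]]
    target = [1, 2, 2, 1, 2]
    scaled = min_max_normalize(data)
    x_train, y_train, x_test, y_test = split_train_test(scaled, target)
    print(scaled)
    print(len(x_train), len(x_test))
```

As in the description of the first phase, the sketch normalizes the whole data set before splitting it, so the minimum and maximum are computed over all records.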
Evaluation Parameters
-
TN: The number of records whose actual category is negative and that the classification algorithm has correctly identified as negative.
-
TP: The number of records whose actual category is positive and that the classification algorithm has correctly identified as positive.
-
FN: The number of records whose actual category is positive but that the classification algorithm has erroneously identified as negative.
-
FP: The number of records whose actual category is negative but that the classification algorithm has erroneously identified as positive.
-
Accuracy: This criterion is the ratio of correctly predicted records to all records and shows the overall accuracy of prediction. The following equation shows the calculation of this criterion. $$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$ (15)
-
Sensitivity: This criterion is the proportion of actual positive records that the model predicts as positive. It therefore indicates the ability of the algorithm to detect positive categories. The following equation shows the calculation of this criterion. $$Sensitivity=\frac{TP}{TP+FN}$$ (16)
-
Specificity: This criterion is the proportion of actual negative records that the model predicts as negative, that is, how well the classifier recognizes records with no involvement with the disease. The following equation shows the calculation of this criterion. $$Specificity=\frac{TN}{FP+TN}$$ (17)
-
F-Measure: This criterion is an appropriate parameter for evaluating classification quality. It is the harmonic mean of sensitivity and specificity. The following equation shows the calculation of this criterion. $$\text{F-Measure}=2\times \frac{Sensitivity\times Specificity}{Sensitivity+Specificity}$$ (18)
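
The four criteria in Eqs. (15) to (18) follow directly from the confusion-matrix counts. The sketch below is a minimal Python illustration with a hypothetical function name and made-up counts; it is not the authors' evaluation code. Ratios with an empty denominator are returned as 0.0, which is an assumption. The conventional F1 score, which uses precision TP / (TP + FP) in place of specificity, is included as a separate, clearly labeled value for comparison only.

```python
def evaluate(tp, tn, fp, fn):
    """Compute the criteria of Eqs. (15)-(18) from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Assumption: an undefined ratio (empty denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)          # Eq. (15)
    sensitivity = ratio(tp, tp + fn)                      # Eq. (16)
    specificity = ratio(tn, fp + tn)                      # Eq. (17)
    f_measure = ratio(2 * sensitivity * specificity,
                      sensitivity + specificity)          # Eq. (18) as written
    # Not part of Eqs. (15)-(18): the conventional precision-based F1 score.
    precision = ratio(tp, tp + fp)
    f1_precision_based = ratio(2 * precision * sensitivity,
                               precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "f1_precision_based": f1_precision_based,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    for name, value in evaluate(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {100 * value:.3f}%")
```

The two F-scores can differ sharply. With hypothetical counts TP = 122, FP = 32, and TN = FN = 0 (every record predicted positive), accuracy is 79.221%, sensitivity is 100%, and specificity is 0%, so Eq. (18) as written gives 0, while the precision-based F1 gives 88.406%. The rows without feature selection in the comparison table of the Results section show the same pattern, which suggests that the reported F-measure values may have been computed with the precision-based form.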
Results
Dataset
Feature description | Range |
---|---|
Number of times pregnant | [0–17] |
Plasma glucose concentration at 2 h in an oral glucose tolerance test (mg/dl) | [0–199] |
Diastolic blood pressure (mm Hg) | [0–122] |
Triceps skin fold thickness (mm) | [0–99] |
2-h serum insulin (mu U/ml) | [0–864] |
Body mass index (weight in kg/(height in m)^2) | [2.42–18.2] |
Diabetes pedigree function | [0.078–1] |
Age (years) | [21–72] |
Class | [1–2] |
Results of the proposed algorithms and comparison
Feature name | Feature description | Selected feature |
---|---|---|
Pregnancies | Number of times pregnant | – |
Glucose | Plasma glucose concentration at 2 h in an oral glucose tolerance test (mg/dl) | ✓ |
BloodPressure | Diastolic blood pressure (mm Hg) | – |
SkinThickness | Triceps skin fold thickness (mm) | ✓ |
Insulin | 2-h serum insulin (mu U/ml) | – |
BMI | Body mass index (weight in kg/(height in m)^2) | ✓ |
DiabetesPedigreeFunction | Diabetes pedigree function | – |
Age | Age (years) | – |
Reference / Setting | Algorithm | Accuracy % | Sensitivity % | Specificity % | F-Measure % |
---|---|---|---|---|---|
[19] | HR-Kmeans | 91.65 | 91.11 | 50 | 64.54 |
[19] | GA-PSO-Kmeans | 89.64 | 86.65 | 75.33 | 80.59 |
[19] | GA-Kmeans | 88.02 | 83.73 | 50 | 62.95 |
Without feature selection | BEMA-SVM | 79.221 | 100 | 0 | 88.406 |
Without feature selection | BEMA-KNN | 81.169 | 100 | 0 | 89.606 |
Without feature selection | BEMA-NB | 83.766 | 100 | 0 | 91.166 |
With feature selection | BEMA-SVM | 98.502 | 100 | 90.625 | 98.785 |
With feature selection | BEMA-KNN | 92.857 | 96.8 | 75.862 | 95.652 |
With feature selection | BEMA-NB | 93.506 | 93.023 | 96 | 96 |