2025 | OriginalPaper | Book chapter

An Experimental Analysis of Machine Learning Models for Diabetes Classification

Written by: Subhayu Ghosh, Riyan Acharya, Nanda Dulal Jana

Published in: Advances in Communication, Devices and Networking

Publisher: Springer Nature Singapore

Abstract

Diabetes is a chronic metabolic disorder that affects millions of people worldwide. Early detection and effective management of diabetes are crucial to prevent severe complications and improve the quality of life of affected individuals. Machine Learning (ML) techniques have shown great promise in aiding the early detection of various diseases, including diabetes, from patient data. ML algorithms can analyze large datasets, identify patterns, and make accurate predictions, helping medical professionals diagnose diabetes at an early stage. In our work, we employed several ML models for diabetes classification: K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient-Boosted Model (GBM), eXtreme Gradient Boosting (XG-Boost), Adaptive Boosting (Ada-Boost), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB). We compared their performance on three distinct datasets using accuracy, precision, F1-score, sensitivity, specificity, and Cohen's kappa as evaluation metrics. Our findings revealed that the RF algorithm is optimal for symptoms-based and primary lab report-based diabetes detection, while XG-Boost excels in classifying different types of diabetes from a multi-class dataset. Moreover, we investigated diverse symptoms and their impact on diabetes outcomes, offering insights into preventive measures and early-stage monitoring of the disease.
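The six evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' implementation; the function name `binary_metrics`, the `BinaryMetrics` container, and the convention that label 1 means "diabetic" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float  # also called recall or true-positive rate
    specificity: float  # true-negative rate
    f1: float
    kappa: float        # Cohen's kappa


def binary_metrics(y_true, y_pred):
    """Compute the six metrics from 0/1 labels (assumed: 1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, n)
    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    # Cohen's kappa: observed agreement (accuracy) corrected for the
    # agreement expected by chance from the row and column totals.
    p_chance = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = ratio(accuracy - p_chance, 1 - p_chance)

    return BinaryMetrics(accuracy, precision, sensitivity,
                         specificity, f1, kappa)
```

Calling `binary_metrics(y_true, y_pred)` once per fitted model on the same held-out labels gives directly comparable rows for a comparison table. For a multi-class dataset, each metric would have to be computed per class and averaged, which this sketch does not cover.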

Footnotes
1
The full implementation of all the ML models for the three datasets is publicly available at: https://github.com/riyanacharya2002/Comparative-analysis-of-Diabetes-Detection-using-ML.

Metadata
Title
An Experimental Analysis of Machine Learning Models for Diabetes Classification
Written by
Subhayu Ghosh
Riyan Acharya
Nanda Dulal Jana
Copyright year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-6465-5_11