2025 | OriginalPaper | Chapter

An Experimental Analysis of Machine Learning Models for Diabetes Classification

Authors: Subhayu Ghosh, Riyan Acharya, Nanda Dulal Jana

Published in: Advances in Communication, Devices and Networking

Publisher: Springer Nature Singapore

Abstract

Diabetes is a chronic metabolic disorder that affects millions of people worldwide. Early detection and effective management are crucial to prevent severe complications and improve the quality of life of affected individuals. Machine Learning (ML) techniques have shown great promise in aiding the early detection of various diseases, including diabetes, from patient data. ML algorithms can analyze large datasets, identify patterns, and make accurate predictions, helping medical professionals diagnose diabetes at an early stage. In this work, we employed several ML models for diabetes classification: K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient-Boosted Model (GBM), eXtreme Gradient Boosting (XG-Boost), Adaptive Boosting (Ada-Boost), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB). We compared their performance on three distinct datasets using accuracy, precision, F1-score, sensitivity, specificity, and Cohen’s kappa. Our findings show that the RF algorithm performs best for symptoms-based and primary lab report-based diabetes detection, while XG-Boost excels at classifying different types of diabetes in a multi-class dataset. We also investigated diverse symptoms and their impact on diabetes outcomes, offering insights into preventive measures and early-stage monitoring.
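The six evaluation metrics named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch for the binary case only, written in plain Python; it is not the authors' implementation, and the function name `binary_metrics` and the toy labels are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, sensitivity, specificity, F1 and Cohen's kappa.

    Illustrative helper (hypothetical name); handles binary labels only.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count (actual is positive, predicted is positive) pairs.
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # diabetic, predicted diabetic
    fn = pairs[(True, False)]   # diabetic, predicted healthy
    fp = pairs[(False, True)]   # healthy, predicted diabetic
    tn = pairs[(False, False)]  # healthy, predicted healthy
    n = tp + fn + fp + tn

    def ratio(num, den):
        # Convention used here: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    accuracy = (tp + tn) / n
    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)   # also called recall
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    # Cohen's kappa compares observed agreement (accuracy) with the
    # agreement expected by chance from the row and column totals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = ratio(accuracy - p_chance, 1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example (1 = diabetic, 0 = non-diabetic).
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(actual, predicted).items():
        print(f"{name}: {value:.3f}")
```

For the multi-class dataset mentioned in the abstract, precision, sensitivity, and F1 would need a per-class or averaged definition, which this sketch does not cover.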

Footnotes
1. The full code implementation of all the ML models for the three datasets is publicly available at: https://github.com/riyanacharya2002/Comparative-analysis-of-Diabetes-Detection-using-ML
 
Metadata
Title: An Experimental Analysis of Machine Learning Models for Diabetes Classification
Authors: Subhayu Ghosh, Riyan Acharya, Nanda Dulal Jana
Copyright Year: 2025
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-97-6465-5_11