Published in: Health and Technology 4/2019

5 February 2019 | Original Paper

Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach

Authors: Dharavath Ramesh, Yogendra Singh Katheria

Abstract

Medical datasets have attracted the research community for analysis and prediction, which can help people take proper precautions to prevent future diseases. Data mining techniques have been widely used to develop decision support systems for disease prediction from medical datasets. This work proposes a new predictive model for disease prediction that applies pre-processing techniques to various disease datasets. The proposed model not only analyses the datasets but also improves performance by using ensemble methods. To process the datasets, pre-processing techniques such as discretization, resampling, principal component analysis, and decision trees are used. To classify the datasets, classification techniques such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DT), and Random Forest (RF) are used. The algorithms are evaluated with 10-fold cross-validation. The predictive analysis covers the following disease datasets: Chronic Kidney Disease (CKD), Cardiovascular Disease (CVD, or heart disease), Diabetes, Hepatitis, Cancer, and the Indian Liver Patient Dataset (ILPD). Every dataset shows a significant improvement across various performance measures. Experimental results show that the proposed predictive model achieves better accuracy.
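
The evaluation scheme named in the abstract (several classifiers combined into an ensemble and scored with 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the data is synthetic, the base learners are simple stand-ins (two KNN variants and a nearest-centroid rule rather than SVM, NB, DT, or RF), majority voting is assumed as the combination rule, and all function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_vote(labels):
    """Return the most frequent label in a list of predictions."""
    return Counter(labels).most_common(1)[0][0]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_knn(X, y, k):
    """Stand-in base learner: k-nearest neighbours with a majority vote."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
        return majority_vote([y[i] for i in nearest])
    return predict


def fit_centroid(X, y):
    """Stand-in base learner: assign the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [X[i] for i in range(len(X)) if y[i] == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda label: sq_dist(centroids[label], x))
    return predict


def cross_validated_accuracy(X, y, k=10, seed=0):
    """Accuracy of a majority-vote ensemble, estimated with k-fold cross-validation."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        # Train every base learner on the training folds only.
        models = [
            fit_knn(train_X, train_y, 1),
            fit_knn(train_X, train_y, 3),
            fit_centroid(train_X, train_y),
        ]
        # Score the ensemble on the held-out fold.
        for i in test_idx:
            votes = [model(X[i]) for model in models]
            correct += int(majority_vote(votes) == y[i])
    return correct / len(X)


# Synthetic two-class data (an assumption for illustration, not a disease dataset).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

accuracy = cross_validated_accuracy(X, y, k=10)
print(f"10-fold ensemble accuracy: {accuracy:.2f}")
```

Each sample is held out exactly once, so the reported accuracy is computed only on data the ensemble did not see during training. An odd number of base learners avoids tied votes on two-class problems.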

Metadata
Title
Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach
Authors
Dharavath Ramesh
Yogendra Singh Katheria
Publication date
5 February 2019
Publisher
Springer Berlin Heidelberg
Published in
Health and Technology / Issue 4/2019
Print ISSN: 2190-7188
Electronic ISSN: 2190-7196
DOI
https://doi.org/10.1007/s12553-019-00299-3