A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery

  • Systems-Level Quality Improvement
Journal of Medical Systems

Abstract

The increasing amount of data produced by biomedical and healthcare systems has created a need for knowledge data discovery (KDD) methodologies. Data mining (DM) offers a set of powerful techniques for identifying and extracting relevant information from medical datasets, from which doctors and patients can benefit greatly, particularly for diseases with high mortality and morbidity rates such as heart disease (HD). Nonetheless, raw medical data pose several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has therefore recently begun on preparing raw healthcare data before knowledge extraction. In any KDD process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality, so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) on data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS covered a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task, and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies, solution proposal and evaluation research, and the most frequently used empirical type was historical-based evaluation.



Acknowledgments

This work was conducted within the research project MPHR-PPR1-2015-2017. The authors would like to thank the Moroccan MESRSFC and CNRST for their support. It is also a part of the GINSENG project (TIN2015-70259-C2-2-R) supported by the Spanish Ministry of Economy and Competitiveness and European FEDER funds.

Author information

Corresponding author

Correspondence to A. Idri.

Ethics declarations

Conflict of interest

All the authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

This article is part of the Topical Collection on Systems-Level Quality Improvement

Appendix

Table 6 List of the selected studies.

Cite this article

Benhar, H., Idri, A. & Fernández-Alemán, J.L. A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery. J Med Syst 43, 17 (2019). https://doi.org/10.1007/s10916-018-1134-z
