Abstract
Coronary artery disease (CAD) is caused by atherosclerosis in the coronary arteries and can result in cardiac arrest and heart attack. CAD is usually diagnosed by angiography, which is a costly, time-consuming, and highly technical invasive method. Researchers have therefore been prompted to seek alternative methods, such as machine learning algorithms, that can use noninvasive clinical data to diagnose the disease and assess its severity. In this study, we present a novel hybrid method for CAD diagnosis that includes risk factor identification using correlation-based feature subset (CFS) selection with a particle swarm optimization (PSO) search method and the K-means clustering algorithm. Supervised learning algorithms such as multi-layer perceptron (MLP), multinomial logistic regression (MLR), fuzzy unordered rule induction algorithm (FURIA), and C4.5 are then used to model CAD cases. We tested this approach on clinical data consisting of 26 features and 335 instances collected at the Department of Cardiology, Indira Gandhi Medical College, Shimla, India. MLR achieved the highest prediction accuracy, 88.4 %. We also tested the approach on the benchmark Cleveland heart disease data, where MLR again outperformed the other techniques. The proposed hybridized model improves the accuracy of the classification algorithms by 8.3 % to 11.4 % on the Cleveland data. The proposed method is therefore a promising tool for identifying CAD patients with improved prediction accuracy.
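The feature selection step named in the abstract can be illustrated with a minimal sketch of the CFS merit score, which rewards features that correlate with the class and penalizes features that correlate with each other. Several things here are assumptions made for illustration and are not taken from the study: Pearson correlation stands in for the correlation measure, an exhaustive search over a tiny feature set replaces the PSO search, and the feature names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(features, target, subset):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean absolute feature-class correlation and r_ff is the
    mean absolute feature-feature correlation within the subset.
    """
    k = len(subset)
    r_cf = mean(abs(pearson(features[name], target)) for name in subset)
    pairs = list(combinations(subset, 2))
    r_ff = mean(abs(pearson(features[a], features[b])) for a, b in pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def best_subset(features, target):
    """Exhaustive search over all non-empty subsets (assumption: stands in for PSO)."""
    names = sorted(features)
    candidates = (c for r in range(1, len(names) + 1) for c in combinations(names, r))
    return max(candidates, key=lambda s: cfs_merit(features, target, s))


# Hypothetical toy data: four patients, binary class label.
target = [0, 0, 1, 1]
toy_features = {
    "f1": [1, 2, 3, 4],        # correlates with the class
    "noise": [1, -1, -1, 1],   # uncorrelated with the class and with f1
}

print(best_subset(toy_features, target))
```

In this toy case the uninformative feature lowers the merit of any subset that contains it, so the search keeps only `f1`. A real search over 26 features has far too many subsets to enumerate, which is why a heuristic such as PSO is used in its place.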
Additional information
This article is part of the Topical Collection on Transactional Processing Systems
About this article
Cite this article
Verma, L., Srivastava, S. & Negi, P.C. A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data. J Med Syst 40, 178 (2016). https://doi.org/10.1007/s10916-016-0536-z
DOI: https://doi.org/10.1007/s10916-016-0536-z