Abstract
Conventional clinical decision support systems rely on individual classifiers or simple combinations of them, which tend to show only moderate performance. This paper presents a novel classifier ensemble framework, based on an enhanced bagging approach with a multi-objective weighted voting scheme, for the prediction and analysis of heart disease. The proposed model addresses the limited performance of conventional approaches by using an ensemble of five heterogeneous classifiers: Naïve Bayes, linear regression, quadratic discriminant analysis, an instance-based learner, and support vector machines. Five datasets, obtained from publicly available data repositories, are used for experimentation, evaluation, and validation. The effectiveness of the proposed ensemble is investigated by comparing its results with those of several other classifiers. Prediction results are assessed by ten-fold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework handles all types of attributes and achieves a diagnosis accuracy of 84.16 %, a sensitivity of 93.29 %, a specificity of 96.70 %, and an F-measure of 82.15 %. An F-ratio higher than F-critical and a p value below 0.05 at the 95 % confidence level indicate that the results are statistically significant for most of the datasets.
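The two ingredients named in the abstract, bagging and weighted voting, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the weights are simply assumed to be given (the multi-objective procedure that produces them is not described in the abstract), and the tie-breaking rule is an assumption made only to keep the result deterministic.

```python
import random
from collections import defaultdict


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement: one bagging replicate."""
    return [rng.choice(rows) for _ in rows]


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the most weight.

    predictions: one predicted label per base classifier
    weights:     one non-negative weight per base classifier
                 (assumed given; how they are optimized is not shown here)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Assumption: ties go to the smallest label so the result is deterministic.
    return min(totals, key=lambda label: (-totals[label], label))


if __name__ == "__main__":
    rng = random.Random(0)
    training_rows = list(range(10))  # stand-in for patient records
    replicate = bootstrap_sample(training_rows, rng)
    print("bootstrap replicate:", replicate)

    # Hypothetical votes from five base classifiers (1 = disease, 0 = healthy).
    votes = [1, 0, 1, 0, 0]
    # Hypothetical weights; an unweighted majority vote would return 0 here.
    weights = [0.9, 0.4, 0.8, 0.3, 0.5]
    print("weighted vote:", weighted_vote(votes, weights))
```

In this example the two classifiers voting for label 1 carry a combined weight of 1.7 against 1.2 for label 0, so the weighted vote overrides the simple majority. In a bagging setting, each base classifier would be trained on its own bootstrap replicate before its vote is cast.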
Notes
<http://archive.ics.uci.edu/ml/datasets.html> [last Accessed: Sep 25 2013].
http://en.wikipedia.org/wiki/Rawalpindi_Institute_of_Cardiology [Last accessed on 8th December, 2014].
Acknowledgments
We are grateful to the Rawalpindi Institute of Cardiology for supporting the use of the proposed DSS, for research purposes only, under the strict supervision of a team of medical experts and the institute's information technology team.
Appendix
See Table 14.
Cite this article
Bashir, S., Qamar, U. & Khan, F.H. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting. Australas Phys Eng Sci Med 38, 305–323 (2015). https://doi.org/10.1007/s13246-015-0337-6
DOI: https://doi.org/10.1007/s13246-015-0337-6