Automatic detection of heart disease using an artificial immune recognition system (AIRS) with fuzzy resource allocation mechanism and k-nn (nearest neighbour) based weighting preprocessing

doi:10.1016/j.eswa.2006.01.027

Expert Systems with Applications

Volume 32, Issue 2, February 2007, Pages 625-631

Abstract

Machine learning methods are increasingly used in disease diagnosis. In this study, heart disease, which is a very common and important disease, was diagnosed with such a machine learning system. A new weighting scheme based on the k-nearest neighbour (k-nn) method was used as a preprocessing step before the main classifier, an artificial immune recognition system (AIRS) with a fuzzy resource allocation mechanism. The dataset used in the study was taken from the UCI Machine Learning Database. The system obtained a classification accuracy of 87%, which is very promising compared with the other classification applications in the literature for this problem.

Introduction

One of the central problems of the information age is dealing with the enormous amount of raw information that is available. More and more data is being collected and stored in databases or spreadsheets. As the volume increases, the gap between generating and collecting the data and actually being able to understand it is widening. To bridge this knowledge gap, a variety of techniques known as data mining or knowledge discovery are being developed. Knowledge discovery can be defined as the extraction of implicit, previously unknown, and potentially useful information from real-world data, and the communication of the discovered knowledge to people in an understandable way (Fayyad et al., 1996; Michie, 1991; Piatetsky-Shapiro and Frawley, 1991).

Heart disease is any disorder that affects the heart’s ability to function normally. The most common cause of heart disease is narrowing or blockage of the coronary arteries, which supply blood to the heart itself. This happens slowly over time.¹

Extensive clinical and statistical studies have identified several factors that increase the risk of coronary heart disease and heart attack. Major risk factors are those which research has shown to significantly increase the risk of heart and blood vessel (cardiovascular) disease. Other factors are associated with increased risk of cardiovascular disease, but their significance and prevalence have not yet been precisely determined. They are called contributing risk factors. The American Heart Association has identified several risk factors. Some of them can be modified, treated or controlled, and some cannot. The more risk factors you have, the greater your chance of developing coronary heart disease. Also, the greater the level of each risk factor, the greater the risk.²

In this study, heart disease was diagnosed using a fuzzy-AIRS classification system in which a weighting process based on the k-nn method was used as a preprocessing step. We first applied the k-nn based weighting process to the dataset, weighting it in the interval [0, 1]. After this preprocessing step, the weighted dataset was presented to the main fuzzy-AIRS classifier algorithm. The classification accuracy obtained was 87.00%.
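The text above does not spell out the weighting rule itself, so the following is only a minimal sketch of what a k-nn based weighting step that ends in the interval [0, 1] could look like. The rule used here (replace each attribute value by the mean of its k nearest values in the same attribute column, then min-max scale the column) is an assumption made for illustration, and the function names `knn_weight_column`, `min_max_scale` and `knn_weight_dataset` are hypothetical.

```python
def min_max_scale(column):
    """Scale a list of numbers linearly into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def knn_weight_column(column, k):
    """ASSUMED rule: replace each value by the mean of its k nearest
    values (by absolute difference) among the other entries of the column."""
    if len(column) < 2:
        raise ValueError("need at least two values to find neighbours")
    weighted = []
    for i, v in enumerate(column):
        others = [u for j, u in enumerate(column) if j != i]
        nearest = sorted(others, key=lambda u: abs(u - v))[:k]
        weighted.append(sum(nearest) / len(nearest))
    return weighted


def knn_weight_dataset(rows, k):
    """Apply the assumed weighting to every attribute, then scale to [0, 1]."""
    columns = [list(col) for col in zip(*rows)]
    processed = [min_max_scale(knn_weight_column(col, k)) for col in columns]
    return [list(row) for row in zip(*processed)]


# Toy data: 4 samples, 2 attributes (not taken from the heart disease dataset).
toy_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 40.0]]
toy_weighted = knn_weight_dataset(toy_rows, k=2)
```

The weighted rows would then be passed to the classifier in place of the raw attribute values; the classifier itself is outside the scope of this sketch.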

The rest of the paper is organized as follows. Section 2 gives background information, including the heart disease classification problem, previous research in this area and a brief introduction to natural and artificial immune systems. Section 3 explains the method, with subsections on the proposed method and on the measures used for performance evaluation; each subsection gives the detailed information. The results obtained in applications are given in Section 4, together with a discussion of these results in both specific and general terms. Finally, Section 5 concludes the paper by summarizing the results, emphasizing the importance of this study and mentioning some future work.

Section snippets

Heart disease classification problem

This database comes from the Cleveland Clinic Foundation and was supplied by Robert Detrano, M.D., Ph.D. of the V.A. Medical Center, Long Beach, CA. It is part of the collection of databases at the University of California, Irvine collected by David Aha. The purpose of the dataset is to predict the presence or absence of heart disease given the results of various medical tests carried out on a patient. This database contains 13 attributes, which have been extracted from a larger set of 75. The

Fuzzy resource allocation

The competition for resources in AIRS allows high-affinity ARBs to improve. According to this resource allocation mechanism, half of the resources are allocated to the ARBs in the class of the antigen, while the remaining half is distributed to the other classes. The distribution of resources is done according to a number that is found by multiplying the stimulation rate by the clonal rate. In the study of Marwah and Boggess, a different resource allocation mechanism was tried (Marwah & Boggess, 2002). In
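The allocation rule described in this snippet can be sketched in a few lines. This is an illustration of the non-fuzzy rule only, not the authors' implementation: the function names are hypothetical, and the even split of the remaining half across the other classes is an assumption, since the snippet only says that half is "distributed to the other classes".

```python
def requested_resources(stimulations, clonal_rate):
    """Resources claimed by each ARB: its stimulation level times the clonal rate."""
    return [s * clonal_rate for s in stimulations]


def class_budgets(total_resources, antigen_class, classes):
    """Half of the resources go to the antigen's class; the other half is
    shared by the remaining classes (ASSUMED here to be an even split)."""
    others = [c for c in classes if c != antigen_class]
    budgets = {antigen_class: total_resources / 2}
    for c in others:
        budgets[c] = total_resources / 2 / len(others)
    return budgets


# Toy values, chosen only to make the arithmetic easy to follow.
claims = requested_resources([0.5, 1.0], clonal_rate=10)
budgets = class_budgets(200, antigen_class="present", classes=["present", "absent"])
```

With two classes, as in a presence/absence problem, each class ends up with half of the total under this rule.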

Results and discussion

In applying our system, we classified the heart disease dataset for k = 10, 15 and 20, where k is the parameter of the k-nn preprocessing step. In each classification run, all other parameters were left unchanged and only the k-value was varied. The classification accuracies obtained were 85.99%, 87.00% and 86.12% for k = 10, 15 and 20, respectively. According to these results, the highest classification accuracy was reached for k = 15. The results obtained by 10-fold
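The comparison across k-values amounts to picking the setting with the highest reported accuracy. A minimal sketch using only the three figures quoted above (the variable names are illustrative):

```python
# Reported classification accuracy (%) for each k used in the k-nn preprocessing step.
reported_accuracy = {10: 85.99, 15: 87.00, 20: 86.12}

# Select the k with the highest accuracy.
best_k = max(reported_accuracy, key=reported_accuracy.get)
best_accuracy = reported_accuracy[best_k]

# Spread between the best and worst setting, in percentage points.
spread = round(best_accuracy - min(reported_accuracy.values()), 2)
```

The three settings differ by roughly one percentage point, so the result is not strongly sensitive to k within the tested range.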

Conclusion

With improvements in expert systems and ML tools, these innovations are reaching more application domains day by day, and the medical field is one of them. Decision-making in the medical field can sometimes be difficult. Classification systems used in medical decision-making allow medical data to be examined in less time and in more detail.

In the research reported in this paper, k-nn weighting preprocessing and fuzzy resource allocation mechanism with AIRS was

Acknowledgement

This study is supported by the Scientific Research Projects of Selçuk University (project no. 05401069).

References (13)

Perelson, A.S., et al. (1979). Theoretical studies of clonal selection: Minimal antibody repertoire size and reliability of self–nonself discrimination. Journal of Theoretical Biology.
Abbas, A.K., et al. (2003). Cellular and molecular immunology.
Cheung, N. (2001). Machine learning techniques for medical analysis. School of Information Technology and Electrical...
De Castro, L.N., et al. (2002). Artificial immune systems: A new computational intelligence approach.
Delen, D., et al. (2004). Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence in Medicine.
Fayyad, U.M., et al. (1996). Advances in knowledge discovery and data mining.
