Automatic detection of heart disease using an artificial immune recognition system (AIRS) with fuzzy resource allocation mechanism and k-nn (nearest neighbour) based weighting preprocessing

https://doi.org/10.1016/j.eswa.2006.01.027

Abstract

The use of machine learning methods in disease diagnosis has been increasing steadily. In this study, heart disease, which is a very common and important disease, was diagnosed with such a machine learning system. A new weighting scheme based on the k-nearest neighbour (k-nn) method was used as a preprocessing step before the main classifier, an artificial immune recognition system (AIRS) with a fuzzy resource allocation mechanism. The dataset used in the study was taken from the UCI Machine Learning Database. The system obtained a classification accuracy of 87%, which is very promising compared with the other classification applications reported in the literature for this problem.

Introduction

One of the central problems of the information age is dealing with the enormous amount of raw information that is available. More and more data is being collected and stored in databases or spreadsheets. As the volume increases, the gap between generating and collecting the data and actually being able to understand it is widening. To bridge this knowledge gap, a variety of techniques known as data mining or knowledge discovery are being developed. Knowledge discovery can be defined as the extraction of implicit, previously unknown, and potentially useful information from real-world data, and the communication of the discovered knowledge to people in an understandable way (Fayyad et al., 1996; Michie, 1991; Piatetsky-Shapiro and Frawley, 1991).

Heart disease is any disorder that affects the heart’s ability to function normally. The most common cause of heart disease is narrowing or blockage of the coronary arteries, which supply blood to the heart itself. This happens slowly over time.

Extensive clinical and statistical studies have identified several factors that increase the risk of coronary heart disease and heart attack. Major risk factors are those which research has shown to significantly increase the risk of heart and blood vessel (cardiovascular) disease. Other factors are associated with increased risk of cardiovascular disease, but their significance and prevalence have not yet been precisely determined. They are called contributing risk factors. The American Heart Association has identified several risk factors. Some of them can be modified, treated or controlled, and some cannot. The more risk factors you have, the greater your chance of developing coronary heart disease. Also, the greater the level of each risk factor, the greater the risk.

In this study, heart disease was diagnosed using a fuzzy-AIRS classification system in which a weighting process based on the k-nn method was used as a preprocessing step. We first applied the k-nn based weighting process to the dataset, which weighted it into the interval [0, 1]. After this preprocessing step, the weighted dataset was presented to the main fuzzy-AIRS classifier algorithm. The obtained classification accuracy was 87.00%.
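As a rough illustration of the preprocessing idea, the sketch below shows one way a k-nn based weighting into [0, 1] could look. It is not the authors' exact scheme, which this excerpt does not spell out. It assumes that each attribute value is replaced by the mean of its k nearest values within the same attribute, and that each attribute is then min-max scaled to [0, 1]. The function names `knn_weight_column` and `knn_weight_dataset` are hypothetical.

```python
import numpy as np


def knn_weight_column(values, k):
    """Replace each value by the mean of its k nearest values in the same column.

    Assumption: this is only one plausible reading of "k-nn based weighting".
    """
    values = np.asarray(values, dtype=float)
    # Pairwise absolute distances between all values of this attribute.
    dist = np.abs(values[:, None] - values[None, :])
    # Indices of the k closest values for every sample (the sample itself included).
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return values[nearest].mean(axis=1)


def knn_weight_dataset(X, k):
    """Apply the column-wise weighting, then min-max scale each attribute to [0, 1]."""
    X = np.asarray(X, dtype=float)
    weighted = np.column_stack(
        [knn_weight_column(X[:, j], k) for j in range(X.shape[1])]
    )
    lo = weighted.min(axis=0)
    span = weighted.max(axis=0) - lo
    span[span == 0] = 1.0  # constant attributes map to 0 instead of dividing by zero
    return (weighted - lo) / span


if __name__ == "__main__":
    # Toy data, not taken from the heart disease dataset.
    toy = [[1, 10], [2, 20], [3, 30], [10, 40]]
    print(knn_weight_dataset(toy, k=2))
```

The weighted matrix would then be passed to the classifier in place of the raw attributes. Larger k smooths each attribute more strongly, which is one reason the choice of k can affect the final accuracy.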

The rest of the paper is organized as follows. Section 2 gives the background information, including the heart disease classification problem, previous research in the area and a brief introduction to natural and artificial immune systems. Section 3 explains the method, with subsections on the proposed method and the measures used for performance evaluation; each subsection gives the detailed information. The results obtained in applications are given in Section 4, together with a discussion of these results in both specific and general terms. Finally, Section 5 concludes the paper by summarizing the results, emphasizing the importance of this study and mentioning some future work.

Section snippets

Heart disease classification problem

This database comes from the Cleveland Clinic Foundation and was supplied by Robert Detrano, M.D., Ph.D. of the V.A. Medical Center, Long Beach, CA. It is part of the collection of databases at the University of California, Irvine collected by David Aha. The purpose of the dataset is to predict the presence or absence of heart disease given the results of various medical tests carried out on a patient. This database contains 13 attributes, which have been extracted from a larger set of 75. The

Fuzzy resource allocation

The competition of resources in AIRS allows high-affinity ARBs to improve. According to this resource allocation mechanism, half of resources are allocated to the ARBs in the class of antigen while the remaining half is distributed to the other classes. The distribution of resources is done according to a number that is found by multiplying stimulation rate with clonal rate. In the study of Marwah and Boggess, a different resource allocation mechanism was tried (Marwah & Boggess, 2002). In
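A minimal sketch of the baseline (non-fuzzy) allocation described above may help fix the idea. It covers only what the paragraph states: each ARB requests stimulation times clonal rate, and half of the resource pool is reserved for the antigen's class. Everything else is an assumption made for illustration: the even split of the second half among the other classes, the rule that the strongest ARBs are served first, and the names `requested_resources`, `class_budgets` and `allocate`.

```python
from collections import defaultdict


def requested_resources(stimulation, clonal_rate):
    """Resources an ARB asks for: stimulation level times clonal rate."""
    return stimulation * clonal_rate


def class_budgets(classes, antigen_class, total_resources):
    """Half of the pool goes to the antigen's class.

    Assumption: the other half is split evenly among the remaining classes.
    """
    others = [c for c in classes if c != antigen_class]
    budgets = {antigen_class: total_resources / 2.0}
    for c in others:
        budgets[c] = (total_resources / 2.0) / len(others)
    return budgets


def allocate(arbs, antigen_class, total_resources, clonal_rate):
    """arbs is a list of (class_label, stimulation); returns resources granted per ARB."""
    classes = sorted({label for label, _ in arbs})
    budgets = class_budgets(classes, antigen_class, total_resources)
    by_class = defaultdict(list)
    for i, (label, _) in enumerate(arbs):
        by_class[label].append(i)
    granted = [0.0] * len(arbs)
    for label, idxs in by_class.items():
        remaining = budgets[label]
        # Assumption: strongest ARBs are served first; weaker ones get what is left.
        for i in sorted(idxs, key=lambda i: arbs[i][1], reverse=True):
            grant = min(requested_resources(arbs[i][1], clonal_rate), remaining)
            granted[i] = grant
            remaining -= grant
    return granted


if __name__ == "__main__":
    # Toy ARB pool with made-up stimulation levels.
    pool = [("sick", 0.9), ("sick", 0.5), ("healthy", 0.8)]
    print(allocate(pool, antigen_class="sick", total_resources=20, clonal_rate=10))
```

In this toy run the weaker "sick" ARB receives less than it requested because its class budget is exhausted, which is the competitive pressure that lets high-affinity ARBs improve.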

Results and discussion

In applications of our system, we classified the heart disease dataset with k, the parameter of the k-nn preprocessing step, set to 10, 15 and 20. In each run, all other parameters were kept unchanged and only the k-value was varied. The obtained classification accuracies were 85.99%, 87.00% and 86.12% for k = 10, 15 and 20, respectively. According to these results, the highest classification accuracy was reached for k = 15. The results obtained by 10-fold

Conclusion

With the improvements in expert systems and ML tools, these innovations are reaching more application domains day by day, and the medical field is one of them. Decision-making in the medical field can sometimes be difficult. Classification systems used in medical decision-making allow medical data to be examined in a shorter time and in more detail.

In the research reported in this paper, k-nn weighting preprocessing and fuzzy resource allocation mechanism with AIRS was

Acknowledgement

This study is supported by the Scientific Research Projects of Selçuk University (project no. 05401069).

References (13)

  • A.S. Perelson et al. Theoretical studies of clonal selection: Minimal antibody repertoire size and reliability of self–nonself discrimination. Journal of Theoretical Biology (1979)
  • A.K. Abbas et al. Cellular and molecular immunology (2003)
  • N. Cheung. Machine learning techniques for medical analysis. School of Information Technology and Electrical... (2001)
  • L.N. De Castro et al. Artificial immune systems: A new computational intelligence approach (2002)
  • D. Delen et al. Predicting breast cancer survivability: A comparison of three data mining methods. Artificial Intelligence in Medicine (2004)
  • U.M. Fayyad et al. Advances in knowledge discovery and data mining (1996)
There are more references available in the full text version of this article.
