Optimization of K-NN algorithm by clustering and reliability coefficients: application to breast-cancer diagnosis

https://doi.org/10.1016/j.procs.2018.01.125Get rights and content
Under a Creative Commons license
open access

Abstract

There is a growing trend towards data mining applications in medicine. Different algorithms have been explored by medical practitioners in an attempt to assist their work; the diagnosis of breast cancer is one of those applications. Machine learning algorithms are of vital importance to many medical problems, they can help to diagnose a disease, to detect its causes, to predict the outcome of a treatment, etc. K-Nearest Neighbors algorithm (KNN) is one of the simplest algorithms; it is widely used in predictive analysis. To optimize its performance and to accelerate its process, this paper proposes a new solution to speed up KNN algorithm based on clustering and attributes filtering. It also includes another improvement based on reliability coefficients which insures a more accurate classification. Thus, the contributions of this paper are three-fold: (i) the clustering of class instances, (ii) the selection of most significant attributes, and (iii) the ponderation of similarities by reliability coefficients. Results of the proposed approach exceeded most known classification techniques with an average f-measure exceeding 94% on the considered breast-cancer Dataset.

Keywords

data mining
cancer diagnosis
supervised classification
unsupervised classification
k-nearest neighbors
k-means
similarity measurement

Cited by (0)