An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach
Highlights
► An efficient Parkinson’s disease diagnostic system using the fuzzy k-nearest neighbor method is proposed.
► The dimensionality of the original features is reduced using principal component analysis.
► The effectiveness of the proposed system has been rigorously evaluated on a PD dataset in terms of accuracy, sensitivity, specificity and AUC.
► We have achieved superior performance compared with support vector machine based approaches and the existing methods in the literature.
Introduction
Parkinson’s disease (PD) is a degenerative disease of the nervous system characterized by a group of conditions called motor system disorders, which result from the loss of dopamine-producing brain cells. Primary symptoms of PD include tremor, or trembling in hands, arms, legs, jaw, and face; rigidity, or stiffness of the limbs and trunk; bradykinesia, or slowness of movement; and postural instability, or impaired balance and coordination. PD usually affects people over the age of 50 and has affected a large part of the worldwide population (http://www.ninds.nih.gov/disorders/parkinsons_disease/parkinsons_disease.htm, last accessed: April 2012). The cause of PD is still unknown; however, it is possible to alleviate symptoms significantly at the onset of the illness, in its early stage (Singh, Pillay, & Choonara, 2007). Approximately 90% of patients with PD are reported to show vocal impairment (Ho, Iansek, Marigliani, Bradshaw, & Gates, 1998). Patients with PD typically exhibit a group of vocal impairment symptoms known as dysphonia. The dysphonic indicators of PD make speech measurements an important part of diagnosis. Recently, dysphonic measures have been proposed as a reliable tool to detect and monitor PD (Little et al., 2009; Rahn et al., 2007).
Previous studies on the PD problem have been undertaken by various researchers. Little et al. (2009) conducted a remarkable study on PD identification: they employed a Support Vector Machine (SVM) classifier with Gaussian radial basis kernel functions to predict PD and performed feature selection to choose the optimal subset of features from the whole feature space; the best model obtained an accuracy rate of 91.4%. Shahbaba and Neal (2009) introduced a new nonlinear model based on Dirichlet process mixtures for classification of PD and compared the results with those of multinomial logit models, decision trees, and SVM; the proposed approach obtained the best classification accuracy of 87.7%. Das (2010) presented a comparative study of Neural Networks (ANN), DMneural, Regression and Decision Tree for effective diagnosis of PD; the experimental results showed that the ANN classifier yielded the best results, with an overall classification score of 92.9%. Sakar and Kursun (2010) used mutual information based feature selection methods integrated with the SVM classifier for PD diagnosis and achieved a classification accuracy of 92.75%. Psorakis, Damoulas, and Girolami (2010) introduced novel convergence measures, sample selection strategies and model improvements for multiclass multi-kernel relevance vector machines (mRVMs); the improved mRVMs achieved a classification accuracy of 89.47% when applied to the prediction of PD. Guo, Bhattacharya, and Kharma (2010) combined genetic programming and the expectation maximization algorithm (GP-EM) to detect PD and obtained a best classification accuracy of 93.1%. Recently, Luukka (2011) employed a feature selection method based on fuzzy entropy measures together with the similarity classifier to predict PD and obtained a mean classification accuracy of 85.03% with only two features.
Li, Liu, and Hu (2011) proposed a fuzzy-based non-linear transformation method in combination with the SVM classifier for prediction of PD and achieved a best classification accuracy of 93.47%. Ozcift and Gulten (2011) combined the correlation based feature selection (CFS) algorithm with rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to identify PD; the proposed CFS-RF model achieved a best classification accuracy of 87.13%. Åström and Koker (2011) proposed a parallel feed-forward neural network structure for prediction of PD and obtained a highest classification accuracy of 91.20%. Spadoto et al. (2011) applied evolutionary-based techniques in combination with the Optimum-Path Forest (OPF) classifier to detect PD and achieved a best classification accuracy of 84.01%.
From these works, we can see that most of the common classifiers from the machine learning community have been utilized for diagnosis of PD. It is evident that the choice of classifier is of significant importance to the PD diagnosis problem. In this study, we investigate the fuzzy k-nearest neighbor (FKNN) classifier for constructing an automatic diagnostic system for PD. Compared with ANN and SVM, FKNN, as an improvement over the standard KNN classifier, is much simpler and more easily interpretable while maintaining acceptable classification accuracy. The main idea behind FKNN (Keller, Gray, & Givens, 1985) is to use concepts from fuzzy logic to assign a query point degrees of membership in the different classes, taking into account the distances to its k nearest neighbors. Neighbors closer to the query point contribute a larger value to the membership of their corresponding class than neighbors that are far away. The class with the highest membership value is taken as the winner. One distinctive characteristic of the FKNN method is that it can assign a confidence degree to each predicted class. Thanks to these properties, it has been applied to a wide range of classification tasks such as protein subcellular location prediction (Huang & Li, 2004), protein solvent accessibility prediction (Sim, Kim, & Lee, 2005), hyperspectral satellite image classification (Yu, De Backer, & Scheunders, 2002), manufacturing applications (Warren Liao & Li, 1997), bankruptcy prediction (Chen et al., 2011a; Chen et al., 2011b), medical diagnosis (Liu et al., 2011; Seker et al., 2003) and so on. To the best of our knowledge, FKNN has not been examined for PD diagnosis although it has been used frequently for the classification of biological and medical data.
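As a minimal sketch of the membership idea described above, the following Python code uses the inverse-distance weighting commonly attributed to Keller et al. (1985), assuming crisp (0/1) memberships for the training samples, Euclidean distance, and a fuzzy strength parameter m greater than 1. The function names, the toy data, and the rescaling by the nearest distance (added only for numerical stability; it leaves the normalized memberships unchanged) are illustrative assumptions, not the authors' implementation.

```python
import math


def fknn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Return {class_label: membership} of `query` from its k nearest neighbors."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")

    # Sort training samples by Euclidean distance to the query, keep the k nearest.
    nearest = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]

    exponent = 2.0 / (m - 1.0)
    d_min = nearest[0][0]
    if d_min == 0.0:
        # The query coincides with a training sample: only exact matches count.
        weights = [1.0 if d == 0.0 else 0.0 for d, _ in nearest]
    else:
        # Proportional to 1 / d**exponent, rescaled by d_min to avoid under/overflow.
        weights = [(d_min / d) ** exponent for d, _ in nearest]

    total = sum(weights)
    memberships = {label: 0.0 for label in set(train_y)}
    for weight, (_, label) in zip(weights, nearest):
        memberships[label] += weight / total
    return memberships


def fknn_predict(train_x, train_y, query, k=3, m=2.0):
    """Return (winning_label, memberships); the winner has the highest membership."""
    memberships = fknn_memberships(train_x, train_y, query, k, m)
    return max(memberships, key=memberships.get), memberships


# Toy usage with made-up 2-D points (0 = healthy, 1 = PD is an assumed encoding).
toy_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_y = [0, 0, 1, 1]
label, degrees = fknn_predict(toy_x, toy_y, (0.2, 0.4), k=3, m=2.0)
print(label, degrees)
```

The returned memberships sum to one, so the value for the winning class can be read as the confidence degree mentioned above.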
To improve the efficiency and effectiveness of classification for PD diagnosis, this study introduces a diagnosis system based on the FKNN classifier. The rationale underlying the proposed system is first to use principal component analysis (PCA) to eliminate redundant information in the original PD data, and then to train an optimal FKNN model on the reduced feature space, with its parameters identified by cross validation (CV) analysis. Finally, the optimal model is utilized to perform the PD diagnostic tasks. The effectiveness of the proposed system is examined in terms of classification accuracy, sensitivity, specificity and AUC on the PD data set taken from the UCI machine learning repository. Promisingly, the developed diagnosis system achieves a mean accuracy of 96.07% on this data set under the 10-fold CV method.
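For readers less familiar with the four evaluation measures named above, the following Python sketch shows one common way to compute them for a binary task. It assumes that label 1 denotes PD and 0 denotes healthy, that each sample has a score in [0, 1] (for example, the FKNN membership of the PD class), and that a threshold of 0.5 turns scores into predictions. The function name and the toy values are hypothetical, and the AUC is computed by the rank-based (pairwise comparison) definition rather than by any procedure stated in the paper.

```python
def diagnostic_metrics(y_true, y_score, threshold=0.5):
    """Return accuracy, sensitivity, specificity and AUC for binary labels (1 = PD)."""
    y_pred = [1 if score >= threshold else 0 for score in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # AUC: probability that a random positive outscores a random negative
    # (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": wins / (len(pos) * len(neg)),
    }


# Toy usage with made-up labels and scores.
print(diagnostic_metrics([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6]))
```

The sketch expects at least one sample of each class; under k-fold CV these quantities would typically be computed per fold and then averaged.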
The remainder of this paper is organized as follows. Section 2 offers brief background knowledge on FKNN. The detailed implementation of the FKNN-based diagnosis system is described in Section 3. Section 4 presents the detailed experimental design, and Section 5 describes the empirical results and discussion. Finally, conclusions and future work are summarized in Section 6.
Section snippets
Fuzzy k-nearest neighbor method
The k-nearest neighbor (KNN) method is one of the oldest and simplest non-parametric pattern classification methods (Cover & Hart, 1967), in which a sample is assigned the most common class among its k nearest neighbors. As an improved version of the KNN method, FKNN (Keller et al., 1985) incorporates fuzzy set theory into KNN. In FKNN, rather than individual classes as in KNN, the fuzzy memberships of samples are assigned to different categories according to the following
The proposed diagnosis system
In this section, we describe the proposed FKNN-based diagnosis system. The proposed approach comprises two stages, as shown in Fig. 1. In the first stage, feature reduction is conducted using PCA to eliminate redundant features and thus further enhance the classification performance. In the second stage, the FKNN model is first trained on the training sets via 10-fold CV to obtain the optimal parameter pair (k, m), and then the obtained optimal FKNN model is used to perform the
Data description
In this study, we conducted our experiments on the PD data set taken from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Parkinsons, last accessed: May 2012). The purpose of this data set is to discriminate healthy people from those with PD, given the results of various medical tests carried out on a patient. This data set is composed of a range of biomedical voice measurements from 31 people, 23 with PD. The time since diagnosis ranged from 0 to 28 years, and the
Experimental results and discussions
In order to verify the effectiveness of the proposed model, FKNN was first compared with the advanced SVM classifier on the original feature space. For the FKNN classifier, Fig. 2 shows the relationship between the classification accuracy and the fuzzy strength parameter m, which varies in the range of [1, 2] with a step size of 0.01, for different values of k. It can be observed that the classification accuracy fluctuates between 90% and 98% with different values of m. It reveals that the
Conclusions and future works
This study introduces a new model for PD diagnosis. The main novelty of this model lies in employing the FKNN classifier together with a feature reduction technique to perform the diagnosis tasks for PD. Experimental results demonstrated that the proposed system performed very well in distinguishing patients with PD from healthy individuals. Meanwhile, a comparative study was conducted between SVM and FKNN. The experimental results have shown that FKNN approach performs advantageously over
References (37)
- et al. A parallel neural network approach to prediction of Parkinson’s Disease. Expert Systems with Applications (2011).
- et al. A new hybrid method based on local fisher discriminant analysis and support vector machines for hepatitis disease diagnosis. Expert Systems with Applications (2011).
- et al. A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis. Expert Systems with Applications (2011).
- et al. A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method. Knowledge-Based Systems (2011).
- A comparison of multiple classification methods for diagnosis of Parkinson disease. Expert Systems with Applications (2010).
- An introduction to ROC analysis. Pattern Recognition Letters (2006).
- et al. Medical data mining by fuzzy modeling with selected features. Artificial Intelligence in Medicine (2008).
- et al. A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets. Artificial Intelligence in Medicine (2011).
- Feature selection using fuzzy entropy measures with similarity classifier. Expert Systems with Applications (2011).
- et al. Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Comput Methods Programs Biomed (2011).