Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms

doi:10.1016/j.cmpb.2011.03.018

Computer Methods and Programs in Biomedicine

Volume 104, Issue 3, December 2011, Pages 443-451


Abstract

Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that the performance of a base classifier can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes and heart disease datasets from the literature.

In the experiments, the feature dimension of the three datasets is first reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed with the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC).
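The leave-one-out protocol and the three metrics can be sketched in plain Python as follows. This is a minimal illustration rather than the study's Weka setup: the 1-nearest-neighbour scorer, the toy data and all function names are assumptions made for the example, and the kappa metric is assumed here to be Cohen's kappa statistic.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def nearest_neighbour_score(train: List[Sample], x: Sequence[float]) -> float:
    """Hypothetical stand-in classifier: score is the label of the closest training sample."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, label = min(train, key=lambda s: sq_dist(s[0], x))
    return float(label)


def leave_one_out(data: List[Sample],
                  score_fn: Callable[[List[Sample], Sequence[float]], float]) -> List[float]:
    """Hold out each sample once, fit on the rest, and collect one score per sample."""
    return [score_fn(data[:i] + data[i + 1:], x) for i, (x, _) in enumerate(data)]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: List[int], y_pred: List[int]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0


def auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy one-feature dataset (invented for illustration only).
data: List[Sample] = [([0.0], 0), ([0.1], 0), ([0.2], 0),
                      ([0.9], 1), ([1.0], 1), ([0.45], 1)]
y_true = [label for _, label in data]
scores = leave_one_out(data, nearest_neighbour_score)
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
area = auc(y_true, scores)
```

On this toy set the last sample is misclassified when held out, so the accuracy is 5/6 while kappa, which discounts chance agreement, is lower at 2/3.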

Base classifiers achieved average accuracies of 72.15%, 77.52% and 84.43% for the diabetes, heart and Parkinson's datasets, respectively. The RF classifier ensembles produced average accuracies of 74.47%, 80.49% and 87.13% for the respective diseases.
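The gain implied by these averages can be checked with a few lines of arithmetic. The snippet only restates the figures reported above; the dictionary keys and variable names are chosen for the example.

```python
# Average accuracies (%) reported in the abstract.
base = {"diabetes": 72.15, "heart": 77.52, "parkinsons": 84.43}
ensemble = {"diabetes": 74.47, "heart": 80.49, "parkinsons": 87.13}

# Gain of the RF ensembles over the base classifiers, in percentage points.
gains = {name: round(ensemble[name] - base[name], 2) for name in base}
mean_gain = round(sum(gains.values()) / len(gains), 2)

print(gains)      # {'diabetes': 2.32, 'heart': 2.97, 'parkinsons': 2.7}
print(mean_gain)  # 2.66
```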

RF, a newly proposed classifier ensemble algorithm, might be used to improve the accuracy of miscellaneous machine learning algorithms in the design of advanced CADx systems.

Introduction

Machine learning algorithms have been successfully applied to the design of CADx systems. These algorithms are first trained with diagnosed samples, i.e. with precedent diagnoses of medical experts. In the test phase, the algorithms are then used to assist the medical experts in diagnosing future samples [1]. In this respect, the success of an analysis strategy can be defined as the ability of the algorithm to predict the correct status (normal or disease) of unseen data.

The performance of CADx systems might be enhanced with more accurate machine learning algorithms. The predictive ability of such analysis methods can be improved mainly with two strategies: (i) application of feature selection methods to the dataset [2] and (ii) construction of classifier ensembles [3].

The accuracy of classification strategies can be negatively affected by the use of too many features in the classification. This may lead to overfitting, in which noise or irrelevant features decrease classification accuracy because of the finite size of the training set [4]. In general, there are two widely used feature selection strategies: (i) filters and (ii) wrappers. Wrapper methods find feature subsets based on the performance of a preselected classification algorithm on a training dataset. In contrast, filters rely on properties of the features themselves to select the best feature subset. While selecting a subset of features, both approaches use a search procedure such as individual ranking, forward search or backward search [5]. In this context, CFS is a multivariate filter approach that evaluates the strength of features to return the most relevant variables [6]. In the literature, CFS is used for feature selection in various medical diagnosis applications [7], [8], [9], [10].
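As a rough illustration of a correlation-based filter combined with forward search, the sketch below scores a feature subset with the merit heuristic commonly associated with CFS, merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The use of absolute Pearson correlation, the toy data and the function names are assumptions made for the example; the implementation used in the study may differ.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def cfs_merit(columns: List[Sequence[float]], target: Sequence[float],
              subset: List[int]) -> float:
    """High when features correlate with the class but not with each other."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns: List[Sequence[float]], target: Sequence[float]) -> List[int]:
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected: List[int] = []
    best = 0.0
    remaining = set(range(len(columns)))
    while remaining:
        merit, feat = max((cfs_merit(columns, target, selected + [f]), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = merit
    return selected


# Toy data (invented): feature 0 is informative, feature 1 is a noisier,
# largely redundant variant of it, and feature 2 is close to noise.
target = [0, 0, 0, 1, 1, 1]
columns = [
    [1, 2, 3, 7, 8, 9],
    [1, 2, 6, 5, 8, 8],
    [5, 1, 4, 2, 5, 1],
]
selected = forward_select(columns, target)
```

In this toy case only feature 0 is retained: adding the redundant or the noisy feature lowers the merit, so the search stops after the first step.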

A powerful technique for increasing the accuracy of conventional base classifiers is to construct classifier ensembles. An ensemble classifier consists of base classifiers that learn a target function by combining their predictions [11]. Ensemble learning approaches seen in the machine learning literature include composite classifier systems, mixture of experts, consensus aggregation, dynamic classifier selection, classifier fusion and committees of neural networks [12]. Various CADx applications in the literature use classifier ensembles (particularly the RF algorithm) to improve the accuracy of conventional classifiers [13], [14], [15], [16], [17], [18].

Besides the accuracy of the base classifiers, the performance of an ensemble algorithm is affected by the diversity of the classifiers forming the ensemble. Diverse classifiers make different errors on different samples. Combining such classifiers might lead to more accurate decisions [19].
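A small worked example may make the role of diversity concrete. In the sketch below, three hypothetical classifiers each misclassify a different sample, and a simple majority vote corrects all three errors. The predictions, the pairwise disagreement measure and the function names are assumptions chosen for illustration; RF itself builds and combines its members differently.

```python
from collections import Counter
from itertools import combinations
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def disagreement(a: List[int], b: List[int]) -> float:
    """Fraction of samples on which two classifiers give different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def majority_vote(predictions: List[List[int]]) -> List[int]:
    """Combine per-classifier predictions by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Invented labels and predictions: each classifier errs on a different sample.
y_true = [1, 1, 1, 0, 0, 0]
predictions = [
    [0, 1, 1, 0, 0, 0],  # wrong on sample 0
    [1, 0, 1, 0, 0, 0],  # wrong on sample 1
    [1, 1, 1, 1, 0, 0],  # wrong on sample 3
]

individual = [accuracy(y_true, p) for p in predictions]
pairwise = [disagreement(a, b) for a, b in combinations(predictions, 2)]
combined = accuracy(y_true, majority_vote(predictions))
```

Each classifier alone reaches an accuracy of 5/6, while the vote reaches 1.0 on this toy set. If all three made the same error, the vote could not improve on any single member.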

This study presents an evaluation that can help to design CADx systems with increased performance. The strategy is based on a two-step approach to constructing classifiers with enhanced accuracy. In the first step, the feature dimensions of three benchmark datasets are reduced using the CFS algorithm. In the second step, 30 base classifiers and the corresponding RF classifier ensembles are used in the diagnosis of Parkinson's, heart and diabetes diseases to evaluate the resulting accuracies of the algorithms. All the experiments are validated with a leave-one-out (10-fold cross validation) scheme.

Section snippets

Overview

In this section, we explain the technique we use to create classifiers with improved accuracy. First, our feature selection strategy, i.e. CFS, is introduced. Following the explanation of CFS, the RF ensemble classification scheme is explained in detail. Next, the datasets used to evaluate classifier performance are briefly introduced. Section 2 ends with an explanation of the evaluation metrics used throughout the experiments.

Variable selection with CFS algorithm

In a classification problem, goodness of features from correlation point of

The benchmarking data with the application of CFS algorithm

We used three medical datasets, i.e. diabetes, heart and Parkinson's, from the UCI machine learning repository for benchmarking purposes.

The diabetes dataset contains 768 samples, each described by the 8 features listed in Table 1. The dataset has two classes: negative for diabetes and positive for diabetes. The two classes contain 500 and 268 samples, respectively.

With the application of CFS algorithm to diabetes dataset, the features with IDs of {2,6,7,8} are retained while

Machine learning algorithms and their abbreviations used in the study

In order to compare the performance of widely used machine learning algorithms with that of their corresponding RF classifier ensembles, we selected 30 algorithms from the Weka data mining software. While selecting the algorithms, we attempted to maintain diversity among them. For ease of evaluation, all figures and tables refer to the algorithms by ID number rather than by name. The ID numbers and respective names of the algorithms are given in Table 4. We used default

Experimental results

In this section, the results of the experiments for the diabetes, Cleveland heart and Parkinson's datasets are given in Table 5, Table 6 and Table 7, respectively. In the tables, ‘e’ denotes the measures of the RF classifier ensemble corresponding to a base classifier, ‘Diff’ means ‘difference’ and ‘AVG’ stands for ‘average’.

As Table 5 is examined with three metrics (ACC, KE and AUC) simultaneously, 24 out of 30 base classifiers’ performance is seen to be improved by the use of corresponding RF classifier ensemble

Conclusion and remarks

Machine learning applications, particularly CADx systems, need classifiers with enhanced accuracy. Such applications, in general, require a two-step approach: (i) a relevant feature selection algorithm to find the most powerful features and (ii) a high-accuracy classifier to obtain the highest classification performance.

In our study, we did not evaluate the effect of feature selection algorithm on classifier performances. Instead, we used a simple CFS algorithm to decrease the feature size

References (30)

C. Demir et al. Cost-conscious classifier ensembles. Pattern Recogn. Lett. (2005)
R.E. Abdel-Aal. GMDH-based feature ranking and selection for improved classification of medical data. J. Biomed. Inform. (2005)
K.-J. Kim et al. Ensemble classifiers based on correlation analysis for DNA microarray classification. Neurocomputing (2006)
J.-H. Eom et al. AptaCDSS-E: a classifier ensemble-based clinical decision support system for cardiovascular disease level prediction. Expert Syst. Appl. (2008)
K.-H. Liu et al. Cancer classification using rotation forest. Comput. Biol. Med. (2008)
R. Polikar et al. An ensemble based data fusion approach for early diagnosis of Alzheimer's disease. Inf. Fusion (2008)
A. Ben-David. Comparison of classification accuracy using Cohen's Weighted Kappa. Expert Syst. Appl. (2008)
L. Ming et al. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans. Syst. Man Cybern. A: Syst. Hum. (2007)
I. Guyon et al. An introduction to variable and feature selection. J. Mach. Learn. Res. (2003)
M.C. Lee et al. A two-step approach for feature selection and classifier ensemble construction in computer-aided diagnosis
K. Michalak et al.
A. Mendiburu et al. Parallel and multi-objective EDAs to create multivariate calibration models for quantitative chemical applications
I. Skrypnyk. Comparison of feature selection strategies for hearing impairments diagnostics
A.G. Karegowda et al. Cascading GA; CFS for feature subset selection in medical data mining
Y. Cheng-San et al. A hybrid approach for selecting gene subsets using gene expression data

