Applied Soft Computing

Volume 58, September 2017, Pages 176-192

Full Length Article
A BPSO-SVM algorithm based on memory renewal and enhanced mutation mechanisms for feature selection

https://doi.org/10.1016/j.asoc.2017.04.061

Highlights

  • We modify BPSO for wrapper feature selection with two new mechanisms.

  • The memory renewal mechanism acts on the memory of the local and global optimum, helping particles overstep local extrema.

  • The mutation-enhanced mechanism increases the particle mutation probability to avoid premature convergence.

  • We evaluate the modified algorithm against previous BPSO variants and other algorithms.

  • The proposed algorithm achieves higher accuracy with fewer selected features.

Abstract

Feature selection (FS) is an essential component of data mining and machine learning. Most researchers are devoted to obtaining more effective methods with higher accuracy and fewer features, which has become one of the most challenging problems in FS. Several algorithms have proven effective here, such as binary particle swarm optimization (BPSO), the genetic algorithm (GA) and the support vector machine (SVM). BPSO is a metaheuristic that has been applied successfully in many fields and applications, including FS. As a wrapper FS method, however, BPSO-SVM is easily trapped in premature convergence. In this paper, we present a novel mutation-enhanced BPSO-SVM algorithm that adjusts the memory of the local and global optimum (LGO) and increases the particles' mutation probability, in order to overcome premature convergence and obtain high-quality feature subsets. Experimental results on the Sonar, LSVT and DLBCL datasets, among others, indicate that the proposed algorithm improves accuracy and decreases the size of the selected feature subsets compared with existing modified BPSO algorithms and GA.

Introduction

Nowadays, more and more datasets are produced by the thousands of applications of information systems, and the information contained in these datasets plays an important role for data users. Data mining, also called knowledge discovery [1], relies on computational science [2], [3] and helps data users extract unknown knowledge from their data. At present, data mining is popular in many fields, such as financial analysis [4], commercial management, military technology and medical research. In the data mining process, dataset preprocessing is an essential step, because it is easier to exploit the properties of a dataset effectively before mining, for example by reducing its noise. Redundant features usually reduce the accuracy and efficiency of data mining and lead to incorrect classification predictions. Feature Extraction (FE) [5], [6] and Feature Selection (FS) [7], [8] are the two methods typically employed to solve this problem. Compared with FE, FS selects a group of the most statistically significant characteristics of the original features without transforming them. Owing to this advantage of retaining the original characteristics, FS is widely applied in many domains, such as spam detection [7], telecom customer churn [9] and bioinformatics [8], including gene expression analysis [10], [11].

Generally, FS algorithms can be divided into three categories: filter models [12], [13], wrapper models [14], [15] and embedded models [16], [17]. Filter methods remove redundant features via statistical properties of the features alone; they have no relationship with any learning algorithm and require the least computational time. Embedded techniques are bound to a learning model, performing selection as part of the training process without splitting the data into training and testing sets, and are more complicated. The wrapper method combines a search model for picking candidate features with a learning model for evaluating them, and obtains high accuracy at the cost of high computational overhead.
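As a concrete contrast between the filter and wrapper ideas, the sketch below scores features with a filter criterion (mutual information, computed from the data alone) and then evaluates the resulting subset the wrapper way, by cross-validating a classifier trained on it. This is a minimal illustration using scikit-learn; the dataset and classifier are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Filter model: rank features by a statistical criterion only,
# independently of any learning algorithm (fast, learner-agnostic).
filter_scores = mutual_info_classif(X, y, random_state=0)
top10 = np.argsort(filter_scores)[-10:]

# Wrapper model: evaluate a candidate subset by actually training and
# cross-validating a classifier on it (slower, usually more accurate).
def wrapper_score(subset):
    return cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=5).mean()

print("CV accuracy of the filter's top-10 subset:", wrapper_score(top10))
```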

In wrapper FS, the Support Vector Machine (SVM) is a popular method that has been used in various application scenarios owing to its high accuracy and generalization ability [18], [19], [20], [21]. Ref. [22] employed SVM with a backward elimination method on imbalanced, high-dimensional microarray datasets, using sampling methods and criteria suited to imbalanced data. SVM thus handles high-dimensional data and non-linear problems well, especially in microarray data classification [23]. Ref. [24] implemented a combination of GA and SVM for FS, optimizing the SVM radial basis function (RBF) parameters in parallel, with good experimental results. Recently, researchers have focused on a novel fractional-order Darwinian PSO method for FS [21], with SVM playing the key evaluation role on high-dimensional data. The k-Nearest Neighbors (KNN) algorithm is also used in FS, since it handles classification problems effectively. However, we prefer SVM to KNN because of SVM's advantages on imbalanced data and its use of only a few samples as support vectors in classification.
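To illustrate the SVM's role as the wrapper's evaluator, the sketch below scores a candidate solution, a binary feature mask together with RBF hyperparameters C and gamma, by cross-validated accuracy. The joint encoding and the scikit-learn calls are assumptions chosen for illustration; the works cited above differ in their exact setups.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(mask, C, gamma, X, y):
    """Wrapper fitness: 5-fold CV accuracy of an RBF-SVM restricted to
    the features where mask == 1. C and gamma are evaluated jointly
    with the subset, as in PSO/GA-based SVM wrappers.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:        # an empty feature subset is infeasible
        return 0.0
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(clf, X[:, selected], y, cv=5).mean()
```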

In recent years, metaheuristic algorithms have been widely adopted as global optimizers owing to their excellent performance in wrapper FS. Ant Colony Optimization (ACO) [25] is one of them, having shown strong performance in attribute reduction. Simulated Annealing (SA) [26] has been used for feature picking while also optimizing algorithm parameters. The Genetic Algorithm (GA) [27] has been applied to optimize both the feature subset and the model parameters for SVM [28]. To achieve promising results, some researchers have also used Particle Swarm Optimization (PSO) to select the feature subset [29], [30] and optimize the model parameters simultaneously [31]. PSO's advantages are its low computational overhead, its ease of implementation and its faster convergence to an optimum compared with other metaheuristics [32]. However, fast convergence is a double-edged sword that quite likely leads to premature convergence in PSO.

Kennedy and Eberhart proposed Particle Swarm Optimization (PSO) in 1995 [33] and its binary variant (BPSO) for discrete spaces in 1997 [34]. Researchers have since concentrated on modified BPSOs for wrapper FS to yield better results. Some applied BPSO and SVM as a wrapper FS, optimizing the feature subset and the SVM parameters at the same time [31], [35], [36]. Combined with other classification algorithms, BPSO has also been hybridized with improvement mechanisms for FS [7], [30], [32], [37]. Zhang [7] introduced a mutation operator that randomly flips particle positions so that particles avoid being trapped in a local optimum; that paper also designed an application-specific fitness function that increases the weight of the minority-class evaluation via a parameter α, with a decision tree as the objective function for spam mail detection. Xue et al. proposed initialization strategies and LGO updating mechanisms for PSO [30]; this hybrid algorithm, combined with KNN, achieved good results. Sheikhpour et al. [37] hybridized PSO with non-parametric Kernel Density Estimation (KDE) and a self-designed objective function to pick the bandwidth and feature subset for breast cancer diagnosis, to good effect; their experiments showed better average performance than a GA-KDE model across two objective functions (KDE1 and KDE2) combined with PSO and GA. Ref. [32] introduced a novel local search strategy in a hybrid PSO for FS: using correlation information, the hybrid method divides the features into similar and salient groups, and then reduces the salient group with a subset-size determination scheme. This filter-wrapper hybrid uses a KNN classifier for evaluation, and its accuracy surpasses filter-based methods, including information gain, variance of the Fisher score and mRMR, as well as wrapper-based methods such as GA, PSO, SA and ACO. Lin et al. [35] combined standard PSO with SVM, solving parameter-value selection and feature selection together with PSO; here, how to encode the SVM parameter values and the attribute selection in one data structure is the problem that needs attention. A binary structure was later proposed for representing the selected features, while the SVM parameter values remained continuous [31]; the limitation is this mismatch between the data structures of the SVM parameters and the selected-feature labels. In one modified BPSO algorithm [36], both the feature selection and the SVM parameters use the binary structure. The author also modified this scheme with a renewed reset mechanism based on IBPSO [23] and a mutation based on MBPSO08 [38]: it resets the global and local optimum, adds a modified mutation and introduces variability into the swarm, trying to help the modified BPSO overstep local extrema. However, the number of jumped particles is limited and the search ability of these particles could be improved, so premature convergence still occurs.
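To make the mutation idea concrete, here is a minimal sketch of a bit-flip mutation operator of the kind described in [7] and MBPSO08 [38]: each bit of a particle's binary position is turned over with a small probability. The probability value and the NumPy encoding are illustrative assumptions, not the cited papers' exact settings.

```python
import numpy as np

def mutate(position, p_mut=0.05, rng=None):
    """Flip each bit of a binary particle position with probability p_mut.

    Randomly turning bits over reintroduces diversity into the swarm,
    helping particles trapped near a local optimum to jump out of it.
    """
    rng = np.random.default_rng() if rng is None else rng
    flip = rng.random(position.shape) < p_mut
    return np.where(flip, 1 - position, position)
```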

To enhance the randomness of mutation after the reset mechanism, and to keep particles active throughout the optimization, we propose a mutation-enhanced binary particle swarm optimization with SVM (ME-BPSO-SVM) in this paper. Two mechanisms are introduced in the algorithm: a memory renewal mechanism and a mutation-enhanced mechanism. The paper is organized as follows: Section 2 reviews the background algorithms, BPSO and SVM, in FS. Section 3 describes the proposed algorithm in detail. Section 4 presents experiments on the collected datasets, demonstrates the effectiveness of the proposed method against other modified BPSO algorithms and the Genetic Algorithm in terms of classification performance and average number of selected attributes, and discusses the results and analyzes their underlying causes. Section 5 concludes the paper.

Section snippets

Preliminary knowledge

In this section, we review SVM, BPSO and the evaluation criteria used in our algorithm.
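For orientation, the standard BPSO update reviewed here is usually written as follows; the inertia weight w appears in most modified variants, although Kennedy and Eberhart's original 1997 formulation omits it:

```latex
v_{id}^{t+1} = w\,v_{id}^{t}
             + c_1 r_1 \bigl(p_{id} - x_{id}^{t}\bigr)
             + c_2 r_2 \bigl(p_{gd} - x_{id}^{t}\bigr), \qquad
x_{id}^{t+1} =
  \begin{cases}
    1, & \text{if } r < S\bigl(v_{id}^{t+1}\bigr),\\
    0, & \text{otherwise,}
  \end{cases}
\qquad S(v) = \frac{1}{1 + e^{-v}}
```

where r, r1 and r2 are uniform random numbers in [0, 1], p_id is particle i's personal best in dimension d, and p_gd is the global best.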

Mutation enhanced BPSO-SVM in feature selection

The standard BPSO algorithm has already demonstrated fair performance in the FS process. In pursuit of better results, we propose a hybrid mutation-enhanced binary PSO for FS, called ME-BPSO-SVM. ME-BPSO-SVM adds a modified memory renewal mechanism and a mutation-enhanced mechanism to standard BPSO; meanwhile, an SVM model is employed as the evaluation component of the wrapper FS, and its parameters are selected with the ME-BPSO algorithm. The modified memory renewal mechanism contains two parts, acting on the memory of the local and the global optimum respectively.
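To give a feel for how the two mechanisms slot into a BPSO iteration, here is a heavily hedged sketch; the stagnation counter, the reset-to-current-positions policy and the mutation schedule are all illustrative assumptions, not the algorithm specified in Section 3.

```python
import numpy as np

def me_bpso_step(pos, vel, pbest, gbest, fitness, stall, p_mut=0.01,
                 w=0.9, c1=2.0, c2=2.0, stall_limit=5, rng=None):
    """One illustrative ME-BPSO-style iteration. The usual pbest/gbest
    bookkeeping is elided; `stall` counts iterations without gbest
    improvement and is maintained by the caller.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = pos.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))

    # Standard BPSO: velocity update, then sigmoid-thresholded positions.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = (rng.random((n, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)

    # Mutation-enhanced mechanism: the longer the swarm stagnates, the
    # higher the bit-flip rate, injecting diversity against premature
    # convergence (capped so positions stay mostly informative).
    rate = min(0.5, p_mut * (1 + stall))
    pos = np.where(rng.random((n, d)) < rate, 1 - pos, pos)

    # Memory renewal mechanism: after stall_limit stagnant iterations,
    # re-initialize the local/global optimum memory so particles can
    # overstep the local extremum instead of being pulled back to it.
    if stall >= stall_limit:
        pbest = pos.copy()
        gbest = pos[int(np.argmax([fitness(p) for p in pos]))].copy()
        stall = 0
    return pos, vel, pbest, gbest, stall
```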

Datasets

To evaluate our ME-BPSO-SVM method, experiments were performed on 14 datasets. Eleven of them are taken from the UCI repository [44], including the Breast Cancer, Heart, Sonar, Kidney Disease, German, Ionosphere, LSVT Voice Rehabilitation and Image Segmentation datasets. Colon Tumor is from the Kent Ridge Bio-medical Dataset [45], and both DLBCL and Prostate Tumor are from the Gene Expression Model Selector [46]. The selected benchmark datasets are binary-class datasets, matching our binary classification setting.

Conclusion

In this paper, a mutation-enhanced BPSO-SVM approach (ME-BPSO-SVM) has been applied to feature selection. Our approach integrates a memory renewal mechanism and an enhanced mutation mechanism. These two mechanisms avoid premature convergence better than the alternatives, including standard BPSO, previously modified BPSOs (MBPSO-08, IBPSO, MBPSO-13 and MBPSO-14) and GA. Experimental results show that, under the same time and space complexity, our algorithm achieves higher accuracy with fewer selected features.

References (49)

  • S. Maldonado et al.

    Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

    Inf. Sci.

    (2014)
  • C.-L. Huang et al.

    A GA-based feature selection and parameters optimization for support vector machines

    Expert Syst. Appl.

    (2006)
  • N.K. Sreeja et al.

    Pattern matching based classification using ant colony optimization based feature selection

    Appl. Soft Comput.

    (2015)
  • S.-W. Lin et al.

    Parameter determination of support vector machine and feature selection using simulated annealing approach

    Appl. Soft Comput.

    (2008)
  • M. Zhao et al.

    Feature selection and parameter optimization for support vector machines: a new approach based on genetic algorithm with feature chromosomes

    Expert Syst. Appl.

    (2011)
  • K.K. Bharti et al.

    Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering

    Appl. Soft Comput.

    (2016)
  • B. Xue et al.

    Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms

    Appl. Soft Comput.

    (2014)
  • C.-L. Huang et al.

    A distributed PSO–SVM hybrid system with feature selection and parameter optimization

    Appl. Soft Comput.

    (2008)
  • P. Moradi et al.

    A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy

    Appl. Soft Comput.

    (2016)
  • S.-W. Lin et al.

    Particle swarm optimization for parameter determination and feature selection of support vector machines

    Expert Syst. Appl.

    (2008)
  • S.M. Vieira et al.

    Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

    Appl. Soft Comput.

    (2013)
  • R. Sheikhpour et al.

    Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer

    Appl. Soft Comput.

    (2016)
  • S. Lee et al.

    Modified binary particle swarm optimization

    Prog. Nat. Sci.

    (2008)
  • F. Marini et al.

    Particle swarm optimization (PSO). A tutorial

    Chemom. Intell. Lab. Syst.

    (2015)