Applied Soft Computing

Volume 58, September 2017, Pages 176-192

Full Length Article
A BPSO-SVM algorithm based on memory renewal and enhanced mutation mechanisms for feature selection

https://doi.org/10.1016/j.asoc.2017.04.061

Highlights

  • We modify BPSO for wrapper feature selection with two new mechanisms.

  • The memory renewal mechanism acts on the memory of the local and global optimum, helping particles overstep local extrema.

  • The mutation-enhanced mechanism increases the particle mutation probability to avoid premature convergence.

  • We evaluate the modified algorithm against previous BPSO variants and other algorithms.

  • The proposed algorithm achieves higher accuracy with fewer selected features.

Abstract

Feature selection (FS) is an essential component of data mining and machine learning. Most researchers are devoted to obtaining more effective methods with higher accuracy and fewer features, which has become one of the most challenging problems in FS. Several algorithms have proven effective here, such as binary particle swarm optimization (BPSO), the genetic algorithm (GA) and the support vector machine (SVM). BPSO is a metaheuristic that has been applied successfully in many fields and applications, including FS. As a wrapper FS method, however, BPSO-SVM is easily trapped in premature convergence. In this paper, we present a novel mutation-enhanced BPSO-SVM algorithm that adjusts the memory of the local and global optimum (LGO) and increases the particles' mutation probability, in order to overcome premature convergence and obtain high-quality feature subsets. Experimental results on the Sonar, LSVT and DLBCL datasets, among others, indicate that the proposed algorithm improves accuracy and decreases the size of the selected feature subsets compared with existing modified BPSO algorithms and GA.

Introduction

Nowadays, more and more datasets are produced by the thousands of applications of information systems, and the information contained in these datasets plays an important role for data users. Data mining, also called knowledge discovery [1], relies on computational science [2], [3] and helps data users extract unknown knowledge from their data. At present, data mining is popular in many fields, such as financial analysis [4], commercial management, military technology and medical research. In the data mining process, dataset preprocessing is an essential step, because it is easier to exploit the properties of a dataset effectively before mining, for example by reducing its noise. Redundant features usually reduce the accuracy and efficiency of data mining and lead to incorrect classification predictions. Feature Extraction (FE) [5], [6] and Feature Selection (FS) [7], [8] are the two methods typically employed to solve this problem. Compared with FE, FS selects a group of the most statistically significant characteristics of the original features without transforming them. Owing to this advantage of retaining the original characteristics, FS is widely applied in many domains, such as spam detection [7], telecom customer churn [9] and bioinformatics [8], including gene expression analysis [10], [11].

Generally, FS algorithms can be divided into three categories: filter models [12], [13], wrapper models [14], [15] and embedded models [16], [17]. Filter methods remove redundant features via statistical properties of the features alone; they have no relationship with any learning algorithm and require the least computational time. Embedded techniques are bound to a learning model, performing selection as part of the training process without splitting the data into training and testing sets, and are more complicated. The wrapper method combines a search model for picking candidate features with a learning model for evaluating them, and obtains high accuracy at the cost of high computational overhead.
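As a concrete contrast between the filter and wrapper ideas, the sketch below scores features with a filter criterion (mutual information, computed from the data alone) and then evaluates the resulting subset the wrapper way, by cross-validating a classifier trained on it. This is a minimal illustration using scikit-learn; the dataset and classifier are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Filter model: rank features by a statistical criterion only,
# independently of any learning algorithm (fast, learner-agnostic).
filter_scores = mutual_info_classif(X, y, random_state=0)
top10 = np.argsort(filter_scores)[-10:]

# Wrapper model: evaluate a candidate subset by actually training and
# cross-validating a classifier on it (slower, usually more accurate).
def wrapper_score(subset):
    return cross_val_score(SVC(kernel="rbf"), X[:, subset], y, cv=5).mean()

print("CV accuracy of the filter's top-10 subset:", wrapper_score(top10))
```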

In wrapper FS, the Support Vector Machine (SVM) is a popular method that has been used in various application scenarios owing to its high accuracy and generalization ability [18], [19], [20], [21]. Ref. [22] employed SVM with a backward elimination method on imbalanced, high-dimensional microarray datasets, using sampling methods and criteria suited to imbalanced data. SVM thus handles high-dimensional data and non-linear problems well, especially in microarray data classification [23]. Ref. [24] implemented a combination of GA and SVM for FS, optimizing the SVM radial basis function (RBF) parameters in parallel, with good experimental results. Recently, researchers have focused on a novel fractional-order Darwinian PSO method for FS [21], with SVM playing the key evaluation role on high-dimensional data. The k-Nearest Neighbors (KNN) algorithm is also used in FS, since it handles classification problems effectively. However, we prefer SVM to KNN because of SVM's advantages on imbalanced data and its use of only a few samples as support vectors in classification.
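To illustrate the SVM's role as the wrapper's evaluator, the sketch below scores a candidate solution, a binary feature mask together with RBF hyperparameters C and gamma, by cross-validated accuracy. The joint encoding and the scikit-learn calls are assumptions chosen for illustration; the works cited above differ in their exact setups.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def fitness(mask, C, gamma, X, y):
    """Wrapper fitness: 5-fold CV accuracy of an RBF-SVM restricted to
    the features where mask == 1. C and gamma are evaluated jointly
    with the subset, as in PSO/GA-based SVM wrappers.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:        # an empty feature subset is infeasible
        return 0.0
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    return cross_val_score(clf, X[:, selected], y, cv=5).mean()
```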

In recent years, metaheuristic algorithms have been widely adopted as global optimizers owing to their excellent performance in wrapper FS. Ant Colony Optimization (ACO) [25] is one of them, having shown strong performance in attribute reduction. Simulated Annealing (SA) [26] has been used for feature picking while also optimizing algorithm parameters. The Genetic Algorithm (GA) [27] has been applied to optimize both the feature subset and the model parameters for SVM [28]. To achieve promising results, some researchers have also used Particle Swarm Optimization (PSO) to select the feature subset [29], [30] and optimize the model parameters simultaneously [31]. PSO's advantages are its low computational overhead, its ease of implementation and its faster convergence to an optimum compared with other metaheuristics [32]. However, fast convergence is a double-edged sword that quite likely leads to premature convergence in PSO.

Kennedy and Eberhart proposed Particle Swarm Optimization (PSO) in 1995 [33] and its binary variant (BPSO) for discrete spaces in 1997 [34]. Researchers have since concentrated on modified BPSOs for wrapper FS to yield better results. Some applied BPSO and SVM as a wrapper FS, optimizing the feature subset and the SVM parameters at the same time [31], [35], [36]. Combined with other classification algorithms, BPSO has also been hybridized with improvement mechanisms for FS [7], [30], [32], [37]. Zhang [7] introduced a mutation operator that randomly flips particle positions so that particles avoid being trapped in a local optimum; that paper also designed an application-specific fitness function that increases the weight of the minority-class evaluation via a parameter α, with a decision tree as the objective function for spam mail detection. Xue et al. proposed initialization strategies and LGO updating mechanisms for PSO [30]; this hybrid algorithm, combined with KNN, achieved good results. Sheikhpour et al. [37] hybridized PSO with non-parametric Kernel Density Estimation (KDE) and a self-designed objective function to pick the bandwidth and feature subset for breast cancer diagnosis, to good effect; their experiments showed better average performance than a GA-KDE model across two objective functions (KDE1 and KDE2) combined with PSO and GA. Ref. [32] introduced a novel local search strategy in a hybrid PSO for FS: using correlation information, the hybrid method divides the features into similar and salient groups, and then reduces the salient group with a subset-size determination scheme. This filter-wrapper hybrid uses a KNN classifier for evaluation, and its accuracy surpasses filter-based methods, including information gain, variance of the Fisher score and mRMR, as well as wrapper-based methods such as GA, PSO, SA and ACO. Lin et al. [35] combined standard PSO with SVM, solving parameter-value selection and feature selection together with PSO; here, how to encode the SVM parameter values and the attribute selection in one data structure is the problem that needs attention. A binary structure was later proposed for representing the selected features, while the SVM parameter values remained continuous [31]; the limitation is this mismatch between the data structures of the SVM parameters and the selected-feature labels. In one modified BPSO algorithm [36], both the feature selection and the SVM parameters use the binary structure. The author also modified this scheme with a renewed reset mechanism based on IBPSO [23] and a mutation based on MBPSO08 [38]: it resets the global and local optimum, adds a modified mutation and introduces variability into the swarm, trying to help the modified BPSO overstep local extrema. However, the number of jumped particles is limited and the search ability of these particles could be improved, so premature convergence still occurs.
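To make the mutation idea concrete, here is a minimal sketch of a bit-flip mutation operator of the kind described in [7] and MBPSO08 [38]: each bit of a particle's binary position is turned over with a small probability. The probability value and the NumPy encoding are illustrative assumptions, not the cited papers' exact settings.

```python
import numpy as np

def mutate(position, p_mut=0.05, rng=None):
    """Flip each bit of a binary particle position with probability p_mut.

    Randomly turning bits over reintroduces diversity into the swarm,
    helping particles trapped near a local optimum to jump out of it.
    """
    rng = np.random.default_rng() if rng is None else rng
    flip = rng.random(position.shape) < p_mut
    return np.where(flip, 1 - position, position)
```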

To enhance the randomness of mutation after the reset mechanism, and to keep particles active throughout the optimization, we propose a mutation-enhanced binary particle swarm optimization with SVM (ME-BPSO-SVM) in this paper. Two mechanisms are introduced in the algorithm: a memory renewal mechanism and a mutation-enhanced mechanism. The paper is organized as follows: Section 2 reviews the background algorithms, BPSO and SVM, in FS. Section 3 describes the proposed algorithm in detail. Section 4 presents experiments on the collected datasets, demonstrates the effectiveness of the proposed method against other modified BPSO algorithms and the Genetic Algorithm in terms of classification performance and average number of selected attributes, and discusses the results and analyzes their underlying causes. Section 5 concludes the paper.

Section snippets

Preliminary knowledge

In this section, we review SVM, BPSO and the evaluation criteria used in our algorithm.
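For orientation, the standard BPSO update reviewed here is usually written as follows; the inertia weight w appears in most modified variants, although Kennedy and Eberhart's original 1997 formulation omits it:

```latex
v_{id}^{t+1} = w\,v_{id}^{t}
             + c_1 r_1 \bigl(p_{id} - x_{id}^{t}\bigr)
             + c_2 r_2 \bigl(p_{gd} - x_{id}^{t}\bigr), \qquad
x_{id}^{t+1} =
  \begin{cases}
    1, & \text{if } r < S\bigl(v_{id}^{t+1}\bigr),\\
    0, & \text{otherwise,}
  \end{cases}
\qquad S(v) = \frac{1}{1 + e^{-v}}
```

where r, r1 and r2 are uniform random numbers in [0, 1], p_id is particle i's personal best in dimension d, and p_gd is the global best.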

Mutation enhanced BPSO-SVM in feature selection

The standard BPSO algorithm has already demonstrated fair performance in the FS process. In pursuit of better results, we propose a hybrid mutation-enhanced binary PSO for FS, called ME-BPSO-SVM. ME-BPSO-SVM adds a modified memory renewal mechanism and a mutation-enhanced mechanism to standard BPSO; meanwhile, an SVM model is employed as the evaluation component of the wrapper FS, and its parameters are selected with the ME-BPSO algorithm. The modified memory renewal mechanism contains two parts, acting on the memory of the local and the global optimum respectively.
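To give a feel for how the two mechanisms slot into a BPSO iteration, here is a heavily hedged sketch; the stagnation counter, the reset-to-current-positions policy and the mutation schedule are all illustrative assumptions, not the algorithm specified in Section 3.

```python
import numpy as np

def me_bpso_step(pos, vel, pbest, gbest, fitness, stall, p_mut=0.01,
                 w=0.9, c1=2.0, c2=2.0, stall_limit=5, rng=None):
    """One illustrative ME-BPSO-style iteration. The usual pbest/gbest
    bookkeeping is elided; `stall` counts iterations without gbest
    improvement and is maintained by the caller.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = pos.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))

    # Standard BPSO: velocity update, then sigmoid-thresholded positions.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = (rng.random((n, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)

    # Mutation-enhanced mechanism: the longer the swarm stagnates, the
    # higher the bit-flip rate, injecting diversity against premature
    # convergence (capped so positions stay mostly informative).
    rate = min(0.5, p_mut * (1 + stall))
    pos = np.where(rng.random((n, d)) < rate, 1 - pos, pos)

    # Memory renewal mechanism: after stall_limit stagnant iterations,
    # re-initialize the local/global optimum memory so particles can
    # overstep the local extremum instead of being pulled back to it.
    if stall >= stall_limit:
        pbest = pos.copy()
        gbest = pos[int(np.argmax([fitness(p) for p in pos]))].copy()
        stall = 0
    return pos, vel, pbest, gbest, stall
```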

Datasets

To evaluate our ME-BPSO-SVM method, experiments were performed on 14 datasets. Eleven of them are taken from the UCI repository [44], including the Breast Cancer, Heart, Sonar, Kidney Disease, German, Ionosphere, LSVT Voice Rehabilitation and Image Segmentation datasets. Colon Tumor is from the Kent Ridge Bio-medical Dataset [45], and both DLBCL and Prostate Tumor are from the Gene Expression Model Selector [46]. The selected benchmark datasets are binary-class datasets, matching our binary classification setting.

Conclusion

In this paper, a mutation-enhanced BPSO-SVM approach (ME-BPSO-SVM) has been applied to feature selection. Our approach integrates a memory renewal mechanism and an enhanced mutation mechanism. These two mechanisms avoid premature convergence better than the alternatives, including standard BPSO, previously modified BPSOs (MBPSO-08, IBPSO, MBPSO-13 and MBPSO-14) and GA. Experimental results show that, under the same time and space complexity, our algorithm achieves higher accuracy with fewer selected features.

References (49)

  • S. Maldonado et al.

    Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

    Inf. Sci.

    (2014)
  • C.-L. Huang et al.

    A GA-based feature selection and parameters optimization for support vector machines

    Expert Syst. Appl.

    (2006)
  • N.K. Sreeja et al.

    Pattern matching based classification using ant colony optimization based feature selection

    Appl. Soft Comput.

    (2015)
  • S.-W. Lin et al.

    Parameter determination of support vector machine and feature selection using simulated annealing approach

    Appl. Soft Comput.

    (2008)
  • M. Zhao et al.

    Feature selection and parameter optimization for support vector machines: a new approach based on genetic algorithm with feature chromosomes

    Expert Syst. Appl.

    (2011)
  • K.K. Bharti et al.

    Opposition chaotic fitness mutation based adaptive inertia weight BPSO for feature selection in text clustering

    Appl. Soft Comput.

    (2016)
  • B. Xue et al.

    Particle swarm optimisation for feature selection in classification: novel initialisation and updating mechanisms

    Appl. Soft Comput.

    (2014)
  • C.-L. Huang et al.

    A distributed PSO–SVM hybrid system with feature selection and parameter optimization

    Appl. Soft Comput.

    (2008)
  • P. Moradi et al.

    A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy

    Appl. Soft Comput.

    (2016)
  • S.-W. Lin et al.

    Particle swarm optimization for parameter determination and feature selection of support vector machines

    Expert Syst. Appl.

    (2008)
  • S.M. Vieira et al.

    Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients

    Appl. Soft Comput.

    (2013)
  • R. Sheikhpour et al.

    Particle swarm optimization for bandwidth determination and feature selection of kernel density estimation based classifiers in diagnosis of breast cancer

    Appl. Soft Comput.

    (2016)
  • S. Lee et al.

    Modified binary particle swarm optimization

    Prog. Nat. Sci.

    (2008)
  • F. Marini et al.

    Particle swarm optimization (PSO). A tutorial

    Chemom. Intell. Lab. Syst.

    (2015)