A new multi-objective differential evolution approach for simultaneous clustering and feature selection

https://doi.org/10.1016/j.engappai.2019.103307

Abstract

Today’s real-world data mostly involves incomplete, inconsistent, and/or irrelevant information, which hinders its transformation into an understandable format. To deal with such issues, data preprocessing is a proven discipline in data mining. One of the typical tasks in data preprocessing, feature selection aims to reduce the dimensionality of the data and thereby facilitates further processing. Feature selection is widely used to enhance the performance of supervised learning algorithms (e.g., classification) but is rarely used in unsupervised tasks (e.g., clustering). This paper introduces a new multi-objective differential evolution approach that finds relatively homogeneous clusters, without prior knowledge of the number of clusters, using only a subset of all available features in the data. To assess the effectiveness of the introduced approach, several experiments are conducted on a variety of real-world and synthetic benchmarks using a variety of clustering approaches. The analyses across several different criteria suggest that our method can significantly improve clustering performance while reducing dimensionality at the same time.

Introduction

Nowadays, vast amounts of data can be stored with little effort thanks to developments in technological tools such as the World Wide Web, network services, and database storage systems. On the other hand, due to this massive data growth, extracting knowledge with a data mining method becomes more difficult. The quality of the knowledge extracted from data depends not only on the data mining method but also on the quality of the data itself: the higher the quality and suitability of the data, the better the knowledge extraction process is expected to be. Unfortunately, data may involve not only required and necessary information but also noise, inconsistency, missing values, and a huge number of samples and features. To deal with such issues, data preprocessing has proven to be a major and essential step in preparing the data for further data mining processes.

One of the critical tasks in data preprocessing, dimensionality reduction is commonly used to alleviate the curse of dimensionality, “which is a serious problem as it will impede the operation of most data mining algorithms as the computational cost rises” (García et al., 2016). Dimensionality reduction approaches fall mainly into two categories: space transformation and feature selection. Space transformation tries to transform the original feature set into a small number of projections; representative examples are principal component analysis, linear discriminant analysis, and factor analysis (Ekbal and Saha, 2015). Feature selection is the process of eliminating as many irrelevant and redundant features as possible. The goal is to obtain, from all available features, a feature subset that is far more beneficial to further processing. In other words, feature selection aims to eliminate irrelevant and redundant features which may introduce undesired correlations into the subsequent (learning) process. Moreover, the subsequent learning process becomes faster and requires less storage. Unlike space transformation, feature selection preserves the original features and is thereby treated as more general. Feature selection has been widely used to enhance the performance of supervised tasks (e.g., prediction, classification, regression) in data mining and machine learning (Anifowose and Abdulraheem, 2011, Anifowose et al., 2013, Anifowose et al., 2014, Anifowose et al., 2016, Anifowose, 2018). In particular, feature selection approaches have brought significant improvements in both the learning performance and the computational efficiency of classification algorithms. Such improvements are particularly observed when feature selection is wrapped around evolutionary computation (EC) techniques, which are known to be powerful global search methods (Xue et al., 2013).
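As a concrete illustration of the filter end of the feature-selection spectrum (this snippet is ours, not drawn from the cited works), a minimal sketch ranks features by their absolute Pearson correlation with a supervised target and keeps the top k:

```python
import numpy as np

def select_top_k_features(X, y, k):
    """Simple filter-style feature selection: score each feature by its
    absolute Pearson correlation with the target, keep the k best indices."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)  # only feature 2 is relevant
print(select_top_k_features(X, y, 1))           # feature index 2 ranks first
```

Wrapper methods, including the EC-based ones discussed above, instead evaluate candidate subsets with the learning algorithm itself, which is costlier but typically more accurate.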

Despite its popularity in classification, feature selection has rarely been used in unsupervised tasks such as clustering. Clustering is the process of dividing data instances into clusters (or groups) such that the instances within a cluster are highly similar to each other, whereas the instances from different clusters are maximally dissimilar. Representative measures to quantify the similarity between a pair of instances are the Euclidean, Mahalanobis, Cityblock, Cosine, Correlation, Hamming, and Jaccard distances (Hancer and Karaboga, 2017). When computing the distance between data instances, each feature is typically given equal importance. Unfortunately, such a computation may lead to the curse of dimensionality in clustering high-dimensional data. It is also reported in Chakraborty and Das (2018) that the performance of well-known clustering algorithms, such as K-means and its variants (MacQueen, 1967), may deteriorate as the number of features grows. Another significant and long-standing issue in clustering is the automatic evolution of the number of clusters, since today’s real-world data mostly does not come with this information. It is therefore necessary to integrate automatic cluster evolution with automated feature selection, referred to as simultaneous clustering and feature selection. This problem has only recently come into consideration, and there exist only a few attempts to address it in the literature, where it has been treated as a single-objective problem using EC techniques such as a niching genetic algorithm (GA) (Shang et al., 2007) and particle swarm optimization (PSO) (Lensen et al., 2016). In these EC-based simultaneous clustering and feature selection approaches, all features are considered during the calculation of similarity measures even when they are not selected, in order to prevent feature selection from introducing a bias towards lower cluster numbers. In other words, feature selection is carried out independently of clustering; thus, it is not possible to determine whether the selected feature subset is suitable for the partitions obtained by clustering. It can therefore be inferred that simultaneous clustering and feature selection remains an open issue for researchers.
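The dependence of clustering on the selected subset can be made concrete: when a binary feature mask restricts the distance computation, the similarity between two instances, and hence the resulting partition, changes with that mask. A minimal sketch with Euclidean distance (our illustration; the mask and values below are hypothetical):

```python
import numpy as np

def masked_euclidean(a, b, mask):
    """Euclidean distance computed only over the selected features."""
    sel = np.asarray(mask, dtype=bool)
    return float(np.linalg.norm(a[sel] - b[sel]))

a = np.array([0.0, 0.0, 10.0])
b = np.array([3.0, 4.0, -10.0])
print(masked_euclidean(a, b, [1, 1, 0]))  # 5.0: the noisy third feature is ignored
print(masked_euclidean(a, b, [1, 1, 1]))  # ~20.6: it dominates the full-space distance
```

With equal feature weights, a single irrelevant high-variance feature can dominate the distance, which is precisely the curse-of-dimensionality effect noted above.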

In this paper, we aim to deal with issues concerning clustering that attract researchers’ attention by developing a new simultaneous clustering and feature selection approach, with the expectation of enhancing the quality of cluster partitions. To achieve this, we propose a variable-string length based multi-objective differential evolution approach. Specifically, we will investigate:

  • the performance of the proposed approach versus representative partitional approaches on real-world datasets,

  • the performance of the proposed approach versus representative partitional approaches on synthetic datasets, and

  • the performance of the proposed approach versus several different clustering approaches.

The rest of the paper is organized as follows. Section 2 outlines differential evolution, cluster analysis, and recent related works. Section 3 introduces the proposed algorithm. Section 4 defines the experimental design. Section 5 presents the results with discussions. Finally, Section 6 states conclusions with future trends.

Section snippets

Background

This section first describes the basic differential evolution algorithm. It then explains the fundamentals of cluster analysis. Finally, it surveys recent works on simultaneous clustering and feature selection.

Proposed clustering and feature selection approach

In this section, we describe the newly proposed multi-objective differential evolution approach (referred to as MODE-CFS) for simultaneous clustering and feature selection. It should be noted that, instead of using an existing variant of multi-objective differential evolution, we develop a new multi-objective variable-string-length differential evolution to perform clustering and feature selection simultaneously.
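The exact MODE-CFS representation is not reproduced in this snippet, but the idea of a variable-string-length encoding can be sketched as follows: each individual carries a variable number of candidate cluster centres together with a binary feature-selection mask, so that variation operators can change both the number of clusters and the active features. All names and shapes below are our assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_individual(n_features, k_max):
    """One hypothetical variable-length individual for simultaneous
    clustering and feature selection: a set of candidate cluster centres
    (the string length varies with the cluster count) plus a binary mask
    marking the selected features. This is an illustrative encoding,
    not the paper's exact MODE-CFS representation."""
    k = int(rng.integers(2, k_max + 1))                 # active cluster count
    centres = rng.uniform(size=(k, n_features))         # candidate centres
    feature_mask = rng.integers(0, 2, size=n_features)  # 1 = feature selected
    return centres, feature_mask

centres, mask = random_individual(n_features=4, k_max=6)
print(centres.shape[0], mask.tolist())
```

Under such an encoding, a multi-objective DE can trade off, for example, cluster compactness against the number of selected features across the population.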

Experimental design

The performance of the proposed simultaneous clustering and feature selection approach is evaluated by comparing it with a number of representative partitional clustering approaches on a variety of real-world and synthetic datasets using representative evaluation criteria. Due to its non-deterministic mechanism, each approach is carried out for 30 independent runs, and the mean of each evaluation criterion is calculated across all runs. To further analyze the performance of the MODE-CFS

Results and discussions

The experimental results are presented in Table 2, Table 3 for the real-world and synthetic datasets, respectively. Each table shows the mean values of the feature subset size and the number of clusters obtained by each method, as well as the average clustering performance in terms of a variety of evaluation criteria. The external evaluation criteria should be maximized, while the internal evaluation criteria, as well as the number of features, should be minimized. For each of the approaches used for
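To make the direction of the criteria concrete: an external criterion compares the obtained partition with ground-truth labels and is to be maximized. A minimal sketch of one such criterion, the Rand index, computed over all instance pairs (our illustration; the paper's exact criteria may differ):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """External criterion (higher is better): fraction of instance pairs on
    which two partitions agree (same-cluster vs. different-cluster)."""
    agree = sum((a1 == a2) == (b1 == b2)
                for (a1, b1), (a2, b2) in combinations(zip(labels_a, labels_b), 2))
    n_pairs = len(labels_a) * (len(labels_a) - 1) // 2
    return agree / n_pairs

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to relabelling
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # ~0.33: partitions mostly disagree
```

Internal criteria, by contrast, score compactness and separation from the data alone, which is why they point in the opposite direction (lower is better) in the tables.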

Conclusions

Multi-objective clustering is not an easy task due to the following issues. First, the components of the multi-objective fitness function should be designed or selected according to the considered multi-objective framework. For example, one internal index which properly works in a multi-objective evolutionary framework may not have a positive effect on clustering when used with another multi-objective evolutionary framework. Second, the selection of a solution from the Pareto front is also a

References (51)

  • Saha, S., et al. Simultaneous feature selection and symmetry based clustering using multiobjective framework. Appl. Soft Comput. (2015)
  • Shang, W., et al. A novel feature selection algorithm for text categorization. Expert Syst. Appl. (2007)
  • Szilágyi, S.M., et al. A fast hierarchical clustering algorithm for large-scale protein sequence data sets. Comput. Biol. Med. (2014)
  • Zhao, Q., et al. WB-index: A sum-of-squares based index for cluster validity. Data Knowl. Eng. (2014)
  • Anifowose, F. Data-driven approach to handling high-dimensional data input space using feature selection-based hybrid...
  • Anifowose, F., et al. A least-square-driven functional networks type-2 fuzzy logic hybrid model for efficient petroleum reservoir properties prediction. Neural Comput. Appl. (2013)
  • Ankerst, M., et al. OPTICS: Ordering points to identify the clustering structure
  • Bache, K., et al. UCI machine learning repository (2013)
  • Calinski, R., et al. A dendrite method for cluster analysis. Commun. Stat. - Theory Methods (1974)
  • Comaniciu, D., et al. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • Davies, D., et al. A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. (1979)
  • Dunn, J. Well separated clusters and optimal fuzzy partitions. J. Cybern. (1974)
  • Dutta, D., Dutta, P., Sil, J. Simultaneous feature selection and clustering for categorical features using multi...
  • Dutta, D., Dutta, P., Sil, J. Simultaneous continuous feature selection and K clustering by Multi Objective Genetic...
  • Fowlkes, E.B., et al. A method for comparing two hierarchical clusterings. J. Amer. Statist. Assoc. (1983)

    No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.103307.
