Published in: Neural Processing Letters 2/2024

Open Access 01.04.2024

A Correlation-Redundancy Guided Evolutionary Algorithm and Its Application to High-Dimensional Feature Selection in Classification

Authors: Xiang Sun, Shunsheng Guo, Shiqiao Liu, Jun Guo, Baigang Du


Abstract

The processing of high-dimensional datasets has become unavoidable with the development of information technology. Most of the literature on feature selection (FS) for high-dimensional datasets focuses on improvements in search strategies and ignores the characteristics of the dataset itself, such as the correlation and redundancy of each feature, which can degrade the algorithm's search effectiveness. Thus, this paper proposes a correlation-redundancy guided evolutionary algorithm (CRGEA) to address high-dimensional FS with the objectives of simultaneously optimizing classification accuracy and the number of features. A new correlation-redundancy assessment method is designed to select features with high relevance and low redundancy and thereby speed up the entire evolutionary process. In CRGEA, a novel initialization strategy combined with a multiple threshold selection mechanism is developed to produce a high-quality initial population. A local acceleration evolution strategy based on a parallel simulated annealing algorithm and a pruning method is developed; it searches in different directions and performs deep searches combined with the annealing stage around the best solutions to improve the local search ability. Finally, comparison experiments on 16 public high-dimensional datasets verify that the designed CRGEA outperforms other state-of-the-art intelligent algorithms. The CRGEA can efficiently reduce redundant features while ensuring high accuracy.

1 Introduction

The classification of data is an indispensable subject in the domain of machine learning and data mining [1]. It serves as a pervasive technique for discerning and categorizing different system states, thereby finding extensive applications in various fields. As a pre-processing technology, feature selection (FS) is widely used to eliminate redundant and irrelevant features from the original dataset and improve the efficiency and accuracy of the classification model [2]. However, the number of measurements used to describe the state of a system has increased dramatically with the development of information technology. High-dimensional datasets may contain many redundant and irrelevant features [3], which puts more pressure on FS to filter out the best subset of features. Thus, for FS on high-dimensional datasets, it is of great significance to minimize the number of features in order to enhance model efficiency while ensuring classification accuracy. This creates an urgent need for an efficient method for solving multi-objective FS (MOFS) on high-dimensional datasets.
FS can be regarded as a combinatorial optimization problem [4]. The exhaustive method can be employed to enumerate all possible combinations of features and determine the subset that optimizes the evaluation criteria. Nevertheless, the solution space for a set of \(n\) features is \({2}^{n}-1\), and the computational time required by the exhaustive method grows exponentially with the number of feature dimensions [5]. Thus, despite its simplicity, the exhaustive method is challenging to apply in practice [6]. To solve this problem, many heuristic algorithms have been proposed for FS. Among these methodologies, sequential forward selection (SFS) and sequential backward selection (SBS) [7] are two prominent approaches. These methods initialize either an empty set or the complete set and then iteratively add or remove features to optimize the feature subset. Although they improve search efficiency relative to exhaustive search, a limitation of both SFS and SBS is that once a feature has been included or excluded, it cannot be reconsidered, whereas some features are functionally interconnected and exhibit a correlated coupling effect. To address this drawback, the plus-l minus-r selection method [8] was introduced. It effectively alleviates the aforementioned concern, but the numbers of added and removed features are fixed. Building on it, sequential floating forward selection [9] and sequential floating backward selection [10] were designed, in which the numbers of added and removed features can change at each step. Although heuristic methods significantly reduce the time complexity compared to exhaustive search, they often get stuck in local optima because of their fixed search strategies. Thus, intelligent algorithms (IAs) have received increasing attention in recent years as an ideal alternative for their ability to achieve near-optimal or optimal solutions within a reasonable time [11, 12].
However, IAs are easily trapped in suboptimal solutions when handling FS on high-dimensional datasets [13, 14]. Recently, many search strategies have been designed to improve the search efficiency of meta-heuristic algorithms. Based on the Pearson correlation coefficient, Kabir et al. [15] developed a novel local search operation that relied on the distinct and informative nature of input features, as computed through correlation information. Moradi and Gholampour [16] introduced a local search strategy that selected distinct features by considering their correlation information. However, the Pearson correlation coefficient can only measure linear relationships between variables, not non-linear relationships, and is sensitive to outliers. To overcome this shortcoming, mutual information has been used to measure the correlation between variables [17]. Vinh et al. [18] introduced a correlation-redundancy metric (CRM) to characterize the correlation and redundancy of each feature, which captures the correlation between a feature and the labels as well as the redundancy between two features. Furthermore, Wang et al. [19, 20] designed a "relevance-redundancy index" based on the CRM for evaluating feature importance in an ant colony optimization approach to classification. Song et al. [21] proposed supplementary and deletion operators to enhance the local search ability. However, the CRM only rates a feature highly when it has a high correlation with the class labels and a low correlation with the other features. When both the correlation and the redundancy of a feature are high, the metric pushes its score close to 0, so highly relevant features may be discarded. Thus, a novel evaluation function is designed to measure the correlation-redundancy (CR) of each feature. Based on this, a novel local search strategy is proposed that considers both the search depth and the CR of features simultaneously.
Additionally, population initialization also significantly impacts the performance of IAs. Most existing studies use traditional random initialization (TRI) [1, 4, 16, 22], which may degrade the quality of the initial population and slow down the overall convergence rate [23]. To handle this issue, Xu et al. [24] devised a split initialization approach to enhance the diversity of the initial population, thereby amplifying the breadth of the exploration process. Xue et al. [25] introduced an interval initialization technique that constrained the number of selected features. Xue et al. [26] devised three novel warm initialization strategies, namely small, large, and mixed initialization. The above papers focus on controlling the number of selected features to enhance diversity, disregarding the importance of the selected features themselves, such as the correlation and redundancy of each feature. To overcome this problem, Deniz et al. [27] employed an information gain technique to evaluate the significance of each feature and subsequently initialized the population based on the obtained scores. Song et al. [21] developed a swarm initialization strategy that leverages label correlation as a guiding principle. Mohsen et al. [28] employed cosine similarity in their pheromone initialization approach. Li and Ren [29] designed a swarm initialization method based on the maximal information coefficient between the feature and the label. Although the aforementioned studies considered the correlation between features and labels, the redundancy between features was ignored. Thus, this paper introduces a novel initialization strategy that uses a new evaluation function of the CR of each feature to determine the probability of selecting that feature. This provides an efficient way to obtain a high-quality initial population and reduce the solution space, thereby expediting the convergence speed.
To sum up, this study proposes a correlation-redundancy guided evolutionary algorithm, named CRGEA, to solve high-dimensional MOFS. A new correlation-redundancy assessment method (CRAM) is designed for measuring the CR of each feature, which guides the entire evolutionary process. In CRGEA, a novel initialization strategy combined with a multiple threshold selection mechanism (ISMTS) is developed to produce a high-quality initial population. A local acceleration evolution strategy (LAES) based on a parallel simulated annealing algorithm and a pruning method is developed to improve the local search ability of the CRGEA. The contributions of this paper can be summarized as follows:
(1) A new correlation-redundancy assessment method is designed for measuring the CR of each feature, paying more attention to features with high relevance and low redundancy in high-dimensional datasets to speed up the entire evolutionary process;
(2) A correlation-redundancy guided evolutionary algorithm is designed for solving MOFS on high-dimensional datasets. In CRGEA, CRAM first evaluates the CR of each feature and produces a high-quality initial population. LAES performs deep searches combined with the annealing stage around the best solutions to improve the local search ability of the CRGEA, which helps individuals escape local optima and obtain the best feature subset;
(3) Numerical experiments are conducted based on 16 high-dimensional datasets, where nine datasets contain more than 1000 dimensions, three datasets have over 6000 dimensions, and two datasets have over 10,000 dimensions. The experiment results show that CRGEA can efficiently reduce redundant features while ensuring high accuracy compared to other state-of-the-art intelligent algorithms.
The remainder of this paper is structured as follows: Sect. 2 provides related works on MOFS, mutual information, the K-Nearest Neighbor algorithm, etc. Section 3 presents a comprehensive explanation of the designed CRGEA. Section 4 describes the results obtained on 16 public high-dimensional datasets. In Sect. 5, a summary and discussion are provided based on the outcomes of the three numerical experiments. Finally, Sect. 6 concludes with an overview of the findings and future research directions.

2 Related Works

2.1 Multi-objective Feature Selection for High-Dimensional Datasets

In most existing studies, FS has usually been regarded as a single-objective problem of improving classification accuracy [21, 30–34]. However, the resulting feature subset often contains redundant features, which reduces algorithm efficiency and classification accuracy. Therefore, some researchers introduce the number of features as an additional criterion in FS to remove redundant and irrelevant features [35–42]. In these studies, the average classification error and the number of features are combined into one goal by setting a weight. The weight for the two targets requires expert experience and may differ between datasets, which makes it hard to put into practice. In addition, early research on FS focused on small and medium-sized datasets. With the advent of the information era, more and more research has begun to focus on FS for high-dimensional datasets. However, most of the literature on high-dimensional datasets focuses on improvements in search strategies [13, 22, 23, 25, 43], single-objective formulations [44–46], and grouping features by the relevance between features [47, 48], ignoring the characteristics of the dataset itself such as the correlation and redundancy of each feature. This can degrade the algorithm's search effectiveness. Thus, this study addresses MOFS for high-dimensional datasets considering the correlation-redundancy of each feature, minimizing the classification error and the number of features simultaneously. The MOFS can be described as follows:
Let the input be the normalized dataset \(X\in {R}^{n\times D}\), where \(n\) indicates the number of samples in \(X\) and \(D\) indicates the dimension of \(X\). The feature set \(S=\left\{{f}_{1},{f}_{2},\dots ,{f}_{D}\right\}\) represents the \(D\) features of \(X\). The aim of MOFS is to optimize a function vector \(f\left(x\right)=\left\{{f}_{1}\left(x\right),{f}_{2}\left(x\right),\dots ,{f}_{m}\left(x\right)\right\}\), where \(m\) represents the number of objectives to be optimized.

2.2 Mutual Information

Mutual information (MI) can evaluate the relationship between two discrete variables [49] and is widely used to evaluate the relevance between features and labels as well as the redundancy between features [50–52]. However, in the above research the number of features needs to be specified in advance, and the optimal number of features and classification accuracy cannot be obtained automatically. Thus, in this paper MI is used within an intelligent algorithm that optimizes classification accuracy and the number of features simultaneously, which offers end-to-end convenience. The MI is described as Eq. (1):
$$I\left(x,y\right)=\sum_{y}\sum_{x}p\left(x,y\right)\log\frac{p\left(x,y\right)}{p\left(x\right)p(y)}$$
(1)
where x and y are two random variables. The marginal probabilities of x and y are denoted as \(p(x)\) and \(p(y)\) respectively, while \(p(x,y)\) represents their joint probability. Building upon this foundation, Vinh et al. [18] introduced a CR metric to characterize the correlation and redundancy of each feature. CR combines the correlation between a feature and the sample labels with the redundancy between two features. For the \(i\)-th feature within a feature subset \(D\), its CR value can be defined as Eq. (2).
$${\varphi }\left({x}_{i}\right)=I\left({x}_{i},c\right)-\frac{1}{|D|}\sum_{{x}_{l}\in D}I\left({x}_{i},{x}_{l}\right)$$
(2)
where \({\varphi }\left({x}_{i}\right)\) represents the CR of feature \({x}_{i}\), \(c\) is the class label, and \(|D|\) is the size of the feature subset \(D\).
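For reference, the following minimal Python sketch (an illustration, not the authors' MATLAB code) estimates the MI of Eq. (1) from empirical frequencies of discrete or pre-binned values and the CR value of Eq. (2); log base 2 and an n-by-|D| array `X_subset` holding the current feature subset are assumptions made here.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical estimate of I(x, y) for two discrete (or pre-binned) variables, Eq. (1)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))          # joint probability
            p_x, p_y = np.mean(x == xv), np.mean(y == yv)  # marginal probabilities
            if p_xy > 0:
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def cr_metric(i, X_subset, labels):
    """CR value of the i-th feature of a subset as in Eq. (2): relevance minus averaged redundancy."""
    relevance = mutual_information(X_subset[:, i], labels)
    redundancy = sum(mutual_information(X_subset[:, i], X_subset[:, j])
                     for j in range(X_subset.shape[1]) if j != i)
    return relevance - redundancy / X_subset.shape[1]
```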

2.3 Non-dominated Sorting Genetic Algorithm II

The non-dominated sorting genetic algorithm II (NSGA-II) [53] employs the concept of Pareto optimality, which allows multiple objectives to be optimized simultaneously and balanced against each other [54]. It has been successfully used in many research fields [55–58]. NSGA-II combines non-dominated ranking with the crowding distance and provides a set of non-dominated solutions (NDSs). Solutions are compared based on the non-dominance rank and the crowding distance: if two solutions belong to different fronts, the one with the lower non-dominated rank is preferred; otherwise, the one with the larger crowding distance is preferred.

2.4 Simulated Annealing

The simulated annealing algorithm (SA) [59] accepts an inferior solution with a certain probability and finds better solutions through multiple cycles. This mechanism enhances the local search ability and helps individuals escape local optima, so SA has a strong local search capability [60]. Equation (3) is widely applied in single-objective problems to determine the probability of accepting a new solution, but it is not suitable for multi-objective problems. Thus, the probability in Eq. (4) is used in this paper [61].
$${P}_{SA}=\left\{\begin{array}{l}1, {f}^{new}<{f}^{cur} \\ {{\exp}}\left(-\frac{{f}^{new}-{f}^{cur}}{T}\right), {f}^{new}\ge {f}^{cur}\end{array}\right.$$
(3)
$${P}_{SA}=\left\{\begin{array}{l}1, \quad \exists m,\ {f}_{m}^{new}<{f}_{m}^{cur} \\ {\prod }_{m=1}^{M}{{\exp}}\left(-\frac{{f}_{m}^{new}-{f}_{m}^{cur}}{T}\right), \quad \forall m,\ {f}_{m}^{new}\ge {f}_{m}^{cur}\end{array}\right.$$
(4)
where \(m\in \{1, 2, \dots , M\}\), and \({f}_{m}^{new}\) and \({f}_{m}^{cur}\) are the \(m\)-th objective function values of the new solution \({X}_{new}\) and the current solution \({X}_{cur}\), respectively.
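A minimal sketch of the acceptance rule of Eq. (4), assuming minimization of all M objectives; `math.prod` requires Python 3.8+.

```python
import math
import random

def accept_probability(f_new, f_cur, T):
    """Acceptance probability of Eq. (4) for a minimization problem with M objectives."""
    if any(fn < fc for fn, fc in zip(f_new, f_cur)):   # improvement in at least one objective
        return 1.0
    # no objective improves: product of per-objective Boltzmann factors
    return math.prod(math.exp(-(fn - fc) / T) for fn, fc in zip(f_new, f_cur))

def accept(f_new, f_cur, T):
    """Randomly decide whether to accept the new solution at temperature T."""
    return random.random() < accept_probability(f_new, f_cur, T)
```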

2.5 K-Nearest Neighbor Algorithm

The K-Nearest Neighbor algorithm (KNN) is a simple instance-based classification algorithm that can produce competitive and easily interpretable results [33]. It is widely recognized and extensively employed in the field of pattern recognition for its effectiveness in both classification and regression tasks [42]. In the classification phase, the distance between a new test sample and all training samples is computed according to Eq. (5), and the majority label of the \(k\) training samples closest to the unlabeled sample is then assigned to it.
$${\text{d}}(X,Z)={\left(\sum_{i=1}^{D}{\left({x}_{i}-{z}_{i}\right)}^{2}\right)}^{1/2}$$
(5)
where X and Z are D-dimensional vectors in the feature space. In addition, cross-validation [62] is introduced to assess the classification accuracy of KNN, which leads to the highest classification generalizability (Fig. 1).
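The classification step can be sketched as follows (an illustration, not the authors' implementation), using the Euclidean distance of Eq. (5) and a majority vote among the k nearest training samples.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test sample by the majority label of its k nearest training samples (Eq. (5))."""
    preds = []
    for z in X_test:
        d = np.sqrt(((X_train - z) ** 2).sum(axis=1))   # Euclidean distance to every training sample
        nearest = y_train[np.argsort(d)[:k]]            # labels of the k closest samples
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])         # majority vote
    return np.array(preds)
```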

3 Proposed CRGEA for Feature Selection

This section presents the details of CRGEA to solve the MOFS. The flowchart of the CRGEA is described in Fig. 2. Algorithm 1 describes the main procedures of the designed CRGEA.
As shown in Fig. 2, a novel initialization strategy combined with a multiple threshold selection mechanism is first developed to produce a high-quality initial population. Then, the multi-point crossover operator and the hybrid mutation strategy are applied to search in multiple directions and improve the search ability, and the potential NDSs are obtained from the genetic operations. Next, a local acceleration evolution strategy is proposed to enhance the quality of the NDSs and of the overall population for the next generation, which further improves the local search ability of the CRGEA. Superior individuals are then selected to participate in the next generation. The genetic operations and the local search strategy are performed iteratively until the stopping criterion is met, at which point the algorithm stops and the best solutions are reported. Additional details of the CRGEA are described in the following subsections.

3.1 Binary Encoding Strategy and Decoding

The aim of FS is to choose an appropriate subset from the pool of available features so that the optimization objectives are optimal. Each feature can be in one of two states: selected or unselected. We use a binary encoding strategy, i.e., a 0–1 encoding pattern, where 1 indicates the selection of the corresponding feature and 0 denotes its exclusion. More specifically, if an individual is encoded as the vector 01001000, the number of features is 8 and the second and fifth features are selected. The objectives can then be calculated using a KNN classifier with K-fold cross-validation. The first objective, the average classification error, is described by Eq. (6).
$${{\min}}{f}_{1}\left(x\right)=\frac{1}{n}\sum_{k=1}^{n}\frac{{Nerr}_{k}}{Nval}$$
(6)
where \(n\) is the number of cross-validation folds, \(k\) indexes the \(k\)-th fold, and \(x\) is a feasible solution. \({Nerr}_{k}\) and \(Nval\) are the number of misclassified samples and the total number of samples in the \(k\)-th classification process, respectively. The second objective, the number of features, is described by Eq. (7).
$${{\min}}{f}_{2}\left(x\right) = \sum_{i=1}^{D}{x}_{i}$$
(7)
where \({x}_{i}\) indicates the value at the \(i\)-th position of solution \(x\), and \(D\) represents the total number of features.
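A hedged sketch of how the two objectives could be evaluated for one binary individual, here using scikit-learn's KNN classifier and 5-fold cross-validation rather than the authors' MATLAB code; the empty-subset handling is an assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(mask, X, y, k_neighbors=5, folds=5):
    """Return (f1, f2) of Eqs. (6)-(7) for a 0-1 feature mask: mean CV error and subset size."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                      # an empty subset cannot be classified
        return 1.0, 0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k_neighbors),
                          X[:, selected], y, cv=folds, scoring="accuracy")
    return 1.0 - acc.mean(), int(selected.size)
```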

3.2 Initialization Strategy

The quality of the initial population strongly influences the convergence speed of a meta-heuristic-based algorithm. A novel initialization strategy combined with a multiple threshold selection mechanism (ISMTS) is therefore developed to produce a high-quality initial population. Firstly, a CRAM is designed to assess the CR of each feature. This strategy overcomes the shortcoming of evaluating the CR of each feature with Eq. (2), i.e., when both the correlation and the redundancy of a feature are high, its score is pushed close to 0 and such features tend to be discarded. Moreover, a multiple threshold selection strategy is proposed to restrict the number of selected features and enhance the distribution of the population in the objective space. This approach ensures that the initial solutions maintain a high level of quality and diversity. The threshold vector is set as \(T=[0.50, 0.65, 0.80]\), and \({T}_{m}\) denotes the \(m\)-th value in \(T\). The following definitions are used to describe the details of the designed ISMTS:
Definition 1
(\({C}_{iy}\)) Assume the label vector associated with feature \({x}_{i}\ (i=1,2,\dots ,D)\) is \(y\), where \(D\) is the number of features. The correlation between \({x}_{i}\) and \(y\), denoted \({C}_{iy}\), is calculated by Eq. (8) as follows:
$${C}_{iy}=I\left({x}_{i},y\right) = \sum_{y}\sum_{{x}_{i}}p\left({x}_{i},y\right)lg\frac{p\left({x}_{i},y\right)}{p\left({x}_{i}\right)p(y)}$$
(8)
where \({x}_{i}\) and \(y\) are two random variables. The marginal probabilities of \({x}_{i}\) and \(y\) are denoted as \(p({x}_{i})\) and \(p(y)\) respectively, while \(p({x}_{i},y)\) represents their joint probability. The importance of the feature to the class labels decreases as the value of \({C}_{iy}\) decreases. In addition to retaining the most relevant features, redundant features should be removed where possible. In this paper, \({R}_{i}\) is used to assess the redundancy of feature \(i\) with respect to the other features.
Definition 2
(\({R}_{i}\)) Let \({x}_{i},{x}_{j}\ (i,j=1,2,\dots ,D)\) be the samples of features \(i\) and \(j\); the redundancy of feature \(i\), denoted \({R}_{i}\), is defined as follows:
$${R}_{i}=\frac{1}{D}\sum_{j=1,\,j\ne i}^{D}I({x}_{i},{x}_{j}),\quad i=1,2,\dots ,D$$
(9)
Definition 3
(\({CR}_{i}\)) A new correlation-redundancy assessment method is designed to describe the CR of feature \(i\) (\({CR}_{i}\)), which is calculated by:
$${CR}_{i}={C}_{iy}-\alpha \frac{{\min}\left({C}_{iy},{R}_{i}\right)}{{\max}\left({C}_{iy},{R}_{i}\right)},\quad i=1,2,\dots ,D$$
(10)
where \(\alpha \) (\(\alpha =0.3\) in this paper) is a moderator used to adjust the possibility of a feature being selected when high correlation and high redundancy occur together. As shown in Eq. (10), the measure prefers features with high correlation and low redundancy, and it only moderately reduces the score of a feature whose correlation and redundancy are both high, instead of discarding it. Based on this idea, we normalize \({CR}_{i}\) and define \({P}_{i}\) to determine the probability that a feature is selected.
Definition 4
(\({P}_{i}\)) The probability that a feature is selected, denoted \({P}_{i}\), is calculated by:
$${P}_{i}=\frac{{CR}_{i}-\underset{j=1,\dots ,D}{\min}{CR}_{j}}{\underset{j=1,\dots ,D}{\max}{CR}_{j}-\underset{j=1,\dots ,D}{\min}{CR}_{j}},\quad i=1,2,\dots ,D$$
(11)
Whether a feature \(i\) is selected is decided by comparing \({P}_{i}\) with the threshold \({T}_{m}\) in \(T\). The proposed ISMTS consists of the following steps (a minimal code sketch is given after the steps), and the corresponding flowchart is described in Fig. 2.
Step 1: Parameter initialization: initial population \(IP=\varnothing \), population size \(N\), number of features \(D\), and selection probability \({P}_{i}\) of each feature \(i\);
Step 2: Set \(n=1\), the index of individual \({x}_{n}\); set \(m=1\), the index into the threshold vector \(T\);
Step 3: Set individual \({x}_{n}=\left({x}_{n1},\dots ,{x}_{nj},\dots ,{x}_{nD}\right)\) and set \(j=1\), the index of the feature;
Step 4: If \({P}_{j}>{T}_{m}\), set \({x}_{nj}=1\); else set \({x}_{nj}=0\);
Step 5: If \(j\le D\), set \(j=j+1\) and go to Step 4; elseif \(n\le ((0.75N)/3)\,m\), add \({x}_{n}\) to the set \(IP\), set \(n=n+1\) and \(j=1\), and go to Step 4; elseif \(m\le 3\), set \(m=m+1\) and go to Step 4; else set \(n=n+1\) and \(j=1\), and go to Step 6.
Step 6: If \({P}_{j}>rand\), set \({x}_{nj}=1\); else set \({x}_{nj}=0\);
Step 7: If \(j\le D\), set \(j=j+1\) and go to Step 6; elseif \(n\le N\), add \({x}_{n}\) to the set \(IP\), set \(n=n+1\) and \(j=1\), and go to Step 6; else output the obtained initial population \(IP\).
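A minimal Python sketch of Definitions 3-4 and the procedure above (not the authors' code). Read literally, the comparison in Step 4 is deterministic, so the individuals produced for a given threshold coincide and are later diversified by crossover and mutation; the remaining roughly 25% of individuals are sampled stochastically as in Steps 6-7.

```python
import numpy as np

def selection_probabilities(C, R, alpha=0.3):
    """P_i of Eq. (11) from the correlation C_iy and redundancy R_i of each feature (Eqs. (8)-(10))."""
    CR = C - alpha * np.minimum(C, R) / np.maximum(C, R)    # Eq. (10): penalize redundant features
    return (CR - CR.min()) / (CR.max() - CR.min() + 1e-12)  # min-max normalization, Eq. (11)

def ismts_initialize(P, N, thresholds=(0.50, 0.65, 0.80)):
    """Population of N binary individuals: ~75% from the fixed thresholds, the rest sampled randomly."""
    D, pop = len(P), []
    per_threshold = int(0.75 * N) // len(thresholds)
    for t in thresholds:                                    # deterministic part (Steps 3-5)
        pop += [(P > t).astype(int)] * per_threshold
    while len(pop) < N:                                     # stochastic part (Steps 6-7)
        pop.append((P > np.random.rand(D)).astype(int))
    return np.array(pop)
```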

3.3 Multi-point Crossover Operator

The crossover operator performs a global search by exchanging the parents' good genes, which helps to improve the global search ability. The crossed individuals are compared with their parents by Pareto dominance, and the better two of these individuals are selected to enter the next evolution. Furthermore, an external file (EF) is designed to store solutions that are not dominated by the current solutions during the search; it is used in the selection operator to improve the overall population. Procedure 1 shows the detailed process of the crossover procedure.
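Procedure 1 is not reproduced in the text, but a generic multi-point crossover of two binary parents can be sketched as follows (the number of cut points is an assumption for illustration).

```python
import numpy as np

def multi_point_crossover(p1, p2, n_points=3):
    """Swap the parents' genes in alternating segments between randomly chosen cut points."""
    D = len(p1)
    cuts = np.sort(np.random.choice(np.arange(1, D), size=n_points, replace=False))
    c1, c2 = p1.copy(), p2.copy()
    swap, prev = False, 0
    for cut in list(cuts) + [D]:
        if swap:                                   # exchange this segment between the two children
            c1[prev:cut], c2[prev:cut] = p2[prev:cut].copy(), p1[prev:cut].copy()
        swap, prev = not swap, cut
    return c1, c2
```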

3.4 Hybrid Parallel Mutation Operator

The mutation operator improves the diversity of the population by changing the structure of existing chromosomes. Considering that the aim of FS is to obtain subsets with strong correlation and low redundancy, the proposed mutation operator combines two strategies that change the number of selected features, i.e., an addition strategy and a deletion strategy. This guides individuals to search in different directions and improves the local search ability. The novel evaluation function in Eq. (10) is adopted to decide which gene to add or delete in a selected individual. A mutated individual \(x\) is selected according to the mutation probability \({P}_{m}\). The set of selected features of \(x\) is \({S}_{s}\), and the set of unselected features is \({S}_{us}\). According to Eq. (11), \({P}_{i}\) is calculated for all unselected features in \({S}_{us}\). The addition strategy and deletion strategy are presented below:
Addition strategy: The feature with maximum \({P}_{i}\) is added to \({S}_{s}\), and its corresponding gene is set to 1. Get the mutated individual \({x}_{1}\);
Deletion strategy: The feature in \({S}_{s}\) with the smallest CR value is removed from \({S}_{s}\), and its corresponding gene is set to 0. Get the mutated individual \({x}_{2}\).
As shown in Fig. 3, the designed mutation strategies change the number of selected features directionally, which makes the solutions search in different directions and improves the local search ability. The details of the hybrid parallel mutation operator are stated in Procedure 2.
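A sketch of the two parallel mutation strategies, using the normalized probabilities \(P_i\) of Eq. (11) as a proxy for the CR ranking (an assumption made here for illustration, since P_i is a monotone transform of CR_i).

```python
import numpy as np

def hybrid_mutation(x, P):
    """Return two mutated copies of x: one adding the most promising unselected feature
    (largest P_i), one deleting the least promising selected feature (smallest P_i)."""
    selected, unselected = np.flatnonzero(x == 1), np.flatnonzero(x == 0)
    x_add, x_del = x.copy(), x.copy()
    if unselected.size:
        x_add[unselected[np.argmax(P[unselected])]] = 1    # addition strategy
    if selected.size:
        x_del[selected[np.argmin(P[selected])]] = 0        # deletion strategy
    return x_add, x_del
```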

3.5 Local Acceleration Evolution Strategy

In this paper, a local acceleration evolution strategy (LAES) based on a parallel simulated annealing algorithm (PSA) with two search strategies, i.e., the addition strategy and the deletion strategy, is employed to enhance the quality of the NDSs derived from the genetic operations. This allows LAES to perform a deep parallel search combined with the annealing stage around the NDSs, which helps individuals escape local optima. Unlike traditional SA, this paper uses a parallel SA according to the different characteristics of the addition and deletion strategies, which can search in different directions and improve search efficiency. In addition, a pruning strategy is introduced into LAES to avoid invalid searches and further improve search efficiency. If the new solution obtained by a search strategy is dominated by its parent, the strategy is disabled in the next search phase. If the new solution dominates its parent, the strategy is retained in the next search phase, which indicates that this search direction reduces the classification error. The description of the LAES is given in Procedure 3.
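Procedure 3 is not reproduced in the text; the following hedged sketch combines the addition and deletion search directions, the acceptance probability of Eq. (4), and the pruning rule described above. Here `evaluate` returns the objective vector (f1, f2) of an individual, and the temperature schedule uses the parameter values reported in Sect. 4.

```python
import math
import numpy as np

def laes(x, evaluate, P, T0=10.0, T_end=1.0, cooling=0.7, cycles=3):
    """Sketch of LAES around one non-dominated solution: addition and deletion run in
    parallel; a strategy is pruned once its offspring is dominated by the current solution,
    and non-dominated offspring are accepted with the probability of Eq. (4)."""
    def dominates(a, b):
        return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

    def p_accept(f_new, f_cur, T):                          # acceptance probability, Eq. (4)
        if any(fn < fc for fn, fc in zip(f_new, f_cur)):
            return 1.0
        return math.prod(math.exp(-(fn - fc) / T) for fn, fc in zip(f_new, f_cur))

    cur, f_cur = x.copy(), evaluate(x)
    active, T = {"add", "delete"}, T0
    while T > T_end and active:
        for _ in range(cycles):
            for strategy in sorted(active):
                cand = cur.copy()
                if strategy == "add":
                    pool = np.flatnonzero(cand == 0)
                    if pool.size == 0:
                        continue
                    idx = pool[np.argmax(P[pool])]          # add the most promising feature
                else:
                    pool = np.flatnonzero(cand == 1)
                    if pool.size <= 1:                      # keep at least one selected feature
                        continue
                    idx = pool[np.argmin(P[pool])]          # delete the least promising feature
                cand[idx] = 1 - cand[idx]
                f_cand = evaluate(cand)
                if dominates(f_cur, f_cand):                # offspring dominated: prune this direction
                    active.discard(strategy)
                elif np.random.rand() < p_accept(f_cand, f_cur, T):
                    cur, f_cur = cand, f_cand
        T *= cooling
    return cur, f_cur
```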

3.6 Selection Operator

The selection operator aims to identify individuals with relatively superior performance for inclusion in the next generation population. The NDSs are considered the individuals exhibiting better performance. The rapid non-dominated sorting technique and the crowding distance algorithm [53] are employed to identify the NDSs after the genetic operations. The selection operator is executed as follows: first, the offspring population (\(OP\)) after LAES and the \(EF\) are combined into \(NP\); then, rapid non-dominated sorting is executed on \(NP\). Assume that individuals need to be selected from the first \(n\) fronts: the individuals from the first \(n-1\) fronts, whose number is \(m\), are added to the next generation population \(GP\), and \(N-m\) individuals from the \(n\)-th front are selected into \(GP\) according to the crowding distance.
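A compact sketch of this environmental selection (a naive O(n^2) non-dominated sorting plus the standard crowding distance), shown for illustration only; `objs` holds the objective vector of each individual in the merged population.

```python
import numpy as np

def environmental_selection(pop, objs, N):
    """Fill the next generation: whole non-dominated fronts first, then the most
    diverse individuals (by crowding distance) of the last admitted front."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    remaining, fronts = list(range(len(pop))), []
    while remaining:                                        # naive fast non-dominated sorting
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]

    chosen = []
    for front in fronts:
        if len(chosen) + len(front) <= N:                   # whole front fits
            chosen += front
            continue
        F = np.array([objs[i] for i in front], dtype=float) # crowding distance on the last front
        dist = np.zeros(len(front))
        for m in range(F.shape[1]):
            order = np.argsort(F[:, m])
            dist[order[0]] = dist[order[-1]] = np.inf       # boundary points are always kept
            span = F[order[-1], m] - F[order[0], m] or 1.0
            dist[order[1:-1]] += (F[order[2:], m] - F[order[:-2], m]) / span
        chosen += [front[i] for i in np.argsort(-dist)[: N - len(chosen)]]
        break
    return [pop[i] for i in chosen]
```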

4 Experimental Study

This section conducts three experiments to validate the effectiveness of the designed CRGEA and its search strategies for solving MOFS. The first experiment compares the CRGEA with using all features to show the necessity of performing FS in classification. The second experiment verifies the effectiveness and superiority of CRGEA compared with six other well-known algorithms. In the last experiment, an ablation experiment and statistical analysis are used to prove that the designed strategies are effective. The KNN is used as the classifier to evaluate the selected feature subsets, and K-fold (\(K=5\)) cross-validation is introduced to assess the classification accuracy of KNN. All experiments in this study are coded in MATLAB R2018a and run on a PC with an Intel (R) Core (TM) i5-12400F CPU and 16 GB of RAM. The remaining parameters of the CRGEA have been experimentally determined and set as follows: the population size equals 60, the maximum number of iterations equals 200, the crossover probability equals 0.9, the mutation probability equals 0.2, the initial temperature equals 10, the final temperature equals 1, the cooling coefficient equals 0.7, and the number of cycles at each temperature level equals 3.
All experiments are based on 16 well-regarded high-dimensional datasets from different domains in public research repositories. The first five datasets are sourced from [21], datasets 6–8 are sourced from [1], and the remaining datasets are sourced from [63]. Note that the data are preprocessed first, i.e., missing data and features with a constant value are removed to reduce the search space. Table 1 presents the details of the datasets, including the numbers of features, samples, and classes. Unlike many existing studies, the cases in this paper are high-dimensional datasets.
Table 1
The information of the high-dimensional datasets

No.  Dataset  No. of attributes  No. of instances  No. of classes
1  Urban  147  507  9
2  Lsvt  310  126  2
3  Madelon  500  2600  2
4  Isolet1  617  1560  26
5  Yale  1024  165  15
6  Orl  1024  400  40
7  WarpAR10P  2400  130  10
8  WarpPIE10P  2420  210  10
9  Tsandromd  199  4465  2
10  Qsar  1024  1687  2
11  Multiple Features  649  2000  10
12  Hill-Valley  100  1212  2
13  MicroMass  1139  931  2
14  DrivFace  6400  606  3
15  Dexter  20,000  600  2
16  Dorothea  100,000  350  2

4.1 Verification of FS Model

This section compares the CRGEA with the method using all features (without FS) to reveal the necessity of FS in classification. The results are shown in Table 2. The CRGEA obtains a set of non-dominated solutions, and the table shows the solution with the best \({f}_{1}\). \({RR}_{1}\) and \({RR}_{2}\) denote the reduction rates of \({f}_{1}\) and \({f}_{2}\), respectively. As described in Table 2, significant improvements are obtained in both \({f}_{1}\) and \({f}_{2}\). The CRGEA achieves the greatest improvement in \({f}_{1}\) on WarpAR10P and WarpPIE10P, where the error is reduced by 100%. The CRGEA achieves the largest dimensionality reduction on WarpPIE10P, whose subset size is reduced by 99.6694%. The results verify the necessity of performing FS in classification, which can reduce the classification error while reducing the size of the dataset.
Table 2
Experimental results of CRGEA and using all features

Dataset  CRGEA f1  CRGEA f2  Without FS f1  Without FS f2  RR1 (%)  RR2 (%)
Urban  0.0574  17  0.1505  147  61.8605  88.4354
Lsvt  0.0240  10  0.2480  310  90.3226  96.7742
Madelon  0.2315  62  0.3796  500  39.0148  87.6000
Isolet1  0.0514  108  0.1942  617  73.5324  82.4959
Yale  0.1879  24  0.3939  1024  52.2975  97.6563
Orl  0.0250  62  0.1075  1024  76.7442  93.9453
WarpAR10P  0.0000  14  0.4692  2400  100.0000  99.4167
WarpPIE10P  0.0000  8  0.0238  2420  100.0000  99.6694
Tsandromd  0.0159  25  0.0215  199  26.0465  87.4372
Qsar  0.0677  40  0.0920  1024  26.4130  96.0938
Multiple Features  0.0015  15  0.0195  649  92.3077  97.6888
Hill-Valley  0.3347  14  0.4157  100  19.4852  86.0000
MicroMass  0.0054  213  0.0699  1139  92.2747  81.2994
DrivFace  0.0298  44  0.0529  6400  43.6673  99.3125
Dexter  0.0367  15  0.4417  20,000  91.6912  99.9250
Dorothea  0.0400  20  0.0971  100,000  58.8054  99.9800

4.2 Performance Analysis of the Designed CRGEA

To analyze the performance of the designed CRGEA, the widely applied NSGA-II [53], MOEA/D [64], SPEA-II [65], MOBGA-AOS [4], MOEA-ISa [25] and ISAGA [60] are used as comparison algorithms. To ensure fairness, all algorithms run for the same number of iterations, i.e., 200 generations, and use the same population size. The parameters of the comparison algorithms have been experimentally determined as follows. The \({P}_{c}\) and \({P}_{m}\) of NSGA-II equal 0.9 and 0.2, respectively. The T (the number of weight vectors in the neighborhood of each weight vector) of MOEA/D equals 30. The \({P}_{c}\) and \({P}_{m}\) of MOBGA-AOS equal 0.9 and 1/D respectively, following [4]. For MOEA-ISa, the crossover probability equals 1. For ISAGA, the initial temperature, the number of cycles, the cooling rate and the number of iterations in the simulated annealing operation are set to 1, 10, 0.9 and 3, respectively.
The best and average values of \({f}_{1}\) and \({f}_{2}\) obtained by the seven algorithms over 30 runs are shown in Table 3. The results indicate that CRGEA is competitive against all comparison algorithms across all datasets. The CRGEA obtains the best values of both \({f}_{1}\) and \({f}_{2}\) on all datasets, which means that the designed algorithm has a stronger search capability and the obtained solutions are closer to the ideal Pareto front. Among the comparison algorithms, MOEA-ISa achieves the best \({f}_{1}\) by employing an interval initialization that constrains the number of selected features; however, its solutions are still not as good as those obtained by CRGEA. In addition, the remaining five algorithms, which use random initialization, tend to fall into local optima on high-dimensional problems such as Isolet1, WarpPIE10P, Orl and Qsar, and only find sub-optimal solutions. On the WarpAR10P dataset, the CRGEA obtains the minimum best value of \({f}_{1}\), although its average performance is worse than that of the other algorithms. This is because the solutions of the designed CRGEA are more widely distributed over \({f}_{1}\), whereas the search of the comparison algorithms in the \({f}_{1}\) direction falls into a local optimum; Fig. 4 supports this conclusion. The above results verify the superiority of the designed CRGEA.
Table 3
The best and average values of f1 and f2 obtained by the seven algorithms (Best/Ave)

Dataset  Target  CRGEA  NSGA-II  MOEA/D  SPEA-II
Urban  f1  0.0574/0.1421  0.1089/0.2074  0.1069/0.2171  0.1109/0.1325
Urban  f2  1.0000/7.6378  1.0000/15.2418  1.0000/16.9255  49.0000/60.1711
Lsvt  f1  0.0240/0.1091  0.0960/0.1545  0.1360/0.2056  0.1440/0.2142
Lsvt  f2  1.0000/3.6140  4.0000/10.3143  7.0000/40.2162  13.0000/123.6129
Madelon  f1  0.2315/0.2947  0.2838/0.3274  0.3200/0.4037  0.3281/0.3643
Madelon  f2  1.0000/12.0602  77.0000/152.1250  49.0000/102.2577  200.0000/227.0787
Isolet1  f1  0.0514/0.1983  0.0566/0.0667  0.1383/0.1896  0.1344/0.1587
Isolet1  f2  1.0000/19.4502  74.0000/93.6337  72.0000/147.3162  254.0000/281.4783
Yale  f1  0.1879/0.3703  0.2000/0.2421  0.3333/0.3864  0.3273/0.3656
Yale  f2  1.0000/11.5325  110.0000/137.2464  58.0000/277.6000  443.0000/474.0781
Orl  f1  0.0250/0.2387  0.0300/0.0524  0.0775/0.1022  0.0800/0.0967
Orl  f2  1.0000/21.5238  80.0000/154.9286  204.0000/305.0380  442.0000/469.7500
WarpAR10P  f1  0.0000/0.1881  0.2923/0.4178  0.3846/0.4400  0.3923/0.4293
WarpAR10P  f2  1.0000/7.9790  412.0000/536.0196  718.0000/858.8367  1086.0000/1138.3051
WarpPIE10P  f1  0.0000/0.1939  0.0095/0.0200  0.0095/0.0249  0.0095/0.0189
WarpPIE10P  f2  1.0000/5.1333  573.0000/1076.7000  711.0000/858.6977  1083.0000/1139.4857
Tsandromd  f1  0.0159/0.0438  0.0211/0.0719  0.0193/0.0692  0.0215/0.0298
Tsandromd  f2  1.0000/11.3206  1.0000/36.1911  1.0000/40.4396  80.0000/91.7917
Qsar  f1  0.0677/0.1698  0.0712/0.0789  0.0825/0.0897  0.0849/0.0905
Qsar  f2  1.0000/15.2896  206.0000/297.0000  198.0000/300.6338  470.0000/494.6905
Multiple Features  f1  0.0015/0.0665  0.0075/0.0134  0.0130/0.0208  0.0150/0.0183
Multiple Features  f2  1.0000/7.9792  90.0000/173.4058  88.0000/176.6889  279.0000/309.0811
Hill-Valley  f1  0.3347/0.3767  0.3471/0.4173  0.3876/0.4264  0.4083/0.4255
Hill-Valley  f2  1.0000/6.3684  1.0000/6.7959  1.0000/7.2609  35.0000/42.9750
MicroMass  f1  0.0054/0.0487  0.0086/0.0666  0.0409/0.0753  0.0505/0.0721
MicroMass  f2  1.0000/23.3619  190.0000/281.6042  243.0000/364.4571  512.0000/550.5652
DrivFace  f1  0.0298/0.0551  0.0479/0.0506  0.0463/0.0507  0.0479/0.0511
DrivFace  f2  1.0000/10.4603  2656.0000/2937.1143  2376.0000/2670.1875  2946.0000/3136.0000
Dexter  f1  0.0367/0.0895  0.2167/0.3253  0.2700/0.3525  0.2967/0.3640
Dexter  f2  1.0000/7.8889  4631.0000/5079.4000  4694.0000/4841.4359  5339.0000/5466.8966
Dorothea  f1  0.0400/0.0551  0.0486/0.0600  0.0486/0.0833  0.0771/0.0895
Dorothea  f2  1.0000/3.6250  34,056.0000/34,190.3636  33,762.0000/34,340.9474  35,524.0000/35,767.7600

Dataset  Target  MOBGA-AOS  MOEA-ISa  ISAGA
Urban  f1  0.0634/0.1182  0.0614/0.1623  0.0733/0.1593
Urban  f2  1.0000/8.8137  1.0000/7.2124  1.0000/7.2500
Lsvt  f1  0.1040/0.1586  0.0560/0.1269  0.0640/0.1334
Lsvt  f2  1.0000/8.1714  1.0000/3.9153  1.0000/2.4865
Madelon  f1  0.2808/0.3917  0.2600/0.3493  0.2846/0.3064
Madelon  f2  61.0000/122.7619  1.0000/4.2857  130.0000/171.9851
Isolet1  f1  0.0559/0.1711  0.0521/0.2256  0.0740/0.0925
Isolet1  f2  50.0000/134.6354  1.0000/18.5472  59.0000/136.3651
Yale  f1  0.3333/0.3913  0.1939/0.4133  0.2121/0.2434
Yale  f2  44.0000/111.1882  1.0000/9.5203  113.0000/127.8333
Orl  f1  0.0475/0.1126  0.0350/0.2775  0.0375/0.0500
Orl  f2  68.0000/157.7063  1.0000/24.2932  130.0000/150.6981
WarpAR10P  f1  0.2692/0.3060  0.0000/0.2068  0.2615/0.3045
WarpAR10P  f2  380.0000/408.5000  1.0000/9.7405  398.0000/427.3333
WarpPIE10P  f1  0.0095/0.0141  0.0000/0.1789  0.0095/0.0227
WarpPIE10P  f2  296.0000/344.6087  1.0000/6.0642  561.0000/818.0750
Tsandromd  f1  0.0161/0.0215  0.0161/0.0478  0.0170/0.0496
Tsandromd  f2  1.0000/20.5088  1.0000/10.3457  1.0000/14.1630
Qsar  f1  0.0760/0.0895  0.0694/0.1851  0.0736/0.0780
Qsar  f2  178.0000/261.9342  1.0000/18.7395  208.0000/293.6769
Multiple Features  f1  0.0090/0.0216  0.0025/0.0686  0.0075/0.0116
Multiple Features  f2  82.0000/155.1736  1.0000/11.3576  133.0000/172.0615
Hill-Valley  f1  0.3413/0.3883  0.3372/0.3878  0.3446/0.3863
Hill-Valley  f2  1.0000/6.3239  1.0000/5.6143  1.0000/5.4154
MicroMass  f1  0.0065/0.0416  0.0065/0.0642  0.0075/0.0222
MicroMass  f2  162.0000/283.1039  1.0000/27.7778  115.0000/180.7625
DrivFace  f1  0.0463/0.0509  0.0314/0.0577  0.0446/0.0483
DrivFace  f2  2573.0000/2788.4848  1.0000/8.9360  2411.0000/2807.2400
Dexter  f1  0.2650/0.3294  0.0600/0.1408  0.1967/0.2640
Dexter  f2  4654.0000/5313.2222  1.0000/408.3922  4630.0000/4934.0930
Dorothea  f1  0.0857/0.0927  0.0486/0.0665  0.0486/0.0660
Dorothea  f2  35,779.0000/35,896.3077  1.0000/2.7895  33,726.0000/34,221.5500
Figure 4 shows the NDSs acquired by the seven algorithms over 30 runs to illustrate the distribution of the obtained solutions. Each algorithm yields multiple solutions, and for clearer visualization only the normalized NDSs are presented, in seven distinct colors. The NDSs obtained by CRGEA clearly dominate a significant portion of the non-dominated solutions obtained by the other algorithms. The solutions obtained by CRGEA also have better convergence and diversity, as they closely approximate the true Pareto front.
The distribution and convergence of the solutions of each algorithm are evaluated using two widely recognized performance metrics, i.e., the hypervolume (HV) [66] and the inverted generational distance (IGD) [67]. Since the true Pareto front of each dataset is not known, the solutions obtained by all algorithms are aggregated and the first non-dominated front is selected as the ideal non-dominated front. Figure 5 describes the results of IGD and HV over 30 runs. The results show that the IGD and HV obtained by the proposed CRGEA are superior to those of the other algorithms when dealing with high-dimensional datasets.
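For a two-objective minimization front, the HV with respect to a reference point can be computed by summing rectangles, as in the following sketch (the reference point, e.g. (1.1, 1.1) for normalized objectives, is an assumption made here).

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective minimization front with respect to reference point ref."""
    F = np.array(sorted({tuple(p) for p in front}))   # deduplicate and sort by f1 ascending
    F = F[np.all(F <= ref, axis=1)]                   # keep only points that dominate the reference
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in F:
        if f2 < prev_f2:                              # skip points dominated within the front
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv

# example: hypervolume_2d([(0.1, 0.8), (0.3, 0.4)], ref=(1.1, 1.1))
```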
Additionally, the performance difference between the other algorithms and CRGEA is examined using the Wilcoxon rank-sum test with a significance level of 0.05. This statistical analysis helps determine whether the comparison algorithms differ significantly from CRGEA. The HV is used as the comprehensive evaluation metric in the Wilcoxon test. The results in Table 4 reveal that the results of CRGEA are statistically significantly different from those of the other comparison algorithms on all the high-dimensional datasets.
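The test itself is a one-liner with SciPy; the two arrays below are placeholders standing in for the 30 per-run HV values of two algorithms on one dataset.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)
hv_crgea = rng.normal(0.95, 0.01, size=30)    # placeholder: 30 per-run HV values for CRGEA
hv_other = rng.normal(0.90, 0.02, size=30)    # placeholder: 30 per-run HV values for a comparison algorithm

stat, p_value = ranksums(hv_crgea, hv_other)  # two-sided Wilcoxon rank-sum test
print(f"p = {p_value:.4g}, significant at 0.05: {p_value < 0.05}")
```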
Table 4
p-values of the Wilcoxon test between CRGEA and the six comparison algorithms

Dataset  NSGA-II  MOEA/D  SPEA-II  MOBGA-AOS  MOEA-ISa  ISAGA
Urban  1.8267e−04  1.8267e−04  1.8267e−04  2.4613e−04  3.2984e−04  1.3149e−03
Lsvt  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  2.8273e−03  4.5103e−03
Madelon  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  3.2879e−04  1.8267e−04
Isolet1  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  1.7062e−03  1.8267e−04
Yale  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  5.7954e−03  1.8267e−04
Orl  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  4.3964e−04  1.8267e−04
WarpAR10P  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  2.4613e−04  1.8267e−04
WarpPIE10P  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  2.8273e−03  1.8267e−04
Tsandromd  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  5.7954e−03  1.8267e−04
Qsar  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  3.6105e−03  1.8267e−04
Multiple Features  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  2.8273e−03  1.8267e−04
Hill-Valley  2.4613e−04  1.8267e−04  1.8267e−04  7.2846e−03  1.7062e−03  1.7062e−03
MicroMass  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  1.7062e−03  1.8267e−04
DrivFace  1.8267e−04  1.8267e−04  1.8267e−04  1.8267e−04  1.3149e−03  1.8267e−04
Dexter  7.9365e−03  7.9365e−03  7.9365e−03  7.9365e−03  1.5873e−03  7.9365e−03
Dorothea  7.9365e−03  7.9365e−03  7.9365e−03  7.9365e−03  1.5873e−03  7.9365e−03
To illustrate the relationship between performance improvement and computational cost, Table 5 summarizes the average CPU time of the compared algorithms. The algorithms show similar time consumption on all datasets. Classifiers are usually necessary in FS to evaluate candidate subsets, and they consume most of the algorithms' computation time [21]. In addition, the numbers of samples and features are the main determinants of the time cost of a classifier. Because the local search operator requires additional fitness evaluations, the designed CRGEA consumes more runtime; however, it achieves the best classification performance. Overall, although the proposed CRGEA adds several new operators, its running time remains acceptable and comparable to that of the other algorithms.
Table 5
Running time consumed by the seven algorithms on the sixteen datasets (unit: minute)

Dataset  CRGEA  NSGA-II  MOEA/D  SPEA-II  MOBGA-AOS  MOEA-ISa  ISAGA
Urban  12.7193  7.5355  5.6427  6.9557  9.0387  5.0458  14.8421
Lsvt  2.5065  1.3132  0.8748  0.9480  1.5022  0.8237  5.3740
Madelon  296.4988  307.2375  190.8918  204.0307  360.8370  155.1743  491.4305
Isolet1  230.9285  127.8991  81.9510  86.5822  159.8090  141.8812  348.9515
Yale  12.0872  3.9394  3.5636  3.6913  5.1447  2.5606  8.4381
Orl  31.6396  16.6115  10.5774  11.3264  20.2398  5.4572  71.0969
WarpAR10P  15.3468  12.9825  10.2048  10.1534  11.0644  6.9783  16.6875
WarpPIE10P  26.5319  12.1776  8.2142  8.3639  15.5144  4.9409  35.2304
Tsandromd  840.0019  689.1661  435.2586  466.4182  711.2831  364.1804  1180.0643
Qsar  377.7811  226.7835  143.4726  152.0507  288.4471  119.8202  965.2412
Multiple Features  255.9523  215.2260  159.9345  175.1065  288.6988  120.1742  449.6812
Hill-Valley  80.9282  30.6911  18.8169  20.5112  37.7909  15.8028  128.4332
MicroMass  118.9816  79.2645  50.2834  53.1553  92.4578  23.9161  295.9968
DrivFace  187.7609  188.3944  121.6430  124.1323  226.1043  60.0396  288.4295
Dexter  699.4554  336.5319  320.8320  323.7435  370.1063  224.4072  1196.2013
Dorothea  1231.1596  857.1976  574.7351  569.7160  901.9547  947.5982  1723.0139

4.3 Effectiveness Validation of the Proposed Strategies

To validate the effectiveness of the designed search techniques in CRGEA, an ablation experiment is conducted. The HV is employed to evaluate the performance of the algorithms. Three variants are considered: the original NSGA-II, denoted MOEA; MOEA combined with the proposed initialization strategy (MOEA-ISMTS); and MOEA combined with the proposed local acceleration evolution strategy (MOEA-LAES). The average HV values obtained by CRGEA and the other variants over 30 runs are shown in Table 6.
Table 6
The results of the designed CRGEA and other variants

Dataset  CRGEA  MOEA  MOEA-ISMTS  MOEA-LAES
Urban  0.9528  0.8323  0.9185  0.9309
Lsvt  0.8712  0.4033  0.8161  0.7773
Madelon  0.8695  0.2860  0.6547  0.6981
Isolet1  0.9391  0.3293  0.8893  0.8885
Yale  0.9437  0.2782  0.8803  0.8729
Orl  0.9681  0.6781  0.9441  0.9385
WarpAR10P  0.9950  0.2373  0.9453  0.9377
WarpPIE10P  0.9981  0.1796  0.9979  0.9960
Tsandromd  0.9676  0.8222  0.9482  0.9538
Qsar  0.9891  0.3381  0.5922  0.4447
Multiple Features  0.9907  0.3382  0.9778  0.3965
Hill-Valley  0.8983  0.6380  0.7559  0.8462
MicroMass  0.9626  0.5001  0.8479  0.6939
DrivFace  0.9762  0.0694  0.9334  0.0928
Dexter  0.9740  0.0583  0.7688  0.0747
Dorothea  0.9500  0.0337  0.7200  0.0284
As described in Table 6, both MOEA-ISMTS and MOEA-LAES exhibit superior performance compared to MOEA across all datasets. This finding strongly suggests that the proposed ISMTS and LAES are indeed effective. Moreover, MOEA-ISMTS performs better than MOEA-LAES when the number of features exceeds 1000, suggesting that the designed ISMTS is more effective for high-dimensional MOFS, whereas MOEA-LAES performs better than MOEA-ISMTS when the number of features is below 1000, confirming that the local search strategy is more effective on smaller-scale problems. The results obtained by CRGEA are superior to those of MOEA-ISMTS, which indicates that the local acceleration strategy can further improve the search ability and help individuals escape local optima. In addition, the Wilcoxon test is applied to analyze the statistical significance of the differences between MOEA and the other three variants. The results of the Wilcoxon test are shown in Table 7, which confirms that the results in Table 6 are statistically significant on all the datasets.
Table 7
p-values of the Wilcoxon test between MOEA and the three other algorithms

Dataset  CRGEA  MOEA-ISMTS  MOEA-LAES
Urban  1.8267e−04  1.8267e−04  1.8267e−04
Lsvt  1.8165e−04  1.7661e−04  1.8063e−04
Madelon  1.8267e−04  1.8267e−04  1.8267e−04
Isolet1  1.8267e−04  1.8267e−04  1.8267e−04
Yale  1.8267e−04  1.8267e−04  1.8267e−04
Orl  1.8267e−04  1.8267e−04  1.8267e−04
WarpAR10P  1.8267e−04  1.8267e−04  1.8267e−04
WarpPIE10P  1.8267e−04  1.8267e−04  1.8267e−04
Tsandromd  1.8267e−04  1.8267e−04  1.8267e−04
Qsar  1.8267e−04  1.8267e−04  2.8273e−03
Multiple Features  1.8267e−04  1.8267e−04  1.0411e−01
Hill-Valley  3.2984e−04  2.5748e−02  3.6105e−03
MicroMass  1.8267e−04  1.8267e−04  2.4613e−04
DrivFace  1.8267e−04  1.8267e−04  3.6105e−03
Dexter  7.9365e−03  7.9365e−03  1.7062e−03
Dorothea  7.9365e−03  7.9365e−03  2.8273e−03
Additionally, to verify the superiority of the designed ISMTS, a comparison experiment is conducted against the traditional random initialization (TRI) method. Taking three datasets of different scales (WarpAR10P, Multiple Features and DrivFace) as examples, the distributions of the individuals obtained with ISMTS and TRI are shown in Fig. 6. The results show that the proposed ISMTS exhibits superior convergence and diversity across the datasets in comparison to TRI, which further confirms the effectiveness of the proposed ISMTS.
To verify the effectiveness of the proposed novel evaluation function, CRGEA is run with two different methods for evaluating the CR of each feature on the 16 high-dimensional datasets, i.e., the designed Eq. (10) and the Eq. (2) used in existing research, named CRGEA and CRGEA-I respectively. The fourth column in Table 8 gives the p-values of the Wilcoxon test between CRGEA and CRGEA-I. It is evident that the proposed CRGEA achieves the best results on all datasets, and the Wilcoxon test supports this conclusion. This demonstrates the effectiveness of the designed novel evaluation function.
Table 8
The results of the CRGEA and CRGEA-I on 16 high-dimensional datasets

Dataset  CRGEA  CRGEA-I  p-value
Urban  0.8586  0.8423  3.6105e−03
Lsvt  0.7508  0.7226  3.0829e−02
Madelon  0.7713  0.6443  2.7859e−03
Isolet1  0.9313  0.9119  4.3964e−04
Yale  0.9029  0.8178  1.8267e−04
Orl  0.9371  0.9135  1.8267e−04
WarpAR10P  0.9017  0.8666  1.8267e−04
WarpPIE10P  0.8189  0.8042  1.1176e−02
Tsandromd  0.8897  0.8873  1.2108e−01
Qsar  0.9825  0.9351  1.8267e−04
Multiple Features  0.9174  0.9138  1.3039e−03
Hill-Valley  0.7250  0.7117  7.9126e−01
MicroMass  0.9523  0.9183  3.2984e−04
DrivFace  0.9147  0.8582  1.0080e−03
Dexter  0.9005  0.8765  7.9365e−03
Dorothea  0.8731  0.6918  7.9365e−03

5 Summary and Discussions

In Sect. 4.1, the CRGEA is compared with classification using all features, which demonstrates the necessity of performing MOFS on high-dimensional datasets. In Sect. 4.2, we analyze the \({f}_{1}\), \({f}_{2}\), IGD and HV metrics of each comparison algorithm and perform a statistical analysis using the Wilcoxon test. The superiority of CRGEA in tackling high-dimensional MOFS problems is clearly demonstrated. Furthermore, we present the final Pareto fronts of all algorithms, which show that the Pareto front obtained by CRGEA exhibits greater diversity and better convergence than those of the comparison algorithms. In Sect. 4.3, an ablation experiment is performed to prove the effectiveness of the designed ISMTS and LAES and to reveal that they both affect the performance of CRGEA. Among them, the designed initialization method significantly improves the performance of CRGEA in solving high-dimensional MOFS problems, and the local acceleration strategy further improves the search ability and helps individuals jump out of local optima. Then, we compare the distribution of the solutions produced by the designed ISMTS with that of TRI; the findings indicate that the proposed ISMTS exhibits superior convergence and diversity. Finally, we compare the performance of the CRGEA using two different methods for evaluating the CR of each feature on the 16 high-dimensional datasets, and the results demonstrate the effectiveness of the proposed novel correlation-redundancy assessment method.

6 Conclusions and Future Work

Most of the literature on FS for high-dimensional datasets focuses on improvements in search strategies, ignoring the characteristics of the dataset itself such as the correlation and redundancy of each feature, which can degrade the algorithm's search effectiveness. Thus, a correlation-redundancy guided evolutionary algorithm is designed to address high-dimensional FS with the objectives of simultaneously optimizing classification accuracy and the number of features. A new correlation-redundancy assessment method is proposed to select features with high relevance and low redundancy and speed up the entire evolutionary process. A novel initialization strategy combined with a multiple threshold selection mechanism is developed to produce a high-quality initial population. A local acceleration evolution strategy based on a parallel simulated annealing algorithm and a pruning method is developed to improve the local search ability of the CRGEA. Numerical experiments comparing CRGEA with state-of-the-art algorithms on 16 high-dimensional datasets demonstrate the superiority of CRGEA in solving MOFS for high-dimensional datasets. The results show that CRGEA outperforms the other comparison algorithms in the \({f}_{1}\), \({f}_{2}\), IGD and HV metrics on all datasets, and this advantage becomes increasingly evident as the dimensionality of the data increases. The results of the statistical analysis support these conclusions. Additionally, an ablation experiment was performed to evaluate the impact of the proposed ISMTS and LAES on the algorithm's performance; the results demonstrate a significant enhancement in both the convergence speed and the search capability of CRGEA when these techniques are incorporated. We also compared the performance of the algorithm using two different methods for evaluating the CR of each feature on the 16 high-dimensional datasets; the results indicate that the proposed evaluation function better evaluates the correlation and redundancy of each feature.
Although the above series of experiments highlights the excellent performance of the proposed CRGEA for solving MOFS on high-dimensional datasets, this work mainly focuses on the effect of the correlation and redundancy of features on classification. Interactions between features, which are common in biology [68–70], are disregarded in this paper. In the future, we will extend the proposed algorithm to handle interactions between features on biological datasets.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 52075401, 51705386); China Scholarship Council (No.201606955091).

Declarations

Competing interests

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
17. Weigend AS, Bonnlander BV (1994) Selecting input variables using mutual information and nonparametric density estimation. SFB 373 Discussion Papers, pp 42–50
24. Xu H, Xue B, Zhang MJ (2020) Segmented initialization and offspring modification in evolutionary algorithms for bi-objective feature selection. In: GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference, pp 444–452. https://doi.org/10.1145/3377930.3390192
49. Weigend AS, Bonnlander BV (1994) Selecting input variables using mutual information and nonparametric density estimation. SFB 373 Discussion Papers, pp 42–50