
Open Access 05.09.2022 | Original Article

Using dual evolutionary search to construct decision tree based ensemble classifier

Authors: Hao Chen, Guoxin Zhang, Xiaoying Pan, Rong Jia

Published in: Complex & Intelligent Systems | Issue 2/2023


Abstract

A typical ensemble learning process uses a forward integration mechanism to construct an ensemble classifier from a large number of base classifiers. Under this mechanism, it is difficult to adjust the diversity among base classifiers or to optimize the internal structure of the ensemble, because the generation process involves a certain amount of randomness; as a result, the performance of ensemble classifiers depends heavily on human design decisions. To address this issue, we propose an automatic ensemble classifier construction method based on a dual-layer evolutionary search mechanism, which includes a tree coding-based base classifier population and a binary coding-based ensemble classifier population. Through a collaborative search between the two populations, the proposed method is driven by training data to update the base classifier population and to optimize the ensemble classifiers globally. To verify the effectiveness of the dual evolutionary ensemble learning method (DEEL), we tested it on 22 classification tasks from 4 data repositories. The results show that the proposed method can generate a diverse decision tree population on the training data while searching for and constructing ensemble classifiers from it. Compared with 9 competitor algorithms, the proposed method achieved the best performance on 17 of the 22 test tasks and improved the average accuracy by 0.97–7.65% over the second-best method. In particular, the generated ensemble classifiers exhibit an excellent structure, involving a small number of diverse decision trees. This increases the transparency of the ensembles and facilitates interpretability analysis.

Introduction

Ensemble learning [1] improves classification performance by combining multiple base classifiers with suitable methods, which are generally classified into three types: bagging [2], boosting [3] and stacking [4]. An ensemble is usually constructed in two steps: first, the base classifiers are generated, and then they are strategically combined to form the ensemble that performs the computational task. The problem with this process is that it generates redundant base classifiers, which increase the computational cost and storage space and affect the classification accuracy of the ensemble, resulting in diminishing returns [5]. In addition, the redundancy within the ensemble increases the complexity of the system structure, which decreases the model's interpretability. Studies show that the generalization ability of ensemble subsets is better than that of ensembles composed of all base classifiers [6]. When diversity and accuracy are balanced, ensembles can obtain more accurate predictions than any of their individual members. Moreover, optimizing the internal components of the model makes its calculation mechanism more transparent, which is the basis for performing interpretability analysis on an ensemble. Therefore, for an ensemble learning process, optimizing the internal structure of the ensemble while improving its accuracy is a desirable outcome.
Currently, selective ensemble methods recruit a subset of classifiers from the base classifier population to improve the generalization ability of the ensemble [7]. However, a selective ensemble can only recombine the existing base classifiers and cannot globally optimize the structure of the ensemble. Moreover, selective ensembles treat base classifier generation and ensemble construction as two independent stages, so the benefits of interaction and cooperation among different base classifiers are not fully exploited. Thus, the ensemble process remains unidirectional and static, the ensemble quality depends on the reliability of manually chosen settings, and usually only a local optimum is reached.
To solve the above problems, we propose an automatic ensemble learning method based on a dual evolutionary search structure. The core is a two-level evolutionary process: the first level is a tree coding-based base classifier population search process, and the second level is a binary coding-based ensemble classifier population optimization process. Mutual promotion and collaborative evolution are achieved through intermediate information interactions and feedback mechanisms. By globally optimizing the ensemble population, this method can perform automated ensemble construction while laying the foundation of interpretability analysis of the ensemble model. In this paper, we focus on the following:
(1)
A multiobjective search mechanism for the base classifier population is proposed. Tree coding-based base classifiers are conventionally constructed by a greedy search model in which each node selects the locally optimal split. However, locally "good" decisions may degrade the quality of the whole tree, since the structure cannot be updated or modified afterward. We attempt to dynamically adjust the decision tree population from a global perspective through a multiobjective optimization process, thus maintaining the balance between the diversity and accuracy of the tree population.
 
(2)
A dual evolutionary synergy mechanism is proposed. The base classifier search and the ensemble search are two independent but interactive evolutionary processes. Information sharing forms a dynamic, complementary and progressive relationship between the two search populations. This relationship yields a decision tree population with better accuracy and diversity, while the ensemble attains lower redundancy and higher accuracy. In addition, this construction mechanism does not require the ensemble structure to be set in advance, so the construction process is more automated, reducing the influence of human decision-making.
 
The remainder of this paper is organized as follows: “Related works” section analyzes the related research. “Dual evolutionary ensemble learning” section introduces the specific methods of the dual evolutionary ensemble learning (DEEL) mechanism. “Experiment and discussion” section analyzes and discusses the experimental results. “Conclusion” section summarizes the work of this paper.

Related works

Precision-oriented approaches

In traditional studies, enhancing calculation precision is the major objective in applications of ensemble learning (EL) [8–10]. In the bagging method, each base classifier learns on a dataset randomly sampled from the training set. By randomly perturbing the training set, the base classifiers obtain high accuracy and diversity. Typical ensemble learning methods, such as random forests [11], use bootstrap sampling to perturb samples and features, thus constructing diverse decision trees. Using bagging, Zhang et al. [12] clustered the training samples, constructed the submodule structure of modular neural networks by clustering, and then ensembled the submodules according to a distance measure.
Unlike bagging, boosting is a serial process; for example, adaptive boosting (AdaBoost) [3] adjusts the distribution of training sets in each iteration so that subsequent base learners pay more attention to samples that are difficult to classify correctly. In addition, Li et al. [13] used k-nearest neighbors (KNN) as the base learner for unbalanced distribution samples and used the AdaBoost method to ensemble the optimal feature subset after extracting the samples using binary particle swarm optimization (BPSO).
Stacking usually uses the entire training set to generate the base model, and then the metamodel is trained using the output of the base model as input. Liu et al. [14] proposed a spatiotemporal ensemble framework for predicting parking demand. The prediction results of the base learner for time features are considered channels of images and then ensembled using the features of convolutional neural networks (CNNs) that can capture spatial information. Wang et al. [15] proposed an ensemble method for load forecasting based on long short-term memory (LSTM) networks and clustering. A set of LSTM models was generated by data clustering, and then the prediction results of LSTM were fused and improved using fully connected cascade (FCC) networks. DeepForest [16] uses random forests as nondifferentiable modules for serial ensembles, thereby achieving representation learning through multigranularity scanning and enhancement of the input data.
Some researchers frame ensemble construction as an optimization problem and combine evolutionary algorithms to improve the accuracy of ensembles. Wang et al. [17] ran different evolutionary operators on multiple populations; operators with good adaptability obtain more computing resources, and the offspring produced by different populations are exchanged, which shares information and enhances model accuracy. Sheng et al. [18] proposed an evolutionary ensemble algorithm for niched neural networks with adaptive negative correlation learning, in which the ensemble is optimized by sharing fitness values to improve accuracy. Zhao et al. [19] performed multiobjective sparse ensemble learning with evolutionary algorithms, using the sparse ratio, false positive rate, and false negative rate as evaluation metrics. Asafuddoula et al. [20] clustered imbalanced data to obtain diverse heterogeneous base classifiers and subsequently used a multiobjective optimization algorithm to optimize the ensemble structure. Wang et al. [21] proposed a multiobjective evolutionary ensemble model based on an extreme learning machine (ELM), which used an evolutionary algorithm to achieve feature selection and network structure search. Wen et al. [22] proposed a constraint voting ensemble framework based on variants of the differential evolution algorithm; the designed variants serve as the search engine, each constraint votes according to its own rules, and the individual with the most votes enters the next generation. Wang et al. [8] proposed a random forest-based stacking method that combines a multiobjective evolutionary algorithm and local search to obtain an ensemble model with enhanced performance.
In most applications of the EL method, the primary goal is to quickly construct a high-precision ensemble, and to achieve this goal, many base classifiers are integrated or stacked together. However, recent studies have shown that adding too many base classifiers to an ensemble negatively affects its accuracy and increases computational complexity [23]. Because of the law of diminishing returns, continuing to add base classifiers after the optimal ensemble size has been reached will not improve the accuracy. Moreover, the interior of a complex model is almost a black box, which is not conducive to interpretable analysis of its computational logic.

Improving precision by optimizing the structure of the ensemble

To reduce the redundancy and complexity of the ensemble, Zhou et al. [6] proposed a “selective ensemble” method, which finds a better subset of ensembles by model selection instead of assembling all classifiers. In the past few years, such approaches have been widely used. These approaches differ mainly in the strategy of selecting base classifiers and can be divided into sorting-, clustering-, and optimization-based methods.
For example, Guo et al. [24] proposed margin-based ordered aggregation for ensemble pruning (MOAG), which adjusts the ensemble structure by an unsupervised marginal selection method. Dai et al. [25] proposed a reverse reduction error (RRE) pruning algorithm that incorporates a subtraction operation to subtract the votes of the worse classifiers at the time of voting. Zhou et al. [26] presented the Pareto optimization algorithm for selective ensembles (POSE), which combines multiobjective evolutionary algorithms with local search operators to improve accuracy while optimizing the ensemble structure. Cavalcanti et al. [27] proposed a pruning method that combines diversity measures for ensemble pruning (DivP), which combines different paired diversity matrices with a genetic algorithm. The combined diversity matrices are transformed into a graph, and then the graph coloring method is used to optimize the ensemble structure. Zhu et al. [28] proposed a selective ensemble based on an ELM and the improved discrete artificial fish swarm algorithm (IDAFSEN), which overcomes the drawback that a single ELM is unstable in terms of its classification. Ykhlef et al. [29] proposed a novel ensemble pruning methodology using nonmonotone simple coalitional games (SCG-P), which evaluates the diversity of classifiers based on the Banzhaf power index and selects classifiers with the minimum winning coalition.
Overall, the selective ensemble is still an additive model that only locally optimizes the combination of base classifiers; it improves accuracy by reducing the ensemble complexity and removing redundant structures. Moreover, in selective ensembles, base classifier generation and ensemble construction are two independent stages, and if the diversity of the generated base classifiers is low, the generalization ability of the ensemble suffers.

Diversity-oriented approach

In recent years, increasing attention has been given to the optimization of ensemble structures, which is mainly reflected in diversity-oriented approaches. Ensemble diversity is reflected in the similarity between base classifiers [20, 27]. For example, Zhou et al. [30] studied ensemble diversity from the view of multi-information. Most ensemble methods generate diverse classifiers implicitly by perturbing the inputs: bagging and boosting promote diversity by perturbing the input data during training, random forests perturb subsets of the input data and features, and rotation forests apply principal component analysis (PCA) to each subset as an improvement. These methods of increasing diversity aim to reduce the redundant structure in the ensemble. The measurement of ensemble diversity mainly focuses on the output space [31]. Existing diversity measures are usually divided into pairwise and nonpairwise measures. Pairwise measures include the Q-statistic, K-statistic, correlation coefficient, disagreement measure, and double-fault measure. Nonpairwise measures include the Kohavi–Wolpert variance, the entropy measure, the measure of difficulty, and generalized diversity. Jan et al. [32] proposed misclassification diversity (MD) and demonstrated that MD improves accuracy better than other pairwise diversity metrics. Liu et al. [33] proposed an anti-spoofing ensemble model based on a deep neural network (DNN) in which the generalization of the model is improved by the different views provided by ensemble diversity. These diversity-oriented approaches focus on the behavior of base classifiers, although the structure of base classifiers also has a significant impact on diversity [34, 35].
Essentially, the ultimate aim of the diversity-oriented approach is still to improve accuracy. However, we notice that in some applications the interior structure of the ensemble has become an equally important optimization goal. For example, computing on mobile devices requires models that consume as few computing and storage resources as possible, and medical data analyses require models with interpretable analytical capabilities. Thus, a method that can optimize both the interior structure and the accuracy of an ensemble offers research value.

Dual evolutionary ensemble learning

Method

The bagging method tends to construct a diverse and accurate population of base classifiers in parallel. In contrast, the boosting method focuses on correcting the current deficiencies of the ensemble through the serial generation and stacking of base classifiers, a mechanism that concentrates on the main sources of error to rapidly improve the computational effectiveness of the ensemble. The selective ensemble, in turn, optimizes the combination of existing base classifiers to reduce the redundancy and complexity of the ensemble while improving its classification performance. Each of these three models has its own characteristics, and it would be ideal to combine their advantages: generate many base classifiers in parallel during initialization and then introduce an evolutionary mechanism to adjust the structure of the base classifiers and globally optimize their combination, thereby improving the overall performance of the ensemble.
Following this idea, we propose dual evolutionary ensemble learning (DEEL), whose overall process is as follows: in the initialization stage, the decision tree population and the binary coding-based ensemble population are initialized, and offspring populations are generated through evolutionary operations. In the fitness evaluation stage, the two evolutionary processes exchange information for the purpose of coevolution. Then, multiobjective selection is performed to construct the next generation population according to the fitness values, and this process is repeated. When the maximum number of iterations is reached, the loop is exited and the optimal ensemble is output. This process is shown in Fig. 1.
The advantage of DEEL is that it can search for high-quality base classifiers in parallel while evaluating and optimizing their combinations. Moreover, DEEL can overcome the limitation of a fixed base classifier structure, generate new forms of base classifiers, and explore the performance of more structurally different ensembles. In this paper, the DEEL method is explained using decision trees as the base classifiers.

Tree coding-based base classifier

We designed a classification tree to solve the classification problem. Each node is represented by the quadruplet in Fig. 2a, where each element is a numerical value. The first element of the node is the index of the attribute, the second element is the threshold used to split samples on that attribute, the third element is the terminal node identifier (1 means the node is a terminal node, and 0 means it is an intermediate node), and the fourth element is the class label (the predicted class of the tree). For example, a binary classification problem with class labels 0 or 1, sample attributes \(C=\{c_{1},c_{2},c_{3},c_{4}\}\) and corresponding indices \(\{1,2,3,4\}\) is encoded as shown in Fig. 2b.
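To make the encoding concrete, the following Python sketch (an illustration under our own naming assumptions, not the authors' code) represents each node as the quadruplet [feature_index, threshold, is_terminal, class_label] and classifies a sample with a small hand-built tree:

```python
def leaf(label):
    # terminal node: feature index/threshold unused, terminal flag = 1
    return {"node": [None, None, 1, label]}

def split(feature_index, threshold, left, right):
    # intermediate node: terminal flag = 0, class label unused
    return {"node": [feature_index, threshold, 0, None], "left": left, "right": right}

def predict(tree, x):
    f, thr, is_terminal, label = tree["node"]
    if is_terminal:
        return label
    branch = "left" if x[f] <= thr else "right"
    return predict(tree[branch], x)

# Binary problem with attributes C = {c1, c2, c3, c4} (0-based indices here).
t = split(1, 0.5, leaf(0), split(3, 0.2, leaf(1), leaf(0)))
print(predict(t, [0.9, 0.3, 0.1, 0.7]))   # c2 = 0.3 <= 0.5 -> left leaf -> 0
```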

Evaluation metrics for trees

a)
Diversity: Tree diversity is an assessment of the differences between trees. The double-fault measure (DF) correlates strongly with the accuracy of the majority-voting ensemble, so DF is used as the diversity metric of the tree (a code sketch of the tree metrics appears after this list). The formula is as follows:
$${f}_{DF}\left({t}_{i}\right)=\frac{1}{{N}_{t}-1}\sum_{j=1,j\ne i}^{{N}_{t}}\frac{DF\left({t}_{i},{t}_{j}\right)}{Spl}$$
(1)
where \({f}_{DF}\left({t}_{i}\right)\) represents the similarity of tree \({t}_{i}\) to the rest of the population, and \(DF\left({t}_{i},{t}_{j}\right)\) is the number of samples that both \({t}_{i}\) and \({t}_{j}\) predict incorrectly. \(Spl\) is the total number of samples. \({f}_{DF}\left({t}_{i}\right)=1\) indicates the lowest diversity. The purpose of \({f}_{DF}\left({t}_{i}\right)\) is to reduce the repetition of common errors among trees and to make different trees produce different outputs.
 
b)
Complexity: Because the base classifier is a binary tree whose lateral growth is limited, limiting the tree depth can reduce the tree complexity. The tree complexity can be defined as follows:
$${f}_{depth}\left({t}_{i}\right)=\mathrm{max}\_{\text{depth}}({t}_{i})$$
(2)
where \(\mathrm{max}\_\mathrm{depth}\left({t}_{i}\right)\) is the maximum depth of tree \({t}_{i}\).
 
c)
Accuracy: The accuracy is defined as the percentage of the sample size that the tree predicts correctly. The formula is as follows:
$${f}_{acc}\left({t}_{i}\right)=\frac{Acc\left({t}_{i}\right)}{Spl}$$
(3)
where \({f}_{acc}\left({t}_{i}\right)\) is the accuracy of tree \({t}_{i}\) and \(Acc\left({t}_{i}\right)\) is the number of correctly classified samples.
 
d)
Contribution: The degree of collaboration between trees is their degree of contribution to the ensemble. This metric measures the average performance of a tree when it participates in ensembles. The formula is as follows:
$${f}_{coop}\left({t}_{i}\right)=0.5+\frac{\sum_{j=1}^{num}\left({f}_{acc}\left({t}_{i}\notin {e}_{j}\right)-{f}_{acc}\left({t}_{i}\in {e}_{j}\right)\right)}{num+1}$$
(4)
where \({f}_{coop}\left({t}_{i}\right)\) indicates the degree to which \({t}_{i}\) collaborates with other trees in the population, \(num\) is the number of times that \({t}_{i}\) participates in an ensemble, and \({e}_{j}\) is an ensemble containing \({t}_{i}\). \({f}_{acc}\left({t}_{i}\in {e}_{j}\right)\) denotes the accuracy when \({e}_{j}\) contains \({t}_{i}\), and \({f}_{acc}\left({t}_{i}\notin {e}_{j}\right)\) denotes the accuracy when \({e}_{j}\) does not contain \({t}_{i}\). The difference between the two accuracies is the contribution of \({t}_{i}\) to \({e}_{j}\). A smaller \({f}_{coop}\left({t}_{i}\right)\) indicates better collaboration of \({t}_{i}\). If \({t}_{i}\) is not involved in any ensemble, then \({f}_{coop}\left({t}_{i}\right)\) defaults to 0.5.
 
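As a rough illustration of Eqs. (1)–(4), the following Python sketch computes the tree-level metrics from prediction vectors on the training samples; the array-based representation, the helper names and the toy data are assumptions made for brevity:

```python
import numpy as np

def f_df(preds, y, i):
    """Eq. (1): average double-fault rate of tree i against all other trees."""
    wrong_i = preds[i] != y
    others = [np.mean(wrong_i & (preds[j] != y))          # DF(t_i, t_j) / Spl
              for j in range(len(preds)) if j != i]
    return sum(others) / (len(preds) - 1)

def f_acc(preds_i, y):
    """Eq. (3): fraction of correctly classified samples."""
    return np.mean(preds_i == y)

def f_coop(acc_with, acc_without):
    """Eq. (4): contribution of a tree, given the accuracies of the ensembles it
    joined (acc_with) and of the same ensembles without it (acc_without)."""
    num = len(acc_with)
    if num == 0:
        return 0.5                                         # tree joined no ensemble
    return 0.5 + sum(aw - ai for ai, aw in zip(acc_with, acc_without)) / (num + 1)

y = np.array([0, 1, 1, 0, 1])
preds = np.array([[0, 1, 0, 0, 1],                         # tree 0
                  [0, 0, 0, 0, 1],                         # tree 1
                  [1, 1, 1, 0, 0]])                        # tree 2
print(f_df(preds, y, 0), f_acc(preds[0], y))               # f_DF ≈ 0.1, accuracy 0.8
```

Eq. (2) would simply return the maximum depth of the tree structure, which is omitted here because this sketch reduces each tree to its prediction vector.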

Binary encoding of ensemble

The ensemble population can be expressed as \(E=\left\{{e}_{1},\ldots ,{e}_{{N}_{e}}\right\}\). As shown in Fig. 3, an ensemble individual \(e\) is expressed by a binary string: if \(e\left(i\right)=1\), then tree \({t}_{i}\) is selected, and if \(e\left(i\right)=0\), then tree \({t}_{i}\) is not selected. Thus, \(e\) expresses a subset of the tree population \(T=\left\{{t}_{1},\ldots ,{t}_{{N}_{t}}\right\}\).
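A minimal sketch of this encoding, assuming NumPy arrays and illustrative names, might look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
N_t, N_e = 8, 4                               # tree / ensemble population sizes
E = rng.integers(0, 2, size=(N_e, N_t))       # random initial ensemble population

def selected_trees(e, trees):
    """Return the subset of the tree population encoded by individual e."""
    return [t for t, bit in zip(trees, e) if bit == 1]

trees = [f"t{i}" for i in range(N_t)]         # placeholder tree objects
print(E[0], selected_trees(E[0], trees))
```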

Evaluation metrics for ensemble

a.
Diversity: Diversity is one of the key properties of ensembles. The purpose of any ensemble method is to make the base classifiers as accurate and diverse as possible. The ensemble uses the difficulty measure as its diversity metric (a code sketch of the ensemble metrics appears after this list). The formula is as follows:
$${f}_{\theta }\left({e}_{i}\right)=Var(Z) $$
(5)
where \(Z=\left\{{z}_{1},{z}_{2},\dots ,{z}_{Spl}\right\}\) denotes the proportions of correctly classifying trees: \({z}_{j}\) is the fraction of trees in ensemble \({e}_{i}\) that correctly classify sample \(j\), taking values in \(\left\{0,\frac{1}{{N}_{t}},\frac{2}{{N}_{t}},\dots ,1\right\}\). \(Var(\cdot)\) denotes the variance of \(Z\). A larger variance indicates that some samples are difficult to classify while the rest are easy, meaning the classification behavior of the trees is more similar and the diversity is lower. Conversely, the smaller the variance, the higher the diversity.
 
b.
Complexity: The complexity of the ensemble is the number of decision trees contained. The formula is as follows:
$${f}_{num}({e}_{i})={\sum }_{j=1}^{{N}_{t}}{e}_{i}\left(j\right)$$
(6)
where \({f}_{num}({e}_{i})\) indicates the number of decision trees in the ensemble \({e}_{i}\).
 
c.
Accuracy: The ensemble obtains the prediction result by majority voting and then calculates the accuracy. The formula is as follows:
$${f}_{acc}\left({e}_{i}\right)=\frac{Acc\left({e}_{i}\right)}{Spl}$$
(7)
 
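The following Python sketch illustrates Eqs. (5)–(7) under the same assumption that trees are reduced to their prediction vectors; the helper names and toy data are illustrative:

```python
import numpy as np

def f_theta(e, preds, y):
    """Eq. (5): difficulty measure. z_j = fraction of selected trees that classify
    sample j correctly; a smaller variance means higher diversity."""
    sel = preds[e == 1]                        # predictions of the selected trees
    z = np.mean(sel == y, axis=0)              # per-sample fraction of correct trees
    return np.var(z)

def f_num(e):
    """Eq. (6): number of decision trees in the ensemble."""
    return int(np.sum(e))

def f_acc_ensemble(e, preds, y):
    """Eq. (7): majority-voting accuracy of the selected trees."""
    sel = preds[e == 1]
    votes = np.array([np.bincount(col).argmax() for col in sel.T])
    return np.mean(votes == y)

y = np.array([0, 1, 1, 0, 1])
preds = np.array([[0, 1, 0, 0, 1],
                  [0, 0, 1, 0, 1],
                  [1, 1, 1, 0, 0]])
e = np.array([1, 1, 1])                        # ensemble using all three trees
print(f_theta(e, preds, y), f_num(e), f_acc_ensemble(e, preds, y))
```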

Algorithm

In this section, we introduce the algorithm in detail. The pseudocode is shown in Algorithm 1.
Step 1: First, the tree population \(T=\left\{{t}_{1},{t}_{2},\cdots ,{t}_{{N}_{t}}\right\}\) is initialized, where \({N}_{t}\) is the number of trees in the population. The initial feature selection probability is \({P}_{feature}=\left\{1/{C}_{feature}\right\}\) (where \({C}_{feature}\) is the sum of feature frequencies), and the node generation probability is \({P}_{node}=1/dp\) (where \(dp\) is the depth of the current node); \({P}_{node}\) is an adaptive value. To avoid generating overly complex trees, the probability of generating an intermediate node decreases as the tree depth increases. The tree construction process is as follows:
Step 1.1: Generate the root node, select a feature randomly according to \({P}_{feature}\), and then generate the split threshold randomly, as shown in Fig. 4a;
Step 1.2: The tree structure is generated recursively. When the random number \(Rand\) is less than \({P}_{node}\), an intermediate node is generated, and its feature and split threshold are randomly selected according to \({P}_{feature}\); if \(Rand\) is greater than \({P}_{node}\), a terminal node is generated, and a class label is randomly selected as its attribute, as shown in Fig. 4.
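The sketch below illustrates Step 1 under simplifying assumptions: features are drawn uniformly instead of according to \(P_{feature}\), and the nested-dict node layout, the names and the threshold ranges are placeholders rather than the authors' implementation:

```python
import random

def random_tree(n_features, labels, feat_ranges, depth=1, max_depth=8):
    # Terminal node when Rand >= P_node = 1/depth (the root always splits).
    if depth > 1 and (random.random() >= 1.0 / depth or depth >= max_depth):
        return {"node": [None, None, 1, random.choice(labels)]}
    f = random.randrange(n_features)                        # uniform feature choice
    lo, hi = feat_ranges[f]
    return {"node": [f, random.uniform(lo, hi), 0, None],   # intermediate node
            "left":  random_tree(n_features, labels, feat_ranges, depth + 1, max_depth),
            "right": random_tree(n_features, labels, feat_ranges, depth + 1, max_depth)}

# Tree population T of size 5 on a toy 4-feature binary problem.
feat_ranges = [(0.0, 1.0)] * 4
T = [random_tree(4, [0, 1], feat_ranges) for _ in range(5)]
```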
Step 2: Initialize the ensemble population, denoted as \(E=\left\{{e}_{1},\ldots ,{e}_{{N}_{e}}\right\}\), where \({N}_{e}\) is the ensemble population size; each \(e\) is a random binary string whose length equals the decision tree population size \({N}_{t}\).
Step 3: Ensemble sharing and flip operation. This procedure is based on binary encoding to achieve combinatorial optimization of trees, as shown in Fig. 5. This operation allows the offspring to better learn and inherit the strengths of all parents and reduces the sensitivity to the number of parents.
Step 3.1: Shared operation. The element at each position in the shared structure is the element with the most occurrences in the parents. The blue part in Fig. 5 shows the shared structure.
Step 3.2: Binary flip. Each element of the shared structure is randomly flipped with probability \(1/{N}_{t}\), as shown in the red part of Fig. 5. The population size of the generated offspring is \({N}_{e}\).
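A minimal sketch of Step 3, assuming binary NumPy matrices for the parent ensembles and illustrative names:

```python
import numpy as np

def shared_and_flip(parents, n_offspring, rng):
    """Shared structure = most frequent bit per position; offspring = flipped copies."""
    n_t = parents.shape[1]
    shared = (parents.mean(axis=0) >= 0.5).astype(int)     # majority bit per position
    flip = rng.random((n_offspring, n_t)) < 1.0 / n_t      # flip mask, probability 1/N_t
    return np.where(flip, 1 - shared, shared)

rng = np.random.default_rng(1)
parents = rng.integers(0, 2, size=(4, 10))                 # N_e = 4 parents, N_t = 10
offspring = shared_and_flip(parents, n_offspring=4, rng=rng)
```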
Step 4: The search process of decision trees. During the dual evolutionary search, the offspring are generated by split and fusion operations based on the decision tree in the ensemble shared structure.
Step 4.1: Split operation. It has been shown that reducing the search space boundary for continuous numerical variables can reduce the search time and improve decision tree accuracy [36, 37]. We prune the decision tree search space with the ensemble shared structure to increase the search speed. First, all nodes in the decision tree except the root node are traversed, and the feature index of each node is used as the key. Then, the subtree of each node, the feature frequency, and the split threshold limits are incorporated into the split set as values, as shown in Fig. 6.
Step 4.2: Fusion operation. \({P}_{feature}\) is updated according to the feature frequencies. A decision tree \(t\) is randomly selected from the decision tree population \(T\) and fused with the split set to optimize its structure. All nodes of tree \(t\) are traversed, and a random value \(Rand\) is generated at each node. If \(Rand\) is less than \({P}_{fusion}\), the node is used for fusion; nodes with \(Rand\) greater than \({P}_{fusion}\) are not used for fusion.
Each fusion node randomly selects one of the following fusion operations: (1) change both the feature index and the split threshold of the fusion node: a new feature index is randomly selected based on \({P}_{feature}\), and a new threshold is randomly generated within the range of thresholds for that feature in the split set, as shown in Fig. 7a; (2) replace the fusion node: a replacement node is randomly selected from the branch of the split set corresponding to the feature index of the fused node, as shown in Fig. 7b; (3) change only the split threshold of the fusion node: based on the feature index of the fusion node, a new threshold is randomly generated within the range of thresholds in the split set, as shown in Fig. 7c. Each of the three operations is selected with probability 1/3.
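The following sketch illustrates Step 4.1 using the nested-dict tree layout of the earlier sketches; the dictionary structure of the split set is an assumption for illustration, not the authors' data structure:

```python
def build_split_set(trees):
    """Traverse all non-root intermediate nodes; key = feature index,
    value = subtrees, feature frequency and observed threshold range."""
    split_set = {}
    def visit(node, is_root):
        if node["node"][2] == 1:                           # skip terminal nodes
            return
        f, thr = node["node"][0], node["node"][1]
        if not is_root:
            entry = split_set.setdefault(f, {"subtrees": [], "freq": 0,
                                             "thr_min": thr, "thr_max": thr})
            entry["subtrees"].append(node)
            entry["freq"] += 1
            entry["thr_min"] = min(entry["thr_min"], thr)
            entry["thr_max"] = max(entry["thr_max"], thr)
        visit(node["left"], False)
        visit(node["right"], False)
    for t in trees:
        visit(t, True)
    return split_set

t = {"node": [1, 0.5, 0, None],
     "left":  {"node": [3, 0.2, 0, None],
               "left": {"node": [None, None, 1, 0]},
               "right": {"node": [None, None, 1, 1]}},
     "right": {"node": [None, None, 1, 1]}}
split_set = build_split_set([t])      # feature 3 appears once with threshold 0.2
```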
Step 5: Fitness assessment.
Step 5.1: Tree fitness assessment. The tree fitness should account not only for the tree's own complexity but also for the diversity of the tree population. Additionally, tree accuracy alone cannot indicate the degree of collaboration within the ensemble, so the contribution to the ensemble must also be evaluated. Therefore, the tree fitness is evaluated using formulas (1)–(4) in the “Evaluation metrics for trees” section:
$${F}_{t}\left({t}_{i}\right)=\left({f}_{DF}\left({t}_{i}\right),{f}_{depth}\left({t}_{i}\right),{f}_{coop}\left({t}_{i}\right),{f}_{acc}\left({t}_{i}\right)\right)$$
(8)
Step 5.2: Ensemble fitness assessment. The optimization objective of the ensemble is to find the best combination of trees. The ensemble fitness is evaluated using formulas (5)–(7) in  “Evaluation metrics for ensemble” section:
$${F}_{e}\left({e}_{i}\right)=\left({f}_{\theta }\left({e}_{i}\right),{f}_{num}\left({e}_{i}\right),{f}_{acc}\left({e}_{i}\right)\right)$$
(9)
Step 6: Multiobjective selection for the tree population. The parent and offspring individuals will form a population of size \(2{N}_{t}\), and the next generation population of size \({N}_{t}\) will be generated by nondominance sorting. The specific steps are as follows:
Step 6.1: First, the dominance rank (i.e., the number of individuals that dominate a given individual) of all individuals in the population is calculated according to the four objective functions, and the population is then stratified by dominance rank. A dominance rank of 0 indicates that no individual in the population dominates this tree.
Step 6.2: Other multiobjective algorithms sort individuals of the same rank according to density with the aim of maintaining diversity. In DEEL, tree diversity is calculated according to Eq. (1), and individuals of the same rank are sorted in ascending order of \({f}_{DF}\).
Step 6.3: Starting from dominance rank 0, the first \({N}_{t}\) trees are selected as the next generation population. Trees of the same rank are selected from front to back according to the ordering in Step 6.2. To maintain the correspondence between the ensemble encoding and the trees, the indices of trees retained from the parent population remain unchanged, and trees from the offspring replace the eliminated parent trees. Because the joined offspring are superior to the eliminated parents, the replacement does not reduce ensemble performance.
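A minimal sketch of Steps 6.1–6.3, assuming all four objectives are expressed as values to be minimized (accuracy would enter as 1 − f_acc) and using illustrative names:

```python
import numpy as np

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return np.all(a <= b) and np.any(a < b)

def select_trees(obj, f_df_col, n_keep):
    n = len(obj)
    rank = np.array([sum(dominates(obj[j], obj[i]) for j in range(n) if j != i)
                     for i in range(n)])                   # number of dominators
    order = np.lexsort((obj[:, f_df_col], rank))           # rank first, then f_DF asc.
    return order[:n_keep]                                  # indices of surviving trees

rng = np.random.default_rng(2)
objectives = rng.random((8, 4))          # 2*N_t = 8 candidates, 4 objectives
keep = select_trees(objectives, f_df_col=0, n_keep=4)
```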
Step 7: Multiobjective selection for the ensemble population. The parent and offspring individuals are merged, and the individuals on the Pareto front are retained as the next generation ensemble population based on fitness. The operation is as follows: the merged parent and offspring populations are placed in a set \(P\), and each ensemble \(e\) is taken from \(P\) in turn. If no individual in \(P\) dominates \(e\), then \(e\) is retained for the next generation ensemble population \(E\), and the individuals in \(P\) dominated by \(e\) are deleted; if some individual in \(P\) dominates \(e\), then \(e\) is dropped. This operation is repeated until \(P\) is empty. Finding a common structure among the individuals on the Pareto front reduces the search space and speeds up the search.
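The retention rule of Step 7 is equivalent to keeping the non-dominated set of the merged population; a minimal sketch under the same minimization assumption (illustrative names, not the authors' code):

```python
import numpy as np

def pareto_front(obj):
    """Return indices of individuals that no other individual dominates."""
    keep = []
    for i in range(len(obj)):
        dominated = any(np.all(obj[j] <= obj[i]) and np.any(obj[j] < obj[i])
                        for j in range(len(obj)) if j != i)
        if not dominated:
            keep.append(i)
    return keep

rng = np.random.default_rng(3)
merged = rng.random((10, 3))             # (f_theta, f_num, 1 - f_acc) per ensemble
survivors = pareto_front(merged)
```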
Step 8: If the current iteration number is equal to the maximum iteration number, output the current tree population and ensemble; otherwise, go to Step 3.

Experiment and discussion

In this section, we selected 22 classification tasks from 4 data repositories (UCI, OpenML, Kaggle and KEEL) to test DEEL; details are presented in Table 1. These test datasets were selected as randomly as possible from the 4 repositories. Meanwhile, the selected datasets were required to include both binary and multiclass classification problems and to cover a wide range of sample sizes and feature dimensions, making the composition of the test set more diverse. The experiments were implemented in Python 3.6 using NumPy 1.19 and scikit-learn 0.21. The algorithm was tested on a computer running Windows 10 with a 2.5 GHz processor and 8 GB of RAM.
Table 1
Details of the 22 test datasets including number of features, classes, samples, and source
Test task       Features  Classes  Samples  Data repository
German          20        2        1000     UCI
Heart           13        2        270      UCI
Bupa            6         2        345      UCI
Spambase        57        2        4601     UCI
Column          6         2        310      UCI
Ionosphere      34        2        351      UCI
QSAR            41        2        1055     UCI
Diabetes        8         2        768      UCI
Arcene          10,000    2        900      UCI
Musk            168       2        476      UCI
Hayes           5         3        160      UCI
CMC             9         3        1473     UCI
Abalone         8         3        4177     UCI
MobilePrice     21        4        2000     Kaggle
Cirrhosis       20        4        419      Kaggle
BigMartSales    12        3        7061     Kaggle
Drug            6         5        200      Kaggle
KCHouse         20        5        21,614   Kaggle
Sentiment       12        4        11,413   OpenML
Segmentation    10        4        10,697   OpenML
Movement        91        15       360      KEEL
Optdigits       65        10       5620     KEEL
Each test dataset is subjected to fivefold nested cross-validation [38, 39] repeated 20 independent times. In the outer loop of the nested cross-validation, 20% of the dataset is used to test the ensemble; in the inner loop, 20% of the dataset is used to train the ensemble and the remaining 60% to train the trees. Both the proposed algorithm and the comparison algorithms use grid search to optimize their hyperparameters. The parameters of DEEL are as follows: the decision tree population size is \({N}_{t}=100\), the ensemble population size is \({N}_{e}=100\), the fusion node probability is \({P}_{fusion}=0.5\), and the number of dual evolution iterations is \(G=200\).
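A minimal sketch of this evaluation protocol, assuming scikit-learn's KFold and train_test_split and placeholder data; it only illustrates how the 60/20/20 splits could be produced:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def nested_splits(X, y, seed=0):
    """Outer 5-fold split holds out 20% for testing; the remaining fold is divided
    3:1, i.e. 60% of all data for tree training and 20% for ensemble training."""
    outer = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in outer.split(X):
        tree_idx, ens_idx = train_test_split(train_idx, test_size=0.25,
                                             random_state=seed)
        yield tree_idx, ens_idx, test_idx

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)   # toy placeholder data
for tree_idx, ens_idx, test_idx in nested_splits(X, y):
    pass   # train trees on tree_idx, ensembles on ens_idx, evaluate on test_idx
```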

Experimental statistical results

Tables 2 and 3 show DEEL's classification accuracy, time cost and final ensemble structure on the 22 test datasets, including the best, worst and average results and their standard deviations over 20 independent runs. Generally, the average accuracy of DEEL exceeds 70% on all test datasets except CMC, Abalone, Cirrhosis, and Segmentation. Notably, on BigMartSales and Sentiment, two multiclass classification tasks, the average accuracies of DEEL are close to 100%. Moreover, the ensembles generated by DEEL show a high-quality structure: the average number of involved decision trees is below 12, and for most test datasets the diversity value is below 0.5. The diversity of the final ensemble is calculated by Eq. (5), where smaller values indicate higher diversity.
Table 2
The statistical results of the DEEL in terms of accuracy and time cost for 22 test datasets
Datasets        Accuracy/% (Best / Worst / Mean / Std)    Execution time/s (Best / Worst / Mean / Std)
German          82.5 / 79 / 81.5 / 1.17                   20.01 / 21.11 / 20.58 / 0.47
Heart           92.29 / 88.15 / 91.09 / 1.89              10.18 / 10.79 / 10.48 / 0.19
Bupa            87.75 / 81.95 / 84.85 / 2.38              14.89 / 17.93 / 16.26 / 0.98
Spambase        91.65 / 88.39 / 90.34 / 1.41              58.81 / 66.62 / 63.06 / 3.36
Column          95.32 / 94.32 / 95.28 / 0.88              7.85 / 12.17 / 9.85 / 1.44
Ionosphere      97.57 / 96.14 / 96.99 / 0.78              11.19 / 16.84 / 13.87 / 2.02
QSAR            91.57 / 88.61 / 90.04 / 1.39              19.40 / 26.92 / 22.66 / 2.53
Diabetes        83.61 / 79.71 / 81.63 / 1.97              14.81 / 23.74 / 19.15 / 3.07
Arcene          95 / 90.5 / 92.6 / 1.59                   9.28 / 17.77 / 12.55 / 2.73
Musk            93.68 / 89.47 / 91.81 / 2.11              10.31 / 17.21 / 13.02 / 3.08
Hayes           83.75 / 80.63 / 82.05 / 1.59              7.78 / 10.43 / 8.89 / 1.22
CMC             69.71 / 59.32 / 66.39 / 4.24              18.56 / 25.06 / 22.57 / 2.84
Abalone         69.66 / 64.57 / 67.28 / 3.71              40.49 / 47.18 / 44.87 / 3.59
MobilePrice     97.49 / 90.49 / 94.59 / 3.09              18.19 / 25.95 / 22.75 / 3.02
Cirrhosis       61.8 / 53.57 / 58.11 / 2.97               11.29 / 19.39 / 15.32 / 3.53
BigMartSales    100 / 98.24 / 99.64 / 0.78                63.01 / 75.93 / 71.84 / 5.27
Drug            95 / 85 / 90.2 / 4.56                     7.54 / 15.26 / 10.81 / 3.14
KCHouse         77.04 / 72.13 / 75.48 / 2.09              198.04 / 215.71 / 206.49 / 6.31
Sentiment       99.99 / 97.82 / 99.12 / 1.18              104.21 / 118.15 / 111.28 / 4.94
Segmentation    60 / 52.5 / 55.5 / 3.25                   95.64 / 119.61 / 105.92 / 8.61
Movement        78.88 / 67.77 / 72.99 / 4.51              11.37 / 18.89 / 13.8 / 2.96
Optdigits       98.88 / 89.99 / 92.43 / 3.71              50.40 / 58.81 / 55.36 / 4.48
Table 3
The statistical results of the DEEL in terms of size and diversity for 22 test datasets
Datasets        Size (Best / Worst / Mean / Std)    Diversity (Best / Worst / Mean / Std)
German          5 / 11 / 7.6 / 2.19                 0.29 / 0.37 / 0.32 / 0.03
Heart           5 / 15 / 9.4 / 3.78                 0.42 / 0.58 / 0.52 / 0.05
Bupa            5 / 13 / 7.8 / 3.56                 0.11 / 0.51 / 0.24 / 0.15
Spambase        5 / 12 / 7.6 / 2.6                  0.38 / 0.53 / 0.44 / 0.07
Column          4 / 13 / 6.8 / 3.63                 0.20 / 0.77 / 0.35 / 0.21
Ionosphere      4 / 9 / 8.6 / 0.89                  0.31 / 0.58 / 0.39 / 0.09
QSAR            5 / 9 / 6.6 / 1.67                  0.25 / 0.37 / 0.28 / 0.04
Diabetes        7 / 14 / 10.6 / 2.96                0.33 / 0.37 / 0.35 / 0.01
Arcene          8 / 9 / 8.2 / 1.59                  0.42 / 0.66 / 0.48 / 0.09
Musk            6 / 9 / 7.4 / 1.51                  0.17 / 0.44 / 0.35 / 0.11
Hayes           3 / 9 / 5.8 / 2.49                  0.17 / 0.30 / 0.21 / 0.04
CMC             3 / 14 / 7.6 / 4.66                 0.24 / 0.40 / 0.36 / 0.06
Abalone         8 / 14 / 11.8 / 4.29                0.22 / 0.68 / 0.54 / 0.16
MobilePrice     4 / 10 / 6.8 / 2.38                 0.06 / 0.41 / 0.26 / 0.14
Cirrhosis       4 / 10 / 7.6 / 2.6                  0.33 / 0.60 / 0.48 / 0.11
BigMartSales    3 / 7 / 4.8 / 1.78                  0.08 / 0.63 / 0.39 / 0.24
Drug            3 / 8 / 4.2 / 2.16                  0.11 / 0.46 / 0.27 / 0.11
KCHouse         4 / 12 / 7.2 / 3.11                 0.06 / 0.95 / 0.56 / 0.34
Sentiment       3 / 11 / 6.2 / 3.27                 0.41 / 0.61 / 0.56 / 0.07
Segmentation    3 / 14 / 8.6 / 5.12                 0.23 / 0.40 / 0.32 / 0.06
Movement        3 / 9 / 5.89 / 1.67                 0.59 / 0.80 / 0.66 / 0.07
Optdigits       3 / 9 / 6.6 / 2.88                  0.42 / 0.71 / 0.56 / 0.14
As shown in Table 1, the numbers of samples, classes, and features of the 22 test datasets vary enormously. To observe how increasing numbers of features, samples and classes affect the performance of the proposed method, we divided the test datasets into multiple groups from these three angles. The basic grouping rules are as follows: for the number of features, the boundary values were set to 10, 20, 50, 100 and 150; for the number of samples, the boundary values were set to 500, 1000, 2000, 10,000 and 20,000; for the number of classes, the boundary values were set to 2, 3, 4, 5 and 10. From the statistical results, we can summarize some general characteristics of DEEL's performance. Figure 8 shows the influence of the number of samples, classes, and features on the classification accuracy and running time. First, we find that the numbers of samples, classes, and features all affect the achieved accuracy. Within a single dataset, an increasing number of samples is beneficial to accuracy; however, each group in Fig. 8 contains multiple datasets, so the aggregated results of DEEL show a certain fluctuation, especially in the group with more than 10,000 samples. A large sample size produces a huge search space, which is likely to increase the complexity of the global search and cause some fluctuation in the results. Moreover, when the number of features increases, each sample provides more information, which enables the model to discover more association rules; however, it also enlarges the search space, which is likely to reduce the prediction accuracy of the model. An increasing number of classes tends to raise the difficulty of the classification task, which degrades the prediction performance of DEEL in general. Finally, the number of samples dominates the time cost, while the numbers of classes and features have smaller impacts on it.
As shown in Fig. 9, the ensembles generated by DEEL include approximately 4 to 12 decision trees across the 22 test datasets. Meanwhile, the diversity of the involved decision trees does not show obvious regular changes across the test datasets. We believe that changing the number of samples, classes and features does not greatly affect the structure of the final ensemble. Even for test datasets with a large number of samples, DEEL can still produce a small, highly diverse ensemble, although its performance may fluctuate somewhat. Therefore, from the view of optimizing the interior structure of the ensemble, DEEL shows excellent ability.

Comparison of the experimental results

We compared DEEL against 9 competitive methods on the 22 test datasets, which include 3 precision-oriented approaches, DeepForest [16], RF [11] and Bagging [2], 4 select ensemble methods, POSE [26], IDAFSEN [28], GASEN [6], and RRE [25], and 2 diversity-oriented algorithms, ECL [20] and DivP [27]. IDAFSEN uses an improved discrete artificial fish swarm algorithm to optimize ensembles according to diversity, runs on MATLAB 2012a and uses classification accuracy for evaluation. GASEN uses a genetic algorithm to optimize the weights of base classifiers and is implemented in MATLAB, which selects base classifiers to constitute an ensemble according to the generalization error. ECL uses the clustering process to obtain a group of various base classifiers and then crops them using a multiobjective optimization algorithm. The multiobjective optimization algorithm considers the accuracy of each class and the diversity of classifiers, which is implemented in MATLAB 2015b. RRE tries to make full use of the unselected base classifiers by subtracting the votes of the classifiers with the worst validation error from the ensemble. DivP combines different pairwise diversity matrices, and the merged diversity matrix is converted into a graph. Finally, the graph algorithm is used to select the base classifier. DeepForest builds a deep cascade forest based on decision trees and achieves the effect of deep models by transforming features.
The experimental results of IDAFSEN, GASEN, ECL, RRE, and DivP were taken from the original literature. DeepForest, POSE, RF and Bagging were implemented on the same machine as DEEL, and fivefold nested cross-validation was used to evaluate their experimental results. DEEL's results are the mean values in Tables 2 and 3. The initial population size of the above methods is 100, except for DeepForest, which does not require an initial population size. In addition, the other hyperparameters of the above methods were optimized with grid search. As shown in Fig. 10a, DEEL obtained the best average accuracy for 17 of the 22 test problems, improving the average accuracy by 0.97%–7.65% over the second-best method. In particular, for Abalone, DEEL improved the average accuracy over the second-ranked IDAFSEN by 7.65%. For Spambase, the average accuracy of DEEL was lower than that of the first-ranked POSE by approximately 3.16%. For BigMartSales, Sentiment, Movement and Optdigits, the average accuracy of DEEL was lower than that of the first-ranked DeepForest by approximately 0.36%, 0.88%, 7.56% and 7.49%, respectively. In addition, RRE, DeepForest, RF, and Bagging are 4 noniterative methods that have significantly shorter runtimes than the iterative methods (e.g., DEEL, IDAFSEN, GASEN, DivP, and POSE), as shown in Fig. 10b. Meanwhile, it should be noted that the time cost of DEEL is significantly lower than those of IDAFSEN, GASEN, DivP, and POSE. We believe the main reasons are that the shared structure reduces the search operations and the split-fusion operation reduces the number of tree traversals.
Figure 11a shows that the average number of decision trees involved in the ensemble is approximately 4 to 12, which is much smaller than that of the other competitors except DivP. Although DivP has the smallest number of base classifiers, its average accuracies are lower than those of the proposed method by approximately 1.84% to 11%. Furthermore, Fig. 11b shows that DEEL has the best average diversity on all 22 test datasets; its diversity curve lies clearly below those of all competitors. Figure 11 indicates that, relative to the 9 competitors, the ensemble generated by DEEL involves fewer base classifiers with higher diversity. Therefore, the proposed method can effectively reduce the redundancy inside the ensemble and globally optimize the structure of the involved decision trees. In summary, for various types of classification problems, including high-dimensional, small-sample ones, DEEL shows stable performance. Meanwhile, it can be driven by data to automatically produce a high-quality ensemble with an excellent internal structure.

Discussion

In this section, we investigated the effect of several key parameters and the dual evolutionary search mechanism.

Effect of G

In DEEL, G limits the number of iterations of the entire search process. We set 3 different G values (100, 200 and 400) to compare the accuracy of the final ensembles. As shown in Fig. 12, DEEL clearly has not converged when G = 100. When G = 200, the system obtains the best accuracies in 6 of the 10 binary-class problems and 8 of the 12 multiclass problems. Meanwhile, the accuracies for G = 400 surpass those for G = 200 in only 8 of the 22 tasks, with gains of about 0.17% to 3.12%.
Overall, a larger G incurs more cost, and excessive search operations are likely to cause overfitting. For example, when G = 400, the accuracies of DEEL on the test data show no obvious change compared with G = 200 for 14 of the 22 test datasets. Thus, a dynamic stopping mechanism for DEEL is desirable. Considering that the stopping method used in this work is still a static parameter, we believe G = 200 is a relatively appropriate setting.

Effect of \({N}_{t}\) and \({N}_{e}\)

Figure 13 shows the effect of \({N}_{t}\) and \({N}_{e}\) on the results when both are set to 50, 100 and 200. In general, \({N}_{t}\) and \({N}_{e}\) limit the sizes of the two populations. When \({N}_{t}\) and \({N}_{e}\) are 50, the average accuracy is approximately 4.88% lower than under the other two parameter settings, whereas the gap in average accuracy between the settings of 100 and 200 is very small (approximately 0.48%). Setting \({N}_{t}\) and \({N}_{e}\) too small reduces the richness of the base classifiers, which impacts the quality of the ensembles. In contrast, large values of \({N}_{t}\) and \({N}_{e}\) increase the amount of search computation but hardly improve the results. For most test datasets, we believe that 100 is a relatively appropriate value.

Effect of \({P}_{fusion}\)

In the search process of DEEL, the fusion probability \({P}_{fusion}\) is a key parameter affecting the final structure of the decision trees. Since the fusion operation is top-down, nodes closer to the root have a greater impact on the tree structure; hence, a large \({P}_{fusion}\) is more likely to damage the current structure, whereas a small \({P}_{fusion}\) delays the convergence of the optimization. As shown in Fig. 14, changing \({P}_{fusion}\) from 0.1 to 0.9 causes a regular fluctuation of the results for Diabetes, Ionosphere and Hayes. Across the 22 test datasets, \({P}_{fusion}=0.5\) yields relatively better results.

Effect of the dual evolutionary search mechanism

The impact of the dual evolutionary mechanism on the ensemble is mainly reflected in three aspects: precision, complexity, and diversity. Figures 15 and 16 compare the accuracy and the structure of the involved base classifiers between DEEL and POSE on Cirrhosis and KCHouse. POSE initializes its decision trees with a greedy algorithm, while the decision trees of DEEL are randomly initialized, so the initial ensemble accuracy of DEEL is lower than that of POSE. However, the accuracy of DEEL increases significantly faster than that of POSE during the iterations. After 200 iterations, the accuracy of DEEL is higher than that of POSE by 7.8% and 7.4% on Cirrhosis and KCHouse, respectively. Meanwhile, the two curves of the decision tree number show a distinct difference. First, by and large, the curve of DEEL lies below that of POSE by 5 and 8 trees, respectively. Second, the curve of POSE always decreases, whereas the curve of DEEL shows unusual fluctuations. This difference indicates that the dual evolutionary mechanism constantly adjusts the internal structure of the ensembles. This global search capability enables DEEL to synchronously optimize the accuracy and the interior structure of the ensembles.
Figures 17 and 18 show the diversity comparison of the involved base classifiers between DEEL and POSE on Cirrhosis and KCHouse. First, the populations of decision trees are clustered; the clustering results were calculated with the affinity propagation clustering algorithm. Then, each tree uses its own features to make a prediction. In these figures, the circles represent trees, the clusters represent classes of trees, the largest circle in each cluster represents the cluster's center, and the black circles represent trees that have been selected by an ensemble.
As a whole, POSE improves the diversity of the trees by bootstrap sampling, so it has more clusters than DEEL. However, POSE does not optimize the base classifier structure; therefore, during the iterations, the cluster structure of the trees inside POSE does not change. In contrast, comparing gen = 50 and gen = 200, we notice a marked difference in the cluster structure of the trees inside DEEL, since DEEL constantly updates its populations of decision trees and ensembles. The diversity value of an ensemble is an index of its internal redundancy, and DEEL clearly shows better ability in this regard.

Conclusion

In this paper, we proposed an ensemble construction method based on a dual evolutionary search mechanism. DEEL uses evolutionary pressure to find highly collaborative combinations, thereby achieving coevolution through the information interaction between the base classifier population and the ensemble population. The dual evolution mechanism reduces the dependency on human-designed decisions by data-driven mining of feature association rules and increases the reliability and generality of the model. The experimental results for 22 test datasets demonstrate that:
(1)
DEEL produced high-quality ensembles for most of the test datasets, with excellent interior structures and competitive accuracy. Overall, DEEL can handle various binary and multiclass classification problems, and for high-dimensional, small-sample problems it displays outstanding performance.
 
(2)
DEEL uses only a few parameters to adjust the generation process, and apart from G, they have no significant effect on the model output. Too many iterations are likely to produce overfitting. In the discussion, we analyzed the influence of parameter changes on DEEL's performance and provided a set of parameter values that are appropriate for most of the test datasets.
 
(3)
In this work, the base classifier used in DEEL is only the decision tree, for the following reasons. First, its coding design is easy to implement. In addition, the tree coding-based optimization mechanism makes the ensemble more transparent internally, which aids analyses of the internal decision logic of the ensemble.
 
In the future, we would like to pursue two topics. First, DEEL provides a basis for interpretability analyses of the ensemble model, and we will explore methods of knowledge extraction from the output of DEEL. Second, we will improve the encoding mechanism to enable the proposed method to accommodate more types of base classifiers and seek ways to enhance the efficiency of optimizing the base classifier population.

Acknowledgements

We thank the editors and anonymous reviewers for their helpful comments on an earlier draft of this paper. This study was funded by the National Natural Science Foundation of China [61876138 and 61203311], the Key Research and Development Projects of Shaanxi [2022GY-315] and the Special Fund Construction Project of Key Disciplines in China's Universities of Shaanxi Province.

Declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Dietterich TG (1997) Machine-learning research. AI Mag 18(4):97–97
3. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139
4.
5. Zohaib MJ, Verma B (2019) Evolutionary classifier and cluster selection approach for ensemble classification. ACM Trans Knowl Discov Data 14(1):1–18
6.
7. Sammut C, Webb GI (2011) Encyclopedia of machine learning. Springer Science & Business Media, Berlin
8. Wang Y, Wang D, Geng N et al (2019) Stacking-based ensemble learning of decision trees for interpretable prostate cancer detection. Appl Soft Comput 77:188–204
9. Ribeiro V, Reynoso-Meza G (2019) A holistic multi-objective optimization design procedure for ensemble member generation and selection. Appl Soft Comput 83:105664
10. Galicia A, Talavera-Llames R, Troncoso A et al (2019) Multi-step forecasting for big data time series based on ensemble learning. Knowl Based Syst 163:830–841
12. Zhang Z-Z, Qiao J-F, Yu W (2018) Structure design of hierarchical adaptive modular neural network. Chin J Comput 50(11):32–39
13. Li Y-J, Guo H-X, Li Y-N, Xiao L (2016) A boosting based ensemble learning algorithm in imbalanced data classification. Syst Eng Theory Pract 36(01):189–199
14. Liu Y, Lyu C, Khadka A et al (2019) Spatio-temporal ensemble method for car-hailing demand prediction. IEEE Trans Intell Transp Syst 21(12):5328–5333
15. Wang L, Mao S, Wilamowski BM et al (2020) Ensemble learning for load forecasting. IEEE Trans Green Commun Netw 4(2):616–628
17. Wang W, Yang S, Lin Q et al (2018) An effective ensemble framework for multiobjective optimization. IEEE Trans Evol Comput 23(4):645–659
18. Sheng W, Shan P, Chen S et al (2017) A niching evolutionary algorithm with adaptive negative correlation learning for neural network ensemble. Neurocomputing 247:173–182
19. Zhao J, Jiao L, Xia S et al (2018) Multiobjective sparse ensemble learning by means of evolutionary algorithms. Decis Support Syst 111:86–100
20. Asafuddoula M, Verma B, Zhang M (2017) A divide-and-conquer-based ensemble classifier learning by means of many-objective optimization. IEEE Trans Evol Comput 22(5):762–777
21. Wang X, Hu T, Tang L (2021) A multiobjective evolutionary nonlinear ensemble learning with evolutionary feature selection for silicon prediction in blast furnace. IEEE Trans Neural Netw Learn Syst 99:1–14
22. Wen X, Wu G, Fan M et al (2020) Voting-mechanism based ensemble constraint handling technique for real-world single-objective constrained optimization. In: 2020 IEEE Congress on Evolutionary Computation. IEEE, pp 1–8
24. Guo L, Boukir S (2013) Margin-based ordered aggregation for ensemble pruning. Pattern Recogn Lett 34(6):603–609
25. Dai Q, Zhang T, Liu N (2015) A new reverse reduce-error ensemble pruning algorithm. Appl Soft Comput 28:237–249
26. Zhou ZH, Yu Y, Qian C (2019) Evolutionary learning: advances in theories and algorithms. Springer, Singapore
27. Cavalcanti GDC, Oliveira LS, Moura TJM et al (2016) Combining diversity measures for ensemble pruning. Pattern Recogn Lett 74:38–45
28.
Zurück zum Zitat Zhu X, Ni Z, Cheng M et al (2018) Selective ensemble based on extreme learning machine and improved discrete artificial fish swarm algorithm for haze forecast. Appl Intell 48(7):1757–1775CrossRef Zhu X, Ni Z, Cheng M et al (2018) Selective ensemble based on extreme learning machine and improved discrete artificial fish swarm algorithm for haze forecast. Appl Intell 48(7):1757–1775CrossRef
29.
Zurück zum Zitat Ykhlef H, Bouchaffra D (2017) An efficient ensemble pruning approach based on simple coalitional games. Inform Fus 34:28–42CrossRef Ykhlef H, Bouchaffra D (2017) An efficient ensemble pruning approach based on simple coalitional games. Inform Fus 34:28–42CrossRef
30.
Zurück zum Zitat Zhou ZH, Li N (2010) Multi-information ensemble diversity International Workshop on Multiple Classifier Systems. Springer, Berlin, pp 134–144CrossRef Zhou ZH, Li N (2010) Multi-information ensemble diversity International Workshop on Multiple Classifier Systems. Springer, Berlin, pp 134–144CrossRef
31.
Zurück zum Zitat Gu S, Cheng R, Jin Y (2015) Multi-objective ensemble generation. Wiley Interdiscip Rev Data Mining Knowl Discov 5(5):234–245CrossRef Gu S, Cheng R, Jin Y (2015) Multi-objective ensemble generation. Wiley Interdiscip Rev Data Mining Knowl Discov 5(5):234–245CrossRef
32.
Zurück zum Zitat Jan MZ, Verma B (2019) A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access 7:156360–156373CrossRef Jan MZ, Verma B (2019) A novel diversity measure and classifier selection approach for generating ensemble classifiers. IEEE Access 7:156360–156373CrossRef
33.
Zurück zum Zitat Liu L, Wei W, Chow KH, et al (2019) Deep neural network ensembles against deception: ensemble diversity, accuracy and robustness. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems, pp 274–282 Liu L, Wei W, Chow KH, et al (2019) Deep neural network ensembles against deception: ensemble diversity, accuracy and robustness. 2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems, pp 274–282
34.
Zurück zum Zitat Li Y-J et al (2016) A boosting based ensemble learning algorithm in imbalanced data classification. Syst Eng Theory Pract 36:189–199 Li Y-J et al (2016) A boosting based ensemble learning algorithm in imbalanced data classification. Syst Eng Theory Pract 36:189–199
35.
Zurück zum Zitat Jiang Zheng-Shen LIU, Hong-Zhi FUB, Zhong-Hai WU (2019) Decomposition theories of generalization error and auc in ensemble learning with application in weight optimization. Chin J Comput 42(01):1–15 Jiang Zheng-Shen LIU, Hong-Zhi FUB, Zhong-Hai WU (2019) Decomposition theories of generalization error and auc in ensemble learning with application in weight optimization. Chin J Comput 42(01):1–15
36.
Zurück zum Zitat Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc AAAI Conf Artif Intell 34(04):3146–31531 Aglin G, Nijssen S, Schaus P (2020) Learning optimal decision trees using caching branch-and-bound search. Proc AAAI Conf Artif Intell 34(04):3146–31531
37.
Zurück zum Zitat Hu X, Rudin C, Seltzer M (2019) Optimal sparse decision trees. Adv Neural Inf Process Syst 32:1–14 Hu X, Rudin C, Seltzer M (2019) Optimal sparse decision trees. Adv Neural Inf Process Syst 32:1–14
38.
Zurück zum Zitat Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107MathSciNetMATH Cawley GC, Talbot NLC (2010) On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 11:2079–2107MathSciNetMATH
39.
Metadata
Title
Using dual evolutionary search to construct decision tree based ensemble classifier
Authors
Hao Chen
Guoxin Zhang
Xiaoying Pan
Rong Jia
Publication date
05.09.2022
Publisher
Springer International Publishing
Published in
Complex & Intelligent Systems / Issue 2/2023
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI
https://doi.org/10.1007/s40747-022-00855-x
