Review on problem decomposition approaches using CC
The critical step of a cooperative co-evolution (CC) technique is how to decompose the problem into subproblems [37]. Generally speaking, a CC algorithm should decompose a large problem into many subproblems with minimal interdependency between them. The first problem decomposition methods using CC adopted two simple strategies: one-dimensional decomposition and splitting-in-half [38]. The former decomposes an \(n\)-dimensional problem into \(n\) one-dimensional subproblems. This strategy is quite simple and can be effective for separable problems. However, it does not consider the interdependencies between subproblems and is therefore unable to handle non-separable problems satisfactorily. In comparison, the splitting-in-half strategy always decomposes an \(n\)-dimensional problem into two equal \(\frac{n}{2}\)-dimensional subproblems. However, when \(n\) is large, the resulting \(\frac{n}{2}\)-dimensional subproblems are still large, which causes computational difficulties [39].
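To make the contrast concrete, the following minimal Python sketch (illustrative only, not code from the reviewed papers) shows the index sets produced by these two early static strategies:

```python
def one_dimensional(n):
    """Decompose an n-dimensional problem into n one-dimensional subproblems."""
    return [[i] for i in range(n)]

def splitting_in_half(n):
    """Decompose an n-dimensional problem into two n/2-dimensional subproblems."""
    half = n // 2
    return [list(range(half)), list(range(half, n))]

print(one_dimensional(4))    # [[0], [1], [2], [3]]
print(splitting_in_half(4))  # [[0, 1], [2, 3]]
```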
Considering the interdependencies between subproblems, a CC framework, EACC-G, was proposed for high-dimensional optimization problems based on random grouping (RG) [39]. This framework randomly divides the decision variables of a high-dimensional problem into a number of groups of predefined size, which provides a non-zero probability of assigning interacting variables to the same group. In RG, every decision variable has an equal probability of being assigned to any of the subcomponents, and the entire process is repeated at the beginning of each cycle. This yields better performance for the CC framework than the one-dimensional and splitting-in-half decomposition strategies. However, the problem is to define the optimal group size, which is problem-dependent. A small group size may suit separable problems, whereas a larger group size can be useful for non-separable problems because it provides a higher probability of grouping more interacting decision variables together. Furthermore, selecting the group size can be difficult at different stages of optimization for a single problem: a small group size helps find promising solution regions faster in the initial stages, whereas a large group size is preferable in later stages because it carries more global information in each subproblem. Hence, the appropriate group size depends on the objective function, the optimization stage, and the decision variables. To obtain an appropriate group size, a multilevel CC framework (MLCC) was proposed by the same research group in 2010 [40]. In MLCC, a decomposer pool is built from different group sizes using the random grouping strategy, where each decomposer in the pool corresponds to a different interaction level between decision variables. The evolution process is divided into several cycles, and at the start of each cycle a decomposer is selected from the pool according to its performance record. The selected decomposer is used to decompose the problem into subproblems of the corresponding size, and the performance record of each decomposer is updated at the end of each cycle. Following this strategy, MLCC is able to self-adapt to an appropriate interaction level regardless of the decision variables, objective function, and optimization stage. To further improve performance, MLCC also includes a weight vector to co-adapt subcomponents. To group interacting variables together, a variable interaction learning (VIL) method was proposed that initially treats all decision variables as independent, placing each in a separate group; it then investigates variable interactions iteratively and, when a pairwise interaction is identified via non-monotonicity detection, merges the variables into the same group [31]. This method identifies the subcomponent structure and adjusts component sizes in two phases: learning and optimization.
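As an illustration of the RG idea, the following hedged Python sketch (names and structure are our own, not from [39]) partitions the variable indices randomly into fixed-size groups; repeating it at the start of every co-evolutionary cycle is what gives interacting variables a non-zero chance of being grouped together:

```python
import random

def random_grouping(n, group_size, rng=random):
    """Randomly partition n decision variables into groups of a predefined
    size; repeating this at the start of every co-evolutionary cycle gives
    any pair of interacting variables a non-zero probability of ending up
    in the same group."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i:i + group_size] for i in range(0, n, group_size)]

# e.g., a 1000-dimensional problem split into ten 100-dimensional subproblems
groups = random_grouping(1000, 100)
```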
Although DECC-G and MLCC provide opportunities to decompose a large problem into subproblems efficiently, they suffer from two major weaknesses that degrade algorithmic performance. First, the probability of grouping interacting variables into the same subproblem decreases as the number of interacting variables increases. Second, the adaptive weighting in MLCC contributes little over the entire process; it sometimes fails to improve solution quality and also increases the number of fitness evaluations. To improve the performance of MLCC, a more frequent random grouping method (DECC-ML) was proposed by Omidvar et al. [14]. DECC-ML reduces the number of fitness evaluations needed to find a solution without deteriorating solution quality. In this method, a decomposer is chosen with a uniform random number generator, unlike MLCC, which uses a sophisticated probability formula. After DECC-ML, Omidvar et al. [41] proposed another decomposition method called delta grouping. When two interacting variables are placed in different subcomponents, the extent to which each variable can be improved towards its optimal value in a non-separable function is limited. The authors therefore calculated the amount of change, called the delta value, of each decision variable in every cycle to identify interacting variables. Delta values measure the averaged difference of a specific variable over the entire population, under the assumption that interacting variables have smaller delta values. The decision variables are sorted by their delta values, and variables with smaller delta values are grouped into the same subcomponent. Compared to previous techniques, delta grouping is effective at grouping interacting variables. However, it performs poorly when the objective function has more than one non-separable subcomponent.
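The delta-grouping computation can be sketched as follows; this is a simplified illustration of the idea in [41], with `prev_pop` and `curr_pop` assumed to be population matrices (individuals × variables) from consecutive cycles:

```python
import numpy as np

def delta_grouping(prev_pop, curr_pop, group_size):
    """Simplified delta grouping: the delta value of a variable is its
    average absolute change across the whole population between two
    consecutive cycles; variables with the smallest deltas are assumed to
    interact and are therefore placed in the same subcomponent."""
    deltas = np.abs(curr_pop - prev_pop).mean(axis=0)  # one delta per variable
    order = np.argsort(deltas)                         # smallest deltas first
    return [order[i:i + group_size].tolist()
            for i in range(0, order.size, group_size)]
```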
A metamodel-based decomposition method using high-dimensional model representation (HDMR) has also been proposed, which considers two variables non-separable when they have a cooperative effect on the approximated second-order HDMR model [42]. HDMR uses a map of the linkage between input and output system variables. For large-scale fully separable continuous functions, Omidvar et al. introduced a decomposition approach, multilevel softmax (MLSoft), in 2014 [43]. MLSoft is essentially a modified MLCC algorithm that uses reinforcement learning to adaptively find a proper subcomponent size. The experimental analysis concludes that the subcomponent size should be as small as possible; however, because the capacity of an optimizer cannot be predicted in advance, the optimal subcomponent size still requires empirical analysis.
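A plausible sketch of the softmax-based decomposer selection underlying MLSoft is shown below; the temperature parameter, pool contents, and reward values are illustrative assumptions, not settings from [43]:

```python
import numpy as np

def select_decomposer(rewards, temperature=0.1, rng=None):
    """Softmax (Boltzmann) selection over a pool of subcomponent sizes, in
    the spirit of MLSoft: decomposers with better performance records are
    chosen more often, yet every decomposer keeps a non-zero probability."""
    rng = rng or np.random.default_rng()
    prefs = np.asarray(rewards, dtype=float) / temperature
    prefs -= prefs.max()                        # for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return rng.choice(len(rewards), p=probs)    # index into the pool

pool = [5, 10, 25, 50, 100]          # hypothetical subcomponent sizes
rewards = [0.2, 0.5, 0.9, 0.4, 0.1]  # hypothetical performance records
group_size = pool[select_decomposer(rewards)]
```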
Random grouping and delta grouping decompose an \(n\)-dimensional problem into \(k\) \(s\)-dimensional subproblems. However, the major drawback of these approaches is the need to determine the value of \(k\) or \(s\). When the objective function has large groups of interacting variables, a small \(s\) value can decrease algorithmic performance; in contrast, when the objective function has several small groups of interacting variables, a large \(s\) value cannot ensure proper utilization of the decomposition. Although MLCC [40] avoids specifying a single \(s\) value through the use of a decomposer pool with a set of possible \(s\) values, the set of potential \(s\) values still needs to be determined. In addition, a selected \(s\) value in MLCC divides the decision variables into equally-sized subcomponents, whereas most real-world problems are unlikely to have equally-sized interacting groups. To overcome these limitations, an automatic decomposition method, differential grouping (DG), was introduced by Omidvar et al. in 2013 [44]. DG investigates the interaction of the first decision variable with the rest of the variables in a pairwise fashion, based on the definition of partially additively separable functions. When DG identifies an interaction between the first variable and any other variable, it removes that variable from the set of decision variables and places it in a subcomponent. This process is iterated until all interactions between the first variable and the remaining variables are identified and the first subcomponent is built; when no interaction is found, the variable is deemed separable. The entire process is repeated for the remaining variables until interactions are found for all decision variables. Even though DG achieves promising performance in decomposing a large problem into multiple subproblems by grouping interacting variables together, it has some drawbacks: it cannot detect overlapping functions, it is slow to check all interactions, it requires a threshold parameter, and it is sensitive to the choice of that threshold.
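The pairwise test at the heart of DG can be sketched as follows: perturb \(x_i\) once with \(x_j\) at its lower bound and once with \(x_j\) shifted, and compare the two fitness differences. The helper below is a simplified illustration of that check (the bounds vectors, perturbation points, and default threshold are assumptions):

```python
import numpy as np

def dg_interact(f, lb, ub, i, j, eps=1e-3):
    """DG-style pairwise check: if shifting x_j changes the effect that a
    perturbation of x_i has on f, then x_i and x_j are non-separable.
    lb/ub are the lower/upper bound vectors of the search space."""
    x1 = np.asarray(lb, dtype=float).copy()
    x2 = x1.copy()
    x2[i] = ub[i]                          # perturb x_i
    delta1 = f(x2) - f(x1)                 # effect of x_i with x_j at lb[j]

    x3, x4 = x1.copy(), x2.copy()
    x3[j] = x4[j] = (lb[j] + ub[j]) / 2.0  # shift x_j in both points
    delta2 = f(x4) - f(x3)                 # effect of x_i with x_j shifted

    return abs(delta1 - delta2) > eps      # True => non-separable
```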
DG can identify decision variables that interact directly. To identify indirect interactions between decision variables, an extension of DG, extended DG (XDG), was proposed by Yuan et al. in 2015 [45]. In XDG, the variables with direct interactions are first grouped into subcomponents, and then the overlaps between subcomponents are detected; when an overlap is detected, the subcomponents sharing decision variables are combined. However, XDG may perform poorly at identifying interactions among multiple variables. Besides, XDG does not model the decomposition problem formally, and it inherits the sensitivity issue of DG. To overcome these drawbacks, in 2016, Yingbiao et al. [15] proposed another improvement of DG, called graph-based DG (gDG). gDG starts by identifying all separable variables and allocating them to the same group. Each decision variable is then considered a vertex, and each interaction between two variables an edge; a graph is constructed from this arrangement to model the decomposition problem, and the connected components are identified.
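A minimal sketch of the graph step in gDG is given below: pairwise interactions (detected, e.g., by a DG-style test) form the edges, and a depth-first search extracts the connected components as subcomponents. The function names are our own:

```python
from collections import defaultdict

def connected_groups(n, interactions):
    """gDG-style grouping: treat the n variables as vertices and the
    detected pairwise interactions as edges, then return the connected
    components of the graph as subcomponents. `interactions` is an
    iterable of (i, j) index pairs; singleton components correspond to
    separable variables."""
    adj = defaultdict(list)
    for i, j in interactions:
        adj[i].append(j)
        adj[j].append(i)
    seen, groups = set(), []
    for v in range(n):
        if v in seen:
            continue
        stack, comp = [v], []
        while stack:                       # iterative depth-first search
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        groups.append(sorted(comp))
    return groups
```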
In 2016, Mei et al. [46] proposed a global differential grouping (GDG) approach to improve the accuracy of DG by adopting the DG mechanism while maintaining global information in the form of an interaction matrix between decision variables. In addition, GDG also accounts for computational error. The authors identified three drawbacks of DG: (1) the comparison between variables is incomplete; (2) it tends to miss several interactions between decision variables, making the grouping performance sensitive to a threshold parameter; and (3) it cannot work properly on fully separable functions or on functions with a large portion of separable variables. To overcome these limitations, GDG uses three different measures: interaction, independence, and a combination of the two. GDG can detect variable dependencies and isolate dependent variables into the same groups more accurately, and it overcomes the sensitivity issue of DG by considering computational errors. However, it is not suitable for imbalanced functions because of the global parameter used to identify all interactions.
DG uncovers variable interdependency to produce near-optimal decompositions, at the cost of ignoring indirect non-separable interactions. XDG and GDG solve the indirect non-separability problem; however, both incur a high computational cost for the interaction identification process, which leads to suboptimal performance in the subsequent optimization. The identification process requires substantial computational time because most algorithms detect interactions in a pairwise fashion. To address this, a fast interdependency identification (FII) algorithm was proposed by Xiao-Min et al. in 2016 [11]. FII performs the decomposition by identifying interdependency information for separable and non-separable variables; when non-separable variables are discovered, their interdependency is investigated further. FII does not require complete interdependency information for non-separable variables, because it avoids pairwise interdependency identification. As a result, FII saves a significant number of fitness evaluations for the subsequent optimization process, and a near-optimal decomposition can be obtained at a small computational cost.
DG is a breakthrough for large-scale problem decomposition. However, it has a number of drawbacks, and several improvements based on DG have been proposed to address them. One such improvement, called DG2, was proposed by the same research group that introduced DG [47]. For fully separable functions, DG2 reduces the number of fitness evaluations by half, which allows the algorithm to check every pair of variables for interaction at a significantly lower computational cost than previous approaches; overlapping functions can thereby be identified effectively by testing all pairs of variables. Building on the DG mechanism, DG2 generates sample points systematically to maximize point reuse, which attains a lower bound on the number of evaluations needed to detect the interactions. Beyond efficiency, the grouping accuracy of DG is also improved significantly in DG2: since DG2 considers computational rounding errors to estimate an appropriate threshold level, it can tune its sensitivity to weak variable interactions. The automatic calculation of the threshold value is useful for dealing with imbalanced functions, and, unlike DG, DG2 requires no external parameter.
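The flavor of DG2's automatic threshold can be conveyed with the following simplified sketch, which scales the machine unit round-off by the magnitudes of the function values involved in an interaction test; the exact bounds in [47] are more refined, so treat this as an assumption-laden approximation:

```python
import numpy as np

def roundoff_threshold(f_values, n_ops=4):
    """Assumed simplification of DG2's idea: derive the interaction
    threshold from the floating-point round-off of the function values
    used in the test, instead of asking the user for one. gamma follows
    the standard error model n*mu / (1 - n*mu); n_ops approximates the
    number of dominant arithmetic operations."""
    mu = np.finfo(float).eps / 2.0                  # unit round-off
    gamma = (n_ops * mu) / (1.0 - n_ops * mu)
    return gamma * max(abs(v) for v in f_values)

# Non-separability is then declared only when the measured difference
# exceeds this data-driven bound:
#   abs(delta1 - delta2) > roundoff_threshold([f1, f2, f3, f4])
```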
Because previous decomposition algorithms required a large number of fitness evaluations (or otherwise failed to obtain proper decomposition results), a historical interdependency based differential grouping (HIDG) was proposed in 2018 [12]. Similar to FII, HIDG uses decision vectors to investigate the interdependencies among vectors in a systematic way. Based on historical interdependency information, HIDG introduces a criterion to infer interactions for decision vectors without using extra fitness evaluations. To reduce the computational expense of decomposition, a recursive differential grouping (RDG) method was proposed by Yuan et al. in 2017 [10]. RDG detects interactions between decision variables based on non-linearity identification. It recursively analyzes the interaction between a variable \(x_i\) and the rest of the variables: when an interaction is detected, the remaining variables are divided into two groups of equal size, and the interaction between \(x_i\) and each group is examined. The process is repeated until all variables that interact with \(x_i\) are detected and placed into the same subcomponent as \(x_i\). Analytical results validate the effectiveness of RDG, which does not need to examine variable interactions in a pairwise fashion.
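The recursive bisection in RDG can be sketched as below, where `f_interact(xi, subset)` stands for an assumed black-box test of whether variable `xi` interacts with a whole subset of variables (in RDG this test is fitness-difference based):

```python
def rdg_find_interacting(f_interact, xi, candidates):
    """RDG-style recursive search: test variable xi against a whole set of
    candidate variables at once; on a positive test, split the set into two
    halves and recurse. Interacting variables are thus located with roughly
    O(log n) set-level tests instead of n pairwise tests."""
    if not candidates or not f_interact(xi, candidates):
        return []                                  # no interaction in this set
    if len(candidates) == 1:
        return list(candidates)                    # pinpointed one variable
    mid = len(candidates) // 2
    return (rdg_find_interacting(f_interact, xi, candidates[:mid]) +
            rdg_find_interacting(f_interact, xi, candidates[mid:]))
```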
Another improvement of DG, called \(\varepsilon \)-based DG (\(\varepsilon \)-DG), was proposed to overcome the drawbacks of DG regarding the threshold parameter and to identify indirect interactions [8]. In DG, \(\varepsilon \) is a predefined threshold value, set to zero in general [44]: when delta \(> \varepsilon \), the decision variables are considered non-separable; otherwise, they are separable. However, the authors of \(\varepsilon \)-DG observed experimentally that \(\varepsilon = 0\) is ineffective, because delta \(> 0\) also occurs for separable variables; that is, setting \(\varepsilon = 0\) can classify decision variables as non-separable that are actually separable. \(\varepsilon \)-DG recognizes that delta \(> 0\) for two separable variables results from computational error. Hence, \(\varepsilon \)-DG first checks whether delta \(> 0\) results from an interaction or from computational error; if it results from computational error, delta is set to zero, which resolves the configuration issue of \(\varepsilon \). In \(\varepsilon \)-DG, an \(n \times n\) DG matrix (DGM) is constructed to represent the interactions between every pair of decision variables. The DGM can identify both direct and indirect interactions: an element of the DGM is set to zero when there is no interaction between the two associated variables and to a non-zero value otherwise.
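A simplified sketch of the DGM construction is shown below; `measure_delta` and `error_bound` are hypothetical helpers standing in for the fitness-difference measurement and the computational-error estimate of [8]:

```python
import numpy as np

def build_dgm(n, measure_delta, error_bound):
    """Sketch of the epsilon-DG matrix: dgm[i, j] stores the measured delta
    for the pair (i, j), but deltas explainable by computational error are
    reset to zero, so any non-zero entry reflects a genuine interaction."""
    dgm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            delta = measure_delta(i, j)
            if delta > error_bound(i, j):          # not just round-off noise
                dgm[i, j] = dgm[j, i] = delta
    # Connected components of the non-zero pattern of dgm then yield groups
    # capturing both direct and indirect interactions.
    return dgm
```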
Based on the concept of partial derivatives of multivariate functions, GDG has proven to be an effective grouping strategy that automatically decomposes a problem using global information about the interactions between variables. However, once GDG has decomposed a problem into subproblems, the grouping results are no longer updated and are not automatically adjusted throughout the evolutionary process. Hence, to update the grouping results throughout the evolutionary process, an improvement of GDG, called dynamic GDG (D-GDG), was proposed by Wu et al. in 2018 [48]. In D-GDG, the original data matrix of interacting variables is obtained from the general idea of the partial derivative of multivariate functions. A fuzzy clustering technique is used to discern the dynamic clustering of decision variables during the standardization of the original data matrix and to build the fuzzy relation matrix. The size of each group of interacting variables is limited by lower and upper bounds on the variable grouping scale, and the grouping of variables is adaptively adjusted according to the algorithmic state throughout the optimization process.
In terms of time complexity, RDG is a very efficient decomposition method. However, to identify whether two subsets of variables interact, a proper parameter setting is required to determine a threshold value, and a single global threshold may not be sufficient for determining the interactions of variables in subcomponents that contribute very differently to the fitness value. To solve these problems, a decomposition method inspired by DG2 and RDG was proposed, called RDG2 [49]. In RDG2, the threshold value is adaptively estimated based on the computational round-off errors of RDG; an upper bound on the round-off errors proves adequate to distinguish between separable and non-separable variables of large-scale benchmark problems. Moreover, experimental results have shown a higher decomposition accuracy for RDG2 compared to RDG and DG2.
All the preceding decomposition algorithms group linked or interacting variables into the same group; however, this does not necessarily reduce the size of the original problem. Moreover, overlapping problems pose a further challenge to decomposition algorithms because their components depend on each other. To address these issues, the RDG algorithm has been further modified for decomposing overlapping problems. The resulting algorithm, RDG3, breaks the linkage at variables that are shared by more than one component. The performance of RDG3 has been evaluated on overlapping benchmark problems, confirming its superiority over RDG and other decomposition algorithms [50]. Table 2 presents a summary of the reviewed problem decomposition techniques and their key features.
Table 2
Summary of papers reviewed on problem decomposition techniques
Method | Decomposition approach | Strengths | Weaknesses |
One-dimensional-based [ 34] | An n-dimensional problem into n one-dimensional subproblems | Quite simple and effective for separable problems | Does not consider the interdependencies between subproblems; unlikely to handle non-separable problems satisfactorily |
Splitting-in-half [ 38] | An n-dimensional problem into two equal \(\frac{n}{2}\)-dimensional subproblems | Simple and effective for separable problems, like the one-dimensional-based approach | When n is large, the \(\frac{n}{2}\)-dimensional subproblems are still large; computationally expensive; handles even numbers of dimensions only |
Random grouping (EACC-G) [ 39] | Randomly divides the decision variables of the high-dimensional problem into a number of groups with predefined group size | Provides a non-zero probability of assigning interacting variables into the same group | Optimal group size is problem-dependent and difficult to define; the probability of grouping interacting variables in one subproblem decreases as the number of interacting variables increases |
MLCC [ 40] | A decomposer pool of variable group sizes and random grouping | Self-adapts to an appropriate interaction level regardless of decision variables, objective function, and optimization stages | Adaptive weighting contributes little over the entire process, sometimes fails to improve solution quality, and causes extra fitness evaluations |
CCVIL [ 31] | Incremental group size; two phases: learning and optimization | Detects variable interactions in one stage and optimizes the groups in another stage within the same framework | Heavy computation due to pairwise interaction checks |
DECC-ML [ 14] | More frequent random grouping; uniform selection of subcomponent size | Reduces the number of fitness evaluations to find a solution without deteriorating solution quality | Ineffective when the number of interacting variables increases to five or more; the probability of grouping more than two interacting variables into the same subcomponent tends to zero regardless of the co-evolutionary cycles |
Delta grouping (DECC-D) [ 41] | Delta values | Computes the amount of change (delta value) in each decision variable in every cycle to identify interacting variables | Performs poorly when the objective function has more than one non-separable subcomponent |
HDMR [ 42] | Metamodel-based decomposition | Applies the first-order RBF-HDMR model to extract interactions between decision variables | Computationally expensive; requires validation on real-world problems |
MLSoft [ 43] | Value function; softmax selection | Modifies MLCC using reinforcement learning | Not suitable for problems of a non-stationary nature |
DG [ 44] | Automatic decomposition strategy | Groups interacting variables before optimization and fixes the grouping during optimization | Cannot detect overlapping functions; slow to check all interactions; requires a threshold parameter; sensitive to the choice of the threshold parameter |
XDG [ 45] | Identifies direct interactions first and then checks indirect interactions separately | Combines subcomponents sharing the same decision variables when an overlap is identified; identifies indirect interactions between decision variables | Does not model the decomposition problem formally; inherits the sensitivity issue of DG; high computational cost of the interaction identification process |
gDG [ 15] | Decision variables as vertices, interactions between variables as edges; a graph is constructed to model the problem | Identifies all separable variables and allocates them to the same group | Grouping accuracy depends on a threshold parameter |
GDG [ 46] | Adopts DG; maintains global information in terms of interaction, independence, and their combination | Detects variable dependencies and isolates them into the same groups more accurately | Not suitable for imbalanced functions because of a global parameter to identify all interactions; high computational cost of the interaction identification process; grouping results are not updated once the problem is decomposed |
FII [ 11] | Identifies interdependency information for separable and non-separable variables; non-separable variables are investigated further | Requires no complete interdependency information for non-separable variables because it avoids pairwise interdependency identification | The number of fitness evaluations is \(n^2\) for decomposing overlapping problems; identification accuracy slightly lower than XDG and GDG; limited performance on imbalanced problems |
DG2 [ 47] | Systematic generation of sample points to maximize point reuse; computational rounding errors used to estimate an appropriate threshold level | Halves the number of fitness evaluations for fully separable functions; automatic calculation of the threshold value helps with imbalanced functions | Neglects the topology information of the decision variables of large-scale problems |
HIDG [ 12] | Decision vectors used to investigate the interdependencies between vectors | Infers interactions for decision vectors without extra fitness evaluations | A complete integration of HIDG with a CC framework is yet to be explored |
RDG [ 10] | Interaction between decision variables based on non-linearity identification | Fitness evaluation takes only \({\mathcal {O}}(n\log {}(n))\) time | Requires a proper parameter setting for the threshold value to identify whether two subsets of variables interact |
\(\varepsilon \)-DG [ 8] | Checks whether delta results from computational error; DG matrix construction | Identifies both direct and indirect interactions by setting DGM elements to zero or non-zero values | Effectiveness not yet evaluated on real-world problems |
D-GDG [ 48] | Data matrix constructed from the general idea of the partial derivative of multivariate functions; fuzzy clustering | Grouping of variables is adaptively adjusted according to the algorithmic state throughout the optimization process | Effectiveness on large-scale black-box real-world optimization problems needs verification |
RDG2 [ 49] | Adaptive estimation of the threshold value | Round-off error bounds are adequate to distinguish between separable and non-separable variables of large-scale benchmark problems | Neglects the topology information of the decision variables of large-scale problems |
RDG3 [ 50] | Modified RDG; breaks the linkage at shared variables | Decomposes overlapping problems | Does not consider weak linkage breaking |
From the literature, it has been observed that problem decomposition techniques fall into two categories [11, 51]:
Static decomposition A problem is decomposed into subproblems before the evolutionary process of CC starts, and the decomposed subproblems remain fixed for the rest of the generations [11]. In the original CC implementation (CCGA), an \(n\)-dimensional problem was divided into \(n\) one-dimensional subproblems [34]. With this one-dimensional decomposition, CCGA performed better on separable problems than on non-separable problems (in CCGA, Rosenbrock was used as a non-separable problem). This revealed two major problems of static one-dimensional decomposition: first, the search capacity of CC may degrade drastically if interdependencies exist between the subcomponents; second, one-dimensional decomposition may waste computational resources during the evolutionary process. The splitting-in-half static decomposition method cannot handle non-separable problems properly either [38]. By exploring the interdependency information in the problem, a number of detection-based static decomposition techniques have been proposed to achieve near-optimal decompositions. Examples of such static decomposition methods include CCVIL [31], DG [44], XDG [45], GDG [46], RDG [10], D-GDG [48], RDG2 [49], and RDG3 [50]. These methods are also called automatic decomposition strategies, as they exploit variable interactions to group the variables [10].
Dynamic decomposition A problem is decomposed into multiple subproblems at the start; however, the subcomponents can be adapted during the evolutionary process, i.e., the decision variables can be regrouped into subcomponents in each cycle [11]. RG is a typical example of a dynamic decomposition technique [39]. In RG, at the start of each evolutionary cycle, two interacting decision variables have a non-zero probability of being grouped in the same subcomponent. However, as the number of interacting variables increases, RG becomes incapable of grouping all the interacting variables in the same group. Other approaches that improve the performance of dynamic decomposition are also available, including MLCC [40], DECC-ML [14], and delta grouping (DECC-D) [28]. The sketch below illustrates the distinction.
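The difference between the two categories can be summarized in a schematic CC loop; everything here is illustrative Python rather than an implementation from the literature:

```python
def cc_optimize(problem, decompose, optimize_group, cycles, dynamic=False):
    """Schematic CC loop contrasting the two categories. A static method
    decomposes once before evolution starts; a dynamic method (e.g.,
    random grouping) regroups the variables at the start of every cycle.
    `problem`, `decompose`, and `optimize_group` are illustrative stand-ins."""
    groups = decompose(problem)                    # static: fixed decomposition
    solution = problem.random_solution()
    for _ in range(cycles):
        if dynamic:
            groups = decompose(problem)            # dynamic: regroup each cycle
        for group in groups:                       # round-robin over subproblems
            solution = optimize_group(problem, solution, group)
    return solution
```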
The preceding extensive review of problem decomposition approaches using CC shows that problem decomposition techniques have been widely used, improved, and evaluated on a range of benchmark functions. Apart from these strategies, a number of decomposition methods have also been studied in the literature with CC for real-world problems. For example, a random-based dynamic grouping method was proposed to group variables for multi-objective optimization problems, using a decomposer pool similar to MLCC [40]; in this method, the decomposer pool consists of several groups of varying sizes, and the size is determined by the C-metric based on historical performance [52]. A weighted random (WR) grouping strategy was proposed for the large-scale crossing waypoints location problem (a fully non-separable and non-differentiable problem) in air route networks, where the problem space is altered by extreme weather conditions, breakdowns, and delayed aeroplanes [53]. A Markov random field-based decomposition method was proposed to solve the stochastic flexible job shop scheduling problem, minimizing the expectation and variance of the makespan [54].
Because existing work on FS using CC is limited, problem decomposition techniques using CC for FS are also limited in the literature. A quantum-inspired self-adaptive CC incorporated into a shuffled frog leaping algorithm was proposed by Ding and Wang in 2013 to reduce the number of attributes [55]. In this framework, the attribute sets are divided into subsets using a self-adaptive technique that selects a decomposer from the decomposer pool based on its historical performance record; the approach was evaluated on global optimization functions and UCI datasets. Recently, in 2019, Wang et al. [29] proposed a two-stage decomposition approach (CCFS/TD) based on CC for FS on high-dimensional classification problems. In the first stage, CCFS/TD decomposes the evolutionary cycle into \(m\) levels and performs optimization; in the second stage, the evolutionary cycle of each level is further decomposed into a number of independent processes using a majority voting technique. The effectiveness of CCFS/TD has been validated by experiments on 10 benchmark datasets. A study by Sun et al. in 2015 [9] on the selection of decomposition methods for large-scale fully non-separable problems experimented with RG, delta grouping, DG, and XDG, and observed that RG is the most effective decomposition method for fully non-separable problems. Because FS problems are fully non-separable, and motivated by the findings of [9], the next section of this paper proposes a random-based decomposition method, called random feature grouping (RFG), with three variants.