1 Introduction
Evolutionary algorithms are a widely used type of stochastic optimization that mimics biological evolution in nature. Like any other metaheuristic optimization algorithm (Brown et al. 2005; Conti et al. 2018), they need to maintain a balance on the exploration/exploitation trade-off in their search process: high exploration bears the risk of not optimizing the intermediate solutions to the fullest; high exploitation bears the risk of missing the global optimum and getting stuck in a sub-optimal part of the search space. Analogous to biological evolution, diversity within the population of solution candidates has been identified as a central feature for adjusting the exploration/exploitation trade-off. Many means of maintaining the diversity of the population throughout the process of evolution have been developed in the literature; comprehensive overviews are provided by Squillero and Tonda (2016) and Gabor et al. (2018), for example.
For problems with complex fitness landscapes, it is well known that increased exploration (via increased diversity) yields better overall results in the optimization, even when disregarding any diversity goal in the final evaluation (Ursem 2002; Toffolo and Benini 2003). However, this gives rise to a curious phenomenon: by augmenting the fitness function and thus making it match the original objective function less, we actually obtain results that optimize the original objective function more. This implies that an evolutionary algorithm does not immediately optimize for the fitness function it uses, but instead optimizes for a slightly different implicit goal. Furthermore, to really optimize for a given objective function, one should ideally use a (slightly) different fitness function for evolution. In this paper, we introduce final productive fitness as a theoretical approach to derive the ideal fitness function from a given objective function.
We see that final productive fitness cannot feasibly be computed in advance. However, we show how to approximate it a posteriori, i.e., when the optimization process is already finished. We show that the notion of final productive fitness is sound by applying it to the special case of diversity-aware evolutionary algorithms, which (for our purposes) are algorithms that directly encode a drive for increased diversity by altering the fitness of the individuals. By running these on various benchmark problems, we empirically show that diversity-aware evolutionary processes might just approximate final productive fitness more accurately than an evolutionary process using only the original objective. We show that the fitness alteration performed by these algorithms, when it improves overall performance, does so while (perhaps because) it better approximates final productive fitness. We thus argue that the notion of final productive fitness for the first time provides a model of how diversity is beneficial to evolutionary optimization, which has been called for by various works in the literature:
- “One of the urgent steps for future research work is to better understand the influence of diversity for achieving good balance between exploration and exploitation.” (Črepinšek et al. 2013),
- “This tendency to discover both quality and diversity at the same time differs from many of the conventional algorithms of machine learning, and also thereby suggests a different foundation for inferring the approach of greatest potential for evolutionary algorithms.” (Pugh et al. 2016),
- “However, the fragmentation of the field and the difference in terminology led to a general dispersion of this important corpus of knowledge in many small, hard-to-track research lines” and, “[w]hile diversity preservation is essential, the main challenge for scholars is devising general methodologies that could be applied seamlessly [...]” (Squillero and Tonda 2016).
It should be noted that the approach presented in this paper merely provides a new perspective on exploration/exploitation in evolutionary algorithms and a new method of analyzing the effects of diversity. It is up to future works to derive new means to actively promote diversity from this analysis.
In this paper, we provide a short mathematical description of evolutionary processes in Sect. 2 and build our notion of (final) productive fitness on top of that in Sect. 3. Section 4 describes the empirical results and Sect. 5 discusses related work before Sect. 6 concludes.
2 Foundations
For this paper, we assume an evolutionary process (EP) to be defined as follows: Given a fitness function \(f : \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) for an arbitrary set \(\mathcal {X}\) called the search space, we want to find an individual \(x \in \mathcal {X}\) with the best fitness f(x). For a maximization problem, the best fitness is that of an individual x so that \(f(x) \ge f(x') \;\forall x' \in \mathcal {X}\). For a minimization problem, the best fitness is that of an individual x so that \(f(x) \le f(x') \;\forall x' \in \mathcal {X}\). Note that we normalize our fitness space on \([0; 1] \subset \mathbb {R}\) for all problems for ease of comparison. Whenever the maximum and minimum fitness are bounded, this can be done without loss of generality.
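For bounded fitness values, this normalization is a simple affine rescaling. The following is a minimal Python sketch; the helper name and the assumption that the bounds are known in advance are our own illustration:

```python
def normalize(raw_fitness: float, f_min: float, f_max: float) -> float:
    """Affinely rescale a bounded raw fitness value into [0, 1]."""
    assert f_max > f_min, "normalization needs non-degenerate bounds"
    return (raw_fitness - f_min) / (f_max - f_min)
```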
Usually, the search space \(\mathcal {X}\) is too large or too complicated to guarantee that we can find the exact best individual(s) using standard computing models (and physically realistic time). Thus, we take discrete subsets of the search space \(\mathcal {X}\) via sampling and iteratively improve their fitness. An evolutionary process \(\mathcal {E}\) over g generations, \(g \in \mathbb {N}\), is defined as \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\). \(\mathcal {X}\) is the search space. \(e : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) is the evolutionary step function so that \(X_{i+1} = e(X_i) \;\forall i \ge 0\). As defined above, \(f: \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) is the fitness function. \((X_i)_{i < g}\) is a series of populations so that \(X_i \subseteq \mathcal {X} \;\forall i\) and \(X_0\) is the initial population. Note that as the evolutionary step function e is usually non-deterministic, we define \(E(X) = \{X' | X' = e(X)\}\) to be the set of all possible next populations.
We use the following evolutionary operators:

- The recombination operator \(\textit{rec} : \mathcal {X} \times \mathcal {X} \rightarrow \mathcal {X}\) generates a new individual from two given individuals.
- The mutation operator \(\textit{mut} : \mathcal {X} \rightarrow \mathcal {X}\) slightly alters a given individual to return a new one.
- The migration operator \(\textit{mig}\) takes no arguments and generates a random individual \(\textit{mig}() \in \mathcal {X}\).
- The (survivors) selection operator \(\textit{sel} : 2^\mathcal {X} \times \mathbb {N} \rightarrow 2^\mathcal {X}\) returns a new population \(X' = \textit{sel}(X, n)\) given a population \(X \subseteq \mathcal {X}\), so that \(|X'| \le n\).
The operators \(\textit{rec}, \textit{mut}, \textit{mig}\) can be applied to a population X by choosing individuals from X to fill their parameters (if any) according to some selection scheme \(\sigma : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) and adding their return value to the population. For example, we write \(\textit{mut}_\sigma (X) = X \cup \{ \; \textit{mut}(x') \; | \; x'\in \sigma (X) \; \}\). Note that in this formulation all children are added to the population and do not replace their parents.
For any evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) and selection schemes \(\sigma _1, \sigma _2, \sigma _3\) we assume that

$$\begin{aligned} X_{i+1} = e(X_i) = \textit{sel}\,(\textit{mig}_{\sigma _3}(\textit{mut}_{\sigma _2}(\textit{rec}_{\sigma _1}(X_i))), |X_i|). \end{aligned}$$

(1)
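A minimal Python sketch of one evolutionary step as composed in Eq. (1); the uniform selection schemes, the truncation survivors selection, and all rates are illustrative assumptions, not the operators used in the paper's experiments:

```python
import random

def step(population, fitness, rec, mut, mig, n_parents=10, n_migrants=2):
    """One evolutionary step as in Eq. (1): recombination, mutation,
    migration, then survivors selection back to the original size.
    Children are added to the population and do not replace their parents."""
    n = len(population)
    X = list(population)
    # rec_sigma1: recombine pairs drawn by a selection scheme (here: uniform)
    parents = random.sample(X, min(n_parents, len(X)))
    X += [rec(a, b) for a, b in zip(parents[0::2], parents[1::2])]
    # mut_sigma2: mutate individuals drawn from the enlarged population
    X += [mut(x) for x in random.sample(X, min(n_parents, len(X)))]
    # mig_sigma3: inject freshly generated random individuals
    X += [mig() for _ in range(n_migrants)]
    # sel: keep the n best individuals (maximization)
    return sorted(X, key=fitness, reverse=True)[:n]
```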
Usually, we assume that an evolutionary process fulfills its purpose if the best fitness of the population tends to improve over time, i.e., given a sufficiently large number of generations \(k \in \mathbb {N}\), it holds for maximization problems that \(\max _{x \in X_i} f(x) < \max _{x \in X_{i+k}} f(x)\). We define the overall result of an evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) with respect to a fitness function \(\phi\) (which may or may not be different from the fitness f used during evolution) to be the best value found and kept in evolution, i.e., for a maximizing objective \(\phi\) we define

$$\begin{aligned} |\mathcal {E}|_\phi = \max _{x \in X_g} \phi (x).\end{aligned}$$

(2)
Note that there are evolutionary processes which include a hall-of-fame mechanism, i.e., which are able to return the result fitness

$$\begin{aligned} ||\mathcal {E}||_\phi = \max _{i = 1, ..., g} \;\; \max _{x \in X_i} \;\;\phi (x).\end{aligned}$$

(3)
However, we can derive the equality \(|\mathcal {E}|_\phi = ||\mathcal {E}||_\phi\) when we assume elitism with respect to \(\phi\), i.e., \({{\,\mathrm{arg\,max}\,}}_{x \in X_i} \phi (x) \in X_{i+1}\) for all \(i=1,...,g\). Since it makes reasoning easier and hardly comes with any drawbacks for sufficiently large populations, we use elitist evolutionary processes (with respect to f) from here on.
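The two result notions of Eqs. (2) and (3) differ only in whether intermediate generations count. A small sketch, assuming the series of populations has been recorded as a list:

```python
def result_last(populations, phi):
    """|E|_phi as in Eq. (2): best phi value in the final generation."""
    return max(phi(x) for x in populations[-1])

def result_hall_of_fame(populations, phi):
    """||E||_phi as in Eq. (3): best phi value across all generations."""
    return max(phi(x) for X in populations for x in X)
```

Under elitism with respect to \(\phi\), the \(\phi\)-best individual of every generation survives into all later generations, so both functions return the same value.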
3 Approach
The central observation we build our analysis on is that in many cases the results of optimizing for a given objective function (called \({\textsc {of}}\)) can be improved by not using \({\textsc {of}}\) directly as the fitness function f of the evolutionary process. Consequently, changing the fitness function f away from the true objective \({\textsc {of}}\) in some cases leads to better results with respect to the original objective function \({\textsc {of}}\). Note that this phenomenon extends beyond heuristic optimization and is known as reward shaping in reinforcement learning, for example (Ng et al. 1999).
In evolutionary algorithms, a property called diversity is oftentimes considered in addition to the objective function \({\textsc {of}}\) to improve the progress of the evolutionary process (Gabor et al. 2018; Squillero and Tonda 2016; Ursem 2002). In some way or other, diversity-enhancing evolutionary algorithms reward individuals of the population for being different from other individuals in the population. While there are many ways to implement this behavior, like topology-based methods (Tomassini 2006), fitness sharing (Sareni and Krahenbuhl 1998), ensembling (Hart and Sim 2018), etc., we consider an instance of diversity-enhancing evolutionary algorithms that is simpler to analyze: by quantifying the distance of a single individual to the population, we can define a secondary fitness \({\textsc {sf}}\) that rewards high diversity in the individual. This approach was shown by Wineberg and Oppacher (2003) to be an adequate general representation of most well-known means of measuring diversity in a population.
In order to avoid the difficulties of multi-objective evolution, we can then define the augmented fitness function \({\textsc {af}}\) that incorporates both the objective fitness \({\textsc {of}}\) and the secondary fitness \({\textsc {sf}}\) into one fitness function to be used for the evolutionary process.
As is shown in Gabor et al. (2018) and Wineberg and Oppacher (2003), such a definition of the augmented fitness suffices to show the benefits of employing diversity.
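A hedged sketch of one possible instantiation: a mean-distance secondary fitness (in the spirit of the pairwise-distance measures of Wineberg and Oppacher 2003) folded into \({\textsc {af}}\) as a convex combination. The weight lam and the exact combination are our illustrative choices; the text above only requires that \({\textsc {af}}\) incorporates both \({\textsc {of}}\) and \({\textsc {sf}}\):

```python
def sf(x, population, distance):
    """Secondary fitness: mean distance of x to the rest of the population."""
    others = [y for y in population if y is not x]
    if not others:
        return 0.0
    return sum(distance(x, y) for y in others) / len(others)

def af(x, population, of, distance, lam=0.8):
    """Augmented fitness: convex combination of objective and diversity,
    assuming of and sf are both normalized to [0, 1]."""
    return lam * of(x) + (1.0 - lam) * sf(x, population, distance)
```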
We can then define two evolutionary processes \(\mathcal {E}_{{\textsc {of}}} = \langle \mathcal {X}, e, {\textsc {of}}, (X_i)_{i < g} \rangle\) and \(\mathcal {E}_{{\textsc {af}}} = \langle \mathcal {X}, e, {\textsc {af}}, (X'_i)_{i < g} \rangle\). We observe the curious phenomenon that in many cases the augmented fitness \({\textsc {af}}\) better optimizes for \({\textsc {of}}\) than \({\textsc {of}}\) itself, formally

$$\begin{aligned} |\mathcal {E}_{{\textsc {of}}}|_{{\textsc {of}}} < |\mathcal {E}_{{\textsc {af}}}|_{{\textsc {of}}},\end{aligned}$$

(5)
which raises the following question: if \({\textsc {of}}\) is not the ideal fitness function to optimize for the objective \({\textsc {of}}\), what is?
Given a sequence of populations \((X_i)_{i < g}\) spanning multiple generations \(i=1, ..., g\), we can write down inductively, starting from the last generation g, what we actually want our population to be like: the net benefit of \(X_g\) to our (maximizing) optimization process is exactly

$$\begin{aligned} |\mathcal {E}|_{{\textsc {of}}} = \max _{x \in X_g} {\textsc {of}}(x)\end{aligned}$$

(6)

as this population will not evolve any further and thus the best individual within \(X_g\) is what we are going to be stuck with as the result of the optimization process.
Note that the individuals of \(X_{g-1}\) already contribute differently to the result of the optimization process: from the perspective of generation \(g-1\), the overall optimization result is

$$\begin{aligned} \max _{x \in X_{g-1}} \;\; \max _{x' \in X_g(x)} {\textsc {of}}(x')\end{aligned}$$

(7)

where the follow-up generation \(X_g(x)\) is any population from \(\{ X_g \; | \; X_g \in E(X_{g-1}) \wedge x \in X_g \}\), i.e., any of the possible next populations in which x survived.
Intuitively, the contribution of the second-to-last generation \(X_{g-1}\) to the result of the optimization process stems from the objective fitness \({\textsc {of}}\) that this generation’s individuals can still achieve in the final generation \(X_g\). Generally, this does not fully coincide with the application of the objective function \({\textsc {of}}\) in said generation:

$$\begin{aligned} \max _{x \in X_{g-1}} \; \; \max _{x' \in X_g(x)} {\textsc {of}}(x') \;\;\; \ne \;\;\; \max _{x \in X_{g-1}} {\textsc {of}}(x) \end{aligned}$$

(8)
That means: while rating individuals according to their objective fitness \({\textsc {of}}\) in the last generation of the evolutionary process is adequate, the actual benefit of an individual x to the optimization result and the value of \({\textsc {of}}(x)\) may diverge more the earlier we are in the evolutionary process. Accordingly, at the beginning of an evolutionary process, the objective fitness \({\textsc {of}}\) might not be a good estimate of how much the individuals will contribute to the process’s return with respect to \({\textsc {of}}\) at the end of the optimization process. Still, standard optimization techniques often use the objective fitness \({\textsc {of}}\) as a (sole) guideline for the optimization process.
Instead, we ideally want to make every decision (mutation, recombination, survival, ...) at every generation \(X_i\) with the ideal result for the following generations \(X_{i+1}, X_{i+2}, ...\) and ultimately the final generation \(X_g\) in mind. We call this the optimal evolutionary process. Obviously, to make the optimal decision early on, we would need to simulate all the way to the end of the evolution, including all the follow-up decisions. This renders optimal evolution infeasible as an algorithm. However, we can use it for a posteriori analysis of what has happened within a different evolutionary process. In order to do so, we need to give a fitness function for the optimal process (as it obviously should not be \({\textsc {of}}\)).
Instead, we formalize the benefit to the optimization process discussed above and thus introduce the notion of productive fitness. But first, we need a simple definition of the inter-generational relationship between individuals.
We can now use this relationship to assign, a posteriori, the benefit that a single individual has had for the evolution. For this, we simply average the fitness of all its surviving descendants.
We argue that the productive fitness \({\textsc {pf}}\) is better able to describe the actual benefit an individual brings to the optimization process, as represented by what parts of the individual still remain inside the population a few generations later. Note that our notion of productive fitness is rather harsh in two points:

- We only take the average of all descendants’ fitness. One could argue for a more optimistic approach that rewards an individual for the best offspring it has given rise to. However, we argue that every bad individual binds additional resources for eliminating it down the road and should thus actively be discouraged.
- When the line of an individual dies out completely, we assign the worst possible fitness. Arguments could be made that even dead lines contribute to the search process by ruling out unpromising areas while, e.g., increasing the diversity scores of individuals in more promising areas of the search space. Still, we count even the most distant descendants, so any however small contribution to the final population suffices to avoid the penalty \(w\).

We leave the analysis of the effects of the discussed parameters to future work. Note that, for now, our notion of productive fitness only covers a fixed horizon into the future. We can trivially extend this definition to respect the final generation no matter what generation the current individual is from; we call the result the final productive fitness \({\textsc {fpf}}\).
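The averaging described above can be written, under our own notational assumptions (with \(D_x^{(k)}\) denoting the surviving descendants of x after k further generations and \(w\) the worst possible fitness), as

$$\begin{aligned} {\textsc {pf}}_k(x) = \begin{cases} \frac{1}{|D_x^{(k)}|} \sum _{x' \in D_x^{(k)}} {\textsc {of}}(x') &{} \text {if } D_x^{(k)} \ne \emptyset ,\\ w &{} \text {otherwise,} \end{cases} \end{aligned}$$

where the final productive fitness \({\textsc {fpf}}\) evaluates the descendants in the final generation \(X_g\) instead of at a fixed horizon.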
We argue that final productive fitness is able to describe what the fitness function of an optimal evolutionary process looks like: Every evaluation is done in regard to the contribution to the final generation, i.e., the ultimate solution returned by the search process.
We sketch a short argument in favor of Thesis 1; for a more in-depth discussion, see Gabor and Linnhoff-Popien (2020). Let \(\mathcal {E}_{\textsc {fpf}}= \langle \mathcal {X}, e, {\textsc {fpf}}, (X_i^{{\textsc {fpf}}})_{i < g} \rangle\) be an evolutionary process using final productive fitness \({\textsc {fpf}}\). Let \(\mathcal {E}_{\textsc {idf}}= \langle \mathcal {X}, e, {\textsc {idf}}, (X_i^{\textsc {idf}})_{i < g} \rangle\) be an evolutionary process using a different (possibly more ideal) fitness \({\textsc {idf}}\). Let \(X_0^{{\textsc {fpf}}} = X_0^{\textsc {idf}}\). Towards a contradiction, we assume that

$$\begin{aligned} \max _{x \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(x) < \max _{x \in X_g^{{\textsc {idf}}}} {\textsc {of}}(x).\end{aligned}$$

(10)
Since Eq. 10 implies that at least \(X_g^{{\textsc {fpf}}} \ne X_g^{{\textsc {idf}}}\), there is an individual \(x \in X_g^{{\textsc {idf}}}\) so that \(x \notin X_g^{{\textsc {fpf}}}\) and \({\textsc {of}}(x) > \max _{y \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(y)\). Since both \(\mathcal {E}_{\textsc {fpf}}\) and \(\mathcal {E}_{\textsc {idf}}\) use the same evolutionary step function e except for the used fitness, their difference regarding x needs to stem from the fact that there exists an individual \(x'\) that is an ancestor of x, i.e., \(x \in D_{x'}\), so that \(x'\) was selected for survival in \(\mathcal {E}_{\textsc {idf}}\) but not in \(\mathcal {E}_{\textsc {fpf}}\), which implies that \({\textsc {fpf}}(x') < {\textsc {idf}}(x')\). However, since x is a possible descendant of \(x'\), the computation of \({\textsc {fpf}}(x')\) should have taken \({\textsc {of}}(x)\) into account, meaning that \(x'\) should have survived in \(\mathcal {E}_{\textsc {fpf}}\) after all, which contradicts the previous assumption.

\(\square\)
Of course, Thesis 1 is a purely theoretical argument, as we cannot guarantee optimal choices in the usually randomized evolutionary operators; productive fitness in general thus comes with the disadvantage that it cannot be fully computed in advance. But for a given, completed run of an evolutionary process, we can compute a posteriori the factual \({\textsc {fpf}}\) that single individuals had. There, we still do not make optimal random choices but simply take the ones that were made as given.
Still, we take Thesis 1 as a hint that final productive fitness might be the right target to strive for. We argue that augmenting the objective fitness \({\textsc {of}}\) (even with easily computable secondary fitness functions) may result in a fitness function which better approximates the final productive fitness \({\textsc {fpf}}\). In the following Sect. 4, we show empirically that (in the instances where it helps) a diversity-based secondary fitness \({\textsc {sf}}\) resembles the final productive fitness \({\textsc {fpf}}\) of individuals much better than the raw objective function \({\textsc {of}}\) does.
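As a concrete way to quantify such resemblance a posteriori, one can compare rank correlations with the approximated \({\textsc {fpf}}\) values. This is an illustrative check, not necessarily the paper's exact evaluation protocol; the choice of Spearman correlation is our assumption:

```python
from scipy.stats import spearmanr

def resembles_fpf_better(individuals, fpf, sf, of):
    """Check whether sf ranks individuals more like fpf than raw of does
    (an illustrative test of the claim above on one finished run)."""
    fpf_vals = [fpf(x) for x in individuals]
    rho_sf, _ = spearmanr([sf(x) for x in individuals], fpf_vals)
    rho_of, _ = spearmanr([of(x) for x in individuals], fpf_vals)
    return rho_sf > rho_of
```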
This connection not only explains why diversity-aware fitness functions fare better than the pure objective fitness but also poses a first step towards a description of how to deliberately construct diversity-aware fitness functions, knowing that their ideal purpose is to approximate the not fully computable final productive fitness. Again, we refer to Gabor and Linnhoff-Popien (2020) for more elaborate theoretical arguments.
Since we cannot evaluate all possible futures of an evolutionary process, we provide empirical evidence in favor of Thesis 2 using an a posteriori approximation: given an already finished evolutionary process, we compute the \({\textsc {fpf}}\) values using only those individuals that actually came into being during that single evolutionary process (instead of using all possible descendants). We argue that this approximation is valid because, if the evolutionary process was somewhat successful, then all individuals’ descendants should most likely be somewhat close to their ideal descendants. Note that the reverse property does not hold (i.e., even in a bad run, individuals still aim to generate better descendants, not worse), which is why our approximation does not permit any statements about augmented fitness that does not aid the evolutionary process.
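A sketch of this a posteriori approximation over one recorded run: we walk the genealogy from each individual of the final generation up through its ancestors and average \({\textsc {of}}\) over the collected final-generation descendants. The parent bookkeeping (a mapping from each individual's id to its actual parents) and the penalty value w are our assumptions standing in for the formal definitions:

```python
def approximate_fpf(parents, final_population, of, w=0.0):
    """Approximate fpf from a single finished run, using only individuals
    that actually came into being (not all possible descendants)."""
    # For every ancestor, collect its descendants in the final generation.
    final_descendants = {}
    for x in final_population:
        stack, seen = [x], set()
        while stack:
            y = stack.pop()
            if id(y) in seen:
                continue
            seen.add(id(y))
            final_descendants.setdefault(id(y), []).append(x)
            stack.extend(parents.get(id(y), ()))

    def fpf(individual):
        ds = final_descendants.get(id(individual), [])
        if not ds:
            return w  # the individual's line died out: worst possible fitness
        return sum(of(d) for d in ds) / len(ds)

    return fpf
```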
5 Related work
Diversity has been a central topic of research in evolutionary algorithms (den Heijer and Eiben 2012; Morrison and De Jong 2001; Toffolo and Benini 2003; Ursem 2002). Its positive effect on the evolutionary process has often been observed there, but it has rarely been interpreted beyond a biological metaphor, i.e., “diversity is a key element of the biological theory of natural selection and maintaining high diversity is supposed to be generally beneficial” (Corno et al. 2005).
Without much concept of what to look for in a mechanism for diversity-awareness, many variants have spawned in research. Instead of repeating them, we would like to point out a few resources for a comprehensive overview: Burke et al. (2004), among others like Brameier and Banzhaf (2002) or McPhee and Hopper (1999), discuss various means to measure and promote diversity in genetic programming, which for the most part should apply to all evolutionary algorithms. They also provide an extensive analysis of the connection between diversity and achieved fitness, but do not define productive fitness or a similar notion. A more recent comprehensive overview of means to describe and enable diversity has been put together by Squillero and Tonda (2016), also providing a taxonomy of various classes of approaches to diversity. Gabor et al. (2018) provide a quantitative analysis of various means of maintaining inheritance-based diversity on standard domains like the ones we used in this paper. Regarding the multitude of diversity mechanisms present in research, however, it is most important to also point out the results of Wineberg and Oppacher (2003), who most drastically show that “all [notions of diversity] are restatements or slight variants of the basic sum of the distances between all possible pairs of the elements in a system” and suggest that “experiments need not be done to distinguish between the various measures”, a point which we already built upon in our evaluation.
Note that a variety of “meta-measurements” for the analysis of evolutionary processes exist: effective fitness measures the minimum fitness required for an individual to increase in dominance at a given generation (when in competition with the other individuals) (Stephens 1999). It is related to reproductive fitness, which is the probability of an individual to successfully produce offspring (Hu and Banzhaf 2010). Both occur at the foundation of productive fitness, but do not include the (computationally overly expensive) diachronic analysis of the overall effect on the end result. Our approach is also comparable to entropy-based diversity preservation (Squillero and Tonda 2008), where the positive effect of certain individuals on the population’s entropy is measured and preserved in order to deliberately maintain higher entropy levels. By contrast, our approach is based on the fitness values only (without the need to look into the individuals beyond their genealogical relationships) and thus also cannot be used directly as a secondary goal in evolution but purely as a tool for a posteriori analysis of the effectiveness of other secondary goal definitions.
When we construct the “optimal evolutionary process”, we turn a traditionally static optimization problem into a dynamic one. It is interesting that specifically dynamic or on-line (Bredeche et al. 2009) evolutionary algorithms have been shown to benefit from increased diversity, especially when facing changes in their fitness functions (Gabor et al. 2018; Grefenstette 1992). While this is intuitive, as more options in the population allow for higher coverage of possible changes, the reverse connection (pointed to by this work) is not stated there, i.e., that diversity in static domains may work because even for static domains the optimization process is inherently dynamic to some degree.