Open Access | Published: 29 April 2021
Productive fitness in diversity-aware evolutionary algorithms
Published in: Natural Computing | Issue 3/2021
Abstract
In evolutionary algorithms, the notion of diversity has been adopted from biology and is used to describe the distribution of a population of solution candidates. While it is known that maintaining a reasonable amount of diversity often benefits the overall result of the evolutionary optimization process by adjusting the exploration/exploitation tradeoff, little is known about which amount of diversity is optimal. We introduce the notion of productive fitness based on the effect that a specific solution candidate has a number of generations down the evolutionary path. We derive the notion of final productive fitness, which is the ideal target fitness for any evolutionary process. Although it is inefficient to compute, we show empirically that it allows for an a posteriori analysis of how well a given evolutionary optimization process hit the ideal exploration/exploitation tradeoff, providing insight into why diversity-aware evolutionary optimization often performs better.
1 Introduction
Evolutionary algorithms are a widely used type of stochastic optimization that mimics biological evolution in nature. Like any other metaheuristic optimization algorithm (Brown et al. 2005; Conti et al. 2018), they need to maintain a balance on the exploration/exploitation tradeoff in their search process: High exploration bears the risk of missing out on optimizing the intermediate solutions to the fullest; high exploitation bears the risk of missing the global optimum and getting stuck in a suboptimal part of the search space. Analogous to biological evolution, diversity within the population of solution candidates has been identified as a central feature to adjust the exploration/exploitation tradeoff. Many means to maintain the diversity of the population throughout the process of evolution have been developed in the literature; comprehensive overviews are provided by Squillero and Tonda (2016) and Gabor et al. (2018), for example.
For problems with complex fitness landscapes, it is well known that increased exploration (via increased diversity) yields better overall results in the optimization, even when disregarding any diversity goal in the final evaluation (Ursem 2002; Toffolo and Benini 2003). However, this gives rise to a curious phenomenon: By augmenting the fitness function and thus making it match the original objective function less, we actually get results that optimize the original objective function more. This implies that an evolutionary algorithm does not immediately optimize for the fitness function it uses (but instead optimizes for a slightly different implicit goal). Furthermore, to really optimize for a given objective function, one should ideally use a (slightly) different fitness function for evolution. In this paper, we introduce final productive fitness as a theoretical approach to derive the ideal fitness function from a given objective function.
We see that final productive fitness cannot feasibly be computed in advance. However, we show how to approximate it a posteriori, i.e., when the optimization process has already finished. We show that the notion of final productive fitness is sound by applying it to the special case of diversity-aware evolutionary algorithms, which (for our purposes) are algorithms that directly encode a striving for increased diversity by altering the fitness of the individuals. By running these on various benchmark problems, we empirically show that diversity-aware evolutionary processes may simply approximate final productive fitness more accurately than an evolutionary process using just the original objective. We show that the fitness alteration performed by these algorithms, when it improves overall performance, does so while (perhaps because) it better approximates final productive fitness. We thus argue that the notion of final productive fitness for the first time provides a model of how diversity is beneficial to evolutionary optimization, which has been called for by various works in the literature:

“One of the urgent steps for future research work is to better understand the influence of diversity for achieving good balance between exploration and exploitation.” (Črepinšek et al. 2013),

“This tendency to discover both quality and diversity at the same time differs from many of the conventional algorithms of machine learning, and also thereby suggests a different foundation for inferring the approach of greatest potential for evolutionary algorithms.” (Pugh et al. 2016),

“However, the fragmentation of the field and the difference in terminology led to a general dispersion of this important corpus of knowledge in many small, hard-to-track research lines” and, “[w]hile diversity preservation is essential, the main challenge for scholars is devising general methodologies that could be applied seamlessly [...]” (Squillero and Tonda 2016).

It should be noted that the approach presented in this paper merely provides a new perspective on exploration/exploitation in evolutionary algorithms and a new method of analyzing the effects of diversity. It is up to future works to derive new means of actively promoting diversity from this analysis.
2 Foundations
For this paper, we assume an evolutionary process (EP) to be defined as follows: Given a fitness function \(f : \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) for an arbitrary set \(\mathcal {X}\) called the search space, we want to find an individual \(x \in \mathcal {X}\) with the best fitness f(x). For a maximization problem, the best fitness is that of an individual x so that \(f(x) \ge f(x') \;\forall x' \in \mathcal {X}\). For a minimization problem, the best fitness is that of an individual x so that \(f(x) \le f(x') \;\forall x' \in \mathcal {X}\). Note that we normalize our fitness space on \([0; 1] \subset \mathbb {R}\) for all problems for ease of comparison. Whenever the maximum and minimum fitness are bounded, this can be done without loss of generality.
Usually, the search space \(\mathcal {X}\) is too large or too complicated to guarantee that we can find the exact best individual(s) using standard computing models (and physically realistic time). Thus, we take discrete subsets of the search space \(\mathcal {X}\) via sampling and iteratively improve their fitness. An evolutionary process \(\mathcal {E}\) over g generations, \(g \in \mathbb {N}\), is defined as \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\). \(\mathcal {X}\) is the search space. \(e : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) is the evolutionary step function so that \(X_{i+1} = e(X_i) \;\forall i \ge 0\). As defined above, \(f: \mathcal {X} \rightarrow [0; 1] \subset \mathbb {R}\) is the fitness function. \((X_i)_{i < g}\) is a series of populations so that \(X_i \subseteq \mathcal {X} \;\forall i\) and \(X_0\) is the initial population. Note that as the evolutionary step function e is usually nondeterministic, we define \(E(X) = \{X' \mid X' = e(X)\}\) to be the set of all possible next populations.
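As a minimal illustrative sketch (our own naming and simplifications, not taken from the paper's published code), the formal tuple \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) can be read as a simple loop that repeatedly applies the step function and reports the best final fitness:

```python
def evolutionary_process(x0, e, f, g):
    """Run an evolutionary process of g generations: start from the
    initial population x0, repeatedly apply the evolutionary step
    function e, and report the best fitness f in the final generation
    (maximization) along with the full series of populations."""
    populations = [x0]
    for _ in range(g - 1):
        populations.append(e(populations[-1]))
    best = max(f(x) for x in populations[-1])
    return best, populations
```

Here `e` encapsulates all nondeterministic operator applications, so calling it twice on the same population may yield different members of \(E(X)\).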
We use the following evolutionary operators:

The recombination operator \(\textit{rec} : \mathcal {X} \times \mathcal {X} \rightarrow \mathcal {X}\) generates a new individual from two individuals.

The mutation operator \(\textit{mut} : \mathcal {X} \rightarrow \mathcal {X}\) slightly alters a given individual to return a new one.

The migration operator \(\textit{mig} : \mathcal {X}\) generates a random individual \(\textit{mig}() \in \mathcal {X}\).

The (survivor) selection operator \(\textit{sel} : 2^\mathcal {X} \times \mathbb {N} \rightarrow 2^\mathcal {X}\) returns a new population \(X' = \textit{sel}(X, n)\) given a population \(X \subseteq \mathcal {X}\), so that \(|X'| \le n\).

The operators \(\textit{rec}, \textit{mut}, \textit{mig}\) can be applied to a population X by choosing individuals from X to fill their parameters (if any) according to some selection scheme \(\sigma : 2^\mathcal {X} \rightarrow 2^\mathcal {X}\) and adding their return value to the population. For example, we write \(\textit{mut}_\sigma (X) = X \cup \{ \; \textit{mut}(x') \;\mid\; x'\in \sigma (X) \; \}\). Note that in this formulation, all children are added to the population and do not replace their parents.
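One possible instantiation of the four operators for real-vector individuals could look as follows. This is a hedged sketch under our own assumptions (uniform crossover, single-dimension perturbation, rank-based cutoff); the paper does not prescribe these exact implementations:

```python
import random

def rec(x, y):
    # uniform crossover: each dimension comes from one of the two parents
    return [a if random.random() < 0.5 else b for a, b in zip(x, y)]

def mut(x, step=0.1):
    # perturb one randomly chosen dimension by a (possibly negative) value
    x2 = list(x)
    i = random.randrange(len(x2))
    x2[i] += random.uniform(-step, step)
    return x2

def mig(n):
    # migration: a fresh random individual of dimension n
    return [random.random() for _ in range(n)]

def sel(pop, n, f):
    # rank-based cutoff: keep the n best individuals under fitness f
    # (maximization); |result| <= n as required
    return sorted(pop, key=f, reverse=True)[:n]
```

Applying an operator to a population then means selecting parents via some scheme σ and adding the children, e.g. `pop + [mut(x) for x in sigma(pop)]`.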
For any evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) and selection schemes \(\sigma _1, \sigma _2, \sigma _3\) we assume that

$$\begin{aligned} X_{i+1} = e(X_i) = \textit{sel}\,(\textit{mig}_{\sigma _3}(\textit{mut}_{\sigma _2}(\textit{rec}_{\sigma _1}(X_i))), X_i). \end{aligned}$$
(1)

Usually, we assume that an evolutionary process fulfills its purpose if the best fitness of the population tends to become better over time, i.e., given a sufficiently large number of generations \(k \in \mathbb {N}\), it holds for maximization problems that \(\max _{x \in X_i} f(x) < \max _{x \in X_{i+k}} f(x)\). We define the overall result of an evolutionary process \(\mathcal {E} = \langle \mathcal {X}, e, f, (X_i)_{i < g} \rangle\) with respect to a fitness function \(\phi\) (which may or may not be different from the fitness f used during evolution) to be the best value found and kept in evolution, i.e., for a maximizing objective \(\phi\) we define

$$\begin{aligned} \mathcal {E}_\phi = \max _{x \in X_g} \phi (x).\end{aligned}$$
(2)

Note that there are evolutionary processes which include a hall-of-fame mechanism, i.e., are able to return the result fitness

$$\begin{aligned} \hat{\mathcal {E}}_\phi = \max _{i = 1, ..., g} \;\; \max _{x \in X_i} \;\;\phi (x).\end{aligned}$$
(3)

However, we can derive the equality \(\mathcal {E}_\phi = \hat{\mathcal {E}}_\phi\) when we assume elitism with respect to \(\phi\), i.e., \({{\,\mathrm{arg\,max}\,}}_{x \in X_i} \phi (x) \in X_{i+1}\) for all \(i=1,...,g-1\). Since it makes reasoning easier and hardly comes with any drawback for sufficiently large populations, we use elitist evolutionary processes (with respect to f) from here on.
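The two result notions of Eqs. 2 and 3 can be sketched directly (a minimal illustration with our own function names, assuming the populations of all generations have been recorded):

```python
def result_final(populations, phi):
    # Eq. (2): best phi-value in the final generation only
    return max(phi(x) for x in populations[-1])

def result_hall_of_fame(populations, phi):
    # Eq. (3): best phi-value ever seen in any generation
    return max(phi(x) for pop in populations for x in pop)
```

Under elitism with respect to `phi`, the best-ever individual survives into the final generation, so both functions return the same value.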
3 Approach
The central observation we build our analysis on is that in many cases the results of optimizing for a given objective function (called \({\textsc {of}}\)) can be improved by not using \({\textsc {of}}\) as the fitness function f of the evolutionary process directly. Consequently, changing the fitness function f away from the true objective \({\textsc {of}}\) in some cases leads to better results with respect to the original objective function \({\textsc {of}}\). Note that this phenomenon extends beyond heuristic optimization and is known as reward shaping in reinforcement learning, for example (Ng et al. 1999).
In evolutionary algorithms, a property called diversity is oftentimes considered in addition to the objective function \({\textsc {of}}\) to improve the progress of the evolutionary process (Gabor et al. 2018; Squillero and Tonda 2016; Ursem 2002). In some way or another, diversity-enhancing evolutionary algorithms reward individuals of the population for being different from other individuals in the population. While there are many ways to implement this behavior, like topology-based methods (Tomassini 2006), fitness sharing (Sareni and Krahenbuhl 1998), ensembling (Hart and Sim 2018), etc., we consider an instance of diversity-enhancing evolutionary algorithms that is simpler to analyze: By quantifying the distance of a single individual to the population, we can define a secondary fitness \({\textsc {sf}}\) that rewards high diversity in the individual. This approach was shown by Wineberg and Oppacher (2003) to be an adequate general representation of most well-known means of measuring diversity in a population.
In order to avoid the difficulties of multiobjective evolution, we can then define the augmented fitness function \({\textsc {af}}\) that incorporates both the objective fitness \({\textsc {of}}\) and the secondary fitness \({\textsc {sf}}\) into one fitness function to be used for the evolutionary process.
Definition 1
(Augmented Fitness) Given the objective fitness \({\textsc {of}}\), a diversity-aware secondary fitness \({\textsc {sf}}\), and a diversity weight \(\lambda \in [0; 1] \subset \mathbb {R}\), we define the augmented fitness \({\textsc {af}}\) as
$$\begin{aligned} {\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x). \end{aligned}$$
(4)
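Definition 1 translates directly into code. The sketch below is our own illustration (the `population` argument is an assumption we add because typical secondary fitnesses, such as distance to the population, need it):

```python
def augmented_fitness(of, sf, lam):
    """Build AF = (1 - lam) * OF + lam * SF  (Eq. 4).
    `of` maps an individual to its objective fitness; `sf` maps an
    individual and the current population to its diversity score."""
    def af(x, population):
        return (1 - lam) * of(x) + lam * sf(x, population)
    return af
```

With `lam = 0` this degenerates to plain objective fitness; with `lam = 1` the evolution optimizes diversity only.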
As shown in Gabor et al. (2018) and Wineberg and Oppacher (2003), such a definition of the augmented fitness suffices to demonstrate the benefits of employing diversity.
We can then define two evolutionary processes \(\mathcal {E}_{{\textsc {of}}} = \langle \mathcal {X}, e, {\textsc {of}}, (X_i)_{i < g} \rangle\) and \(\mathcal {E}_{{\textsc {af}}} = \langle \mathcal {X}, e, {\textsc {af}}, (X'_i)_{i < g} \rangle\). We observe the curious phenomenon that in many cases the augmented fitness \({\textsc {af}}\) better optimizes for \({\textsc {of}}\) than using \({\textsc {of}}\) itself, formally

$$\begin{aligned} (\mathcal {E}_{{\textsc {of}}})_{{\textsc {of}}} < (\mathcal {E}_{{\textsc {af}}})_{{\textsc {of}}},\end{aligned}$$
(5)

which raises the following question: If \({\textsc {of}}\) is not the ideal fitness function to optimize for the objective \({\textsc {of}}\), what is?
Given a sequence of populations \((X_i)_{i < g}\) spanning multiple generations \(i=1, ..., g\), we can write down what we actually want our population to be like, inductively starting from the last generation g: The net benefit of \(X_g\) to our (maximizing) optimization process is exactly

$$\begin{aligned} \mathcal {E}_{{\textsc {of}}} = \max _{x \in X_g} {\textsc {of}}(x)\end{aligned}$$
(6)

as this population will not evolve any further and thus the best individual within \(X_g\) is what we are going to be stuck with as the result of the optimization process.
Note that the individuals of \(X_{g-1}\) already contribute differently to the result of the optimization process: From the perspective of generation \(g-1\), the overall optimization result is

$$\begin{aligned} \max _{x \in X_{g-1}} \;\; \max _{x' \in X_g(x)} {\textsc {of}}(x')\end{aligned}$$
(7)

where the follow-up generation \(X_g(x)\) is any^{1} population from \(\{ X_g \;\mid\; X_g \in E(X_{g-1}) \wedge x \in X_g \}\), i.e., the possible next populations where x survived.
Intuitively, the contribution of the second-to-last generation \(X_{g-1}\) to the result of the optimization process stems from the objective fitness \({\textsc {of}}\) that this generation's individuals can still achieve in the final generation \(X_g\). Generally, this does not fully coincide with the application of the objective function \({\textsc {of}}\) in said generation:

$$\begin{aligned} \max _{x \in X_{g-1}} \; \; \max _{x' \in X_g(x)} {\textsc {of}}(x') \;\;\; \ne \;\;\; \max _{x \in X_{g-1}} {\textsc {of}}(x) \end{aligned}$$
(8)

That means: While rating individuals according to their objective fitness \({\textsc {of}}\) in the last generation of the evolutionary process is adequate, the actual benefit of an individual x to the optimization result and the value of \({\textsc {of}}(x)\) may diverge more the earlier we are in the evolutionary process. Accordingly, at the beginning of an evolutionary process, the objective fitness \({\textsc {of}}\) might not be a good estimate of how much the individuals will contribute to the process's return with respect to \({\textsc {of}}\) at the end of the optimization process. Still, standard optimization techniques often use the objective fitness \({\textsc {of}}\) as the (sole) guideline for the optimization process.
Instead, we ideally want to make every decision (mutation, recombination, survival, ...) at every generation \(X_i\) with the ideal result for the following generations \(X_{i+1}, X_{i+2}, ...\) and ultimately the final generation \(X_g\) in mind. We call this the optimal evolutionary process. Obviously, to make the optimal decision early on, we would need to simulate all the way to the end of the evolution, including all the followup decisions. This renders optimal evolution infeasible as an algorithm. However, we can use it for a posteriori analysis of what has happened within a different evolutionary process. In order to do so, we need to give a fitness function for the optimal process (as it obviously should not be \({\textsc {of}}\)).
Instead, we formalize the benefit to the optimization process discussed above and thus introduce the notion of productive fitness. But first, we need a simple definition of the intergenerational relationships between individuals.
Definition 2
(Descendants) Given an individual x in the population of generation i, \(x \in X_i\), of an evolutionary process. All individuals \(x' \in X_{i+1}\) so that \(x'\) resulted from x via a mutation operator, i.e., \(x' = \text {mut}(x)\), or a recombination operator with any other parent, i.e., there exists \(y \in X_i\) so that \(x'= \text {rec}(x, y)\), are called direct descendants of x. Further, given a series of populations \((X_i)_{0< i < g}\), we define the set of all descendants \(D_x\) as the transitive hull on all direct descendants of x.
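The transitive hull of Definition 2 can be computed with a standard graph traversal, assuming the process records which children each individual produced. This is an illustrative sketch with our own data layout (a mapping from an individual's id to its direct descendants):

```python
def all_descendants(x, direct):
    """Compute D_x: the transitive hull over direct-descendant links.
    `direct` maps an individual (some hashable id) to the list of its
    direct descendants created by mutation or recombination."""
    seen, frontier = set(), [x]
    while frontier:
        cur = frontier.pop()
        for child in direct.get(cur, []):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen
```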
We can now use this relationship to assign the benefit that a single individual has had to the evolution a posteriori. For this, we simply average the fitness of all its surviving descendants.
Definition 3
(Productive Fitness) Given an individual x in the population of generation i, \(x \in X_i\), of an evolutionary process. Let \(D_x \subseteq \mathcal {X}\) be the set of all descendants of x. The productive fitness after n generations or n-productive fitness is the average objective fitness of x's descendants, written

$$\begin{aligned} {\textsc {pf}}_n(x) = {\left\{ \begin{array}{ll} {{\,\mathrm{avg}\,}}_{x' \in D_x \cap X_{i+n}} {\textsc {of}}(x') &{} \text { if } D_x \cap X_{i+n} \ne \emptyset \\ w &{} \text { otherwise.}\end{array}\right. } \end{aligned}$$
(9)

Note that in case the individual x has no descendants after n generations, we set its productive fitness \({\textsc {pf}}_n(x)\) to a worst-case value w, which in our case of bounded fitness values is 0 for maximizing and 1 for minimizing optimization processes.
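Eq. 9 can be sketched as follows for an a posteriori computation. All names and the data layout (recorded generations plus a precomputed descendant mapping, e.g. from a traversal of direct-descendant links) are our own illustrative assumptions:

```python
def productive_fitness(x, n, gen, generations, descendants, of, w=0.0):
    """n-productive fitness PF_n(x) (Eq. 9): the average objective
    fitness OF over those descendants of x that are alive in
    generation gen + n; the worst-case value w (0 for maximization,
    1 for minimization) if the line of x has died out by then."""
    target = set(generations[gen + n])
    alive = [d for d in descendants.get(x, set()) if d in target]
    if not alive:
        return w
    return sum(of(d) for d in alive) / len(alive)
```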
We argue that the productive fitness \({\textsc {pf}}\) better describes the actual benefit an individual brings to the optimization process, as represented by what parts of the individual still remain inside the population a few generations later. Note that our notion of productive fitness is rather harsh in two points:

We only take the average of all descendants' fitness. One could argue for a more optimistic approach that rewards the individual for the best offspring it has given rise to. However, we argue that every bad individual binds additional resources for eliminating it down the road and thus low target accuracy should actively be discouraged.

When the line of an individual dies out completely, we assign the worst possible fitness. Arguments could be made that even dead lines contribute to the search process by ruling out unpromising areas while, e.g., increasing the diversity scores of individuals in more promising areas of the search space. Still, we do count all however distant descendants, so even small contributions to the final population avoid the penalty w.

We leave the analysis of the effects of the discussed parameters to future work. Note that for now, our notion of productive fitness only covers a fixed horizon into the future. We can trivially extend this definition to respect the final generation no matter what generation the current individual is from:
Definition 4
(Final Productive Fitness) Given an individual x in the population of generation i, \(x \in X_i\), of an evolutionary process of g generations in total. The final productive fitness of x is the productive fitness of its descendants in the final generation, i.e., \({\textsc {fpf}}(x) = {\textsc {pf}}_{g-i}(x)\).
We argue that final productive fitness is able to describe what the fitness function of an optimal evolutionary process looks like: Every evaluation is done in regard to the contribution to the final generation, i.e., the ultimate solution returned by the search process.
Thesis 1
Assuming ideal outcomes of all randomized evolutionary operators, final productive fitness \({\textsc {fpf}}\) is the optimal fitness function for evolutionary processes, i.e., an evolutionary process yields the best results when it optimizes for \({\textsc {fpf}}\) at every generation.
We sketch a short argument in favor of Thesis 1. For a more in-depth discussion, see Gabor and Linnhoff-Popien (2020). Let \(\mathcal {E}_{\textsc {fpf}}= \langle \mathcal {X}, e, {\textsc {fpf}}, (X_i^{{\textsc {fpf}}})_{i < g} \rangle\) be an evolutionary process using final productive fitness \({\textsc {fpf}}\). Let \(\mathcal {E}_{\textsc {idf}}= \langle \mathcal {X}, e, {\textsc {idf}}, (X_i^{\textsc {idf}})_{i < g} \rangle\) be an evolutionary process using a different (possibly more ideal) fitness \({\textsc {idf}}\). Let \(X_0^{{\textsc {fpf}}} = X_0^{\textsc {idf}}\). We assume that

$$\begin{aligned} \max _{x \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(x) < \max _{x \in X_g^{{\textsc {idf}}}} {\textsc {of}}(x).\end{aligned}$$
(10)

Since Eq. 10 implies that at least \(X_g^{{\textsc {fpf}}} \ne X_g^{{\textsc {idf}}}\), there is an individual \(x \in X_g^{{\textsc {idf}}}\) so that \(x \notin X_g^{{\textsc {fpf}}}\) and \({\textsc {of}}(x) > \max _{y \in X_g^{{\textsc {fpf}}}} {\textsc {of}}(y)\). Since both \(\mathcal {E}_{\textsc {fpf}}\) and \(\mathcal {E}_{\textsc {idf}}\) use the same evolutionary step function e except for the fitness used, their difference regarding x needs to stem from the fact that there exists an individual \(x'\) that is an ancestor of x, i.e., \(x \in D_{x'}\), so that \(x'\) was selected for survival in \(\mathcal {E}_{\textsc {idf}}\) but not in \(\mathcal {E}_{\textsc {fpf}}\), which implies that \({\textsc {fpf}}(x') < {\textsc {idf}}(x')\). However, since x is a possible descendant of \(x'\), the computation of \({\textsc {fpf}}(x')\) should have taken \({\textsc {of}}(x)\) into account,^{2} meaning that \(x'\) should have survived in \(\mathcal {E}_{\textsc {fpf}}\) after all, which contradicts the previous assumption. \(\square\)
Of course, Thesis 1 is a purely theoretical argument, as we cannot guarantee optimal choices in the usually randomized evolutionary operators; productive fitness in general thus comes with the considerable disadvantage that it cannot be fully computed in advance. But for a given, completed run of an evolutionary process, we can compute a posteriori the factual \({\textsc {fpf}}\) that single individuals had. There, we still do not make optimal random choices but simply take the ones made as given.
Still, we take Thesis 1 as a hint that final productive fitness might be the right target to strive for. We argue that augmenting the objective fitness \({\textsc {of}}\) (even with easily computable secondary fitness functions) may result in a fitness function which better approximates final productive fitness \({\textsc {fpf}}\). In the following Sect. 4, we show empirically that (in the instances where it helps^{3}) diversity-based secondary fitness \({\textsc {sf}}\) resembles the final productive fitness \({\textsc {fpf}}\) of individuals much better than the raw objective function \({\textsc {of}}\) does.
Thesis 2
When a diversity-aware augmented fitness function \({\textsc {af}}\) aids the evolutionary optimization process with respect to an objective fitness \({\textsc {of}}\), it does so by approximating the final productive fitness \({\textsc {fpf}}\) of a converged evolutionary process in a more stable way (i.e., more closely when disregarding the respective scaling of the fitness functions) throughout the generations of the evolutionary process.
This connection not only explains why diversity-aware fitness functions fare better than the pure objective fitness but also poses a first step towards a description of how to deliberately construct diversity-aware fitness functions, knowing that their ideal purpose is to approximate the not fully computable final productive fitness. Again, we refer to Gabor and Linnhoff-Popien (2020) for more elaborate theoretical arguments.
Since we cannot estimate all possible futures for an evolutionary process, we provide empirical evidence in favor of Thesis 2 using an a posteriori approximation: Given an already finished evolutionary process, we compute the \({\textsc {fpf}}\) values given only those individuals that actually came into being during that single evolutionary process (instead of using all possible descendants). We argue that this approximation is valid because if the evolutionary process was somewhat successful, then all individuals' descendants should most likely be somewhat close to their ideal descendants.^{4} Note that the reverse property is not true (i.e., even in a bad run, individuals still aim to generate better descendants, not worse), which is why our approximation does not permit any statements about augmented fitness that does not aid the evolutionary process.
4 Experiments
For all experiments, we run an evolutionary process as defined in Sect. 2 with a mutation operator \(\textit{mut}\) that adds a (possibly negative) random value to one dimension of an individual, applied with rate 0.1 to all individuals at random. For \(\textit{rec}\) we apply random crossover with rate 0.3 for a single individual and a randomly chosen mate. We apply \(\textit{mig}\) with a rate of 0.1 (Gabor et al. 2018). Following Wineberg and Oppacher (2003) and the results in Gabor et al. (2018), we focus on a Manhattan distance function for the secondary fitness; we also plot evolutionary processes using fitness sharing with parameter \(\alpha = 2.0\) and dissimilarity threshold \(\sigma = n\), where n is the dimensionality of the problem (Sareni and Krahenbuhl 1998), or inherited fitness with inheritance weight \(\kappa = 0.5\) (Chen et al. 2002; Gabor et al. 2018) for comparison, since both approaches also use an adapted fitness function to promote diversity.^{5} The selection operator \(\textit{sel}\) is a simple rank-based cutoff in the shown evolutionary processes. Cutoff with protection for new individuals as well as roulette wheel selection were also tested without yielding noticeably different results.
All code that produced the results of this paper is available at github.com/thomasgabor/nacoevolib.
4.1 Pathfinding
We start with the pathfinding problem, which was shown to benefit greatly from employing diversity in the optimization process (Gabor et al. 2018): Given a room of dimensions \(1 \times 1\), we imagine a robot standing at position (0.5, 0.1). It needs to reach a target area at the opposite side of the room. See Fig. 1 for an illustration. The room also features a huge obstacle in the middle and thus the robot needs to decide on a way around it. The agent can move by performing an action \(a \in \{(\delta {x}, \delta {y}) \mid -0.33< \delta {x}< 0.33, -0.33< \delta {y} < 0.33\}\). A single solution candidate consists of \(n=5\) actions \(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le n}\). It achieves a reward of \(\frac{1}{n} = 0.2\) every time it stays within the target area between steps, i.e., its fitness is given via \({\textsc {of}}(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}) = r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, (0.5, 0.1))\) where

$$\begin{aligned}&r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, (x, y)) = \nonumber \\&r(\langle (\delta {x}_i, \delta {y}_i)\rangle _{2 \le i \le m}, (x + \delta {x}_1, y + \delta {y}_1)) + t(x,y) \end{aligned}$$
(11)

$$\begin{aligned}&\text { with } r(\langle \rangle , (x, y)) = t(x,y) \end{aligned}$$
(12)

$$\begin{aligned}&\quad \text { and } t(x,y) = {\left\{ \begin{array}{ll} \frac{1}{n} &{} \textit{ if } 0.4 \le x \le 0.6 \wedge 0.8 \le y \le 1.0\\ 0 &{} \textit{ otherwise.}\end{array}\right. } \end{aligned}$$
(13)

The pathfinding problem lends itself to the application of diversity, as the optimization process in most cases first strikes a local optimum where it reaches the target area at some point by accident (and most probably towards the end of its steps). It then needs to switch to the global optimum where the first three steps are as goal-directed as possible and the last two steps are very small in order to stay within the target area.
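The recursive reward of Eqs. 11-13 can be sketched directly; the function names `t`, `r`, and `of` mirror the symbols above, while the concrete encoding of an action sequence as a list of tuples is our own choice:

```python
def t(x, y, n=5):
    # per-step reward for being inside the target area (Eq. 13)
    return 1 / n if 0.4 <= x <= 0.6 and 0.8 <= y <= 1.0 else 0.0

def r(actions, pos, n=5):
    # accumulated reward of executing the remaining action
    # sequence from position pos (Eqs. 11 and 12)
    x, y = pos
    if not actions:
        return t(x, y, n)
    (dx, dy), rest = actions[0], actions[1:]
    return r(rest, (x + dx, y + dy), n) + t(x, y, n)

def of(actions):
    # objective fitness: total reward starting from (0.5, 0.1)
    return r(actions, (0.5, 0.1))
```

Note that the reward is collected at every visited position, including the final one, so a candidate that reaches the target early and then barely moves collects \(\frac{1}{n}\) multiple times.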
We now compare a standard evolutionary algorithm given only the objective function \({\textsc {of}}(x) = f(x)\) to a diversity-aware evolutionary algorithm using the Manhattan distance on the solution candidate structure as a secondary fitness function, i.e., \({\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x)\) as given in Definition 1,

$$\begin{aligned}&\text {where } {\textsc {sf}}(x) = \frac{1}{2n} \cdot {{\,\mathrm{avg}\,}}_{x' \in \sigma _4(X)} \textit{manhattan}(x, x') \end{aligned}$$
(14)

$$\begin{aligned}&\text { and } \textit{manhattan}(\langle (\delta {x}_i, \delta {y}_i)\rangle _{1 \le i \le m}, \nonumber \\&\quad \langle (\delta {x}_i', \delta {y}_i')\rangle _{1 \le i \le m}) =\nonumber \\&\quad \sum _{i = 1}^m |\delta {x}_i - \delta {x}_i'| + |\delta {y}_i - \delta {y}_i'|. \end{aligned}$$
(15)

Note that \(\sigma _4\) is a selection function that randomly selects 10 individuals from the population X. We use it to reduce the computational cost of computing the pairwise distance for all individuals in the population. Its admissibility for approximating the full pairwise distance was shown in Gabor and Belzner (2017). Just as we normalized the fitness function f to \([0; 1] \subset \mathbb {R}\), we also normalize the secondary fitness \({\textsc {sf}}\) to the same range via division by the maximum Manhattan distance between two individuals, i.e., 2n, to make the combination easier to understand. For now, we set \(\lambda =0.4\), which we discuss later.
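Eqs. 14 and 15 could be sketched as follows (an illustration under our own assumptions; the sample size 10 stands in for \(\sigma_4\), and the maximum distance 2n normalizes the score to \([0; 1]\)):

```python
import random

def manhattan(x, y):
    # Eq. (15): Manhattan distance over the action sequences
    return sum(abs(a - b) + abs(c - d) for (a, c), (b, d) in zip(x, y))

def sf(x, population, n=5, sample=10):
    # Eq. (14): normalized average distance of x to a random
    # sample of the population (the role of sigma_4)
    others = random.sample(population, min(sample, len(population)))
    avg = sum(manhattan(x, y) for y in others) / len(others)
    return avg / (2 * n)
```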
Each evolution was run 30 times for 1500 generations each, using a population size of 50. Figure 2a shows the best fitness achieved per generation for all tested approaches. We see that diversity-aware evolution (especially distance-based) produces much better objective results. Figure 2b shows the separate diversity score \({\textsc {sf}}\) maintained by the best individual, which can only be computed in a meaningful way for Manhattan diversity. In Fig. 2c, the standard approach shows the same plot as before since its fitness is not augmented. For all other approaches we plot the augmented fitness \({\textsc {af}}\) that is actually used for selection. We see that due to the combination of distance and objective fitness, Manhattan-diverse evolution starts higher but climbs slower than the respective objective fitness. Fitness sharing results in very small absolute values for fitness but climbs up nonetheless.
From the already run evolutionary processes, we can compute the final productive fitness as given in Definitions 3 and 4 a posteriori. Figure 2d shows the maximum \({\textsc {fpf}}\) per generation. We see that Manhattan-diverse and inherited fitness maintain a rather continuous lineage from the initial population to the best solution in the final generation, as the final fitness propagates to the final productive fitness of very early generations. This behavior is rather unsurprising but illustrates the notion of final productive fitness, which measures the individuals' impact on the final result.
For Fig. 2e we compute the perhaps most interesting measurement: This plot shows for each population X in a given generation the result of \({{\,\mathrm{avg}\,}}_{x \in X} \left( {\textsc {af}}(x) - {\textsc {fpf}}(x) \right)\), i.e., the average difference between the augmented fitness and the final productive fitness per individual. Thus, we get to assess how well the augmented fitness approximates the final productive fitness. There are a few observations to be made:
1.
Towards the last few generations, we notice a rapid spike in the \({\textsc {fpf}}\) as the number of descendants in the final generation to be considered for the \({\textsc {fpf}}\) decreases quickly.
2.
The actual value of the distance (i.e., the height of the line) is irrelevant to the analysis of Thesis 2 and largely determined by the setting of \(\lambda\).
3.
Throughout most of the plot, the Manhattan-diverse evolution maintains a relatively stable level, i.e., the augmented fitness \({\textsc {af}}\) approximates the final productive fitness \({\textsc {fpf}}\) throughout the evolution. The less stable evolutions also show a worse overall result.
To further elaborate on that last point, we consider Fig. 2f: It shows the average absolute value of change over 150-generation-wide windows of the \({\textsc {af}}(x) - {\textsc {fpf}}(x)\) metric used in Fig. 2e. The plot was smoothed using a convolution kernel \(\langle 1,\ldots , 1 \rangle\) of size 25. Roughly speaking, we can see the slope of the plots in Fig. 2e here. In this plot, good evolutionary processes should maintain rather low values according to Thesis 2. We can observe that Manhattan-diverse evolution maintains the lowest values almost throughout the entire evolution. While fitness sharing shows increases and decreases, matching the \({\textsc {fpf}}\) at a higher level than Manhattan diversity, inherited fitness shows a huge spike in the beginning (as does the standard approach), thus making a much less stable match for the \({\textsc {fpf}}\). As proposed by Thesis 2, the match between \({\textsc {af}}\) and \({\textsc {fpf}}\) roughly corresponds to the quality of the overall result of the evolutionary process.
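The gap metric and its windowed absolute change described above can be sketched as follows; this is a minimal sketch, in which the use of NumPy and the normalization of the box kernel to a moving average are implementation choices of ours (any constant kernel merely rescales the plot).

```python
import numpy as np

def af_fpf_gap(af_values, fpf_values):
    """Average difference between augmented fitness and final
    productive fitness over one generation's population,
    i.e., avg_{x in X} (af(x) - fpf(x))."""
    return float(np.mean(np.asarray(af_values) - np.asarray(fpf_values)))

def smoothed_abs_change(gap_per_generation, kernel_size=25):
    """Absolute generation-to-generation change of the gap metric,
    smoothed with a box kernel <1, ..., 1> (normalized here, which
    is our assumption)."""
    gaps = np.asarray(gap_per_generation, dtype=float)
    change = np.abs(np.diff(gaps))          # slope magnitude per step
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(change, kernel, mode="same")
```

Given per-generation lists of \({\textsc {af}}\) and \({\textsc {fpf}}\) values from an a posteriori analysis, `af_fpf_gap` produces one point per generation and `smoothed_abs_change` produces the stability curve.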
As mentioned earlier, we also further analyzed the importance of the setting of \(\lambda\) for the evolution. Figure 3 shows the impact of \(\lambda\) on the best results generated by the evolution; \(\lambda = 0\) equals the standard evolution in all previous plots. Unsurprisingly, we see that some amount of diversity-awareness improves the results of evolution, but setting \(\lambda\) too high favors diversity over the actual objective fitness and thus yields very bad results. We want to add that more intricate versions of the Manhattan-based augmented fitness \({\textsc {af}}\) might aim to adjust the \(\lambda\) parameter during evolution, just as inherited fitness and fitness sharing might want to adjust their respective parameters. For these experiments, we chose a static parameter setting for simplicity.
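The weighting scheme and the static parameter sweep can be sketched as follows; `run_evolution` is a hypothetical placeholder for one complete evolutionary run returning its best objective fitness, not a function from the paper.

```python
def augmented_fitness(of_value, sf_value, lam):
    """Augmented fitness af(x) = (1 - lambda) * of(x) + lambda * sf(x)
    (cf. Definition 1); lam = 0 recovers the standard evolution."""
    return (1.0 - lam) * of_value + lam * sf_value

def sweep_lambda(run_evolution, lambdas):
    """Static parameter sweep: run one full evolution per lambda
    setting and collect the best objective fitness found.
    `run_evolution` is a hypothetical user-supplied callable."""
    return {lam: run_evolution(lam) for lam in lambdas}
```

A plot like Fig. 3 then simply charts the values of the returned dictionary against its keys.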
4.2 The route planning problem
The route planning problem is a discrete optimization problem with a motivation similar to that of the pathfinding problem. Again, we adapt the problem and its description from Gabor et al. (2018).
A robot needs to perform \(n=12\) different tasks in a fixed order by visiting relevant workstations. Each workstation can perform exactly one of the tasks and for each task, there are \(o=5\) identical workstations to choose from. Accordingly, a solution candidate is a vector \(\langle w_1,\ldots , w_n \rangle\) with \(w_i \in \{1, ..., o\}\) for all \(1 \le i \le n\). See Fig. 4 for an illustration using a smaller setting. A single workstation W can be identified by a tuple of its task type and its number, i.e., \(W = (i, k)\) for some \(1 \le i \le n\) and \(1 \le k \le o\). To mimic various means of transport, the distance \(D(W, W')\) between every two workstations \(W = (i, k)\) and \(W'= (j, l)\) is randomized individually within a range of \([0, \frac{1}{n}] \subset \mathbb {R}\). Note that this (most likely) gives rise to a non-Euclidean space the robot is navigating. The objective fitness for this minimization problem is given via
$$\begin{aligned}&{\textsc {of}}(\langle w_1, ..., w_n\rangle ) \nonumber \\&\quad = \sum _{i=1}^{n-1} D((i, w_i), (i+1,w_{i+1})). \end{aligned}$$
(16)
Again, from here we also construct an augmented fitness \({\textsc {af}}(x) = (1 - \lambda ) \cdot {\textsc {of}}(x) + \lambda \cdot {\textsc {sf}}(x)\) (cf. Definition 1), but now use the Hamming distance as a secondary fitness so that
$$\begin{aligned}&{\textsc {sf}}(x) = \frac{1}{2n} \cdot {{\,\mathrm{avg}\,}}_{x' \in \sigma _4(X)} \textit{hamming}(x, x') \end{aligned}$$
(17)
$$\begin{aligned}&\text { where } \textit{hamming}(\langle w_1,\ldots , w_n \rangle , \langle w_1',\ldots , w_n' \rangle ) = \sum _{i = 1}^{n} h(w_i, w_i') \end{aligned}$$
(18)
$$\begin{aligned}&\text { and } h(w, w') = {\left\{ \begin{array}{ll} 0 &{} \text { if } w = w'\\ 1 &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(19)
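A minimal sketch of this setup follows; the zero-indexed workstations, the fixed random seed, and passing an explicit sample (standing in for \(\sigma_4(X)\)) are implementation choices of ours.

```python
import random

n, o = 12, 5  # tasks and identical workstations per task (see text)

# Randomized, (most likely) non-Euclidean distances D(W, W') in [0, 1/n].
# Workstations are zero-indexed here, unlike the 1-based notation in the
# text; the seed is an arbitrary choice for reproducibility.
rng = random.Random(42)
D = {((i, k), (j, l)): rng.uniform(0.0, 1.0 / n)
     for i in range(n) for k in range(o)
     for j in range(n) for l in range(o)}

def of(candidate):
    """Objective fitness (Eq. 16): length of the route visiting the
    chosen workstation for each task in fixed order."""
    return sum(D[(i, candidate[i]), (i + 1, candidate[i + 1])]
               for i in range(n - 1))

def hamming(x, y):
    """Hamming distance between two candidates (Eqs. 18 and 19)."""
    return sum(1 for a, b in zip(x, y) if a != b)

def sf(x, sample):
    """Secondary fitness (Eq. 17): average Hamming distance to a
    population sample, scaled by 1/(2n)."""
    return sum(hamming(x, y) for y in sample) / (2 * n * len(sample))
```

Since the maximum Hamming distance is \(n\), the scaling by \(\frac{1}{2n}\) bounds \({\textsc {sf}}\) by \(\frac{1}{2}\), keeping it comparable to the normalized objective fitness.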
Besides, we apply the same evolutionary processes as in Sect. 4.1, but the parameter search shown in Fig. 5 now recommended \(\lambda = 0.25\) for weighting the Hamming-based diversity.^{6} We evolve 20 independent populations of size 50 for 400 generations and plot the same data we have seen before: Fig. 6a shows the best fitness achieved in evolution. Inherited fitness takes a lot more time but eventually almost reaches the level of Manhattan diversity. However, both methods yield results about as solid as fitness sharing or the naïve algorithm. This is mirrored by all methods showing quite stable behavior in Figs. 6e and 6f, with the standard approach showing the highest fall within the first few generations as it matches \({\textsc {fpf}}\) the least. However, the results for all evolutions are very close together for the route planning problem.
4.3 Schwefel
Finally, we consider one^{7} of the canonical benchmark problems for evolutionary algorithms. The implementation of the Schwefel problem is taken from Rainville et al. (2012) while our study on the impact of diversity again follows experiments performed in Gabor et al. (2018).
The original fitness function is given as
$$\begin{aligned} \textit{schwefel}(\langle x_1, ..., x_n\rangle ) = 418.9828872724339 \cdot n - \sum _{i=1}^n x_i \cdot \sin \left( \sqrt{|x_i|}\right) \end{aligned}$$
(20)
with \(x_1, ..., x_n \in [-500, 500] \subset \mathbb {R}\) where \(n=8\) is the dimensionality we use in our experiments (Rainville et al. 2012).^{8} The resulting function is illustrated in Fig. 7. The Schwefel problem is a minimization problem looking for an x so that \(\textit{schwefel}(x) = 0\). Again, we normalize the result values by defining \({\textsc {of}}(x) = \frac{1}{4000} \cdot \textit{schwefel}(x)\). Manhattan diversity uses the diversity weight \(\lambda = 0.3\) as suggested by Fig. 8.
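A minimal sketch of these fitness functions, assuming the conventional \(|x_i|\) inside the square root as used in the DEAP benchmark implementation:

```python
import math

def schwefel(x):
    """Schwefel function (Eq. 20) for x_i in [-500, 500]. The global
    minimum schwefel(x) = 0 lies near x_i = 420.9687 for all i."""
    n = len(x)
    return (418.9828872724339 * n
            - sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x))

def normalized_of(x):
    """Normalized objective fitness of(x) = schwefel(x) / 4000."""
    return schwefel(x) / 4000.0
```

The many deceptive local minima far from the global one are what make this landscape a canonical benchmark for the exploration/exploitation trade-off.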
We run the same kind of analysis as for the previous problems and plot the same data in Fig. 9. These experiments were performed on populations of size 50 evolving for 400 generations. Figure 9a shows a mixed picture: The Manhattan-diverse evolution again outperforms the standard approach, but inherited fitness hardly yields any benefit while fitness sharing performs worse than the standard approach. In Fig. 9d we see a different picture than before: The final productive fitness values are no longer as stable but vary throughout the evolution, suggesting that solutions to the Schwefel problem are more influenced by migration (and thus show no continuous lineage) than for the other problems considered. Fig. 9e shows the \({\textsc {af}} - {\textsc {fpf}}\) metric, which measures the individual difference between augmented fitness and final productive fitness. We observe very clearly that, after a short starting phase, this distance remains much more stable for the Manhattan-diverse evolution than for any other approach, indicating that the augmented fitness is approximating the final productive fitness. Note again that in the last few generations, computing the final productive fitness has limited meaning. In Fig. 9f this behavior shows clearly as Manhattan-diverse evolution forms a line at the very bottom of the plot with almost no change in how \({\textsc {af}}\) matches \({\textsc {fpf}}\). All other approaches, which perform noticeably worse, also show a much more erratic pattern in how well their augmented fitness matches the final productive fitness.
5 Related work
Diversity has been a central topic of research in evolutionary algorithms (den Heijer and Eiben 2012; Morrison and De Jong 2001; Toffolo and Benini 2003; Ursem 2002). Its positive effect on the evolutionary process has often been observed there, but rarely been interpreted beyond a biological metaphor, i.e., “diversity is a key element of the biological theory of natural selection and maintaining high diversity is supposed to be generally beneficial” (Corno et al. 2005).
Without much of a concept of what to look for in a mechanism for diversity-awareness, many variants have been spawned in research. Instead of repeating them, we would like to point out a few resources for a comprehensive overview: Burke et al. (2004), among others like Brameier and Banzhaf (2002) or McPhee and Hopper (1999), discuss various means to measure and promote diversity in genetic programming, which for the most part should apply to all evolutionary algorithms. They also provide an extensive analysis of the connection between diversity and achieved fitness, but do not define productive fitness or a similar notion. A more recent comprehensive overview of means to describe and enable diversity has been put together by Squillero and Tonda (2016), also providing a taxonomy of various classes of approaches to diversity. Gabor et al. (2018) provide a quantitative analysis of various means of maintaining inheritance-based diversity on standard domains like the ones we used in this paper. Regarding the multitude of diversity mechanisms present in research, however, it is most important to also point out the results of Wineberg and Oppacher (2003), who most drastically show that “all [notions of diversity] are restatements or slight variants of the basic sum of the distances between all possible pairs of the elements in a system” and suggest that “experiments need not be done to distinguish between the various measures”, a point which we already built upon in our evaluation.
Note that a variety of “meta-measurements” for the analysis of evolutionary processes exist: Effective fitness measures the minimum fitness required for an individual to increase in dominance at a given generation (when in competition with the other individuals) (Stephens 1999). It is related to reproductive fitness, which is the probability of an individual to successfully produce offspring (Hu and Banzhaf 2010). Both occur at the foundation of productive fitness, but do not include the (computationally overly expensive) diachronic analysis of the overall effect on the end result. Our approach is also comparable to entropy-based diversity preservation (Squillero and Tonda 2008), where the positive effect of certain individuals on the population’s entropy is measured and preserved in order to deliberately maintain higher entropy levels. By contrast, our approach is based on the fitness values only (without the need to look into the individuals beyond their genealogical relationships) and thus also cannot be used directly as a secondary goal in evolution but purely as a tool for a posteriori analysis of the effectiveness of other secondary goal definitions.
When we construct the “optimal evolutionary process”, we construct a dynamic optimization problem from a traditionally static one. It is interesting that specifically dynamic or online (Bredeche et al. 2009) evolutionary algorithms have been shown to benefit from increased diversity especially when facing changes in their fitness functions (Gabor et al. 2018; Grefenstette 1992). While this is obviously intuitive as more options in the population allow for higher coverage of possible changes, the reverse connection (pointed to by this work) is not stated there, i.e., that diversity in static domains may work because even for static domains the optimization process is inherently dynamic to some degree.
6 Conclusion
We have introduced the novel notion of final productive fitness \({\textsc {fpf}}\) (and all the definitions it is built upon). We make a theoretical argument that \({\textsc {fpf}}\) is the goal an optimal evolutionary process should strive for to achieve the best overall results. However, \({\textsc {fpf}}\) cannot be computed efficiently in advance, producing the need for an approximation. We argue that the wellknown technique of augmenting the objective fitness function with an additional diversity goal (when it helps) happens to effectively approximate the theoretically derived \({\textsc {fpf}}\) (at least better than just the objective fitness on its own). We have shown this connection empirically on benchmark domains. We argue that this provides first insight into why and how diversity terms are beneficial to evolutionary processes.
Immediate future work would consist of answering the when and which: We have tested several domains for evolutionary algorithms and many are too simple to further benefit from explicit diversity-awareness. Maybe \({\textsc {fpf}}\) can be used to derive a criterion to estimate the usefulness of diversity in advance. Similarly, many mechanisms to cater explicitly to diversity exist. While many can be subsumed by the pairwise distance used here (Wineberg and Oppacher 2003), others may still show different behavior. Their relation to \({\textsc {fpf}}\) requires further work.
There are means of maintaining diversity without altering the fitness function, most prominently structural techniques like island models (Tomassini 2006; Whitley et al. 1999) or hypermutation phases (Morrison and De Jong 2000; Simões and Costa 2002). As no match between \({\textsc {af}}\) and \({\textsc {fpf}}\) can be computed for them, we omitted them from this first analysis. Final productive fitness may be a useful tool to translate these structural means into an effect on the fitness function.
Eventually, there may be even more direct or outright better (compared to using diversity or similarly augmented fitness) approximations of \({\textsc {fpf}}\) to be found now that we know what we are looking for. The ultimate goal of the research into \({\textsc {fpf}}\) might be to utilize it directly or indirectly in actually constructing new types of evolutionary algorithms (instead of it “only” helping to explain how well-known types work). We can imagine:
–
We could compute \({\textsc {fpf}}\) for a simplified version of the problem (and the algorithm) for various parametrizations (Eiben and Smith 2003; Mitchell 1998). The \({\textsc {fpf}}\)’s values could then help evaluate the respective parametrization’s success. However, for most complex problems it is not entirely clear how to derive simpler instances that still preserve the interesting or challenging aspects of their larger counterparts. Furthermore, it is unclear how \({\textsc {fpf}}\) might provide more information than just using the respective runs’ \({\textsc {of}}\).
–
In this paper, we checked how well various established models for fitness functions approximate \({\textsc {fpf}}\). Instead, we might now construct new models with the goal to approximate \({\textsc {fpf}}\) better. Surrogate models have been used in evolutionary algorithms to approximate computationally expensive fitness functions (Gabor and Altmann 2019; Jin 2005; Jin and Sendhoff 2004). In our case, a surrogate model would have to approximate our a posteriori approximation of \({\textsc {fpf}}\) and could then maybe save time for future evaluations. It should be noted that our results make it seem plausible that no general model for \({\textsc {fpf}}\) on all domains should exist, and we cannot train the surrogate as the algorithm goes along since our target metric is only computed a posteriori. However, we might still be able to learn an n-\({\textsc {pf}}\) surrogate for small n or a similar target metric. In Gabor and Linnhoff-Popien (2020) we already suspect (based on the a posteriori approximation as we do in this paper) that \({\textsc {fpf}}\) constructs a simpler fitness landscape compared to \({\textsc {of}}\), quite like surrogates do, suggesting that surrogates may be trained to achieve a similar fitness landscape. That dynamic should be explored in future work.
–
In Gabor and Belzner (2017) and Gabor et al. (2018) we introduce genealogical distance as a diversity metric for evolutionary algorithms. We show that it provides similar, although at times inferior, results to distance-based diversity metrics (i.e., the Manhattan and Hamming diversity we use in this paper). However, genealogical diversity may provide means to approximate genealogical relations between individuals, making our a posteriori approximation much easier to compute (probably at the expense of accuracy). Future work should evaluate whether that approach brings any relevant benefits.
Some of these may be applicable to other methods of optimization as well: We suspect that the notion of final productive fitness translates directly to all optimization methods (ranging from simulated annealing or particle swarm optimization to Monte-Carlo tree search and backpropagation) that may or may not already implement means to approximate final productive fitness rather than just objective fitness.
Acknowledgements
We thank the anonymous reviewers for their indepth comments and their contribution to this paper. We also thank Lenz Belzner for intense discussions that helped make this paper possible.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.