Letter

Ecological Analogy for Generative Adversarial Networks and Diversity Control

Published 28 December 2022 © 2022 The Author(s). Published by IOP Publishing Ltd
Citation: Kenichi Nakazato 2023 J. Phys. Complex. 4 01LT01. DOI: 10.1088/2632-072X/acacdf


Abstract

Generative adversarial networks are popular deep neural networks for generative modeling in the field of artificial intelligence. In generative modeling, we want to output a sample given random numbers as input, and we train an artificial neural network on a training data set for that purpose. The network is known for astonishingly fruitful demonstrations, but training it is difficult because of the complex training dynamics. Here, we introduce an ecological analogy for the training dynamics. With a simple ecological model, we can understand the dynamics, and a controller for the training can be designed based on that understanding. We then demonstrate how the network and the controller work in an ideal case, MNIST.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Generative adversarial networks, GANs, are popular applications of deep neural networks for generative modeling in recent decades [1–4]. Deep neural networks usually require a dataset with pairs of input and output, $\{(\boldsymbol{x}_i,y_i)\}$. We can train deep neural networks, $f(\boldsymbol{x},\boldsymbol{\theta})$, in a supervised way with this type of data set. In other words, we can optimize the parameters, θ , to minimize the difference, $|f({\boldsymbol{x}}_i,\boldsymbol{\theta})-y_i|$. In contrast, GANs do not require such paired data, because the networks are trained in an unsupervised way. GANs usually consist of two networks, a generator and a discriminator. The generator, $G(\boldsymbol{z})$, learns how to generate samples, x , that are recognized as genuine by the other network. The discriminator, $D(\boldsymbol{x})$, learns to distinguish whether an input, x , comes from the training data, $\{{\boldsymbol{x}}_i\}$, or was generated, $\boldsymbol{x} = G(\boldsymbol{z})$. The output, $D(\boldsymbol{x})$, is a likelihood, $y\, (0\leqslant y \leqslant 1)$. In the ideal solution, the distribution of generated samples, $P(G(\boldsymbol{z}))$, should estimate the distribution of the training data, $P(\boldsymbol{x})$. The learning can thus be seen as a game dynamics [5, 6] rather than a simple optimization. In fact, it is known that the ideal solution is not easy to reach by training [4]. A famous failure scenario, known as mode collapse, is a reduction of the diversity of the generated samples: we often observe that the generated samples cover only a very limited subset of the training data. Such an observation suggests that the game dynamics can have multiple equilibria or attractors; in other words, the ideal solution is not necessarily dynamically stable.

To understand such complex dynamics, we want to consider an analogy. Many examples of game dynamics are known in the field of ecology, e.g. prey–predator dynamics [7]. We live in an ecological network, where animals eat other animals or plants. Living things should optimize their strategy to obtain rich nutrients from their environment and to avoid danger from others. We notice that the generator–discriminator relationship is similar to the prey–predator one. In an ecological system, there are many niches with affluent resources, and animals or plants increase around such rich areas. In this sense, they need to discriminate whether an area is rich or not. On the other hand, there can be other animals seeking them as food. If a stable relationship between them is established, we can expect the distribution of the animals to reflect the map of resources in the land. From the viewpoint of a GAN, this is the ideal solution of the generator.

However, the established relationship may not realize the resource map in some cases. For example, if the resources are scattered sparsely, the habitat of the animals can be limited to only a few spots. We can show that GANs also have such unsuccessful solutions, in line with the analogy. In addition, we can introduce a control method for the GAN to stabilize the ideal solution based on the ecological analogy. In this paper, we present an ecological, analogy-based understanding of GANs. To demonstrate this understanding, we use results with a well-known data set, MNIST [8]. The solutions for MNIST show the instability of the ideal solution. In addition, we show that a controller based on the analogy can improve the training.

2. Model

2.1. Learning dynamics of GAN

In GANs, we have two learning dynamics: one for the generator and the other for the discriminator. The generator is a transformation from an input, z , to an output, x , which has the same form as the given dataset. The discriminator evaluates an input, x .

Equation (1)

Equation (2)

where $\boldsymbol{\theta}_g$ and $\boldsymbol{\theta}_d$ are parameters of the networks. In the training of a GAN, we usually input random numbers as z into the generator. As the input for the discriminator, we use samples from the given dataset, ${\boldsymbol{x}}_i\in{\cal X}$, and outputs from the generator, $\boldsymbol{x} = G(\boldsymbol{z})$. We therefore have two loss functions to optimize. We minimize a loss function, Lg , for the generator,

Equation (3)

The input, z i , is a randomly sampled vector. The generated sample, $G(\boldsymbol{z}_i)$, should be discriminated as genuine, $D(G(\boldsymbol{z}_i)) = 1$, as the result of the training. We also have one more loss function, Ld , for the discriminator, with which we can write its evolution,

Equation (4)

In this equation, we want to optimize the discriminator to evaluate the training data, $\boldsymbol{X}_i\in{\cal X}$, as genuine, $D(\boldsymbol{X}_i) = 1$, and generated samples as fake, $D(G(\boldsymbol{z}_i)) = 0$. In a training step, called an 'epoch', we usually update the parameters, $\boldsymbol{\theta}_g$ and $\boldsymbol{\theta}_d$, with randomly shuffled subsets of the training data. In this way, the dynamics is somewhat stochastic, depending on the adopted optimizer [9–15].
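
For concreteness, the following is a minimal sketch, in PyTorch, of the alternating updates described above. The binary cross-entropy losses, layer sizes and learning rate are illustrative assumptions and are not claimed to reproduce equations (3) and (4) exactly.

```python
# Minimal PyTorch sketch of one alternating GAN update (illustrative only:
# the binary cross-entropy losses, layer sizes and learning rate are
# assumptions, not the exact forms of equations (3) and (4)).
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784   # e.g. flattened 28x28 MNIST images

G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, data_dim), nn.Sigmoid())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_step(x_real):
    n = x_real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)
    # Discriminator step: push D(X_i) -> 1 and D(G(z_i)) -> 0.
    x_fake = G(torch.randn(n, latent_dim)).detach()
    loss_d = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # Generator step: push D(G(z_i)) -> 1.
    loss_g = bce(D(G(torch.randn(n, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```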

2.2. An ecological dynamics

Here we consider a very simple ecological system with two species of animals, A and B. The animals migrate toward places with abundant food. Animals of species A shift their distribution according to the environmental food condition; in other words, we can regard them as herbivores. When they are preyed on by other animals, they try to avoid them. We express the distribution of food as a function of position, $R(\boldsymbol{x})$, and assume this function is fixed. As an assumption on the distribution of animal A, we simply describe it by the equation,

Equation (5)

where K is a distribution kernel around the position, a i . In this paper, we assume the kernel takes the following form,

Equation (6)

As one more assumption, for animals B we again use a simple distribution,

Equation (7)

They live around some spots, b i . We assume that animals B eat animals A; they are therefore carnivores. For the migration dynamics, we assume a simple dynamical system on their habitats, a i and b i ,

Equation (8)

Equation (9)

In a similar way, we assume the environmental food distribution can be written as the following equation,

Equation (10)

We can easily understand this ecological system in terms of the ecological niches, a i , b i and c i .
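
As an illustration only, the following sketch mimics the migration dynamics described here: prey niches a i drift toward the fixed resource niches c i and away from the predator niches b i , while the predator niches chase the prey. The Gaussian weighting, the rates and the exact update rule are assumptions rather than the forms given in equations (5)–(10).

```python
# Toy NumPy sketch of the niche migration dynamics (the Gaussian kernel,
# rates and update rules are illustrative assumptions, not equations (5)-(10)).
import numpy as np

rng = np.random.default_rng(0)
c = rng.uniform(-1, 1, size=(5, 2))   # fixed resource niches c_i
a = rng.uniform(-1, 1, size=(8, 2))   # prey (animal A) niches a_i
b = rng.uniform(-1, 1, size=(8, 2))   # predator (animal B) niches b_i
sigma, eta = 0.3, 0.05                # kernel width and migration rate

def drift(points, targets, sign=1.0):
    """Gaussian-weighted drift of each point toward (+1) or away from (-1) targets."""
    d = targets[None, :, :] - points[:, None, :]          # (n_points, n_targets, 2)
    w = np.exp(-np.sum(d**2, axis=-1) / (2 * sigma**2))   # kernel weights
    return sign * np.sum(w[..., None] * d, axis=1)

for step in range(1000):
    a = a + eta * (drift(a, c) + drift(a, b, sign=-1.0))  # seek food, avoid predators
    b = b + eta * drift(b, a)                             # chase prey
```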

In the next section, we study the ecological analogy of the GAN based on the two dynamical systems introduced here. We also demonstrate that the GAN exhibits such analogous dynamics with the well-known data set MNIST [8]. Furthermore, we introduce a control method for the GAN and demonstrate it as well.

3. Results

3.1. Ecological analogy for GAN

Here we introduce some assumptions for the dynamics of the GAN and give an ecological interpretation of it. We usually assume a deep network for the generator, $\boldsymbol{x} = G(\boldsymbol{z})$, from which we can get a Monte Carlo sample, x , with a random input, z . In this sense, the generator realizes a distribution, $P_G(\boldsymbol{x})$, of samples to be generated. As a very simple case, we introduce the assumption,

Equation (11)

In other words, we assume the distribution has more samples around the points, g i , which are the parameters to be optimized. Similarly, we introduce one more assumption on the discriminator, $D(\boldsymbol{x})$. Since the discriminator defines the map by which points are discriminated as genuine, D = 1, or not, D = 0, we can model it as a simple mixture distribution,

Equation (12)

This discriminator evaluates a point, x , as genuine if it is near one of the centers, $\{\boldsymbol{d}_i\}$. On the other hand, a point far away from all of them should be discriminated as fake. The centers, d i , are the parameters optimized to minimize the loss, Ld .

We rewrite the evolution equations of the GAN with these assumptions. We can sample a generated point, $\boldsymbol{x} = \boldsymbol{g}_i+\boldsymbol{\epsilon}$. With this sample, we can evaluate the loss, Lg ,

Equation (13)

The parameter, $\boldsymbol{d}_j^*$, is the nearest neighbor of the sample, $\boldsymbol{g}_i+\boldsymbol{\epsilon}$. The learning dynamics can then be written as,

Equation (14)

In addition, we assume that the training data distribution, ${\cal X}$, can be written with some representative points,

Equation (15)

With this expression, we can write the evolution of the discriminator, $D(\boldsymbol{x})$, as follows,

Equation (16)

Equation (17)

Equation (18)

where the points, $\boldsymbol{T}_i^*$ and $\boldsymbol{g}_i^*$, are the nearest neighbors of the parameter, d j . We now have a simple model for the learning dynamics of the GAN. The dynamics shows a striking similarity to the ecological dynamics defined in the previous section. In other words, we can interpret the dynamics as a prey–predator one.
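
The nearest-neighbor structure of this simplified model can be sketched numerically as follows; the step size and the precise attraction and repulsion terms are assumptions standing in for equations (13)–(18).

```python
# NumPy sketch of the point-based learning dynamics: generator points g_i move
# toward the nearest discriminator centre d_j*, while each centre d_j is pulled
# toward its nearest training point T_i* and pushed away from its nearest
# generated point g_i*. Step size and update form are assumptions.
import numpy as np

rng = np.random.default_rng(1)
T = rng.uniform(-1, 1, size=(10, 2))   # representative training points T_i
g = rng.uniform(-1, 1, size=(6, 2))    # generator points g_i
d = rng.uniform(-1, 1, size=(6, 2))    # discriminator centres d_j
eta = 0.05

def nearest(points, refs):
    """Return, for each point, its nearest reference point."""
    idx = np.argmin(((points[:, None, :] - refs[None, :, :])**2).sum(-1), axis=1)
    return refs[idx]

for step in range(2000):
    g = g + eta * (nearest(g, d) - g)                          # chase d_j*
    d = d + eta * ((nearest(d, T) - d) - (nearest(d, g) - d))  # attract to T_i*, repel from g_i*
```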

As we can see, the ecological dynamics never guarantees diversity in the distribution of animals. The animals, both A and B, can stay around only one niche, c i . In a similar way, we can expect this type of solution to be realized in the GAN as well. In other words, the ideal solution of the GAN can be dynamically unstable, or it may not be the unique solution.

3.2. Generated space reduction with MNIST

Here, we show results with MNIST. For simplicity, we first show results with reduced MNIST. MNIST is a labeled data set consisting of pairs of an image and a digit; each image contains a handwritten digit. We randomly select pairs or triplets of digits as the reduced cases. In the paired-MNIST case, we randomly select two digits and use all images labeled with them. In the triplet case, three digits are randomly selected and, similarly, the training data consists of all images with those digits. The networks, the generator and the discriminator, learn how to generate new digit images and how to discriminate whether an input image comes from the training data or was generated, respectively.

In figures 1–3, we show the results of the pair cases with 100 iterated tests. We evaluated the distance between generated samples, x , and MNIST images, X i , to determine the class label of each sample: we selected the nearest-neighbor MNIST image in the training data set for each generated sample and assigned it the class of that MNIST image. In this way, all generated samples can be classified into one of the digits. The results show time series of the class ratios, r0 and r1. We trained the network for 500 epochs in all cases, using the Adam optimizer with a learning rate of 0.001 [15]. We used the simplest GAN, with one fully connected hidden layer in both the generator and the discriminator. In the left panel of figure 1, the dynamics is oscillatory, with trajectories around the mid-range ratio, 0.4–0.5, and in the biased region, ${\sim}0.1$. We only show one of the two ratios, r0 and $r_1 = 1-r_0$, namely the smaller one at the end of training. In the final population, shown on the right, we have four peaks in the end ratio. Among them, we can confirm two peaks in the mid-range with weak bias. The generated distributions with such a weakly biased ratio can cover both selected digits. On the contrary, almost all of the generated images are classified into one of the two digits in the cases with a highly biased ratio.
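
A minimal sketch of this nearest-neighbor labeling, assuming flattened images and plain Euclidean distance, is:

```python
# Sketch of the class-ratio evaluation: each generated image inherits the label
# of its nearest training image (flattened images and Euclidean distance are
# assumptions), and per-class ratios are tallied.
import numpy as np

def class_ratios(generated, train_images, train_labels, n_classes=10):
    """generated: (m, d); train_images: (n, d); train_labels: (n,) ints."""
    d2 = ((generated[:, None, :] - train_images[None, :, :])**2).sum(-1)  # (m, n)
    nearest = np.argmin(d2, axis=1)            # index of the nearest training image
    assigned = train_labels[nearest]           # inherited class labels
    counts = np.bincount(assigned, minlength=n_classes)
    return counts / counts.sum()               # ratios r_0, r_1, ...
```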


Figure 1. Time series of pair-MNIST training and the end distribution. We tested samples of randomly selected pairs from MNIST as training data. On the left, we only show the time series of the smaller ratio at the end. The horizontal axis is epochs (/10); the vertical axis is the ratio of samples classified as each label. On the right, a histogram of the end distribution of the ratio is shown.


Figure 2. Results of paired-MNIST, 0 and 2. The horizontal axis is epochs (/10); the vertical axis is the ratio of labels, evaluated with the distance between generated samples and given samples.


Figure 3. Results of pair-MNIST, 4 and 8. The horizontal axis is epochs (/10); the vertical axis is the ratio of labels, evaluated with the distance between generated samples and given samples.


We also checked the repeatability in figures 2 and 3. From the 100 iterated tests, we can select runs with the same pair. Among them, we show the results for the pairs, $(0,2)$ and $(4,8)$. In the first case, we can always confirm that '0' is stronger than '2'. On the other hand, in the cases with the pair, $(4,8)$, we can confirm unbiased ratios in both runs. These results suggest that the attractors are specific to the pair.

In figure 4, we show the results of the triplet cases. We calculated and evaluated them in the same way as in the pair cases, for 30 iterated tests. Here, we plotted the ratio among the three digits. We randomly selected three digits and used all of the MNIST images with the selected digits for training. The dots for the initial and end ratios are shown. We can confirm the initial dots are centered in the ternary plot. In contrast, the end dots tend to be distributed widely over the area; in particular, we observe more dots near the edges.


Figure 4. The ternary plot for the results of triple MNIST. We plot the start and end points; each point shows the ratio among the classified labels.


Sample learning dynamics are shown in figure 5. On the left, we can confirm convergence to unbiased distributions. On the right, biased distributions are realized as the result of training. We find both unbiased and biased results, but biased ones are more frequent, as confirmed in figure 4.


Figure 5. Results of triple-MNIST: a convergent case and a biased case. The horizontal axis is epochs (/10); the vertical axis is the ratio of labels, evaluated with the distance between generated samples and given samples.


In figure 6, we show sampled generated images in the case with full MNIST. We can confirm biased generated samples, especially '1' and '9'.


Figure 6. The generated samples at the end of training with full MNIST. We generated 100 samples after the training with random latent input, z .


We iterated the full-MNIST test ten times and plotted the results, calculated and evaluated in the same way as in the pair cases. In figure 7, we plot only the ratio of '1'. There are two groups, strongly and weakly biased ones. This suggests there are two types of attractor in the case of full MNIST, in terms of the ratio of generated images classified as '1'. In figure 8, we show two example time series, one showing a strongly biased case and the other a weakly biased case.


Figure 7. The time series of training with full MNIST, showing the ratio of samples labeled '1'. Strongly and weakly biased groups can be seen.


Figure 8. The time series of training with full MNIST. Two typical cases are shown. The horizontal and vertical axes are the same as in the other figures. The color shows the labels.


3.3. Control of generated space

As we observed in the previous section, the generated space can be reduced to sub-regions that cover only a subset of the training data set. In the analogical ecology, equations (8) and (9), this type of biased distribution can occur as well, depending on the initial state of the animal niches, a i or b i . In contrast, we know that in real ecology animals often distribute more widely over the field. If animals concentrate in a specific area, they cannot eat enough because of the decreased amount of food there. This results in a decrease of animals in that area, and they often disperse elsewhere to places with enough food, e.g. the spatial Lotka–Volterra system [16, 17]. Based on this analogy, we introduce a kind of population dynamics into our learning dynamics, equations (3) and (4). Our idea is to introduce niche dynamics reflecting the food dynamics. The easy way to do that is to introduce dynamics for the resource niches, c i .

When animals stay around a specific resource niche, c i , in the ecology, the food there should decrease while the food elsewhere should increase instead. This mechanism can be realized in the GAN through the learning dynamics of the discriminator, $D(\boldsymbol{x})$. The discriminator learns the distribution of the training data set, ${\cal X}$, and in the training we use samples from it for the update. In the original GAN, we usually use a uniform sampling weight, w , but we can control the weight appropriately. If we know all the distances, lij , between the generated samples, ${\boldsymbol{x}}_i$, and the training data points, X j , the population of generated samples, Pi , can be evaluated in a straightforward way,

Equation (19)

This population can be evaluated simply as the size of the nearest-neighbor cluster around each training data point. A simple control rule for the sampling weight can then be written as,

Equation (20)

where the parameter, β, tunes the scale or gain. This weighting means the sampling is biased toward points with a smaller generated population. For easier computation, we can group the training data into some representative points, $\boldsymbol{X}_i^*\in{\cal X^*}$, with a clustering algorithm, and evaluate the generated population distribution over such clusters in our cases.
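
As an illustration, the following sketch counts the population Pi of equation (19) by nearest-neighbor assignment and converts it into sampling weights; the exponential form with gain β is an assumption standing in for equation (20).

```python
# Sketch of the sampling-weight controller: training points (or cluster
# representatives X_i*) whose neighbourhood is crowded with generated samples
# are sampled less often. The exponential weight with gain beta is an
# assumption standing in for equation (20).
import numpy as np

def sampling_weights(generated, train_reps, beta=10.0):
    """generated: (m, d) generated samples; train_reps: (k, d) training points."""
    d2 = ((generated[:, None, :] - train_reps[None, :, :])**2).sum(-1)   # (m, k)
    nearest = np.argmin(d2, axis=1)                                      # nearest rep per sample
    P = np.bincount(nearest, minlength=train_reps.shape[0])              # population P_i, equation (19)
    w = np.exp(-beta * P / max(P.mean(), 1e-12))                         # fewer neighbours -> larger weight
    return w / w.sum()
```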

3.4. Results of the controlled MNIST

Here, we show the results of controlled MNIST, for both the paired and the full case.

As we already confirmed with pair-MNIST, the trained generated samples typically show four peaks, strongly or weakly biased, in figure 1. In figure 9, we show the results with the controller, with the control parameter set to β = 2.0 and β = 10.0, respectively. With the controller, we want to keep the unbiased state; in other words, we expect the controller to stabilize only the two weakly biased peaks among the four.


Figure 9. Controlled results of pair-MNIST. We show the time series of the ratio for pair-MNIST for ten iterated tests. Each plot shows the smaller ratio, $\lt 0.5$, at the end of the test. Here we used the control parameters β = 2 and β = 10 in the left and right panels, respectively.


At first glance, we notice the ratio tends to stay around the mid-range. This range covers the weakly biased peaks of the uncontrolled results, figure 1. Furthermore, the convergence can be tuned with the parameter, β. When the gain is small, β = 2, the distributions show more diversity in the mid-range, but we see more convergent ones with the larger gain, β = 10. In the pair-MNIST cases, we can therefore control the generated distribution.

As the next step, we want to control the generated sample distribution with full MNIST. In figure 10, we show the generated samples with full MNIST, calculated with the parameter β = 10 and sampled after training for 500 epochs. We can confirm more diversified samples than without the control, figure 6.


Figure 10. Generated samples of the trained generator with sampling weight control.


In figure 11, we can confirm only one converged region. The region lies around the weakly biased one, ${\sim}0.2$, of figure 7. This suggests that attractor selection is realized with the controller. However, we could not confirm a much less biased distribution, ${\sim}0.1$, which would correspond to an equally covered distribution over all digits.

As we confirmed, we have two peaks in the case of full MNIST, ${\sim}0.2$ and ${\sim}0.6$. On the other hand, we have only one peak, around 0.2–0.3, with the controller, figure 11. In figure 12, we show all trajectories of the ratio for each digit, to check in more detail. We can confirm the ratio for '1' is larger than the others in both cases. Unfortunately, we cannot realize equally covered distributions.


Figure 11. The time series of the ratio of '1' in the training with controlled full MNIST.


Figure 12. Two examples with controlled full MNIST. In both cases, we used the same parameter, β = 10.


The results shown here suggest that the simple controller for the sampling weight is enough to keep the diversity of generated samples through the mechanism of attractor switching. However, we should note that it is not enough for perfect control toward the ideal distribution.

4. Discussion

The learning dynamics of a GAN can be expressed with two dynamical equations, equations (3) and (4). However, it is too complicated to understand the behavior intuitively, because of the complexity of the generator and discriminator networks and the training dynamics. In reality, we know that GANs show unstable convergence in general. As we can see in the dynamical equations, the training is not a straightforward optimization, but complicated nonlinear dynamics. That is why we need much trial and error for more successful results.

To understand the dynamical system in a more intuitive manner, we proposed an analogy with ecological dynamics. In ecological systems, we see many animals and plants, networked with each other through predator–prey interactions [7]. In such an ecological system, animals migrate to seek food or to avoid enemies. We can find a similar dynamical structure between GANs, equations (3) and (4), and the ecological system, equations (8) and (9). Through these analogical dynamical equations, we can understand the dynamics as the combined dynamics of animals and resource niches. Since we assume the resource niches are environmental factors, they can be regarded as fixed if we focus on the shorter-time ecological dynamics. The training data set in the GAN corresponds to the environmental factor in the analogy, and the animal niches can be regarded as the generated distribution or the discrimination map.

As we can see from the analogical equations, animal niches can be reduced to limited regions around some specific resource niches. We can demonstrate this type of reduction in the original GAN with MNIST. On the other hand, we know that animals in nature do not necessarily live together around a few niches. They live widely in their environment through dispersal migration: when they face a shortage of food, they must forage for it somewhere else. We can introduce this type of dispersal mechanism into the GAN through the control of the resource niches. The amount of resources can be adjusted with the sampling weight for the training data. If the generated samples concentrate around some specific training data points, the sampling weight there should be reduced to simulate the lack of food in the GAN. As we showed, the controller can successfully diversify the generated samples in the case of MNIST. This suggests that the ecological interpretation is effective, despite the simplicity of the analogical model.

It should be noted that our method for diversity control is similar to minibatch discrimination from the viewpoint of training dynamics [18]. Since the phenomenon of biased distributions, like mode collapse, is well known in the field of machine learning, a straightforward way of enhancing the diversity can be a solution; in that method, the diversity is estimated over the minibatch samples as a practical algorithm. In our analogical approach, we focus on the dynamical nature of the GAN rather than the resultant distribution, but the resulting dynamics and outcomes are similar to each other. What we want to emphasize here is the effectiveness of analogical understanding for complex phenomena, including GANs. As we know from the study of complex systems, analogical understanding is often very effective, and our case, the GAN, is no exception. As a first step toward such a fruitful understanding with analogies from other fields, we gave an ecological theory for GANs.

In the simplified model for the GAN, equations (8) and (9), the generator and discriminator can be realized with some independent niches. In such a system, we can expect simple dynamics, as we depicted. On the other hand, we observe some stronger or weaker niches, and even multiple attractors for the same training situation, in the original GAN with MNIST. Such complexity may stem from the metric realized in the network or from the nonlinear training dynamics. In any case, we therefore need more studies to understand the complex dynamics behind the GAN.

We have already tested our method on another dataset, fashion-MNIST [19], and obtained some results. Interestingly, it shows more complicated dynamics, and we can observe catch-and-run type phenomena; in other words, it is not easy to control the diversity. Needless to say, such catch-and-run dynamics, and the difficulty of diversity management, are well known throughout prey–predator dynamics. We again find similarity between the GAN and ecological dynamics, and we believe in fruitful discussion between the fields.

As one more possibility, we can discuss an aspect of the GAN as a competitive algorithm. Since the GAN is a kind of competitive algorithm between the generator and the discriminator, imbalanced competition can result in poor outcomes, as in other competitive games [20]. From the viewpoint of ecological dynamics, if some species are much stronger than others, some species go extinct. However, if genetic evolution is fast enough for the weaker ones, the ecological system can be stabilized. This suggests that we can control the learning rates of G or D if the loss of the generator/discriminator never shows any improvement in the case of the GAN. In fact, we can skip a few training steps of the discriminator, D, to adjust the imbalanced dynamics [21]. As one more example, we can adjust the training ratio between the generator, G, and the discriminator, D, through a somewhat more complicated architecture and loss function [22].

The deep neural network itself is a bio-inspired algorithm, which simulates the retinal structure [23–25]. Our study demonstrates that bio-inspired modeling can be effective at a different level as well. As we know, the approaches of complex systems science are effective for studying such biological systems. We believe the complex-systems way of study must be effective in deepening the understanding of bio-inspired systems. At the same time, studies of machine learning should lead us to a deeper understanding of complex systems as well.

Acknowledgment

This work was partially supported by PS/PJ-ETR-JP.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).
