Consider the one-step kernel
\({\mathbb {P}}(w,c; dw')\), which appears in (
1). For finite sets, this reduces to
$$\begin{aligned} {\mathbb {P}}^\pi (w, c ; w') = \sum _{a} \alpha (w, a;w') \, \pi (c;a), \quad c \in \fancyscript{C}, \; w,w' \in \fancyscript{W}, \end{aligned}$$
(8)
where we indicate the dependence on
\(\pi \) explicitly. We leave all other mechanisms of the sensorimotor loop fixed and consider two policies
\(\pi _1\) and
\(\pi _2\) that satisfy
$$\begin{aligned} {\mathbb {P}}^{\pi _1}(w, c ; w') = {\mathbb {P}}^{\pi _2}(w, c ; w') \;\; \text{ for } \text{ all } c \in \fancyscript{C} \text{ and } \text{ all } w,w' \in \fancyscript{W}. \end{aligned}$$
(9)
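The condition (9) can be illustrated numerically. The following sketch (Python with NumPy; the toy mechanism \(\alpha\) and all numbers are hypothetical, not taken from the text) constructs two distinct policies that induce exactly the same one-step kernel, by perturbing a policy along a direction that \(\alpha\) cannot distinguish:

```python
import numpy as np

# Hypothetical toy mechanism alpha(w, a; w') with |W| = 2, |A| = 3;
# alpha[w, a] is a distribution over the successor state w'.
W, A = 2, 3
alpha = np.zeros((W, A, W))
alpha[0] = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]  # w = 0: action-sensitive
alpha[1] = [[0.5, 0.5]] * A                        # w = 1: insensitive to a

# Stack the functions alpha_{w,w'}: a -> alpha(w, a; w') as rows.
M = alpha.transpose(0, 2, 1).reshape(W * W, A)

# A direction v in the null space of M exists here because the rows of M
# are linearly dependent; moving a policy along v changes none of the
# expectations of the alpha_{w,w'}.
_, s, Vt = np.linalg.svd(M)
v = Vt[-1]
assert np.allclose(M @ v, 0)

pi1 = np.array([0.5, 0.3, 0.2])   # one controller state, |A| = 3
pi2 = pi1 + 0.05 * v              # still a distribution for a small step
assert np.all(pi2 >= 0) and np.isclose(pi2.sum(), 1.0)

# One-step kernels P(w; w') = sum_a alpha(w, a; w') pi(a), as in (8).
K1 = np.einsum('waj,a->wj', alpha, pi1)
K2 = np.einsum('waj,a->wj', alpha, pi2)
assert np.allclose(K1, K2)        # distinct policies, identical behaviour
```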
It follows directly from (
1) that the corresponding behaviours, defined by (
5), are identical. In other words, if for all controller states
c the distributions
\(\pi _1(c; \cdot )\) and
\(\pi _2(c; \cdot )\) give the same expectation values of the functions
$$\begin{aligned} \alpha _{w,w'}: \;\; a \; \mapsto \; \alpha _{w,w'}(a) := \alpha (w,a;w'), \quad w,w' \in \fancyscript{W}, \end{aligned}$$
(10)
then
\(\pi _1\) and
\(\pi _2\) will generate the same behaviour, assuming that all the other mechanisms are fixed. This redundancy allows us to construct policy models with smaller dimensionality without reducing the behavioural repertoire of the agent. In order to do so, we identify two distributions
\(\mu _1\) and
\(\mu _2\) on
\(\fancyscript{A}\) if they give the same expectation values of the functions
\(\alpha _{w,w'}\),
\(w,w' \in \fancyscript{W}\), and thereby obtain a partition of
\(\Delta _{\fancyscript{A}}\) into convex equivalence classes
\([\mu ]\),
\(\mu \in \Delta _{\fancyscript{A}}\). Given a policy
\(\pi \), we can now consider the family of classes
\([\pi (c; \cdot )]\),
\(c \in \fancyscript{C}\), and any further
\(\pi '\) satisfying
\(\pi '(c; \cdot ) \in [\pi (c; \cdot )]\) will generate the same behaviour as
\(\pi \). Following the idea of cheap design, it is natural to define a model by choosing
\(\pi '\) to be the one with the least complexity. Assuming that our complexity measure reflects the structure of a policy, and, furthermore, that structure corresponds to low entropy, we have the natural choice provided by the maximum entropy principle. More precisely, in each class
\([\pi (c; \cdot )]\) we choose the distribution with the highest entropy
\(-\sum _{a \in \fancyscript{A}} \pi (c;a) \ln \pi (c;a)\). This policy is uniquely determined, as the entropy is strictly concave and the classes are convex. We obtain our first parametrisation
\(\eta : {\mathbb {R}}^{\fancyscript{C}\times {\fancyscript{W}}^2 } \rightarrow \Delta ^{\fancyscript{C}}_{\fancyscript{A}}\),
$$\begin{aligned} \theta \; \mapsto \; \pi _\theta (c;a) := \frac{\exp \left( \sum _{w,w'} \theta _{w,w'}(c) \, \alpha _{w,w'}(a) \right) }{\sum _{a'} \exp \left( \sum _{w, w'} \theta _{w,w'} ( c ) \, \alpha _{w,w'}(a' ) \right) }. \end{aligned}$$
(11)
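For concreteness, the parametrisation (11) is a softmax over the linear features \(\alpha_{w,w'}\). The following Python sketch (with a randomly generated, hypothetical mechanism \(\alpha\)) computes \(\pi_\theta\) and checks that it yields a distribution over actions for every controller state:

```python
import numpy as np

rng = np.random.default_rng(0)
W, A, C = 3, 4, 2
# Hypothetical mechanism: alpha[w, a] is a distribution over w'.
alpha = rng.dirichlet(np.ones(W), size=(W, A))

# Parameters theta_{w,w'}(c), one real number per (c, w, w').
theta = rng.normal(size=(C, W, W))

def policy(theta, alpha):
    # logits(c, a) = sum_{w,w'} theta_{w,w'}(c) * alpha(w, a; w'), as in (11)
    logits = np.einsum('cwv,wav->ca', theta, alpha)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return z / z.sum(axis=1, keepdims=True)

pi = policy(theta, alpha)
assert pi.shape == (C, A)
assert np.allclose(pi.sum(axis=1), 1.0)  # pi(c; .) is a distribution
```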
This is a prominent model in statistics and information geometry, known as an
exponential family [
1]. Using general information-geometric arguments, it is obvious that the parametrisation (
11) defines an embodied universal approximator for
$$\begin{aligned} \Sigma _\alpha \; := \; \left\{ (\alpha , \beta ) \; : \; \beta \in \Delta ^{\fancyscript{W}}_{\fancyscript{S}} \right\} . \end{aligned}$$
(12)
On the other hand, in most interesting cases the number
\(| \fancyscript{C}| \, {|\fancyscript{W}|}^2\) of parameters of this model will be huge. It turns out, however, that this “full” representation of policies is highly redundant. One obvious redundancy follows from the fact that the
\(\alpha (w,a;w')\) are probability distributions in
\(w'\), that is
\(\sum _{w'} \alpha _{w,w'}(a) = 1\) for all
\(w \in \fancyscript{W}\) and all
\(a\in \fancyscript{A}\). As the space of constant functions on
\(\fancyscript{A}\) cancels out for each
\(c \in \fancyscript{C}\), the number of parameters can be reduced to
\(| \fancyscript{C}| \, (|\fancyscript{W}|^2 -1)\). We argue that a further, conceptually deeper, reduction can be expected in embodied systems. Formally, this is reflected by the fact that the sum
$$\begin{aligned} \sum _{w,w'} \theta _{w, w'} ( c ) \, \alpha _{w, w'}(a), \end{aligned}$$
(13)
which appears in (
11), can be greatly simplified due to linear dependence among the vectors
\(\alpha _{w,w'} \in {\mathbb {R}}^{\fancyscript{A}}\),
\(w,w' \in \fancyscript{W}\). Let us consider two obvious reasons for this.
1.
Assume that the world state
\(w'\) cannot be reached from the world state
w in one step, that is
$$\begin{aligned} \alpha (w, a ; w') = 0 \quad \text{ for } \text{ all } a \in \fancyscript{A}. \end{aligned}$$
(14)
In this case, the corresponding term disappears from the sum (
13). In the situation of an embodied agent, the majority of pairs
\((w,w')\) has this property, because the physical constraints of the sensorimotor loop exclude most of the transitions from
w to
\(w'\) in one step.
2.
It is not necessary that (
14) holds in order to drop a term from the sum (
13). It is already sufficient that there is a constant
\(r \in {\mathbb {R}}\) such that
$$\begin{aligned} \alpha (w, a ; w') = r \quad \text{ for } \text{ all } a \in \fancyscript{A}. \end{aligned}$$
(15)
Although formally this property is a simple extension of (
14), it highlights another important aspect. If
\(\alpha (w , a ; w') = r > 0\) for all
\(a \in \fancyscript{A}\) then
\(w'\) can be reached from
w in one step but in a way that does not involve the actuators. The transition from
w to
\(w'\) is not sensitive to the actuators, and therefore its representation does not play any role in our parametrisation (
11).
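Both reductions can be made concrete numerically. In the following Python sketch, a hypothetical mechanism on a cycle of world states allows only "stay" and "move to the next state", so that most pairs \((w,w')\) satisfy (14) and the span of the functions \(\alpha_{w,w'}\) collapses to a two-dimensional space:

```python
import numpy as np

# Hypothetical mechanism on a cycle of |W| = 4 world states: from w the
# agent can only stay at w or move to w+1 (mod W); the probability of
# staying depends on the action a.
W, A = 4, 3
p = np.array([0.9, 0.5, 0.2])          # P(stay | action a)
alpha = np.zeros((W, A, W))
for w in range(W):
    alpha[w, :, w] = p                 # stay at w
    alpha[w, :, (w + 1) % W] = 1 - p   # move to the single reachable neighbour
# every other pair (w, w') satisfies (14): alpha_{w,w'} is identically zero

# Stack the |W|^2 functions alpha_{w,w'}: a -> alpha(w, a; w') as rows.
M = alpha.transpose(0, 2, 1).reshape(W * W, A)
d_alpha = np.linalg.matrix_rank(M)     # dimension of their span
assert d_alpha == 2                    # 16 functions, but a 2-dimensional span
```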
Clearly, we can adjust the parametrisation (
11) to the dimension
\(d_\alpha \) of the vector space
$$\begin{aligned} V_\alpha \; := \; \mathrm{span} \left\{ \alpha _{w,w'} \; : \; w,w' \in \fancyscript{W}\right\} \; \subseteq \; {\mathbb {R}}^{\fancyscript{A}}, \end{aligned}$$
which is at most
\(| \fancyscript{A}|\). Furthermore, note that the constant functions are contained in
\(V_\alpha \). Therefore, there are
\(d_\alpha - 1\) functions
\(\alpha _k \in V_\alpha \), which we refer to as
feature vectors, such that every policy of the structure (
11) can be expressed in terms of a linear combination of the
\(\alpha _k\) and a constant function. This leads to the following simplification of the parametrisation (
11).
In this parametrisation of the policies, we have at most
\(|\fancyscript{C}| \, (d_\alpha - 1)\) parameters, which can be much smaller than the number of parameters in the parametrisation (11). Nevertheless, the two parametrisations clearly define the same policy model, which is a maximum entropy model. This was motivated by the idea that maximum entropy distributions correspond to the least complex or structured ones. However, high entropy distributions tend to have a large support. We can take a different standpoint and ask the following question: if an agent can generate a behaviour with only a few actuator states, why should it involve more actuator states than that? In other words, if we consider actuator states to be costly, then we should design the model so that the policies involve only a minimal required number of actuator states. If, for simplicity of the argument, we interpret entropy as a measure of the cardinality of the support, we would rather prefer distributions with minimal entropy. From this perspective, the minimum entropy distributions appear more natural. Note that, in contrast to the uniqueness of the maximum entropy distribution in a class
\([\mu ]\) where
\(\mu \in \Delta _{\fancyscript{A}},\) there are many distributions that locally minimise the entropy in that class. On the other hand, it turns out that the cardinality of the support of all these distributions is upper bounded by
\(d_\alpha \) (see corresponding derivations in [
2]). More precisely, the following holds: If for a policy
\(\pi, \) all distributions
\(\pi (c; \cdot )\) are local minimisers of the entropy in the respective classes
\([\pi (c; \cdot )],\) then
$$\begin{aligned} | \left\{ a \in \fancyscript{A}\; : \; \pi (c; a) > 0 \right\} | \; \le \; d_\alpha \quad \text{ for } \text{ all } c \in \fancyscript{C}. \end{aligned}$$
(17)
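The bound (17) can be realised constructively: starting from any distribution in a class, one can move along null-space directions of the feature constraints until at most \(d_\alpha\) actuator states remain in the support, without ever leaving the class. The following Python sketch of this standard support-reduction argument uses a hypothetical feature matrix (rows spanning \(V_\alpha\), constants included):

```python
import numpy as np

def reduce_support(mu, F, tol=1e-12):
    """Reduce the support of mu while keeping F @ mu fixed."""
    mu = mu.copy()
    d = np.linalg.matrix_rank(F)
    while True:
        S = np.flatnonzero(mu > tol)          # current support
        if len(S) <= d:
            return mu
        # A null-space direction of F restricted to the support; it exists
        # because |S| exceeds the rank of F.
        _, _, Vt = np.linalg.svd(F[:, S])
        v = Vt[-1]
        # Step until the first supported coordinate reaches zero. Since the
        # all-ones vector lies in the row space of F, v has entries of both
        # signs, so some coordinate decreases.
        neg = v < 0
        t = np.min(-mu[S][neg] / v[neg])
        mu[S] = mu[S] + t * v
        mu[mu < tol] = 0.0

rng = np.random.default_rng(1)
A, d = 8, 3                                   # hypothetical |A| and d_alpha
F = np.vstack([np.ones(A), rng.normal(size=(d - 1, A))])
mu0 = rng.dirichlet(np.ones(A))               # full-support start
mu = reduce_support(mu0, F)
assert np.count_nonzero(mu > 1e-12) <= d      # support bound as in (17)
assert np.allclose(F @ mu, F @ mu0)           # same equivalence class
```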
Clearly, a policy model
\({\mathcal M}_{\mathrm{policy}}\) is an embodied universal approximator if it contains all policies that satisfy (
17) in its closure. We now explicitly define such a policy model with
$$\begin{aligned} \mathrm{dim} \, {\mathcal M}_{\mathrm{policy}} \; \le \; 2 \, | \fancyscript{C}| \, d_\alpha . \end{aligned}$$
The proof of this statement is based on [
2,
14] and is given in the Appendix. The policy model defined by (
18) has
\(2 \, |\fancyscript{C}| \, d_\alpha \) parameters, which is larger than the number
\(|\fancyscript{C}| \, (d_\alpha - 1)\) of parameters of the previous model defined in Proposition 1. However, there are important differences. The feature vectors
\(f^k,\)
\(k = 1,\dots , 2 \, d_\alpha, \) do not explicitly depend on
\(\alpha, \) but their number depends on the dimension of
\(V_\alpha. \) Therefore, they can be used for all mechanisms
\(\alpha '\) that have at most the same dimension
\(d_{\alpha '}\) as
\(\alpha. \) Furthermore, these feature vectors do not require much information to be specified. More precisely, the property of
f that is required in Proposition 2 is a generic property of real functions on
\(\fancyscript{A}\) so that
f can be chosen randomly. Moreover, given such a function
f as the first feature vector, all the other feature vectors are determined as the
k-th powers of
f. With the set
$$\begin{aligned} \Sigma _{d_\alpha } \; := \; \left\{ (\alpha ', \beta ) \; : \; d_{\alpha '} \le d_\alpha , \; \beta \in \Delta ^{\fancyscript{W}}_{\fancyscript{S}} \right\} \end{aligned}$$
(19)
of extrinsic constraints, which is larger than
\(\Sigma _\alpha \) as defined by (
12), we have the following direct implication of Proposition 2.
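The genericity claim about the choice of f can also be illustrated numerically: for a randomly drawn f with pairwise distinct values, the powers \(f^0, f^1, \dots, f^{|\fancyscript{A}|-1}\) form a Vandermonde system and hence span all of \({\mathbb {R}}^{\fancyscript{A}}\). A short Python sketch (with a hypothetical random f):

```python
import numpy as np

rng = np.random.default_rng(2)
A = 6
f = rng.normal(size=A)                       # random first feature vector;
                                             # its values are a.s. distinct
# Row i of V is (1, f_i, f_i^2, ..., f_i^{A-1}): a Vandermonde matrix,
# whose columns are the powers of f viewed as functions on A.
V = np.vander(f, N=A, increasing=True)
assert np.linalg.matrix_rank(V) == A         # the powers of f span R^A
```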