From the perspective of an individual robot, the environment is seen as a discrete random variable \(V\) with two possible values, Black and White. \(P(V=Black)=P_B\) is the probability that a random place in the arena is black and is therefore equal to the proportion of the arena area covered by black tiles. At a given point in time, a robot has made \(S\) observations of the environment; it can then compute the likelihood of a given hypothesis \(h\) for the value of \(P_B\) from its past observations, expressed as \(P(P_B=h|ob_1...ob_S)\). We can then use Bayes' rule:
$$\begin{aligned} P(P_B=h|ob_1...ob_S)=\frac{P(ob_1...ob_S|P_B=h)P(P_B=h)}{P(ob_1...ob_S)}. \end{aligned}$$
(1)
Here, \(P(P_B=h)\) is the prior and \(P(ob_1...ob_S)\) is the marginal likelihood, both of which we assume to be the same for all hypotheses. We can then apply the chain rule to \(P(ob_1...ob_S|P_B=h)\):
$$\begin{aligned} P(ob_1...ob_S|P_B=h)=P(ob_1|P_B=h)P(ob_2|P_B=h,ob_1)...P(ob_S|P_B=h,ob_1,...,ob_{S-1}). \end{aligned}$$
(2)
Assuming the observations are all independent of each other, we can remove the dependencies on previous observations, and thus, we have:
$$\begin{aligned} P(ob_1...ob_S|P_B=h)=P(ob_1|P_B=h)P(ob_2|P_B=h)...P(ob_S|P_B=h). \end{aligned}$$
(3)
In practice, we express the quality of option \(h\) as the normalized likelihood of the hypothesis given the previous observations. We arrange all the available hypotheses for the black tile fill ratio in a matrix \(H\), whose first column is the proportion of black tiles (\(P_B\)) and whose second column is the proportion of white tiles (\(1-P_B\)):
$$\begin{aligned} H=\begin{bmatrix}0.05 &{} 0.95\\ 0.15 &{} 0.85 \\ \vdots &{} \vdots \\ 0.95 &{} 0.05\end{bmatrix} \end{aligned}$$
(4)
An observation can be either \({\underline{ob}} = \begin{bmatrix}1 \\ 0\end{bmatrix}\) for a black tile or \({\underline{ob}} = \begin{bmatrix}0 \\ 1\end{bmatrix}\) for a white tile. The vector containing the qualities of all options for a robot can therefore be calculated as:
$$\begin{aligned} {\underline{\rho }}=\prod _{s=1}^{S} H \cdot {\underline{ob}}_s. \end{aligned}$$
(5)
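As a concrete illustration, the batch computation of Eq. (5) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation; the ten-hypothesis grid and the observation encoding follow Eqs. (4) and (5), while the sample sequence is hypothetical.

```python
import numpy as np

# Hypothesis matrix H from Eq. (4): one row per hypothesis,
# columns are (P_B, 1 - P_B).
p_b = np.arange(0.05, 1.0, 0.1)          # 0.05, 0.15, ..., 0.95
H = np.column_stack([p_b, 1.0 - p_b])

# Observations encoded as in the text: [1, 0] = black, [0, 1] = white.
black = np.array([1.0, 0.0])
white = np.array([0.0, 1.0])
observations = [black, white, black]      # hypothetical sample sequence

# Eq. (5): element-wise product over all samples of H . ob_s.
rho = np.prod([H @ ob for ob in observations], axis=0)

# Normalize to obtain the option qualities.
rho /= rho.sum()
```

With two black samples and one white sample, the likelihood \(h^2(1-h)\) peaks at \(h=2/3\), so the highest quality lands on the nearest grid hypothesis, \(P_B=0.65\).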
In practice, a robot can compute the qualities iteratively at very low cost in terms of computational power and memory. The robot starts with a prior belief that all options are equally likely. At a fixed sampling interval, it collects an observation sample of the arena floor and multiplies its old belief element-wise with the new set of probabilities:
$$\begin{aligned} {\underline{\rho }}_{n,0}=\begin{bmatrix}0.1&0.1&0.1&0.1&0.1&0.1&0.1&0.1&0.1&0.1\end{bmatrix}^T \end{aligned}$$
(6)
$$\begin{aligned} {\underline{\rho }}_{n,s}={\underline{\rho }}_{n,s-1} \circ (H \cdot {\underline{ob}}_s). \end{aligned}$$
(7)
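The iterative update of Eqs. (6) and (7) can be sketched as follows. This is an illustrative sketch rather than the paper's code; the renormalization inside `update` is our addition (Eq. (7) itself is unnormalized) to keep the qualities a distribution and avoid numerical underflow over many samples.

```python
import numpy as np

# Hypothesis matrix H as in Eq. (4).
p_b = np.arange(0.05, 1.0, 0.1)
H = np.column_stack([p_b, 1.0 - p_b])

# Eq. (6): uniform prior over the ten hypotheses.
rho = np.full(10, 0.1)

def update(rho, ob):
    """Eq. (7): Hadamard product of the old belief with H . ob_s,
    renormalized (our addition) so the qualities sum to one."""
    rho = rho * (H @ ob)
    return rho / rho.sum()

# Hypothetical stream: three black samples in a row.
for _ in range(3):
    rho = update(rho, np.array([1.0, 0.0]))
```

After three black samples the belief concentrates on the darkest hypothesis, \(P_B=0.95\), as expected from the likelihood \(h^3\).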
In order to maintain the independence assumption made in Eq. (3), we made the following design choices when implementing the considered algorithms. First, we chose a sampling interval large enough to avoid successive observation samples being collected on the same tile, which would cause a high level of correlation between them. In addition, in all considered algorithms, a robot can collect an observation sample only when it is moving forward and not when it is turning; this, too, avoids multiple samples being collected on one tile.
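The two design choices above amount to a simple gate on when a sample may be taken. The helper below is a hypothetical sketch (the state names, timestamps, and interval parameter are illustrative and not taken from the paper):

```python
def should_sample(motion_state, now, last_sample_time, interval):
    """Return True only if the robot is driving straight (not turning)
    and at least `interval` time units have passed since the last
    sample, so that successive samples land on different tiles."""
    return motion_state == "forward" and (now - last_sample_time) >= interval
```

A robot would call this in its control loop and record an observation only when it returns True, resetting `last_sample_time` afterwards.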