Training
The training procedure for the first processing-stage was as follows.
The PC/BC-DIM Algorithm
The main mathematical operation required to implement the PC/BC-DIM algorithm is the calculation of sums of products. The algorithm can therefore be equally simply implemented using matrix multiplication or convolution.
The matrix-multiplication version of PC/BC-DIM is illustrated in Fig.
1b and was implemented using the following equations:
$$ r = Vy $$
(1)
$$ e=x \oslash \left[r\right]_{\epsilon_{2}} $$
(2)
$$ y \leftarrow \left[y \right]_{\epsilon_{1}} \odot We $$
(3)
where x is an (m by 1) vector of input activations; e is an (m by 1) vector of error neuron activations; r is an (m by 1) vector of reconstruction neuron activations; y is an (n by 1) vector of prediction neuron activations; W is an (n by m) matrix of feedforward synaptic weight values, defined by the training process described in the “Training” section; V is an (m by n) matrix of feedback synaptic weight values; \(\left[v\right]_{\epsilon} = \max(\epsilon, v)\); \(\epsilon_1\) and \(\epsilon_2\) are parameters; ⊘ and ⊙ indicate element-wise division and multiplication, respectively; and ← means that the left-hand side of the equation is assigned the value of the right-hand side. The matrix V is equal to the transpose of W, but each column of V is normalized to have a maximum value of one. Hence, the feedforward and feedback weights are simply rescaled versions of each other.
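To make the update concrete, the following is a minimal NumPy sketch of one iteration of Eqs. 1 to 3; the function name pcbc_dim_step is hypothetical, and the (m by 1) and (n by 1) vectors are held as 1-D arrays.

```python
import numpy as np

def pcbc_dim_step(x, y, W, V, eps1, eps2):
    """One iteration of Eqs. 1-3 (all vectors held as 1-D NumPy arrays)."""
    r = V @ y                          # Eq. 1: reconstruction of the input
    e = x / np.maximum(r, eps2)        # Eq. 2: divisive error, [r]_eps2 = max(eps2, r)
    y = np.maximum(y, eps1) * (W @ e)  # Eq. 3: multiplicative update of the predictions
    return y, e, r
```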
The convolutional version of PC/BC-DIM was implemented using the following equations:
$$ R_{i}= \sum\limits_{j=1}^{p} \left( v_{ji} \star Y_{j}\right) $$
(4)
$$ E_{i}=X_{i} \oslash \left[R_{i}\right]_{\epsilon_{2}} $$
(5)
$$ Y_{j} \leftarrow \left[Y_{j}\right]_{\epsilon_{1}} \odot \sum\limits_{i=1}^{k} \left( w_{ji} \star E_{i}\right) $$
(6)
where \(X_i\) is a two-dimensional array representing channel i of the input; \(R_i\) is a two-dimensional array representing the network’s reconstruction of \(X_i\); \(E_i\) is a two-dimensional array representing the error between \(X_i\) and \(R_i\); \(Y_j\) is a two-dimensional array that represents the prediction neuron responses for a particular class, j, of prediction neuron; \(w_{ji}\) is a two-dimensional kernel representing the feedforward synaptic weights from a particular channel, i, of the input to a particular class, j, of prediction neuron, defined by the training process described in the “Training” section; \(v_{ji}\) is a two-dimensional kernel representing the feedback synaptic weights from a particular class, j, of prediction neuron to a particular channel, i, of the input; and ⋆ represents cross-correlation. The weights \(v_{ji}\) are equal to the weights \(w_{ji}\) but are rotated by 180° and are normalized so that, for each j, the maximum weight value across all i is equal to one. Hence, the feedforward weights, between a pair of error-detecting and prediction neurons, and the feedback weights, between the corresponding pair of reconstruction and prediction neurons, are simply rescaled versions of each other.
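The convolutional update can be sketched in the same way. The sketch below assumes SciPy’s correlate2d for the ⋆ operator and zero-padded “same”-size outputs (the boundary handling is an assumption, not specified above); the function name and the nested-list layout for channels and kernels are illustrative.

```python
import numpy as np
from scipy.signal import correlate2d

def pcbc_dim_conv_step(X, Y, w, v, eps1, eps2):
    """One iteration of Eqs. 4-6.
    X: list of k input-channel arrays; Y: list of p prediction-neuron arrays;
    w[j][i], v[j][i]: 2-D kernels as defined in the text."""
    k, p = len(X), len(Y)
    # Eq. 4: reconstruct each input channel from every class of prediction neuron
    R = [sum(correlate2d(Y[j], v[j][i], mode='same') for j in range(p))
         for i in range(k)]
    # Eq. 5: divisive error for each input channel
    E = [X[i] / np.maximum(R[i], eps2) for i in range(k)]
    # Eq. 6: update each class of prediction neurons from every error channel
    Y = [np.maximum(Y[j], eps1) *
         sum(correlate2d(E[i], w[j][i], mode='same') for i in range(k))
         for j in range(p)]
    return Y, E, R
```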
The matrix-multiplication and convolutional versions of PC/BC-DIM are interchangeable, and the method used in each case was whichever was most convenient for the task at hand. For example, the convolutional version was used when prediction neurons with identical RFs were required to be replicated at every pixel location in an image. To simplify the description of the proposed method, the rest of the text will refer only to the matrix-multiplication version of PC/BC-DIM.
For all the experiments described in this paper, \(\epsilon_1\) and \(\epsilon_2\) were given the values \(\epsilon_{1}=\frac{\epsilon_{2}}{\max\left(\tilde{V}\right)}\) (where \(\tilde{V}\) is a vector containing the sum of each row of V, i.e., the sums of the feedback weights targeting each reconstruction neuron) and \(\epsilon_{2} = 1\times10^{-2}\). Parameter \(\epsilon_1\) prevents prediction neurons from becoming permanently non-responsive. It also sets each prediction neuron’s baseline activity rate and controls the rate at which its activity increases when a new stimulus appears at the input to the network. Parameter \(\epsilon_2\) prevents division-by-zero errors and determines the minimum strength that an input is required to have in order to affect the prediction neuron responses. As in all previous work with PC/BC-DIM, these parameters have been given small values compared to typical values of y and x, and hence have negligible effects on the steady-state activity of the network. To determine this steady-state activity, the values of y were all set to zero, and Eqs. 1 to 3 were then iteratively updated, with the new values of y calculated by Eq. 3 substituted into Eqs. 1 and 3 to recursively calculate the neural activations. This process was terminated after 50 iterations, after which values of y less than 0.001 were set to zero. To perform simulations with a hierarchical model, the steady-state responses for the first processing-stage were determined; the first-stage prediction neuron responses were then provided as input to the second processing-stage, and Eqs. 1 to 3 were applied to the second processing-stage to determine its response.
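This procedure can be sketched as follows, reusing the hypothetical pcbc_dim_step function from the earlier sketch; the construction of V from W follows the description given above.

```python
import numpy as np

def pcbc_dim_steady_state(x, W, n_iter=50):
    """Run Eqs. 1-3 to steady state for a single processing-stage."""
    V = W.T / W.T.max(axis=0)   # feedback weights: W transposed, each column
                                # rescaled to have a maximum value of one
    eps2 = 1e-2
    v_tilde = V.sum(axis=1)     # row sums of V: total feedback weight
                                # targeting each reconstruction neuron
    eps1 = eps2 / v_tilde.max()
    y = np.zeros(W.shape[0])    # prediction neuron responses start at zero
    for _ in range(n_iter):     # terminated after 50 iterations
        y, e, r = pcbc_dim_step(x, y, W, V, eps1, eps2)
    y[y < 0.001] = 0.0          # suppress negligible responses
    return y
```

For the hierarchical model, the y returned by the first processing-stage would simply be passed as the x of the second.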
The values of y represent predictions of the causes underlying the inputs to the network. The values of r represent the expected inputs given the predicted causes. The values of e represent the discrepancy (or residual error) between the reconstruction, r, and the actual input, x. The full range of possible causes that the network can represent is defined by the weights, W (and V). Each row of W (which corresponds to the weights targeting an individual prediction neuron, i.e., its RF) can be thought of as a “dictionary element,” “basis vector,” “elementary component,” or “preferred stimulus,” and W as a whole can be thought of as a “dictionary” or “codebook” of possible representations, or a model of the external environment. The activation dynamics, described by Eqs. 1, 2, and 3, perform gradient descent on the reconstruction error in order to find prediction neuron activations that accurately reconstruct the input [14, 18, 62]. Specifically, the equations operate to find values for y that minimise the Kullback-Leibler (KL) divergence between the input (x) and the reconstruction of the input (r) [14, 63]. The activation dynamics thus result in the PC/BC-DIM algorithm selecting a subset of active prediction neurons whose RFs (which correspond to dictionary elements) best explain the underlying causes of the sensory input. The strength of activation reflects the strength with which each dictionary element is required to be present in order to accurately reconstruct the input. This strength of response also reflects the probability with which that dictionary element (the preferred stimulus of the active prediction neuron) is believed to be present, taking into account the evidence provided by the input signal and the full range of alternative explanations encoded in the RFs of the whole population of prediction neurons.
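For concreteness, this objective can be written out. The expression below is the standard generalized form of the KL divergence between two non-negative vectors; using this particular form is an assumption here, as the precise expression used in [14, 63] may differ by terms that do not depend on y:
$$ D_{KL}\left(x \,\|\, r\right)=\sum\limits_{i=1}^{m}\left(x_{i} \ln\frac{x_{i}}{r_{i}}-x_{i}+r_{i}\right) $$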
Compared to some earlier implementations of the PC/BC-DIM model, the algorithm described here differs in the following respects (contrasted in the sketch after this list):
1. The calculation of the reconstruction error (in Eq. 2) is performed using \(\max(\epsilon_2, r)\) rather than \(\epsilon_2 + r\).
2. The calculation of the prediction neuron responses (in Eq. 3) uses \(\max(\epsilon_1, y)\) rather than \(\epsilon_1 + y\).
3. The value of \(\epsilon_1\) is a function of the sum of the feedback weights targeting the reconstruction neurons, rather than a fixed value (such as \(1\times10^{-5}\)).
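In code, the first two changes amount to replacing additive offsets with element-wise maxima; a minimal sketch, using the same NumPy names as the earlier sketch:

```python
# Earlier implementations (additive offsets):
e = x / (eps2 + r)
y = (eps1 + y) * (W @ e)

# Current implementation (Eqs. 2 and 3, element-wise maxima):
e = x / np.maximum(r, eps2)
y = np.maximum(y, eps1) * (W @ e)
```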
These changes help PC/BC-DIM to scale up to very large networks of neurons. Specifically, for a very large population of prediction neurons, adding \(\epsilon_1\) to each prediction neuron response (even when \(\epsilon_1\) is very small) will cause the responses of the reconstruction neurons to be elevated, and the error neuron responses to be suppressed, which will in turn affect the prediction neuron responses. The second change above reduces this effect of \(\epsilon_1\) on the neural responses. The first and third changes allow \(\epsilon_1\) to be given the largest value possible (which speeds up convergence to the steady-state) while preventing \(\epsilon_1\) from affecting the responses.
In addition, in some earlier implementations of the PC/BC-DIM model, the reconstruction has been used purely as a means to calculate the errors, and hence Eqs. 1 and 2 have been combined into a single equation. Here, the underlying mathematical model is identical to that used in previous work, but the interpretation has changed in order to consider the reconstruction to be represented by a separate neural population. This change, therefore, has no effect on the current results. However, other recent results have shown that a separate neural population encoding the reconstruction can perform a useful computational role [42, 64, 65].