Input layer: this layer receives the preprocessed ultrasound image of papillary thyroid carcinoma.
PrimaryCaps initial capsule layer: the low-level features extracted by the preceding convolution are stored as vectors. In this layer, the spatial dimensions are reshaped into capsule vectors, preparing the input for the next layer.
DigitCaps digital capsule layer: this layer is connected to the initial capsule layer vector to vector, and its output is computed by the dynamic routing algorithm.
Output layer: the length of each output vector represents the probability that the content it represents is present. Thus, the classification result is given by the L2 norm of each capsule vector (a layer-stack sketch is given below).
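As a concrete illustration of this four-layer stack, a minimal sketch in PyTorch follows. The filter sizes, channel counts, and capsule dimensions (256 channels, 9x9 kernels, 8-D primary capsules, 16-D digit capsules) follow the original CapsNet of Sabour et al. and are assumptions, since this section does not report the hyperparameters used for the ultrasound images.

```python
import torch
import torch.nn as nn

class CapsNetStack(nn.Module):
    """Minimal sketch of the four-layer stack described above;
    all hyperparameters are assumed, not taken from the paper."""

    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        # Input layer + first convolution: extracts low-level features
        # from the preprocessed ultrasound image.
        self.conv1 = nn.Conv2d(in_channels, 256, kernel_size=9)
        # PrimaryCaps: a convolution whose output is reshaped into
        # 8-dimensional capsule vectors (32 capsule channels).
        self.primary = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)
        self.num_classes, self.digit_dim = num_classes, 16

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        u = self.primary(x)
        # Reshape the spatial grid into capsules: (batch, num_caps, 8).
        u = u.view(u.size(0), -1, 8)
        # DigitCaps would follow: Eq. (1) maps each u_i to predictions
        # u_hat_{j|i}, dynamic routing (Eqs. (2)-(5)) produces one 16-D
        # capsule per class, and class scores are the L2 norms ||v_j||.
        return u
```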
Each capsule unit in a capsule layer is represented by a vector that encodes both the pose parameters of the object and the probability that it belongs to the corresponding class. In a traditional convolutional network, the pooling layer is a weakness: during dimensionality reduction it discards important feature information, which reduces identification accuracy. The capsule network addresses this problem with the dynamic routing method [7]. In dynamic routing, the coupling between capsules in adjacent layers is updated according to the agreement between their outputs, which trades off each DigitCaps capsule unit against its share of the learning process. If a prediction is close to the actual output, the coupling between the corresponding capsule unit and the DigitCaps layer is strengthened; if the prediction and the actual output disagree, the coupling is weakened. The calculation process of dynamic routing is as follows:
$$\begin{aligned} \widehat{u}_{j|i}=W_{ij}{u_i}, \end{aligned}$$
(1)
where
\({u_i}\) is the output of the
ith capsule in the lower layer,
\({\widehat{u}_{j|i}}\) is the prediction vector for the
jth capsule in the next layer computed from the
ith capsule, and
\({W_{ij}}\) is a weight matrix learned by back propagation:
$$\begin{aligned} {c_{ij}} = \frac{\exp (b_{ij})}{\sum _k{\exp (b_{ik})}}, \end{aligned}$$
(2)
where
\(c_{ij}\) is the coupling coefficient between capsules in adjacent layers, and
\(b_{ij}\) is the routing logit representing how strongly the
ith capsule is coupled to the
jth capsule in the layer above. The initial value of
\(b_{ij}\) is set to 0 when the routing procedure starts, so the initial coupling coefficients are uniform:
$$\begin{aligned} {s_j} = \sum _i{c_{ij}}{\widehat{u}_{j|i}}, \end{aligned}$$
(3)
where
\({s_j}\) is the total input vector of the
jth capsule:
$$\begin{aligned} {v_j} = \frac{{{{\left\| {{s_j}} \right\| }^2}}}{{1 + {{\left\| {{s_j}} \right\| }^2}}}\frac{{{s_j}}}{{\left\| {{s_j}} \right\| }}, \end{aligned}$$
(4)
where
\({v_j}\) is the output of the
jth capsule. The factor
\(\frac{{{{\left\| {{s_j}} \right\| }^2}}}{{1 + {{\left\| {{s_j}} \right\| }^2}}}\) guarantees that the length of \({v_j}\) lies in the interval [0, 1), so it can be interpreted as a probability, while the unit vector
\(\frac{{{s_j}}}{{\left\| {{s_j}} \right\| }}\) preserves the direction of \({s_j}\). Equation (
4) is the non-linear "squash" activation function of the capsule network.
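As a concrete illustration, the squash non-linearity of (4) can be written as a short function; the small epsilon guarding against division by zero is an implementation detail, not part of (4).

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash activation of Eq. (4): keeps the direction of s_j
    and maps its length into [0, 1)."""
    norm = np.linalg.norm(s)
    scale = norm**2 / (1.0 + norm**2)   # length factor in [0, 1)
    return scale * s / (norm + eps)     # scaled unit vector

squash(np.array([0.1, 0.0]))   # short input -> length ~= 0.01
squash(np.array([10.0, 0.0]))  # long input  -> length ~= 0.99
```

After squashing, the routing logits are refined according to the agreement between each prediction vector and the resulting output: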
$$\begin{aligned} {b_{ij}} \leftarrow {b_{ij}} + {\widehat{u}_{j|i}} \cdot {v_j}, \end{aligned}$$
(5)
where the routing logit
\({b_{ij}}\) is updated by the inner product between the prediction
\({\widehat{u}_{j|i}}\) and the output
\({v_j}\); a large inner product (strong agreement) increases the degree of correlation between the two capsule layers, while a small one decreases it. Equations (2)-(5) are repeated for a fixed number of routing iterations.
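Combining (1)-(5) yields the complete routing procedure. The sketch below reuses the squash function from the previous sketch; the choice of three routing iterations follows common practice and is an assumption, as are the array shapes.

```python
import numpy as np

def softmax(b, axis):
    """Numerically stable softmax, as in Eq. (2)."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u, W, num_iters=3):
    """u: (num_in, in_dim) lower-layer capsule outputs u_i.
    W: (num_in, num_out, out_dim, in_dim) transformation matrices W_ij.
    Returns v: (num_out, out_dim) upper-layer capsule outputs v_j."""
    num_in, num_out = W.shape[0], W.shape[1]
    # Eq. (1): prediction vectors u_hat[i, j] = W_ij @ u_i.
    u_hat = np.einsum('ijkl,il->ijk', W, u)
    b = np.zeros((num_in, num_out))                # routing logits start at 0
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # Eq. (2): coupling coefficients
        s = np.einsum('ij,ijk->jk', c, u_hat)      # Eq. (3): weighted sum
        v = np.array([squash(sj) for sj in s])     # Eq. (4): squash
        b = b + np.einsum('ijk,jk->ij', u_hat, v)  # Eq. (5): agreement update
    return v
```

The classification module is then trained with the following margin loss: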
$$\begin{aligned} {L_c}&= {T_c}\max {(0,{m^ + } - \left\| {{v_c}} \right\| )^2} \nonumber \\&\quad +\lambda (1 - {T_c})\max {(0,\left\| {{v_c}} \right\| - {m^ - })^2}. \end{aligned}$$
(6)
In (
6),
\({L_c}\) is the loss function of the classification network module and
c is the class index.
\({T_{c}}\) is the indicator function of the classification:
\({T_{c}}=1\) when class
c is present, while
\({T_{c}}=0\) when
c is absent.
\({m^+}\) is the upper margin, which penalizes false negatives (a present class whose capsule length falls below
\({m^+}\));
\({m^-}\) is the lower margin, which penalizes false positives (an absent class whose capsule length exceeds
\({m^-}\)).
\(\lambda \) is a proportional coefficient that balances the two terms, down-weighting the loss contributed by absent classes.
\({m^+}\),
\({m^-}\), and
\(\lambda \) are hyperparameters set before the capsule network is trained.
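A direct transcription of (6), summed over classes, is given below. The values \({m^+}=0.9\), \({m^-}=0.1\), and \(\lambda =0.5\) are the commonly used defaults from the original capsule network and are assumptions here, since this section does not report the values chosen by the authors.

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Eq. (6), summed over all classes c.
    v_norms: lengths ||v_c|| of the output capsules, shape (num_classes,).
    targets: one-hot indicators T_c, shape (num_classes,)."""
    present = targets * np.maximum(0.0, m_pos - v_norms) ** 2               # false negatives
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2  # false positives
    return float(np.sum(present + absent))

# Example: class 0 is present but predicted weakly, so the m^+ term dominates.
margin_loss(np.array([0.3, 0.2]), np.array([1.0, 0.0]))  # ~0.365
```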
The single convolution layer of the capsule network has a limited ability to process feature information and cannot provide sufficiently detailed high-level features to the initial capsule layer. To solve this problem, a network model with better performance, named the ResCaps network model, is proposed in this paper.