
Open Access 01.07.2020 | Robust, Adaptive, and Network Control

Comparative Analysis of the Results of Training a Neural Network with Calculated Weights and with Random Generation of the Weights

Author: P.Sh. Geidarov

Published in: Automation and Remote Control | Issue 7/2020
DOI: https://doi.org/10.1134/S0005117920070048


Abstract

Neural networks based on metric recognition methods make it possible, from the initial conditions of a computer vision task such as the number of recognized patterns and samples, to determine the structure of the neural network (the number of neurons, layers, and connections) and to compute the values of the connection weights analytically. Being feedforward neural networks, they can also be trained by classical learning algorithms. The possibility of precomputing the weight values suggests that the procedure of creating and training such a feedforward neural network is faster than the classical scheme, in which the weights are initialized with random values. In this work, we conduct two experiments on the MNIST dataset of handwritten digits that confirm this statement.
Notes
An erratum to this article is available online at https://doi.org/10.1134/S0005117920120103.

1 Introduction

Neural networks are being widely used in the modern world, especially in computer vision problems. Despite this, in practice, the creation and training of neural networks remains a complex and often unpredictable operation. This is mainly due to the fact that the process of creating and training neural networks [1, 2] is not strictly defined, which leads to a number of difficulties and makes this process time-consuming. The difficulties lie in the choice of the structure of the neural network itself, as well as in the choice of training parameters.
The works [3, 4] propose a neural network architecture that implements metric recognition methods [5]. The structure of these networks, i.e., the number of neurons, connections, and layers, is strictly determined by the initial conditions of the problem for metric recognition methods [5], such as the number of samples and the number of recognized patterns. The values of the connection weights of these networks are also calculated analytically on the basis of metric proximity measures [5]. This already makes it possible to obtain a working neural network without using training algorithms. Neural networks based on metric recognition methods are a special case of the classical three- or four-layer multilayer perceptron, but their architecture makes it possible both to determine the structure of the network and to compute the weight values analytically. In addition, this architecture allows new samples and patterns to be added to the neural network in a cascade fashion without changing the previously computed weights, which also distinguishes these networks from classical feedforward networks, including deep convolutional networks [6, 7].
It is important to note that neural networks with predetermined weights do exist, for example, the Hopfield and Hamming networks [1, 2]. However, these are not feedforward networks: they contain feedback connections and therefore suffer from a number of difficulties and unsolved problems, including instability. In addition, fixing weight values is sometimes used in feedforward neural networks. This is the so-called "weight freezing" approach [8], used, for example, when training time must be reduced by freezing the weights of a hidden neuron whose output does not change significantly during training, or when a pretrained neural network is used and its weights serve to further configure the current network. These approaches, however, do not determine the values of the weights; they aim only to correct and accelerate the learning process.
Since neural networks based on metric recognition methods are feedforward networks, they can also be trained by classical learning algorithms [1, 2]. The works [9, 10] present a generalized algorithm for determining the weight values of the second and third layers of a fully connected neural network obtained from a neural network based on metric recognition methods (Fig. 1). This algorithm makes it possible to compute all ranges of weight and threshold values of the second and third layers for which the logic of the neural network in Fig. 1 remains unchanged. It was also suggested there that the computation of the weights followed by retraining of the neural network would be faster than the classical scheme, in which training starts from randomly generated weights. Such an assumption might be false, however, since it does not exclude the possibility that retraining could, on the contrary, destroy the efficiency provided by the precomputed weight values and thereby prolong the learning process even further. In other words, experimental confirmation is needed to verify the hypothesis put forward in [9, 10]. For this purpose, in this work we carry out a comparative experiment in which the same neural network is trained on MNIST both with calculated and with randomly generated weight values. Such an experiment also makes it possible to verify the operability of neural networks based on metric recognition methods on a large test set such as MNIST, as well as to check whether these networks can operate with continuous activation functions and, accordingly, be retrained with the backpropagation algorithm. The fundamental difference of this approach from classical methods is that the procedure of creating and training a neural network is accelerated by precomputing the connection weights, whereas in classical schemes acceleration is achieved by faster learning algorithms applied to randomly initialized weights.
Note that the ability to accelerate the creation and training of a neural network through precomputation of weights may be especially relevant for future neural networks approaching the capabilities of biological neural networks, where the number of recognized patterns will be significant or even huge.
The goal of this work is to carry out two experiments on training the same neural network with the same number of training epochs: one with precomputed weights and one with weights initialized by random numbers. The dataset is the MNIST dataset of handwritten digits. The final goal is to compare the results of the two experiments in order to evaluate both the digit recognition performance on the MNIST dataset and the total time spent on creating and training the neural network.

2 Fundamentals of neural networks based on metric recognition methods

Metric recognition methods are methods that determine whether a recognized object belongs to a particular pattern in some feature coordinate system, based on the smallest proximity metric to a sample or a group of samples of a pattern [5]. Here, a "sample" refers to a selected representative of each pattern in the available data. The operation of metric recognition methods rests on the compactness hypothesis, which assumes that elements of one class (pattern) are close to each other in some feature coordinate system. Different metric expressions can serve as proximity characteristics for one point (cell of a table, pixel), for example, mean squared difference expressions (2.1), (2.2):
$${w}_{ij,k}={\left({y}_{etal}-j\right)}^{2},$$
(2.1)
where wij,k is the weight in the weight table with coordinates (i, j) for the kth neuron of the zero layer, and yetal is the coordinate of the active cell of the binary matrix that is nearest vertically to the coordinate j. Expression (2.1) is applicable, for example, to the curve recognition problem.
$${w}_{c,r}^{(0)}={d}_{1}^{2}=\left({\left({c}_{1}-{c}_{p}\right)}^{2}+{\left({r}_{1}-{r}_{p}\right)}^{2}\right).$$
(2.2)
Formula (2.2) is another possible expression for determining the weight value of a zero layer neuron, where (cp, rp) are the coordinates of the point (cell of the weight table) for which the weight value is calculated, and (c1, r1) are the coordinates of the active point (cell of the weight table) nearest to the point (cell) with coordinates (cp, rp) (Fig. 2a).
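To make expression (2.2) concrete, the following is a minimal C++ sketch of how a zero layer weight table could be computed for one part of a sample's binary matrix; the function and type names are illustrative and not taken from the paper's software module.

```cpp
#include <vector>
#include <limits>

using BinaryMatrix = std::vector<std::vector<int>>;
using WeightTable  = std::vector<std::vector<double>>;

// Weight of each cell (rp, cp): squared distance to the nearest active cell of the sample, expression (2.2).
WeightTable zeroLayerWeightTable(const BinaryMatrix& sample) {
    const int R = static_cast<int>(sample.size());
    const int C = static_cast<int>(sample[0].size());
    WeightTable w(R, std::vector<double>(C, 0.0));
    for (int rp = 0; rp < R; ++rp)
        for (int cp = 0; cp < C; ++cp) {
            double d2min = std::numeric_limits<double>::max();
            for (int r1 = 0; r1 < R; ++r1)
                for (int c1 = 0; c1 < C; ++c1)
                    if (sample[r1][c1] == 1) {
                        const double d2 = (c1 - cp) * (c1 - cp) + (r1 - rp) * (r1 - rp);
                        if (d2 < d2min) d2min = d2;
                    }
            w[rp][cp] = d2min;  // d1^2 for the nearest active point (c1, r1)
        }
    return w;
}
```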
Metric methods include the sample construction method (the method of samples), the nearest neighbor method, the k-nearest neighbors algorithm (k-NN), the method of potential functions, and so on [5]. The neural network architecture shown in Fig. 1 implements the nearest neighbor method, whose algorithm is as follows (a minimal code sketch is given after the list):
  • compute the proximity characteristic (coefficient) for each sample, for example, by expressions (2.1), (2.2);
  • find the minimum value of the proximity coefficient;
  • from the pattern (class) to which the nearest sample belongs, determine the label of the recognized object.
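A minimal sketch of these three steps, assuming the proximity coefficients of all samples have already been computed (e.g., by (2.1) or (2.2)); the names are illustrative.

```cpp
#include <vector>
#include <algorithm>

// Returns the pattern (class) label assigned to the recognized object.
int nearestNeighborLabel(const std::vector<double>& proximity,  // step 1: proximity coefficient of each sample
                         const std::vector<int>& patternOf) {   // pattern label of each sample
    // step 2: the sample with the minimum proximity coefficient
    const auto it = std::min_element(proximity.begin(), proximity.end());
    const int kMin = static_cast<int>(it - proximity.begin());
    // step 3: the recognized object takes the pattern of the nearest sample
    return patternOf[kMin];
}
```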
In the neural network shown in Fig. 1, each selected sample corresponds to one neuron in the zero layer, and according to item 1 of the nearest neighbor algorithm, in each zero layer neuron we find the total value of the proximity coefficient between the input element X and the sample that this neuron corresponds to:
$$S{n}_{k}^{(0)}=\mathop{\sum }\limits_{i=1}^{R}\mathop{\sum }\limits_{j=1}^{C}{x}_{ij}{w}_{i,j},$$
(2.3)
where \(S{n}_{k}^{(0)}\) is the value of the state function of the kth zero layer neuron, xij is the value of the binary matrix cell of the input element being recognized, and R, C are the numbers of rows and columns of the weight table and the binary matrix. Zero layer neurons are linear neurons, i.e., neurons whose activation function is equal to the state function of the neuron:
$$f\left(S{n}_{k}^{(0)}\right)=S{n}_{k}^{(0)}.$$
(2.4)
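A sketch of the zero layer state function (2.3), (2.4): a linear neuron whose output is the weighted sum of the binary input matrix with the neuron's weight table (names are illustrative).

```cpp
#include <vector>

double zeroLayerState(const std::vector<std::vector<int>>& x,      // binary matrix of the input element X
                      const std::vector<std::vector<double>>& w) { // weight table of the kth zero layer neuron
    double Sn = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x[i].size(); ++j)
            Sn += x[i][j] * w[i][j];    // expression (2.3)
    return Sn;                          // linear neuron: f(Sn) = Sn, expression (2.4)
}
```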
Further, according to item 2 of the nearest neighbor method, the sample (zero layer neuron) with the minimum value of the proximity coefficient \(S{n}_{k}^{(0)}\) is determined; it corresponds to the sample nearest to the element X being recognized. To do this, the first layer performs a pairwise comparison of all outputs of the zero layer (Fig. 1b); for example, for the first neuron of the first layer the state function is given by the expression
$${Sn}_{1}^{\left(1\right)}={w}_{2}^{(1)}f\left({Sn}_{2}^{\left(0\right)}\right)-{w}_{1}^{(1)}f\left({Sn}_{1}^{\left(0\right)}\right),$$
(2.5)
where \({w}_{2}^{(1)}={w}_{1}^{(1)}=1\), and the activation function for this neuron is determined by the conditions:
$$\begin{array}{l}f\left({Sn}_{1}^{\left(1\right)}\right)=1,\quad {\rm{if}}\quad {Sn}_{1}^{\left(1\right)}<0,\\ f\left({Sn}_{1}^{(1)}\right)=0,\quad {\rm{if}}\quad {Sn}_{1}^{(1)}>0.\end{array}$$
(2.6)
A second layer neuron (Fig. 1c) performs the summation of all outputs of the first layer corresponding to one sample (one neuron of the zero layer):
$${Sn}_{k}^{\left(2\right)}=\mathop{\sum }\limits_{j=1,j\ne k}^{N}f\left({Sn}_{k,j}^{\left(1\right)}\right),$$
(2.7)
The active output of the second layer is selected using the threshold value H(2) = N − 1 and determines the sample nearest to the recognized object X:
$$\begin{array}{l}f\left({Sn}_{k}^{(2)}\right)=1,\quad {\rm{if}}\quad {Sn}_{k}^{(2)}\ge \left(N-1\right)={H}^{\left(2\right)},\\ f\left({Sn}_{k}^{(2)}\right)=0,\quad {\rm{if}}\quad {Sn}_{k}^{(2)}<\left(N-1\right)={H}^{(2)}.\end{array}$$
(2.8)
Each kth neuron on the third layer of the neural network sums the outputs of second layer neurons belonging to the samples of a single kth pattern
$${Sn}_{k}^{\left(3\right)}=\mathop{\sum }\limits_{i\in k}^{{K}_{k}}f\left({Sn}_{i}^{\left(2\right)}\right),$$
(2.9)
where Kk is the number of samples of the kth recognized pattern, and checks whether at least one of its inputs is active using the activation function:
$$\begin{array}{l}f\left({Sn}_{k}^{\left(3\right)}\right)=1,\quad {\rm{if}}\quad {Sn}_{k}^{\left(3\right)}>0,\\ f\left({Sn}_{k}^{\left(3\right)}\right)=0,\quad {\rm{if}}\quad {Sn}_{k}^{\left(3\right)}\le 0.\end{array}$$
(2.10)
Values of the weights for each input of second and third layer neurons are calculated either according to the generalized algorithm for determining the weights of the second and third layers given in [9, 10], or in the simplest case they are taken to be equal to one:
$${w}_{i,j}^{\left(2\right)}={w}_{i,j}^{\left(3\right)}=1.$$
(2.11)
Thus, according to item 3 of the nearest neighbor method, the number of the active output of the third layer determines the nearest pattern for the recognized element X.
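The following sketch puts layers 1-3 together with threshold activations, expressions (2.5)-(2.10). It assumes the indexing convention that the first layer neuron feeding second layer neuron k and comparing it with sample j has the state Sn_k(0) − Sn_j(0) with unit weights, so that neuron k of the second layer fires only when sample k has the minimum proximity coefficient; the names are illustrative.

```cpp
#include <vector>

std::vector<int> thirdLayerOutputs(const std::vector<double>& Sn0,    // zero layer outputs, one per sample
                                   const std::vector<int>& patternOf, // pattern index of each sample
                                   int nPatterns) {
    const int N = static_cast<int>(Sn0.size());
    // layers 1 and 2: pairwise comparisons (2.5)-(2.6) and their summation (2.7)-(2.8)
    std::vector<int> f2(N, 0);
    for (int k = 0; k < N; ++k) {
        int sum = 0;
        for (int j = 0; j < N; ++j) {
            if (j == k) continue;
            const double Sn1 = Sn0[k] - Sn0[j];   // state of the first layer neuron for the pair (k, j)
            sum += (Sn1 < 0.0) ? 1 : 0;           // activation (2.6)
        }
        f2[k] = (sum >= N - 1) ? 1 : 0;           // activation (2.8) with threshold H(2) = N - 1
    }
    // layer 3: each pattern neuron checks whether any of its samples is active, (2.9)-(2.10)
    std::vector<int> f3(nPatterns, 0);
    for (int k = 0; k < N; ++k)
        if (f2[k] == 1) f3[patternOf[k]] = 1;
    return f3;                                    // index of the active output = nearest pattern
}
```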
Figure 1 shows an extended scheme of a neural network based on metric recognition methods that implements the nearest neighbor method. A neural network is built on the basis of a set of selected samples and the number of patterns (classes) recognized in the task.
In the diagram in Fig. 1, the number of neurons in the second layer is equal to the number of samples used, n(2) = N, and the number of neurons in the third layer corresponds to the number of recognized patterns, n(3) = Npatt. The number of neurons in the first layer of the extended network in Fig. 1b is determined by the number of all possible ordered pairs of samples:
$${n}^{(1)}=N(N-1).$$
(2.12)
The neural network in Fig. 1 may also be built without a zero layer; in that case, a weight table is computed for each first layer neuron, and the weight values of first layer neurons are calculated analytically from metric proximity expressions for the two compared samples (zero layer neurons), for example, by the expression
$${w}_{c,r}^{(1)}={d}_{1}^{2}-{d}_{2}^{2}=\left({\left({c}_{1}-{c}_{p}\right)}^{2}+{\left({r}_{1}-{r}_{p}\right)}^{2}\right)-\left({\left({c}_{2}-{c}_{p}\right)}^{2}+{\left({r}_{2}-{r}_{p}\right)}^{2}\right),$$
(2.13)
where (cp, rp) are the coordinates of the point (cell of the weight table) for which the weight value is calculated, and (c1, r1) and (c2, r2) are the coordinates of the active points (cells of the weight table) of the two compared samples nearest to the point (cell) with coordinates (cp, rp) (Figs. 2a and 2b). Simpler or more complex proximity expressions than (2.13) can also be used. In this work, expression (2.13) is used to create the neural network.
Note that other metric recognition methods can also be implemented based on the diagram in Fig. 1 with minor additions and corrections.

3 Constructing a neural network and computing weights

In this work, we construct a neural network based on MNIST. This means that the set of samples is drawn from the MNIST dataset (Fig. 3), and the dimensions of the weight tables are determined by the dimensions of the MNIST image matrix, doubled to 28 × 56. Further retraining of the network is also based on MNIST.
Recall that the MNIST dataset consists of a training set of 60 000 images of handwritten digits and a control (test) set of 10 000 images. Each set is accompanied by its own list of digit labels whose order matches the order of the images in the training and test sets. The digit images are described by matrices of dimension 28 × 28, where each number in the matrix specifies the intensity of one pixel of the digit image in the range [0, 255].
Three digit images from each pattern were selected as samples, 30 samples in total; they are shown in Fig. 4. The samples were selected intuitively from the first 250 digit images of the MNIST test set. In Fig. 4, the name of the image is shown above each selected sample, for example 2_1, where the first number indicates the pattern to which the digit image belongs and the second number is the serial number of this image in the MNIST database. Here and below, MNIST images are denoted in this way. Since the number of samples is 30, in accordance with the architecture of neural networks implementing metric recognition methods, the number of neurons in the second layer is also 30, where the ith output corresponds to the ith sample and determines whether the recognized image belongs to the ith sample.
The samples are arranged in the order shown in Fig. 4: first the column with images of the digit "0," then the digit "1," and so on. The network diagram in Fig. 1 determines the outputs of the second layer accordingly. For example, sample 0_157 corresponds to output k = 0 of the second layer, sample 5_23 corresponds to output k = 5 × 3 + 3 − 1 = 17, and so on. The number of neurons in the third layer is equal to the number of recognized digit patterns, n(3) = 10. The ith output of the third layer determines whether the recognized element belongs to the ith digit pattern. The patterns are ordered sequentially from 0 to 9.
The number of neurons on the first layer is determined by the expression (2.12):
$${n}^{(1)}=30\times 29=870.$$
(3.1)
To simplify and speed up the computation of the first layer weight tables, the zero layer weight tables [4] are first calculated by formula (2.2), and the first layer weight tables are then calculated from them by (2.13).
During recognition or training, a binary matrix consisting of two parts is constructed for each input image. In the first part of the binary matrix, unit values mark the bright pixels of the image, with intensity values > 150, while zeros correspond to dark pixels with intensity values < 150. The second part of the matrix is its mirror opposite; it marks as active (=1) the dark image pixels (< 150) and as inactive (=0) the bright image pixels (> 150). The dimension of the binary matrix is 28 × 56. Accordingly, each zero layer weight table matches the binary matrix in size and also consists of two parts, as shown in Fig. 5. Figure 5 shows that, in contrast to the input binary matrix, in the zero layer weight tables the active pixels of the image correspond to zero weight values. A separate zero layer weight table, similar to the one in Fig. 5, is computed for each sample.
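A sketch of this binary matrix construction, assuming a 28 × 28 MNIST intensity matrix; the two parts of the 28 × 56 matrix are represented here as two 28 × 28 halves, and pixels with intensity exactly 150 are treated as dark. The names are illustrative.

```cpp
#include <vector>
#include <cstdint>

using Gray = std::vector<std::vector<std::uint8_t>>;

// Returns bin[part][r][c]: part 0 marks bright pixels (> 150), part 1 is its mirror (dark pixels).
std::vector<std::vector<std::vector<int>>> buildBinaryMatrix(const Gray& image) {
    const int R = static_cast<int>(image.size());
    const int C = static_cast<int>(image[0].size());
    std::vector<std::vector<std::vector<int>>> bin(
        2, std::vector<std::vector<int>>(R, std::vector<int>(C, 0)));
    for (int r = 0; r < R; ++r)
        for (int c = 0; c < C; ++c) {
            bin[0][r][c] = (image[r][c] > 150) ? 1 : 0;  // first part: bright (active) pixels
            bin[1][r][c] = 1 - bin[0][r][c];             // second part: mirrored, dark pixels
        }
    return bin;
}
```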
The weight table of a first layer neuron is obtained from the zero layer weight tables by subtracting the weight tables of the two pairwise compared zero layer neurons:
$${\overline{W}}_{i,j}^{(1)}={\overline{W}}_{i}^{(0)}-\,{\overline{W}}_{j}^{(0)}.$$
(3.2)
Since the weight values of the first layer are large, mostly lying in the range [0, 100], which is unnatural for the backpropagation algorithm, each first layer weight value is divided by 100. Figure 6 shows a sample first layer weight table calculated for the pair of samples 2_172 and 5_102. In this way, a weight table is calculated for each first layer neuron, 870 tables in total. Note that zero layer neurons are not used in the subsequent training of the neural network; only the resulting three-layer network consisting of the first, second, and third layers is considered (Figs. 1b–1d).
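A sketch of expression (3.2) together with the scaling step: the weight table of the first layer neuron comparing samples i and j is the element-wise difference of their zero layer tables divided by 100; the names are illustrative.

```cpp
#include <vector>

using Table = std::vector<std::vector<double>>;

// Expression (3.2) followed by scaling: W1 = (W0_i - W0_j) / 100.
Table firstLayerTable(const Table& w0_i, const Table& w0_j) {
    Table w1(w0_i.size(), std::vector<double>(w0_i[0].size(), 0.0));
    for (std::size_t r = 0; r < w0_i.size(); ++r)
        for (std::size_t c = 0; c < w0_i[r].size(); ++c)
            w1[r][c] = (w0_i[r][c] - w0_j[r][c]) / 100.0;  // scale to a range suited to backpropagation
    return w1;
}
```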
Before training the neural network in Fig. 1 with the backpropagation algorithm, some transformations of the network are necessary. In Figs. 1b–1d the neural network is a feedforward network, but its second and third layers are not fully connected, so the network itself is not fully connected. The works [9, 10] present a generalized algorithm for creating a fully connected neural network and calculating the weight values of the second and third layers that preserve the initial logic of the network. In the simplest case, however, a fully connected neural network can be obtained from the scheme in Fig. 1 by adding all the missing links of the second and third layers with weights equal to zero. The weights of the links already present in Fig. 1 remain, as before, equal to one, so the logic of the network in Fig. 1 does not change. Figure 7 shows the weight values of the neurons of the second and third layers arranged horizontally in rows: each neuron of the second and third layers corresponds to a row of link weight values consisting of ones and zeros. For example, in Fig. 7b the number of consecutive ones in a row is determined by the number of samples belonging to the same digit pattern; in this example it is the same for all patterns and equals three (Fig. 4). In Fig. 7 the threshold values of the neurons are shown above each row. For the second layer, according to expression (2.8), this value is the same for all neurons and equals H(2) = N − 1 = 30 − 1 = 29; for the third layer, H(3) = 0. The threshold represented as a weight is opposite in sign to the threshold value H (Wh2 = −H = −29).
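In this simplest case, the fully connected second and third layers can be initialized directly. The sketch below (with illustrative names) sets the missing links to zero, keeps the existing links at one, and stores the thresholds as bias weights of opposite sign (Wh2 = −(N − 1), Wh3 = 0). It assumes the samples are ordered by pattern with samplesPerPattern samples per pattern, as in Fig. 4.

```cpp
#include <vector>

struct Layers23 {
    std::vector<std::vector<std::vector<double>>> W2;  // W2[i][k][k1]: second layer neuron i to first layer neuron (k, k1)
    std::vector<double> Wh2;                            // bias of each second layer neuron
    std::vector<std::vector<double>> W3;                // W3[i][k]: third layer neuron i to second layer neuron k
    std::vector<double> Wh3;                            // bias of each third layer neuron
};

Layers23 initLayers23(int N, int nPatterns, int samplesPerPattern) {
    Layers23 L;
    L.W2.assign(N, std::vector<std::vector<double>>(N, std::vector<double>(N, 0.0)));
    L.Wh2.assign(N, -static_cast<double>(N - 1));       // threshold stored as a negative bias weight
    for (int i = 0; i < N; ++i)
        for (int k1 = 0; k1 < N; ++k1)
            if (k1 != i) L.W2[i][i][k1] = 1.0;           // neuron i listens only to its own pairwise comparisons
    // entries with k == k1 are unused: no such first layer neuron exists
    L.W3.assign(nPatterns, std::vector<double>(N, 0.0));
    L.Wh3.assign(nPatterns, 0.0);
    for (int i = 0; i < nPatterns; ++i)
        for (int s = 0; s < samplesPerPattern; ++s)
            L.W3[i][i * samplesPerPattern + s] = 1.0;    // consecutive unit weights, as in Fig. 7b
    return L;
}
```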
As the activation function for neurons we used the sigmoid activation function
$$f\left(Sw\right)=\frac{1}{1+{e}^{-Sw}}.$$
(3.3)
Since in the network diagram in Fig. 1 all the weight values of the second and third layer neurons are positive, all third layer outputs calculated with the sigmoid activation function are greater than 0.5. This can be seen in Table 1, which shows the third layer outputs for the recognition of image 2_174 (Fig. 8). In these experiments, the active output of the third layer was determined by the comparison rule of the highest output value Yi of the neural network; for example, in Table 1 the highest value corresponds to output 2.
Table 1
Outputs of the third layer of the neural network with threshold and sigmoidal activation function for recognizing image 2_174 (Fig. 8) from the MNIST test set
Output | With threshold activation | With sigmoidal activation
       | Sw3        Yout           | Sw3                    Yout
0      | 0          0              | 0.000270242677595017   0.500067560668988
1      | 0          0              | 3.43951083407716E-6    0.500000859877708
2      | 1          1              | 0.0526576299057448     0.513161366435398
3      | 0          0              | 0.0109778057331013     0.502744423871947
4      | 0          0              | 9.28464982946175E-6    0.500002321162457
5      | 0          0              | 0.000934588149375769   0.500233647020337
6      | 0          0              | 0.000250751415905857   0.500062687853648
7      | 0          0              | 1.20144980966343E-7    0.500000030036245
8      | 0          0              | 0.014626235919069      0.503656493794839
9      | 0          0              | 0.00149247436075622    0.50037311852093
All calculations described in this article were carried out on a single computer in the software module shown in Fig. 3, implemented in the C++ Builder environment. The total time spent on the entire process of creating the neural network and calculating all of its weights in the software module in Fig. 3 was tconstruct = 0.5469 s, i.e., less than a second.
Table 2 shows the recognition results on the MNIST test set (10 000 images) for the resulting neural network with both threshold and sigmoid activation functions. The number and percentage of correctly identified test objects are given separately for each digit (sj, pj, where j is the digit), together with the total number ij of test images of digit j in the MNIST test set.
Table 2
Recognition results for the MNIST test set (10 000 images) without training
Digit j | Correctly recognized, sj | Test images, ij | Percentage, pj
0       | 834                      | 980             | 85%
1       | 968                      | 1135            | 85%
2       | 530                      | 1032            | 51%
3       | 454                      | 1010            | 44%
4       | 410                      | 982             | 41%
5       | 411                      | 892             | 46%
6       | 586                      | 958             | 61%
7       | 556                      | 1028            | 54%
8       | 773                      | 974             | 79%
9       | 750                      | 1009            | 74%
Total   | s = 6272                 | i = 10 000      | p = 62%
The data in Table 2 show that the neural network with the threshold activation function correctly identified 62% of the MNIST test images. The result provided by the precomputed weight values is also preserved for the sigmoid activation function: recognition of the MNIST test set with the sigmoid activation function and the highest-output-value rule Yi yields a result identical to that of the network with the threshold activation function, also 62%.
It should also be noted that in these experiments, for the sigmoid activation function, the weight threshold was raised from Wh2 = −29 to Wh2 = −27 in order to increase the throughput of the second layer neurons, since, unlike the threshold activation function, the sigmoid function tends to unity but never reaches it.
This change may not have been necessary, since the recognition accuracy on the MNIST test set with the sigmoid activation function and the highest-output-value rule is the same for both Wh2 = −27 and Wh2 = −29 and equals 62%, i.e., the same as the test result of the network with the threshold activation function.
The neural network was trained by the stochastic backpropagation algorithm on the MNIST training set (60 000 images). Weights were corrected after each image presented to the network inputs only if a recognition error occurred at the output of the neural network; if there was no error, no update was made. During backpropagation, the value ycorr = 0.7 was taken as the correct value of the active output and ycorr = 0.2 as the correct value of an inactive output, i.e., during training the network outputs were pulled toward these values. In each training experiment, three epochs were used; the first two epochs were trained with the learning rate nk = 0.1 and the last epoch with nk = 0.02. The learning error Serr was computed for each epoch by the formula
$${S}_{err}=\frac{1}{2}\mathop{\sum }\limits_{i=0}^{P}\mathop{\sum }\limits_{k=0}^{{N}_{{\rm{patt}}}-1}{\left({y}_{k}^{(corr)}-f\left(S{n}_{k}^{\left(3\right)}\right)\right)}^{2},$$
(3.4)
where \({y}_{k}^{(corr)}\) is the correct value of the kth output of the third layer (\({y}_{k}^{(corr)}=0.7\) for the active output and \({y}_{k}^{(corr)}=0.2\) for an inactive one), and P is the number of incorrectly identified images of the MNIST training set for which the weights were corrected by the backpropagation algorithm during the epoch. In the second experiment, the same neural network was trained, but this time all weight values were randomly generated in the range [−0.5; 0.5].
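A sketch of the per-image contribution to the learning error, cf. (3.4) and (3.8), assuming the target is 0.7 for the output of the correct pattern and 0.2 for the others; the factor 1/2 of (3.4) is omitted here, as in (3.8), and the names are illustrative.

```cpp
#include <vector>

// Squared error of one image, to be accumulated into Serr.
double imageError(const std::vector<double>& Yout, int correctPattern) {
    double s = 0.0;
    for (std::size_t i = 0; i < Yout.size(); ++i) {
        const double ycorr = (static_cast<int>(i) == correctPattern) ? 0.7 : 0.2;
        s += (ycorr - Yout[i]) * (ycorr - Yout[i]);
    }
    return s;
}
```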
The stochastic backpropagation algorithm implemented in the software unit shown in Fig. 3 operates as follows (a minimal sketch of the third layer update, steps 4 and 5, is given after the list).
1. In sequential order, choose an image from the MNIST training set, construct for it a binary image matrix, and feed it to the input of the neural network with precomputed or randomly generated weights.
2. Perform forward propagation through the neural network with subsequent calculation of the values of state functions and neuron activations using the expressions (2.5)–(2.10), starting from the first layer and ending with the third.
3. The outputs of the neural network are checked according to the rule of the highest value of the network output. If the active output of the neural network corresponds to the established output of the pattern to which the current image belongs, then go to step 1; if it does not match, then the error propagates back over the network from the third layer to the first, and the algorithm goes to step 4.
4. For each ith neuron of the third layer, a new weighted threshold value (bias) is calculated:
$${\rm{Wh}}3[{\mathtt{i}}][0]={\rm{Wh}}3[{\mathtt{i}}][0]+{\rm{dWh}}3[{\mathtt{i}}][0],$$
(3.5)
where dWh3[i][0] is the increment of the weight threshold value of the ith neuron of the third layer
$${\rm{dWh}}3[{\mathtt{i}}][0]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}3[{\mathtt{i}}];$$
(3.6)
here nk is the learning rate coefficient (field nk in Fig. 3), and Sig3[i] is the error for the ith output of the third layer of the neural network defined by the following expression:
$${\rm{Sig}}3[{\mathtt{i}}]=({\rm{Ycorr}}-{\rm{Yout}}[{\mathtt{i}}])\times {\rm{Yout}}[{\mathtt{i}}]\times (1-{\rm{Yout}}[{\mathtt{i}}]),$$
(3.7)
where Yout[i] is the current value of the ith output of the third layer and Ycorr is its expected value: Ycorr = 0.7 for the expected correct output and Ycorr = 0.2 for an expected incorrect output. At the same step, the contribution of this image to the total quadratic error Serr (3.4) is calculated and added to the accumulated value of Serr:
$${S}_{err}={S}_{err}+\mathop{\sum }\limits_{i=0}^{{N}_{{\rm{patt}}}-1}{\left({\rm{Ycorr}}-{\rm{Yout}}[{\mathtt{i}}]\right)}^{2};$$
(3.8)
5. The new weight values for the third layer are calculated by the expression:
$${\rm{W}}3[{\mathtt{i}}][{\mathtt{k}}]={\rm{W}}3[{\mathtt{i}}][{\mathtt{k}}]+{\rm{dW}}3[{\mathtt{i}}][{\mathtt{k}}],$$
(3.9)
where dW3[i][k] is the increment of the weight value W3[i][k] connecting the kth neuron of the second layer and the ith neuron of the third layer, and is given by the expression
$${\rm{dW}}3[{\mathtt{i}}][{\mathtt{k}}]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}3[{\mathtt{i}}]\times {\rm{F}}2[{\mathtt{k}}],$$
(3.10)
where F2[k] is the output of the kth neuron of the second layer. Weighted threshold values of the neurons of the second layer are calculated:
$${\rm{Wh}}2[{\mathtt{k}}][0]={\rm{Wh}}2[{\mathtt{k}}][0]+{\rm{dWh}}2[{\mathtt{k}}][0],$$
(3.11)
where dWh2[k][0] is the increment of the weight threshold value Wh2[k][0] for the kth neuron of the second layer, defined by the expression:
$${\rm{dWh}}2[{\mathtt{k}}][0]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}2[{\mathtt{k}}];$$
(3.12)
here the error is
$${\rm{Sig}}2\left[{\mathtt{k}}\right]=\mathop{\sum }\limits_{i=0}^{{N}_{{\rm{patt}}}-1}\left({\rm{Sig}}3\left[{\mathtt{i}}\right]\times {\rm{W}}3\left[{\mathtt{i}}\right]\left[{\mathtt{k}}\right]\right).$$
(3.13)
6. The new weight values for the second layer are calculated by the expression
$${\rm{W}}2[{\mathtt{i}}][{\mathtt{k}}][{\mathtt{k}}1]={\rm{W}}2[{\mathtt{i}}][{\mathtt{k}}][{\mathtt{k}}1]+{\rm{dW}}2[{\mathtt{i}}][{\mathtt{k}}][{\mathtt{k}}1],$$
(3.14)
where dW2[i][k][k1] is the increment of the weight value W2[i][k][k1] for the connection between the ith neuron of the second layer and the first layer neuron that performs the identification of a pair of samples k and k1. dW2[i][k][k1] is calculated by the following expression:
$${\rm{dW}}2[{\mathtt{i}}][{\mathtt{k}}][{\mathtt{k}}1]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}2[{\mathtt{i}}]\times {\rm{F}}1[{\mathtt{k}}][{\mathtt{k}}1],$$
(3.15)
where F1[k][k1] is the output of the first layer neuron that compares samples k and k1.
Weighted threshold values for first layer neurons are calculated as
$${\rm{Wh}}1[{\mathtt{k}}][{\mathtt{k}}1][0]={\rm{Wh}}1[{\mathtt{k}}][{\mathtt{k}}1][0]+{\rm{dWh}}1[{\mathtt{k}}][{\mathtt{k}}1][0],$$
(3.16)
where dWh1[k][k1][0] is the increment of the weight threshold of the value Wh1[k][k1][0] for the first layer neuron, which recognizes samples k and k1
$${\rm{dWh}}1[{\mathtt{k}}][{\mathtt{k}}1][0]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}1[{\mathtt{k}}][{\mathtt{k}}1],$$
(3.17)
where the error is
$${\rm{Sig}}1\left[{\mathtt{k}}\right]\left[{\mathtt{k}}1\right]=\mathop{\sum }\limits_{i=0}^{N-1}\left({\rm{Sig}}2\left[{\mathtt{i}}\right]\times {\rm{W}}2\left[{\mathtt{i}}\right]\left[{\mathtt{k}}\right]\left[{\mathtt{k}}1\right]\right).$$
(3.18)
7. New weight values are calculated separately for the two parts of the weight tables of the first layer according to the expressions
$${\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]={\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]+{\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right],$$
(3.19)
$${\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]={\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]+{\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right],$$
(3.20)
where \({\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) is the increment of the weight value \({\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) for the connection between the cell with coordinates (r, c) in the first part of the weight table and the first layer neuron that performs the pairwise identification of samples k and k1, and \({\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) is the increment of the weight value \({\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) for the connection between the cell with coordinates (r, c) in the second part of the weight table and the same first layer neuron. The values \({\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) and \({\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) are defined by the following expressions:
$${\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}1[{\mathtt{k}}][{\mathtt{k}}1]\times {\rm{BinX}}[0]\left[{\rm{r}}\right]\left[{\rm{c}}\right],$$
(3.21)
$${\rm{dW}}1[{\mathtt{k}}][{\mathtt{k}}1][1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]={\rm{n}}{\mathtt{k}}\times {\rm{Sig}}1[{\mathtt{k}}][{\mathtt{k}}1]\times {\rm{BinX}}[1]\left[{\rm{r}}\right]\left[{\rm{c}}\right],$$
(3.22)
where \({\rm{BinX}}[0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\), \({\rm{BinX}}[1]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\) are the values of the cell with coordinates (r, c) for the first and second parts of the binary matrix of the input image of the MNIST training set.
8. If the current image was the last (60 000th) image of the MNIST training set and the current epoch was the last specified epoch (Fig. 3), the algorithm terminates; otherwise, go to step 1.
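As announced before the list, here is a minimal C++ sketch of the third layer update, steps 4 and 5, expressions (3.5)-(3.10), for a single incorrectly recognized image. The array and variable names follow the notation above (with the biases stored as one-dimensional vectors); the function name is illustrative.

```cpp
#include <vector>

void updateThirdLayer(std::vector<std::vector<double>>& W3,   // W3[i][k], second to third layer weights
                      std::vector<double>& Wh3,               // Wh3[i], biases of the third layer
                      std::vector<double>& Sig3,              // output errors, filled here for reuse in (3.13)
                      const std::vector<double>& Yout,        // third layer outputs
                      const std::vector<double>& F2,          // second layer outputs
                      int correctPattern, double nk) {        // label of the current image, learning rate
    const int nPatterns = static_cast<int>(Yout.size());
    const int N = static_cast<int>(F2.size());
    Sig3.assign(nPatterns, 0.0);
    for (int i = 0; i < nPatterns; ++i) {
        const double Ycorr = (i == correctPattern) ? 0.7 : 0.2;
        Sig3[i] = (Ycorr - Yout[i]) * Yout[i] * (1.0 - Yout[i]);  // error with sigmoid derivative, (3.7)
        Wh3[i] += nk * Sig3[i];                                   // bias update, (3.5)-(3.6)
        for (int k = 0; k < N; ++k)
            W3[i][k] += nk * Sig3[i] * F2[k];                     // weight update, (3.9)-(3.10)
    }
}
```

The resulting Sig3 values are then reused in step 5 to compute Sig2 by expression (3.13), and analogous updates are applied to the second and first layers.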
Note that in a software implementation of neural networks based on metric recognition methods, in any programming environment, it is expedient and convenient to index the neurons of the first layer not by a single neuron number, as in classical implementations, but by the numbers of the two samples compared by the given neuron, as shown above, for example, \({\rm{W}}1[{\mathtt{k}}][{\mathtt{k}}1][0]\left[{\rm{r}}\right]\left[{\rm{c}}\right]\). This approach makes it convenient and transparent to implement the structure of the scheme in Fig. 1, as well as the functions of preliminary weight computation, input image recognition, network training, output of results, etc.
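A sketch of this pair-indexed addressing (illustrative names, flattened storage): a first layer weight is addressed by the compared pair of samples (k, k1), the half of the weight table (0 or 1), and the cell (r, c), exactly as in W1[k][k1][part][r][c].

```cpp
#include <vector>

struct FirstLayerWeights {
    int N, rows, cols;
    std::vector<double> data;  // flattened W1[k][k1][part][r][c]

    FirstLayerWeights(int N_, int rows_, int cols_)
        : N(N_), rows(rows_), cols(cols_),
          data(static_cast<std::size_t>(N_) * N_ * 2 * rows_ * cols_, 0.0) {}

    // Access the weight of the neuron comparing samples k and k1, table half `part`, cell (r, c).
    double& at(int k, int k1, int part, int r, int c) {
        const std::size_t idx =
            ((((static_cast<std::size_t>(k) * N + k1) * 2 + part) * rows + r) * cols + c);
        return data[idx];
    }
};
```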
Table 3 shows the results of training the resulting neural network with both precomputed values of neural network weights and in the classical way, with random weight initialization.
Table 3
Comparing the results of training the neural network by the MNIST training set (60 000 images) for each training epoch
Epoch no. | Learning rate | Correctly recognized images | Percentage | Serr | Time, min

Training the neural network with precomputed weights
1 | 0.1  | 43 932 | 73% | 1199 | 159
2 | 0.1  | 49 748 | 83% | 737  | 98
3 | 0.02 | 52 285 | 87% | 545  | 72
Total training time: 329 min

Training the neural network with random initialization of the weights in the range [−0.5; 0.5]
1 | 0.1  | 35 370 | 59% | 1935 | 256
2 | 0.1  | 46 033 | 76% | 1051 | 139
3 | 0.02 | 49 195 | 82% | 784  | 104
Total training time: 499 min
Table 4 shows the results of the verification of the obtained neural network after each training epoch on the MNIST test set (10 000 images).
Table 4
Comparing the results of training the neural network with testing on the MNIST test set (10 000 images) for each training epoch
Epoch no. | Learning rate in the epoch | Precomputed weights | Random weight initialization
1 | 0.1  | 9145 | 8894
2 | 0.1  | 9282 | 9116
3 | 0.02 | 9449 | 9256

4 Comparing the results of two experiments

Based on the results in Tables 3 and 4, we construct diagrams for both the MNIST test set (Fig. 9) and the MNIST training set (Fig. 10). The diagrams show that the performance of the neural network with precomputed weights is higher after every training epoch than that of the network whose initial weights were generated at random. Apparently, this is because the network with precomputed weights, according to the plots in Figs. 9 and 10, starts closer to a good solution and therefore has a shorter way to go to reach better results, both in the number of correctly recognized MNIST images and in the lower error values Serr (Fig. 11).
The time spent on training the neural network with precomputed weight values is also smaller in every epoch than for the network trained from scratch (Fig. 12). The diagram in Fig. 12 also shows that the main time advantage is accumulated in the first epoch. In total, training the network with precomputed weights took 329 min = 5 h 29 min, while training the network from scratch took 499 min = 8 h 19 min, i.e., the second experiment required almost 3 hours more. We also note that these results were obtained with the set of samples shown in Fig. 4, which is apparently not optimal either in quality or in quantity; with better and more numerous samples, the results could be even better. The overall learning efficiency could also be higher with training algorithms [1] that give better results than the stochastic backpropagation algorithm used in our experiments.
The results in Figs. 9–12 also show that, in general, the backpropagation algorithm does not degrade the performance of a neural network with precomputed weights but finds an even better solution, starting from an already existing initial result.
The weight values of the second and third layers shown in Fig. 13 confirm this as well. Figure 13 shows that the initial weight values (ones and zeros), as well as the established thresholds of the second and third layers, changed only slightly in the positive or negative direction during the entire training process, preserving the main logic of the network. The second layer outputs shown, for example, in Table 5 for image 2_1 from the test set (Fig. 8) confirm the same conclusion: the active output of the second layer for the recognized image 2_1 is the same output that was originally assigned to sample 2_172 in the network diagram in Fig. 1c. In other words, no significant changes in the structure and logic of the trained neural network were observed.
Table 5
A fragment (9 out of 30) of the second layer outputs in the recognition of the symbol 2_1 (Fig. 8) from the MNIST test set. The network has been trained for three epochs
No. Y2 | Sample | Sw2                 | F2
0      | 0_157  | −2.25757884565147   | 0.0946977310982947
1      | 0_25   | −14.7120863080901   | 4.07964075748757E−7
2      | 0_28   | −2.45988516820271   | 0.0787186646034626
3      | 1_135  | −16.1987387416131   | 9.22522804629833E−8
4      | 1_2    | −21.1151249699638   | 6.75799304608105E−10
5      | 1_46   | −14.8256166236525   | 3.64180217856173E−7
6      | 2_172  | 1.78349180153865    | 0.85612749736249
7      | 2_35   | −0.45514766242243   | 0.388137562543613
8      | 2_77   | −6.77250084528418   | 0.00114351889764061
9      | 3_32   | −0.268545411318517  | 0.433264229225887

5 Conclusion

Based on the experiments and obtained results, we reach the following conclusions.
1. The time of constructing the neural network and calculating all of its weight values, including the zero layer weights, for the MNIST image format is a fraction of a second (in the considered example, tconstruct = 0.5469 s). Compared to the time spent on training, the construction of the neural network and the calculation of all weight values is performed almost instantly.
2. Neurons of a neural network based on metric recognition methods can also have a continuous activation function, for example, a sigmoid activation function.
3. A neural network based on metric recognition methods can be trained by the backpropagation algorithm.
4. The process of retraining a neural network with precomputed weights requires less time compared to the classical training of a neural network with random generation of the weights. In the considered example, the time gain was 2 hours 50 minutes.
5. The above example has also shown that the neural network with precomputed weights has a better recognition result for the MNIST dataset on all three training epochs. The best recognition result on the MNIST test set according to the results of three epochs of training a neural network with precomputed weights was 94%.
6. Based on the above experimental results, it can be assumed that with a better selection of the sample set, as well as with a larger number of selected samples, the positive differences in performance shown in Figs. 9–12 may be even more significant.
7. The above technology of precomputing weight values can presumably be used in other feedforward neural network architectures, in particular in deep networks [6, 7], which may speed up the process of constructing and training such networks.
References
1. Kruglov, V. V. & Borisov, V. V. Iskusstvennye neironnye seti. Teoriya i praktika (Artificial Neural Networks: Theory and Practice). Moscow: Goryachaya Liniya-Telekom, 2001.
2. Wassermann, P. D. Neural Computing: Theory and Practice. New York: Van Nostrand, 1989. Translated under the title Neirokompyuternaya tekhnika. Teoriya i praktika, Moscow: Mir, 1992.
3. Geidarov, P. Sh. Neural Networks on the Basis of the Sample Method. Automat. Control Comput. Sci. 43, no. 4, 203–210 (2009).
4. Geidarov, P. Sh. Multitasking Application of Neural Networks Implementing Metric Methods of Recognition. Autom. Remote Control 74, no. 9, 1474–1485 (2013).
5. Birger, I. A. Tekhnicheskaya diagnostika (Technical Diagnostics). Moscow: Mashinostroenie, 1978.
6. LeCun, Y., Bengio, Y. & Hinton, G. Deep Learning. Nature 521, 436–444 (2015).
7. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Networks 61, 85–117 (2015).
8. Srivastava, N. et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 15, no. 1, 1929–1958 (2014).
9. Geidarov, P. Sh. Clearly Defined Architectures of Neural Networks and Multilayer Perceptron. Opt. Mem. Neural Networks 26, 62–76 (2017).
10. Geidarov, P. Sh. An Algorithm for Nearest Neighbor Method Implementation in a Multilayer Perceptron. Tr. SPIIRAN, no. 51, 123–151 (2017).