30-11-2019 | Original Article | Issue 6/2020 | Open Access
Study on the prediction of stock price based on the associated network model of LSTM
Important notes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
The stock market has received widespread attention from investors. How to grasp the changing regularity of the stock market and predict the trend of stock prices has always been a hot topic for investors and researchers. The rise and fall of stock prices are influenced by many factors such as politics, economy, society and the market. For stock investors, the trend forecast of the stock market is directly related to the acquisition of profits. The more accurate the forecast, the more effectively risks can be avoided. For listed companies, the stock price not only reflects the company's operating conditions and future development expectations, but is also an important technical index for the analysis and research of the company. Stock forecasting research also plays an important role in the research of a country's economic development. Therefore, research on the intrinsic value and prediction of the stock market has great theoretical significance and wide application prospects.
The main purpose of this paper is to design a deep network model that simultaneously predicts the opening price, the lowest price and the highest price of a stock on the next day from the historical prices of the stock and other technical parameter data. To this end, an LSTM-based deep recurrent neural network model is proposed to predict the three associated values (hence it is called the associated neural network model, abbreviated as the associated net model). The associated net model is compared with LSTM and an LSTM-based deep recurrent neural network, and its feasibility is verified by comparing the accuracy of the three models.
The rest of this paper is organized as follows. Section 2 introduces the research status of stock price forecasting. Section 3 introduces the model design of the associated neural network model. Section 4 describes the design of the algorithm and experimental parameters. Section 5 introduces the experimental data set, the experimental results and the analysis of the results. Section 6 concludes the paper.
2 Related works
There is a large body of research on stock price prediction. Support vector machines were applied to build a regression model of historical stock data and to predict the trend of stocks [1]. A particle swarm optimization algorithm was used to optimize the parameters of a support vector machine, which can then predict the stock value robustly [2]. This study improves the support vector machine method, but the particle swarm optimization algorithm requires a long computation time. LSTM was combined with a naive Bayesian method to extract market emotion factors and improve prediction performance [3]. This method can be used to predict financial markets on completely different time scales with other variables. An emotional analysis model was integrated with the LSTM time series learning model to obtain a robust time series model for predicting the opening price of stocks, and the results showed that this model could improve prediction accuracy [11]. Jia [12] discussed the effectiveness of LSTM for predicting stock prices, and the study showed that LSTM is an effective method for predicting stock profits. Real-time wavelet denoising was combined with an LSTM network to predict the East Asian stock index, which corrected some logic defects in previous studies [13]. Compared with the original LSTM, this combined model greatly improves prediction accuracy and reduces regression error. The bagging method was used to combine multiple neural networks to predict Chinese stock indices (including the Shanghai composite index and the Shenzhen component index) [4]. Each neural network was trained by back propagation with the Adam optimization algorithm; the results show that the method achieves different accuracy for different stock indices, but the prediction of the closing price is unsatisfactory. An evolutionary method was applied to predict the change trend of stock prices [5]. A deep belief network with inherent plasticity was used to predict the stock price time series [6]. A convolutional neural network was applied to predict the trend of stock prices [7]. A feed-forward multilayer neural network model was created for future stock price prediction by using a hybrid method that combines technical analysis variables and basic analysis variables of stock market indicators with the BP algorithm [8]. The results show that this method has higher accuracy in predicting daily stock prices than the technical analysis method. An effective soft computing technique was designed for the Dhaka Stock Exchange (DSE) to predict the closing price of DSE [9]. A comparison experiment with an artificial neural network and an adaptive neural fuzzy reasoning system shows that this method is more effective. An artificial bee colony algorithm was combined with wavelet transforms and a recurrent neural network for stock price forecasting. Many international stock indices were simulated for evaluation, including the Dow Jones Industrial Average (DJIA), the London FTSE 100 index (FTSE), the Tokyo Nikkei 225 index (Nikkei) and the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX). The simulation results show that the system has good prediction performance and can be applied to real-time trading systems for stock prediction.
A multi-output speaker model based on RNN-LSTM was used in the field of speech recognition [14]. The experimental results show that the model is better than a single-speaker model and that it can be fine-tuned under the same infrastructure when new output branches are added. Obtaining a new output model in this way not only reduces memory usage but also performs better than training a new speaker model. A multi-input multi-output convolutional neural network model (MIMO-Net) was designed for cell segmentation of fluorescence microscope images [15]. The experimental results show that this method is superior to state-of-the-art deep learning based segmentation methods.
Inspired by the above research, and considering that some parameters and indicators of a stock are associated with one another, it is necessary to design a multi-value associated neural network model that can handle multiple associated prices of the same stock and output these parameters and indicators at the same time. For this purpose, an associated neural network model based on an LSTM deep recurrent network is proposed. It is built from historical data and predicts the opening price, lowest price and highest price of the stock on the next day.
3 Model design
3.1 Long short-term memory network
The long short-term memory network (LSTM) is a particular form of recurrent neural network (RNN), which is the general term for a family of neural networks capable of processing sequential data. LSTM is a special network structure with three "gate" structures (shown in Fig. 1). The three gates placed in an LSTM unit are called the input gate, the forgetting gate and the output gate. When information enters the LSTM network, it is selected by rules. Only the information that conforms to the algorithm is kept, and the information that does not conform is forgotten through the forgetting gate.
The gates allow information to pass selectively. Equation 1 shows the default activation function of the LSTM network, the sigmoid function. Through its gating units, the LSTM can add information to or delete information from the neuron state. Each gate consists of a sigmoid neural network layer and a pointwise multiplication operation. Each element output by the sigmoid layer is a real number in [0, 1], representing the weight with which the corresponding information passes. The LSTM neural network also contains a layer with the tanh activation function shown in Eq. 2, which is used for updating the state of neurons
$$ \sigma \left( x \right) = \frac{1}{1 + e^{-x}} $$
(1)
$$ \tanh \left( x \right) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $$
(2)
The forgetting gate of the LSTM neural network determines what information needs to be discarded. It reads \( h_{t-1} \) and \( x_{t} \) and assigns each element of the neuron state \( C_{t-1} \) a value between 0 and 1. Equation 3 shows how the forgetting probability is calculated, where \( h_{t-1} \) represents the output of the previous neuron, \( x_{t} \) is the input of the current neuron and \( \sigma \) is the sigmoid function
$$ f_{t} = \sigma \left( W_{f} \cdot \left[ h_{t-1}, x_{t} \right] + b_{f} \right) $$
(3)
The input gate determines how much new information is added to the neuron state. First, the input layer containing the sigmoid activation function determines which information needs to be updated, and then a tanh layer generates the candidate vector \( \hat{C}_{t} \). The state of the neuron is then updated as shown in Eq. 4, where \( i_{t} \) and \( \hat{C}_{t} \) are calculated as shown in Eqs. 5 and 6
$$ C_{t} = f_{t} * C_{t-1} + i_{t} * \hat{C}_{t} $$
(4)
$$ i_{t} = \sigma \left( W_{i} \cdot \left[ h_{t-1}, x_{t} \right] + b_{i} \right) $$
(5)
$$ \hat{C}_{t} = \tanh \left( W_{c} \cdot \left[ h_{t-1}, x_{t} \right] + b_{c} \right) $$
(6)
The output gate controls how much of the current neuron state is filtered into the output, as shown in Eqs. 7 and 8
$$ o_{t} = \sigma \left( W_{o} \cdot \left[ h_{t-1}, x_{t} \right] + b_{o} \right) $$
(7)
$$ h_{t} = o_{t} * \tanh \left( C_{t} \right) $$
(8)
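As an illustration of Eqs. 3 to 8, the following NumPy sketch computes a single LSTM time step. It is a minimal sketch, not the authors' implementation: the function names `lstm_step` and `init_params`, the dictionary layout of the parameters, the hidden size of 4 and the random initialization are all assumptions made for the example. Only the seven input features and the step size of 20 follow the paper.

```python
import numpy as np

def sigmoid(x):
    """Eq. 1: maps each element into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. 3-8.

    params (hypothetical layout) holds weight matrices W_f, W_i, W_c, W_o of
    shape (hidden, hidden + inputs) and bias vectors b_f, b_i, b_c, b_o.
    """
    z = np.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # Eq. 3, forgetting gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # Eq. 5, input gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])   # Eq. 6, candidate state
    c_t = f_t * c_prev + i_t * c_hat                     # Eq. 4, new state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])     # Eq. 7, output gate
    h_t = o_t * np.tanh(c_t)                             # Eq. 8, new output
    return h_t, c_t

def init_params(n_inputs, n_hidden, seed=0):
    """Small random weights and zero biases (an arbitrary choice for the demo)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in "fico":
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden + n_inputs))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

params = init_params(n_inputs=7, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
# Run the cell over 20 time steps of 7 features each (random stand-in data).
for x_t in np.random.default_rng(1).random((20, 7)):
    h, c = lstm_step(x_t, h, c, params)
```

Because \( h_{t} \) is the product of a sigmoid output and a tanh output, every element of `h` stays strictly between −1 and 1 regardless of the input scale.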
3.2 Deep recurrent neural network
An LSTM-based deep recurrent neural network (DRNN) is a variant of the recurrent neural network. To enhance the expressive power of the model, the loop body at each moment can be repeated many times. The structure of the deep recurrent neural network is shown in Fig. 2.
The deep recurrent neural network is composed of LSTM units, so its operation mechanism is the same as that of LSTM. Dropout was used while constructing the task model. Dropout refers to temporarily discarding neural network units from the network with a certain probability during the training of a deep learning network, which is a means of preventing overfitting. In each training iteration, the neurons in each layer are randomly deleted with probability p, and the data in this iteration are trained with the network composed of the remaining (1 − p) × N neurons, thus alleviating the overfitting problem. Fig. 2a shows the neural network model without dropout, and Fig. 2b shows the neural network model with dropout.
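The dropout operation described above can be sketched with a random mask. This is only an illustration under stated assumptions: the function name `dropout`, the probability 0.3 and the layer size are hypothetical, and the paper does not state which dropout rate or variant was used.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Zero each unit independently with probability p during training."""
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p   # True with probability 1 - p
    return activations * keep_mask

rng = np.random.default_rng(0)
layer_output = np.ones(10_000)            # stand-in for one layer's activations
dropped = dropout(layer_output, p=0.3, rng=rng)
kept_fraction = dropped.mean()            # close to 1 - p = 0.7
```

Many libraries implement a variant called inverted dropout, which additionally divides the kept activations by 1 − p so that no rescaling is needed at test time. The sketch above follows the plain description in the text instead.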
The LSTM-based deep recurrent neural network model with a dropout layer was used as the contrast model to verify the feasibility and applicability of the proposed associated neural network model. The structure of the LSTM-based deep recurrent neural network is shown in Fig. 3.
3.3 Associated neural network model
The daily opening price, lowest price and highest price of a stock are associated with one another, but they are generally predicted separately by different networks, so the associations among them are lost. Therefore, based on the deep recurrent neural network, a multi-value associated neural network (associated net) structure based on LSTM is designed, as shown in Fig. 4.
The specific data processing flow of the multi-value associated neural network model is shown in Fig. 5. Data pass through the input layer to all three branches simultaneously. These three branches predict the opening price, the lowest price and the highest price, respectively. In the Chinese stock market, the maximum daily fluctuation of a stock price is limited to 10%. Therefore, the model combines the output of the left branch (opening price) with the output of the LSTM network of the second branch as the input data for predicting the lowest price. The highest price is in turn influenced by the opening price and the lowest price of the day, so the output of the left branch (opening price), the output of the middle branch (lowest price) and the output of the LSTM network of the third branch are combined to form the input data for predicting the highest price.
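The wiring between the three branches can be sketched as follows. This is a simplified illustration, not the authors' network: random vectors stand in for the outputs of the three LSTM branches, and the names `dense`, `associated_forward` and `HIDDEN`, as well as the use of a single linear head per output, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8  # hypothetical size of each branch's LSTM output

def dense(n_in, rng):
    """Hypothetical linear output head: returns (weights, bias)."""
    return rng.normal(0.0, 0.1, n_in), 0.0

def associated_forward(h_open, h_low, h_high, heads):
    """Combine three branch outputs in the order open -> lowest -> highest."""
    (w_o, b_o), (w_l, b_l), (w_h, b_h) = heads
    open_pred = w_o @ h_open + b_o
    # The lowest-price head sees the predicted opening price plus its own branch.
    low_in = np.concatenate([[open_pred], h_low])
    low_pred = w_l @ low_in + b_l
    # The highest-price head sees both earlier predictions plus its own branch.
    high_in = np.concatenate([[open_pred, low_pred], h_high])
    high_pred = w_h @ high_in + b_h
    return open_pred, low_pred, high_pred

heads = (dense(HIDDEN, rng), dense(HIDDEN + 1, rng), dense(HIDDEN + 2, rng))
# Stand-ins for the outputs of the three LSTM branches on one sample.
h_open, h_low, h_high = rng.random((3, HIDDEN))
open_pred, low_pred, high_pred = associated_forward(h_open, h_low, h_high, heads)
```

The dependency runs in one direction only: a change in the opening-price branch propagates to the lowest and highest price predictions, while a change in the highest-price branch leaves the other two predictions untouched.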
In the model training phase, the total loss \( L_{\text{total}} \) is used as the evaluation function, and the goal is to minimize it. The total loss is calculated as shown in Eq. 9, where \( L_{i} \) is the loss of the i-th output branch and n is the number of branches (three in this model)
$$ L_{\text{total}} = \frac{1}{n}\sum\limits_{i = 1}^{n} L_{i} $$
(9)
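Equation 9 can be checked against the values reported later in Table 4. The short sketch below averages the three sub-losses listed there for 50 training iterations; the function name `total_loss` is an assumption made for the example.

```python
def total_loss(sub_losses):
    """Eq. 9: arithmetic mean of the n branch losses."""
    return sum(sub_losses) / len(sub_losses)

# Sub-losses after 50 training iterations, taken from Table 4
# (opening price, lowest price, highest price).
losses_50 = [0.0460407, 0.0341459, 0.0310071]
average_50 = total_loss(losses_50)
```

The result is approximately 0.03706, which agrees with the value 0.037064 given in the last column of Table 4 up to rounding.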
4 Design of algorithm and experiments
A regression method is used to predict a specific value, which is not a predefined category but an arbitrary real number. A regression problem generally has only one output, which is the predicted value. The loss function commonly used in regression problems is the mean square error (MSE) (Eq. 10). It is the expectation of the square of the difference between the estimated value and the actual value. The smaller the MSE, the more accurately the prediction model describes the experimental data. Therefore, in the training phase, MSE is used as the criterion to measure the quality of a network model
$$ {\text{MSE}}\left( y, y^{\prime} \right) = \frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - y_{i}^{\prime} \right)^{2}}{n} $$
(10)
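A minimal NumPy sketch of Eq. 10 follows. The function name `mse` and the sample values are illustrative assumptions, not data from the experiments.

```python
import numpy as np

def mse(y_true, y_pred):
    """Eq. 10: mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up normalized prices for three days.
y_true = [0.50, 0.52, 0.55]
y_pred = [0.48, 0.53, 0.60]
error = mse(y_true, y_pred)   # (0.0004 + 0.0001 + 0.0025) / 3 = 0.001
```

Because the differences are squared, the single largest deviation (0.05 on the third day) contributes most of the error in this example.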
4.1 Algorithm
Deep learning often requires a lot of time and computational resources for training, so an optimization algorithm that requires fewer resources and converges faster is needed. The Adam optimization algorithm is an extension of the stochastic gradient descent algorithm and has great advantages in solving non-convex optimization problems.
During the training phase, the Adam optimization algorithm is used in the model, and \( L_{\text{total}} \) is used as the evaluation function. The algorithm framework of the multi-value associated neural network model is shown in Fig. 6. First, the sequence data are input to the Associated Net model, which contains three DRNN networks. Each DRNN network produces a loss, and the total loss is computed from the losses of these three DRNN networks (Eq. 9). Then the Adam algorithm is used to optimize the total loss. While the number of iterations has not reached the number set in the model, training continues to reduce the total loss; otherwise training stops.
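The training procedure above (optimize the loss with Adam and stop after a fixed number of iterations) can be sketched on a toy problem. This is a sketch under stated assumptions: the quadratic loss is only a stand-in for the total loss of the network, the names `adam_minimize`, `loss` and `grad` are hypothetical, and the hyperparameters are the commonly used Adam defaults rather than values reported in the paper.

```python
import numpy as np

def adam_minimize(grad_fn, theta, n_iterations, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for a fixed number of iterations and return the parameters."""
    m = np.zeros_like(theta)   # running mean of gradients
    v = np.zeros_like(theta)   # running mean of squared gradients
    for t in range(1, n_iterations + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy stand-in for the total loss: L(theta) = mean((theta - target)^2).
target = np.array([0.2, 0.5, 0.8])
loss = lambda th: float(np.mean((th - target) ** 2))
grad = lambda th: 2.0 * (th - target) / th.size

theta0 = np.zeros(3)
theta = adam_minimize(grad, theta0, n_iterations=500, lr=0.01)
```

The only stopping criterion in this sketch is the iteration count, mirroring the description in the text. A practical training loop would typically also monitor the loss on held-out data.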
4.2 Parameter setting
The input of the LSTM neural network has a step size parameter, which specifies how many historical data points are remembered as a reference for predicting the current price. To choose a relatively good step size for the multi-value associated model, a comparison experiment is performed with 6112 samples at step sizes of 5, 10, 20 and 30, with 50 iterations each. The loss variation graphs are shown in Figs. 7, 8, 9 and 10, respectively.
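The role of the step size can be made concrete with a sliding-window sketch that turns a daily series into input windows and next-day targets. The function name `make_windows`, the toy data and the column positions of the three target prices are assumptions made for the example.

```python
import numpy as np

def make_windows(data, step, target_cols):
    """Build (samples, step, features) inputs and next-day targets."""
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for start in range(len(data) - step):
        X.append(data[start:start + step])           # `step` consecutive days
        y.append(data[start + step, target_cols])    # values of the following day
    return np.array(X), np.array(y)

# Toy data: 100 days x 7 technical parameters. Columns 0, 2 and 3 are assumed
# to hold the opening, lowest and highest price.
days = np.arange(100 * 7, dtype=float).reshape(100, 7)
X, y = make_windows(days, step=20, target_cols=[0, 2, 3])
# X has shape (80, 20, 7) and y has shape (80, 3).
```

A larger step size gives each sample more history but reduces the number of samples that can be built from the same series, which is one reason the choice is made experimentally.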
According to the loss variation graphs at step sizes of 5, 10, 20 and 30, the loss at step sizes of 10 and 20 decreases the fastest and finally reaches a steady state. Comparing the average losses in Table 1 shows that the average loss at a step size of 5 is the lowest, and that the average loss at a step size of 20 exceeds it by only 0.0014901. Considering the loss variation graphs and the average loss together, 20 is chosen as the step size in the model.
Table 1
Average variance loss of different steps

| Step | 5 | 10 | 20 | 30 |
| --- | --- | --- | --- | --- |
| Loss | 0.0148642 | 0.0181064 | 0.0163543 | 0.0179096 |

5 Experimental results and analysis
5.1 Dataset
The experimental data in this paper are actual historical data downloaded from the Internet. Three data sets were used in the experiments: the Shanghai composite index (000001) and two stocks, PetroChina (stock code 601857) on the Shanghai stock exchange and ZTE (stock code 000063) on the Shenzhen stock exchange. The Shanghai composite index has 6112 historical records, PetroChina has 2688 and ZTE has 4930. Each data set is divided into a training set and a test set in chronological order at a ratio of 4:1. Each data set has seven technical parameters. These technical parameters are used as the basic input attributes, and the OP, LP and HP of the next day are the output values of the model. The identifiers of the stock-related technical parameters are shown in Table 2.
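A chronological 4:1 split keeps the oldest records for training and the most recent ones for testing, without shuffling. The sketch below applies such a split to the three record counts given above. The function name `chronological_split` and the rounding rule (integer division) are assumptions, since the paper does not state how fractional boundaries were handled.

```python
def chronological_split(records, train_parts=4, test_parts=1):
    """Split an ordered sequence into an older training part and a newer test part."""
    cut = len(records) * train_parts // (train_parts + test_parts)
    return records[:cut], records[cut:]

sizes = {}
for name, n in [("Shanghai composite index", 6112),
                ("PetroChina", 2688),
                ("ZTE", 4930)]:
    train, test = chronological_split(list(range(n)))
    sizes[name] = (len(train), len(test))
    print(name, len(train), len(test))
```

With this rounding rule the split sizes are 4889/1223, 2150/538 and 3944/986 records, respectively.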
Table 2
The identifiers used for stock related technical parameters

| Parameter name | Identifier |
| --- | --- |
| Open price | OP |
| Close price | CP |
| Lowest price | LP |
| Highest price | HP |
| Volumes | V |
| Money | M |
| Change | C |

Because different stock index data use different measurement units, all attribute data are normalized to fall within the same range in order to avoid the impact of the units. In this paper, the min–max normalization method is used. The normalization function is shown in Eq. 11
$$ x^{\prime} = \frac{x - \min}{\max - \min} $$
(11)
Through the normalization operation, the data is scaled to [0, 1], which not only speeds up the gradient descent to find the optimal solution, but also improves the accuracy.
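Equation 11 can be sketched per column with NumPy. The function names and the sample values are assumptions made for the example, and the paper does not state over which portion of the data the minimum and maximum were computed. The inverse function is included because predictions made on the normalized scale have to be mapped back to prices.

```python
import numpy as np

def min_max_fit(data):
    """Return the per-column minimum and maximum used in Eq. 11."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)

def min_max_transform(data, col_min, col_max):
    """Eq. 11 applied column by column (assumes col_max > col_min everywhere)."""
    return (np.asarray(data, dtype=float) - col_min) / (col_max - col_min)

def min_max_inverse(scaled, col_min, col_max):
    """Map normalized values back to the original units."""
    return np.asarray(scaled, dtype=float) * (col_max - col_min) + col_min

# Made-up values: two attributes on very different scales.
raw = np.array([[10.0, 200.0],
                [20.0, 400.0],
                [15.0, 300.0]])
col_min, col_max = min_max_fit(raw)
scaled = min_max_transform(raw, col_min, col_max)
restored = min_max_inverse(scaled, col_min, col_max)
```

After the transformation both columns take the values 0, 1 and 0.5, so attributes such as price and volume contribute on a comparable scale.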
5.2 Experimental analysis of training phase
Using the training data set of the Shanghai composite index, Associated Net is compared with the LSTM network and the LSTM-based deep recurrent neural network (DRNN). The highest price of the next day was trained and predicted by LSTM, DRNN and Associated Net, respectively. As shown in Table 3, the mean square error of the three models gradually decreased as the number of training iterations increased, although the LSTM network and DRNN experienced slight fluctuations. Comparing the models at the same number of training iterations, the average mean square error of LSTM tends to become lower than that of the other two models as training proceeds, but in the test phase LSTM has the worst prediction effect and the lowest average accuracy, because LSTM overfits as the number of training iterations increases. From 100 training iterations onward, the average mean square error of Associated Net is larger than that of LSTM and DRNN, because the model is more complex and requires a larger number of iterations.
Table 3
Average variance loss of three models under different training times

| Training times | LSTM | DRNN | Associated net |
| --- | --- | --- | --- |
| 50 | 0.0377711 | 0.0152948 | 0.037064 |
| 100 | 0.0147428 | 0.0191181 | 0.029533 |
| 200 | 0.0132838 | 0.00721601 | 0.026126 |
| 300 | 0.00418345 | 0.0104519 | 0.019745 |
| 500 | 0.00818345 | 0.0106546 | 0.014983 |

In order to verify this conjecture, several experiments were conducted on Associated Net. The opening price, the highest price and the lowest price of the next day were trained and predicted by the Associated Net model. As shown in Table 4, the experimental results support this conjecture. The root of the problem is that the associated network model is composed of multiple deep recurrent neural networks: the model is complex, the number of neurons is large, and multiple output losses are combined, so the loss of the model decreases slowly. Based on these experiments, the loss chart of each model over 200 iterations is drawn, as shown in Figs. 11, 12 and 13. The output of Associated Net is the total loss and its three sub-losses (opening price loss, lowest price loss and highest price loss). The loss charts show that the loss of each model gradually decreases. The LSTM model fluctuates several times during training, whereas DRNN and Associated Net are very stable. Moreover, each individual sub-loss of the associated network model also decreases gradually. As shown in Table 4, although the total loss of Associated Net is higher than that of the other two models, its sub-losses are very low, and as the number of iterations increases, the total loss of Associated Net gradually decreases.
Table 4
The average square loss of Associated Net

| Times | Average square loss of open price | Average square loss of lowest price | Average square loss of highest price | Average loss of three losses |
| --- | --- | --- | --- | --- |
| 50 | 0.0460407 | 0.0341459 | 0.0310071 | 0.037064 |
| 100 | 0.0345082 | 0.031234 | 0.0228587 | 0.029533 |
| 200 | 0.0265043 | 0.028083 | 0.0237893 | 0.026126 |
| 300 | 0.023574 | 0.0184222 | 0.0172387 | 0.019745 |
| 500 | 0.015052 | 0.0145609 | 0.0153359 | 0.014983 |

To verify the universality of the model, the historical data of two stocks, PetroChina and ZTE, are used. The experimental results are shown in Figs. 14 and 15. Combining Fig. 13 with Table 5, it is concluded that the model fits the PetroChina data better. The fit to the ZTE data is relatively poor at the beginning but gradually improves, and in the end the average mean square error losses of the two stocks become similar. The experiments show that the more training data there are, the better the model fits. Furthermore, when the number of training iterations is increased appropriately, the loss of the model decreases gradually. These results are due to the following reasons.
Table 5
Average square loss of different data sets in the Associated Net model

| Stock | Open price | Lowest price | Highest price | Average of three losses |
| --- | --- | --- | --- | --- |
| PetroChina | 0.036184 | 0.032914 | 0.034226 | 0.034444 |
| ZTE | 0.032247 | 0.037305 | 0.031708 | 0.033753 |
| Shanghai Index | 0.023419 | 0.021976 | 0.030776 | 0.02539 |


The model is complex and needs a large amount of data to train the parameters of each neuron.

PetroChina has a large circulation, and the stock price fluctuation is relatively small, so that a good fitting effect can be obtained quickly. ZTE’s stock price fluctuations are relatively larger, so that more training data is needed to obtain a good fitting effect.
5.3 Experimental analysis in the test phase
To verify the training of each model, the three models were tested separately using test sets of multiple stocks. The mean square error (MSE) is the expected value of the square of the difference between the estimated value and the true value of a parameter, whereas the mean absolute error (MAE) is the average of the absolute errors and better reflects the actual size of the prediction error. Therefore, in the test, MAE (Eq. 12) was used as the evaluation index to calculate the degree of deviation, and 1 − MAE was used as the average accuracy of the model. The average accuracy of the three models is shown in Table 6, and the average accuracy of the Associated Net model on different data sets is shown in Table 7. In Eq. 12, \( f_{i} \) is the predicted value and \( y_{i} \) is the true value
$$ {\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} \left| f_{i} - y_{i} \right| $$
(12)
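The evaluation index can be sketched as follows. The function names and the sample values are assumptions made for the example. Interpreting 1 − MAE as an accuracy presupposes that predictions and targets lie on the normalized [0, 1] scale, which appears to be the case here but is not stated explicitly.

```python
import numpy as np

def mae(y_true, y_pred):
    """Eq. 12: mean of the absolute differences between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

def average_accuracy(y_true, y_pred):
    """Average accuracy as used in the text: 1 - MAE."""
    return 1.0 - mae(y_true, y_pred)

# Made-up normalized prices for four days.
y_true = [0.50, 0.52, 0.55, 0.60]
y_pred = [0.48, 0.53, 0.60, 0.58]
accuracy = average_accuracy(y_true, y_pred)   # MAE = 0.10 / 4 = 0.025, accuracy = 0.975
```

Unlike MSE, MAE weights every deviation linearly, so a single large error does not dominate the score.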
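Table 6
Average accuracy of three models

| Model | LSTM | DRNN | Associated Net (open price) | Associated Net (lowest price) | Associated Net (highest price) |
| --- | --- | --- | --- | --- | --- |
| Average accuracy | 0.787925 | 0.973754 | 0.971999 | 0.956185 | 0.974434 |
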
Table 7
Average accuracy of different data sets on the associated net model

| Stock | Open price | Lowest price | Highest price |
| --- | --- | --- | --- |
| PetroChina | 0.986331 | 0.9795 | 0.981839 |
| ZTE | 0.963979 | 0.956655 | 0.96012 |
| Shanghai Index | 0.971999 | 0.956185 | 0.974434 |


There are large errors between the LSTM prediction values and the real data (shown in Fig. 16). Fig. 17 shows the comparison of the DRNN prediction values and the real data. The prediction values and the real data almost coincide, and the deviation between them is very small, indicating that DRNN performs better than LSTM on the test data. The deviation between the three prediction results of Associated Net and the real data is also small, as shown in Fig. 18. Figs. 17 and 18 show that, for the highest price prediction, Associated Net fits the curve of the real data better than DRNN, and the data deviation of Associated Net is smaller than that of DRNN. The comparison of the average accuracy of the three models is shown in Table 6. For predicting the highest price, the Associated Net model has higher average accuracy than the other two models. This confirms that the highest price of the next day is related not only to historical data, but also to the opening price and the lowest price of the same day. Therefore, the Associated Net model can handle such problems very well, and it performs better than the DRNN model.
The trained Associated Net models of PetroChina and ZTE were tested with the test data sets of the two stocks. The test results are shown in Figs. 19 and 20. Combined with Fig. 18, it is found that the model fits well on all three data sets. The data in Table 7 show that the average prediction accuracy for ZTE is not as good as for the other two data sets. However, the accuracy on all three data sets is above 95%, and the test results are in line with the prior conjectures. Therefore, the Associated Net model can predict multiple associated values at the same time, and the difference between the predicted values and the real values is small.
6 Conclusion
In this paper, a multi-value associated network model based on an LSTM deep recurrent neural network (Associated Net) is proposed to predict multiple prices of a stock simultaneously. The model structure, the algorithm framework and the experiment design are presented. The feasibility and accuracy of Associated Net are verified by comparing the model with the LSTM network model and the LSTM-based deep recurrent neural network model. Multiple data sets were used to verify the applicability of the Associated Net model. Experiments show that the average accuracy of the Associated Net model is better than that of the other two models. Moreover, it can predict multiple values simultaneously, and the average accuracy of each predicted value is over 95%. Although the model achieves good results, some aspects can still be improved. For example, a simple arithmetic mean is used to calculate the total loss in the training phase, and the goal is to optimize the model by reducing this total loss. This loss calculation does not take into account the relationship between the sub-losses, nor some details at the point where the total loss is minimal, such as extreme values of individual sub-losses and oscillation during loss reduction. In the next step, we will study dimension reduction of the input parameters and optimization of the loss calculation method to improve the average accuracy of the model.
Acknowledgements
This work is partially supported by the Science and Technology Project of Guangxi (Guike AB16380260) and Specialized Scientific Research in Public Welfare Industry (Meteorology) (GYHY201406027).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.