Multimedia Systems

Published in:

Open Access 01-06-2024 | Regular Paper

MF-DAT: a stock trend prediction of the double-graph attention network based on multisource information fusion

Authors: Kun Huang, Xiaoming Li, Neal Xiong, Yihe Yang

Published in: Multimedia Systems | Issue 3/2024

Abstract

Stock forecasting research, which aims to predict the future price movement of stocks, has long been a focus of investors and scholars. This is important for practical applications related to human-centric computing and information sciences. Previous research has generally considered only market information and not the relationships between stocks, and learning a better representation of stock characteristics by taking those relationships into account remains challenging. Among existing methods that combine market information with stock relationship modeling, most use predefined industry relationships to construct stock relationship graphs, which inevitably ignores the potential interactions between stocks, especially the hidden relationships between stock groups. To this end, a new dual-graph attention model (MF-DAT) based on multisource information fusion is designed. Specifically, multiple features are first fused by the LMF module; then the long-term and short-term state characteristics of stocks are learned through the first graph attention layer; finally, the node representations of the stock relationship network, which is constructed by mining the stock cluster structure through community detection, are updated. Our model takes into account both stock time-series information and potential relationships between stocks. Experiments on the S&P 500 and NASDAQ datasets show that MF-DAT outperforms eight popular state-of-the-art (SOTA) methods.

Communicated by J. Gao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Stock market volatility prediction has always been a hot research topic in the field of financial technology (FinTech) and has received extensive attention from investors and scholars. This is important for practical applications related to human-centric computing and information sciences. Contrary to predicting stock prices, an increasing number of investors are concerned about the future direction of stocks [38]. The desire to find stocks that will rise in the future in volatile financial markets has also stimulated the interest of scholars in exploring more suitable stock forecasting models [11, 16, 17]. In particular, the introduction of graph neural network models has produced promising performance for stock prediction tasks [6, 16, 17]. Unfortunately, most research in the fintech space on predicting stock trends considers using historical price time-series analysis techniques in isolation. They tend to assume that the future movement of stocks is determined by historical price information, ignoring the impact of other stock fluctuations, namely momentum spillovers in finance [1].

Recently, some studies have attempted to model the momentum spillover between stocks using graph neural networks (GNNs) [5, 7, 10]. Xu et al. [38] proposed a hierarchical graph neural network that combines historical series and stock relationships for stock prediction; Feng et al. [10] studied the effect of momentum spillover between stocks on stock forecasts by extracting the relationships between firm entities in Wikidata. These studies treat each stock or associated company as a node in the graph, and the edge connecting two nodes is determined by a predefined company relationship. However, traditional graph neural networks cannot update the node state of the related stock while weighting edges according to their different degrees of correlation [7]. Moreover, the importance of various company relationships changes with the market, and restricting the model to relationships defined by prior information will inevitably interfere with and mislead the prediction task.

Modeling the relationships between stocks in stock prediction research is highly challenging, but a few studies have explored alternative methods to model the relationships between stocks (companies), thus overcoming the limitations imposed by predefined relationship methods on prediction results. Cheng et al. [7] cleverly multiplied the nonlinear transformation of the source company attribute and the affiliated company attribute to capture the momentum spillover between stocks; Wang et al. [33] processed the interaction between stocks by coupling macro- and micro-variables to avoid the interference of limited predefined relationships on model learning. Chen et al. [4] and Zhong et al. [43] used the stock market as a complex network to model and analyze the relationship between stocks by the Spearman correlation coefficient. A promising option for studying the complex system of the stock market is to adopt the community detection technique in complex network theory [28]. Li et al. [19] and Chmielewski et al. [8] used complex network theory as a frontier method to analyze the correlation between stocks. However, their research was limited to a better understanding of the interactions between stock network diagrams and did not further incorporate the resulting correlations into stock forecasting tasks.

We illustrate the relationships within and between stock clusters in the stock network graph using Fig. 1 as an example. The intra-cluster relationships refer to the different relationships among different companies (stocks) within the same cluster, which can be explored through predefined knowledge. However, predefined knowledge is inherently limited, and manually collected predefined relationships may be incomplete or biased. In contrast to intra-cluster relationships, inter-cluster relationships among stocks are typically implicit and difficult to precisely define. In Fig. 1, stocks $s_q$ of the same color belong to the same cluster $\pi$, solid lines represent relationships between stocks, and dashed lines represent relationships between stock clusters. In the real world, stocks within a cluster (such as chip stocks) generally exhibit similar trends in price movements, and stocks from related clusters (such as telecommunications and technology) tend to fluctuate with the movements of stocks in the chip cluster. Therefore, without sufficient prior knowledge, it is still challenging to consider the momentum spillover effect transmitted by the complex system of the stock market through various company relationships to obtain better prediction performance by combining historical series information.

In this article, our goal is to predict the future price movement of stocks. To better describe the characteristics of each stock, we must not only model market information such as historical prices and financial news but also model and analyze the potential relationships between stock clusters. Therefore, we propose a dual-graph attention model (MF-DAT) based on multisource information fusion to realize our hypothesis. Specifically, the MF-DAT model includes three steps. First, the LMF module is used to fuse the different characteristics extracted from the stock to capture the correlation between different market information. Second, by learning the weighted long-term and short-term state characteristics of stocks, the attention mechanism enables our model to selectively superimpose feature information at different moments to avoid the adverse effects caused by too many redundant features in the learning data. Finally, by learning the relationship characteristics between stock clusters through the graph attention layer, different weights can be assigned to the relationships between different clusters, and the previously captured node representations can be selectively aggregated into the stock graph network. Extensive experiments on publicly available real-world datasets show the effectiveness of the proposed method.

In summary, the main contributions of this research work are as follows:

A novel multifeature fusion method, the LMF module, is applied to capture the interaction characteristics between multiple modalities of information in the stock market.
According to the correlation of stock yield fluctuations, the cluster structure between stocks is mined through community detection, and new stock relationships are discovered to learn the mutual influence of stocks, making up for the limitations of the previous predefined industry relationship to represent the relationship between stocks.
A dual-graph attention model (MF-DAT) based on multisource information fusion is proposed, and extensive experiments on stocks in the real S&P 500 index and NASDAQ index show that the proposed model outperforms other benchmark methods.

The rest of the study is organized as follows: in Sect. 2, we summarize and compare the relevant work. In Sect. 3, the general framework of the research methodology is first introduced, followed by a detailed explanation of the flow of the individual modules of the MF-DAT model. In Sect. 4, the specific information of our dataset source and the experimental setup is explained, and the experimental results of various models are presented and analyzed in detail. Finally, we summarize the content of this study and make recommendations for future work in Sect. 5.

2 Related work

The research content of this paper is directly related to stock trend prediction, stock network modeling, and the application of graph attention neural networks in the financial field.

2.1 Stock trend forecasting

In recent years, many researchers have been working on classification and regression methods to predict future trends and stock prices, respectively. However, some studies focus on methods based on a single price time-series to predict stock movements. Lin et al. [23] proposed a lightweight module that relies only on historical prices to predict the future movement of stocks. However, the stock market is a dynamic and complex system that is highly influenced by various market signals, so some studies inevitably ignore the intrinsic correlation of other factors. Some researchers are beginning to further refine the model by fusing different factors within the market and their interactions. Zhang et al. [41] significantly improved stock forecasting performance by combining attention-enhancing LSTM models with data from multiple sources. Wu et al. [37] improved the forecasting performance by automatically extracting the one-way relationship between multivariate time-series through the graph learning module. Liu et al. [24] designed a multiscale bidirectional deep neural network (MTDNN) to learn multiscale patterns of stock trading. After combining patterns from two-way learning, their model achieves state-of-the-art performance.

2.2 Modeling the network of stock relationships

Relationship network modeling analysis has become the focus of research in the field of stocks [26, 35], and recent studies have also shown that complex relationships between stocks are important for stock trend prediction. In a stock relationship network, each stock is considered a node; although the definition of edges is not yet uniform, the most common approach is to define edges from the correlation of stock price fluctuations. Wang et al. [34] constructed an equity-correlated network using the Pearson coefficient to analyze the correlation structure and evolution of world stock markets. However, the Pearson coefficient captures only linear dependence and generally assumes that the data follow a normal distribution, whereas the stock market is nonlinear. Therefore, Chen et al. [4] and Zhong et al. [43] measured the nonlinear relationship between stocks through the Spearman correlation coefficient and then used it as an edge weight to construct a network of industrial stocks.

Recent work has begun to construct stock graphs based on some prior information. Cheng et al. [6] stored the relationships of linked entities extracted from financial news in a financial knowledge graph and predicted stock prices by combining the relationships that led the way in them. Feng et al. [10] and Wang et al. [35] built a stock network by extracting predefined entity relationships in Wikidata and achieved good performance. However, limiting stock correlations to specific, predefined relationships inevitably creates noise. With the deepening of researchers’ research on the stock market and their gradually analyzing and proving many properties of the stock market [2, 27, 40], people gradually realized that the stock market has many characteristics of complex systems, such as dynamics, sensitivity, and nonlinearity, so the latest work began to use complex networks to study stock networks [29] and identify the stock community structure through community detection algorithms. However, their research is limited to a better understanding of the interactions between stock network graphs and does not consider the resulting correlation matrix for downstream tasks.

2.3 Graph attention neural network

Because graph-structured data preserve both individual characteristics and complex relationships, they have been studied in depth in the financial field, the spatio-temporal modeling field, and elsewhere [12, 13]. Graph neural networks (GNNs) are widely used to obtain structural information by updating the representation of nodes [20, 44]. In response to the temporal and volatile nature of the stock market, researchers began to introduce GNNs and their variants to better address the stock forecasting problem. Li et al. [22] used a GNN combined with an improved genetic algorithm to predict stock market volatility; Chen et al. [3] extracted information on multiple relationships in the stock market for a prediction task using a graph-based learning method. Chen et al. [8] used a joint model of a graph convolutional network (GCN) combined with firm relationships to make predictions, and experiments showed that a prediction model containing company relationship information can make more accurate predictions. However, in these methods the weights of the company node attributes are fixed and equal, whereas the various relationships among company nodes affect the stock trend to different degrees. Traditional GNNs cannot handle this situation, and researchers have introduced the graph attention network (GAT) to solve such problems. Kim et al. [18] designed a hierarchical graph attention network to aggregate stock relationships with different weights, which improved the model's prediction performance. Hsu et al. [15] proposed a financial graph attention model that learns the short-term and long-term sequences of stock time-series hierarchically, and the model shows excellent performance.

3 Our proposed MF-DAT scheme

The overall framework of the research method in this paper is shown in Fig. 2. We first preprocess the market information features and perform multifeature fusion through the LMF module. We then use the attention mechanism to learn the short-term state characteristics of stocks, and use the two graph attention layers to learn the long-term state characteristics of stocks and the relationship characteristics between clusters. Finally, the learned features are passed through a simple neural network for prediction. The following subsections elaborate on how each module learns the different state characteristics of stocks.

3.1 Preliminary

The purpose of this step is to first extract historical price characteristics and text sentiment characteristics from historical price series data and financial news information over a certain period and then mine the community structure of the stock network through the correlation of stock yield fluctuations.

Historical price feature extraction. LSTM networks are widely used by researchers because of their ability to effectively capture the features of sequence data [10, 16, 38]. In this paper, we also use an LSTM to extract historical features over time. Historical price series have many different initial characteristics, such as the opening price, closing price, and high and low prices of the day. In this article, we use the rate of change in the closing price of a stock as the input to the LSTM, calculated using Eq. 1:

$$\begin{aligned} r_{i, j}^{s_q}=\frac{p_{i, j}^{s_q}-p_{i, j-1}^{s_q}}{p_{i, j-1}^{s_q}} \end{aligned}$$

(1)

where $r_{i, j}^{s_q}$ represents the rate of change of the price of stock $s_q$ on the jth day of week i, and $p_{i, j}^{s_q}$ represents the closing price of stock $s_q$ on the jth day of week i.
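As a small illustration of Eq. 1, the following sketch computes the daily rate of change from a list of consecutive closing prices. The function name `daily_change_rates` and the sample prices are hypothetical and are not taken from the paper's code.

```python
def daily_change_rates(closes):
    """Return r_j = (p_j - p_{j-1}) / p_{j-1} for consecutive closing prices (Eq. 1)."""
    if any(p <= 0 for p in closes):
        raise ValueError("closing prices must be positive")
    # Pair each price with its predecessor and compute the relative change.
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


# Hypothetical closing prices for three consecutive trading days.
week_closes = [100.0, 102.0, 99.96]
rates = daily_change_rates(week_closes)
```

A series of n closing prices yields n - 1 rates, so the first day of a window needs the previous day's close to be included.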

We represent the price change rate series of stock $s_q$ in week i as $p_i^{s_q}=\left\{ r_{i, 1}^{s_q}, \ldots , r_{i, j-1}^{s_q}, r_{i, j}^{s_q}\right\}$. An LSTM is used to learn the initial historical characteristics of the stock, and its last hidden state $h_i^{s_q}$ is taken as the embedding of the stock's historical price characteristics for that week, so we have

$$\begin{aligned} h_i^{s_q}={\text {LSTM}}\left( p_i^{s_q}\right) \in {\mathbb {R}}^L \end{aligned}$$

(2)

where L represents the price feature vector dimension.

Historical text feature extraction. We represent the sequence of text paragraphs of stock $s_q$ in week i as $q_i^{s_q}=\left\{ n_{i, 1}^{s_q}, \ldots , n_{i, j-1}^{s_q}, n_{i, j}^{s_q}\right\}$, where $n_{i, j}^{s_q}$ is the text on the jth day of week i. The text characteristics of stock $s_q$ in week i are captured by the BERT model, i.e., we have

$$\begin{aligned} v_i^{s_q}={\text {BERT}}\left( q_i^{s_q}\right) \in {\mathbb {R}}^{L'} \end{aligned}$$

(3)

where $L'$ represents the text feature vector dimension.

Construction of stock networks. According to Eq. 1, the return series of stock $s_q$ is $R^{s_q}=\left( p_1^{s_q}, \ldots , p_i^{s_q}\right)$, and the corresponding return range can be defined as $\left[ \min p_i^{s_q}, \max p_i^{s_q}\right]$. This interval is then divided equally into M subintervals. The approximate probability of the stock's return $p_i^{s_q}$ falling in the mth subinterval is:

$$\begin{aligned} p_m^{s_q} \approx \frac{f_m^{s_q}}{d} \end{aligned}$$

(4)

where ${f_m^{s_q}}$ is the number of returns of stock $s_q$ falling in the mth subinterval and d is the sample size (the number of return observations). Therefore, the return entropy of stock $s_q$ is:

$$\begin{aligned} H\left( R^{s_q}\right) =-\sum _{m=1}^{M} p_m^{s_q} \log _2 p_m^{s_q} \end{aligned}$$

(5)

Defining the yield of stock $s_s$ as $R^{s_s}$, its corresponding yield range is $\left[ \min p_i^{s_s}, \max p_i^{s_s}\right]$. Then $\left[ \min p_i^{s_q}, \max p_i^{s_q}\right]$ $\times$ $\left[ \min p_i^{s_s}, \max p_i^{s_s}\right]$ are divided into M$\times$M subintervals. The approximate probability of calculating the combined return ($p_i^{s_q}$,$p_i^{s_s}$) of stock $s_q$ and stock $s_s$ falling in the sub-interval (k,l) is

$$\begin{aligned} p_{k, l}^{s_q, s_s} \approx \frac{f_{k, l}^{s_q, s_s}}{d} \end{aligned}$$

(6)

The joint entropy of the joint rate of return is

$$\begin{aligned} H\left( R^{s_q}, R^{s_s}\right) =-\sum _{k=1}^{M} \sum _{l=1}^{M} p_{k, l}^{s_q, s_s} \log _2 p_{k, l}^{s_q, s_s} \end{aligned}$$

(7)

Finally, the mutual information representation of stock $s_q$ and stock $s_s$ is obtained:

$$\begin{aligned} I\left( R^{s_q}, R^{s_s}\right) =H\left( R^{s_q}\right) +H\left( R^{s_s}\right) -H\left( R^{s_q}, R^{s_s}\right) \end{aligned}$$

(8)

To facilitate the comparison of mutual information, mutual information is generally standardized, and the greater the standardized mutual information is, the stronger the correlation between stocks.

$$\begin{aligned} {\text {NMI}}\left( R^{s_q}, R^{s_s}\right) =\frac{2 I\left( R^{s_q}, R^{s_s}\right) }{H\left( R^{s_q}\right) +H\left( R^{s_s}\right) } \end{aligned}$$

(9)
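The chain from Eq. 4 to Eq. 9 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: returns are binned into equal-width subintervals over each series' own range, the number of bins (`n_bins`) is a free parameter, and all function names are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter


def bin_index(x, lo, hi, n_bins):
    """Map x in [lo, hi] to one of n_bins equal-width subintervals."""
    if hi == lo:
        return 0  # a constant series occupies a single bin
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)


def entropy(counts, d):
    """Shannon entropy in bits from bin counts and sample size d (Eqs. 4-5, 6-7)."""
    return -sum((c / d) * math.log2(c / d) for c in counts if c > 0)


def normalized_mutual_information(ra, rb, n_bins=4):
    """NMI of two equally long return series (Eqs. 8-9)."""
    if len(ra) != len(rb) or not ra:
        raise ValueError("series must be non-empty and of equal length")
    d = len(ra)
    ia = [bin_index(x, min(ra), max(ra), n_bins) for x in ra]
    ib = [bin_index(x, min(rb), max(rb), n_bins) for x in rb]
    h_a = entropy(Counter(ia).values(), d)
    h_b = entropy(Counter(ib).values(), d)
    h_ab = entropy(Counter(zip(ia, ib)).values(), d)  # joint entropy
    if h_a + h_b == 0:
        return 0.0  # both series constant: no information to share
    return 2 * (h_a + h_b - h_ab) / (h_a + h_b)


# Hypothetical return series used only to exercise the functions.
series_a = [0.01, -0.02, 0.03, 0.0]
series_b = [0.01, 0.01, 0.01, 0.01]
nmi_same = normalized_mutual_information(series_a, series_a)
nmi_const = normalized_mutual_information(series_a, series_b)
```

Two identical series give an NMI of 1, while a series paired with a constant one gives 0, which matches the reading of NMI as a normalized correlation strength.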

By calculating the mutual information between stocks, we obtain a fully connected network. Because this network still contains redundant information, this paper uses the threshold method [34] to initially filter the network and retain the key information; the degree of filtering is controlled by adjusting the threshold. Finally, the Louvain algorithm is used to detect communities in the stock network, and new stock relationships are mined according to the stock community structure.
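The threshold filtering step might look like the sketch below, which keeps only stock pairs whose NMI reaches a chosen threshold and returns an undirected adjacency map. The use of `>=`, the threshold value, the ticker names, and the scores are all assumptions for illustration; the subsequent Louvain community detection is not shown and would typically be delegated to a graph library.

```python
def filter_edges(nmi_scores, threshold):
    """Keep stock pairs whose NMI reaches the threshold; return an adjacency map."""
    adjacency = {}
    for (a, b), score in nmi_scores.items():
        if score >= threshold:
            # Undirected graph: record the edge in both directions.
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical NMI values for three illustrative tickers.
scores = {("AAA", "BBB"): 0.62, ("AAA", "CCC"): 0.18, ("BBB", "CCC"): 0.41}
graph = filter_edges(scores, threshold=0.4)
```

Raising the threshold yields a sparser network, so the detected communities depend directly on this choice.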

3.2 Learning stock sequential embeddings

3.2.1 Multifeature fusion

Stock trends are affected by multiple sources of information. Multifeature fusion has been shown to achieve higher matching accuracy not only in the stock prediction field but also in other fields such as information security [39], so determining how to efficiently integrate feature vectors from multiple modalities is very important [36]. The traditional fusion method (as shown in Fig. 3) directly concatenates the feature vectors and ignores the correlation between modalities, while the tensor fusion network (TFN) used in [21] and [42] increases the feature dimension. To extract high-level features from historical price and text data more efficiently while preserving the correlation between modalities, this paper uses the low-rank multimodal fusion (LMF) module [25], which computes tensor-based fusion by decomposing the weight tensor into low-rank factors that are applied to the input modalities in parallel. The fusion feature vector of a stock is calculated as follows:

$$\begin{aligned} e_i^{s_q}=\left( \sum _{k=1}^r w_h^{(k)} \otimes w_v^{(k)}\right) \cdot \boldsymbol{Z}=\left( \sum _{k=1}^r w_h^{(k)} \cdot h_i^{s_q}\right) \circ \left( \sum _{k=1}^r w_v^{(k)} \cdot v_i^{s_q}\right) \end{aligned}$$

(10)

where $\boldsymbol{Z}$ is the input feature tensor and r is the rank of the decomposition, i.e., the number of low-rank factors indexed by k. $w_h^{(k)} \in {\mathbb {R}}^{h \times l}$ and $w_v^{(k)} \in {\mathbb {R}}^{v \times l}$ are the low-rank factors corresponding to the price modality and the text modality, respectively. $\otimes$ denotes the outer product over the two input feature vectors, and $\circ$ denotes the element-wise product.
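To make the right-hand side of Eq. 10 concrete, the sketch below follows the equation as written: each modality is projected by its factor matrices, the projections are summed, and the two results are multiplied element-wise. It treats r as the number of low-rank factors (an assumption of this sketch), stores each factor as a plain list-of-lists matrix of shape (output dimension, input dimension), and uses hypothetical names and toy values throughout.

```python
def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def low_rank_fusion(h, v, w_h, w_v):
    """Fuse price features h and text features v with r factor pairs (Eq. 10)."""
    if len(w_h) != len(w_v) or not w_h:
        raise ValueError("need the same non-zero number of factors per modality")
    out_dim = len(w_h[0])
    proj_h = [0.0] * out_dim
    proj_v = [0.0] * out_dim
    for m_h, m_v in zip(w_h, w_v):
        # Accumulate the projection of each modality over the factors.
        proj_h = [a + b for a, b in zip(proj_h, matvec(m_h, h))]
        proj_v = [a + b for a, b in zip(proj_v, matvec(m_v, v))]
    # Element-wise product couples the two modalities.
    return [a * b for a, b in zip(proj_h, proj_v)]


# Toy inputs: 2-dimensional price feature, 1-dimensional text feature, r = 2.
price_feat = [1.0, 2.0]
text_feat = [3.0]
factors_h = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
factors_v = [[[1.0], [2.0]], [[0.0], [1.0]]]
fused = low_rank_fusion(price_feat, text_feat, factors_h, factors_v)
```

The output dimension is set by the factor matrices rather than by the product of the input dimensions, which is the reason this form avoids the dimensional growth of an explicit outer product.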

3.2.2 Sequential learning

Stock market data are highly variable, span long periods, and are highly nonlinear, so feeding too much data into a naive feature extraction model prevents it from capturing accurate vector representations at some time points, which leads to low accuracy in downstream prediction tasks. By introducing the attention mechanism, our model can selectively superimpose feature information at different moments to avoid poor performance caused by too many redundant features in the learning data. Given $E_i=\left\{ e_{i, 1}^{s_q}, \cdots , e_{i, t-1}^{s_q}, e_{i, t}^{s_q}\right\}$ as input for the attention learning module, we can obtain the weighted short-term state feature vector representation $g_i^{s_q}$ of the fusion features of stock $s_q$ at week i:

$$\begin{aligned} g_i^{s_q}={\text {Attention}}\left( E_i\right) =\sum _{j=1}^t \alpha _{i j} e_{i, j}^{s_q} \end{aligned}$$

(11)

$$\begin{aligned} \alpha _{i j}=\frac{\exp \left( W_g e_{i, j}^{s_q}\right) }{\sum _{k=1}^t \exp \left( W_g e_{i, k}^{s_q}\right) } \end{aligned}$$

(12)

where $W_g$ is the matrix of learnable parameters.
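Eqs. 11 and 12 amount to a softmax-weighted sum over the daily fused features of one week. The sketch below assumes that $W_g$ acts as a scoring vector that maps each feature to a scalar, which is one reading of Eq. 12; the function name and inputs are hypothetical.

```python
import math


def attention_pool(features, w_g):
    """Return the weighted feature g and the weights alpha (Eqs. 11-12)."""
    # Scalar score for each time step: W_g e_{i,j}.
    scores = [sum(w * x for w, x in zip(w_g, e)) for e in features]
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    alphas = [x / total for x in exps]
    dim = len(features[0])
    pooled = [sum(a * e[k] for a, e in zip(alphas, features)) for k in range(dim)]
    return pooled, alphas


# Two hypothetical daily feature vectors; zero scores give uniform weights.
daily = [[1.0, 3.0], [3.0, 5.0]]
pooled, alphas = attention_pool(daily, w_g=[0.0, 0.0])
```

With an all-zero scoring vector the weights are uniform and the result is the plain average, which is a convenient sanity check before training.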

3.3 Learning stock relational embeddings

The ultimate goal of the stock relationship learning module is to update the features of the nodes in the stock community network identified by community detection. The accuracy of node feature descriptions is the key to the graph prediction task, so the model needs to learn more effective relationship information for aggregation. To this end, we propose to collect different types of relationship information through two different graph attention layers and to filter out redundant features that are unrelated to the downstream task. Specifically, we design an intra-cluster graph attention learning layer and an inter-cluster graph attention learning layer, which model the weights between different nodes in the long-term state of stocks and the linkage effect between stock communities.

3.3.1 Intra-cluster graph attention learning

In the real stock market, stock prices are affected not only by their short-term state but also by their long-term movements. Therefore, the model needs to learn different weights for different time points in the long-term state of stocks. First, the short-term (weekly) feature $g_i^{s_q}$ of stock $s_q$, learned with the attention mechanism, is embedded in the node; from these short-term features, we aim to learn the weighted long-term state feature vector of the stock through the intra-cluster attention learning layer. By collecting the short-term feature vectors of the x weeks preceding week i, we obtain the unweighted long-term state series $G^{s_q}=\left\{ g_{i-x}^{s_q}, g_{i-x+1}^{s_q}, \cdots , g_{i-1}^{s_q}\right\}$ of stock $s_q$. The first graph attention layer is used to learn the attention weight of each short-term state of stocks in the same community and to update the feature information of each node according to the attention coefficients. The attention coefficient $\beta _{i j}$ between the state of period i and the state of period j is calculated as

$$\begin{aligned} \beta _{i j}=\frac{\exp \left( {\text {LeakyReLU}}\left( r_u^T\left[ W_u g_i^{s_q} \Vert W_u g_j^{s_q}\right] \right) \right) }{\sum _{k \in x} \exp \left( {\text {LeakyReLU}}\left( r_u^T\left[ W_u g_i^{s_q} \Vert W_u g_k^{s_q}\right] \right) \right) } \end{aligned}$$

(13)

where $W_u$ is the matrix of learnable parameters, $r_u$ is a learnable weight vector, and $\Vert$ denotes vector concatenation.

Based on the obtained periodic state attention coefficient $\beta _{i j}$, we can obtain the weighted long-term state characteristics:

$$\begin{aligned} u_i^{s_q}=G A T\left( G^{s_q}\right) =\sigma \left( \sum _{j \in d} \beta _{i j} W_u g_j^{s_q}\right) \end{aligned}$$

(14)
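A graph attention update of the kind described by Eqs. 13 and 14 can be sketched as follows. The code assumes a LeakyReLU slope of 0.2 and a sigmoid for $\sigma$, neither of which is stated at this point in the text, and the function names and toy inputs are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leaky_relu(x, slope=0.2):
    """LeakyReLU with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x


def gat_update(target, neighbors, w, r):
    """Attend from one target state over neighbor states (Eqs. 13-14)."""
    wt = matvec(w, target)
    wn = [matvec(w, g) for g in neighbors]
    # Score each pair from the concatenation [W g_i || W g_j].
    scores = [leaky_relu(sum(a * b for a, b in zip(r, wt + wj))) for wj in wn]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    agg = [sum(b * wj[k] for b, wj in zip(betas, wn)) for k in range(len(wt))]
    return [1.0 / (1.0 + math.exp(-a)) for a in agg], betas


# Toy example: identity projection and a zero attention vector.
identity = [[1.0, 0.0], [0.0, 1.0]]
weekly_states = [[1.0, 0.0], [0.0, 1.0]]
updated, betas = gat_update([1.0, 0.0], weekly_states, identity, [0.0] * 4)
```

The same routine applies unchanged to the inter-cluster layer of Eqs. 16 and 17 if the weekly states are replaced by cluster embeddings.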

3.3.2 Inter-cluster graph attention learning

Based on the attention mechanism layer and the intra-cluster graph attention learning layer, the weighted short-term state feature vector and the weighted long-term state feature vector of stocks can be learned, respectively. However, the relationship between the stock period states within a community is only part of the relationship between stocks, and the complex relationships between stock network communities also affect the stock price trend. The inter-cluster graph attention learning layer applies the attention mechanism to learn the different weights between communities so that other complex relationships between stocks are not ignored. Unlike previous work [10, 16, 38], which treats each stock as a node, we first perform community detection on the stock network with the Louvain algorithm and treat each stock community as a node $\pi _a$ that contains all the stocks in that community; we then build a fully connected community network $G_\pi$.

Before calculating the inter-cluster attention coefficient, we need to generate a cluster feature vector embedding from the weighted long-term characteristics of the stocks in the community and the community structure features. Inspired by the work of [32], we use a graph pooling operation for this purpose. The graph of cluster $\pi _a$ is defined as $G_{\pi _a}=\left( M_{\pi _a}, E_{\pi _a}\right)$, where $M_{\pi _a}$ is the set of stocks belonging to cluster $\pi _a$ and $E_{\pi _a}$ is the set of edges between stocks $s_q$ and $s_s$, with $s_q, s_s \in M_{\pi _a}$. We concatenate the weighted long-term state feature vector $u_i^{s_q}$ of each stock with the learned cluster structure feature vector $e^{s_q}$, denote the result as $\tau _i^{s_q}$, and generate the embedding vector $z_{\pi _a}$ of cluster $\pi _a$ by an element-wise max pooling operation:

$$\begin{aligned} z_{\pi _a}={\text {MaxPool}}\left( \left\{ \tau _i^{s_q} \mid \forall s_q \in M_{\pi _a}\right\} \right) \end{aligned}$$

(15)

Finally, we can obtain the feature vector representation of the sequence $z_\pi =\left\{ z_{\pi _a}, z_{\pi _b}, \ldots , z_{\pi _k}\right\}$ for all clusters, where k is the number of stock network communities.
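The pooling step of Eq. 15 can be illustrated with the short sketch below, which concatenates each member stock's long-term feature with its structure feature and then takes the element-wise maximum across the cluster. The function name, tickers, and feature values are hypothetical.

```python
def cluster_embedding(members):
    """Element-wise max over concatenated member features (Eq. 15).

    members maps a stock name to a pair (u, e): the weighted long-term
    feature vector and the cluster structure feature vector.
    """
    if not members:
        raise ValueError("a cluster needs at least one stock")
    taus = [list(u) + list(e) for u, e in members.values()]  # tau = u concatenated with e
    return [max(column) for column in zip(*taus)]


# Hypothetical two-stock cluster.
cluster = {
    "AAA": ([0.1, 0.9], [0.5]),
    "BBB": ([0.4, 0.2], [0.3]),
}
z = cluster_embedding(cluster)
```

Max pooling makes the cluster embedding independent of the number and order of member stocks, so clusters of different sizes produce vectors of the same length.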

Because the strength of the relationship between each stock cluster may be different, we use the graph attention network to learn the different relationship weights between the communities, and the relationship coefficient between the target stock cluster $\pi _a$ and the target stock community $\pi _b$ is calculated by the following formula:

$$\begin{aligned} \eta _{\pi _a \pi _b}=\frac{\exp \left( {\text {LeakyReLU}}\left( r_\pi ^T\left[ W_\pi z_{\pi _a} \Vert W_\pi z_{\pi _b}\right] \right) \right) }{\sum _{\pi _k \in G_\pi } \exp \left( {\text {LeakyReLU}}\left( r_\pi ^T\left[ W_\pi z_{\pi _a} \Vert W_\pi z_{\pi _k}\right] \right) \right) } \end{aligned}$$

(16)

Finally, the aggregate features can be calculated as the weighted average of the source hidden features with the sigmoid function; the formula is

$$\begin{aligned} \tau _{\pi _a}=G A T\left( z_\pi \right) =\sigma \left( \sum _{\pi _b \in G_\pi } \eta _{\pi _a \pi _b} W_\pi z_{\pi _b}\right) \end{aligned}$$

(17)

where $W_\pi$ is a matrix of learnable parameters.

3.4 Stock trend prediction

The last layer of the model aggregates the stock representations learned by the preceding attention layers to generate the final feature vector of each stock and predicts the stock trend using a shallow neural network and the softmax function, as shown in part III of Fig. 2.

Considering the influence of the long-term and short-term states of stocks and the relationship between stock clusters, we connect the weighted short-term state feature vector $g_i^{s_q}$, the weighted long-term state feature $u_i^{s_q}$, and the inter-cluster relationship feature $\tau _{\pi _a}$ to the final feature vector representation $\textit{rep}_i^{s_q}$:

$$\begin{aligned} \textit{rep}_i^{s_q}=\left[ g_i^{s_q} \oplus u_i^{s_q} \oplus \tau _{\pi _a}\right] ^T \end{aligned}$$

(18)

We treat stock trend prediction as a classification task, that is, the predicted result is one of {up, neutral, down}, and the predicted label is produced by a fully connected layer:

$$\begin{aligned} {\hat{y}}_i^{s_q}={\text {softmax}}\left( W_r \textit{rep}_i^{s_q}+b_r\right) \end{aligned}$$

(19)

where $W_r$ is a trainable weight matrix and $b_r$ is the bias vector.

Stock trend prediction is a multiclassification problem, so we define minimizing cross-entropy as a loss function of the model:

$$\begin{aligned} \text { loss }=-\sum _{s_q \in \phi _s} \sum _i^l y_i^{s_q} \ln {\hat{y}}_i^{s_q} \end{aligned}$$

(20)

where $y_i^{s_q}$ is the true value label of stock $s_q$ in week i, and $\phi _s$ is defined as the collection of all stocks in the dataset.
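The prediction layer and loss of Eqs. 19 and 20 can be sketched for a single stock and week as follows. The helper names, the one-hot label encoding, and the small clipping constant `eps` are assumptions of this sketch rather than details given in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(rep, w_r, b_r):
    """Fully connected layer followed by softmax (Eq. 19)."""
    logits = [sum(w * x for w, x in zip(row, rep)) + b for row, b in zip(w_r, b_r)]
    return softmax(logits)


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one sample with a one-hot label (one term of Eq. 20)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Hypothetical 2-dimensional representation and untrained (zero) parameters.
rep = [0.3, -0.1]
probs = predict(rep, w_r=[[0.0, 0.0]] * 3, b_r=[0.0, 0.0, 0.0])
loss = cross_entropy([1, 0, 0], probs)  # true class: up
```

With zero parameters the three classes are equally likely and the loss equals ln 3, which is the value an untrained three-class model should start near.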

Finally, we introduce the complete training process of the proposed model MF-DAT in detail and explain the input variables, feature extraction and fusion, model optimization, and other processes. In the pseudocode below, step 2 reflects the process of establishing the stock community network, and steps 9–11 are the key parts of the algorithm. Steps 13 and 14 describe the optimization process of the model.

4 Performance analysis

In this section, we verify the validity of the proposed MF-DAT model on three real datasets and compare it with several SOTA models from recent years for a comprehensive evaluation. We first introduce the datasets and their sources, then elaborate on the experimental parameter settings, and finally analyze the results of each experiment in turn.

4.1 Datasets

Stock price data. For the historical stock price series, we collected two real-world datasets: NASDAQ [3] and S&P 500 [4], the details of which are given in Table 1. The two datasets are from the US stock market and contain daily stock trading data from 2013/02/08 to 2018/03/27. We divide the dataset into three phases: training, validation, and testing.

Financial news data. For financial news, we aligned the collection with the timeframe of the historical price dataset, obtaining news about the target stocks from Benzinga.com over the same period (Benzinga is a financial data company that publishes 50–60 articles per day). In total, our financial news dataset consists of more than 1.3 million news headlines.

Table 1

Statistics of datasets

Markets	S&P 500	NASDAQ
Stocks	423	1026
Training period	08/02/2013–23/05/2017 (1080 days)	08/02/2013–23/05/2017 (1080 days)
Validation period	24/05/2017–27/03/2018 (213 days)	24/05/2017–27/03/2018 (213 days)
Testing period	27/03/2018–29/08/2019 (316 days)	27/03/2018–29/08/2019 (316 days)

Table 2

Comparison of classification accuracy predicted by different models

Methods	S&P 500 Accuracy	S&P 500 F1	NASDAQ Accuracy	NASDAQ F1
MLP [32]	35.68%	30.22%	36.68%	34.59%
CNN [14]	39.39%	32.78%	38.39%	36.87%
A-LSTM [9]	41.55%	40.28%	41.24%	40.39%
RGCN [31]	45.08%	44.21%	46.37%	45.28%
TGC [10]	46.74%	46.26%	48.56%	47.57%
FinGAT [15]	50.08%	49.47%	50.79%	49.68%
AD-GAT [7]	51.05%	50.34%	51.38%	50.67%
STHAN-SR [30]	51.54%	50.65%	51.89%	50.88%
MF-DAT	55.61%	54.57%	56.13%	55.69%

4.2 Evaluation metrics

In general, the ultimate goal of a stock trend prediction model is profitability. To measure the profitability of our proposed method, we simulate stock trading with a common trading strategy. Specifically, the model outputs a three-dimensional vector whose entries are the predicted probabilities of the stock rising, staying flat, and falling. When the probability of rising is the highest, the stock is bought at the day's closing price; when the probability of falling is the highest, the stock is sold at the day's closing price; otherwise, no trade is made.
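The trading rule described above can be sketched as a small function. The function name `trade_signal` and the tie-breaking behavior (ties fall through to "hold") are illustrative assumptions, not details given in the paper.

```python
def trade_signal(probs):
    """Map predicted probabilities (p_rise, p_flat, p_fall) to a trading action.

    Assumption: when no class is strictly the highest, no trade is made.
    """
    p_rise, p_flat, p_fall = probs
    if p_rise > p_flat and p_rise > p_fall:
        return "buy"   # buy at the day's closing price
    if p_fall > p_flat and p_fall > p_rise:
        return "sell"  # sell at the day's closing price
    return "hold"      # flat is the most likely outcome, or there is a tie

for p in [(0.6, 0.3, 0.1), (0.2, 0.3, 0.5), (0.3, 0.4, 0.3)]:
    print(p, "->", trade_signal(p))
```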

To evaluate the profitability of the proposed method, we use two indicators: the cumulative return on investment (IRR) and the Sharpe ratio (SR). The cumulative return on investment is calculated as follows:

$$\begin{aligned} \textit{IRR}^t=\sum _{i \in F^{t-1}} \frac{p_i^t-p_i^{t-1}}{p_i^{t-1}} \end{aligned}$$

(21)

where $F^{t-1}$ denotes the set of stocks traded at time $t-1$ and $p_i^t$ denotes the price of stock $i$ at time $t$.
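A minimal sketch of Eq. (21) follows: it sums the simple returns of the stocks traded at time $t-1$. The tickers and prices are invented for illustration, and the function name `period_return` is hypothetical.

```python
def period_return(prev_prices, curr_prices, traded):
    """Sum of simple returns (p_t - p_{t-1}) / p_{t-1} over the traded stocks."""
    return sum(
        (curr_prices[s] - prev_prices[s]) / prev_prices[s] for s in traded
    )

# Hypothetical closing prices at t-1 and t.
prev_prices = {"AAA": 100.0, "BBB": 50.0, "CCC": 20.0}
curr_prices = {"AAA": 110.0, "BBB": 49.0, "CCC": 25.0}
traded = ["AAA", "BBB"]  # F^{t-1}: only these stocks were traded at t-1

r = period_return(prev_prices, curr_prices, traded)
print(f"return at t: {r:.4f}")
```

Here AAA contributes +10% and BBB contributes -2%; CCC is ignored because it is not in the traded set.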

The Sharpe ratio jointly considers trading returns and risk, measuring the return earned per unit of risk. It is calculated as follows:

$$\begin{aligned} \textit{SR}_a=\frac{E\left[ R_a-R_f\right] }{{\text {std}}\left[ R_a-R_f\right] } \end{aligned}$$

(22)

where $R_a$ represents the return on investment and $R_f$ represents the risk-free rate.
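Eq. (22) can be sketched as the mean excess return divided by its standard deviation. The sketch assumes a constant per-period risk-free rate, uses the sample standard deviation, and applies no annualization; the paper does not state which conventions were used, and the return series below is made up.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical per-period returns of a strategy.
returns = [0.01, 0.02, -0.01, 0.03]
sr = sharpe_ratio(returns, risk_free=0.0)
print(f"Sharpe ratio: {sr:.4f}")
```

Switching to `statistics.pstdev` would give the population version, which yields a slightly larger ratio on short series.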

To compare the classification performance of the benchmark models and our model, we use two evaluation indicators widely used in classification tasks: accuracy and F1-score. They are calculated as

$$\begin{aligned} \textit{Acc}=\frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$

(23)

$$\begin{aligned} F1=2 \times \frac{\text {Recall} \times \text {Precision}}{\text {Recall}+\text {Precision}} \end{aligned}$$

(24)

$$\begin{aligned} \text {Recall}=\frac{TP}{TP+FN}, \quad \text {Precision}=\frac{TP}{TP+FP} \end{aligned}$$

(25)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
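For a concrete check of these metrics, the sketch below computes accuracy, precision, recall, and F1 from confusion counts, using the standard definitions precision = TP/(TP+FP) and recall = TP/(TP+FN). The counts are invented and are treated as one-vs-rest counts for a single class; how the paper averages over the three classes is not stated, so no averaging is shown.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical one-vs-rest counts for the "rise" class.
acc, prec, rec, f1 = classification_metrics(tp=40, tn=30, fp=10, fn=20)
print(f"acc={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

F1 is also equal to 2TP/(2TP+FP+FN), which gives a quick way to cross-check the result.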

4.3 Baseline method

To verify the effectiveness of our proposed model, we select several classical time-series forecasting models as benchmarks, which do not consider the influence of stock relationships on stock trends. In addition, we select several SOTA models that consider the complex relationships between stocks. The compared models are listed below.

Methods without considering stock relationships:

1. MLP [32] The multilayer perceptron is one of the most widely used models in time-series forecasting. We use a simple four-layer MLP with two hidden layers of 128 and 64 dimensions, and the learning rate is set to 0.0001.

2. CNN [14] Convolutional neural networks are fast to train, so they are also widely used in stock forecasting. We use a CNN with three convolutional layers and one fully connected layer, and select RMSprop as the optimizer with a learning rate of 0.01.

3. A-LSTM [9] LSTM is a widely used neural network model in time-series forecasting and performs well in stock forecasting tasks. In our benchmark, we use a two-layer LSTM model to learn the final prediction from historical price data.

Methods considering stock relationships:

4. RGCN [31] We use a GCN model with two convolutional layers, take historical price data as node inputs, reconstruct the stock graph from historical information containing the relationships of target companies, and use Adam as the optimizer.

5. TGC [10] The temporal graph convolution module proposed by Feng et al. is used for stock relationship modeling. We use the same parameter settings as the original article.

6. AD-GAT [7] An SOTA model that further models and analyzes market information through feature interaction, which improves the performance of stock trend prediction. All parameters are initialized with Glorot initialization, Adam is selected as the optimizer, and the initial learning rate is set to 0.0005.

7. FinGAT [15] The dimension of the GAT hidden layers is set to 16, the learning rate is set to 5e-4, and the batch size is set to 128. All models are optimized with Adam.

8. STHAN-SR [30] The learning rate is set to 1e-4, and Xavier initialization is used for all weights. The number of attention heads is K=4, and Adam is selected as the optimizer.

9. MF-DAT The method proposed in this article. Our MF-DAT model was implemented in TensorFlow and trained with the Adam optimizer on an NVIDIA Tesla K80 GPU for 100 epochs, with an initial learning rate of 5e-4, a weight decay of 5e-5, and a batch size of 32. To mitigate overfitting, we set dropout=0.5 at the end of each layer.

4.4 Results and analysis

In this section, we evaluate the performance of the proposed model based on a series of experimental results. We compare it not only with classic models but also with current SOTA models. Furthermore, ablation experiments explore the effects of different components and hyperparameters on MF-DAT performance.

4.4.1 Classification accuracy

Table 2 summarizes the accuracy and F1 scores of the benchmark models and our method in the classification task. Classical models such as LSTM and CNN consider only time-series data and perform much worse than graph-based models, which indicates that incorporating stock relationships benefits stock trend prediction. Comparing our proposed method with the other graph-based models, we find that MF-DAT performs better than all of them. Although graph-based methods such as GCN and TGC account for the effect of inter-stock relationships on stock trends, they model the interaction between stocks using predefined industry relationships or prior market information; for example, the TGC [10] method extracts industry relationships from Wikidata to mine the relationships between stocks. Although Wikidata has rich relational data, the limitations of predefined industry relationships inevitably lead to prediction bias. We believe this is why our model achieves better results: MF-DAT is designed to avoid the misinformation caused by insufficient prior information and thereby improve prediction.

Overall, our proposed model has more accurate and stable predictive performance than the benchmark models. On the S&P 500 dataset, its accuracy and F1-score exceed those of the second-best model by 4.07 and 3.92 percentage points, respectively; on the NASDAQ dataset, the margins are 4.24 and 4.81 percentage points.
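The margins quoted above follow directly from the Table 2 rows for MF-DAT and the second-best model, STHAN-SR. The short check below recomputes them; the dictionary layout and the helper name `margins` are illustrative only.

```python
# (accuracy %, F1 %) copied from Table 2.
table2 = {
    "S&P 500": {"STHAN-SR": (51.54, 50.65), "MF-DAT": (55.61, 54.57)},
    "NASDAQ": {"STHAN-SR": (51.89, 50.88), "MF-DAT": (56.13, 55.69)},
}

def margins(rows, best="MF-DAT", runner_up="STHAN-SR"):
    """Difference in percentage points between the best model and the runner-up."""
    return tuple(round(b - r, 2) for b, r in zip(rows[best], rows[runner_up]))

gaps = {market: margins(rows) for market, rows in table2.items()}
for market, (acc_gap, f1_gap) in gaps.items():
    print(f"{market}: accuracy +{acc_gap:.2f} pts, F1 +{f1_gap:.2f} pts")
```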

4.4.2 Profitability

To verify the profitability of the model, we backtested simulated stock trades over the testing period. Table 3 summarizes the profitability of the different models under the same trading strategy, and Fig. 2 shows the change in asset value in simulated trading for the different methods. During the backtesting period, the S&P 500 index and the NASDAQ index rose by 10.47% and 15.82%, respectively (Yahoo Finance data for the same period show that the S&P 500 index increased from 2615.75 to 2889.75 and the NASDAQ index increased from 6561.25 to 7599.25). From Table 3 and Fig. 2, it can be concluded that our model is more stable in terms of profitability than the other models. In terms of IRR, the cumulative returns of MF-DAT are 13.69% and 16.79% on the two datasets, higher than both the index gains over the same period and the returns of the other benchmark models. In terms of SR, the Sharpe ratio of MF-DAT is positive and the largest among the compared models, indicating that its returns over the period exceeded the risk-free rate and that its risk-adjusted return was also the highest (Fig. 4).
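The index gains can be recomputed from the quoted start and end levels, and compared with the MF-DAT cumulative returns reported in this section. This is only an arithmetic check; the variable names are illustrative.

```python
def pct_change(start, end):
    """Simple percentage change from start to end."""
    return (end - start) / start * 100.0

# Index levels quoted from Yahoo Finance in the text, and MF-DAT IRR values (%).
index_levels = {"S&P 500": (2615.75, 2889.75), "NASDAQ": (6561.25, 7599.25)}
mf_dat_irr = {"S&P 500": 13.69, "NASDAQ": 16.79}

gains = {name: pct_change(*levels) for name, levels in index_levels.items()}
excess = {name: mf_dat_irr[name] - gains[name] for name in gains}

for name in gains:
    print(f"{name}: index {gains[name]:.2f}%, MF-DAT excess {excess[name]:.2f} points")
```

The recomputed gains agree with the reported 10.47% and 15.82% to within rounding, and the excess of MF-DAT over the index is positive on both markets.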

Table 3

Comparison of profitability results of backtesting of different models

Methods	S&P 500 IRR	S&P 500 SR	NASDAQ IRR	NASDAQ SR
MLP [32]	-5.01%	0.4243	-3.28%	0.5443
CNN [14]	-2.12%	0.4272	-1.87%	0.5873
A-LSTM [9]	3.59%	0.8363	3.98%	0.7852
RGCN [31]	4.50%	1.1681	5.69%	0.9643
TGC [10]	6.87%	1.2599	7.24%	1.1354
FinGAT [15]	9.45%	1.7253	10.31%	1.7958
AD-GAT [7]	10.49%	1.8186	11.88%	1.8936
STHAN-SR [30]	10.51%	1.9327	12.31%	1.9862
MF-DAT	13.69%	2.7890	16.79%	2.6793

4.5 Ablation study

To test the effectiveness of the components of MF-DAT, we performed ablation experiments on the two datasets and designed four variants, each removing or replacing one module:

(1) w/o LMF: MF-DAT does not fuse the features of price data and news text through the LMF module but concatenates the two directly.

(2) w/o Intra-cluster Graph Attention Learning (w/o intra): MF-DAT removes the long-term state learning module of stocks and directly embeds the short-term state of stocks learned earlier into the structure of the stock community graph.

(3) w/o Inter-cluster Graph Attention Learning (w/o inter): MF-DAT removes the module that learns the relationship state between stock communities and feeds the long-term and short-term stock states learned earlier directly into the prediction layer.

(4) w/o community detection (w/o com): MF-DAT no longer mines the stock community structure through community detection to build the graph but extracts the predefined stock relationships in Wikidata to generate a stock relationship graph by traditional methods.

As Table 4 shows, the full MF-DAT model performs best, and removing any one of the modules degrades the results, which demonstrates the effectiveness of each component. The modules contribute differently on the two datasets, but all of them improve the predictive performance of the model. Among the four variants, replacing community detection with predefined relationships (w/o com) causes the largest drop in prediction performance. This shows that the implicit relationships in the stock community structure mined by community detection can compensate for insufficient prior information and have a significant effect on prediction. In contrast, removing the LMF module for multifeature fusion causes the smallest performance degradation. Nevertheless, the full model outperforms every variant in the prediction task.

Table 4

Comparison of MF-DAT ablation experimental results

Variants	S&P 500 Accuracy	S&P 500 F1	NASDAQ Accuracy	NASDAQ F1
w/o LMF	53.86%	52.43%	54.37%	52.89%
w/o intra	52.25%	51.17%	53.63%	52.32%
w/o inter	51.27%	50.23%	52.29%	51.48%
w/o com	50.49%	49.85%	51.84%	50.39%
MF-DAT	55.61%	54.34%	56.13%	55.69%

4.6 Case study of different thresholds

To explore the influence of the threshold on the prediction task, we use community detection to mine stock cluster structures under five different thresholds and update the stock feature representation based on the stock relationship graph built at each threshold. The results are shown in Table 5, and the stock cluster networks corresponding to the different thresholds are shown in Fig. 3.

Table 5

Model performance under different thresholds (S&P 500)

Threshold	Edges	Clusters	Accuracy	F1
0.90	13,706	6	51.37%	50.59%
0.92	10,717	7	53.79%	52.48%
0.94	8,019	8	55.61%	54.34%
0.96	5,133	8	52.85%	51.65%
0.98	2,624	9	50.83%	49.56%

Table 5 shows that as the threshold increases, fewer edges connect the stocks and the number of clusters increases. Comparing the performance of the model under different thresholds, we conclude that the model performs best at a threshold of 0.94. When the threshold is too small, there are too many edges and the detected stock communities are too coarse, which introduces redundant information that is not significantly relevant. When the threshold is too large, there are too few edges and the detected communities are too fine, so relevant information between significantly related stocks is ignored. In either case, the graph attention network cannot accurately learn the relationships between stocks, which degrades prediction. Although the performance of the model varies with the threshold, our model performs better than the variant based on predefined industry relationships at most thresholds (Fig. 5).
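The effect of the threshold on graph density can be illustrated with a toy sketch: link two stocks when the Pearson correlation of their return series reaches the threshold, then count edges and groups. Two things here are assumptions rather than the paper's procedure: the return series are invented, and connected components are used as a simple stand-in for community detection (the paper uses the Louvain algorithm, which is not reproduced here).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def threshold_graph(returns, threshold):
    """Edges between stocks whose return correlation is at least the threshold."""
    return [
        (u, v)
        for u, v in combinations(sorted(returns), 2)
        if pearson(returns[u], returns[v]) >= threshold
    ]

def count_components(nodes, edges):
    """Number of connected components, computed with union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})

# Hypothetical daily returns for five stocks.
returns = {
    "A": [0.01, 0.02, -0.01, 0.03, -0.02],
    "B": [0.012, 0.018, -0.008, 0.029, -0.021],
    "C": [-0.01, 0.00, 0.02, -0.02, 0.01],
    "D": [-0.011, 0.002, 0.019, -0.018, 0.012],
    "E": [0.02, 0.01, -0.02, 0.02, 0.00],
}

summary = {}
for threshold in (0.6, 0.9):
    edges = threshold_graph(returns, threshold)
    summary[threshold] = (len(edges), count_components(returns, edges))
    print(threshold, summary[threshold])
```

On this toy data, raising the threshold drops the weaker links to stock E, so the graph has fewer edges and splits into more groups, which mirrors the trend in Table 5.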

5 Conclusion and the future work

In this study, we propose a new multifeature fusion dual-graph attention network to predict the future trend of stocks. Through the LMF module, we fuse historical price features and financial news features into new stock features, which addresses the inability of direct concatenation to capture interactions between different kinds of market information. MF-DAT learns multiple state characteristics of stocks from market information and stock relationships through a two-layer graph attention network. In particular, MF-DAT does not require predefined relationships when learning the relationships between stocks, whereas most previous studies rely on prior information. To evaluate the proposed method, we conducted experiments on the public S&P 500 and NASDAQ datasets. The experimental results show that MF-DAT achieves better accuracy and a higher return on investment.

However, our current research still has limitations: it cannot effectively model the dynamics of stock relationships, and it cannot mine the overlapping community structure of the stock network. In response, future research can proceed in the following directions. (1) Although this paper builds stock networks from stock fluctuation correlations rather than from predefined relationships, it does not consider that the interactions between stocks change over time, so the stock community structure is also dynamic. In future research, we will consider the impact of the dynamics of stock relationships on the forecasting task. (2) In this paper, the stock community structure is detected by the Louvain algorithm, which can detect only nonoverlapping community structures. We will consider applying more advanced community detection methods to mine overlapping community structures in stocks. (3) We will consider applying the model to other fields, such as website traffic forecasting.

Acknowledgements

This work was supported in part by the Key R&D program of Zhejiang Province (2022C01083), the National Science Foundation of China (62102262), the Development Project of Xinjiang Production and Construction Corps 12th (no. SR202103), and the Practice Conditions and Practice Base Construction of Ministry of Education (no. SR202102624032).

Declarations

Conflict of interest

The authors declare no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


References

1. Ali, U., Hirshleifer, D.: Shared analyst coverage: Unifying momentum spillover effects. J. Financ. Econ. 136(3), 649–675 (2020)

2. Chaudhari, K., Thakkar, A.: Neural network systems with an integrated coefficient of variation-based feature selection for stock price and trend prediction. Expert Systems with Applications p 119527 (2023)

3. Chen, Q., Robert, C.Y.: Graph-based learning for stock movement prediction with textual and relational data. The Journal of Financial Data Science 4(4), 152–166 (2022)

4. Chen, W., Jiang, M., Zhang, W.G., et al.: A novel graph convolutional feature based convolutional neural network for stock trend prediction. Inf. Sci. 556, 67–94 (2021)

5. Chen, Y., Wei, Z., Huang, X.: Incorporating corporation relationship via graph convolutional neural networks for stock price prediction. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp 1655–1658 (2018)

6. Cheng, D., Yang, F., Xiang, S., et al.: Financial time series forecasting with multi-modality graph neural network. Pattern Recogn. 121, 108218 (2022)

7. Cheng, R., Li, Q.: Modeling the momentum spillover effect for stock prediction via attribute-driven graph attention networks. In: Proceedings of the AAAI Conference on artificial intelligence, pp 55–62 (2021)

8. Chmielewski, L., Amin, R., Wannaphaschaiyong, A., et al.: Network analysis of technology stocks using market correlation. In: 2020 IEEE International Conference on Knowledge Graph (ICKG), IEEE, pp 267–274 (2020)

9. Feng, F., Chen, H., He, X., et al.: Enhancing stock movement prediction with adversarial training. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (2018)

10. Feng, F., He, X., Wang, X., et al.: Temporal relational ranking for stock prediction. ACM Transactions on Information Systems (TOIS) 37(2), 1–30 (2019)

11. Feng, S., Xu, C., Zuo, Y., et al.: Relation-aware dynamic attributed graph attention network for stocks recommendation. Pattern Recogn. 121, 108119 (2022)

12. Gao, J., Xu, C.: Learning video moment retrieval without a single annotated video. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1646–1657 (2021)

13. Gao, J., Zhang, T., Xu, C.: Learning to model relationships for zero-shot video classification. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3476–3491 (2020)

14. Hoseinzade, E., Haratizadeh, S.: Cnnpred: Cnn-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 129, 273–285 (2019)

15. Hsu, Y.L., Tsai, Y.C., Li, C.T.: Fingat: Financial graph attention networks for recommending top-$k$ profitable stocks. IEEE Trans. Knowl. Data Eng. 35(1), 469–481 (2021)

16. Huang, K., Li, X., Liu, F., et al.: Ml-gat: A multilevel graph attention model for stock prediction. IEEE Access 10, 86408–86422 (2022)

17. Huang, W.C., Chen, C.T., Lee, C., et al.: Attentive gated graph sequence neural network-based time-series information fusion for financial trading. Information Fusion 91, 261–276 (2023)

18. Kim, R., So, C.H., Jeong, M., et al.: Hats: A hierarchical graph attention network for stock movement prediction. arXiv preprint arXiv:1908.07999 (2019)

19. Li, H., An, H., Fang, W., et al.: Global energy investment structure from the energy stock market perspective based on a heterogeneous complex network model. Appl. Energy 194, 648–657 (2017)

20. Li, S., Wu, J., Jiang, X., et al.: Chart gcn: Learning chart information with a graph convolutional network for stock movement prediction. Knowl.-Based Syst. 248, 108842 (2022)

21. Li, W., Bao, R., Harimoto, K., et al.: Modeling the stock relation with graph network for overnight stock movement prediction. In: Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pp 4541–4547 (2021)

22. Li, X., Jia, H., Cheng, X., et al.: Stock market volatility prediction method based on improved genetic algorithm and graph neural network. Journal of Computer Applications 42(5), 1624 (2022)

23. Lin, H., Zhou, D., Liu, W., et al.: Learning multiple stock trading patterns with temporal routing adaptor and optimal transport. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp 1017–1026 (2021)

24. Liu, G., Mao, Y., Sun, Q., et al.: Multi-scale two-way deep neural network for stock trend prediction. In: IJCAI, pp 4555–4561 (2020)

25. Liu, Z., Shen, Y., Lakshminarasimhan, V. B., et al.: Efficient low-rank multimodal fusion with modality-specific factors. arXiv preprint arXiv:1806.00064 (2018)

26. Lu, Y., Shi, C., Hu, L., et al.: Relation structure-aware heterogeneous information network embedding. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 4456–4463 (2019)

27. Ma, Y., Mao, R., Lin, Q., et al.: Multi-source aggregated classification for stock price movement prediction. Information Fusion 91, 515–528 (2023)

28. MacMahon, M., Garlaschelli, D.: Community detection for correlation matrices. arXiv preprint arXiv:1311.1924 (2013)

29. Purqon, A., et al.: Community detection of dynamic complex networks in stock markets using hybrid methods (rmt-cn-lpam+ and rmt-bdm-sa). Frontiers in Physics 8, 492 (2021)

30. Sawhney, R., Agarwal, S., Wadhwa, A., et al.: Stock selection via spatiotemporal hypergraph attention network: A learning to rank approach. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 497–504 (2021)

31. Schlichtkrull, M., Kipf, T.N., Bloem, P., et al.: Modeling relational data with graph convolutional networks. In: The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, Springer, pp 593–607 (2018)

32. Tang, J., Deng, C., Huang, G.B.: Extreme learning machine for multilayer perceptron. IEEE transactions on neural networks and learning systems 27(4), 809–821 (2015)

33. Wang, G., Cao, L., Zhao, H., et al.: Coupling macro-sector-micro financial indicators for learning stock representations with less uncertainty. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 4418–4426 (2021)

34. Wang, G.J., Xie, C., Stanley, H.E.: Correlation structure and evolution of world stock markets: Evidence from pearson and partial correlation-based networks. Comput. Econ. 51, 607–635 (2018)

35. Wang, H., Li, S., Wang, T., et al.: Hierarchical adaptive temporal-relational modeling for stock trend prediction. In: IJCAI, pp 3691–3698 (2021)

36. Wang, J., Hu, Y., Jiang, T.X., et al.: Essential tensor learning for multimodal information-driven stock movement prediction. Knowledge-Based Systems p 110262 (2023)

37. Wu, Z., Pan, S., Long, G., et al.: Connecting the dots: Multivariate time series forecasting with graph neural networks. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 753–763 (2020)

38. Xu, C., Huang, H., Ying, X., et al.: Hgnn: Hierarchical graph neural network for predicting the classification of price-limit-hitting stocks. Inf. Sci. 607, 783–798 (2022)

39. Yang, J., Xiong, N., Vasilakos, A.V., et al.: A fingerprint recognition scheme based on assembling invariant moments for cloud computing communications. IEEE Syst. J. 5(4), 574–583 (2011)

40. Yang, X., Loua, M.A., Wu, M., et al.: Multi-granularity stock prediction with sequential three-way decisions. Inf. Sci. 621, 524–544 (2023)

41. Zhang, Q., Yang, L., Zhou, F.: Attention enhanced long short-term memory network with multi-source heterogeneous information fusion: An application to bgi genomics. Inf. Sci. 553, 305–330 (2021)

42. Zhao, Y., Du, H., Liu, Y., et al.: Stock movement prediction based on bi-typed hybrid-relational market knowledge graph via dual attention networks. IEEE Transactions on Knowledge and Data Engineering (2022)

43. Zhong, T., Peng, Q., Wang, X., et al.: Novel indexes based on network structure to indicate financial market. Phys. A 443, 583–594 (2016)

44. Zhou, J., Cui, G., Hu, S., et al.: Graph neural networks: A review of methods and applications. AI open 1, 57–81 (2020)

Title: MF-DAT: a stock trend prediction of the double-graph attention network based on multisource information fusion
Authors: Kun Huang
Xiaoming Li
Neal Xiong
Yihe Yang
Publication date: 01-06-2024
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 3/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-024-01333-9
