The article examines the importance of accurate stock price forecasts in the financial industry and emphasizes their role in strategic decision-making. Traditional statistical methods are discussed, highlighting their limitations in capturing complex patterns and responding to market changes. The focus then shifts to the potential of machine learning and deep learning techniques, in particular generative adversarial networks (GANs) and transformer-based attention mechanisms. The integration of these advanced methods is explored, demonstrating their ability to process complex datasets and improve prediction accuracy. The study also addresses the limitations of these techniques and proposes strategies to overcome them. The article evaluates the performance of the proposed models on real stock market data and discusses the implications for investors and financial analysts. The research aims to improve the reliability and accuracy of stock price forecasts and to contribute to better-informed investment decisions.
AI-generated
This summary of the technical content was generated with the help of AI.
Abstract
Stock price prediction plays an important role in financial decision-making, enabling investors and analysts to make informed choices about trading and investment strategies. Traditional statistical methods have been used to predict stock prices, but they often struggle to capture complex patterns, adapt to changing market conditions, handle large datasets, and automatically extract relevant features. Recent advances in machine learning and deep learning offer promising solutions to these challenges. In this paper, we propose a new approach to stock price prediction that leverages generative adversarial networks (GANs) and transformer-based attention mechanisms. GANs are used to generate synthetic stock price data, and market sentiment and volatility are incorporated into the model. Attention mechanisms selectively concentrate on the important features and patterns in the data, which may help identify key market indicators that affect stock prices. By integrating social media news, which reflects market sentiment and volatility, our model aims to improve the accuracy and robustness of stock price forecasts. We also address the limitations of using GANs and attention mechanisms separately for stock price prediction, such as unrealistic data generation and overfitting, by employing regularization techniques and incorporating additional data sources. Experimental evaluations using real-world stock market data will be conducted to compare the performance of our proposed models with conventional approaches. The findings of this research have implications for investors, financial analysts, and other stakeholders in the stock market ecosystem, providing valuable insights for investment strategies.
Notes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
The prediction of stock prices is of significant importance in finance because it enables investors and financial analysts to make well-informed decisions about buying, selling and holding stocks Yun et al. (2021). Accurate estimates of stock prices can also help organizations make strategic decisions about investments, mergers and acquisitions, and other business undertakings. Many methods can be used for stock price prediction. Statistical methods have a long history in this area, dating back to the early days of the stock market, when traders and investors applied rudimentary statistical techniques, including moving averages and regression analysis, to discern trends and patterns in stock prices. Over time, traditional statistical methods have become increasingly sophisticated, and a variety of models and techniques, such as Monte Carlo simulation and time series analysis, have been developed Jaoudé (2022).
For instance, moving average (MA) technical analysis is simple and popular. It calculates a stock's average price over a period and smooths out short-term fluctuations. Two variants are common: the simple moving average, an unweighted average of prices over a time window, and the exponential moving average, which weights recent prices more heavily and therefore responds faster to recent price changes Sukparungsee et al. (2020). Regression analysis, in turn, studies the relationship between a dependent variable (e.g., stock price) and one or more independent variables; in stock price prediction, it can measure the impact of these independent variables on stock prices. Time series forecasting analyzes time-ordered data and uses past stock price patterns to anticipate future prices. Common models include the autoregressive integrated moving average (ARIMA), which models a (differenced) series in terms of its own past values and past forecast errors, and exponential smoothing methods, such as simple exponential smoothing (SES) or Holt-Winters, which assign exponentially decreasing weights to past observations Kumar et al. (2022). These approaches can detect trends and seasonality in stock prices. Finally, Monte Carlo simulation generates random scenarios from probability distributions. It can simulate stock price paths based on model parameters, and by evaluating many outcomes, investors can estimate the probability of reaching financial goals or the risk of alternative investment options Pažický (2017).
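As a rough illustration of two of the classical techniques above, the following Python sketch implements a one-step simple exponential smoothing forecast and a Monte Carlo estimate of the probability that a price ends at or above a target. It is a minimal sketch under stated assumptions: the Monte Carlo part assumes geometric Brownian motion for price paths, which the text does not specify, and the function names and parameters (`alpha`, `mu`, `sigma`, `target`) are hypothetical illustrations, not anything used in this study.

```python
import math
import random


def simple_exponential_smoothing(prices, alpha):
    """Return the one-step-ahead SES forecast for a list of prices."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = prices[0]  # initialise the level with the first observation
    for x in prices[1:]:
        # The newest price gets weight alpha; older prices decay geometrically.
        level = alpha * x + (1.0 - alpha) * level
    return level


def monte_carlo_prob_above(s0, mu, sigma, horizon_years, target,
                           n_paths=10_000, seed=0):
    """Estimate P(S_T >= target), assuming geometric Brownian motion."""
    rng = random.Random(seed)  # seeded for reproducible scenarios
    drift = (mu - 0.5 * sigma ** 2) * horizon_years
    vol = sigma * math.sqrt(horizon_years)
    hits = 0
    for _ in range(n_paths):
        # Terminal price of one simulated path: S_T = S_0 * exp(drift + vol * Z).
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        if s_t >= target:
            hits += 1
    return hits / n_paths


# Hypothetical example: chance that a 100-dollar stock ends the year at or above 100.
p = monte_carlo_prob_above(100.0, mu=0.08, sigma=0.3, horizon_years=1.0, target=100.0)
```

For example, `simple_exponential_smoothing([10, 12, 11, 13], 0.5)` returns 12.0; a larger `alpha` makes the forecast track the latest price more closely.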
However, traditional stock price prediction approaches struggle to capture complex patterns, react to changing market conditions, handle large and diverse datasets, deal with noise and outliers, and automatically extract features. Because they rely on simplifying assumptions, such as linear relationships between variables, they may fail to capture the complex nonlinear patterns in stock price data.
In recent years, research on machine learning and deep learning has surged, producing substantial progress across many application areas and offering ways to address the problems of traditional statistical methods. Transformer models such as BERT and GPT have significantly advanced natural language processing, achieving exceptional performance on many tasks and establishing themselves as the current benchmark Mishev et al. (2020). Generative models, such as generative adversarial networks (GANs) and variational autoencoders (VAEs), have been applied to synthesize realistic images, videos, and audio Diqi et al. (2022). These advances have the potential to transform many industries and fields, including healthcare, banking, entertainment, and education. In healthcare, machine learning models are being used to improve the accuracy of disease diagnosis and treatment. Similarly, in finance, these models are used to detect fraudulent activities, manage risk, and forecast stock prices Aziz et al. (2022).
The application of machine learning and deep learning techniques to stock price prediction has developed significantly in recent years and shows considerable potential for improving prediction accuracy. Two methodologies that have attracted considerable interest in this domain are generative adversarial networks (GANs) and attention mechanisms. A GAN is a neural network architecture consisting of two networks, a generator and a discriminator, that are trained against each other: the generator learns the underlying data distribution and produces synthetic data that closely resembles the real data, while the discriminator tries to tell real and synthetic data apart. In stock price prediction, GANs have been used to generate synthetic stock market data that reflects the underlying data distribution Jadhav et al. (2021). The generated data can improve forecast accuracy in scenarios with limited data. Attention mechanisms, on the other hand, enable neural networks to focus on the most important features or patterns in the data Liu and Wang (2019). This is particularly valuable for prediction, as it can help identify key market indicators that influence stock prices.
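To make the idea of focusing on the most important features concrete, here is a minimal, dependency-free sketch of scaled dot-product attention, the core operation of transformer attention. It is simplified for illustration (a single query, no learned projection matrices, a single head), the function names are hypothetical, and it is not the attention module used in this study.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Attend from one query vector over a sequence of key/value vectors."""
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)  # attention weights sum to 1
    dim_v = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return output, weights


out, w = scaled_dot_product_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]]
)
```

The query is most similar to the first key, so the first value vector receives the larger weight in the output.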
Recent research has shown the effectiveness of generative adversarial networks (GANs) and attention mechanisms in stock price forecasting. Kang et al. (2019) used a GAN-based data augmentation strategy to improve the predictive accuracy of a conventional long short-term memory (LSTM) model for stock price prediction; the augmentation improved model accuracy and mitigated overfitting. Cheng et al. (2018) used an attention-based LSTM model to forecast stock prices, combining structured data with unstructured data such as news and social media posts. Their findings indicated that the attention mechanism improved the model's interpretability and enabled more precise feature selection.
Using GANs or attention mechanisms alone in stock price prediction has limitations that call for further research to improve their effectiveness and reliability. GANs may generate unrealistic data and suffer from mode collapse, which can lead to inaccurate predictions and limit their usefulness in stock price prediction. Similarly, attention mechanisms require large amounts of data and can overfit, performing well on the training data but poorly on new data. This limits the generalizability of the model and can lead to inaccurate predictions in real stock markets. In this paper, we build a novel method for improving stock price forecasting that combines GANs and transformer-based attention mechanisms with market sentiment and volatility. Incorporating these factors into the models is expected to improve the accuracy and dependability of the forecasts. In addition, we investigate techniques for overcoming the current limitations of GANs and attention mechanisms, such as regularization to reduce overfitting and additional data sources to increase data diversity.
To evaluate the efficacy of the proposed models, we will conduct experiments with real stock market data and compare their performance with that of conventional approaches. In addition, we will investigate the models' potential limitations as well as their implications for investors and financial analysts.
2 Specific aims
1. Leveraging the capabilities of generative adversarial networks (GANs) and transformer-based attention mechanisms to enhance the accuracy of stock price prediction.
2. Improving the interpretability of stock price prediction models by incorporating attention mechanisms, which focus on the most important features or patterns in the data and thereby allow a better understanding of the underlying factors driving stock price movements.
3. Integrating social media news and volatility data into the prediction models to capture the impact of market conditions and investor sentiment on stock prices. Including these market dynamics can lead to more accurate and robust predictions.
4. Addressing the limitations of traditional statistical methods in stock price prediction, such as their difficulty in capturing complex patterns and handling large and diverse datasets.
5. Conducting evaluations to assess the performance of the proposed models. Real stock market data will be used to compare the reliability of the GAN- and attention-based model with that of conventional approaches. Furthermore, the research aims to explore how the model performs under risk and what this implies for its usefulness to investors, financial analysts, and other stakeholders.
3 Materials and methods
3.1 Datasets
3.1.1 Yahoo Finance
The data used for this research are sourced from reliable and widely used platforms, primarily Yahoo Finance. The data contain six years of historical stock price data for Apple Inc. (AAPL), traded on the NASDAQ exchange, covering the period from 2017 to 2023. Apple Inc. was chosen for its prominence in the technology sector and its large market capitalization.
To gain additional insight for predicting Apple Inc.'s stock price, data from other major technology companies, such as Amazon, Microsoft, and Google, are also collected. These companies serve as references because of their influence on the technology industry and their possible impact on Apple's stock price.
Historical data for major stock market indices, including NASDAQ, NYSE, and Russell 2000, are incorporated to capture broader market trends. These index data provide a wider perspective and supplement the analysis of Apple Inc.'s stock price.
To reflect market volatility, the VIX index, also known as the CBOE Volatility Index, is included in the analysis. It is a widely accepted measure of market volatility and can provide valuable insights into overall market sentiment and its potential impact on Apple Inc.'s stock price. The exchange rate between the Chinese yuan (CNY) and the United States dollar (USD) is included as an economic indicator, given the strong relationship between Apple Inc. and China. Fluctuations in this exchange rate can significantly affect Apple Inc.'s revenue and profitability, making it an important factor that may influence the stock price trend.
Additionally, the 5-year and 10-year Treasury rates are included in the original dataset. These rates are often used as benchmarks for assessing overall economic conditions and can provide insights into the market's expectations of future interest rates. Including them aims to further enhance the accuracy and reliability of the predictions for Apple Inc.'s stock price.
3.1.2 Seeking Alpha
We incorporated data from Seeking Alpha, a reputable platform that provides daily news and analysis, to capture market and investor sentiment toward Apple's stock. This data source enables us to gather information about news events that may have a direct or indirect impact on the stock price.
Seeking Alpha is a valuable resource for tracking news articles and financial reports on stock performance, in this case specifically for Apple Inc. By analyzing this information, we can identify potential factors that could influence the stock price, such as product launches, earnings announcements, regulatory changes and competitive developments.
Including news data from Seeking Alpha allows us to capture real-time market sentiment and incorporate it into our analysis. News often triggers market reactions and shapes investor sentiment, which in turn can affect buying and selling decisions for Apple's stock. By adding news data, we aim to capture the collective sentiment of the market and assess its potential effect on Apple's stock price.
The collected news data from Seeking Alpha are subjected to further data analysis and preprocessing. This involves extracting relevant information, identifying key events and trends, and encoding the textual data into a format suitable for analysis. With news data alongside other relevant data sources, we aim to gain a comprehensive understanding of the various factors that influence Apple’s stock price and improve the accuracy of the predictions.
Fig. 1
Number of news articles per day. This figure depicts the daily number of news articles influencing the stock price of Apple from 2017 to 2023
Analysis of the news data obtained from Seeking Alpha between 31/07/2017 and 01/08/2023 (Fig. 1) shows that some dates have multiple news articles, while six days have no news coverage. To address this imbalance, measures were taken to normalize the impact of news per day and ensure an equitable distribution. These measures involved assigning appropriate weights to articles clustered on the same day and supplementing the analysis with relevant market indicators and macroeconomic data for days without news. Overall, these efforts aimed to create a balanced and accurate representation of market events throughout the analyzed period.
Specifically, a weighted average is used to handle days on which multiple news articles appear. The weight assigned to each article is derived from its analysis with FinBERT, a specialized model for sentiment scoring. Integrating these weights into the calculation yields a revised probability that the day's news has a positive or negative impact on the stock price, as shown in Fig. 2. This method provides a more comprehensive evaluation of the influence of each news article, improving the accuracy and breadth of the analysis.
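The daily aggregation step could look roughly like the sketch below. The exact weighting formula is not given in the text, so this sketch assumes, hypothetically, that each article carries FinBERT-style class probabilities (positive, negative, neutral) and that its weight is the classifier's confidence, i.e. its largest class probability. Days without news simply do not appear in the output and would be covered by the other market features.

```python
from collections import defaultdict


def daily_sentiment(articles):
    """Aggregate per-article class probabilities into one score pair per day.

    articles: iterable of (date, p_positive, p_negative, p_neutral) tuples.
    Hypothetical weighting: weight = max class probability (>= 1/3 for a
    valid distribution, so the weight sum is never zero).
    Returns {date: (weighted P(positive), weighted P(negative))}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # [sum w*pos, sum w*neg, sum w]
    for date, p_pos, p_neg, p_neu in articles:
        weight = max(p_pos, p_neg, p_neu)
        acc = sums[date]
        acc[0] += weight * p_pos
        acc[1] += weight * p_neg
        acc[2] += weight
    return {d: (a[0] / a[2], a[1] / a[2]) for d, a in sums.items()}


res = daily_sentiment([
    ("2023-01-03", 0.8, 0.1, 0.1),  # confident positive article
    ("2023-01-03", 0.2, 0.6, 0.2),  # less confident negative article, same day
    ("2023-01-04", 0.1, 0.1, 0.8),  # mostly neutral article
])
```

Because the positive article on 2023-01-03 is the more confident one, it dominates that day's combined score.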
Fig. 2
Evaluation of News per day. This figure illustrates the assessment of the impact of news on the stock price of Apple, along with corresponding probabilities
Moving averages are frequently used in stock price forecasting to augment a dataset's feature set. Both seven-day and twenty-one-day moving averages were calculated for the analysis. Adding these moving averages as features enriches the dataset by highlighting temporal trends in stock prices and revealing their patterns and tendencies over short- and medium-term intervals. These moving averages serve as indicators of price direction and offer insights into market sentiment and investor behavior. The dataset is further enhanced with upper and lower bands (Bollinger bands), statistical limits derived from a moving average and the rolling standard deviation that indicate the range within which stock prices are expected to fluctuate. These bands provide additional context about the potential boundaries and constraints on stock price movements. The trends of the stock closing price, MA7, MA21, and the upper and lower bands are shown in Fig. 3.
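A minimal sketch of the moving-average and band features follows. It assumes trailing windows and the common Bollinger-band convention of a 20-day window with bands two sample standard deviations from the rolling mean; `window=20` and `k=2.0` are assumed defaults, not values confirmed by the text, and the helper names are hypothetical.

```python
import statistics


def rolling_mean(values, window):
    """Trailing simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def bollinger_bands(values, window=20, k=2.0):
    """Upper/lower bands = rolling mean +/- k * rolling sample standard deviation."""
    upper, lower = [], []
    for i in range(len(values)):
        if i + 1 < window:
            upper.append(None)
            lower.append(None)
            continue
        win = values[i + 1 - window:i + 1]
        mean = sum(win) / window
        std = statistics.stdev(win)  # sample std; pstdev is another common choice
        upper.append(mean + k * std)
        lower.append(mean - k * std)
    return upper, lower


closes = [float(v) for v in range(1, 31)]  # toy closing prices
ma7 = rolling_mean(closes, 7)
ma21 = rolling_mean(closes, 21)
upper, lower = bollinger_bands(closes)
```

The leading `None` values mark days without a full window; in practice these rows would be dropped or filled before training.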
Fig. 3
Technical indicators for Apple stock price. This figure gives an overview of the technical indicators that may affect the stock price of Apple
Fig. 4
Apple closing stock price with Fourier transforms. This figure demonstrates the decomposition of the trend in the Apple stock price using the Fourier transform, showing the distinct components
The Fourier transform is used to decompose the stock price time series over the six-year period into its individual frequency components. This decomposition makes it possible to identify and extract low-frequency signals, which are of particular interest because they carry information about underlying patterns. The low-frequency components obtained from the decomposition are then added to the dataset as additional features, on the premise that these low-frequency signals capture the enduring patterns in stock prices, namely the overall trend and gradual fluctuations in stock price behavior over time. As shown in Fig. 4, the blue line captures the stock price trend over the six-year period.
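The low-frequency extraction can be sketched as follows: take the discrete Fourier transform, keep only the lowest-frequency bins, and invert. This is an illustrative O(n²) implementation in pure Python; it assumes that "k components" means the mean plus the k − 1 lowest positive frequencies (together with their mirror bins, so the result stays real-valued), which may differ from the exact definition behind the Fourier features used in this study.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]


def low_frequency_trend(prices, k):
    """Reconstruct the series from its k lowest-frequency components.

    Bin 0 (the mean) and bins 1..k-1 are kept together with their mirror
    bins n-k+1..n-1; every other bin is zeroed before inverting.
    """
    n = len(prices)
    spectrum = dft(prices)
    filtered = [c if (f < k or f > n - k) else 0j for f, c in enumerate(spectrum)]
    return [v.real for v in idft(filtered)]


# Toy series: level 10, a slow cycle, and a faster cycle that should be removed.
n = 16
xs = [10 + cmath.cos(2 * cmath.pi * t / n).real
      + 0.5 * cmath.cos(2 * cmath.pi * 5 * t / n).real for t in range(n)]
trend = low_frequency_trend(xs, 2)
```

On a six-year daily series, an FFT routine would replace the quadratic `dft`/`idft` helpers; larger `k` values keep more detail, and `k` equal to the series length reproduces the original series.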
3.3 Model architecture
The proposed model architecture first obtains the real stock price data for Apple from 2017 to 2023 and extracts various features, such as moving averages, short-time Fourier transform components, and FinBERT sentiment scores for social media news. Additional indicators, such as Google and Microsoft stock prices, the NASDAQ index, and the VIX, are also considered.
After preprocessing, we obtain 22 features for each stock (raw stock data and technical indicators computed on the closing price):
– Raw Stock Data
  * High Price
  * Low Price
  * Close Price
  * Adj Close Price
  * Volume
– News Features obtained from FinBERT
– Technical Indicators
  * Exponentially Weighted Moving Average (EWMA)
    · 12 Day EWMA
    · 26 Day EWMA
  * Moving Average Convergence Divergence (MACD)
  * Standard Deviation
    · 20 Day Std.
  * Bollinger Bands
    · Bollinger Upper Band
    · Bollinger Lower Band
  * Fourier Components
    · 3 Components
    · 6 Components
    · 9 Components
    · 27 Components
    · 81 Components
    · 100 Components
  * Rolling Averages
    · 7 Day Rolling Average
    · 21 Day Rolling Average
  * Momentum
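Several of the technical indicators in the list above can be computed in a few lines. The sketch below assumes the recursive EWMA form with smoothing factor α = 2/(span + 1), MACD as the 12-day EWMA minus the 26-day EWMA, and momentum as the price change over a hypothetical `lag` of 10 days. The text does not state these definitions, so they are conventional assumptions, and the function names are hypothetical.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average, alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    out = [values[0]]  # seed with the first observation
    for x in values[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


def macd(values, fast=12, slow=26):
    """MACD line: fast EWMA minus slow EWMA, element by element."""
    return [f - s for f, s in zip(ewma(values, fast), ewma(values, slow))]


def momentum(values, lag=10):
    """Price change over `lag` steps; None where no earlier price exists."""
    return [None if i < lag else values[i] - values[i - lag] for i in range(len(values))]
```

For a flat price series the two EWMAs coincide, so the MACD line stays at zero; a positive MACD indicates that recent prices are rising faster than the longer-term average.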
The 5 * 22 features for (Apple, Google, Microsoft, NASDAQ, VIX) are then fed into a variational autoencoder (VAE) for denoising and extracting high-level representations (Figs. 5, 6, 7, 8, 9, 10, 11, 12).
Fig. 5
Overview of the model architecture. This figure illustrates the model architecture employed in our stock price prediction framework, which incorporates three key components: data preprocessing, generative adversarial networks (GANs), and hyperparameter tuning
Fig. 6
Workflow of FinBERT. This figure visually depicts the workflow of our model and showcases the functionality of FinBERT in relation to our model and the social media news data
Fig. 7
Workflow of VAE. This figure illustrates the key components of the variational autoencoder (VAE) architecture, namely the encoder, latent space, and decoder Shubham (2019)
Fig. 8
Workflow of GRU. This figure depicts the fundamental gated recurrent unit (GRU) cells employed in our model, which play a crucial role in enhancing the generator component of the generative adversarial networks (GANs) model Jin et al. (2020)
Fig. 9
Workflow of GANs. This figure illustrates the workflow of generative adversarial networks (GANs), which encompasses both the generator and discriminator components in order to generate synthetic data for stock prices Wang (2020)
Fig. 10
Workflow of transformer-based attention. This figure presents the workflow of transformer-based attention mechanisms, which offer potential improvements in accurately extracting data. It leverages the transformer architecture to enhance the process of extracting relevant and precise information from the data Yang et al. (2021)
Fig. 11
Apple closing price prediction for both training set and testing set. This figure delineates the distinction between the training and testing sets using a blue vertical dashed line
Fig. 12
Testing set of Apple stock price prediction with trading position. This figure shows the predictions made by our model, highlighting four specific points which are intended for utilization in the investment strategy analysis, enabling a comparison with real data
3.3.1 VAE detail
The architecture of the variational autoencoder (VAE) used in this study comprises several key components designed for denoising and representation extraction. The encoder begins with a self-attention module that captures global dependencies, followed by a series of linear layers with ReLU activations that transform the input features into latent space representations. The mean (\(\mu \)) and log-variance (logVar) of the latent variables are computed by linear layers; both are needed for the reparameterization trick, which keeps the sampling step differentiable during backpropagation.
In the decoding stage, the latent variables are reconstructed through a sequence of linear layers, culminating in a Sigmoid activation to produce the denoised output. The overall architecture ensures that the high-level representations are robust and informative, facilitating effective downstream tasks.
Additionally, a loss function combining binary cross-entropy for reconstruction accuracy and Kullback–Leibler divergence for regularizing the latent space is employed to train the VAE. This balance helps in achieving a trade-off between reconstruction fidelity and latent space smoothness, which is crucial for generating meaningful representations.
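The VAE objective described above (binary cross-entropy reconstruction plus Kullback–Leibler divergence) and the reparameterization trick can be written out numerically as below. This is a framework-free sketch: it assumes inputs scaled to [0, 1] so that binary cross-entropy applies (consistent with the Sigmoid output) and a standard normal prior on the latent variables; in an autodiff framework the same expressions would be backpropagated. The function names are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1); in autodiff this keeps z differentiable in mu, log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def vae_loss(x, x_hat, mu, log_var, eps=1e-12):
    """Binary cross-entropy reconstruction term plus KL(q(z|x) || N(0, I))."""
    bce = -sum(
        xi * math.log(max(xh, eps)) + (1.0 - xi) * math.log(max(1.0 - xh, eps))
        for xi, xh in zip(x, x_hat)
    )
    # Closed-form KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dims.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
    return bce + kl
```

The KL term pulls the latent distribution toward the standard normal prior, while the BCE term rewards faithful reconstruction; balancing the two is the trade-off between reconstruction fidelity and latent smoothness mentioned above.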
3.3.2 GAN detail
We concatenate the VAE output with the input features to form the input to the generative adversarial network (GAN). The GAN implemented in this study consists of a generator and a discriminator, each with a specific role within the adversarial framework.
The generator, denoted as G, is structured to accept input features and generate predicted data for a specified number of future days (num_days_to_predict). It employs a series of GRU layers followed by linear transformations to capture sequential dependencies and produce the output data. The generator’s design ensures the effective synthesis of plausible future data sequences based on the input features.
The discriminator, denoted as D, is responsible for distinguishing between real and generated data. It receives concatenated sequences of real data and generated data as input. The discriminator architecture includes multiple convolutional layers and linear layers, which are used to extract relevant features and make binary classifications (real or fake). This setup leverages convolutional operations to enhance the model’s ability to identify intricate patterns within the time-series data.
The GAN’s training process involves optimizing two separate loss functions for the generator and the discriminator. The discriminator is trained to maximize its ability to correctly classify real and fake data by minimizing the binary cross-entropy loss for both real and generated samples. The generator, on the other hand, aims to minimize the loss incurred when the discriminator identifies its outputs as fake, thereby improving the quality of generated data over successive iterations.
The overall training procedure is carefully managed by separate optimizers for the generator and the discriminator, allowing for fine-tuned updates to each component.
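The two losses from the training procedure above can be sketched as plain functions of the discriminator's output probabilities. This assumes the discriminator outputs P(real) and that the generator uses the common non-saturating form, i.e. binary cross-entropy with its generated samples labelled as real; the names are hypothetical, and during training each loss would be minimized by its own optimizer, alternating discriminator and generator updates.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability (clamped for stability)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Real samples are labelled 1 and generated samples 0."""
    real = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: generated samples are labelled as real (1)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

A discriminator that separates real from generated samples well gets a low loss, while the generator's loss falls as its samples are scored closer to "real".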
Finally, the hyperparameters of both the generator and the discriminator are tuned using reward-based methods and Bayesian optimization. Overall, this architecture offers a systematic approach to stock price prediction and integrates various techniques to improve forecasting accuracy for Apple's stock price and, potentially, for financial markets more broadly.
3.4 Algorithms and network
3.4.1 Financial BERT natural language processing model–FinBERT
FinBERT, a model based on BERT (bidirectional encoder representations from transformers), is a useful tool for analyzing financial sentiment Huang et al. (2023). It is specifically designed to understand and evaluate the sentiment and contextual information in financial text, such as news stories, earnings reports, and social media posts, with the goal of extracting insights relevant to financial decision-making.
FinBERT can be used to compute sentiment scores for news data. Using its deep learning architecture, the model interprets the sentiment expressed in the text, taking into account the nuances, tone, and context of each news item, which allows an in-depth understanding of the sentiment associated with the content.
FinBERT sentiment scores can therefore serve as valuable inputs to the stock price prediction framework. Adding these scores gives the predictive model information about market sentiment and the likely impact of news on stock prices, enabling a more comprehensive analysis that captures the interaction between news sentiment and stock price fluctuations.
3.4.2 Variational autoencoder (VAE)
The variational autoencoder (VAE) is a generative model that combines autoencoder with variational inference techniques. Its primary goal is to learn a latent representation of input data by encoding it into a lower-dimensional latent space and then decoding it to reconstruct the original data Xu and Tan (2020).
A VAE consists of an encoder network and a decoder network. The encoder maps input data to the latent space by producing mean (\(\mu \)) and variance (\(\sigma ^2\)) vectors, which parameterize a multivariate Gaussian approximate posterior \(q_\phi (z\mid X)\); the latent space is thus characterized by a multivariate Gaussian distribution. The decoder network reconstructs the original data from samples in the latent space, modeling the likelihood \(P(X\mid z;\theta )\). The VAE is trained by maximizing the evidence lower bound (ELBO), or equivalently minimizing its negative, which contains a reconstruction term and a regularization term on the latent distribution. The ELBO (Eq. 1) is obtained by taking the expected log-likelihood of the data given the latent variable and subtracting the Kullback–Leibler divergence between the approximate posterior and the prior distribution \(P(z)\) Carl (2021):
\[ \mathrm {ELBO}(\theta ,\phi ;X) = \mathbb {E}_{q_\phi (z\mid X)}\left[ \log P(X\mid z;\theta )\right] - D_{\mathrm {KL}}\left( q_\phi (z\mid X)\,\Vert \,P(z)\right) \]
Monte Carlo sampling and backpropagation are used for optimization. The VAE enables unsupervised learning of an expressive latent space from which new samples can be created and manipulated.
In the context of stock price prediction, the VAE plays a key role in data preparation. In our model, the stock price time series is fed into the VAE encoder network, which maps the high-dimensional input data to a lower-dimensional latent space. In doing so, the VAE captures the core features and patterns of the data while filtering out noise and extraneous information. This preprocessing step enhances the efficiency, accuracy, and reliability of subsequent analysis and prediction tasks.
3.4.3 Gated recurrent units (GRUs)
Gated recurrent units (GRUs) are a recurrent neural network (RNN) architecture that mitigates the vanishing gradient problem and facilitates the modeling of long-range dependencies in sequential data Shejul et al. (2023). GRUs contain gating mechanisms that regulate the flow of information within the network. Their components are described below.
The first component is the update gate, which determines how much of the previous hidden state should be preserved and how much new information should be integrated. It is computed from the input and the previous hidden state, as represented in Eq. 3, where \(z_t\) represents the update gate at time step t, \(W_z\) and \(U_z\) are weight matrices, \(x_t\) denotes the input at time step t, and \(h_{t-1}\) represents the previous hidden state.
The next component is the reset gate, which determines how much of the previous hidden state should be disregarded and how much of the incoming input should be taken into account. Like the update gate, it is computed from the input and the previous hidden state, as expressed in Eq. 4, where \(r_t\) is the reset gate at time step t, \(W_r\) and \(U_r\) are weight matrices, and \(b_r\) is the bias vector.
Next is the candidate hidden state, which represents the new information to be incorporated into the hidden state. It is computed by combining the input with the previous hidden state, modulated by the reset gate, as given by Eq. 5.
Finally, the hidden state is updated by combining the previous hidden state, modulated by the update gate, with the candidate hidden state, as represented in Eq. 6.
During training, the GRU parameters are learned through backpropagation and gradient descent, optimizing a specified loss function. GRU is selected as a model for the training of stock price prediction due to its ability to capture sequential dependencies and long-term patterns in time series data. By utilizing previous stock price data for training purposes, the model is able to acquire proficiency in accurately forecasting future price fluctuations.
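The four GRU components (Eqs. 3 to 6) can be sketched as a single NumPy step function. The equations themselves are not reproduced in this text, so the sketch assumes one common convention. It uses sigmoid gates with a bias on every transform, a tanh candidate state, and the update \(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\). Some references swap the roles of \(z_t\) and \(1 - z_t\). The parameter dictionary, its key names and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x_t, h_prev, p):
    """One GRU time step (assumed convention, see lead-in)."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate (cf. Eq. 3)
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate (cf. Eq. 4)
    # Candidate hidden state: the reset gate scales how much of h_prev is used (cf. Eq. 5).
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # Hidden state update: interpolate between the old state and the candidate (cf. Eq. 6).
    return (1.0 - z_t) * h_prev + z_t * h_cand

rng = np.random.default_rng(0)
n_features, n_hidden = 5, 8  # hypothetical sizes
p = {}
for g in ("z", "r", "h"):
    p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_features))
    p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    p[f"b_{g}"] = np.zeros(n_hidden)

# Run the cell over a synthetic 30-day window with 5 features per day.
h = np.zeros(n_hidden)
for x_t in rng.normal(size=(30, n_features)):
    h = gru_cell(x_t, h, p)  # the final h summarizes the whole window
```

The hidden state is a gated mix of the previous state and a tanh-bounded candidate. Starting from zero, it therefore stays strictly inside (-1, 1), which is one reason GRUs train more stably than plain RNNs.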
3.4.4 Generative adversarial networks (GANs)
Generative adversarial networks (GANs) are a class of deep learning models consisting of two neural networks, a Generator and a Discriminator. The Generator learns the underlying data distribution and produces synthetic samples that resemble the real data. The Discriminator learns to distinguish real samples from generated ones. The two networks are trained simultaneously in an adversarial manner. The Generator tries to produce synthetic data realistic enough to deceive the Discriminator, and the Discriminator tries to classify genuine and synthetic samples correctly.
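The objective function of a GAN can be formulated as the minimax game
\[ \min _G \max _D V(D, G) = \mathbb {E}_{x_{\text {real}} \sim P_{\text {real}}}\left[ \log D(x_{\text {real}})\right] + \mathbb {E}_{z \sim P_z}\left[ \log \left( 1 - D(G(z))\right) \right] \]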
where V(D, G) represents the value function of the GAN, \(P_{\text {real}}\) is the real data distribution, \(P_z\) is the noise distribution, \(x_{\text {real}}\) is a real sample, G(z) is a generated sample, \(D(x_{\text {real}})\) is the discriminator’s output for a real sample, and D(G(z)) is the discriminator’s output for a generated sample Wang (2020).
During training, the Generator and the Discriminator are engaged in this dynamic interplay. As the Discriminator becomes better at separating real from synthetic samples, the Generator learns to produce synthetic data that increasingly approximates the underlying distribution of stock prices. Its output therefore becomes progressively more realistic.
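The adversarial objective can be made concrete by computing it from discriminator outputs. The sketch below estimates \(V(D, G)\) over a batch and derives the two losses. The discriminator loss is \(-V\), which is binary cross-entropy with labels 1 (real) and 0 (fake). For the generator it uses the widely used non-saturating variant \(-\log D(G(z))\). This is an assumption, since the exact generator loss used in this paper is not reproduced in the text. The discriminator outputs are made-up numbers, not results.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite when an output is exactly 0 or 1

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator probabilities D(x_real) for a batch of real sequences.
    d_fake: discriminator probabilities D(G(z)) for a batch of generated sequences.
    """
    return np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS))

def discriminator_loss(d_real, d_fake):
    # D maximizes V, i.e. minimizes -V (binary cross-entropy, real=1, fake=0).
    return -value_function(d_real, d_fake)

def generator_loss(d_fake):
    # Non-saturating variant: maximize log D(G(z)) instead of minimizing
    # log(1 - D(G(z))), which gives stronger gradients early in training.
    return -np.mean(np.log(d_fake + EPS))

# Made-up discriminator outputs for illustration.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.2, 0.1, 0.3])
d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
```

When the discriminator outputs 0.5 everywhere (it cannot tell real from fake), \(V(D, G) = -\log 4\). This is the value at the theoretical equilibrium of the minimax game.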
To predict the next 5 days of stock prices, we use the trained Generator. Given a sequence of 10 days of historical data, the Generator produces synthetic samples that represent potential future stock price movements. These generated samples are then used as input to a separate prediction model, such as a regression model, to forecast upcoming stock prices.
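The input/output framing above (10 past days in, 5 future days out) can be built with a sliding window. The sketch also includes a time-ordered 70/30 train/test split like the one described in Section 4.1. The helper names are hypothetical. The assumption that the split is chronological, with no shuffling, is ours; it avoids training on prices that come after the test period.

```python
def make_windows(series, lookback=10, horizon=5):
    """Slice a series into (past `lookback` values -> next `horizon` values) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback:start + lookback + horizon])
    return inputs, targets

def chronological_split(inputs, targets, train_frac=0.7):
    """Keep time order: the earliest samples train, the latest samples test."""
    cut = round(len(inputs) * train_frac)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])

prices = list(range(20))            # toy series standing in for closing prices
X, Y = make_windows(prices)         # 6 windows of 10 inputs and 5 targets each
(train_x, train_y), (test_x, test_y) = chronological_split(X, Y)
```

With 1,347 samples, `round(0.7 * 1347)` gives 943 training and 404 testing samples, matching the counts reported in Section 4.1.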
3.4.5 Transformer-based attention
Transformer-based models have become widely used in natural language processing because they capture long-range dependencies and learn contextual relationships effectively. For stock price prediction, we use transformer-based attention mechanisms to capture temporal dependencies and extract meaningful features from historical stock data. The attention mechanism lets the model focus on relevant information by weighting the importance of the different time steps in the input sequence. In our model, the attention mechanism is added to the discriminator. This can improve accuracy and lets the model learn for itself which time steps to emphasize.
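The core operation of a transformer layer, scaled dot-product self-attention, can be sketched in NumPy applied across the time steps of a price window. This sketch is single-head and omits positional encodings and layer normalization. The "discriminator head" (mean pooling followed by a sigmoid) is a hypothetical illustration of where attention could sit inside a discriminator, not the paper's architecture. All sizes are assumptions.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over time steps."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) similarity between days
    weights = softmax(scores, axis=-1)       # row t: how much day t attends to each day
    return weights @ V, weights

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
T, d_model, d_k = 30, 8, 4  # hypothetical: 30 days, 8 features per day
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(scale=0.3, size=(d_model, d_k)) for _ in range(3))
context, weights = self_attention(X, W_q, W_k, W_v)

# Hypothetical discriminator head: pool over time, map to P(sequence is real).
w_out = rng.normal(size=d_k)
p_real = sigmoid(context.mean(axis=0) @ w_out)
```

Each row of `weights` is a probability distribution over the 30 days. Inspecting it shows which past days the model relied on for a given position.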
3.4.6 Hyperparameter tuning
We use Bayesian optimization for hyperparameter tuning for four reasons: it explores the hyperparameter space efficiently, it relies on a probabilistic model of the objective, it works with a limited number of evaluations, and it searches for global optima. Its objective function is the root mean squared error (RMSE). After each evaluation, the algorithm updates its surrogate model, which captures the relationship between the hyperparameters and the objective function. It then selects the next set of hyperparameters to test. To identify good configurations efficiently, it balances exploration (testing untried regions of the hyperparameter space) and exploitation (focusing on promising regions). These steps are repeated until the best hyperparameters are found. After tuning, the best predictive results were obtained when 30 consecutive days of historical stock prices were used to forecast the price of the following day.
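The loop described above can be sketched as follows: fit a surrogate, choose the next point by balancing exploration and exploitation, evaluate it, and repeat. The sketch uses a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function. Both are common choices but are assumptions here, since the text does not name the surrogate or the acquisition function. `toy_validation_rmse` is a hypothetical stand-in for "train the model with this look-back window and return its RMSE"; it is minimized at 30 days by construction.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def rbf_kernel(a, b, length_scale=8.0):
    """Squared-exponential kernel with unit signal variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Gaussian-process posterior mean and standard deviation at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 1e-12)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for minimization: expected amount by which a candidate beats `best`."""
    imp = best - mu - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf  # first term exploits, second term explores

def bayes_opt(objective, candidates, init, n_iter=10):
    xs = [float(x) for x in init]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        pool = np.array([c for c in candidates if float(c) not in xs], dtype=float)
        if pool.size == 0:
            break
        y = np.array(ys)
        y_norm = (y - y.mean()) / (y.std() + 1e-9)  # standardize targets for the GP
        mu, sigma = gp_posterior(np.array(xs), y_norm, pool)  # refit the surrogate
        x_next = float(pool[np.argmax(expected_improvement(mu, sigma, y_norm.min()))])
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]

def toy_validation_rmse(window):
    # Hypothetical objective; a real run would train and validate the model here.
    return 1.0 + ((window - 30.0) / 10.0) ** 2

best_window, best_rmse = bayes_opt(toy_validation_rmse, range(5, 61), init=[5, 20, 45, 60])
```

In practice, each objective call is a full training run, which is why a sample-efficient method like this is preferred over grid search.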
4 Experiments
4.1 Model training
The purpose of this paper is to develop a forecasting model that predicts stock closing prices for the next five days from the past 10 days' data. Training uses both historical closing prices and 30 relevant features that may influence the stock price. For training and evaluation, the dataset is divided into a training set comprising \(70\%\) (943 data points) and a testing set comprising \(30\%\) (404 data points). After training, the testing phase provides an initial assessment of the model's performance, which is then refined through model tuning.
In our model, the GAN loss is calculated for both the generator and the discriminator to optimize their performance. The GAN loss is the objective function that guides training. It quantifies the discrepancy between the generator's output and the discriminator's assessment, and it is computed from the discriminator's predictions on both real and generated data. For the generator, the loss rewards synthetic samples that the discriminator finds harder to distinguish from authentic data. Minimizing the generator's loss therefore improves its ability to deceive the discriminator. More formally, the loss function of the generator can be expressed as follows:
For the discriminator, the GAN loss quantifies how accurately it classifies genuine and synthetic samples. The discriminator aims to minimize this loss by improving its ability to distinguish between the two categories of data. Its loss function can be expressed as follows:
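The root mean square error (RMSE) is used to evaluate the predictions:
\[ \text {RMSE} = \sqrt{\frac{1}{n}\sum _{i=1}^{n}\left( y_i - \hat{y}_i\right) ^2} \]
where \(y_i\) represents the actual stock price, \(\hat{y}_i\) represents the predicted stock price, and n represents the total number of historical stock prices.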
The squared difference is computed for every available data point, and the result is a single number summarizing the overall forecast error. Because the differences are squared, RMSE weights large errors more heavily, penalizing predictions that deviate greatly from the actual values. RMSE therefore measures the magnitude of prediction errors but not their direction, since squaring removes the sign. A lower RMSE indicates higher prediction accuracy because it reflects smaller average discrepancies between predicted and actual stock prices. A higher RMSE indicates larger prediction errors and lower accuracy.
The standard RMSE is computed on both the training set and the testing set to assess the accuracy of the stock price prediction model. It is defined as \(\text {RMSE}(y_{\text {true}}, y_{\text {pred}})/\text {RMSE}(y_{\text {true}}[1:], y_{\text {true}}[:-1])\), where \(y_{\text {true}}\) represents the actual stock price and \(y_{\text {pred}}\) represents the predicted stock price. The denominator is the RMSE of a naive forecast that predicts each day's price as the previous day's actual price. The standard RMSE therefore expresses the model's error relative to that baseline as a scale-free measure, which allows the model's forecasting performance to be evaluated.
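Both metrics are short to compute. The sketch below follows the two formulas and reads the slices \(y_{\text {true}}[1:]\) and \(y_{\text {true}}[:-1]\) in the usual Python sense. Under this definition, a standard RMSE of 1 matches the naive "tomorrow equals today" forecast. The prices are toy numbers, not data from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def standard_rmse(y_true, y_pred):
    """Model RMSE divided by the RMSE of the naive previous-day forecast."""
    naive = rmse(y_true[1:], y_true[:-1])
    return rmse(y_true, y_pred) / naive

actual = [100.0, 102.0, 101.0, 105.0]     # toy prices
predicted = [101.0, 101.0, 103.0, 104.0]
score = standard_rmse(actual, predicted)  # sqrt(7/4) / sqrt(7), i.e. about 0.5
```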
4.3 Experiments and results
Table 1
Model summary RMSE for Apple
RMSE (train) | RMSE (test) | Standard RMSE (train) | Standard RMSE (test)
5.2264 | 9.4235 | 2.4142 | 3.3315
Figure 11 shows Apple's closing stock price predictions on both the training set and the testing set. After training, the model captures the upward and downward trends in the stock price and predicts the period after mid-2022 accurately. On the testing set, however, the model fails to capture the rapid growth in the actual stock price after March 2023. In 2023, Apple's stock price rose by more than 42% Harsh (2023). This surge has been attributed to the broader rally in technology stocks on Wall Street, driven by optimism about the potential of artificial intelligence (AI).
To gauge the accuracy of the model, the standard RMSE is a more reliable indicator than the conventional RMSE. The conventional RMSE is expressed in price units and therefore grows with the price level. Table 1 shows that the standard RMSE values of the current model are 2.414 for the training set and 3.331 for the testing set. These results indicate the model's commendable predictive capabilities in stock price forecasting, even in the post-COVID-19 pandemic era.
5 Financial analysis
To assess the financial value of the stock predictions, we calculate investment returns by comparing the stock prices predicted by our model with the actual prices observed over the same period. This analysis evaluates the practical utility and effectiveness of the model in real-life stock investment scenarios. It indicates whether the model can generate measurable returns and how it might inform investment choices in a changing and unpredictable stock market.
Based on our model's 30-day forecast, we expected Apple's share price to decline to a predicted $134.055 on 16/06/2022. We therefore bought the stock on that date.
The forecast then showed an upward trend, peaking at a predicted $174.890 per share on 18/08/2022, so we sold the stock on that date.
The forecast further showed the price resuming its decline to $131.298 per share on 11/01/2023, followed by an upward trend to $170.783 per share on 23/06/2023. We therefore applied the same strategy again, buying at the predicted low and selling at the predicted high.
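To calculate the return on investment (ROI) of each transaction, we use the formula:
\[ \text {ROI} = \frac{P_{\text {sell}} - P_{\text {buy}}}{P_{\text {buy}}} \times 100\% \]
where \(P_{\text {buy}}\) and \(P_{\text {sell}}\) are the purchase and sale prices per share.
This calculation does not account for transaction costs or other associated fees.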
Table 2
Return on investment for Apple stock investing strategy
Date | Predicted stock price (USD) | \(\text {ROI}_P\) | Real stock price (USD) | \(\text {ROI}_R\) | \(\text {ROI}_R - \text {ROI}_P\)
2022-06-16 | 134.055 |  | 130.060 |  | 
2022-08-18 | 174.890 | 30.46% | 174.150 | 33.90% | +3.44%
2023-01-11 | 131.298 |  | 133.490 |  | 
2023-06-23 | 170.783 | 30.07% | 186.680 | 39.85% | +9.78%
Average ROI |  | 30.265% |  | 36.875% | +6.61%
Table 2 presents the predicted return on investment (ROI) and the actual ROI of the investment strategy based on our prediction model. The average predicted ROI is 30.265%, whereas the average actual ROI is 36.875%, so the realized return exceeds the predicted return by 6.61 percentage points. These results demonstrate the potential value of our 30-day-ahead forecasts in guiding investment strategies and informing buying and selling decisions.
Nonetheless, the stock market is a dynamic environment influenced by a variety of tangible factors. However helpful our forecasts may be, investment decisions must also take additional real-world factors into account. The results indicate that our prediction model can serve as a useful guide, but it should be used together with a comprehensive market analysis.
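The ROI columns of Table 2 can be recomputed from the buy and sell prices as (sell price - buy price) / buy price × 100%, first for the predicted prices and then for the real ones. The sketch also shows that averaging the per-trade ROIs after rounding them to two decimals reproduces the averages of 30.265% and 36.875% reported in the table. The unrounded averages are about 30.27% and 36.87%. Like the paper's calculation, it ignores transaction costs.

```python
def roi(buy_price, sell_price):
    """Percentage return of buying at buy_price and selling at sell_price (no costs)."""
    return (sell_price - buy_price) / buy_price * 100.0

# (buy date, sell date, predicted buy, predicted sell, real buy, real sell), from Table 2
trades = [
    ("2022-06-16", "2022-08-18", 134.055, 174.890, 130.060, 174.150),
    ("2023-01-11", "2023-06-23", 131.298, 170.783, 133.490, 186.680),
]

roi_pred = [roi(pb, ps) for _, _, pb, ps, _, _ in trades]
roi_real = [roi(rb, rs) for _, _, _, _, rb, rs in trades]

# Average the per-trade ROIs after rounding each to two decimals, as Table 2 appears to do.
avg_pred = sum(round(r, 2) for r in roi_pred) / len(roi_pred)
avg_real = sum(round(r, 2) for r in roi_real) / len(roi_real)
```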
Using the predicted and actual ROI values for the Apple stock investment, we then calculate the Sharpe ratio to assess the strategy's risk-adjusted performance. The Sharpe ratio is expressed as follows:
\[ \text {SPI} = \frac{{E}[R_i] - R_f}{\sigma _i} \]
In this equation, \({E}[R_i]\) represents the expected ROI of the Apple stock investment and \(\sigma _i\) the standard deviation of its return. \(R_f\) denotes the risk-free rate, typically approximated by the US 10-year Treasury yield. For this analysis, we adopt a risk-free rate of 4.19%.
We also calculate the Treynor ratio to gauge the strategy's ability to generate excess returns per unit of systematic risk. The Sharpe ratio uses total risk, both systematic and unsystematic. The Treynor ratio, by contrast, evaluates the strategy's performance relative to the systematic risk it has assumed. Mathematically, the Treynor ratio (TPI) is expressed as:
\[ \text {TPI} = \frac{{E}[R_i] - R_f}{\beta _i} \]
To calculate \(\beta _i\), we use the market return, represented by the S&P 500 index because it is widely recognized as a benchmark of market performance. For this analysis, we take the average market return to be 11.88% and its standard deviation (\(\sigma _{m}\)) to be 0.172.
Furthermore, we calculate Jensen's alpha to evaluate whether the strategy generates excess returns after accounting for the systematic risk captured by the portfolio's beta. Given an investment's exposure to systematic risk, Jensen's alpha indicates whether it has outperformed or underperformed a specified benchmark. The JPI formula is as follows:
\[ \text {JPI} = {E}[R_i] - \left[ R_f + \beta _i \left( R_m - R_f\right) \right] \]
where \(R_m\) is the market return.
In this paper, Jensen's alpha allows the performance of the investing strategy to be compared with a benchmark such as a market index. A positive alpha indicates that the strategy outperformed expectations given its systematic risk exposure, suggesting skill or superior performance. A negative alpha indicates underperformance relative to the benchmark.
Table 3
Key inputs for the risk-adjusted performance of the investment
Source | ROI (%) | \(\sigma _i\) | \(\beta _i\) | \(R_f\) (%) | \(R_m\) (%) | \(\sigma _m\)
Our model prediction | 30.265 | 0.261 | 1.51 | 4.19 | 11.88 | 0.172
Real market data | 36.875 | 0.327 | 1.29 | 4.19 | 11.88 | 0.172
Table 3 presents the inputs needed to calculate the risk-adjusted performance of the investment strategy. The Sharpe ratio, Treynor ratio, and Jensen's alpha are all computed from these inputs.
They are the return of the investment strategy (ROI), the standard deviation of its return \(\sigma _i\), its beta coefficient \(\beta _i\), the risk-free rate \(R_f\), the market return \(R_m\), and the market's standard deviation \(\sigma _m\). The resulting metrics are reported in Table 4.
Table 4
Evaluation of investment risks and returns
Source | ROI (%) | SPI | TPI | JPI
Our model prediction | 30.265 | 0.999 | 0.175 | 0.145
Real market data | 36.875 | 1.000 | 0.253 | 0.228
Based on the comparison between the predicted and real values, it is evident that the disparity between the predicted and actual values of the Sharpe ratio (SPI) is not substantial. However, a discernible gap exists in the calculations of the Treynor ratio (TPI) and Jensen’s alpha (JPI), potentially attributable to variations in the estimated beta coefficient (\(\beta _i\)). The predicted values for TPI and JPI stand at 0.175 and 0.145, respectively, while their real counterparts are 0.253 and 0.228. It is noteworthy that the actual market data consistently exhibits higher returns and superior risk-adjusted performance metrics in comparison to the model’s predictions. This discrepancy suggests the possibility of the model underestimating the investment performance or inadequately capturing certain aspects of the market dynamics.
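Table 4 can be recomputed from the inputs in Table 3 using the standard definitions, with all rates expressed as fractions:
- Sharpe = (R - R_f) / σ_i
- Treynor = (R - R_f) / β_i
- Jensen's alpha = R - [R_f + β_i (R_m - R_f)]

With these inputs the sketch reproduces, to three decimals, the Sharpe ratios, both Jensen's alpha values and the real-data Treynor ratio. For the model-prediction strategy it gives a Treynor ratio of about 0.173 rather than the reported 0.175. This gap may come from rounding of β_i in Table 3.

```python
def sharpe_ratio(r, rf, sigma):
    """Excess return per unit of total risk."""
    return (r - rf) / sigma

def treynor_ratio(r, rf, beta):
    """Excess return per unit of systematic (beta) risk."""
    return (r - rf) / beta

def jensens_alpha(r, rf, beta, rm):
    """Return above the CAPM-implied return for the given beta."""
    return r - (rf + beta * (rm - rf))

RF, RM = 0.0419, 0.1188  # risk-free rate and market return from Table 3, as fractions
inputs = {
    "Our model prediction": dict(r=0.30265, sigma=0.261, beta=1.51),
    "Real market data": dict(r=0.36875, sigma=0.327, beta=1.29),
}

# (SPI, TPI, JPI) per row, rounded to three decimals like Table 4.
metrics = {
    name: (
        round(sharpe_ratio(v["r"], RF, v["sigma"]), 3),
        round(treynor_ratio(v["r"], RF, v["beta"]), 3),
        round(jensens_alpha(v["r"], RF, v["beta"], RM), 3),
    )
    for name, v in inputs.items()
}
```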
6 Conclusion and discussion
In summary, our research introduced a model based on generative adversarial networks (GANs) with transformer-based attention to improve stock price prediction. The empirical evidence from our experiments supports several conclusions. First, using the root mean square error (RMSE) as the evaluation benchmark, our model performs well compared with previous studies that applied machine learning methods to stock price prediction. Second, our model learns to recognize both upward and downward trends in the stock market, providing insights that can inform investment decisions. The real and forecasted return on investment (ROI) based on the model's predictions align well with investors' expectations. Third, the model can be extended to stock prediction across the technology industry, capturing trends relevant to investment decision-making and financial analysis. This is shown in the appendix, which illustrates the model's performance on tech giants such as Amazon, Google, and Microsoft. However, in the case of Apple Inc., whose stock price rose sharply in 2023, our model may not have fully captured the pace of growth. Further work is therefore needed to improve the precision of future predictions.
Future research should focus on optimizing hyperparameters. Moreover, the model's predictions should serve as only one of many references in the investment decision-making process, not as the sole determinant.
7 Practical implementations
7.1 Implications for stakeholders in stock market
The application of our model carries significant implications for various stakeholders in the financial market ecosystem. Investors and traders can use the model's predictions to make informed buy or sell decisions and align their investment strategies with anticipated market trends. Financial analysts may find the model's analysis of large datasets valuable for improving the accuracy of market reports and investment recommendations. For portfolio and fund managers, the model can offer insights that support asset allocation and risk management, especially in the technology industry. This could help them build robust portfolios that better withstand market volatility and safeguard investments. For financial institutions and trading platforms, incorporating the model can enhance the value proposition to clients. It can also improve risk management practices by leveraging the model's ability to forecast market volatility and trends.
7.2 Integration into existing trading systems
Integrating a stock prediction model with a reinforcement learning (RL) agent creates a strategic synergy in which the prediction model's output guides the agent's trading decisions. When fed forecasts of future market trends and volatility, the RL agent can more accurately determine the optimal moments to buy, sell, or hold stocks. This process depends on the prediction model generating reliable and actionable insights, which become the states in the RL framework and inform the agent's decisions and strategy development.
In transitioning from theory to practice, the RL agent initially learns from historical data in a simulated environment, allowing for risk-free strategy refinement. Once confident, the agent applies these strategies to the real market, starting cautiously with small trades to gather live feedback and further refine its approach. This iterative learning process ensures that the agent continually adapts to new information and market conditions, optimizing its decision-making strategies over time.
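The prediction-to-decision loop described above can be sketched with tabular Q-learning in a simulated market. Everything here is a hypothetical illustration:
- The state is (forecast says up or down, currently holding or not).
- The actions are hold, buy and sell for a single share.
- The reward is the next price change while holding.
- The "forecast" is simply the sign of the true next change, i.e. a perfect predictor standing in for the GAN model.

Because the toy has no transaction costs, today's position does not constrain tomorrow's. A myopic agent (discount factor `gamma = 0`) is therefore enough; with costs or position limits, one would use `gamma > 0`.

```python
import numpy as np

HOLD, BUY, SELL = 0, 1, 2

def next_position(holding, action):
    """Long-only, one share: BUY opens the position, SELL closes it, HOLD keeps it."""
    if action == BUY:
        return 1
    if action == SELL:
        return 0
    return holding

def train_q_agent(price_changes, episodes=20, alpha=0.1, gamma=0.0, epsilon=0.2, seed=0):
    """Tabular Q-learning; the state is (forecast signal, current position)."""
    rng = np.random.default_rng(seed)
    q = np.zeros((2, 2, 3))  # q[signal, holding, action]
    # Hypothetical stand-in for the prediction model: 1 if the next change is up.
    signals = (price_changes > 0).astype(int)
    for _ in range(episodes):
        holding = 0
        for t in range(len(price_changes) - 1):
            state = (int(signals[t]), holding)
            if rng.random() < epsilon:
                action = int(rng.integers(3))      # explore
            else:
                action = int(np.argmax(q[state]))  # exploit
            new_holding = next_position(holding, action)
            reward = new_holding * price_changes[t]  # profit or loss of the share this step
            next_state = (int(signals[t + 1]), new_holding)
            target = reward + gamma * np.max(q[next_state])
            q[state + (action,)] += alpha * (target - q[state + (action,)])
            holding = new_holding
    return q

changes = np.random.default_rng(1).normal(0.0, 1.0, size=250)  # simulated daily changes
q = train_q_agent(changes)
policy = {(s, h): int(np.argmax(q[s, h])) for s in (0, 1) for h in (0, 1)}
```

The learned policy buys when the forecast is up and the agent is flat, and sells when the forecast is down and it holds the share. With a noisy forecast, the same loop would learn how much to trust the signal.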
However, integrating these advanced models poses potential challenges and considerations. The accuracy and timeliness of the stock prediction model’s outputs are critical, as any inaccuracies can lead to suboptimal decision-making by the RL agent. In addition, the complexity of the financial markets, with their inherent noise and volatility, requires sophisticated reward structures and risk management strategies within the RL framework to ensure long-term profitability and sustainability. Regulatory compliance and ethical considerations also play a pivotal role, as the autonomous nature of these trading strategies must align with current financial laws and ethical trading practices, necessitating transparent and responsible model development and deployment.
The implementation of such models within existing trading infrastructures is not without challenges. The complexity of RL models requires access to high-quality, real-time data and substantial computational resources for continuous learning and adaptation. In addition, the integration process must account for latency in decision-making and execution so that the model's recommendations remain relevant in the fast-paced environment of stock trading. Regulatory and ethical considerations must also be addressed to ensure that the use of advanced predictive and decision-making algorithms complies with financial regulations and maintains market integrity. Successful deployment requires not only technical expertise but also a strategic approach to model governance, ongoing evaluation, and adjustment to evolving market conditions.
8 Limitations
1. Insufficient data availability and quality: The effectiveness of our predictive model depends heavily on the availability and quality of historical financial data. Currently, our access to comprehensive datasets is limited, especially for assets such as the S&P 500, crude oil, and gold. This restriction not only hampers the model's ability to forecast stock prices accurately but also introduces potential biases. Future research should incorporate more diverse and extensive datasets, including alternative data sources, to mitigate these biases and improve prediction accuracy.
2. Potential biases in data sources: Our analysis may be subject to biases stemming from the selective nature of the available financial datasets. These biases could influence the model's predictions, particularly if the data over-represent certain market conditions or asset classes (in our case, companies in the technology industry). Future iterations of our model should therefore explore methods to identify and correct for these biases, such as synthetic data generation or advanced data augmentation techniques.
3. Impact of market anomalies and extreme movements: The current model does not sufficiently account for market anomalies and extreme market movements, which are often unpredictable yet have significant impacts on stock prices. Such events, for example flash crashes or geopolitical uncertainties, can lead to substantial prediction errors. To make the model more robust, future research could integrate anomaly-detection techniques and adaptive models capable of learning complex market patterns. Stress testing and scenario analysis could also help the model anticipate and react to rapid market changes.
4. Uncertainty of models and market events: Although predictive models provide valuable insights, they inherently contain uncertainty. The predicted stock prices may not always match actual market conditions because of sudden market volatility or unforeseen events. These discrepancies introduce additional risks that our current models may not fully capture. Future versions of the model could incorporate advanced volatility forecasting techniques and dynamic adjustment mechanisms to better predict and adapt to rapid market changes.
5. Omission of transaction costs and liquidity: Our analysis does not explicitly consider transaction costs, including brokerage fees and slippage, nor does it account for the liquidity of specific stocks. These factors are crucial in determining the actual returns of trading strategies based on predicted prices. Future work should include a more detailed analysis of transaction costs and liquidity to provide a more realistic assessment of potential returns and trading feasibility.
9 Future goals
1. Integration of reinforcement learning methodologies: The primary objective of this research is to improve stock price prediction. Beyond that, we plan to integrate reinforcement learning methodologies to improve stock purchasing decisions and determine appropriate investment quantities. This approach would allow us to decide whether to buy a particular stock and, at the same time, to determine the optimal quantity to buy under current market conditions. By leveraging reinforcement learning techniques, we aim to improve the accuracy and effectiveness of stock prediction models and ultimately support investors in making optimized investment decisions.
2. Fine-tuning hyperparameters for enhanced model performance: Another important objective is to improve model performance by fine-tuning hyperparameters. Because the model's performance depends on its hyperparameters, more effective fine-tuning approaches should improve its efficacy. We anticipate that precise parameter adjustment and optimization strategies will enhance the model's predictive capabilities, resulting in more accurate and trustworthy stock price forecasts.
3. Generalization and expansion to non-technology industries: We intend to broaden our study beyond the technology industry to strengthen the model's generalization capabilities by investigating firms in other industries. Including a broader range of industries should improve the model's capacity to respond to changing market conditions. It will also allow us to evaluate the model's performance across diverse industries, increasing its generalizability and practical usefulness.
By pursuing these future goals, we intend to contribute to the progress of stock price prediction and the optimization of investment decisions. Through the integration of reinforcement learning methodologies, fine-tuning of hyperparameters, and expansion to non-technology industries, we seek to develop more robust and accurate stock prediction models. Ultimately, our aim is to provide investors with reliable tools and insights for making informed decisions in an ever-evolving financial landscape.
Acknowledgements
Prof. Soner and Rex (the teaching assistant for this project) advised me on the progress of the project and answered my questions. Their guidance helped me achieve my goals and showed me how to correct mistakes when facing setbacks.
Declarations
Conflict of interest
The undersigned authors affirm that they have carefully reviewed and approved the manuscript titled “Enhancing Stock Price Prediction using GANs and Transformer-based Attention Mechanisms”. This manuscript represents original work and has not been previously published, nor is it currently being considered for publication elsewhere.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Amazon closing stock price with Fourier transforms. This figure demonstrates the decomposition of the trend in the Amazon stock price using Fourier transform, showcasing the distinct components
Technical indicators for Amazon stock price. This figure presents an overview of the technical indicators that have the potential to impact the stock price of Amazon and improve prediction accuracy
Microsoft closing stock price with Fourier transforms. This figure demonstrates the decomposition of the trend in the Microsoft stock price using Fourier transform, showcasing the distinct components
Technical indicators for Microsoft stock price. This figure presents an overview of the technical indicators that have the potential to impact the stock price of Microsoft and improve prediction accuracy
Google closing stock price with Fourier transforms. This figure demonstrates the decomposition of the trend in the Google stock price using Fourier transform, showcasing the distinct components
Technical indicators for Google stock price. This figure presents an overview of the technical indicators that have the potential to impact the stock price of Google and improve prediction accuracy
To extend the model to other technology companies besides Apple Inc., the same data collection and preprocessing procedures were applied to the stocks of Amazon, Google, and Microsoft. The data cover the period from July 2017 to August 2023. Fourier transforms were used to capture the upward and downward trends in their stock prices, and 7-day and 21-day moving averages were added to augment the feature set of the dataset.
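The feature construction described above can be sketched in NumPy. A Fourier low-pass reconstruction keeps only the lowest-frequency components of the closing price to expose its long-run trend. Trailing 7-day and 21-day simple moving averages are added as extra columns. The numbers of Fourier components (3 and 9) and the synthetic price series are assumptions for illustration; the text does not state which components were kept.

```python
import numpy as np

def fourier_trend(prices, n_components):
    """Low-pass reconstruction keeping the first n_components rFFT coefficients."""
    coeffs = np.fft.rfft(prices)
    coeffs[n_components:] = 0.0  # drop higher-frequency (short-term) components
    return np.fft.irfft(coeffs, n=len(prices))

def moving_average(prices, window):
    """Trailing simple moving average; the first window-1 values are NaN."""
    prices = np.asarray(prices, dtype=float)
    out = np.full(len(prices), np.nan)
    csum = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

# Synthetic closing prices (not real Amazon, Google, or Microsoft data).
rng = np.random.default_rng(0)
close = 100.0 + np.cumsum(rng.normal(0.0, 1.0, size=250))
features = np.column_stack([
    close,
    fourier_trend(close, 3),   # very smooth long-run trend
    fourier_trend(close, 9),   # trend with some medium-term swings
    moving_average(close, 7),
    moving_average(close, 21),
])
```

The leading NaN rows of the moving-average columns would normally be dropped before windowing the data for training.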
Amazon closing price prediction for both training set and testing set. This figure delineates the distinction between the training and testing sets using a blue vertical dashed line
Microsoft closing price prediction for both training set and testing set. This figure delineates the distinction between the training and testing sets using a blue vertical dashed line
Google closing price prediction for both training set and testing set. This figure delineates the distinction between the training and testing sets using a blue vertical dashed line
In addition to training and testing the stock price prediction model using data from Apple Inc., we extended our analysis to include three other technology companies. This broader evaluation aimed to assess the applicability of our model in predicting stock prices within the technology industry. The standard root mean square error (RMSE) was calculated for each of these companies.
The results indicate that our model can be applied effectively to these additional companies. Both Amazon and Google performed well on the training and testing sets, with testing standard RMSE values of 2.897 and 2.844, respectively. For Microsoft, the standard RMSE value was 3.366. These relatively low standard RMSE values indicate that the model's predictions are close to the actual stock prices, which means higher forecast accuracy.
Together with the figures, these findings suggest that our model can capture the upward and downward trends in stock prices and provide relatively accurate predictions. This indicates the potential usefulness of our model for forecasting stock prices within the technology industry.