1 Introduction
2 Understanding stock market data
2.1 Data characteristics
2.1.1 Source
2.1.2 Frequency
2.1.3 Volume
2.2 Data types
2.2.1 Market data
Ticker symbol | AAPL |
---|---|
Name | Apple Inc. |
Last trade price | 289.80 |
Last trade timestamp | 1577480401 |
Last trade volume | 35447203 |
Exchange | NASDAQ |
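
The last trade timestamp in the quote above looks like a count of seconds since the Unix epoch. A minimal sketch of decoding it, assuming that interpretation and assuming UTC (the variable names are illustrative only):

```python
from datetime import datetime, timezone

# Last trade timestamp from the quote above, assumed to be Unix epoch seconds.
last_trade_ts = 1577480401

# Interpret the value as UTC; exchange-local time would need a time zone database.
last_trade_utc = datetime.fromtimestamp(last_trade_ts, tz=timezone.utc)
print(last_trade_utc.isoformat())  # 2019-12-27T21:00:01+00:00
```

Under these assumptions the timestamp falls one second after 21:00 UTC on 27 December 2019.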
2.2.2 Fundamental data
2.2.3 Alternative data
Date | Time | Open | High | Low | Close | Volume |
---|---|---|---|---|---|---|
20160128 | 10:00 | 122.17 | 122.27 | 122.09 | 122.09 | 4,934 |
20160128 | 11:00 | 121.42 | 121.60 | 121.38 | 121.52 | 12,254 |
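
Bars such as the two hourly rows above can be combined into a coarser bar by taking the first open, the highest high, the lowest low, the last close, and the summed volume. The following is a minimal sketch of that aggregation; the `Bar` class and `aggregate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    date: str
    time: str
    open: float
    high: float
    low: float
    close: float
    volume: int

def aggregate(bars):
    """Collapse consecutive bars (oldest first) into one coarser bar."""
    if not bars:
        raise ValueError("need at least one bar")
    return Bar(
        date=bars[0].date,
        time=bars[0].time,
        open=bars[0].open,                    # first open
        high=max(b.high for b in bars),       # highest high
        low=min(b.low for b in bars),         # lowest low
        close=bars[-1].close,                 # last close
        volume=sum(b.volume for b in bars),   # total volume
    )

# The two hourly bars from the table above.
hourly = [
    Bar("20160128", "10:00", 122.17, 122.27, 122.09, 122.09, 4934),
    Bar("20160128", "11:00", 121.42, 121.60, 121.38, 121.52, 12254),
]
two_hour = aggregate(hourly)
print(two_hour)
```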
Market data attributes | Fundamental data attributes | Alternative data attributes |
---|---|---|
open price, high price, low price, close price, volume | revenue, earnings per share, market capitalization, dividend, average volume, shares outstanding, next earning date | google trends, news, texts, tweets, satellite imagery |
Source | Type | Frequency | Free | Library |
---|---|---|---|---|
Market, fundamental | Interday | Y | Investpy | |
Market, fundamental | Interday, intraday | N | Na | |
Market, fundamental | Interday, intraday | N | Na | |
Market, fundamental | Interday, intraday | Y | yfinance | |
Market | Interday* | Y | Kaggle-api | |
Market, fundamental | Interday, intraday | N | Tws-api | |
Taiwan market | Interday, intraday | Y | Na | |
pypi.org/project/tushare | China market, fundamental | Interday, intraday | Y | Tushare |
Market, fundamental | Interday, intraday | N | Na | |
Market, fundamental | Interday, intraday | N | Na | |
Market, fundamental | Interday, intraday | N | Na | |
Market, fundamental | Interday, intraday | N | Na | |
China market, fundamental | Interday, intraday | N | Na | |
Nordic market | Intraday* | Y | Na | |
UK market | Interday | N | Na | |
Market, fundamental | Interday, intraday | N | Na | |
Taiwan market, fundamental | Interday, intraday | N | Na | |
China market, fundamental | Interday, intraday | N | Jqdatasdk |
Source | Market data attributes | Fundamental data attributes | Frequency |
---|---|---|---|
investing | Open price, high price, low price, close price, volume | Revenue, earnings per share, market capitalization, dividend, average volume, ratio, beta, shares outstanding, next earning date | Daily, weekly, monthly |
y-finance | Open price, high price, low price, close price, volume | Major holders, institutional holders, mutual fund holders, dividends, splits, actions, calendar, earnings, quarterly earnings, financials, quarterly financials, balance sheet, quarterly balance sheet, cashflow, quarterly cashflow, sustainability, shares outstanding | 1 min, 2 min, 5 min, 15 min, 30 min, 60 min, 90 min, 1 h, 1 day, 5 days, 1 week, 1 month, 3 months |
taifex | Open bid, high bid, low bid, last bid, volume, best bid, best ask, historical high, historical low | Not available | Daily |
kaggle | Open price, high price, low price, close price, volume | Not available | Daily |
tushare | Open price, high price, low price, close price, volume | Account receivable turn day, account receivable turnover, business income, current asset days, current asset turnover, earnings per share, earnings per share (year over year), fixed assets, gross profit rate, inventory days, inventory turnover, liquid assets, net profit ratio, net profits, outstanding, profits (year over year), report date, reserved, reserved per share, return on equity, time to market, total assets | Daily |
etsin | Open price, high price, low price, close price, volume | Not available | Daily |
investing.com, finance.yahoo.com, and kaggle.com utilize either APIs or libraries, facilitating interactions with them and unlocking better integration with the ML system. Sources without any programmatic interface usually make data available as manually downloadable files.
2.3 Data representation
2.3.1 Bars
2.3.2 Charts
2.4 Lessons learned
3 Deep learning for stock market applications
3.1 What is deep learning?
3.1.1 Artificial neurons
3.1.2 Learning techniques
- Technique based on weight adjustment: The most common learning technique category, this technique is based solely on how weights are adjusted across an iterative process and is dependent on the type of supervision available to the network during the training process. The different types are supervised, unsupervised (or self-organized), and reinforcement learning.
- Technique based on data availability: When categorized according to how data is presented to the network, the learning technique can be considered offline or online. This technique might be chosen because the complete data are not available for training in one batch. This could be because either data are streaming or a concept in the data changes at intervals, requiring the data to be processed in specific time windows. Another reason could be that the data are too large to fit into the memory, demanding processing in multiple smaller batches.
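
To make the online case concrete, the sketch below updates a one-feature linear model one batch at a time, so the full data set never has to be held in memory. It is only an illustration under stated assumptions: the data stream is synthetic (a noise-free target y = 2x + 1), and `stream_batches`, `OnlineLinearModel`, and `partial_fit` are hypothetical names, not part of any library discussed here.

```python
import random

def stream_batches(n_batches, batch_size, seed=0):
    """Simulate a data stream: yield one small batch at a time (synthetic data)."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        # Hypothetical noise-free target y = 2x + 1, chosen only for illustration.
        yield [(x, 2.0 * x + 1.0) for x in xs]

class OnlineLinearModel:
    """One-feature linear model y ~ w*x + b, updated batch by batch."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def partial_fit(self, batch):
        """Take one gradient step on the mean squared error of a single batch."""
        n = len(batch)
        grad_w = sum(2.0 * (self.w * x + self.b - y) * x for x, y in batch) / n
        grad_b = sum(2.0 * (self.w * x + self.b - y) for x, y in batch) / n
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b

model = OnlineLinearModel()
for batch in stream_batches(n_batches=500, batch_size=20):
    model.partial_fit(batch)  # each batch can be discarded after this step
print(round(model.w, 3), round(model.b, 3))
```

An offline learner would instead collect every batch first and fit once on the complete data set.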
3.1.3 Network architecture
3.1.3.1 Feed-forward neural networks
3.1.3.2 Recurrent neural network
3.1.3.3 Convolutional neural networks
3.1.3.4 Autoencoder
3.1.3.5 Deep Reinforcement Learning
- Model-based reinforcement learning: The agent retains a transition model of the environment to enable it to select actions that maximize the cumulative utility. The agent learns a utility function that is based on the total rewards from a starting state. It can either start with a known model (e.g., chess) or learn by observing the effects of its actions.
- Model-free reinforcement learning: The agent does not retain a model of the environment, instead focusing on directly learning how to act in different states. This could be via either an action-utility function (Q-learning) that learns the utility of taking an action in a given state or a policy search in which a reflex agent directly learns a policy, \(\pi (s)\), that maps states to corresponding actions.
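
The action-utility idea behind Q-learning can be sketched with a small table of values. The example below is a minimal illustration on a hypothetical five-state chain environment (not a market model): the agent explores at random, and each step moves the stored value of the chosen action towards the observed reward plus the discounted best value of the next state. The constants and function names are assumptions made for this sketch.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (illustrative values)

def step(state, action):
    """Toy deterministic environment: reward 1 only on reaching the last state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)  # random exploration; Q-learning is off-policy
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
# Greedy policy: the highest-valued action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

No transition model is stored anywhere in this sketch; the agent only keeps the table of action values, which is what distinguishes it from the model-based case.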
3.2 Using deep learning in the stock market
3.2.1 Modeling considerations
3.2.1.1 Sampling intervals
3.2.1.2 Stationarity
3.2.1.3 Backtesting
3.2.1.4 Assessing feature importance
3.2.2 Model evaluation
Actual \ Predicted | Predicted positive | Predicted negative | Total |
---|---|---|---|
Actual positive | TP | FN | P |
Actual negative | FP | TN | N |
Total | \(P'\) | \(N'\) | \(P+N\) |
Evaluation | Description | Formula |
---|---|---|
Returns | Total amount gained or lost within a specific investment period, typically measured as a percentage of the original investment, known as the rate of return (RoR) (Kenton 2020). This could also be the absolute total profit or loss for the investment period. \(V_i\): Initial value; \(V_f\): Final value | \(\displaystyle \frac{V_f - V_i}{V_i} * 100\) |
Compound annual growth rate (CAGR) | The RoR for an investment over a number of years, with returns re-invested yearly (Murphy 2019). n: Number of years | \(\displaystyle \left( \frac{V_f}{V_i}\right) ^\frac{1}{n} - 1\) |
Volatility | Degree of variation in asset or total portfolio value (Investopedia 2016). \(\sigma\): Standard deviation of returns; T: Time horizon or number of holding periods | \(\displaystyle \sigma \sqrt{T}\) |
Sharpe ratio | Measures performance in comparison with a risk-free asset, with adjustments for volatility or total risk (Hargrave 2019). \(R_p\): Average portfolio returns; \(r_f\): Risk-free (e.g., treasury bonds) returns; \(\sigma _p\): Standard deviation of a portfolio’s excess returns | \(\displaystyle \frac{R_p - r_f}{\sigma _p}\) |
Sortino ratio | A modification of the Sharpe ratio that differentiates harmful volatility from overall volatility (Kenton 2019). \(\sigma _d\): Standard deviation of the portfolio’s negative returns, i.e., returns that fall below a user-defined threshold | \(\displaystyle \frac{R_p - r_f}{\sigma _d}\) |
Maximum drawdown (MDD) | Measures the decline of a return from a peak before a new peak that is at least equal to the old peak is achieved (Hayes 2020). This is used to compare the riskiness of different models or strategies. \(V_t\): Trough value; \(V_p\): Peak value | \(\displaystyle \frac{V_t - V_p}{V_p}\) |
Calmar ratio | Risk-adjusted returns (Will Kenton 2020). | \(\displaystyle \frac{V_f - V_i}{MDD}\) |
Value-at-risk (VaR) threshold | Estimate (as threshold) of maximum loss for an investment over time (Harper 2016). \(E_r\): Expected returns; \(z_i\): z-score of confidence interval; \(\sigma _p\): Standard deviation of portfolio; \(V_p\): Value of portfolio | \(\displaystyle \left[ E_r - \left( z_i * \sigma _p \right) \right] * V_p\) |
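
Several of the formulas in this table translate directly into code. The sketch below is a minimal illustration rather than a reference implementation: the function names are hypothetical, the sample standard deviation is an assumption (the table does not say which estimator is meant), and the Sortino ratio here needs at least two returns below the threshold.

```python
import math
from statistics import mean, stdev

def rate_of_return(v_i, v_f):
    """Percentage gain or loss between initial value v_i and final value v_f."""
    return (v_f - v_i) / v_i * 100.0

def cagr(v_i, v_f, years):
    """Compound annual growth rate over the given number of years."""
    return (v_f / v_i) ** (1.0 / years) - 1.0

def volatility(returns, horizon):
    """Standard deviation of returns scaled by the square root of the horizon."""
    return stdev(returns) * math.sqrt(horizon)

def sharpe_ratio(returns, risk_free):
    """Mean excess return divided by the standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def sortino_ratio(returns, risk_free, threshold=0.0):
    """Like the Sharpe ratio, but only returns below the threshold count as risk."""
    downside = [r for r in returns if r < threshold]
    return (mean(returns) - risk_free) / stdev(downside)

def max_drawdown(values):
    """Largest relative decline from a running peak (a non-positive number)."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, (v - peak) / peak)
    return worst

def value_at_risk(expected_return, z, sigma_p, portfolio_value):
    """VaR threshold: (expected return minus z times sigma) times portfolio value."""
    return (expected_return - z * sigma_p) * portfolio_value

# Illustrative portfolio values and period returns (made-up numbers).
values = [100.0, 120.0, 90.0, 110.0, 130.0]
print(rate_of_return(values[0], values[-1]))
print(max_drawdown(values))
print(sharpe_ratio([0.02, 0.04, 0.06], risk_free=0.01))
```

In the made-up series, the deepest decline runs from the peak of 120 to the trough of 90, which is a drawdown of 25%.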
Evaluation | Description | Formula |
---|---|---|
Accuracy | The percentage of the correctly predicted classes. | \(\displaystyle \frac{TP + TN}{P + N}\) |
Error rate | The percentage of incorrectly predicted classes. Also computed as \(1 - accuracy\). | \(\displaystyle \frac{FP + FN}{P + N}\) |
Recall | Proportion of actual positive classes that are correctly predicted; also known as measure of completeness or sensitivity. | \(\displaystyle \frac{TP}{P}\) |
Precision | Proportion of positive predictions that are correct; also known as measure of exactness. | \(\displaystyle \frac{TP}{TP + FP}\) |
F-score | Harmonic mean of recall and precision. | \(\displaystyle \frac{2 * precision * recall}{precision + recall}\) |
Weighted F-score | Weighted measure of recall and precision. \(\beta < 1\) assigns more weight to precision, while \(\beta > 1\) assigns more weight to recall. | \(\displaystyle \frac{(1+\beta ^2) * precision * recall}{\beta ^2 * precision + recall}; \beta > 0\) |
Mean absolute error (MAE) | Average of the absolute difference between the predicted values and the actual values. | \(\displaystyle \frac{1}{n}\sum ^n_{i=1}|y_i - {\hat{y}}_i|\) |
Mean absolute percentage error (MAPE) | Average of the absolute percentage errors. | \(\displaystyle \frac{100}{n}\sum ^n_{i=1} \left| \frac{y_i - {\hat{y}}_i}{y_i}\right|\) |
Mean square error (MSE) | Average of the squared difference between the predicted values and the actual values. | \(\displaystyle \frac{1}{n}\sum ^n_{i=1}\left( y_i - {\hat{y}}_i \right) ^2\) |
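
The classification and regression measures above can be computed from the four confusion-matrix counts and from paired actual and predicted values. The following is a minimal sketch with hypothetical function names; it takes \(P = TP + FN\) (all actual positives) and uses absolute values in MAPE, as is conventional.

```python
def classification_metrics(tp, fp, fn, tn, beta=1.0):
    """Metrics from confusion-matrix counts; beta=1 gives the plain F-score."""
    p, n = tp + fn, fp + tn              # actual positives and actual negatives
    accuracy = (tp + tn) / (p + n)
    precision = tp / (tp + fp)           # correct share of positive predictions
    recall = tp / p                      # share of actual positives recovered
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))

def mse(y, y_hat):
    """Mean square error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Made-up counts and values, for illustration only.
scores = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(scores)
print(mae([3.0, 5.0], [2.0, 7.0]), mse([3.0, 5.0], [2.0, 7.0]))
```

Setting `beta` below 1 shifts the weighted F-score towards precision, and setting it above 1 shifts it towards recall.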
3.2.3 Lessons learned
4 Survey findings
4.1 Research methodology
“deep learning” AND “stock market” AND (“backtest” OR “back test” OR “back-test”)
This query searches for publications including the phrases “deep learning”, “stock market”, and any one of “backtest”, “back test”, or “back-test”. We observed these three different spellings of “backtest” in different publications, suggesting the importance of catching all of these alternatives. This produced 185 results, which include several irrelevant papers. For validation, we searched using Semantic Scholar (Scholar 2020), obtaining approximately the same number of journal and conference publications. We chose to proceed with Google Scholar because Semantic Scholar does not feature such algebraic query syntax, requiring that we search for the different combinations of “backtest” individually with the rest of the search query.
Publisher | Count | Year | Count |
---|---|---|---|
IEEE | 9 | 2018 | 6 |
arXiv | 8 | 2019 | 10 |
SSRN | 5 | 2020 | 19 |
Elsevier | 3 | ||
ACM | 2 | ||
MDPI | 2 | ||
Springer | 2 | ||
IOP Publishing | 1 | ||
Wiley | 1 | ||
IJCAI | 1 | ||
Institutional Investor Journals | 1 |
4.2 Summary of findings
- Trade Strategy: Algorithmically generated methods or procedures for making buying and selling decisions in the stock market.
- Price Prediction: Forecasting the future value of a stock or financial asset in the stock market. It is commonly used as a trading strategy.
- Portfolio Management: Selecting and managing a group of financial assets for long-term profit.
- Market Simulation: Generating market data under various what-if market scenarios.
- Stock Selection: Selecting stocks in the stock market as part of a portfolio based on perceived or analyzed future returns. It is commonly used as a trading or portfolio management strategy.
- Risk Management: Evaluating the risks involved in trading, to maximize returns.
- Hedging Strategy: Mitigating the risk of investing in an asset by taking an opposite investment position in another asset.
References | Architecture | Market(s) | Dataset source | Reproducibility |
---|---|---|---|---|
Trade strategy | ||||
Wang et al. (2019c) | DRL, LSTM | China, US | wind, wrds | No |
Li et al. (2020) | DRL | US | kaggle | No |
Théate and Ernst (2020) | DRL, FFNN | Asia, US, Europe | unspecified | Yes |
Zhang et al. (2020d) | DRL | US | pinnacle | No |
Chakole and Kurhekar (2020) | DRL, FFNN | US, India | yahoo | No |
Wu et al. (2019) | DRL, LSTM | China | tushare | No |
Hu et al. (2018b) | Autoencoder, CNN | UK | unspecified | No |
Lei et al. (2020) | CNN | China | tushare | No |
Chen et al. (2018b) | CNN | Taiwan | apex | No |
Wu et al. (2020) | LSTM | Taiwan | tfe | No |
Koshiyama et al. (2020) | Autoencoder, LSTM | Global | bloomberg | Yes |
Sun et al. (2019) | LSTM | US | ibkr | No |
Silva et al. (2020) | LSTM | Unspecified | unspecified | No |
Wang et al. (2020) | LSTM | China | joinquant | No |
Chalvatzis and Hristu-Varsakelis (2020) | LSTM | US | unspecified | No |
Price prediction | ||||
Wang et al. (2019a) | Conv-LSTM, RNN | China, US | ibkr | No |
Zhang et al. (2019) | CNN, LSTM | UK, Nordic | lse, etsin | No |
Zhao et al. (2018) | Autoencoder, CNN, LSTM | US | unspecified | No |
Zhang et al. (2020b) | Autoencoder, CNN, LSTM | China | unspecified | No |
Fang et al. (2019) | LSTM | China | private | No |
Baek and Kim (2018) | LSTM | US | yahoo | Yes |
Wang et al. (2018) | CNN | US | unspecified | No |
Zhang et al. (2020c) | Autoencoder | China | unspecified | No |
Portfolio management | ||||
Liang et al. (2018) | DRL | China | investing, wind | Yes |
Park et al. (2020) | DRL | Korea, US | investing, yahoo | No |
Guo et al. (2018) | DRL, CNN | China | unspecified | Yes |
Wang and Wang (2019) | FFNN | US | bloomberg | No |
Market simulation | ||||
Maeda et al. (2020) | DRL, LSTM, CNN | Simulated | none | No |
Buehler et al. (2020) | Autoencoder | US | unspecified | Yes |
Raman and Leidner (2019) | DRL | US | trkd | No |
Stock selection | ||||
Zhang et al. (2020a) | FFNN | China | unspecified | No |
Amel-Zadeh et al. (2020) | RNN, FFNN | US | wrds | No |
Yang et al. (2019) | CNN, LSTM | China | unspecified | No |
Risk management | ||||
Arimond et al. (2020) | CNN, FFNN, LSTM, RNN | EU, UK, US | refinitiv | No |
Hedging strategy | ||||
Ruf and Wang (2020) | FFNN | EU, US | optionm, datashop | Yes |
Metric | TS | PP | MS | SS | PM | RM | HS |
---|---|---|---|---|---|---|---|
Returns | 13 | 8 | 2 | 2 | 4 | 1 | 0 |
MDD | 8 | 2 | 1 | 2 | 0 | 0 | 0 |
Sharpe ratio | 7 | 3 | 1 | 1 | 3 | 0 | 0 |
Sortino ratio | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
Calmar ratio | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
Accuracy | 3 | 1 | 0 | 2 | 0 | 0 | 0 |
Volatility | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
Recall | 2 | 1 | 1 | 1 | 0 | 0 | 0 |
Precision | 2 | 2 | 1 | 1 | 0 | 0 | 0 |
F-score | 2 | 1 | 1 | 1 | 0 | 0 | 0 |
VaR threshold | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
MAE | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
MAPE | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
MSE | 1 | 3 | 0 | 0 | 0 | 0 | 1 |
- Returns is the most common financial evaluation metric because it can more intuitively evaluate profitability.
- Maximum drawdown and Sharpe ratio are also common, especially for the trade strategy and price prediction specializations.
- The Sortino and Calmar ratios are not as common, but they are useful, especially given that the Sortino ratio improves upon the Sharpe ratio, and the Calmar ratio adds metrics related to risk assessment. Furthermore, neither is computationally expensive.
- For completeness, some studies include ML evaluation metrics such as accuracy and precision; however, financial evaluation metrics remain the focus when backtesting.
- Mean square error is the most common error metric used (i.e., more common than MAE or MAPE).
4.2.1 Findings: trade strategy
4.2.2 Findings: price prediction
4.2.3 Findings: portfolio management
4.2.4 Findings: market simulation
4.2.5 Findings: stock selection
4.2.6 Findings: risk management
4.2.7 Findings: hedging strategy
Ref. | Highlights/pros | Problems/cons |
---|---|---|
Trade strategy | ||
Wang et al. (2019c) | Clear implementation of DRL and LSTM with adequate historical data and extensive evaluation metrics | Examples of interpretability should be provided for more than one timeframe |
Li et al. (2020) | DRL hybrid with Adaboost ensemble that provides good performance | Discussion included ML evaluations that were not presented |
Théate and Ernst (2020) | Extensive evaluation criteria with adequate consideration for trading cost | Details of backtesting not provided |
Zhang et al. (2020d) | Includes tests across a vast amount of financial instruments and evaluation measures | Unclear on how or why cross-validation was combined with the backtesting approach that was employed to control overfitting. |
Chakole and Kurhekar (2020) | Focus on market trends using extensive financial and ML evaluation metrics and incorporating transaction costs | Interpretability insights needed to provide context for the good performance |
Wu et al. (2019) | Extensive evaluation metrics and well-defined backtesting strategy | Lacking conversation regarding interpretability |
Hu et al. (2018b) | Uses chart representations of financial data as DL input, producing a good performance | Numerical representation of the same data is missing, precluding a balanced comparison. Furthermore, there is no discussion of model interpretability |
Lei et al. (2020) | ResNet used to improve the effectiveness of moving average indicators in terms of financial and ML evaluation metrics | Minimal explanation for the model’s performance |
Chen et al. (2018b) | Uses a significant amount of high-frequency trading data as input image in a pair trading setup using CNN | Minimal evaluation of returns and no evaluation results presented. Furthermore, there are no comparisons with the raw numerical data |
Wu et al. (2020) | Uses a high-frequency trading technique to predict profitability on daily options trading using LSTM. Provides extensive details regarding the backtesting approach | No baseline comparison provided |
Koshiyama et al. (2020) | Uses LSTM encoder-decoder to transfer trends across 58 different global markets with impressive results across multiple financial and ML evaluation metrics | No interpretation of the model’s operation or details of the featured transfer |
Sun et al. (2019) | Predicts futures market movement using LSTM across multiple criteria, including simulated live trading. Additionally, multiple well backtested models are generated with different parameters and time windows | Lack of clarity regarding why the presented model’s accuracy is worse than chance. A baseline comparison with financial metrics could provide that clarity |
Silva et al. (2020) | Clear presentation of the strategy, and evaluation across multiple financial and ML criteria | No insights or explanations regarding the output of the LSTM model employed |
Wang et al. (2020) | Combines LSTM with market indicators in a novel manner with promising results | The presented evaluation is unclear, and no insights are offered regarding the model’s performance |
Chalvatzis and Hristu-Varsakelis (2020) | Nine DL ML models are combined with LSTM in an ensemble; a well-formalized trading strategy, training, and testing conducted using a practical rolling windows approach and a complete set of evaluation criteria | Although the general discussion regarding evaluation is extensive, it does not provide insights into the model’s performance in relation to the input features |
Price prediction | ||
Wang et al. (2019a) | Convolutional LSTM enables price prediction with improved performance while controlling for overfitting | Lack of discussion regarding explainability |
Zhang et al. (2019) | Combines LSTM and CNN to capture spatial structure in LOB and features sufficient backtesting | Given the approach to backtesting, there is no indication of whether multiple models have been created or the same model is updated |
Zhao et al. (2018) | Uses market charts as input for a CAE that serves as LSTM input | Approach to backtesting unclear due to the unusual data split across training, validation, and test sets. Furthermore, lacks sufficient baseline comparisons and offers no discussion regarding model explainability |
Zhang et al. (2020b) | Combines LSTM with Autoencoder and CNN for improved predictive results across financial and ML metrics | Insufficient discussion regarding model explainability |
Fang et al. (2019) | Regression model is combined with LSTM for better predictive performance | Concludes that the results are not stable for backtested data |
Baek and Kim (2018) | Uses LSTM for data augmentation, specifically targeting controlling overfitting. Provides extensive results across ML, financial, and statistical criteria and discusses model performance | No justification for why the work only considered price data, rendering the provided model’s explanation less complete |
Wang et al. (2018) | Uses one-dimensional CNN for price prediction, demonstrating better generalization than SVM and FFNN | Makes an argument against a buy-and-hold baseline; however, evaluation results supporting the argument would be needed as sufficient evidence. More discussion regarding model explainability needed |
Zhang et al. (2020c) | Provides good evaluation results for the use of an Autoencoder for feature reduction in an ensemble learning setup | Lack of clarity regarding the backtesting strategy and the data splits. Furthermore, no discussion provided regarding model explainability |
Portfolio management | ||
Liang et al. (2018) | Early attempt at using DRL in the financial market featuring sufficient backtesting and evaluation of results | No discussion regarding model explainability. Furthermore, the paper concludes that the results are unfavorable |
Park et al. (2020) | Uses Q Learning to derive trading strategies in a simulated feature space to gain experience beyond the available data. Impressive performance in comparison to the baseline | No discussion concerning insights into the model’s decisions |
Guo et al. (2018) | Ensemble of portfolio management using the existing state-of-the-art strategy with DRL to provide a vast improvement on returns. | Lack of discussion regarding model explainability, a necessity for insights into the vastly improved performance |
Wang and Wang (2019) | Uses ResNet to address overfitting problems when presented with noisy financial data. Sufficient backtesting results provided across statistical and financial metrics | No insights into model performance provided to help understand the data features contributing to the performance |
Market simulation | ||
Maeda et al. (2020) | Uses DRL and LSTM to simulate market data, enabling the creation of theoretical market conditions with impressive results for returns compared to the baseline | Lack of discussion regarding model explainability |
Buehler et al. (2020) | Provides a good overview of the theories involved in generative financial data modeling | Although there is an argument that no value is derived from using more data, it is worth investigating including a comparison with more kinds of real data based on multiple market scenarios |
Raman and Leidner (2019) | Simulates up to a year of market data using only 6 weeks of real market data; simulated data is used with DRL for test trading decisions with sufficient baseline comparisons | No financial evaluation metrics for the simulated trades. Furthermore, performance implications of a longer time frame for input and simulated data would be useful |
Stock selection | ||
Zhang et al. (2020a) | Combines LSTM with boosting ensembles to identify key market features and reduce overfitting | No comparisons with traditional feature reduction methods such as PCA and no discussion of model explainability |
Amel-Zadeh et al. (2020) | Using only fundamental data, compares RNN and FFNN models with non-DL algorithms; the non-DL methods outperform the DL models | No information on the completeness of the input data, namely, lagged dates, and no insights into the model’s performance |
Yang et al. (2019) | Compares CNN with LSTM, with features derived from profit indicators | No comparisons with other baseline strategies and no discussion of model explainability |
Risk management | ||
Arimond et al. (2020) | Specifically targeted at using CNN and LSTM to estimate VaR with a focus on future research potentials | Fails to formally present the baseline evaluation results or provide an explanation or suggestions regarding potential model performance |
Hedging strategy | ||
Ruf and Wang (2020) | Uses FFNN to predict a derived metric that it uses in a hedging strategy with promising results | No consideration of the state of model or discussion of insights from the model output |
4.3 Lessons learned
5 Challenges and future directions
5.1 Challenges
5.1.1 Availability of historical market data
5.1.2 Access to supplementary data
webhose.io, to provide API access to supplementary news data for research purposes.
5.1.3 Long term investment horizon
5.1.4 Effect of capital gains tax
5.1.5 Financial ML/DL framework
5.2 Future directions
- Applicability in practice: This work’s focus has been on ensuring we attend to how previous works have been validated in practice. Industry applicability, trustworthiness, and usability (The Institute for Ethical AI & Machine Learning 2020; Gundersen et al. 2018) should be our core guiding forces as we expand computer science learnings and research into domain-specific applications such as the financial market. One approach is ensuring that we adhere to guiding protocols, such as backtesting, when conducting research experiments in the financial market context (Arnott et al. 2018). This aligns with pertinent AI research topics such as reproducibility and explainability (i.e., XAI).
- Improvements in trust: Although significant attention has recently been focused on AI trustworthiness, there remains much work to be done. An important principle for building trust in AI is explicability, which entails creating explainable and accountable AI models (Thiebes et al. 2020). Ensuring that research is explicable further improves the chance of employing that research in real-world scenarios. Recall that Sect. 3.2.1 indicated that feature importance could provide explainable insights from input features, which, in turn, foster trust. There remains substantial work to be done on this matter, as the summaries provided in Table 12 evidence, especially the limited attention given to explainability. Another important point of tension for generating trust in AI is reproducibility. Among other considerations, publications must be easy to validate by external researchers. Notably, Thiebes et al. (2020) provide a checklist including relevant statistical items and code and data availability. However, of the 35 papers reviewed, only seven (20%) provide the source code for their research. Ensuring that all published works include access to the source code and data would help increase trust, making industrial application more plausible.
- Public availability of data: One means of improving trust in AI research is the availability of public data that researchers can use as a benchmark. Unfortunately, this is relatively uncommon for financial market research, because relevant fundamental (e.g., quarterly reports), alternative (e.g., news and social media), and granular/intraday market data are often behind paywalls. This means that even if most researchers were to publish their source code, they still might not be able to publish their data due to legal implications. While efforts made by corporate organizations such as Twitter are laudable (Tornes and Truijillo 2021), there remains work to be done by the industry and researchers to make relevant research data available for this purpose. An ideal set would be historical market data over a long period, with corresponding fundamental and alternative data sets. Although WRDS (Wachowicz 2020) is a good source of such data for research purposes, research institutions must choose to subscribe and will provide varying levels of access based on financial commitment.
- Focus on long-horizon: More emphasis should be placed on applying DL market strategies to long-horizon investments targeted at growth investing. As previously mentioned, significantly more gains can be expected in the long-term investment horizon (i.e., more than a year) by focusing on potential unicorns in their early stage. The consideration that one common investment portfolio type is retirement funds, which feature a relatively long time span, makes a compelling case for considering modeling techniques focused on long-term returns. However, a potential drawback is that this complicates evaluating annualized metrics, especially for longer-term objectives. A hybrid approach might be to mix a short-term strategy with a vision for the long term. Additionally, employing alternative data, such as news articles, about not only the company of interest but also its competitors can enable longer-term horizons to be better forecast. Furthermore, tracking geopolitical events, environmental events, or both, along with their potential impacts, to “learn from the past” represents an interesting future study direction.
- Financial DL frameworks: Significant work has been done to apply ML to stock market research. However, unified frameworks remain uncommon, especially in DL research. Thus, a useful step would be to develop a financial DL toolbox for online learning using non-stationary financial data that are inherently volatile (Pesaranghader et al. 2016). Section 3.2.1 discussed the peculiarities of learning from non-stationary time-series data pertaining to the stock market. A unified financial DL toolbox improved by different research would help to foster innovation based on newer ideas.