1 Introduction
1.1 Background
1.2 Focus
1.3 Hypothesis
-
Polarity indexes calculated from text data contain information that precedes traditional financial time series data such as stocks.
-
Polarity indexes calculated from text data can serve as a signal to rebalance a portfolio, and rebalancing on this signal can improve portfolio performance.
-
Portfolio performance can be improved by switching between risk-minimizing and return-maximizing optimization strategies according to the change points created by the polarity index.
1.4 Contributions
-
Proposed a highly expressive asset allocation framework using financial text mining techniques.
-
Demonstrated that estimating regime change points from financial text is useful for active management.
-
Demonstrated that the lead-lag relationships between financial time series and text are useful for active management.
2 Related Works
2.1 Asset Allocation Using Machine Learning
2.2 Creation of Economic Index Using Text Mining
2.3 Causal Inference and Its Applications
2.4 Time Series Change Point Detection and Its Applications
3 Task Setting
3.1 SSAAM Overview
-
Step 1 (Creating polarity index): Score financial news titles using masked language model (MLM) scoring. Quartiles of the scores are then computed, and each title is classified as positive, negative, or neutral according to where its score falls relative to the quartiles. The classified values are aggregated daily.
-
Step 2 (Demonstration of leading effects): We use statistical causal inference to test whether financial news has leading effects on a stock portfolio. The inputs are the polarity index created in Step 1 and a portfolio that combines the 10 stocks. We use the VAR-LiNGAM algorithm.
-
Step 3 (Change point detection): Once Step 2 has verified that the polarity index has leading effects, calculate the regime change points of the polarity index with a change point detection algorithm. We use the Binary Segmentation Search Method.
-
Step 4 (Portfolio optimization): Portfolio optimization is performed based on the change points detected in Step 3. We use the entropic value-at-risk (EVaR) optimization algorithm.
3.2 Framework Validity
4 Method
4.1 Creating Polarity Index
Classification method | Sentiment score |
---|---|
1st quartile > PLLs | \(-\)1 (negative) |
1st quartile \(\le \) PLLs \(\le \) 3rd quartile | 0 (neutral) |
3rd quartile < PLLs | 1 (positive) |
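The classification rule in the table above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the column names `date` and `pll`, the toy scores, and the choice to aggregate by daily sum are all assumptions (the text only says the values are "aggregated daily").

```python
import numpy as np
import pandas as pd

def polarity_index(scores: pd.DataFrame) -> pd.Series:
    """Map PLL scores to {-1, 0, 1} by quartile, then sum the labels per day."""
    q1 = scores["pll"].quantile(0.25)
    q3 = scores["pll"].quantile(0.75)
    # Below the 1st quartile -> negative, above the 3rd -> positive, else neutral.
    sentiment = np.select(
        [scores["pll"] < q1, scores["pll"] > q3],
        [-1, 1],
        default=0,
    )
    labelled = scores.assign(sentiment=sentiment)
    # Assumption: daily aggregation is a plain sum of the three-valued labels.
    return labelled.groupby("date")["sentiment"].sum()

# Hypothetical toy data: eight headlines over three days.
demo = pd.DataFrame({
    "date": ["2019-01-02"] * 3 + ["2019-01-03"] * 3 + ["2019-01-04"] * 2,
    "pll": [-8.0, -7.0, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0],
})
index = polarity_index(demo)
```

On this toy input the two lowest scores fall below the first quartile and the two highest fall above the third, so the daily index is negative on the first day, zero on the second, and positive on the third.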
4.2 Demonstration of Leading Effects
4.3 Change Point Detection
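Step 3 of the overview names binary segmentation as the search method. The sketch below shows the general idea under stated assumptions: it uses a squared-error (mean-shift) cost, a fixed number of change points, and a hypothetical `min_size` parameter, none of which are confirmed by the text. It is written from scratch in NumPy and is not the implementation used in the study.

```python
import numpy as np

def sse(x):
    """Sum of squared deviations from the segment mean."""
    return float(((x - x.mean()) ** 2).sum()) if len(x) else 0.0

def best_split(x, min_size):
    """Return (gain, index) of the single split that most reduces squared error."""
    best_gain, best_idx = 0.0, None
    total = sse(x)
    for k in range(min_size, len(x) - min_size + 1):
        gain = total - sse(x[:k]) - sse(x[k:])
        if gain > best_gain:
            best_gain, best_idx = gain, k
    return best_gain, best_idx

def binary_segmentation(signal, n_bkps, min_size=5):
    """Greedily add the change point with the largest cost reduction, n_bkps times."""
    signal = np.asarray(signal, dtype=float)
    bounds = [0, len(signal)]
    for _ in range(n_bkps):
        candidates = []
        for start, end in zip(bounds[:-1], bounds[1:]):
            gain, k = best_split(signal[start:end], min_size)
            if k is not None:
                candidates.append((gain, start + k))
        if not candidates:
            break  # no segment can be improved by a further split
        _, bkp = max(candidates)
        bounds = sorted(bounds + [bkp])
    return bounds[1:-1]

# Noise-free toy signal with level shifts at positions 30 and 60.
toy_signal = np.concatenate([np.zeros(30), np.full(30, 5.0), np.full(30, -3.0)])
breakpoints = binary_segmentation(toy_signal, n_bkps=2)
```

With `n_bkps=4` or `n_bkps=9` the same routine would yield the 5-regime and 10-regime settings used in the experiments, assuming the polarity index is passed as `signal`.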
4.4 Portfolio Optimization
-
Minimize risk optimization: A convex optimization problem with constraints imposed to minimize EVaR given a target level of expected return \(\mu\) (\(\widehat{\mu}\)).

-
Maximize return optimization: A convex optimization problem with constraints imposed to maximize expected return given a target level of EVaR (\(\widehat{EVaR}\)).
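Both problems above are built on EVaR as the risk measure. As a rough illustration of that measure alone, the sketch below evaluates the sample EVaR of a fixed loss series using the standard definition \(\mathrm{EVaR}_{\alpha}(L) = \inf_{z>0} z \ln\left(E[e^{L/z}]/\alpha\right)\). It does not optimize portfolio weights, and the grid search, the significance level `alpha=0.05`, and the simulated returns are assumptions made for this example.

```python
import numpy as np

def cvar(losses, alpha=0.05):
    """Mean of the worst alpha-fraction of losses (tail average)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    n_tail = max(1, int(np.ceil(alpha * len(losses))))
    return float(losses[-n_tail:].mean())

def evar(losses, alpha=0.05, n_grid=400):
    """Approximate inf over z > 0 of z * ln(E[exp(L/z)] / alpha) on a log-spaced grid."""
    losses = np.asarray(losses, dtype=float)
    peak = losses.max()
    scale = losses.std() + 1e-12
    best = np.inf
    for z in scale * np.logspace(-3, 3, n_grid):
        # Stable log-mean-exp: factor out the largest loss before exponentiating.
        log_mgf = peak / z + np.log(np.mean(np.exp((losses - peak) / z)))
        best = min(best, z * (log_mgf - np.log(alpha)))
    return float(best)

# Hypothetical daily returns; losses are negated returns.
rng = np.random.default_rng(0)
losses = -rng.normal(0.001, 0.02, size=1000)
tail_cvar = cvar(losses)
tail_evar = evar(losses)
```

EVaR is an upper bound on CVaR at the same level and never exceeds the worst observed loss, which is why it is regarded as the more conservative of the two measures. A grid search is used here only to keep the example dependency-free; a convex solver would normally be used when EVaR appears inside a portfolio optimization.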

5 Experiments and Results
5.1 Dataset Description
-
Stock Data: We used daily stock data provided by Yahoo! Finance. The stocks are the components of the NYSE FANG+ Index: Facebook, Apple, Amazon, Netflix, Google, Microsoft, Alibaba, Baidu, NVIDIA, and Tesla. Adjusted closing prices are used. The data cover January 2015 through December 2019.
-
Financial News Data: We used the daily historical financial news archive provided by Kaggle, a data analysis platform. The archive covers news on U.S. stocks listed on the NYSE/NASDAQ over the past 12 years, and we confirmed that it contains news on all 10 stocks. The data consist of nine columns and 221,513 rows; only the title and release date columns were used in this study. The data cover January 2015 through December 2019.
5.2 Preparation for Back-Testing
Mean | Std | Min | 25% | 50% | 75% | Max |
---|---|---|---|---|---|---|
\(-\)2.06 | 12.30 | \(-\)50.00 | \(-\)8.00 | 0.00 | 5.00 | 39.00 |
Data | Test Statistic |
---|---|
Financial news data | \(-\)5.09 |
Stock data | \(-\)0.73 |
Causal direction | Causal graph value |
---|---|
Index(t-1) \(\dashrightarrow \) index(t) | 0.39 |
Index(t-1) \(\dashrightarrow \) portfolio(t) | 0.11 |
Portfolio(t-1) \(\dashrightarrow \) portfolio(t) | 1.00 |
-
Precision: Precision is the fraction of samples predicted as positive that are actually positive. In the context of change point detection, precision is defined as follows:
Regime | Precision | HM |
---|---|---|
5 | 0.67 | 240.00 |
10 | 0.75 | 39.0 |
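One common way to compute precision for change point detection is sketched below. It assumes a predicted change point counts as correct when it lies within a tolerance `margin` of some reference change point; the margin, the function name, and the example indices are hypothetical, and the study's exact definition may differ.

```python
def cpd_precision(true_bkps, pred_bkps, margin=10):
    """Fraction of predicted change points within `margin` steps of a true one."""
    if not pred_bkps:
        return 0.0
    hits = sum(
        any(abs(p - t) <= margin for t in true_bkps)
        for p in pred_bkps
    )
    return hits / len(pred_bkps)

# Hypothetical indices: three of the five predictions fall near a reference point.
reference = [100, 250, 400]
predicted = [98, 180, 255, 300, 405]
precision = cpd_precision(reference, predicted, margin=10)
```

A larger margin makes the score more forgiving, so the margin should be reported alongside any precision value.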
5.3 Back-Testing Scenarios
-
CPD-EVaR++ (proposed): Rebalancing at change points, switching between risk-minimization and return-maximization EVaR optimization, combined with regular-interval rebalancing
-
CPD-EVaR+: Rebalancing at change points, switching between risk-minimization and no-restrictions EVaR optimization, combined with regular-interval rebalancing
-
EVaR: EVaR optimization with regular-interval rebalancing
-
CVaR: CVaR optimization with regular-interval rebalancing
-
MV: Mean-Variance optimization with regular-interval rebalancing
Regime | Selected Model | Formula |
---|---|---|
1 | MinRiskOpt | |
2 | MaxReturnOpt | |
3 | MinRiskOpt | |
4 | MaxReturnOpt | |
5 | MinRiskOpt | |
Regime | Selected Model | Formula |
---|---|---|
1 | MinRiskOpt | |
2 | MaxReturnOpt | |
3 | MinRiskOpt | |
4 | MaxReturnOpt | |
5 | MinRiskOpt | |
6 | MaxReturnOpt | |
7 | MinRiskOpt | |
8 | MaxReturnOpt | |
9 | MinRiskOpt | |
10 | MaxReturnOpt | |
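The alternation shown in the two tables above (odd regimes use MinRiskOpt, even regimes use MaxReturnOpt) can be expressed as a small lookup. The change point indices and function names below are hypothetical and serve only to illustrate the mapping from a time step to its regime and model.

```python
from bisect import bisect_right

MODELS = ("MinRiskOpt", "MaxReturnOpt")

def regime_of(t, change_points):
    """1-based regime number of time step t, given sorted change point indices."""
    # A change point index is treated as the first step of the new regime.
    return bisect_right(change_points, t) + 1

def model_for_regime(regime):
    """Odd regimes -> MinRiskOpt, even regimes -> MaxReturnOpt."""
    return MODELS[(regime - 1) % 2]

# Hypothetical change points splitting the sample into five regimes.
change_points = [250, 500, 750, 1000]
schedule = [model_for_regime(r) for r in range(1, len(change_points) + 2)]
model_at_600 = model_for_regime(regime_of(600, change_points))
```

Here step 600 falls in the third regime, so the risk-minimizing model would be selected at that point.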
5.4 Evaluation by Back-Testing
-
Total return (TR): TR is the total return earned on an investment over a given period, computed as TR = valuation amount + cumulative distributions received + cumulative amount sold - cumulative amount bought. This study does not account for taxes or trading commissions.
-
Maximum drawdown (MDD): MDD is the largest rate of decline from a peak in asset value to a subsequent trough, computed as MDD = (trough value - peak value)/peak value.
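Both metrics can be computed from a series of portfolio values. The sketch below makes two simplifying assumptions that go beyond the definitions above: TR is expressed as a percentage of the initial value with no distributions or intermediate cash flows, and MDD is reported as a positive magnitude. The value series is made up for illustration.

```python
import numpy as np

def total_return_pct(values):
    """Percentage gain from the first to the last portfolio value."""
    values = np.asarray(values, dtype=float)
    return float((values[-1] / values[0] - 1.0) * 100.0)

def max_drawdown_pct(values):
    """Largest peak-to-trough decline, as a positive percentage."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    # (trough - peak) / peak at every step; always <= 0.
    drawdowns = (values - running_peak) / running_peak
    return float(-drawdowns.min() * 100.0)

# Hypothetical portfolio values over six periods.
portfolio_values = [100.0, 120.0, 90.0, 150.0, 135.0, 180.0]
tr = total_return_pct(portfolio_values)
mdd = max_drawdown_pct(portfolio_values)
```

In this toy series the worst decline is the fall from 120 to 90, which is larger in relative terms than the later fall from 150 to 135.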
Rebalance | Regime | Algorithm | TR [%] | MDD [%] |
---|---|---|---|---|
30 days | 5 | CPD-EVaR++ | 810.9915 | 26.8629 |
30 days | 5 | CPD-EVaR+ | 594.7410 | 26.8629 |
30 days | 10 | CPD-EVaR++ | 485.5201 | 45.0235 |
30 days | 10 | CPD-EVaR+ | 392.1392 | 42.4803 |
90 days | 5 | CPD-EVaR++ | 535.7349 | 27.6386 |
90 days | 5 | CPD-EVaR+ | 410.8530 | 27.6386 |
90 days | 10 | CPD-EVaR++ | 417.8354 | 27.7646 |
90 days | 10 | CPD-EVaR+ | 373.5849 | 27.7646 |
180 days | 5 | CPD-EVaR++ | 152.0988 | 27.3924 |
180 days | 5 | CPD-EVaR+ | 131.2210 | 27.3924 |
180 days | 10 | CPD-EVaR++ | 169.2992 | 25.3050 |
180 days | 10 | CPD-EVaR+ | 232.4513 | 25.3050 |
Rebalance | Algorithm | TR [%] | MDD [%] |
---|---|---|---|
30 days | EVaR | 587.9630 | 46.6651 |
30 days | CVaR | 558.7446 | 44.4532 |
30 days | MV | 527.2827 | 42.9851 |
90 days | EVaR | 500.1421 | 44.9860 |
90 days | CVaR | 496.7423 | 44.0592 |
90 days | MV | 459.1195 | 42.7358 |
180 days | EVaR | 353.2412 | 44.7714 |
180 days | CVaR | 382.9451 | 44.2525 |
180 days | MV | 360.4298 | 42.8165 |