Introduction
- A novel methodology for reconstructing business links between companies based on news events.
- A novel approach to predicting company volatility based on news events.
- A novel approach to predicting an occurrence of new events for pairs of companies.
Literature review
Reference | Text type | Text source | No. of items |
---|---|---|---|
David et al. (1989) | General news | New York Times | 49 events |
Mitchell and Mulherin (1994) | Financial news | Dow Jones & Company | 752,647 headlines |
Brad and Douglas (1993) | Financial news columns | Wall Street Journal | 48 + 48 stock picks |
Schumaker and Chen (2009) | Financial news | Yahoo Finance | 9,211 |
Wu et al. (2009) | General news | The Standard | ? |
Lumsdaine (2010) | News readership data | Bloomberg | ? |
Fehrer and Feuerriegel (2015) | Ad hoc announcements | Corporate disclosures | 8,359 headlines |
Yu et al. (2013) | General news and social media | Google Blogs, BoardReader, Twitter, Google News | 52,746 messages |
Hagenau et al. (2013) | Corporate announcements and financial news | DGAP, EuroAdhoc | 10,870 and 3,478, respectively |
Shen et al. (2016) | General news | Baidu News | 6,250 articles |
Ding et al. (2015) | Financial news | Reuters and Bloomberg | 664,399 documents |
Zhang et al. (2016) | Financial news column | NetEase | 136 + 106 events |
Data
Financial data
Ticker | Reference | Ticker | Reference | Ticker | Reference |
---|---|---|---|---|---|
AAPL | Apple | ABBV | AbbVie | ABT | Abbott |
ACN | Accenture | AIG | American International Group | AMGN | Amgen |
AMZN | Amazon | AXP | American Express | BA | Boeing |
BAC | Bank of America | BIIB | Biogen | BLK | BlackRock |
BMY | Bristol-Myers Squibb | C | Citigroup | CAT | Caterpillar |
CELG | Celgene | CL | Colgate-Palmolive | COP | ConocoPhillips |
COST | Costco Wholesale | CSCO | Cisco | CVS | CVS Health |
DHR | Danaher | DIS | Walt Disney | DUK | Duke Energy |
EBAY | eBay | EMR | Emerson Electric | ESRX | Express Scripts |
F | Ford | FB | Facebook | FOXA | 21st Century Fox |
GE | General Electric | GILD | Gilead Sciences | GM | General Motors |
GS | Goldman Sachs | HD | Home Depot | HON | Honeywell |
HPQ | HP | IBM | IBM | INTC | Intel |
JNJ | Johnson & Johnson | JPM | JPMorgan | KO | Coca-Cola |
LLY | Eli Lilly | LMT | Lockheed Martin | MA | Mastercard |
MCD | McDonald’s | MDLZ | Mondelēz | MDT | Medtronic |
MET | MetLife | MO | Altria | MON | Monsanto |
MRK | Merck & Co | MS | Morgan Stanley | MSFT | Microsoft |
NKE | Nike | ORCL | Oracle | OXY | Occidental Petroleum |
PEP | PepsiCo | PFE | Pfizer | PG | Procter & Gamble |
PM | Philip Morris | QCOM | Qualcomm | SBUX | Starbucks |
SLB | Schlumberger | T | AT&T | TWX | Time Warner |
UNH | UnitedHealth | UNP | Union Pacific | UPS | United Parcel Service |
USB | U.S. Bancorp | UTX | United Technologies | VZ | Verizon |
WFC | Wells Fargo | WMT | Walmart | XOM | ExxonMobil |
News data
Proposed methodology
Event embedding
Word2vec word embedding
Events to vectors
Graph formulation
Company graph formulation—direct comparison
Alternative graph formulations
Centroid network
Financial model
Predictive models
Regression models
Point process model
Results
Business graph creation
Case study 1: most popular companies
Case study 2: 3 companies
Market volatility prediction from news
Model | W | News MAE | News MSE | Finance MAE | Finance MSE | News + Finance MAE | News + Finance MSE |
---|---|---|---|---|---|---|---|
Decision tree | cent | 0.775 | 35.091 | 0.717 | 3.375 | 0.687 | 3.267 |
Decision tree | clst | 0.768 | 35.076 | 0.717 | 3.375 | 0.719 | 3.322 |
Decision tree | cos | 0.809 | 35.187 | 0.717 | 3.375 | 0.666 | 3.280 |
Decision tree | count | 0.710 | 34.853 | 0.717 | 3.375 | 0.727 | 3.402 |
Gaussian process | cent | 0.587 | 34.314 | 0.570 | 2.699 | 0.566 | 2.689 |
Gaussian process | clst | 0.587 | 34.311 | 0.570 | 2.699 | 0.571 | 2.703 |
Gaussian process | cos | 0.586 | 34.309 | 0.570 | 2.699 | 0.567 | 2.706 |
Gaussian process | count | 0.589 | 34.312 | 0.570 | 2.699 | 0.565 | 2.730 |
Kernel ridge | cent | 2.347 | 50.602 | 0.611 | 3.084 | 1.243 | 7.191 |
Kernel ridge | clst | 1.981 | 42.257 | 0.611 | 3.084 | 2.478 | 18.887 |
Kernel ridge | cos | 2.743 | 57.785 | 0.611 | 3.084 | 1.242 | 11.319 |
Kernel ridge | count | 1.227 | 35.929 | 0.611 | 3.084 | 2.709 | 32.986 |
Linear regression | cent | 2.644 | 55.001 | 0.612 | 3.096 | 3.244 | 40.500 |
Linear regression | clst | 2.932 | 62.219 | 0.612 | 3.096 | 4.134 | 57.534 |
Linear regression | cos | 2.969 | 62.224 | 0.612 | 3.096 | 3.391 | 62.338 |
Linear regression | count | 2.967 | 65.251 | 0.612 | 3.096 | 3.291 | 73.918 |
Nearest neighbors | cent | 0.559 | 34.293 | 0.661 | 2.844 | 0.618 | 2.694 |
Nearest neighbors | clst | 0.579 | 34.403 | 0.661 | 2.844 | 0.642 | 2.796 |
Nearest neighbors | cos | 0.542 | 34.375 | 0.661 | 2.844 | 0.617 | 2.779 |
Nearest neighbors | count | 0.562 | 34.379 | 0.661 | 2.844 | 0.667 | 2.797 |
Neural net | cent | 0.801 | 35.289 | 1.513 | 21.822 | 0.700 | 3.587 |
Neural net | clst | 0.922 | 36.004 | 1.491 | 20.703 | 0.743 | 3.726 |
Neural net | cos | 0.730 | 34.990 | 1.493 | 20.175 | 0.755 | 4.007 |
Neural net | count | 1.150 | 36.867 | 1.482 | 20.203 | 0.943 | 5.766 |
Random forest | cent | 0.699 | 34.401 | 0.660 | 2.880 | 0.627 | 2.804 |
Random forest | clst | 0.697 | 34.397 | 0.660 | 2.880 | 0.638 | 2.695 |
Random forest | cos | 0.697 | 34.416 | 0.660 | 2.880 | 0.635 | 2.904 |
Random forest | count | 0.662 | 34.325 | 0.660 | 2.880 | 0.662 | 2.861 |

Model | W | MAE | MSE |
---|---|---|---|
Decision tree | cent | 0.957 | 0.968 |
Decision tree | clst | 1.002 | 0.984 |
Decision tree | cos | 0.929 | 0.972 |
Decision tree | count | 1.013 | 1.008 |
Gaussian process | cent | 0.992 | 0.996 |
Gaussian process | clst | 1.001 | 1.002 |
Gaussian process | cos | 0.994 | 1.002 |
Gaussian process | count | 0.990 | 1.011 |
Kernel ridge | cent | 2.034 | 2.331 |
Kernel ridge | clst | 4.053 | 6.123 |
Kernel ridge | cos | 2.032 | 3.670 |
Kernel ridge | count | 4.431 | 10.695 |
Linear regression | cent | 5.298 | 13.083 |
Linear regression | clst | 6.750 | 18.585 |
Linear regression | cos | 5.538 | 20.137 |
Linear regression | count | 5.374 | 23.878 |
Nearest neighbors | cent | 0.936 | 0.947 |
Nearest neighbors | clst | 0.971 | 0.983 |
Nearest neighbors | cos | 0.934 | 0.977 |
Nearest neighbors | count | 1.010 | 0.984 |
Neural net | cent | 0.463 | 0.164 |
Neural net | clst | 0.499 | 0.180 |
Neural net | cos | 0.505 | 0.199 |
Neural net | count | 0.636 | 0.285 |
Random forest | cent | 0.951 | 0.974 |
Random forest | clst | 0.968 | 0.936 |
Random forest | cos | 0.963 | 1.008 |
Random forest | count | 1.004 | 0.993 |
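
The MAE/MSE-only table carries no caption in this extract. Its entries appear consistent with the ratio of the News + Finance error to the Finance-only error from the preceding table, so values below 1 would indicate that adding news features reduced the error. This reading is an inference from the numbers, not something stated in the source. A minimal Python sketch that checks it on two rows (the dictionary names and the `error_ratio` helper are illustrative):

```python
# Values copied from the two tables above for W = cent.
# Each tuple is (MAE, MSE).
finance = {"Decision tree": (0.717, 3.375), "Neural net": (1.513, 21.822)}
news_finance = {"Decision tree": (0.687, 3.267), "Neural net": (0.700, 3.587)}
reported = {"Decision tree": (0.957, 0.968), "Neural net": (0.463, 0.164)}


def error_ratio(combined, baseline):
    """Element-wise ratio of combined-model errors to baseline errors."""
    return tuple(c / b for c, b in zip(combined, baseline))


# Assumed interpretation: ratio = (News + Finance error) / (Finance error).
ratios = {m: error_ratio(news_finance[m], finance[m]) for m in finance}

for model, (mae_ratio, mse_ratio) in ratios.items():
    print(f"{model}: MAE ratio {mae_ratio:.3f}, MSE ratio {mse_ratio:.3f}, "
          f"reported {reported[model]}")
```

The recomputed ratios match the reported entries to within rounding of the three-decimal inputs, which supports the interpretation but does not confirm it.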
Sector level results
(a) Errors of prediction models per sector with news (MAE)

Model | W | Consumer discretionary | Consumer staples | Energy | Financials | Health care | Industrials | Information technology | Materials and utilities | Telecommunication services |
---|---|---|---|---|---|---|---|---|---|---|
Decision tree | cent | 0.617 | 0.521 | 0.649 | 0.767 | 0.799 | 0.836 | 0.606 | 0.385 | 0.774 |
Decision tree | clst | 0.641 | 0.563 | 0.697 | 0.777 | 0.826 | 0.867 | 0.655 | 0.444 | 0.798 |
Decision tree | cos | 0.590 | 0.502 | 0.621 | 0.733 | 0.780 | 0.831 | 0.594 | 0.337 | 0.759 |
Decision tree | count | 0.644 | 0.565 | 0.698 | 0.800 | 0.850 | 0.903 | 0.629 | 0.408 | 0.792 |
Gaussian process | cent | 0.489 | 0.399 | 0.566 | 0.619 | 0.676 | 0.717 | 0.500 | 0.309 | 0.642 |
Gaussian process | clst | 0.494 | 0.406 | 0.568 | 0.623 | 0.681 | 0.722 | 0.506 | 0.315 | 0.647 |
Gaussian process | cos | 0.490 | 0.401 | 0.565 | 0.619 | 0.677 | 0.718 | 0.502 | 0.310 | 0.643 |
Gaussian process | count | 0.489 | 0.400 | 0.550 | 0.620 | 0.674 | 0.716 | 0.500 | 0.309 | 0.647 |
Kernel ridge | cent | 1.131 | 1.122 | 1.200 | 1.286 | 1.345 | 1.454 | 1.167 | 0.897 | 1.364 |
Kernel ridge | clst | 2.336 | 2.415 | 2.430 | 2.509 | 2.569 | 2.709 | 2.382 | 2.129 | 2.625 |
Kernel ridge | cos | 1.113 | 1.170 | 1.189 | 1.269 | 1.355 | 1.445 | 1.149 | 0.909 | 1.342 |
Kernel ridge | count | 2.620 | 2.571 | 2.635 | 2.813 | 2.790 | 2.887 | 2.619 | 2.393 | 2.858 |
Linear regression | cent | 3.020 | 3.127 | 3.185 | 3.296 | 3.358 | 3.618 | 3.118 | 2.758 | 3.504 |
Linear regression | clst | 3.938 | 4.196 | 4.050 | 4.142 | 4.188 | 4.405 | 4.037 | 3.746 | 4.275 |
Linear regression | cos | 3.165 | 3.465 | 2.979 | 3.394 | 3.405 | 3.824 | 3.340 | 2.843 | 3.772 |
Linear regression | count | 3.624 | 3.689 | 3.926 | 3.853 | 4.007 | 4.182 | 3.707 | 3.259 | 3.991 |
Nearest neighbors | cent | 0.539 | 0.461 | 0.604 | 0.669 | 0.721 | 0.774 | 0.556 | 0.376 | 0.701 |
Nearest neighbors | clst | 0.564 | 0.480 | 0.630 | 0.692 | 0.742 | 0.796 | 0.583 | 0.390 | 0.737 |
Nearest neighbors | cos | 0.537 | 0.455 | 0.606 | 0.666 | 0.720 | 0.775 | 0.557 | 0.356 | 0.712 |
Nearest neighbors | count | 0.592 | 0.502 | 0.651 | 0.722 | 0.768 | 0.822 | 0.607 | 0.422 | 0.745 |
Neural net | cent | 0.620 | 0.489 | 0.780 | 0.771 | 0.845 | 0.855 | 0.594 | 0.361 | 0.799 |
Neural net | clst | 0.643 | 0.543 | 0.757 | 0.822 | 0.881 | 0.904 | 0.651 | 0.484 | 0.845 |
Neural net | cos | 0.660 | 0.532 | 0.840 | 0.826 | 0.901 | 0.918 | 0.668 | 0.396 | 0.831 |
Neural net | count | 0.849 | 0.804 | 0.888 | 0.991 | 1.056 | 1.102 | 0.881 | 0.656 | 1.060 |
Random forest | cent | 0.545 | 0.465 | 0.580 | 0.703 | 0.732 | 0.794 | 0.555 | 0.340 | 0.717 |
Random forest | clst | 0.560 | 0.482 | 0.609 | 0.701 | 0.740 | 0.795 | 0.572 | 0.377 | 0.720 |
Random forest | cos | 0.554 | 0.478 | 0.580 | 0.705 | 0.736 | 0.804 | 0.566 | 0.345 | 0.731 |
Random forest | count | 0.581 | 0.508 | 0.634 | 0.735 | 0.772 | 0.816 | 0.581 | 0.380 | 0.734 |

(b) Errors of prediction models per sector with no news (MAE)

Model | Consumer discretionary | Consumer staples | Energy | Financials | Health care | Industrials | Information technology | Materials and utilities | Telecommunication services |
---|---|---|---|---|---|---|---|---|---|
Decision tree | 0.630 | 0.564 | 0.680 | 0.817 | 0.841 | 0.881 | 0.612 | 0.367 | 0.772 |
Gaussian process | 0.494 | 0.406 | 0.568 | 0.622 | 0.680 | 0.721 | 0.505 | 0.315 | 0.646 |
Kernel ridge | 0.512 | 0.462 | 0.591 | 0.656 | 0.751 | 0.768 | 0.529 | 0.324 | 0.678 |
Linear regression | 0.513 | 0.463 | 0.592 | 0.657 | 0.753 | 0.769 | 0.530 | 0.324 | 0.679 |
Nearest neighbors | 0.563 | 0.497 | 0.577 | 0.762 | 0.747 | 0.858 | 0.592 | 0.390 | 0.730 |
Neural net | 0.725 | 0.756 | 0.651 | 1.001 | 0.876 | 0.956 | 0.781 | 0.461 | 0.781 |
Random forest | 0.580 | 0.505 | 0.614 | 0.755 | 0.767 | 0.824 | 0.565 | 0.359 | 0.720 |
News event prediction
Model | Pred. log likelihood |
---|---|
Standard Hawkes | 0.994 |
Network Hawkes (Erdős–Rényi) | 1.087 |
Network Hawkes (latent distance) | 0.695 |
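
The table compares a standard Hawkes process with two network variants, but the intensity function itself does not survive in this extract. For orientation, a commonly used form of the multivariate Hawkes conditional intensity, and the network extension found in the network Hawkes literature, can be written as follows. The symbols ($\mu_k$, $W$, $A$, $g$, $s_n$, $c_n$) are notation assumed here and are not taken from the source.

```latex
% Standard multivariate Hawkes: events n occur at times s_n on processes c_n.
\lambda_k(t) = \mu_k + \sum_{n \,:\, s_n < t} W_{c_n \to k}\, g(t - s_n)

% Network Hawkes: a binary adjacency matrix A gates the interaction weights.
\lambda_k(t) = \mu_k + \sum_{n \,:\, s_n < t} A_{c_n \to k}\, W_{c_n \to k}\, g(t - s_n),
\qquad A_{k' \to k} \in \{0, 1\}
```

Here $\mu_k$ is the background rate of process $k$, $W_{k' \to k} \ge 0$ is the excitation weight, and $g$ is a normalized, causal impulse response. Under an Erdős–Rényi prior each edge $A_{k' \to k}$ is present independently with a common probability, while under a latent distance prior the edge probability decreases with the distance between latent positions assigned to the two processes.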
News as drivers of the financial market
Model | Pred. log likelihood (market data) | Pred. log likelihood (market data with news) |
---|---|---|
Standard Hawkes | 1.32 | 1.25 |
Network Hawkes (Erdős–Rényi) | 1.28 | 1.20 |
Network Hawkes (latent distance) | 1.30 | 1.18 |