Introduction
Related works
Studies based on quantitative dataset
Studies based on qualitative dataset
Studies based on both qualitative and quantitative datasets
Reference | Technique | No. of data sources | Input data source | Stock data | Reported results |
---|---|---|---|---|---|
[1] | Co-evolving tensor-based learning | 2 | W & SM | China A-share and HK stock | 55–63% |
[7] | Extended coupled hidden Markov model | 2 | W & HSD | China A-share | 52–63% |
[44] | NN, LR, SVM, KNN, RF, AB & KF | 2 | HSD & MD | NS | NS |
[45] | | 2 | W & HSD | S&P 500 SPY index | p-value better than 0.05 |
[24] | CNN and RNN | 2 | HSD & W | | |
[8] | Multi-Source Multiple Instance Learning | 3 | HSD, SM & W | China A-share | 62.1% |
[43] | Delta Naive Bayes (DNB) | 3 | W, SM & GS | Argentina, Peru & Mexico | p-value (0.583–0.702) |
[9] | ANN | 3 | W, SM & GS | Ghana | Accuracy 49.4–77.12% |
Methodology
Study framework
Datasets
Quantitative dataset
Data source | Number of features | Percentage (%) |
---|---|---|
Twitter (SM) | 6 | 8.57 |
Web news (W) | 5 | 7.14 |
Forum discussions (FD) | 4 | 5.71 |
Macroeconomic data (MD) | 44 | 62.86 |
Historical stock data (HSD) | 10 | 14.29 |
Google Trends (GTI) | 1 | 1.43 |
Total | 70 | 100 |
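
The percentage column above is each source's feature count divided by the 70 features in total. A minimal sketch of that arithmetic follows; the dictionary keys reuse the abbreviations from the table, and the names `feature_counts` and `feature_shares` are illustrative, not taken from the study.

```python
# Feature counts per data source, copied from the table above.
feature_counts = {
    "SM": 6,    # Twitter
    "W": 5,     # Web news
    "FD": 4,    # Forum discussions
    "MD": 44,   # Macroeconomic data
    "HSD": 10,  # Historical stock data
    "GTI": 1,   # Google Trends
}


def feature_shares(counts):
    """Return each source's share of the total feature count, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_features = sum(feature_counts.values())
shares = feature_shares(feature_counts)

for name, pct in shares.items():
    print(f"{name}: {feature_counts[name]} features, {pct:.2f}%")
print(f"Total: {total_features} features")
```

Macroeconomic data alone contributes 44 of the 70 features, so any feature-selection step operates on an input dominated by that source.
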
Qualitative (textual) dataset
Data fusion
Model design
Feature engineering with CNN
LSTM classifier
Parameters | CNN | LSTM |
---|---|---|
Input layer | 1 | |
Input feature dimension | 1–70 | 62 |
Dense Layers | 2 | 2 |
Output Layer | 1 | 1 |
Dropout rate | 0.2 | |
Epoch | 100 | 10 |
Activation | ReLU/sigmoid | Tanh/sigmoid |
Weight | Normal [0,1] | |
Optimiser | Adam | Adam |
Learning rate | 0.002 | 1e-3 to 1e-4 |
Objective function | Cross-entropy | Cross-entropy |
Evaluation metrics
Empirical implementation
Layer (type) | Output Shape | Param # |
---|---|---|
conv1d_20 (Conv1D) | (None, 26, 64) | 192 |
max_pooling1d_20 (MaxPooling) | (None, 13, 64) | 0 |
flatten_20 (Flatten) | (None, 832) | 0 |
dense_39 (Dense) | (None, 50) | 41,650 |
dense_40 (Dense) | (None, 1) | 51 |
Total params: 41,893 | Trainable params: 41,893 | Non-trainable params: 0 |
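
The parameter counts in the summary above can be checked by hand. The sketch below does so in plain Python, under assumptions inferred from the reported shapes rather than stated in the source: a single input channel, a kernel size of 2 (192 = (2 × 1 + 1) × 64), an input length of 27 with unpadded convolution, and a pooling size of 2. Other settings, such as a kernel size of 1 over two channels, would produce the same count for the convolutional layer.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter in a 1D convolution."""
    return (kernel_size * in_channels + 1) * filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit in a fully connected layer."""
    return (in_units + 1) * out_units


# Assumed hyperparameters (inferred from the summary, not given in the source).
input_length, in_channels = 27, 1
kernel_size, filters, pool_size = 2, 64, 2

conv_len = input_length - kernel_size + 1  # unpadded convolution, stride 1
pool_len = conv_len // pool_size           # max pooling halves the length
flat_units = pool_len * filters            # flatten: time steps x filters

layer_params = {
    "conv1d": conv1d_params(kernel_size, in_channels, filters),
    "max_pooling1d": 0,  # pooling has no trainable parameters
    "flatten": 0,        # flatten has no trainable parameters
    "dense_50": dense_params(flat_units, 50),
    "dense_1": dense_params(50, 1),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"Total params: {total_params:,}")
```

Under these assumptions the intermediate shapes (26, 13 and 832) and the total of 41,893 parameters agree with the summary. Almost all of the parameters sit in the first dense layer.
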
Empirical results and discussions
Feature engineering by CNN
Training and testing results based on the optimised features
Dataset | Specificity | F-score | Sensitivity |
---|---|---|---|
Unstructured dataset | 0.758 | 0.6083 | 0.7445 |
Structured dataset | 0.9743 | 0.9397 | 0.9229 |
All combined | 0.9975 | 0.9672 | 0.8939 |
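
For reference, the three metrics reported above are conventionally derived from the counts of a binary confusion matrix. The sketch below uses the standard definitions and assumes that "F-score" means the F1 score, the harmonic mean of precision and sensitivity. The counts in the usage example are hypothetical and are not the study's data.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)  # true negative rate
    precision = safe_div(tp, tp + fp)
    f_score = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "specificity": specificity,
        "f_score": f_score,
        "sensitivity": sensitivity,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Specificity depends only on the negative class and sensitivity only on the positive class, which is how a model can score near 1.0 on one and noticeably lower on the other, as in the combined-dataset row.
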