1 Introduction
2 Literature review
Work | Language | Method: Machine Learning | Method: Deep Learning | Features: BoW | Features: Topics | Features: Emotion Terms | Features: Behavioural | Features: Metadata | Features: Embeddings | Ground Truth Annotation: Expert | Ground Truth Annotation: Survey | Ground Truth Annotation: Self-declared |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Tsugawa et al. (2013) | Japanese | Regression (Reg Coef 0.43) | ✓ | SDS | ||||||||
Resnik et al. (2015) | English | Support Vector Regression, Precision (62%–74%) | ✓ | LDA, SLDA, SNLDA | LIWC | ✓ | ✓ | |||||
Resnik et al. (2013) | English | Linear Regression, Precision (43%–50%) | LDA | LIWC | Big-5, BDI | |||||||
Tsugawa et al. (2015) | Japanese | Support Vector Machines, Accuracy (61%–66%) | ✓ | LDA | Japanese Lexicon | ✓ | ✓ | CES-D, BDI | ✓ | |||
De Choudhury et al. (2013) | English | Support Vector Machines, Accuracy (70%) | LIWC | ✓ | ✓ | CES-D, BDI | ✓ | |||||
Reece et al. (2016) | English | Random Forest, ROC (0.87–0.89) | LIWC, ANEW | ✓ | ✓ | ✓ | CES-D, TSQ | ✓ | ||||
Almouzini et al. (2019) | Arabic | Multiple Classifiers, Accuracy (55%–87%) | ✓ | CES-D, PHQ-9 | ||||||||
Shetty et al. (2020) | English | Multiple Classifiers, Accuracy (72%–76%) | LSTM & CNN, Accuracy (93%–95%) | ✓ | NA | NA | NA | |||||
Husseini Orabi et al. (2018) | English | SVM, Accuracy (73%–77%) | CNN & RNN, Accuracy (51%–87%) | ✓ | Word2Vec (CBOW, Skip-gram) | ✓ | ✓ |
2.1 Analyzing public wellbeing
2.2 Feature engineering challenges in machine learning-based methods for wellbeing assessment
2.3 Contextual language models’ limitations in deep learning-based wellbeing assessment methods
2.4 Inadequacy of existing labelled training data for wellbeing
3 Methods
3.1 Module 1: distant supervision
3.1.1 Parsing JSON objects
3.1.2 Cleaning and pre-processing tweet text
3.1.3 Annotation based on distant supervision
3.2 Module 2: assessment and prediction
3.3 Module 3: analytics and visualization
4 Experimental work and results
4.1 Validating distant supervision method
Original tweet | Processed tweet | Annotation |
---|---|---|
We can heal the Earth https://t.co/RwrL0BR1Vf | we can heal the earth love | Positive |
Please retweet. Yes Trump said worries about cornov is the new hoax . https://t.co/SsKL9Wuq4L | please retweet yes trump said worries about cornov is the new hoax angry | Negative |
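The text cleaning visible in the table above (URL removal, lowercasing, punctuation stripping) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `clean_tweet` and the regular expressions are assumptions. The trailing emotion words in the processed column (`love`, `angry`) are not reproduced, because the step that produces them is not shown here.

```python
import re

# Hypothetical patterns chosen for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")       # http(s) links such as t.co URLs
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")    # anything except a-z and whitespace
WHITESPACE_PATTERN = re.compile(r"\s+")         # runs of whitespace


def clean_tweet(text: str) -> str:
    """Return a lowercased tweet with URLs, digits and punctuation removed.

    Assumption: only ASCII letters are kept, so accented or non-Latin
    characters are dropped as well.
    """
    text = URL_PATTERN.sub(" ", text)            # drop links first, before lowercasing
    text = text.lower()
    text = NON_LETTER_PATTERN.sub(" ", text)     # replace punctuation/digits with spaces
    return WHITESPACE_PATTERN.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("We can heal the Earth https://t.co/RwrL0BR1Vf"))
```

Removing URLs before stripping punctuation keeps link fragments such as `t co` out of the cleaned text.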
4.2 Validating wellbeing predictions with BERT-based sentence embeddings
Data | Size |
---|---|
Total | 10,888 |
Negative tweets | 9,369 |
Positive tweets | 1,519 |
4.2.1 Pre-processing of the evaluation dataset
4.2.2 Baselines with different feature vector models
4.2.3 Experiments and results
Features | Measures | LR | KNN | SVM | DT |
---|---|---|---|---|---|
BERT | Precision | 0.81 | 0.86 | 0.87 | 0.84 |
BERT | F-measure | 0.85 | 0.87 | 0.88 | 0.84 |
BoW | Precision | 0.70 | 0.67 | 0.87 | 0.61 |
BoW | F-measure | 0.47 | 0.56 | 0.54 | 0.61 |
NRC | Precision | 0.45 | 0.47 | 0.45 | 0.47 |
NRC | F-measure | 0.47 | 0.47 | 0.47 | 0.47 |
Word2Vec | Precision | 0.64 | 0.71 | 0.63 | 0.60 |
Word2Vec | F-measure | 0.62 | 0.54 | 0.62 | 0.60 |
Features | Measures | LR | KNN | SVM | DT |
---|---|---|---|---|---|
BERT | Precision | 0.74 | 0.83 | 0.84 | 0.80 |
BERT | F-measure | 0.80 | 0.83 | 0.85 | 0.80 |
BoW | Precision | 0.68 | 0.63 | 0.80 | 0.58 |
BoW | F-measure | 0.46 | 0.57 | 0.56 | 0.58 |
NRC | Precision | 0.43 | 0.56 | 0.43 | 0.53 |
NRC | F-measure | 0.46 | 0.51 | 0.46 | 0.47 |
Word2Vec | Precision | 0.70 | 0.81 | 0.69 | 0.60 |
Word2Vec | F-measure | 0.69 | 0.52 | 0.68 | 0.60 |
Features | Measures | LR | KNN | SVM | DT |
---|---|---|---|---|---|
BERT | Precision | 0.81 | 0.87 | 0.87 | 0.85 |
BERT | F-measure | 0.85 | 0.87 | 0.88 | 0.84 |
BoW | Precision | 0.70 | 0.67 | 0.87 | 0.61 |
BoW | F-measure | 0.47 | 0.56 | 0.53 | 0.61 |
NRC | Precision | 0.45 | 0.45 | 0.45 | 0.47 |
NRC | F-measure | 0.47 | 0.47 | 0.47 | 0.47 |
Word2Vec | Precision | 0.64 | 0.78 | 0.64 | 0.57 |
Word2Vec | F-measure | 0.62 | 0.52 | 0.62 | 0.57 |