research-article
Open Access

Chinese EmoBank: Building Valence-Arousal Resources for Dimensional Sentiment Analysis

Published: 19 January 2022


Abstract

An increasing amount of research has recently focused on dimensional sentiment analysis, which represents affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space. Compared to the categorical approach, which represents affective states as distinct classes (e.g., positive and negative), the dimensional approach can provide more fine-grained (real-valued) sentiment analysis. However, dimensional sentiment resources with valence-arousal ratings are very rare, especially for the Chinese language. Therefore, this study aims to: (1) Build a Chinese valence-arousal resource called Chinese EmoBank, the first Chinese dimensional sentiment resource featuring various levels of text granularity, including 5,512 single words, 2,998 multi-word phrases, 2,582 single sentences, and 2,969 multi-sentence texts. The valence-arousal ratings are annotated by crowdsourcing based on the Self-Assessment Manikin (SAM) rating scale. A corpus cleanup procedure is then performed to improve annotation quality by removing outlier ratings and improper texts. (2) Evaluate the proposed resource using different categories of classifiers, such as lexicon-based, regression-based, and neural-network-based methods, and compare their performance to a similar evaluation of an English dimensional sentiment resource.


1 INTRODUCTION

Sentiment analysis has emerged as a leading technique for the automatic identification of affective information in texts [Pang and Lee 2008; Calvo and D'Mello 2010; Liu 2012; Feldman 2013]. In sentiment analysis, the representation of affective states is an essential issue and can generally be divided into categorical and dimensional approaches [Calvo and Kim 2013].

The categorical approach represents affective states as several discrete classes, such as positive, neutral, and negative; Ekman's six basic emotions (i.e., anger, happiness, fear, sadness, disgust, and surprise) [Ekman 1992]; and Plutchik's [1991] eight emotions (Ekman's six plus trust and anticipation). The dimensional approach represents affective states as continuous numerical values in multiple dimensions, such as the valence-arousal (VA) space [Russell 1980] shown in Figure 1. Valence represents the degree of pleasant or unpleasant (i.e., positive or negative) feeling, while arousal represents the degree of excitement or calm. Under this representation, any affective state can be represented as a point in the VA coordinate plane by identifying its valence and arousal ratings. Applications can exploit such a representation to provide more fine-grained (real-valued) sentiment analysis. For instance, mood analysis systems can identify high-risk Twitter users with different mental illnesses: analysis of Twitter posts suggests that depressive users express lower valence and arousal than those with post-traumatic stress disorder (PTSD), and both groups are lower than control (normal) subjects [Preoţiuc-Pietro et al. 2015]. Product review systems can prioritize high-arousal positive (or high-arousal negative) reviews, because recent marketing research suggests that such reviews attract the most interest and can drive purchasing behavior [Ren and Nickerson 2014].

Fig. 1.

Fig. 1. Two-dimensional valence-arousal space.

Affective lexicons and corpora with VA ratings are useful resources for the development of sentiment applications. For English, researchers have developed several dimensional lexicons such as the Affective Norms for English Words (ANEW) [Bradley and Lang 1999], Extended ANEW [Warriner et al. 2013], and NRC-VAD [Mohammad 2018b], and corpora such as Affective Norms for English Text (ANET) [Bradley and Lang 2007], Facebook posts [Preoţiuc-Pietro et al. 2016], and EmoBank [Buechel and Hahn 2017]. For Chinese, dimensional sentiment resources are very rare, including only one small lexicon of 162 words [Wei et al. 2011] and no corpora.

Therefore, this study focuses on building a Chinese valence-arousal resource named Chinese EmoBank, the first Chinese dimensional sentiment resource featuring various levels of text granularity including words, phrases, sentences and multi-sentence texts. Chinese EmoBank consists of two lexicons called Chinese valence-arousal words (CVAW) and Chinese valence-arousal phrases (CVAP) and two corpora called Chinese valence-arousal sentences (CVAS) and Chinese valence-arousal texts (CVAT). The CVAW contains 5,512 single words collected from two polarity-based sentiment lexicons, the Chinese LIWC (C-LIWC) [Huang 2012] and NTUSD [Ku and Chen 2007]. The CVAP contains 2,998 multi-word phrases where each phrase is composed of an affective word in the CVAW and a set of modifiers (e.g., negator, degree adverb, and modal) that modify the affective word. The CVAS contains 2,582 single sentences selected from the Twitter microblogging and social networking service. The CVAT contains 2,969 multi-sentence texts extracted from web forums, reviews, and news articles. The annotation of VA ratings is accomplished by crowdsourcing based on the Self-Assessment Manikin (SAM) rating scale [Bradley and Lang 1994]. A corpus cleanup procedure is also used to improve annotation quality by removing outlier ratings and improper texts. To further demonstrate the feasibility of the constructed resource, we evaluate it using different categories of classifiers such as lexicon-based, regression-based, and neural-network-based methods, and compare their performance to a similar evaluation of an English dimensional sentiment resource.

The rest of this paper is organized as follows. Section 2 introduces existing lexicons, corpora, and prediction methods for dimensional sentiment analysis. Section 3 describes the process of building Chinese EmoBank. Section 4 presents the analysis results and feasibility evaluation. Conclusions are finally drawn in Section 5.


2 RELATED WORK

This section presents existing one-dimensional and multi-dimensional sentiment lexicons and corpora, followed by a description of automatic methods for dimension score prediction at the word, phrase, and sentence levels.

2.1 Dimensional Sentiment Resources

Table 1 presents the language resources for dimensional sentiment analysis. A number of one-dimensional sentiment lexicons provide sentiment intensity or strength of words, including SentiWordNet [Baccianella et al. 2010], SentiFul [Neviarouskaya et al. 2011], SO-CAL [Taboada et al. 2011], AFINN [Nielsen 2011], SentiStrength [Thelwall et al. 2012], and VADER [Hutto and Gilbert 2014]. Specifically, NRC-EIL provides sentiment intensity for eight emotions [Mohammad 2018a]. The SemEval and WASSA shared tasks also released several datasets for single words, multi-word phrases [Rosenthal et al. 2015; Kiritchenko et al. 2016], and sentences [Cortis et al. 2017; Mohammad and Bravo-Marquez 2017; Mohammad et al. 2018]. Stanford Sentiment Treebank [Socher et al. 2013] provided fully labeled parse trees containing sentiment scores at both the phrase- and sentence-levels.

Table 1.
| Lexicon | Granularity | Size | Scale | Dimension |
| SentiWordNet [Baccianella et al. 2010] | Word | 147,306 | Continuous [0, 1] | Valence |
| SentiFul [Neviarouskaya et al. 2011] | Word | 12,900 | Continuous [0, 1] | Valence |
| SO-CAL [Taboada et al. 2011] | Word | 5,042 | Multi-point [−5, 5] | Valence |
| AFINN [Nielsen 2011] | Word | 2,477 | Multi-point [−5, 5] | Valence |
| SentiStrength [Thelwall et al. 2012] | Word | 2,609 | Multi-point [−4, 4] | Valence |
| VADER [Hutto and Gilbert 2014] | Word | 7,520 | Continuous [−4, 4] | Valence |
| NRC-EIL [Mohammad 2018a] | Word | 9,921 | Continuous [0, 1] | Valence for eight emotions |
| SemEval 2015 Task 10 [Rosenthal et al. 2015] | Word/Phrase | 1,515 (subtask E) | Continuous [0, 1] | Valence |
| SemEval 2016 Task 7 [Kiritchenko et al. 2016] | Word/Phrase | 3,207 (subtask 1) | Continuous [−1, 1] | Valence |
| SST [Socher et al. 2013] | Sentence | 11,855 | Continuous [0, 1] | Valence |
| SemEval-2017 Task 5 [Cortis et al. 2017] | Tweets (subtask 1), Headlines (subtask 2) | 2,510 (subtask 1), 1,647 (subtask 2) | Continuous [−1, 1] | Valence |
| WASSA-2017 [Mohammad and Bravo-Marquez 2017] | Tweets | 7,097 | Continuous [0, 1] | Valence for four emotions |
| SemEval-2018 Task 1 [Mohammad et al. 2018] | Tweets | 12,634 (EI-reg), 2,567 (V-reg) | Continuous [0, 1] | Valence for four emotions |
| ANEW [Bradley and Lang 1999] | Word | 1,034 | Continuous [1, 9] | Valence, Arousal, Dominance |
| Extended ANEW [Warriner et al. 2013] | Word | 13,915 | Continuous [1, 9] | Valence, Arousal, Dominance |
| NRC-VAD [Mohammad 2018b] | Word | 20,007 | Continuous [0, 1] | Valence, Arousal, Dominance |
| ANET [Bradley and Lang 2007] | Text | 120 | Continuous [1, 9] | Valence, Arousal, Dominance |
| Facebook posts [Preoţiuc-Pietro et al. 2016] | Sentence | 2,895 | Continuous [1, 9] | Valence, Arousal |
| EmoBank [Buechel and Hahn 2017] | Sentence | 10,062 | Continuous [1, 9] | Valence, Arousal, Dominance |

Table 1. Language Resources for Dimensional Sentiment Analysis

Among multi-dimensional resources, ANEW is the first three-dimensional lexicon providing real-valued scores for the valence, arousal, and dominance dimensions [Bradley and Lang 1999]. ANEW has been extended from 1,034 words to 13,915 words [Warriner et al. 2013]. NRC-VAD provides 20,007 English words with valence, arousal, and dominance ratings [Mohammad 2018b]. In addition to lexicon resources, several multi-dimensional corpora have been proposed. ANET is the first three-dimensional corpus providing valence, arousal, and dominance ratings [Bradley and Lang 2007]. A corpus of 2,895 Facebook posts [Preoţiuc-Pietro et al. 2016] was annotated to provide two-dimensional valence and arousal ratings. EmoBank [Buechel and Hahn 2017] provides 10,062 sentences with valence, arousal, and dominance ratings. Except for NRC-VAD, which is scored from 0 to 1, all of the above multi-dimensional lexicons and corpora are scored from 1 to 9.

2.2 Dimension Score Prediction

The above dimensional sentiment resources have been used for dimension score prediction at the word, phrase, and sentence levels. These approaches can be categorized as lexicon-based [Paltoglou et al. 2013], regression-based [Wei et al. 2011; Malandrakis et al. 2013; Paltoglou and Thelwall 2013; Amir et al. 2015; Wang et al. 2016a; 2016b], and neural-network-based models [Du and Zhang 2016; Vilares et al. 2016; Wu et al. 2017; Goel et al. 2017; Zhu et al. 2019; Yu et al. 2020; Wang et al. 2020; Huang et al. 2020].

Lexicon-based methods typically determine the sentiment score of a text by averaging the sentiment scores of the words in the text [Paltoglou et al. 2013]. Regression-based methods have been intensively studied for dimension score prediction. Wei et al. [2011] proposed a cross-lingual approach that trained a linear regression model using the dimension scores of a set of English seed words (source) and their translated Chinese seed words (target). Wang et al. [2016a] further extended their work using a locally weighted linear regression model. Malandrakis et al. [2013] built a linear regression model using n-grams with sentiment scores as features. Both Paltoglou and Thelwall [2013] and Amir et al. [2015] used support vector regression (SVR). Wang et al. [2016b] developed a community-based weighted graph model that performed the regression task on a graph using a social network method to predict the dimension scores of words.
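The lexicon-based averaging scheme can be sketched in a few lines. The lexicon entries below are made-up (valence, arousal) values on the 1–9 SAM scale for illustration, not actual CVAW ratings:

```python
# Lexicon-based prediction: average the valence and arousal ratings of the
# text's words that are found in the affective lexicon.

def predict_va(words, lexicon):
    """Return the mean (valence, arousal) of the matched words, or None."""
    matched = [lexicon[w] for w in words if w in lexicon]
    if not matched:
        return None  # no affective words matched; the rating is undefined
    valence = sum(v for v, _ in matched) / len(matched)
    arousal = sum(a for _, a in matched) / len(matched)
    return valence, arousal

# Made-up lexicon entries on the 1-9 SAM scale.
lexicon = {"happy": (7.5, 6.0), "calm": (6.5, 2.5), "angry": (2.0, 7.5)}
print(predict_va(["happy", "and", "calm"], lexicon))  # (7.0, 4.25)
```

Words absent from the lexicon (here, "and") are simply ignored, which is why lexicon coverage strongly affects this baseline.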

Recently, deep neural network models with word embeddings [Mikolov et al. 2013a; 2013b; Pennington et al. 2014; Bojanowski et al. 2017] or sentiment embeddings [Tang et al. 2016; Yu et al. 2018] have been widely applied to dimensional score prediction. Du and Zhang [2016] used a boosted neural network trained on character-enhanced word embeddings to predict the dimension scores of words. Vilares et al. [2016] used a CNN trained on Twitter word embeddings to determine the sentiment of tweets from highly negative to highly positive using a five-point scale. Wu et al. [2017] introduced a densely connected deep LSTM model to concatenate features at different levels to predict the dimension scores of both words and phrases. Goel et al. [2017] presented an ensemble of different neural networks to determine the intensity level for different emotion categories such as anger, fear, joy, and sadness. Zhu et al. [2019] presented an adversarial attention network to predict the dimension scores of short texts. Yu et al. [2020] proposed a pipelined neural network model to sequentially learn word intensity and modifier weights for phrase-level sentiment intensity prediction. Wang et al. [2020] developed a regional CNN-LSTM model that integrates both local (regional) information within sentences and long-distance dependency across sentences to predict the dimension scores of long texts. Huang et al. [2020] incorporated a context-dependent sentiment lexicon into a 3-channel CNN to predict the strength of both words and texts.


3 THE CHINESE EMOBANK CONSTRUCTION

This section describes the process of building the Chinese EmoBank including the CVAW, CVAP, CVAS, and CVAT.

3.1 Data Collection

The words in the CVAW are collected from two polarity-based sentiment lexicons, C-LIWC [Huang 2012] and NTUSD [Ku and Chen 2007]. These affective words are then combined with a set of modifiers such as negators (e.g., not), degree adverbs (e.g., very), and modals (e.g., should) to form multi-word phrases, i.e., the CVAP. The frequency of each phrase is retrieved from a large web-based corpus, and only phrases with a frequency of at least 3 are retained as candidates. To prevent a few modifiers from dominating the resource, each modifier (or modifier combination) can contribute at most 50 phrases. In addition, the phrases are selected to balance positive and negative words, which prevents either polarity from dominating the resource; for example, 25 positive and 25 negative words are randomly selected to constitute the 50 phrases for each modifier. A total of 2,998 multi-word phrases are thus included in the CVAP.

To build the CVAS, we first collect Chinese tweets from the social networking service Twitter, selecting tweets that contain the greatest number of affective words found in the CVAW. The tweets are then split into sentences using existing punctuation, yielding a total of 2,582 single sentences after excluding emoticons, URLs, and abusive language.

For the CVAT, we collect web texts from six different categories: news articles, political discussion forums, car discussion forums, hotel reviews, book reviews, and laptop reviews. Texts containing incomplete semantics or abusive language are excluded. A total of 2,969 multi-sentence texts containing the greatest number of affective words found in the CVAW are finally selected for annotation.
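The CVAP candidate-selection rules (frequency of at least 3, at most 50 phrases per modifier, balanced polarities) can be sketched as follows. The input layout, a dict mapping (modifier, word, polarity) tuples to web-corpus frequencies, is a hypothetical stand-in for the actual pipeline's data structures:

```python
import random
from collections import defaultdict

# Sketch of the CVAP candidate selection: keep phrases with frequency >= 3,
# cap each modifier at 50 phrases, and balance positive and negative words.

def select_phrases(phrase_freqs, min_freq=3, cap=50, seed=0):
    rng = random.Random(seed)
    by_modifier = defaultdict(lambda: {"pos": [], "neg": []})
    for (modifier, word, polarity), freq in phrase_freqs.items():
        if freq >= min_freq:  # rule 1: frequency threshold
            by_modifier[modifier][polarity].append((modifier, word))
    selected = []
    for groups in by_modifier.values():
        half = cap // 2  # rule 2: cap per modifier, split between polarities
        selected += rng.sample(groups["pos"], min(half, len(groups["pos"])))
        selected += rng.sample(groups["neg"], min(half, len(groups["neg"])))
    return selected

candidates = {
    ("not", "good", "pos"): 5,
    ("not", "bad", "neg"): 4,
    ("not", "rare", "pos"): 2,  # below the frequency threshold, dropped
}
print(sorted(select_phrases(candidates)))  # [('not', 'bad'), ('not', 'good')]
```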

3.2 Annotation Details

The annotation of VA ratings is accomplished by crowdsourcing. For the CVAW, each word is randomly assigned to five annotators for rating, while each instance of the CVAP, CVAS, and CVAT is randomly assigned to 10 annotators. Fewer annotators are used for the CVAW because word-level ratings are relatively easier to determine than ratings at the phrase, sentence, and multi-sentence levels.

The annotation platform implements the SAM rating scale [Bradley and Lang 1994] on Google App Engine. Figure 2 shows an example of the annotation screen. The top part of Figure 2 presents an example sentence in the CVAT, followed by the picture-oriented SAM rating scale. Both the valence and arousal dimensions use a nine-point scale: a rating of 1 denotes extremely negative valence or extremely low arousal, 9 denotes extremely positive valence or extremely high arousal, and 5 denotes neutral valence or medium arousal. The picture-oriented protocol helps annotators determine the VA ratings more precisely. Volunteer annotators use the annotation screen to provide VA ratings for each instance in the CVAW, CVAP, CVAS, and CVAT.

Fig. 2.

Fig. 2. Annotation screen with the modified 9-point SAM rating scale.

3.3 Corpus Cleanup

Once the annotation process is finished, a cleanup procedure is performed to remove outlier ratings and improper instances (e.g., those containing abusive or vulgar language). A rating is identified as an outlier if it falls outside the interval of the mean plus or minus 1.5 standard deviations (SD); outliers are excluded from the calculation of the average VA ratings for each instance in the constructed Chinese EmoBank. Table 2 shows the annotation results of the example sentence presented in Figure 2. For the valence dimension, the rating of 8 provided by annotator A10 is marked as an outlier because it exceeds the mean plus 1.5 standard deviations. Similarly, the rating of 2 in the arousal dimension provided by annotator A1 is also marked as an outlier. After excluding outlier ratings, the (mean, SD) of the example sentence is (5.778, 0.416) for the valence dimension and (5.333, 0.667) for the arousal dimension. When using the Chinese EmoBank to train a prediction model, the standard deviation is a useful metric for excluding instances with inconsistent annotations; for example, a previous study suggested excluding instances with a standard deviation higher than 2 [Paltoglou et al. 2013].
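The cleanup rule amounts to trimming ratings outside mean ± 1.5 SD and re-averaging. A minimal sketch, with made-up ratings rather than the Table 2 data:

```python
from statistics import mean, stdev

# Cleanup rule: drop ratings outside mean +/- 1.5 SD, then recompute the
# mean and SD over the remaining ratings.

def clean_ratings(ratings, k=1.5):
    """Remove outlier ratings and return (mean, SD, kept ratings)."""
    m, sd = mean(ratings), stdev(ratings)
    kept = [r for r in ratings if m - k * sd <= r <= m + k * sd]
    return mean(kept), stdev(kept), kept

ratings = [5, 5, 5, 6, 6, 6, 6, 6, 6, 9]  # the 9 lies outside mean + 1.5 SD
m, sd, kept = clean_ratings(ratings)
print(round(m, 3), round(sd, 3))  # 5.667 0.5
```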

Table 2.
  • *denotes an outlier.

Table 2. Example of Corpus Cleanup



4 RESULTS AND EVALUATION

This section presents the results of the annotation statistics, a visualization of the corpus, and an evaluation of dimension score prediction.

4.1 Results of the Chinese EmoBank

Table 3 shows the mean and standard deviation of the VA ratings for the Chinese EmoBank. The standard deviation of the arousal dimension is greater than that of valence, indicating that arousal ratings are more difficult for the annotators to determine.

Table 3.
| | Number of Instances | Valence Mean | Valence SD | Arousal Mean | Arousal SD |
| CVAW | 5,512 | 4.540 | 0.717 | 5.023 | 1.276 |
| CVAP | 2,998 | 4.594 | 0.451 | 5.617 | 0.561 |
| CVAS | 2,582 | 4.637 | 0.410 | 4.967 | 1.035 |
| CVAT | 2,969 | 4.803 | 0.664 | 4.845 | 1.084 |

Table 3. Annotation Statistics of the Chinese EmoBank

Figure 3 shows the scatter plot of the CVAW. It presents a smile curve, indicating that both high-positive and high-negative words usually have a high arousal value. Table 4 lists several example words for the four quadrants of the VA plane.

Fig. 3.

Fig. 3. Scatter plot of the CVAW lexicon.

Table 4.

Table 4. Annotated Examples of the CVAW

For the CVAP, a total of 52 modifiers (4 negators, 42 degree adverbs, and 6 modals; see Table 5) are combined with the affective words in the CVAW to form the multi-word phrases. Table 6 shows the distribution of the different phrase patterns in the CVAP, and Table 7 lists an example phrase for each pattern. Figure 4 shows the scatter plot of the CVAP.

Table 5.

Table 5. Modifier Set Used in the CVAP

Table 6.
| Phrase Type | Pattern | Number (ratio%) |
| 2-word phrases | Negator + Word | 181 (6.04%) |
| | Degree Adverb + Word | 1,160 (38.69%) |
| | Modal + Word | 143 (4.77%) |
| 3-word phrases | Negator + Degree Adverb + Word | 373 (12.44%) |
| | Degree Adverb + Negator + Word | 646 (21.55%) |
| | Modal + Negator + Word | 151 (5.05%) |
| | Modal + Degree Adverb + Word | 323 (10.77%) |
| | Degree Adverb + Modal + Word | 21 (0.70%) |
| Total | All | 2,998 (100%) |

Table 6. Distribution of the Phrase Pattern in the CVAP

Table 7.

Table 7. Annotated Examples of the CVAP

Fig. 4.

Fig. 4. Scatter plot of the CVAP.

Table 8 shows the distribution of the text categories along with their sentence and word counts in the CVAS and CVAT. The CVAS is collected from Twitter alone, while the CVAT is collected from web texts in six different categories, of which news articles form the largest class (50.83%). Figures 5 and 6 respectively show the scatter plots of the single sentences in the CVAS and the multi-sentence texts in the CVAT. Both scatter plots present a smile curve, consistent with those of the CVAW and CVAP. Tables 9 and 10 respectively list several example sentences from the CVAS and CVAT.

Table 8.
| | Category | Num. of Texts (ratio%) | Num. of Sentences | Num. of Words | Avg. Words |
| CVAS | Twitter | 2,582 (100%) | 2,582 | 18,383 | 7.12 |
| CVAT | Book Review | 287 (9.67%) | 1,007 | 6,958 | 6.91 |
| | Car Forum | 253 (8.52%) | 859 | 11,124 | 12.95 |
| | Hotel Review | 299 (10.07%) | 1,001 | 6,101 | 6.10 |
| | Laptop Review | 182 (6.13%) | 738 | 4,538 | 6.15 |
| | Politics Forum | 439 (14.78%) | 1,717 | 13,420 | 7.82 |
| | News Article | 1,509 (50.83%) | 6,771 | 50,096 | 7.40 |
| | Total | 2,969 (100%) | 12,093 | 92,237 | 7.63 |

Table 8. Distribution of Text Categories in the CVAS and CVAT

Fig. 5.

Fig. 5. Scatter plot of the CVAS corpus.

Fig. 6.

Fig. 6. Scatter plot of the CVAT corpus.

Table 9.

Table 9. Annotated Examples of the CVAS

Table 10.

Table 10. Annotated Examples of the CVAT

4.2 Valence-Arousal Rating Prediction

To demonstrate the application of the constructed affective resources, this section evaluates the performance of lexicon-based, regression-based, and neural-network-based methods for valence-arousal rating prediction on the affective corpora.

This experiment used three affective corpora. (i) EmoBank [Buechel and Hahn 2017] contains 10,062 multi-sentence texts, each rated on the individual dimensions (valence/arousal/dominance) in the range [1, 5]. (ii) The Chinese valence-arousal sentences (CVAS) and (iii) Chinese valence-arousal texts (CVAT) are our constructed affective corpora with VA ratings. The former is collected from Twitter; the latter consists of texts collected from six categories, including book reviews, car reviews, hotel reviews, laptop reviews, political commentary, and news.

The following methods were compared. (i) Lexicon-based method [Paltoglou et al. 2013]: for EmoBank, the Extended ANEW [Warriner et al. 2013] was used to predict the valence (or arousal) rating of a given sentence by averaging the valence (or arousal) ratings of the words matched in the Extended ANEW; for both CVAS and CVAT, we used the CVAW and CVAP. (ii) Regression-based methods: linear regression (LR) [Wei et al. 2011] and SVR [Paltoglou and Thelwall 2013; Amir et al. 2015]. (iii) Neural-network-based methods: CNN, RNN, LSTM, attention LSTM [Yang et al. 2016], BERT [Devlin et al. 2018], and XLNet [Yang et al. 2019].

The experimental settings are described as follows. We used 5-fold cross-validation to evaluate the effectiveness of the above methods. In addition, the suggested default parameters shown in Table 11 were used without further fine-tuning. The word vectors for English and Chinese were produced using BERT [Devlin et al. 2018]; pre-trained models with whole word masking were downloaded from the official BERT GitHub repository. For English, we used BERT-Large, Cased (24-layer, 1024-hidden, 16-heads, 340M parameters), with dimensionality 1024. For Chinese, we used BERT-Base, Chinese (Simplified and Traditional; 12-layer, 768-hidden, 12-heads, 110M parameters), with dimensionality 768. The pre-trained XLNet and BERT models are publicly available online for evaluation.
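The 5-fold protocol partitions the instances into five folds, trains on four, tests on the held-out fold, and averages the per-fold scores. A minimal index-level sketch, with `train_and_score` as a hypothetical stand-in for fitting and evaluating any of the compared models:

```python
# Sketch of 5-fold cross-validation: round-robin fold assignment, then one
# train/test evaluation per held-out fold.

def k_fold_splits(n, k=5):
    """Yield (train_indices, test_indices) pairs for k folds over range(n)."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(data, train_and_score, k=5):
    """Average the scores produced by train_and_score over the k folds."""
    scores = [train_and_score([data[i] for i in tr], [data[i] for i in te])
              for tr, te in k_fold_splits(len(data), k)]
    return sum(scores) / k
```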

Table 11.
| Hyper-parameter | CNN | RNN, LSTM, Attention | XLNet | BERT |
| Filter number | 60, 60 | – | – | – |
| Filter length | 3, 3 | – | – | – |
| Pool length | 2, 2 | – | – | – |
| Hidden state dim. | 120 | 120 | – | – |
| Layer number | – | – | 24 (Cht.), 12 (Eng.) | 12 |
| Hidden size | – | – | 768 | 768 |
| Head number | – | – | 12 | 12 |
| Optimizer | Adam (all methods) |
| Batch size | 32 (all methods) |
| (Recurrent) dropout | 0.25 (all methods) |
| Epochs | 20 (all methods) |

Table 11. Hyper-parameters Used in the Classifiers

Performance was evaluated using the mean absolute error (MAE) and the Pearson correlation coefficient (r), defined as follows:

  • Mean Absolute Error (MAE):(1) \[\begin{equation}MAE = \frac{1}{n}\sum\limits_{i = 1}^n {|{a_i} - {p_i}|} \end{equation}\]

  • Pearson Correlation Coefficient (r):(2) \[\begin{equation}r = \frac{1}{{n - 1}}\sum\limits_{i = 1}^n {\left(\frac{{{a_i} - {\mu _A}}}{{{\sigma _A}}}\right)\left(\frac{{{p_i} - {\mu _P}}}{{{\sigma _P}}}\right)} \end{equation}\]

where \({a_i} \in A\) and \({p_i} \in P\) respectively denote the i-th actual value and predicted value, n is the number of test samples, μA and σA respectively represent the mean value and the standard deviation of A, while μP and σP respectively represent the mean value and the standard deviation of P. The MAE measures the average prediction error, and r measures the linear correlation between the actual values and the predicted values. A lower MAE and a higher r indicate more accurate prediction performance.
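Equations (1) and (2) translate directly into code. The ratings below are toy values for illustration, not results from the paper:

```python
from statistics import mean, stdev

# Direct implementation of Equations (1) and (2).

def mae(actual, predicted):
    """Equation (1): mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(actual, predicted):
    """Equation (2): Pearson correlation coefficient."""
    n = len(actual)
    mu_a, sigma_a = mean(actual), stdev(actual)
    mu_p, sigma_p = mean(predicted), stdev(predicted)
    return sum(((a - mu_a) / sigma_a) * ((p - mu_p) / sigma_p)
               for a, p in zip(actual, predicted)) / (n - 1)

actual = [5.0, 6.5, 3.0, 7.5]       # gold VA ratings (toy values)
predicted = [5.5, 6.0, 3.5, 7.0]    # model outputs (toy values)
print(mae(actual, predicted))                  # 0.5
print(round(pearson_r(actual, predicted), 3))  # 0.983
```

Note that Equation (2) uses the sample standard deviation (n − 1 denominator), matching `statistics.stdev`.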

Tables 12, 13, and 14 respectively show the prediction results for the CVAP, CVAS, and CVAT. All three datasets produce nearly consistent findings. The lexicon-based method provides baseline results. Of the two regression-based methods, SVR outperformed LR in both the valence and arousal dimensions. The BERT model outperformed the other neural-network-based methods in all dimensions. For the CVAP, the attention model underperformed the LSTM, possibly because phrases are usually very short, with only one or two modifiers attached to a word, which raises challenges for the attention mechanism. Comparing the results on our constructed CVAS and CVAT, every model on the CVAS clearly underperformed its counterpart on the CVAT. Based on our observations, valence-arousal ratings are more difficult to predict for the single Twitter sentences in the CVAS than for the multi-sentence texts in the CVAT, which provide more contextual information. Table 15 also shows the EmoBank results for reference. The consistent conclusions confirm the reliability of our constructed Chinese EmoBank corpus.

Table 12.

Table 12. Comparative Results of Different Methods in CVAP

Table 13.

Table 13. Comparative Results of Different Methods in CVAS

Table 14.

Table 14. Comparative Results of Different Methods in CVAT

Table 15.

Table 15. Comparative Results of Different Methods in EmoBank


5 CONCLUSIONS AND FUTURE WORK

This study constructs a language resource, the Chinese EmoBank, annotated with valence-arousal ratings for dimensional sentiment analysis. The Chinese EmoBank comprises a Chinese affective lexicon with 5,512 single words (CVAW) and 2,998 multi-word phrases (CVAP), and a Chinese affective corpus of 2,582 single sentences (CVAS) and 2,969 multi-sentence texts (CVAT) drawn from six different categories, all annotated with valence-arousal values. A cleanup procedure removed outlier ratings and improper texts to improve annotation quality. Experimental results provide a feasibility evaluation and baseline performance for VA prediction using the constructed resources.

Future work will focus on developing advanced VA prediction methods and building useful dimensional sentiment applications based on the constructed resources. For example, Figures 3–6 show that the valence and arousal dimensions may correlate with each other. It is worth investigating how relations between dimensions can be integrated into the prediction model to enhance performance. Finally, we will release the entire Chinese EmoBank with fully annotated valence-arousal ratings to facilitate future development in related research areas.


REFERENCES

  1. Amir S., Astudillo R. F., Ling W., Martins B., Silva M., and Trancoso I. 2015. INESC-ID: A regression model for large scale Twitter sentiment lexicon induction. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval'15).
  2. Baccianella S., Esuli A., and Sebastiani F. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC'10). 2200–2204.
  3. Bojanowski P., Grave E., Joulin A., and Mikolov T. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017), 135–146.
  4. Bradley M. M. and Lang P. J. 1994. Measuring emotion: The Self-Assessment Manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry 25, 1 (1994), 49–59.
  5. Bradley M. M. and Lang P. J. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Technical Report C-1, University of Florida, Gainesville, FL.
  6. Bradley M. M. and Lang P. J. 2007. Affective norms for English text (ANET): Affective ratings of text and instruction manual. Technical Report D-1, University of Florida, Gainesville, FL.
  7. Buechel S. and Hahn U. 2017. EmoBank: Studying the impact of annotation perspective and representation format on dimensional emotion analysis. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL'17). 578–585.
  8. Calvo R. A. and D'Mello S. 2010. Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing 1, 1 (2010), 18–37.
  9. Calvo R. A. and Kim S. M. 2013. Emotions in text: Dimensional and categorical models. Computational Intelligence 29, 3 (2013), 527–543.
  10. Cortis K., Freitas A., Daudert T., Huerlimann M., Zarrouk M., Handschuh S., and Davis B. 2017. SemEval-2017 Task 5: Fine-grained sentiment analysis on financial microblogs and news. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval'17). 519–535.
  11. Devlin J., Chang M. W., Lee K., and Toutanova K. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  12. Du S. and Zhang X. 2016. Aicyber's system for IALP 2016 shared task: Character-enhanced word vectors and boosted neural networks. In Proceedings of the 2016 International Conference on Asian Language Processing (IALP'16). 161–163.
  13. Ekman P. 1992. An argument for basic emotions. Cognition and Emotion 6 (1992), 169–200.
  14. Feldman R. 2013. Techniques and applications for sentiment analysis. Communications of the ACM 56, 4 (2013), 82–89.
  15. Goel P., Kulshreshtha D., Jain P., and Shukla K. K. 2017. Prayas at EmoInt 2017: An ensemble of deep neural architectures for emotion intensity prediction in tweets. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA'17). 58–65.
  16. Huang C. L., Chung C. K., Hui N., Lin Y. C., Seih Y. T., Lam B. C. P., and Pennebaker J. W. 2012. Development of the Chinese linguistic inquiry and word count dictionary. Chinese Journal of Psychology 54, 2 (2012), 185–201.
  17. Huang M., Xie H., Rao Y., Feng J., and Wang F. L. 2020. Sentiment strength detection with a context-dependent lexicon-based convolutional neural network. Information Sciences 520 (2020), 389–399.
  18. Hutto C. J. and Gilbert E. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the 8th International AAAI Conference on Weblogs and Social Media. 216–225.
  19. Kiritchenko S., Mohammad S. M., and Salameh M. 2016. SemEval-2016 Task 7: Determining sentiment intensity of English and Arabic phrases. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval'16). 42–51.
  20. Ku L. W. and Chen H. H. 2007. Mining opinions from the web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology 58, 2 (2007), 1838–1850.
  21. Liu B. 2012. Sentiment Analysis and Opinion Mining. Morgan & Claypool, Chicago, IL.
  22. Malandrakis N., Potamianos A., Iosif E., and Narayanan S. 2013. Distributional semantic models for affective text analysis. IEEE Transactions on Audio, Speech, and Language Processing 21, 11 (2013), 2379–2392.
  23. Mikolov T., Chen K., Corrado G., and Dean J. 2013a. Distributed representations of words and phrases and their compositionality. In Proceedings of Advances in Neural Information Processing Systems (NIPS'13).
  24. Mikolov T., Corrado G., Chen K., and Dean J. 2013b. Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR'13).
  25. Mohammad S. M. 2018a. Word affect intensities. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC'18). 174–183.
  26. Mohammad S. M. 2018b. Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL'18). 174–184.
  27. Mohammad S. M. and Bravo-Marquez F. 2017. WASSA-2017 shared task on emotion intensity. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA'17). 34–49.
  28. Mohammad S. M., Bravo-Marquez F., Salameh M., and Kiritchenko S. 2018. SemEval-2018 Task 1: Affect in tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval'18). 1–17.
  29. Neviarouskaya A., Prendinger H., and Ishizuka M.. 2011. SentiFul: A lexicon for sentiment analysis. IEEE Transactions on Affective Computing 2, 1 (2011), 2236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Nielsen F. Å.. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop on Making Sense of Microposts: Big Things Come in Small Packages.Google ScholarGoogle Scholar
  31. Paltoglou G. and Thelwall M.. 2013. Seeing stars of valence and arousal in blog posts. IEEE Transactions on Affective Computing 4, 1 (2013), 116123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Paltoglou G., Theunis M., Kappas A., and Thelwall M.. 2013. Predicting emotional responses to long informal text. IEEE Transactions on Affective Computing 4, 1 (2013), 106115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Pang B. and Lee L.. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1–2 (2008), 1135. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Pennington J., Socher R., and Manning C. D.. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 15321543.Google ScholarGoogle ScholarCross RefCross Ref
  35. Plutchik R.. 1991. The Emotions, Lanham, MD, USA: Univ. Press Amer.Google ScholarGoogle Scholar
  36. Preoţiuc-Pietro D., Eichstaedt J., Park G., Sap M., Smith L., Tobolsky V., Schwartz H. A., and Ungar L.. 2015. The role of personality, age, and gender in tweeting about mental illness. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 2130.Google ScholarGoogle ScholarCross RefCross Ref
  37. Preoţiuc-Pietro D., Schwartz H. A., Park G., Eichstaedt J., Kern M., Ungar L., and Shulman E.. 2016. Modelling valence and arousal in Facebook posts. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA’16). 915.Google ScholarGoogle ScholarCross RefCross Ref
  38. Ren J. and Nickerson J. V.. 2014. Online review systems: How emotional language drives sales. In Proceedings of the 20th Americas Conference on Information Systems (AMCIS’14).Google ScholarGoogle Scholar
  39. Rosenthal S., Nakov P., Kiritchenko S., Mohammad S. M., Ritter A., and Stoyanov V.. 2015. SemEval-2015 Task 10: Sentiment analysis in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 451463.Google ScholarGoogle ScholarCross RefCross Ref
  40. Russell J. A.. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39, 6 (1980), 1161.Google ScholarGoogle ScholarCross RefCross Ref
  41. Socher R., Perelygin A., Wu J. Y., Chuang J., Manning C. D., Ng A., and Potts C.. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP’13). 16311642.Google ScholarGoogle Scholar
  42. Tang D., Wei F., Qin B., Yang N., Liu T., and Zhou M.. 2016. Sentiment embeddings with applications to sentiment analysis. IEEE Transactions on Knowledge and Data Engineering 28, 2 (2016), 496509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Taboada M., Brooke J., Tofiloski M., Voll K., and Stede M.. 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics 37, 2 (2011), 267307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Thelwall M., Buckley K., and Paltoglou G.. 2012. Sentiment strength detection for the social web. Journal of the Association for Information Science and Technology 63, 1 (2012), 163173. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Vilares D., Doval Y., Alonsoa M. A., and Gómez-Rodríguez C.. 2016. LyS at SemEval-2016 Task 4: Exploiting neural activation values for Twitter sentiment classification and quantification. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 7984.Google ScholarGoogle ScholarCross RefCross Ref
  46. Warriner A. B., Kuperman V., and Brysbaert M.. 2013. Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods 45, 4 (2013), 11911207.Google ScholarGoogle ScholarCross RefCross Ref
  47. Wang J., Yu L. C., Lai K. R., and Zhang X.. 2016a. Locally weighted linear regression for cross-lingual valence-arousal prediction of affective words. Neurocomputing 194, 271278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Wang J., Yu L. C., Lai K. R., and Zhang X.. 2016b. Community-based weighted graph model for valence-arousal prediction of affective words. IEEE/ACM Trans. Audio, Speech and Language Processing 24, 11 (2016), 19571968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Wang J., Yu L. C., Lai K. R., and Zhang X.. 2020. Tree-structured regional CNN-LSTM model for dimensional sentiment analysis. IEEE/ACM Transactions on Audio Speech and Language Processing 28, 581591.Google ScholarGoogle ScholarCross RefCross Ref
  50. Wei W. L., Wu C. H., and Lin J. C.. 2011. A regression approach to affective rating of Chinese words from ANEW. In Proceedings of the 4th International Conference on Affective Computing and Intelligent Interaction (ACII’11). 121131. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Wu C., Wu F., Huang Y., Wu S., and Yuan Z.. 2017. THU NGN at IJCNLP-2017 Task 2: Dimensional sentiment analysis for Chinese phrases with deep LSTM. In Proceedings of the 8th International Joint Conference on Natural Language Processing (IJCNLP’17). 4252.Google ScholarGoogle Scholar
  52. Yang Z., Yang D., Dyer C., He X., Smola A., and Hovy E.. 2016. Hierarchical attention networks for document classification. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT’16), 14801489.Google ScholarGoogle ScholarCross RefCross Ref
  53. Yang Z., Dai Z., Yang Y., Carbonell J., Salakhutdinov R., and Le Q. V.. 2019. XLNet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv: 1906.08237. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Yu L. C., Lee L. H., Hao S., Wang J., He Y., Hu J., Lai K. R., and Zhang X.. 2016. Building Chinese affective resources in valence-arousal dimensions. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’16). 540–545.
  55. Yu L. C., Wang J., Lai K. R., and Zhang X.. 2018. Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, 3 (2018), 671–681.
  56. Yu L. C., Wang J., Lai K. R., and Zhang X.. 2020. Pipelined neural networks for phrase-level sentiment intensity prediction. IEEE Transactions on Affective Computing 11, 3 (2020), 447–458.
  57. Zhu S., Li S., and Zhou G.. 2019. Adversarial attention modeling for multi-dimensional emotion regression. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL’19). 471–480.


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 4, July 2022, 464 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3511099

Copyright © 2022 held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2021
• Revised: 1 June 2021
• Accepted: 1 September 2021
• Published: 19 January 2022
