Data Science and Engineering

Open Access 15-06-2019

Sentiment Classification Using Negative and Intensive Sentiment Supplement Information

Authors: Xingming Chen, Yanghui Rao, Haoran Xie, Fu Lee Wang, Yingchao Zhao, Jian Yin

Published in: Data Science and Engineering | Issue 2/2019

Abstract

Traditional methods of annotating the sentiment of an unlabeled document are based on sentiment lexicons or machine learning algorithms, which offer low computational cost or competitive performance, respectively. However, these methods ignore the semantic composition problem, which manifests in several ways such as negative reversing and intensification. In this paper, we propose a new method for sentiment classification using negative and intensive sentiment supplementary information, so as to exploit the linguistic features of negative and intensive words in conjunction with the context information. In particular, our method can address the domain-specific problem without relying on external sentiment lexicons. Experimental results on three real-world datasets demonstrate the effectiveness of our proposed method.

This article is extended from the authors' APWeb-WAIM 2018 paper [32].

1 Introduction

Sentiment analysis is a fundamental task in natural language processing that classifies given instances into coarse-grained classes such as positive, neutral, and negative, or fine-grained classes (e.g., very positive, positive, neutral, negative, very negative). The traditional way of conducting this task is based on sentiment lexicons [3, 10, 31]. Sentiment lexicons serve as a word-level basis for analyzing the sentiment of unlabeled documents, because they contain discrete information such as polarities and strengths. Lexicon-based methods mainly exploit features such as the counts, the total strengths, and the maximum strengths of positive and negative words [2, 12, 31]. For example, a straightforward method proposed in [10, 29] analyzes the sentiment of each document by aggregating the sentiment strengths of all words that exist in a sentiment lexicon. Although such methods have been shown to be simple and efficient, they suffer from the limitations of existing sentiment lexicons. In particular, a fixed polarity or strength is assigned to each word in a sentiment lexicon, but the same word may have different polarities or strengths in different domains. Take “hot” as an example: it expresses a positive sentiment in comments on a popular song but a negative one in comments on a restaurant. Another stream of work focuses on machine learning methods. Unlike lexicon-based models, which leverage lexicon features to predict sentiments, machine learning-based methods are data dependent because most of them are trained on a specific corpus. Benefiting from the growth of user-generated messages, various deep neural networks, including CNN [13, 15], recursive autoencoders [24, 25], and LSTM [6, 17, 27, 34], have been applied to sentiment analysis. Despite their great success, these models also have drawbacks. For example, most of them ignore the semantic composition problem. Semantic composition can manifest in several ways, such as negative reversing (e.g., not interesting), negative shifting (e.g., not terrific), and intensification (e.g., very good). Although this issue can be alleviated by tree-structured models like recursive autoencoders and Tree-LSTM [27, 34], it is quite time-consuming to parse tree structures and annotate phrase-level features.

A method called supplementary information modeling [32] was recently proposed for sentiment classification by taking the role of negative and intensive words into consideration. In this method, the two sentences “the movie is not good” and “the movie is very boring” are approximated by “the movie is bad” and “the movie is boring + boring,” respectively, by generating new representations for “not good” and “very boring.” However, it may drop some context information because it considers the sentiment supplementary information only. To overcome this shortcoming, we propose a new model for sentiment classification using negative and intensive sentiment supplementary information. In our model, we keep all information of the corpus in conjunction with the negative and intensive supplementary information, which exploits the linguistic features of negative and intensive words. When a negative or intensive word exists in a sentence, we use a backward LSTM to encode the semantics after it and generate a new word embedding vector to emphasize the effect of the negative or intensive word. Furthermore, local sentiment attributes of all words are generated through several deep learning networks to help predict the sentiment of a given document. Our contributions are threefold:

This paper proposes the concept of semantic composition based on the linguistic role of negative and intensive words. Without dropping any context information, we develop a backward LSTM to model the reversing effect of negative words and the valence modified by intensive words on the following content.
Unlike previous lexicon-based methods, which directly employ external sentiment lexicons, we generate the sentiment strength of each word in the corpus under different sentiment polarities, which addresses the domain-specific problem of methods based on external sentiment lexicons.
Our method can handle sentences that contain two negative or intensive words, and it can be applied to several deep neural network models to improve their performance on sentiment classification.

The remainder of this paper is organized as follows. We describe related work in Sect. 2. We present the method for sentiment classification in Sect. 3. We detail the dataset, results, and discussions in Sect. 4. Finally, we present conclusions in Sect. 5.

2 Related Work

2.1 Lexicon-Based Sentiment Classification

Sentiment lexicons [10, 31] usually define the prior sentiment attributes (i.e., polarities or strengths) of a collection of words and phrases, which is useful for lexicon-based methods [2, 4, 7, 8, 13, 19, 26, 30] in sentiment classification. There are two main streams of lexicon-based methods: one works under the bag-of-words framework and the other is rule based. In the bag-of-words framework, lexicon features such as the counts of sentiment words, their total strengths, and their maximum strength are leveraged to predict the sentiment of a given document. Beyond bag-of-words models that exploit sentiment lexicons, rule-based methods [20, 26] introduce semantic composition [21] into sentiment operations. However, the above methods are primarily domain specific.
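The bag-of-words lexicon features described above can be sketched as follows. This is only an illustration: the function name `lexicon_features`, the toy lexicon, and its strength values are hypothetical and are not taken from any of the cited lexicons.

```python
def lexicon_features(tokens, lexicon):
    """Return bag-of-words lexicon features for one tokenized document.

    `lexicon` maps a word to a signed strength (positive > 0, negative < 0).
    """
    scores = [lexicon[t] for t in tokens if t in lexicon]
    pos = [s for s in scores if s > 0]
    neg = [-s for s in scores if s < 0]
    return {
        "pos_count": len(pos),
        "neg_count": len(neg),
        "pos_total": sum(pos),
        "neg_total": sum(neg),
        "pos_max": max(pos, default=0.0),
        "neg_max": max(neg, default=0.0),
    }


# Hypothetical toy lexicon; the strengths are invented for illustration only.
TOY_LEXICON = {"good": 0.7, "great": 0.9, "boring": -0.6, "bad": -0.8}

features = lexicon_features("the movie is not good".split(), TOY_LEXICON)
```

In this toy case, “not good” contributes only a positive count, because the negative word is invisible to the features. This is the semantic composition problem that motivates the rest of the paper.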

2.2 Deep Neural Networks for Sentiment Classification

Unlike lexicon-based methods that predict sentiments based on the bag-of-words assumption or straightforward rules, deep neural networks predict sentence-level sentiments of given documents by training various neural networks, such as convolutional neural networks [11, 13, 16, 23], on a large amount of labeled data. A notable work is Teng’s research [28], in which the sentiment of each sentence is determined by the weighted sum of the scores of negative words and sentiment words, and the weights are learned by a neural network. These models all achieved competitive accuracies, and their main characteristics are as follows. Compared with recursive models, which usually require fine-grained annotations and tree-structured data, convolutional neural networks do not need these features. Long short-term memory models [9], which are usually used to model the prefix or suffix context, can also be applied to sequential data and tree-structured data [27, 34]. A method called supplementary information modeling [32] has been integrated into several deep neural networks to model the role of negative and intensive words for sentiment classification, and it has been validated to be effective.

3 Proposed Model

This research aims to tackle the semantic composition issue of existing deep neural networks for sentiment classification. In particular, we model the distinct effects of negative and intensive words through an LSTM network, as follows: (1) Negative Expression Modeling. Negative words, such as “not” and “never,” mainly have a sentiment reversing effect on the following expression. In most cases, the sentiment polarity of the content following a negative word is reversed. To this end, we approximately convert a sentence such as “the movie is not good” to “the movie is bad” by employing a backward LSTM to model each negative word in conjunction with all following words. (2) Intensive Expression Modeling. Intensive words, such as “very” and “so,” mainly change the sentiment strength of the following expression (e.g., from the negative side to the very negative side). Take the sentence “the movie is so boring” as an example: by employing a backward LSTM on the expression “so boring,” we can convert the original sentence into a new sentence like “the movie is boring + boring.” For the convenience of describing our method, frequently used notations are summarized in Table 1.

Table 1

Frequently used notations

Notation	Description
tw	The target word, i.e., a negative or intensive word
ntw	The number of target words
$S=[x_{1}, x_{2}, \ldots , x_{n}]$	A sentence with n words
$x_{i} \in R^{d}$	A d-dimensional word embedding of the ith word
$V=[v_{1}, v_{2}, \dots , v_{n+ntw}]$	The vector representations generated by our method
$F=[f_{1}, f_{2}, \ldots , f_{m}]$	A hidden vector containing sentiment feature of each sentence
$r_{\mathrm{avg}}$	A vector containing the average information
$f_{i}$	The sentiment feature of the ith word
$W\in R^{ws \times d}$	A convolutional filter applied to continuous word embeddings
ssinfo	Sentiment supplement information extracted by the backward LSTM
ssvec	A sentiment supplement vector generated by multiplying ssinfo by $\lambda$
$s_{i}^{1}$	The positive polarity strength of the ith sentence
$s_{i}^{0}$	The negative polarity strength of the ith sentence
$C_{\mathrm{p}}$	The predicted value of the positive label
$C_{\mathrm{n}}$	The predicted value of the negative label

To validate the effectiveness of the above operations, we incorporate them into three deep neural networks, CNN [12], LSTM [9], and CharSCNN [8], and denote these new models as NIM-CNN, NIM-LSTM, and NIM-CharSCNN, where “NIM” means “Negative and Intensive Modeling.” For example, the architecture of NIM-CNN is shown in Fig. 1.

In the following, we first describe how to generate sentiment supplementary information. Then, we detail the sentence encoding using the above information. Finally, we show methods of training the proposed model and predicting the sentiment of a new document.

3.1 Sentiment Supplementary Information Generation

We use LSTM to model the effect of negative and intensive words; the resulting representation is called sentiment supplementary information. The generation of sentiment supplementary information (ssinfo) is shown in Fig. 2. An LSTM cell block employs an input gate $I_{t}$, a memory cell $C_{t}$, a forget gate $F_{t}$, and an output gate $O_{t}$ to make use of the information from the previous inputs. Formally, given the input $x_{t}$ at time step t, these quantities, together with the candidate value $G_{t}$, are computed as follows:

$$\begin{aligned} I_{t}= & {} \sigma (W_{i}x_{t}+U_{i}h_{t-1}+V_{i}C_{t-1}+b_{i}), \end{aligned}$$

(1)

$$\begin{aligned} F_{t}= & {} 1.0-I_{t}, \end{aligned}$$

(2)

$$\begin{aligned} G_{t}= & {} \tanh (W_{g}x_{t}+U_{g}h_{t-1}+b_{g}), \end{aligned}$$

(3)

$$\begin{aligned} C_{t}= & {} F_{t}\odot C_{t-1} + I_{t}\odot G_{t}, \end{aligned}$$

(4)

$$\begin{aligned} O_{t}= & {} \sigma (W_{o}x_{t}+U_{o}h_{t-1}+V_{o}C_{t}+b_{o}), \end{aligned}$$

(5)

where $h_{t}$ and $h_{t-1}$ denote the current and previous hidden states, respectively, $\sigma$ denotes the sigmoid function, $\odot$ refers to element-wise multiplication, and $\{W_{i},U_{i},V_{i},b_{i},W_{g},U_{g},b_{g},W_{o},U_{o},V_{o},b_{o}\}$ are LSTM parameters.
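To make Eqs. (1)–(5) concrete, the following is a minimal pure-Python sketch of one cell step with the coupled forget gate $F_{t}=1-I_{t}$. Two points are assumptions rather than statements from the text: the peephole weights $V_{i}$ and $V_{o}$ are treated as vectors applied element-wise, and the hidden state is computed with the standard LSTM rule $h_{t}=O_{t}\odot \tanh (C_{t})$. The names `lstm_step` and `affine` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One cell step following Eqs. (1)-(5); `p` is a dict of parameters."""
    pre_i = affine(p["W_i"], x_t, p["U_i"], h_prev, p["b_i"])
    I = [sigmoid(a + v * c) for a, v, c in zip(pre_i, p["V_i"], c_prev)]  # Eq. (1)
    F = [1.0 - i for i in I]                                              # Eq. (2)
    pre_g = affine(p["W_g"], x_t, p["U_g"], h_prev, p["b_g"])
    G = [math.tanh(a) for a in pre_g]                                     # Eq. (3)
    C = [f * c + i * g for f, c, i, g in zip(F, c_prev, I, G)]            # Eq. (4)
    pre_o = affine(p["W_o"], x_t, p["U_o"], h_prev, p["b_o"])
    O = [sigmoid(a + v * c) for a, v, c in zip(pre_o, p["V_o"], C)]       # Eq. (5)
    # Assumption: standard LSTM hidden-state rule, not stated in the text.
    h = [o * math.tanh(c) for o, c in zip(O, C)]
    return h, C


# Usage with all-zero parameters (input size 2, hidden size 1).
d_in, d_h = 2, 1
params = {}
for gate in ("i", "g", "o"):
    params["W_" + gate] = [[0.0] * d_in for _ in range(d_h)]
    params["U_" + gate] = [[0.0] * d_h for _ in range(d_h)]
    params["b_" + gate] = [0.0] * d_h
params["V_i"] = [0.0] * d_h
params["V_o"] = [0.0] * d_h

h, C = lstm_step([0.3, -0.1], [0.0], [1.0], params)
```

With all-zero parameters both gates equal 0.5 and the candidate is 0, so the memory cell is simply halved. This shows how the coupled gate trades old memory against new input.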

A backward LSTM is used to encode the content following each target word (tw) and generate ssinfo, where tw denotes a negative or intensive word. We distinguish three situations for a sentence $S = [x_{1},x_{2},\ldots ,x_{i},\ldots ,x_{n}]$, where $x_{i}$ denotes the word embedding of the ith word in the sentence. Firstly, if the sentence does not contain tw, the ssinfo of S is empty. Secondly, given a sentence $S = [x_{1},x_{2},\ldots ,x_{t-1},tw,x_{t+1},\ldots ,x_{n}]$, which contains one tw, we use the backward LSTM on word embeddings $\{x_{t+1},x_{t+2}\ldots ,x_{n}\}$ and get the ssinfo of S. Finally, given a sentence $S = [x_{1},\ldots ,x_{t-1},tw_{1},x_{t+1},\ldots ,x_{d-1},tw_{2},x_{d+1}\ldots ,x_{n}]$, which contains two instances of tw, we use a backward LSTM on word embeddings $\{x_{t+1},\ldots ,x_{d-1}\}$ and word embeddings $\{x_{d+1}\ldots ,x_{n}\}$ to generate ssinfo1 and ssinfo2, respectively. For efficiency, we do not consider sentences that contain more than two target words.
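The three situations above amount to selecting, for each target word, the span of words that follows it. A minimal sketch is given below; the function name `target_spans` and the small target-word set are hypothetical. How to treat a third target word is also an assumption: the sketch simply ignores target words beyond the second.

```python
def target_spans(tokens, target_words, max_targets=2):
    """Return (target_index, following_span) pairs for up to `max_targets` target words.

    Each span holds the tokens after a target word, up to (not including) the
    next considered target word or the end of the sentence.
    """
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in target_words]
    positions = positions[:max_targets]  # assumption: extra target words are ignored
    spans = []
    for k, pos in enumerate(positions):
        end = positions[k + 1] if k + 1 < len(positions) else len(tokens)
        spans.append((pos, tokens[pos + 1:end]))
    return spans


# Hypothetical target-word set for illustration.
TARGETS = {"not", "never", "very", "so"}

none_case = target_spans("the movie is fine".split(), TARGETS)
one_case = target_spans("the movie is not good".split(), TARGETS)
two_case = target_spans("the plot is not bad but so slow".split(), TARGETS)
```

A backward LSTM would then read each span from its last word to its first (e.g., via `reversed(span)`) to produce the corresponding ssinfo.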

3.2 Sentence Encoding

After the generation of ssinfo, we encode the sentence to a new vector representation V. Given the sentence $S = [x_{1}, x_{2}, \ldots , x_{t-1}, tw, x_{t+1}, \ldots ,x_{n}]$, we get a new sentence representation $\{x_{1},x_{2},\ldots ,x_{t-1},tw,x_{t+1},\ldots ,x_{n},\lambda *ssinfo\}$ after applying the backward LSTM on the sequence $\{x_{t+1},\ldots ,x_{n}\}$. Here, we call $\lambda *ssinfo$ a sentiment supplement vector (ssvec). For negative words, the value of $\lambda$ is initially set to −2. For intensive words, the value of $\lambda$ is initially set to +1. After adding the above ssvec to each sentence, the model generates a new vector representation of the sentence, which can be described as $V=[v_{1},v_{2},\ldots ,v_{i},\ldots ,v_{n+ntw}]$, where $v_{i}$ denotes the ith vector representation in V, and ntw denotes the number of target words.
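The encoding step can be illustrated with a short sketch that appends one ssvec per target word to the word embeddings. The function name `encode_sentence` and the `(kind, ssinfo)` input format are hypothetical. The sketch also assumes that ssinfo has the same dimension d as the word embeddings, so that it can be appended as an extra row of V.

```python
NEGATIVE_LAMBDA = -2.0   # initial value for negative words, as stated in the text
INTENSIVE_LAMBDA = 1.0   # initial value for intensive words, as stated in the text


def encode_sentence(embeddings, supplements):
    """Append one ssvec = lambda * ssinfo per target word to the word embeddings.

    `supplements` is a list of (kind, ssinfo) pairs with kind in
    {"negative", "intensive"}; this input format is an assumption.
    """
    V = [list(x) for x in embeddings]
    for kind, ssinfo in supplements:
        lam = NEGATIVE_LAMBDA if kind == "negative" else INTENSIVE_LAMBDA
        V.append([lam * value for value in ssinfo])
    return V


# Five words with 2-dimensional toy embeddings and one negative target word.
toy_embeddings = [[0.1, 0.2], [0.0, 0.3], [0.5, 0.5], [0.2, -0.1], [0.4, 0.6]]
V = encode_sentence(toy_embeddings, [("negative", [0.5, -0.25])])
```

The result has n + ntw = 6 rows, and the appended row is the ssinfo scaled by −2, which is what lets the downstream network see a reversed version of the content following the negative word.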

3.3 Model Training and Sentiment Prediction

After sentence encoding, we can feed V into a deep neural network for model training and sentiment prediction. Take CNN as an example. In order to extract the sentiment information of each element in V, a convolution operation involving multiple filters $W\in R^{ws \times d}$ (with window size ws) is applied to V to generate the feature map $F=g(W*V)$, where “$*$” is a two-dimensional convolution operation and g indicates a nonlinear function. Then, the average pooling operation is employed to capture the average sentiment information, which is defined as:

$$\begin{aligned} r_{\mathrm{avg}}=\frac{1}{n+ntw-ws+1}\sum _{j=1}^{n+ntw-ws+1}f_{j}, \end{aligned}$$

(6)

where $f_{j}$ is the jth element of the feature map F.
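As a rough illustration of the feature map and of Eq. (6), the sketch below slides a single filter over V and averages the resulting entries. It is written in plain Python for readability. The function names are hypothetical, no bias term is used because the text does not mention one, and tanh is only an assumed default for g.

```python
import math


def feature_map(V, W, g=math.tanh):
    """Slide one filter W (ws x d) over V (length x d) and return F = g(W * V)."""
    ws = len(W)
    out = []
    for j in range(len(V) - ws + 1):
        total = sum(
            W[a][b] * V[j + a][b]
            for a in range(ws)
            for b in range(len(W[a]))
        )
        out.append(g(total))
    return out


def average_pool(F):
    """Eq. (6): mean of the feature-map entries."""
    return sum(F) / len(F)


V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
F = feature_map(V, W, g=lambda z: z)  # identity g keeps the numbers easy to check
r_avg = average_pool(F)
```

With four rows in V and ws = 2, the feature map has 4 − 2 + 1 = 3 entries, matching the upper limit of the sum in Eq. (6).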

The model uses two polarity-related weight vectors (denoted as $C_{\mathrm{p}}$ and $C_{\mathrm{n}}$) and the feature vector $r_{\mathrm{avg}}$ obtained by the pooling layer to generate the scores under different polarities (denoted as ${\mathrm{Score}}_{pi}$, ${\mathrm{Score}}_{ni}$) of the ith sentence. Here, we use $L_{i}$ to indicate the actual sentiment of the ith sentence. We use the softmax function to calculate the probabilities of being positive and negative, denoted as $s_{i}^{1}$ and $s_{i}^{0}$, respectively. In particular, $s_{i}^{1}$ and $s_{i}^{0}$ are estimated as:

$$\begin{aligned} s_{i}^{1}= & {} \frac{e^{\mathrm{Score}_{pi}}}{e^{\mathrm{Score}_{pi}}+e^{\mathrm{Score}_{ni}}}, \end{aligned}$$

(7)

$$\begin{aligned} s_{i}^{0}= & {} \frac{e^{\mathrm{Score}_{ni}}}{e^{\mathrm{Score}_{pi}}+e^{\mathrm{Score}_{ni}}}. \end{aligned}$$

(8)
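Eqs. (7) and (8) form a two-class softmax, which can be sketched as follows. The function name `polarity_probabilities` is hypothetical, and subtracting the maximum score is a common numerical-stability device that does not change the result.

```python
import math


def polarity_probabilities(score_p, score_n):
    """Eqs. (7)-(8): two-class softmax over the positive and negative scores."""
    m = max(score_p, score_n)  # subtract the max for numerical stability
    e_p = math.exp(score_p - m)
    e_n = math.exp(score_n - m)
    return e_p / (e_p + e_n), e_n / (e_p + e_n)


# A score gap of ln(3) in favor of the positive class.
s1, s0 = polarity_probabilities(math.log(3.0), 0.0)
```

Only the difference between the two scores matters: a gap of ln 3 yields probabilities of about 0.75 and 0.25, and equal scores yield 0.5 each.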

Note that the whole model is trained end-to-end and ssinfo is also updated along with the other components. We use cross-entropy to calculate the loss of the model. Assume that there are N training sentences; the loss function is defined as:

$$\begin{aligned} {\mathrm{Loss}}(\theta )=-\sum _{i=1}^{N}L_{i}\log s_{i}^{L_{i}}+\frac{\lambda _{r}}{2}||\theta ||^2, \end{aligned}$$

(9)

where $\theta$ is the set of model parameters and $\lambda _{r}$ is the coefficient of the L2 regularization term.
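For illustration, a cross-entropy loss with L2 regularization can be sketched as below. The sketch uses the common reading of cross-entropy in which every sentence contributes the negative log-probability assigned to its true label; this is an assumption and differs from the printed formula, which additionally multiplies each term by $L_{i}$. The function name `regularized_loss` is hypothetical.

```python
import math


def regularized_loss(probabilities, labels, theta, lambda_r):
    """Negative log-likelihood of the true labels plus an L2 penalty.

    `probabilities[i]` is the pair (s_i^0, s_i^1) and `labels[i]` is 0 or 1.
    """
    nll = -sum(math.log(p[label]) for p, label in zip(probabilities, labels))
    l2 = 0.5 * lambda_r * sum(w * w for w in theta)
    return nll + l2


loss_value = regularized_loss(
    probabilities=[(0.25, 0.75), (0.5, 0.5)],
    labels=[1, 0],
    theta=[1.0, 2.0],
    lambda_r=0.1,
)
```

Here the penalty term is 0.5 × 0.1 × (1 + 4) = 0.25, added to −(ln 0.75 + ln 0.5).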

4 Experiments

4.1 Datasets

We evaluate the proposed model on three datasets. The first one is Movie Review (MR) [22], in which every sentence is annotated as either positive or negative. The second one is Stanford Sentiment Treebank (SST) [16, 25], where each sentence is classified into one of five classes: very negative, negative, neutral, positive, and very positive. The third one is Sentiment Labeled Sentences (SLS) [18], which is collected from reviews of products (Amazon), movies (IMDB), and restaurants (Yelp). Statistics of the three datasets are summarized in Table 2.

Table 2

Dataset statistics

Dataset	$N_S$	$L_S$	|V|	|N| (%)	|I| (%)
MR	10,662	20	18,376	33.7	53.2
SST	9613	17	17,439	25.8	49.8
SLS	3000	12	5170	27.8	39.0

$N_{\mathrm{S}}$ number of sentences, $L_{\mathrm{S}}$ average sentence length, |V| vocabulary size, |N| percentage of documents with negative words, |I| percentage of documents with intensive words

Negative and intensive words are derived from Linguistic Inquiry and Word Count (LIWC2007), in which a certain word is labeled according to its characteristic or property. We use all negative words from the “Negate” part of LIWC2007 and select intensive words manually from the “Adverb” part by removing some words that are obviously not intensive words. Some of the negative and intensive words are shown in Table 3.

Table 3

Examples of negative and intensive words

Negative words	Cannot
	Negate
	Neither
	Never
	No
	Nobody
	...
Intensive words	Cannot
	Absolutely
	Completely
	Even
	Just
	Mostly
	...

4.2 Experiment Design

To evaluate the performance of the proposed NIM-CNN, NIM-LSTM, and NIM-CharSCNN, we implemented the following baselines for comparison:

Sentiment lexicon-based methods. Such methods annotate the sentiment of each unlabeled document by summing sentiment strengths of all words that exist in the sentiment lexicon [10, 29]. Here, we use three sentiment lexicons, which are SentiWordNet [3], SCL-NMA [14], and Opinion Lexicon [10].
CNN [12]. It generates the sentence representation by a convolutional layer with multiple kernels (i.e., kernel sizes of 3, 4, and 5 with 100 feature maps each) and pooling operations. Note that the dropout operation is added to prevent over-fitting.
LSTM [9]. The whole corpus is processed as a single sequence, and LSTM generates the sentence representation by calculating the mean of the hidden states of all words. The hidden state size was empirically set to 128.
CharSCNN [8]. It employs two convolutional layers to extract features from the character and sentence levels, and the output of the second convolutional layer is passed to two fully connected layers to calculate the sentiment score. Empirically, the context windows of word and character were set to 1. The convolution kernel sizes of the character-level layer and the sentence-level layer were set to 20 and 150, respectively.
Supplementary information modeling-based methods [32]. Such methods incorporate a kind of sentiment supplementary information into three neural networks, i.e., CNN, LSTM, and CharSCNN. These new models are denoted as NIS-CNN, NIS-LSTM, and NIS-CharSCNN, where “NIS” means “Negative and Intensive Supplement.”

Our experiments were implemented using the TensorFlow [1] and Keras [5] Python libraries. We used Stochastic Gradient Descent with Adadelta [33] for training, which adjusts the learning rate adaptively rather than relying on a global value. We set the batch size at each iteration to 32 and the size of word embeddings to 300 for all datasets and models. Furthermore, word embeddings were obtained by the word2vec tool. All other parameters were initialized to their default values as specified in the TensorFlow and Keras libraries. For all datasets, we randomly selected 80% of the samples as the training set, 10% as the validation set, and the remaining 10% as the test set.
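The random 80%/10%/10% split can be sketched in a few lines. The function name `split_dataset` and the fixed `seed` parameter are hypothetical; the text does not say how the random selection was seeded or implemented.

```python
import random


def split_dataset(samples, seed=0):
    """Shuffle and split into 80% training, 10% validation, and 10% test samples."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # hypothetical fixed seed for repeatability
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# SLS contains 3000 sentences (Table 2); integers stand in for the sentences.
train, val, test = split_dataset(range(3000))
```

For the 3000 SLS sentences this gives 2400, 300, and 300 samples, and every sentence lands in exactly one subset.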

In our negative and intensive supplement method, the LSTM’s hidden state size d and the dropout rate p were tuned on the validation set for each dataset. The optimal values of these parameters for each dataset are shown in Table 4.

Table 4

Optimal hyperparameters for each dataset

Dataset	d	p
MR	128	0.5
SST	256	0.3
SLS	128	0.2

4.3 Evaluation Metrics

We use Accuracy, Precision, Recall and F-measure to evaluate the model performance, as follows:

$$\begin{aligned} {\mathrm{Accuracy}}= & {} \frac{\sum_{i=1}^{S}(tp_i+tn_i)}{\sum _{i=1}^{S}(tp_i+fp_i+tn_i+fn_i)},\\ {\mathrm{Precision}}= & {} \frac{\sum _{i=1}^{S}tp_i}{\sum_{i=1}^{S}(tp_i+fp_i)},\\ {\mathrm{Recall}}= & {} \frac{\sum_{i=1}^{S}tp_i}{\sum _{i=1}^{S}(tp_i+fn_i)},\\ {\mathrm{F\text{-}measure}}= & {} \frac{2*{\mathrm{Precision}}*{\mathrm{Recall}}}{{\mathrm{Precision}} + {\mathrm{Recall}}}, \end{aligned}$$

where $tp_i$ is 1 if the ith sentence is actually positive and the predicted label is positive; otherwise, it is 0. $tn_i$ is 1 if the ith sentence is actually negative and the predicted label is negative; otherwise, it is 0. $fp_i$ is 1 if the ith sentence is actually negative and the predicted label is positive; otherwise, it is 0. $fn_i$ is 1 if the ith sentence is actually positive and the predicted label is negative; otherwise, it is 0. S is the number of sentences.
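The four metrics can be computed directly from the per-sentence indicators defined above. In the sketch below, labels are encoded as 1 for positive and 0 for negative. The function name `binary_metrics` is hypothetical, and returning 0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def binary_metrics(actual, predicted):
    """Return (accuracy, precision, recall, f_measure) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, precision, recall, f_measure


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, precision, recall, f_measure = binary_metrics(actual, predicted)
```

In this toy example there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics equal 0.75.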

4.4 Results and Analysis

As shown in Table 5, the proposed NIM-CNN performed better than those baselines for all datasets. Compared with deep neural network (DNN)-based methods, we can observe that sentiment lexicon-based methods have a relatively poor performance. On the one hand, sentiment lexicons are domain specific, but the same word under different domains may have different polarities and different strengths. On the other hand, sentiment lexicon-based methods are typically based on bag-of-words models which ignore the semantic composition problem. In contrast, DNN-based methods are data dependent and can learn high-level interactions among deep latent features which contribute a lot to predict the sentiment polarity.

Table 5

Accuracy, precision, recall, and F1-measure (%) of all models on MR, SST, and SLS datasets

Model	Dataset	Accuracy	Precision	Recall	F1-measure
SentiWordNet [3]	MR	58.3	56.3	77.8	65.3
	SLS	64.9	59.7	86.1	70.5
	SST	60.1	59.0	78.4	67.3
SCL-NMA [14]	MR	60.9	59.2	82.0	68.7
	SLS	69.5	66.4	88.8	76.0
	SST	65.0	63.4	84.7	72.5
Opinion Lexicon [10]	MR	69.0	68.8	73.9	71.3
	SLS	80.6	76.5	94.9	84.7
	SST	73.7	74.6	78.4	76.5
CNN [12]	MR	78.9	79.5	77.9	78.7
	SLS	87.8	89.3	85.9	87.6
	SST	81.6	82.5	80.2	81.3
NIS-CNN [32]	MR	79.8	79.7	80.0	79.8
	SLS	88.3	89.7	86.5	88.1
	SST	82.1	82.6	81.3	82.0
IM-CNN	MR	79.1	79.2	78.9	79.1
	SLS	88.0	89.0	86.7	87.8
	SST	81.7	82.3	80.1	81.5
NM-CNN	MR	79.7	79.9	79.4	79.6
	SLS	88.4	89.5	87.0	88.2
	SST	82.2	82.8	81.3	82.0
NIM-CNN	MR	80.1	79.8	80.6	80.2
	SLS	88.6	89.9	87.0	88.4
	SST	82.3	82.9	81.4	82.1
LSTM [9]	MR	75.9	75.4	76.9	76.1
	SLS	85.8	86.0	85.5	85.8
	SST	75.8	77.6	72.3	75.0
NIS-LSTM [32]	MR	76.2	76.0	76.6	76.3
	SLS	86.1	86.3	85.8	86.1
	SST	76.3	77.5	74.1	75.8
IM-LSTM	MR	76.0	75.3	77.4	76.3
	SLS	85.9	85.8	86.0	85.9
	SST	75.8	77.9	72.0	74.9
NM-LSTM	MR	76.2	75.8	77.0	76.4
	SLS	86.2	86.3	86.1	86.2
	SST	76.2	78.4	72.3	75.2
NIM-LSTM	MR	76.3	76.2	76.5	76.3
	SLS	86.4	86.5	86.3	86.4
	SST	76.4	78.7	72.4	75.4
CharSCNN [8]	MR	74.0	75.1	71.8	73.4
	SLS	86.4	88.6	85.3	86.9
	SST	81.7	83.1	79.6	81.3
NIS-CharSCNN [32]	MR	74.4	75.5	72.2	73.8
	SLS	86.9	88.4	85.9	87.3
	SST	82.0	83.5	79.8	81.6
IM-CharSCNN	MR	74.1	75.1	72.1	73.6
	SLS	86.6	88.4	85.2	86.7
	SST	81.9	83.3	79.8	81.5
NM-CharSCNN	MR	74.4	75.7	71.9	73.7
	SLS	86.9	88.9	85.7	87.3
	SST	82.2	83.4	80.4	81.8
NIM-CharSCNN	MR	74.6	75.2	73.4	74.3
	SLS	87.3	88.8	85.9	87.3
	SST	82.3	83.6	80.4	82.0

We also conducted ablation experiments on the three datasets mentioned above to evaluate the respective contributions of negative words and intensive words. First, we conducted the experiment without modeling negative or intensive words. Then, on the basis of our model, we removed either the negative words or the intensive words each time and executed the resulting IM-CNN and NM-CNN, respectively, on the whole dataset. In Table 5, consistent improvements can be observed from CNN to NIS-CNN and then to NIM-CNN on MR (the accuracy increases from 78.9% to 79.8%, and then to 80.1%), SST (the accuracy increases from 81.6% to 82.1%, and then to 82.3%), and SLS (the accuracy increases from 87.8% to 88.3%, and then to 88.6%), which validates the effectiveness of NIM-CNN in modeling the linguistic role of negative and intensive words. To further validate the effectiveness of the supplement information provided by negative and intensive words, we conducted similar ablation experiments on LSTM and CharSCNN. Improvements can also be observed from LSTM to NIS-LSTM and then to NIM-LSTM on MR (the accuracy increases from 75.9% to 76.2%, and then to 76.3%), SST (the accuracy increases from 75.8% to 76.3%, and then to 76.4%), and SLS (the accuracy increases from 85.8% to 86.1%, and then to 86.4%), as well as from CharSCNN to NIS-CharSCNN and then to NIM-CharSCNN on MR (the accuracy increases from 74.0% to 74.4%, and then to 74.6%), SST (the accuracy increases from 81.7% to 82.0%, and then to 82.3%), and SLS (the accuracy increases from 86.4% to 86.9%, and then to 87.3%). Although supplementary information modeling methods performed better than conventional DNN-based methods by taking the role of negative and intensive words into consideration, they may drop some salient context information since they remove the target words and the sequences of words following them. By keeping the context information of the whole sentence, our ssinfo has a positive impact on the performance.

The performance improvement of our model over baselines on the MR dataset is larger than that on SST when considering negative and intensive words. The reason may be that although the total number of sentences in MR is similar to that in SST, the percentage of sentences with negative words in MR is larger than that in SST. To sum up, modeling the sentiment reversing effect of negative words can significantly improve the accuracy of sentiment prediction by correcting the labels of sentences with negative words that are annotated with wrong labels.

However, we also observe that methods that model negative words showed a clear improvement in the accuracy of sentiment classification compared with methods that model neither negative nor intensive words, while methods that model intensive words showed only a slight improvement and, in some cases, even a slight decline. To explore the reason behind this phenomenon, we conducted detailed experiments as follows.

For negative words, we extracted all the sentences with negative words in the MR dataset and compared the probabilities under different polarities predicted by CNN and NM-CNN. As shown in Table 6, for those sentences with negative words that were assigned the wrong label by CNN, NM-CNN can correct such errors and consequently improve the accuracy. This is because CNN, as well as LSTM and CharSCNN, does not take the sentiment reversing effect of negative words into account and does not treat negative words as special words. Therefore, when we model the sentiment reversing effect of negative words and introduce it into CNN, the probabilities under different polarities are reversed too. Then, we can correct those sentences that are classified into wrong classes by CNN.

Table 6

Examples about the effect of negative words on MR dataset

Sentence	NW	C	CNN Pos	CNN Neg	NM-CNN Pos	NM-CNN Neg
You cannot help but get caught up	Cannot	Positive	38.1	61.9	58.3	41.7
Hollywood wouldn’t have the guts to make	Not	Positive	41.7	58.3	61.4	38.6
The story is nowhere near gripping enough	Nowhere	Negative	69.1	30.9	32.9	67.1

NW negative word, C category. Pos the predicted probability of being positive (%). Neg the predicted probability of being negative (%)

For intensive words, we conducted experiments similar to those on negative words. The results explain why modeling intensive words cannot improve the accuracy as much as modeling negative words: intensive words change only the sentiment level of a sentence, not its sentiment polarity. Therefore, even though we can successfully model the sentiment shifting effect of intensive words and incorporate it into the basic methods, the new methods still annotate the sentence with the same label as the original method. For example, in Table 7, the sentence “An extremely unpleasant film” with the intensive word “extremely” is labeled correctly by CNN. When the sentiment shifting effect of intensive words is considered, the probability of being negative predicted by IM-CNN (94.7%) is higher than that predicted by CNN (77.2%), but the label remains negative. In summary, when a sentence is annotated with a wrong label, considering intensive words will not help to correct it. Intensive words may play a more significant role in fine-grained sentiment classification tasks.

Table 7

Examples about the effect of intensive words on MR dataset

Sentence	IW	C	CNN Pos	CNN Neg	IM-CNN Pos	IM-CNN Neg
An extremely unpleasant film	Extremely	Negative	22.8	77.2	5.3	94.7
Really quite funny	Really	Positive	73.5	26.5	88.0	12.0
Too silly to take seriously	Too	Negative	19.8	80.2	11.5	88.5
The tenderness of the piece is still intact	Still	Positive	52.8	47.2	42.7	57.3

IW intensive word, C category, Pos the predicted probability of being positive (%), Neg the predicted probability of being negative (%)
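The contrast between Tables 6 and 7 can be checked mechanically by turning each predicted probability into a label and comparing it with the category. The sketch below copies the positive-class probabilities from the two tables. Assigning the positive label whenever the probability exceeds 50% is an assumption, and the names `label` and `summarize` are hypothetical.

```python
# (sentence, category, CNN Pos in %, NM-CNN or IM-CNN Pos in %) from Tables 6 and 7.
TABLE_6 = [
    ("You cannot help but get caught up", "Positive", 38.1, 58.3),
    ("Hollywood wouldn't have the guts to make", "Positive", 41.7, 61.4),
    ("The story is nowhere near gripping enough", "Negative", 69.1, 32.9),
]
TABLE_7 = [
    ("An extremely unpleasant film", "Negative", 22.8, 5.3),
    ("Really quite funny", "Positive", 73.5, 88.0),
    ("Too silly to take seriously", "Negative", 19.8, 11.5),
    ("The tenderness of the piece is still intact", "Positive", 52.8, 42.7),
]


def label(p_pos):
    """Assumed decision rule: positive if the positive probability exceeds 50%."""
    return "Positive" if p_pos > 50.0 else "Negative"


def summarize(rows):
    """Count rows whose label is fixed, broken, or unchanged by the enhanced model."""
    counts = {"fixed": 0, "broken": 0, "unchanged": 0}
    for _, category, base_pos, enhanced_pos in rows:
        before = label(base_pos) == category
        after = label(enhanced_pos) == category
        if after and not before:
            counts["fixed"] += 1
        elif before and not after:
            counts["broken"] += 1
        else:
            counts["unchanged"] += 1
    return counts


negative_summary = summarize(TABLE_6)
intensive_summary = summarize(TABLE_7)
```

Under this rule, all three examples in Table 6 are corrected by NM-CNN, whereas in Table 7 three labels stay the same and one correct label is lost. This is consistent with the observation that intensive words mostly shift the sentiment strength rather than the polarity.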

5 Conclusions

In this work, we proposed an effective model for sentiment classification. Without dropping any context information, the proposed model addresses the sentiment reversing effect of negative words and the sentiment shifting effect of intensive words. Experimental results validated the effectiveness of our model. In the future, we plan to introduce attention mechanisms to model the valence of every word in the sentence, including the negative and intensive words that change the sentiment of the sentence. Furthermore, we will apply a process similar to that used for negative and intensive words to conjunctions, which may shift the sentiment level of a sentence to some extent.

Acknowledgements

We are grateful to the anonymous reviewers for their valuable comments on this article. This research has been supported in part by the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E03/16), the Interdisciplinary Research Scheme of the Dean’s Research Fund 2018–19 (FLASS/DRF/IDS-3), Top-Up Fund (TFG-04) for General Research Fund/Early Career Scheme of the Dean’s Research Fund (DRF) 2018–19 and the Internal Research Grant (RG 90/2018–2019R) of The Education University of Hong Kong, the National Key R&D Program of China (2018YFB1004404), Key R&D Program of Guangdong Province (2018B010107005), and National Natural Science Foundation of China (U1711262, U1401256, U1501252, U1611264, U1711261). The preliminary version of this article has been published in APWeb-WAIM 2018 [32].

Compliance with Ethical Standards

Yes.

Conflict of interest

All authors declare that they have no conflict of interest.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1.

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283

2.

Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau R (2011) Sentiment analysis of twitter data. In: Proceedings of the workshop on languages in social media. Association for Computational Linguistics, pp 30–38

3.

Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC, vol 10, pp 2200–2204

4.

Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107

5.

Choi K, Joo D, Kim J (2017) Kapre: on-GPU audio preprocessing layers for a quick implementation of deep neural network models with Keras. arXiv:1706.05781

6.

Chung J, Gulcehre C, Cho KH, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555

7.

Dong L, Wei F, Tan C, Tang D, Zhou M, Ke X (2014) Adaptive recursive neural network for target-dependent twitter sentiment classification. In: ACL, vol 2, pp 49–54

8.

Guerini M, Gatti L, Turchi M (2013) Sentiment analysis: how to derive prior polarities from SentiWordNet. arXiv:1309.5843

9.

Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

10.

Hu M, Liu B (2004) Mining and summarizing customer reviews. In: SIGKDD, pp 168–177

11.

Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. arXiv:1404.2188

12.

Kim S-M, Hovy E (2004) Determining the sentiment of opinions. In: COLING, pp 1367–1373

13.

Kim Y (2014) Convolutional neural networks for sentence classification. arXiv:1408.5882

14.

Kiritchenko S, Mohammad S (2016) The effect of negators, modals, and degree adverbs on sentiment composition. In: Proceedings of the 7th workshop on computational approaches to subjectivity, sentiment and social media analysis. Association for Computational Linguistics, pp 43–52

15.

Lei T, Barzilay R, Jaakkola T (2015) Molding CNNs for text: non-linear, non-consecutive convolutions. arXiv:1508.04112

16.

Li J, Luong M-T, Jurafsky D, Hovy E (2015) When are tree structures necessary for deep learning of representations? arXiv:1503.00185

17.

Mikolov T (2012) Statistical language models based on neural networks. Presentation at Google, Mountain View, 2nd April

18.

Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781

19.

Mohammad SM, Kiritchenko S, Zhu X (2013) NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. arXiv:1308.6242

20.

Moilanen K, Pulman S (2007) Sentiment composition. In: RANLP, vol 7, pp 378–382

21.

Montague R (1974) Formal philosophy: selected papers of Richard Montague. Ed. and with an introd. by Richmond H. Thomason. Yale University Press, New Haven

22.

Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL, pp 115–124

23.

Ren Y, Zhang Y, Zhang M, Ji D (2016) Context-sensitive twitter sentiment classification using neural network. In: AAAI, pp 215–221

24.

Socher R, Pennington J, Huang EH, Ng AY, Manning CD (2011) Semi-supervised recursive autoencoders for predicting sentiment distributions. In: EMNLP, pp 151–161

25.

Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642

26.

Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307

27.

Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. arXiv:1503.00075

28.

Teng Z, Vo DT, Zhang Y (2016) Context-sensitive lexicon features for neural sentiment analysis. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 1629–1638

29.

Turney PD (2002) Thumbs up or thumbs down? semantic orientation applied to unsupervised classification of reviews. In: ACL, pp 417–424

30.

Vo D-T, Zhang Y (2015) Target-dependent twitter sentiment classification with rich automatic features. In: IJCAI, pp 1347–1353

31.

Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: EMNLP, pp 347–354

32.

Xu Z, Fu Y, Chen X, Rao Y, Xie H, Wang FL, Peng Y (2018) Sentiment classification via supplementary information modeling. In: APWeb-WAIM, pp 54–62

33.

Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv:1212.5701

34.

Zhu X, Sobhani P, Guo H (2015) Long short-term memory over recursive structures. In: ICML, pp 1604–1612

Title: Sentiment Classification Using Negative and Intensive Sentiment Supplement Information
Authors: Xingming Chen, Yanghui Rao, Haoran Xie, Fu Lee Wang, Yingchao Zhao, Jian Yin
Publication date: 15-06-2019
Publisher: Springer Berlin Heidelberg
Published in: Data Science and Engineering / Issue 2/2019
Print ISSN: 2364-1185
Electronic ISSN: 2364-1541
DOI: https://doi.org/10.1007/s41019-019-0094-8

Notation	Description
tw	The target word, i.e., a negative or intensive word
ntw	The number of target words
\(S=[x_{1}, x_{2}, \ldots , x_{n}]\)	A sentence with n words
\(x_{i} \in R^{d}\)	The d-dimensional word embedding of the ith word
\(V=[v_{1}, v_{2}, \ldots , v_{n+ntw}]\)	The vector representations generated by our method
\(F=[f_{1}, f_{2}, \ldots , f_{m}]\)	A hidden vector containing the sentiment features of a sentence
\(r_{\mathrm{avg}}\)	A vector containing the average information
\(f_{i}\)	The sentiment feature of the ith word
\(W \in R^{ws \times d}\)	A convolutional filter applied to continuous word embeddings
ssinfo	The sentiment supplement information extracted by the backward LSTM
ssvec	The sentiment supplement vector, generated by multiplying ssinfo by \(\lambda\)
\(s_{i}^{1}\)	The positive polarity strength of the ith sentence
\(s_{i}^{0}\)	The negative polarity strength of the ith sentence
\(C_{\mathrm{p}}\)	The predicted value of the positive label
\(C_{\mathrm{n}}\)	The predicted value of the negative label
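
To make the relationship between S, ssinfo, ssvec, and V in the notation above concrete, the following is a minimal NumPy sketch and not the authors' implementation. It rests on assumptions that the table does not state: the backward LSTM is replaced by a simple placeholder (the mean of the embeddings that follow the target word), each ssvec is inserted directly after its target word, and the function name `build_representations` and the argument `target_positions` are hypothetical. Only the identities ssvec = \(\lambda\) · ssinfo and |V| = n + ntw come from the table.

```python
import numpy as np


def build_representations(S, target_positions, lam):
    """Return V with one ssvec inserted after each target word (assumed placement).

    S: (n, d) array of word embeddings.
    target_positions: indices of the negative/intensive words (the tw entries).
    lam: scalar weight, the lambda of the notation table.
    """
    n, d = S.shape
    targets = set(target_positions)
    rows = []
    for i in range(n):
        rows.append(S[i])
        if i in targets:
            # Placeholder for the backward LSTM (assumption): average the
            # embeddings of the words following the target word, or use
            # zeros when the target word is the last word.
            following = S[i + 1:]
            ssinfo = following.mean(axis=0) if len(following) else np.zeros(d)
            rows.append(lam * ssinfo)  # ssvec = lambda * ssinfo
    return np.stack(rows)


rng = np.random.default_rng(0)
S = rng.normal(size=(5, 4))  # n = 5 words, d = 4
V = build_representations(S, [1], lam=-1.0)
print(V.shape)  # (6, 4), i.e. n + ntw rows
```

With a single target word at position 1, V has n + ntw = 6 rows, and the third row holds the supplement vector for that word. A negative `lam` is used here only to illustrate sign reversal; the values the model actually uses for \(\lambda\) are not given in this table.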
