1 Introduction
2 Related work
3 Data
We define a notion as an aspect category paired with a review (or sentence) in which it is mentioned. Each notion has a textual unit which contains the text of the review (or sentence). Our training data contains 335 reviews with 1435 review-level notion instances and 2455 sentence-level notion instances. The test data contains 90 reviews with 404 review-level notion instances and 859 sentence-level notion instances.

4 Method
We then extract notion instances, an aspect category paired with a review (or sentence) in which it is mentioned, from this preprocessed data.

4.1 Ontology
| Ontology concept | URL |
|---|---|
| AmbienceNegativeProperty | |
| AmbiencePositiveProperty | |
| GenericNegativeProperty | |
| GenericPositiveProperty | |
| ServiceNegativeProperty | |
| ServicePositiveProperty | |
| SustenanceNegativeProperty | |
| Meat | |
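To give an impression of how such an ontology can be queried, the following is a minimal sketch using rdflib; the file name and namespace are illustrative placeholders, not the actual ontology URLs listed above.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDFS

# Placeholder file name and namespace for illustration only.
g = Graph()
g.parse("restaurant.owl", format="xml")
ONT = Namespace("http://example.org/restaurant#")

# All superclasses of Meat, following rdfs:subClassOf transitively
# (the generator also yields the starting concept itself).
superclasses = set(g.transitive_objects(ONT.Meat, RDFS.subClassOf))
print(superclasses)
```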
4.2 Algorithms
For each notion, we create a new feature vector instance. Our second algorithm is a sentence aggregation algorithm and a more refined method for predicting the aspect sentiments in reviews. We once again use a linear multi-class SVM, though now with the classes positive, negative, and neutral. Contrary to the review-based algorithm, we predict the sentiment of aspects in a single sentence instead of a whole review. Using these predictions, we apply an aggregation step that sums the predicted polarities of each aspect per sentence. This step is shown in Eq. 1, where \(p_{a,r}\) is the expressed polarity of a given aspect a within a given review r, s is a sentence contained in review r, and \(p_{a,s}\) is the computed polarity of aspect a in sentence s:

\[ p_{a,r} = \sum_{s \in r} p_{a,s} \quad (1) \]

Thus, if a review has, for example, five sentences, three of which mention the FOOD#QUALITY aspect, we sum the predicted polarities of these three sentences. Note the difference between the neutral and conflicted cases: a zero sum can arise either because no sentence expresses sentiment (neutral) or because positive and negative sentences cancel out (conflicted).
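A minimal sketch of this aggregation step, assuming sentence-level polarities are encoded as +1 (positive), -1 (negative), and 0 (neutral); the mapping of the summed score onto a review-level label, including the conflicted case, is our own illustration of Eq. 1:

```python
from collections import defaultdict

def aggregate_review_polarity(sentence_predictions):
    """Sum the sentence-level polarities per aspect (Eq. 1).

    sentence_predictions: list of (aspect, polarity) pairs, one per
    sentence in which the aspect appears; polarity is +1, -1, or 0.
    Returns a dict mapping each aspect to a review-level label.
    """
    totals = defaultdict(int)            # p_{a,r} = sum over s of p_{a,s}
    saw_sentiment = defaultdict(bool)
    for aspect, polarity in sentence_predictions:
        totals[aspect] += polarity
        if polarity != 0:
            saw_sentiment[aspect] = True

    labels = {}
    for aspect, score in totals.items():
        if score > 0:
            labels[aspect] = "positive"
        elif score < 0:
            labels[aspect] = "negative"
        elif saw_sentiment[aspect]:
            # Positive and negative sentences cancel out: conflicted,
            # as opposed to neutral, where no sentiment was expressed.
            labels[aspect] = "conflict"
        else:
            labels[aspect] = "neutral"
    return labels

# Example: FOOD#QUALITY appears in three of a review's five sentences.
preds = [("FOOD#QUALITY", 1), ("FOOD#QUALITY", 1), ("FOOD#QUALITY", -1)]
print(aggregate_review_polarity(preds))   # {'FOOD#QUALITY': 'positive'}
```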
4.3 Model features

4.3.1 Feature generators
For each notion, we use its corresponding aspect category as a feature in the SVM model, using dummy variables.

A second generator computes counts over the textual unit, such as the number of positive and negative sentiment words and the number of sentences, and adds this value to the feature vector.

A third generator uses the words occurring in the dataset. For this feature, all words within the dataset are added to the SVM feature vector, and the instance value is set equal to one if the word appears in the textual unit of the current notion, and zero otherwise (cf. the bag-of-words model).

The last generator checks for each word in the textual unit whether it is a lexicalization of a concept in our ontology. If this is the case, we then find all superclasses of this class. If at least one of these superclasses is related to the current aspect category (e.g., SERVICE#GENERAL) of the aspect annotation, we set all features that correspond to these superclasses in the feature vector to one. By adding all the superclasses, we can make use of implicitly stated information.
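A condensed sketch of these generators, using a toy lexicalization table and superclass lookup in place of the real ontology; all helper names and arguments here are illustrative:

```python
def build_features(notion, vocabulary, categories, lexicalizations, superclasses_of):
    """Sketch of the feature generators for one notion (aspect category
    plus textual unit); helpers stand in for the dataset and ontology."""
    features = {}

    # 1. Aspect category as dummy variables.
    for category in categories:
        features["cat:" + category] = int(category == notion["category"])

    words = notion["text"].lower().split()

    # 2. Count-based features over the textual unit (cf. numSentences,
    #    numPositive, numNegative in the feature-weight table below).
    features["count:numWords"] = len(words)

    # 3. Binary bag-of-words over the whole dataset vocabulary.
    for word in vocabulary:
        features["word:" + word] = int(word in words)

    # 4. Ontology concepts: if a word lexicalizes a concept, activate the
    #    features of all its superclasses (the check that a superclass is
    #    related to the notion's aspect category is omitted here).
    for word in words:
        concept = lexicalizations.get(word)
        if concept is not None:
            for superclass in superclasses_of(concept):
                features["ont:" + superclass] = 1

    return features

# Toy usage with made-up inputs:
notion = {"category": "SERVICE#GENERAL", "text": "The waiter was rude"}
feats = build_features(
    notion,
    vocabulary={"waiter", "rude", "tasty"},
    categories=["SERVICE#GENERAL", "FOOD#QUALITY"],
    lexicalizations={"rude": "Rude"},
    superclasses_of=lambda c: {"ServiceNegativeProperty", "Negative"},
)
```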
4.3.2 Feature adaptors

Consider, for example, the word ‘starter’ appearing in the textual unit of a notion; however, ‘starter’ does not appear as a lexicalization in the ontology. We thus consider the synonyms of ‘starter.’ The word ‘starter’ has, among others, the synonyms ‘newcomer’ and ‘appetizer,’ yet only ‘appetizer’ appears in our ontology. Therefore, we select only the set of synonyms which contains at least one concept that is already in the ontology. For this to work, we assume that a word is used with only one meaning (the domain-related one) in our domain text.
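A sketch of this synonym lookup using WordNet via NLTK; ontology_lexicalizations is a placeholder for the set of words that lexicalize some concept in the ontology:

```python
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def ontology_synonyms(word, ontology_lexicalizations):
    """Keep only the synsets that contain at least one lemma already in
    the ontology, then return the lemmas of those synsets (relying on the
    'one meaning per domain' assumption)."""
    selected = set()
    for synset in wn.synsets(word):
        lemmas = {l.name().replace("_", " ") for l in synset.lemmas()}
        if lemmas & ontology_lexicalizations:
            selected |= lemmas
    return selected

# 'starter' keeps its food sense because 'appetizer' is in the ontology.
print(ontology_synonyms("starter", {"appetizer"}))
```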
Rather than using the complete textual unit of a notion to create features, we determine a set of word windows. The pseudocode describing this step can be found in Algorithm 1.
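Since Algorithm 1 is not reproduced here, the following is a minimal sketch of one plausible windowing step: it keeps the k tokens on either side of each relevant token and merges overlapping windows. Which token positions count as relevant is an assumption of this sketch.

```python
def word_windows(tokens, relevant_positions, k):
    """Collect the tokens within distance k of each relevant token,
    merging overlapping windows (cf. Algorithm 1)."""
    keep = set()
    for pos in relevant_positions:
        for i in range(max(0, pos - k), min(len(tokens), pos + k + 1)):
            keep.add(i)
    return [tokens[i] for i in sorted(keep)]

tokens = "Prices too high for this cramped and unappealing restaurant .".split()
# With k = 1 and 'Prices' (0) and 'restaurant' (8) as relevant positions:
print(word_windows(tokens, [0, 8], 1))
# ['Prices', 'too', 'unappealing', 'restaurant', '.']
```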
To illustrate the concept of a word window around words in the textual unit of a notion, consider the word ‘prices,’ which appears in the sentence ‘Prices too high for this cramped and unappealing restaurant.’ The word window surrounding ‘prices’ is [Prices, too, high, restaurant, .], where we, for this example, assume that k is equal to one.

4.4 Parameter optimization
| m \ k | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1.0 | 0.8182 | 0.8193 | 0.8114 | 0.8138 | 0.8188 |
| 2.0 | 0.8141 | 0.8192 | 0.8129* | 0.8096 | 0.8194 |
| 3.0 | 0.8220 | 0.8276 | 0.8123 | 0.8178 | 0.8112 |
| 4.0 | 0.8295 | 0.8115 | 0.8111 | 0.8121 | 0.8141 |
| 5.0 | **0.8320** | 0.8140 | 0.8141 | 0.8118 | 0.8145* |
| 6.0 | 0.8199 | 0.8154 | 0.8128 | 0.8202 | 0.8094 |
| 7.0 | 0.8236 | 0.8249 | 0.8178 | 0.8148 | 0.8211 |
| 8.0 | 0.8221 | 0.8140 | 0.8068 | 0.8144 | 0.8102 |
| 9.0 | 0.8153 | 0.8195 | 0.8141 | 0.8173 | 0.8146 |
| 10.0 | 0.8242 | 0.8162 | 0.8159 | 0.8222 | 0.8244 |
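The grid above can be reproduced with a cross-validated search. In this sketch we assume that m corresponds to the SVM cost parameter (C in scikit-learn) and that k is the word-window size, which changes the extracted feature matrix; extract_features is a placeholder.

```python
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def grid_search(instances, labels, extract_features, ms, ks):
    """Tenfold cross-validation over the (m, k) grid; extract_features
    is a placeholder that builds the feature matrix for a given k."""
    scores = {}
    for k in ks:
        X = extract_features(instances, k)  # k changes the feature set
        for m in ms:
            f1 = cross_val_score(LinearSVC(C=m), X, labels,
                                 cv=10, scoring="f1_weighted").mean()
            scores[(m, k)] = f1
    best = max(scores, key=scores.get)
    return best, scores

# best, table = grid_search(train_notions, train_labels, extract_features,
#                           ms=[float(m) for m in range(1, 11)],
#                           ks=range(1, 6))
```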
| Model | Avg. \(F_1\) (tenfold CV) | SD (tenfold CV) | p value | \(F_1\) (training data) | \(F_1\) (test data) |
|---|---|---|---|---|---|
| baseSL | 0.7008 | 0.0513 | – | 0.8436 | 0.7229 |
| ontSL | 0.8217 | 0.0500 | < 0.0001 | 0.8811 | 0.7963 |
4.5 Model evaluation
5 Evaluation
5.1 Performance
| Model | Avg. \(F_1\) (tenfold CV) | SD (tenfold CV) | p value | \(F_1\) (test data) |
|---|---|---|---|---|
| baseSA | 0.6897 | 0.0616 | – | 0.6824 |
| ontSA | 0.8130 | 0.0512 | < 0.0001 | 0.7717 |
| Gold value (upper bound) | | | | 0.9633 |
5.2 Data size sensitivity
| Model | Avg. \(F_1\) (tenfold CV) | SD (tenfold CV) | p value versus base | \(F_1\) (training set) | \(F_1\) (test set) |
|---|---|---|---|---|---|
| Base | 0.7852 | 0.0524 | – | 0.8718 | 0.8020 |
| Final | 0.8001 | 0.0506 | < 0.0001 | 0.8753 | 0.8119 |
| Team | (Un)Constrained | Accuracy |
|---|---|---|
| UWB | Unconstrained | 0.8193 |
| ECNU | Unconstrained | 0.8144 |
| *final* | Unconstrained | 0.8119 |
| UWB | Constrained | 0.8094 |
| ECNU | Constrained | 0.7871 |
| bunji | Unconstrained | 0.7055 |
| bunji | Constrained | 0.6658 |
| GTI | Unconstrained | 0.6411 |
5.3 SVM model comparison
| Kernel | Avg. \(F_1\) (tenfold CV) | SD (tenfold CV) | \(F_1\) (training data) | \(F_1\) (test data) |
|---|---|---|---|---|
| Linear kernel | 0.8001 | 0.0506 | 0.8753 | 0.8119 |
| RBF kernel | 0.7987 | 0.0473 | 0.8676 | 0.7921 |
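A sketch of this comparison with scikit-learn, assuming a feature matrix X and labels y built as in Sect. 4.3; the C value is a placeholder:

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def compare_kernels(X, y, C=1.0):
    """Tenfold cross-validation of a linear versus an RBF kernel."""
    for kernel in ("linear", "rbf"):
        scores = cross_val_score(SVC(kernel=kernel, C=C), X, y,
                                 cv=10, scoring="f1_weighted")
        print(f"{kernel}: avg F1 = {scores.mean():.4f}, SD = {scores.std():.4f}")
```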
To compare the multi-class model with binary ones, notions with sentiment label neutral or conflict were set to negative or positive. Table 8 shows that the multi-class model achieves a better accuracy on the test data than the binary models.

| Neutral predicted as | Conflict predicted as | \(F_1\) (training data) | \(F_1\) (test data) |
|---|---|---|---|
| Negative | Negative | 0.9080 | 0.7401 |
| Positive | Negative | 0.9136 | 0.7401 |
| Negative | Positive | 0.9171 | 0.7401 |
| Positive | Positive | 0.9185 | 0.7451 |
| Multi-class | | 0.8753 | 0.8119 |
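The four binary settings in the table amount to remapping the two minority labels before training; a small sketch, with the label strings as assumptions:

```python
def to_binary(labels, neutral_as="negative", conflict_as="negative"):
    """Map the neutral and conflict labels onto positive/negative, as done
    for the binary models; the multi-class model keeps all labels."""
    mapping = {"neutral": neutral_as, "conflict": conflict_as}
    return [mapping.get(label, label) for label in labels]

# First row of the table: both neutral and conflict become negative.
print(to_binary(["positive", "neutral", "conflict", "negative"]))
# ['positive', 'negative', 'negative', 'negative']
```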
| Weight | Feature |
|---|---|
| 0.2381 | Sentiment count: numNegative |
| 0.1159 | Ontology: Negative |
| 0.0821 | Lemma: ‘not’ |
| 0.0638 | Ontology: SustenanceNegativeProperty |
| 0.0557 | Sentiment count: numPositive |
| 0.0539 | Ontology: ServiceNegativeProperty |
| 0.0522 | Lemma: ‘do’ |
| 0.0517 | Sentence count: numSentences |
| 0.0515 | Ontology: GenericNegativeProperty |
| 0.0443 | Lemma: ‘horrible’ |
| Model | Avg. \(F_1\) (tenfold CV) | SD (tenfold CV) | \(F_1\) (training data) | \(F_1\) (test data) |
|---|---|---|---|---|
| *base* | 0.7852 | 0.0524 | 0.8718 | 0.8020 |
| +Weight | 0.7969 | 0.0446 | 0.8808 | 0.8020 |
| +Sentiment count | 0.7988 | 0.0459 | 0.8808 | 0.8045 |
| +Negation handling | 0.7992 | 0.0459 | 0.8808 | 0.8045 |
| *final* | 0.8001 | 0.0506 | 0.8753 | 0.8119 |
5.4 Feature analysis
Since most notions within the dataset are labeled as positive, features that expose the negativity of a textual unit are important for discriminating the remaining classes. Furthermore, we also calculate the internal attribute weights of the final SVM model. The 80 features with the largest weights are all ontology-related features, such as Negative, Boring, and Cozy. This emphasizes the added value of an ontology.
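The internal attribute weights can be read off a trained linear SVM; a sketch with scikit-learn, where feature_names is assumed to match the columns of the training matrix:

```python
import numpy as np

def top_features(clf, feature_names, n=10):
    """Rank features by their largest absolute weight across classes;
    clf is a fitted LinearSVC with coef_ of shape (n_classes, n_features)."""
    weights = np.abs(clf.coef_).max(axis=0)
    order = np.argsort(weights)[::-1][:n]
    return [(feature_names[i], float(weights[i])) for i in order]

# for name, weight in top_features(trained_svm, feature_names):
#     print(f"{weight:.4f}  {name}")
```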