Position-aware hierarchical transfer model for aspect-level sentiment classification
Introduction
Aspect-level sentiment classification (ASC) [1], [2], [3] is a fundamental branch of sentiment classification [4], [5], [6] that aims to predict the sentiment polarity (i.e., positive, neutral, or negative) of a review/comment toward a given aspect. As shown in Fig. 1, the sentence “granted the space is smaller than most, it is the best service you will find in even the largest of restaurants.” contains two aspects: “space” and “service”. The sentiment polarity for “service” is positive, whereas it is negative for “space,” since the reviewer complains about the small size.
One of the challenges in ASC is how to capture the information related to a given aspect. Most traditional methods rely on well-designed features, which are considerably laborious to engineer [7]. Recently, attention-based neural networks (NNs) have demonstrated good performance in capturing the salient parts of a sentence [8]. To highlight the aspect information, Ma et al. [9] and Wang et al. [10] incorporated the aspect representation into the attention mechanism for an improved sentence representation. Additionally, position information has been studied to emphasize aspect-related information [11], [12], [13]. Chen et al. [11] proposed a position-weighted memory, in which a word closer to the target obtains a higher weight in the memory slice. To locate more accurate sentiment indicators for a given aspect, Li et al. [13] adopted a proximity strategy that scales the input of the convolutional layer by the positional relevance between a word and an aspect. To the best of our knowledge, most existing work characterizes the aspect via attention or position modeling at the lower word level, whereas higher-level information such as segments is neglected. Furthermore, how to integrate multi-level positional information into attention to enhance the aspect-based sentence representation has not been well studied.
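The proximity idea behind such position-weighted schemes can be sketched as follows. This is a minimal illustration, assuming a simple linear decay by distance to the aspect span; the exact formulas used in [11] and [13] may differ.

```python
import numpy as np

def position_weights(n_words: int, aspect_start: int, aspect_end: int) -> np.ndarray:
    """Proximity weights in (0, 1]: words closer to the aspect span get a
    higher weight; words inside the span get weight 1 (linear decay assumed)."""
    weights = np.ones(n_words)
    for i in range(n_words):
        if i < aspect_start:
            weights[i] = 1.0 - (aspect_start - i) / n_words
        elif i > aspect_end:
            weights[i] = 1.0 - (i - aspect_end) / n_words
    return weights

# "granted the space is smaller ..." with the aspect "space" at index 2
w = position_weights(10, 2, 2)
# Scale each word vector by its proximity weight before attention/convolution
word_vectors = np.random.randn(10, 50)
scaled = w[:, None] * word_vectors
```

A word adjacent to the aspect thus keeps most of its signal (weight 0.9 here), while a word eight positions away is dampened to 0.2, steering the downstream layers toward aspect-relevant context.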
The other challenge is posed by the limited size of the benchmark datasets for ASC. Sentiment labeling for each aspect is a very laborious task; therefore, there are relatively few samples for ASC compared with sentence-level sentiment classification (SSC), which predicts the overall sentiment polarity of a sentence regardless of the aspect. Consequently, it is difficult for neural models to learn sufficient features for ASC. Transfer learning has achieved great success in natural language processing (NLP); therefore, several researchers have investigated how to transfer knowledge learned on large-scale datasets to boost classification performance on small datasets. For example, well-known word embeddings such as Word2Vec [14] and GloVe [15] have been widely used to boost the performance of various NLP tasks. To obtain more context-sensitive representations, several advanced models such as GPT [16], ELMo [17], and BERT [18] have been proposed recently. These models rely on very large corpora for pre-training and do not pay special attention to the aspects in different domains. The work most related to ours is the model proposed by He et al. [19], which transfers document-level knowledge to ASC. However, they only considered transferring knowledge in the shallow layers, and the position information was not well studied.
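The simplest form of this embedding-level transfer is initializing a model's embedding matrix from pre-trained GloVe-style vectors. The sketch below assumes the standard GloVe text format (one `word v1 ... vN` per line) and a toy vocabulary; out-of-vocabulary words keep a small random initialization.

```python
import numpy as np
import os
import tempfile

def load_glove(path: str, vocab: dict, dim: int) -> np.ndarray:
    """Fill an embedding matrix from a GloVe-format text file.
    Words missing from the file keep their random initialization."""
    rng = np.random.default_rng(0)
    emb = rng.normal(scale=0.1, size=(len(vocab), dim))
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in vocab:
                emb[vocab[parts[0]]] = np.asarray(parts[1:], dtype=np.float32)
    return emb

# Demo with a tiny fake GloVe file (hypothetical vectors, for illustration only)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("service 0.1 0.2 0.3\nspace 0.4 0.5 0.6\n")
    tmp_path = f.name
vocab = {"service": 0, "space": 1, "food": 2}
E = load_glove(tmp_path, vocab, dim=3)  # "food" stays randomly initialized
os.unlink(tmp_path)
```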
To address the aforementioned two challenges, we propose a position-aware hierarchical transfer (PAHT) model for ASC in this paper. We first split each sentence into several segments via rhetorical structure theory (RST) [20]. Subsequently, we propose a position-aware hierarchical network that models aspect-based positional attention at both the word and segment levels to obtain more salient information for a given aspect. To incorporate more aspect-related knowledge, we design three strategies for sampling from large-scale SSC datasets and pre-train our model on the resulting samples. For example, as shown in Fig. 1, we can learn related knowledge about the terms “best service,” “restaurant,” “place,” and “small” from the SSC data. The learned knowledge is then transferred at four levels: embedding, word, segment, and classifier. We perform experiments on four benchmark ASC datasets, Restaurants14-16 and Laptop14, and learn the related knowledge from large-scale SSC data such as Yelp 2014 and Amazon Electronics to boost the performance of ASC. The results show that the proposed model is comparable to the state-of-the-art approaches in ASC. Furthermore, the components of the proposed model, namely the position-aware hierarchical network and the hierarchical transfer, are demonstrated to be effective for ASC.
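The four-level transfer can be pictured as selectively copying pre-trained SSC parameters into the corresponding ASC layers. The sketch below uses a hypothetical parameter store with one weight matrix per level (the real model's layers and shapes are of course richer); it only illustrates the mechanism of choosing which levels to transfer.

```python
import numpy as np

def init_params(vocab=1000, emb=50, hid=64, n_cls=3, seed=0):
    """One weight matrix per transfer level (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    return {
        "embedding":  rng.normal(size=(vocab, emb)),
        "word":       rng.normal(size=(emb, hid)),   # word-level encoder
        "segment":    rng.normal(size=(hid, hid)),   # segment-level encoder
        "classifier": rng.normal(size=(hid, n_cls)),
    }

def transfer(ssc_params, asc_params, levels):
    """Copy pre-trained SSC weights into the ASC model for the chosen levels;
    the remaining levels keep their own initialization."""
    for level in levels:
        asc_params[level] = ssc_params[level].copy()
    return asc_params

ssc = init_params(seed=1)   # stand-in for parameters pre-trained on Yelp/Amazon SSC data
asc = init_params(seed=2)
asc = transfer(ssc, asc, ["embedding", "word", "segment", "classifier"])
```

Ablating the `levels` list (e.g., transferring only `["embedding"]`) is exactly the kind of per-level analysis reported later in Section 5.2.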
The main contributions of our work can be summarized as follows:
1. We propose a new PAHT model that captures aspect-related information with a position-aware hierarchical network on the benchmark ASC datasets, and performs hierarchical transfer to learn more related knowledge from resource-rich SSC datasets to improve the performance.
2. To the best of our knowledge, this is the first attempt to transfer knowledge from multiple levels for ASC. In particular, we explore the influence of each level (embedding, word, segment, and classifier), which sheds light on how to perform effective transfer in similar tasks.
3. We present an aspect-based positional attention for both words and segments, which can be naturally integrated into the position-aware hierarchical network to capture more salient information toward a given aspect.
4. We conduct elaborate analyses of the experimental results on four benchmark ASC datasets that provide an improved understanding of the effectiveness of our proposed model.
The rest of this paper is organized as follows. Section 2 presents an overview of the related work. Section 3 introduces the details of the proposed PAHT model. Section 4 describes the experimental setup details and Section 5 provides the experimental results and analyses. Finally, Section 6 concludes our work and presents ideas for future research.
Related work
Traditional machine learning methods and NNs are two popular techniques used in ASC [7], [21], [22], [23]. Machine learning methods with a supervised classifier rely highly on the quality of extensive hand-crafted features [7], [24]. In recent years, NNs have attracted increasing attention for use in ASC owing to the growing trend of reducing human intervention in such tasks. Here, we mainly review the work on NNs for ASC that is most related to our research. Furthermore, we introduce the relevant work
Proposed model
In this section, we present the details of the proposed PAHT model. First, in Section 3.1, we describe the preliminaries. Subsequently, in Section 3.2, we present the position-aware hierarchical network that captures aspect-specific information. Finally, in Section 3.3, we introduce how to learn related knowledge from the SSC data to boost the performance of ASC by performing hierarchical transfer. The differences between the proposed and the existing approaches are summarized as follows.
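At the heart of the network is attention that is conditioned on the aspect and modulated by position. The sketch below shows one plausible form, assuming an additive attention score over the concatenation of each hidden state with the aspect vector, scaled by proximity weights; the paper's exact parameterization may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def positional_attention(H, a, pos_w, W, v):
    """Attend over word states H (n x d), conditioned on the aspect vector
    a (d,), with scores scaled by proximity weights pos_w (n,)."""
    n = H.shape[0]
    # Additive score per word: v^T tanh(W [h_i; a])
    scores = np.array([v @ np.tanh(W @ np.concatenate([H[i], a])) for i in range(n)])
    alpha = softmax(scores * pos_w)   # position-aware attention distribution
    return alpha @ H, alpha           # aspect-based weighted representation

rng = np.random.default_rng(0)
n, d = 6, 8
H = rng.normal(size=(n, d))           # word-level hidden states
a = rng.normal(size=d)                # aspect representation
pos_w = np.linspace(1.0, 0.5, n)      # toy proximity weights
W = rng.normal(size=(d, 2 * d))
v = rng.normal(size=d)
r, alpha = positional_attention(H, a, pos_w, W, v)
```

The same routine can be reused one level up, with segment representations in place of word states, which is what makes the hierarchy uniform.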
Experimental setup
We first introduce the datasets and the evaluation metrics used in our experiments in Section 4.1. Furthermore, the details of the parameter settings and the baselines for comparison are provided in Sections 4.2 and 4.3, respectively.
Results and analyses
In this section, we report the experimental results and conduct extensive analyses. Specifically, we first compare against recent advances in ASC to investigate the effectiveness of the proposed PAHT model in Section 5.1. Subsequently, we study the influence of each transferred level in Section 5.2. Furthermore, we discuss the influence of the sampling and epoch numbers in Sections 5.3 and 5.4, respectively. Finally, we provide an intuitive understanding of why the proposed
Conclusions and future work
In this paper, we proposed a PAHT model for ASC. Specifically, we first presented a position-aware hierarchical network that embeds the aspect-based positional attention at the word and segment levels to capture the related information for a given aspect. To learn more domain knowledge for improving the performance of ASC, we incorporated hierarchical transfer to take full advantage of the related data obtained by utilizing three sampling strategies. Experimental results on the four
CRediT authorship contribution statement
Jie Zhou: Conceptualization, Data curation, Formal analysis, Writing - original draft. Qin Chen: Conceptualization, Formal analysis, Funding acquisition, Investigation, Writing - original draft. Jimmy Xiangji Huang: Conceptualization, Formal analysis, Investigation, Writing - original draft. Qinmin Vivian Hu: Conceptualization, Formal analysis, Investigation, Visualization, Writing - original draft. Liang He: Conceptualization, Formal analysis, Methodology, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
We deeply appreciate the anonymous reviewers and the associate editor for their valuable and high-quality comments that greatly helped improve this paper. This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, an NSERC CREATE award in ADERSIM, the York Research Chairs (YRC) program, and an ORF-RE (Ontario Research Fund-Research Excellence) award in BRAIN Alliance. This research is also supported by the National Natural Science
References (49)
- et al., Aspect based fine-grained sentiment analysis for online reviews, Inf. Sci. (2019)
- et al., Three-way enhanced convolutional neural networks for sentence-level sentiment classification, Inf. Sci. (2019)
- et al., Adaptive recursive neural network for target-dependent Twitter sentiment classification, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL) (Volume 2: Short Papers) (2014)
- et al., Survey on aspect-level sentiment analysis, IEEE Trans. Knowl. Data Eng. (2016)
- et al., Deep learning for aspect-level sentiment classification: survey, vision and challenges, IEEE Access (2019)
- et al., SNNN: promoting word sentiment and negation in neural sentiment classification, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018)
- et al., A co-memory network for multimodal sentiment analysis, Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (2018)
- et al., NRC-Canada-2014: detecting aspects and sentiment in customer reviews, Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (2014)
- et al., SAAN: a sentiment-aware attention network for sentiment analysis, Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (2018)
- et al., Interactive attention networks for aspect-level sentiment classification, Proceedings of International Joint Conferences on Artificial Intelligence (IJCAI) (2017)