Information Sciences

Volume 513, March 2020, Pages 1-16

Position-aware hierarchical transfer model for aspect-level sentiment classification

https://doi.org/10.1016/j.ins.2019.11.048

Abstract

Recently, attention-based neural networks (NNs) have been widely used for aspect-level sentiment classification (ASC). Most neural models focus on incorporating the aspect representation into attention; however, the position information of each aspect has not been studied well. Furthermore, the existing ASC datasets are relatively small owing to labor-intensive labeling, which largely limits the performance of NNs. In this paper, we propose a position-aware hierarchical transfer (PAHT) model that models the position information at multiple levels and enhances ASC performance by transferring hierarchical knowledge from a resource-rich sentence-level sentiment classification (SSC) dataset. We first present aspect-based positional attention at the word and segment levels to capture more salient information toward a given aspect. To make up for the limited data for ASC, we devise three sampling strategies to select related instances from the large-scale SSC dataset for pre-training, and transfer the learned knowledge into ASC at four levels: embedding, word, segment, and classifier. Extensive experiments on four benchmark datasets demonstrate that the proposed model is effective in improving the performance of ASC. In particular, our model outperforms the state-of-the-art approaches in terms of accuracy on all the datasets considered.

Introduction

Aspect-level sentiment classification (ASC) [1], [2], [3] is a fundamental branch of sentiment classification [4], [5], [6] that aims to predict the sentiment polarity (i.e., positive, neutral, or negative) of a review/comment toward a given aspect. As shown in Fig. 1, the sentence “granted the space is smaller than most, it is the best service you will find in even the largest of restaurants.” contains two aspects: “space” and “service”. The sentiment polarity for “service” is positive, whereas it is negative for “space,” since the reviewer expresses a negative sentiment about the small size.

One of the challenges in ASC is how to capture the related information for a given aspect. Most traditional methods rely on well-designed features, which are considerably laborious to engineer [7]. Recently, attention-based neural networks (NNs) have demonstrated good performance in capturing the salient parts of a sentence [8]. To highlight the aspect information, Ma et al. [9] and Wang et al. [10] incorporated the aspect representation into the attention mechanism for improved sentence representation. Additionally, position information has been studied to emphasize aspect-related information [11], [12], [13]. Chen et al. [11] proposed a position-weighted memory, where a word closer to the target obtains a higher weight in the memory slice. To locate more accurate sentiment indicators for a given aspect, Li et al. [13] adopted a proximity strategy to scale the input of the convolutional layer with the positional relevance between a word and an aspect. To the best of our knowledge, most of the existing work focuses on characterizing the aspect via attention or position modeling at the lower word level, whereas higher-level information such as segments is neglected. Furthermore, how to integrate multi-level positional information into attention to enhance the aspect-based sentence representation has not been studied well.
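To make the position-weighting idea concrete, the following is a minimal sketch of a linear proximity scheme in which a word's weight decays with its distance to the aspect span. It is illustrative only and does not reproduce the exact formulations of the cited models; the function name and the linear decay are assumptions.

```python
import numpy as np

def position_weights(seq_len: int, aspect_start: int, aspect_end: int) -> np.ndarray:
    """Weight each word by its proximity to the aspect span: words adjacent
    to the aspect get weights near 1, and distant words decay linearly toward 0
    (a common scheme in position-aware ASC models)."""
    weights = np.zeros(seq_len)
    for i in range(seq_len):
        if i < aspect_start:
            dist = aspect_start - i
        elif i > aspect_end:
            dist = i - aspect_end
        else:
            dist = 0  # words inside the aspect span get the maximum weight
        weights[i] = 1.0 - dist / seq_len
    return weights

# "granted the space is smaller ..." with aspect "space" at token index 2
w = position_weights(seq_len=12, aspect_start=2, aspect_end=2)
print(np.round(w, 2))  # e.g., "smaller" (index 4) gets weight 1 - 2/12 ~ 0.83
```

Such weights can then rescale word representations or attention scores so that context words near the aspect dominate the sentence representation.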

The other challenge is posed by the limited size of the benchmark datasets for ASC. Sentiment labeling for each aspect is a very laborious task; therefore, there are relatively few samples for ASC compared with sentence-level sentiment classification (SSC), which predicts the overall sentiment polarity of a sentence regardless of the aspect. Consequently, it is difficult for neural models to learn sufficient features for ASC. Transfer learning has achieved great success in natural language processing (NLP); therefore, several researchers have investigated how to transfer the knowledge learned on large-scale datasets to boost classification performance on small datasets. For example, well-known word embeddings such as Word2Vec [14] and GloVe [15] have been widely used to boost the performance of various NLP tasks. To obtain more context-sensitive representations, several advanced models such as GPT [16], ELMo [17], and BERT [18] have been proposed recently. These models rely on very large data for pre-training and do not pay special attention to the aspects in different domains. The work most related to ours is the model proposed by He et al. [19], which transfers document-level knowledge to ASC. However, they only considered transferring the knowledge in the shallow layers, and the position information was not studied well.
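To illustrate what transferring knowledge in the shallow layers means in practice, here is a hedged sketch: a sentence-level model is pre-trained, and only its embedding and encoder weights are copied into the aspect-level model, while the classifier head is re-initialized because the label spaces differ. The SentimentEncoder class, its dimensions, and the BiLSTM backbone are hypothetical and do not reproduce He et al.'s architecture.

```python
import torch.nn as nn

class SentimentEncoder(nn.Module):
    """Shared backbone: embedding + BiLSTM encoder + classifier head."""
    def __init__(self, vocab_size: int, emb_dim: int, hid_dim: int, n_classes: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hid_dim, n_classes)

# 1) Pre-train on the large sentence-level (SSC) corpus.
ssc_model = SentimentEncoder(vocab_size=30000, emb_dim=300, hid_dim=128, n_classes=2)
# ... SSC training loop omitted ...

# 2) Transfer the shallow layers into the aspect-level (ASC) model.
asc_model = SentimentEncoder(vocab_size=30000, emb_dim=300, hid_dim=128, n_classes=3)
asc_model.embedding.load_state_dict(ssc_model.embedding.state_dict())
asc_model.encoder.load_state_dict(ssc_model.encoder.state_dict())
# The classifier head stays randomly initialized, since SSC and ASC label spaces differ.
```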

To address the aforementioned two challenges, we propose a position-aware hierarchical transfer (PAHT) model for ASC in this paper. We first split each sentence into several segments via rhetorical structure theory (RST) [20]. Subsequently, we propose a position-aware hierarchical network that models the aspect-based positional attention at both the word and segment levels to obtain more salient information for a given aspect. To incorporate more aspect-related knowledge for boosting the performance, we design three strategies for sampling from the large-scale SSC datasets, and adapt our model to learn more knowledge by pre-training on the sampled instances. For example, as shown in Fig. 1, we can learn related knowledge about the terms “best service,” “restaurant,” “place,” and “small” from the SSC data. Afterwards, the learned knowledge is transferred at four levels: embedding, word, segment, and classifier. We perform experiments on four benchmark ASC datasets (Restaurants14-16 and Laptop14) and learn related knowledge from large-scale SSC data, such as Yelp 2014 and Amazon Electronics, to boost the performance of ASC. The results show that the proposed model is comparable to the state-of-the-art approaches in ASC. Furthermore, the components of the proposed model, namely the position-aware hierarchical network and the hierarchical transfer, are demonstrated to be effective for ASC.
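The two-level attention flow described above can be sketched as follows: word states are pooled within each RST segment using the aspect vector as the query, and the resulting segment vectors are pooled again at the segment level. This is a simplified sketch under stated assumptions; the actual PAHT model additionally injects positional weights at both levels and uses learned attention parameters rather than plain dot products, and the helper names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def aspect_attention(hidden, aspect_vec):
    """Dot-product attention with the aspect vector as the query.
    hidden: (n, d); aspect_vec: (d,). Returns a pooled (d,) vector."""
    scores = hidden @ aspect_vec        # (n,) relevance of each item to the aspect
    alpha = F.softmax(scores, dim=0)    # attention distribution over items
    return alpha @ hidden               # aspect-weighted sum

def hierarchical_sentence_vector(segments, aspect_vec):
    """segments: list of (n_words_j, d) tensors of word states,
    one tensor per RST segment of the sentence."""
    # Word level: pool each segment's words toward the aspect.
    seg_vecs = torch.stack([aspect_attention(seg, aspect_vec) for seg in segments])
    # Segment level: pool the segment vectors toward the aspect.
    return aspect_attention(seg_vecs, aspect_vec)

d = 64
segments = [torch.randn(5, d), torch.randn(8, d)]  # two RST segments
aspect_vec = torch.randn(d)
sent_vec = hierarchical_sentence_vector(segments, aspect_vec)  # (d,) fed to the classifier
```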

The main contributions of our work can be summarized as follows:

1. We propose a new PAHT model that captures aspect-related information with a position-aware hierarchical network on the benchmark ASC datasets, and performs hierarchical transfer to learn more related knowledge from the resource-rich SSC datasets to improve the performance.

2. To the best of our knowledge, this is the first attempt to transfer knowledge at multiple levels for ASC. In particular, we explore the influence of each level (embedding, word, segment, and classifier), which sheds light on how to perform effective transfer in similar tasks.

3. We present an aspect-based positional attention for both words and segments, which can be naturally integrated into the position-aware hierarchical network to capture more salient information toward a given aspect.

4. We conduct elaborate analyses of the experimental results on the four benchmark ASC datasets, which provide an improved understanding of the effectiveness of our proposed model.

The rest of this paper is organized as follows. Section 2 presents an overview of the related work. Section 3 introduces the details of the proposed PAHT model. Section 4 describes the experimental setup details and Section 5 provides the experimental results and analyses. Finally, Section 6 concludes our work and presents ideas for future research.

Section snippets

Related work

Traditional machine learning methods and NNs are two popular techniques used in ASC [7], [21], [22], [23]. Machine learning methods with a supervised classifier rely highly on the quality of extensive features [7], [24]. In recent years, however, NNs have attracted increasing attention for use in ASC owing to the growing trend of reducing human intervention in such tasks. Here, we mainly review the work on NNs for ASC that is most related to our research. Furthermore, we introduce the relevant work…

Proposed model

In this section, we present the details of the proposed PAHT model. First, in Section 3.1, we describe the preliminaries. Subsequently, in Section 3.2, we present a position-aware hierarchical network to capture the aspect-specific information. Finally, in Section 3.3, we introduce how to learn related knowledge from the SSC data to boost the performance of ASC by performing hierarchical transfer. The differences between the proposed and the existing approaches are summarized as follows…

Experimental setup

We first introduce the datasets and the evaluation metrics used in our experiments in Section 4.1. Furthermore, the details of the parameter settings and the baselines for comparison are provided in Sections 4.2 and 4.3, respectively.

Results and analyses

In this section, we report the experimental results and conduct extensive analyses. Specifically, we first compare against recent advances in ASC to investigate the effectiveness of the proposed PAHT model in Section 5.1. Subsequently, we study the influence of each transferred level in Section 5.2. Furthermore, we discuss the influence of the sampling and epoch numbers in Sections 5.3 and 5.4, respectively. Finally, we provide an intuitive understanding of why the proposed…

Conclusions and future work

In this paper, we proposed a PAHT model for ASC. Specifically, we first presented a position-aware hierarchical network that embeds the aspect-based positional attention at the word and segment levels to capture the related information for a given aspect. To learn more domain knowledge for improving the performance of ASC, we incorporated hierarchical transfer to take full advantage of the related data obtained with three sampling strategies. Experimental results on the four…

CRediT authorship contribution statement

Jie Zhou: Conceptualization, Data curation, Formal analysis, Writing - original draft. Qin Chen: Conceptualization, Formal analysis, Funding acquisition, Investigation, Writing - original draft. Jimmy Xiangji Huang: Conceptualization, Formal analysis, Investigation, Writing - original draft. Qinmin Vivian Hu: Conceptualization, Formal analysis, Investigation, Visualization, Writing - original draft. Liang He: Conceptualization, Formal analysis, Methodology, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We deeply appreciate the anonymous reviewers and the associate editor for their valuable and high-quality comments that greatly helped improve the quality of this paper. This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, an NSERC CREATE award in ADERSIM, the York Research Chairs (YRC) program, and an ORF-RE (Ontario Research Fund-Research Excellence) award in BRAIN Alliance. This research is also supported by the National Natural Science…

References

  • Y. Wang et al.

    Attention-based LSTM for aspect-level sentiment classification

    Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP)

    (2016)
  • P. Chen et al.

    Recurrent attention network on memory for aspect sentiment analysis

    Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)

    (2017)
  • S. Gu et al.

    A position-aware bidirectional attention network for aspect-level sentiment analysis

    Proceedings of the 27th International Conference on Computational Linguistics

    (2018)
  • X. Li et al.

    Transformation networks for target-oriented sentiment classification

    Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)

    (2018)
  • T. Mikolov et al.

    Distributed representations of words and phrases and their compositionality

    Advances in Neural Information Processing Systems

    (2013)
  • J. Pennington et al.

    GloVe: global vectors for word representation

    Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

    (2014)
  • A. Radford et al.

    Improving language understanding by generative pre-training

    (2018)
  • M.E. Peters et al.

    Deep contextualized word representations

    Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)

    (2018)
  • J. Devlin et al.

    BERT: pre-training of deep bidirectional transformers for language understanding

    Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics

    (2019)
  • R. He et al.

    Exploiting document knowledge for aspect-level sentiment classification

    Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)

    (2018)
  • M. Surdeanu et al.

    Two practical rhetorical structure theory parsers

    Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT): Software Demonstrations

    (2015)
  • R.K. Amplayo et al.

    Incorporating product description to sentiment topic models for improved aspect-based sentiment analysis

    Inf. Sci.

    (2018)
  • T.H. Nguyen et al.

    PhraseRNN: phrase recursive neural network for aspect-based sentiment analysis

    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)

    (2015)
  • B. Pang et al.

    Thumbs up?: Sentiment classification using machine learning techniques

    Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing-Volume 10

    (2002)