
Open Access 26.04.2024 | Original Article

A syntactic features and interactive learning model for aspect-based sentiment analysis

Authors: Wang Zou, Wubo Zhang, Zhuofeng Tian, Wenhuan Wu

Published in: Complex & Intelligent Systems


Abstract

Aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction (AE) and aspect term sentiment classification (ASC). Previous research on the AE task has not adequately leveraged syntactic information and has overlooked the issue of multi-word aspect terms in text. Current researchers tend to focus on one of the two subtasks, neglecting the connection between the AE and ASC tasks. Moreover, error propagation easily occurs between two independent subtasks when performing the complete ABSA task. To address these issues, we present a unified ABSA model based on syntactic features and interactive learning, called the syntactic interactive learning based aspect term sentiment classification model (SIASC). To overcome the problem of extracting multi-word aspect terms, the model utilizes part-of-speech features, word features, and dependency features as textual information. Meanwhile, we design a unified ABSA structure based on an end-to-end framework, reducing the impact of error propagation. Interactive learning in the model establishes a connection between the AE task and the ASC task, and the information it provides contributes to improving the model's performance on the ASC task. We conducted an extensive array of experiments on the Laptop14, Restaurant14, and Twitter datasets. The experimental results show that the SIASC model achieved average accuracy of 84.11%, 86.65%, and 78.42% on the AE task, and average accuracy of 81.35%, 86.71%, and 76.56% on the ASC task, respectively. The SIASC model demonstrates superior performance compared to the baseline models.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

With the rapid development of e-commerce, online shopping has gradually become part of people's daily lives. Web service platforms such as Amazon, eBay, and Twitter generate large amounts of comment data every day. In this context, there is a growing need for technologies capable of processing vast amounts of textual data. Aspect-based sentiment analysis (ABSA) [1, 2] is not only a significant topic in data mining (DM) [3] but also a crucial task in natural language processing (NLP) [4]. ABSA identifies aspect terms in sentences and analyzes their sentiment polarity, and numerous scholars have conducted extensive research in this field. For instance, consider a restaurant review: "The pasta at this restaurant is delicious, but the service is poor". ABSA techniques can extract the aspect terms and analyze their sentiment, yielding "pasta, positive" and "service, negative".
Researchers have categorized ABSA technology into two main tasks: aspect term extraction (AE) [5] and aspect sentiment classification (ASC) [6]. Currently, the majority of research focuses on one of these two subtasks, overlooking the interrelation between them. Some researchers have proposed pipeline approaches [7] to handle the two subtasks sequentially. However, due to error propagation, the performance of the AE task directly influences the accuracy of the ASC task. Building an end-to-end unified model helps mitigate error propagation between multiple tasks. At the same time, multi-word aspect terms occur frequently in textual data, such as "battery life", "wireless mice", "salad dressings" and "foie gras saute". These terms consist of multiple words, and accurately recognizing their boundaries is a challenge that affects the accuracy of the AE task. Current work on multi-word aspect terms tends to focus on the dependency features and word features of sentences, neglecting the role of part-of-speech features, even though syntactic information is advantageous for identifying the boundaries of multi-word aspect terms. To address the above issues, we propose a unified model for ABSA called syntactic interactive learning for aspect-based sentiment classification (SIASC). The SIASC model is structured as an end-to-end framework that reduces the effects of error propagation on both subtasks. To tackle the challenge of extracting multi-word aspect terms, we leverage textual information by incorporating part-of-speech features [8], word features, and dependency features [9] as inputs to the model. Adequate learning of syntactic knowledge enables the model to identify the boundaries of multi-word aspect terms more accurately. To enhance sentiment analysis performance, we introduce interactive learning [10] and a technique known as local context focus on syntax (LCFS) [11] into the model. Interactive learning helps the model capture the connection between the AE task and the ASC task, and the LCFS technique handles irrelevant terms around aspect terms, thereby improving aspect sentiment classification accuracy.
In summary, this paper makes the following main contributions:
1.
We design a unified ABSA structure based on an end-to-end framework, which reduces the impact of error propagation between the AE and ASC tasks.
 
2.
We fully utilize textual feature information by incorporating part-of-speech features, word features, and dependency features. These features enable the model to accurately extract multi-word aspect terms.
 
3.
We employ interactive learning to help the model capture the relationship between the AE and ASC tasks. Additionally, the LCFS technique effectively processes irrelevant words in the context of aspect terms, thereby improving sentiment classification accuracy.
 
4.
We conducted experiments on the Laptop14, Restaurant14, and Twitter datasets. The results demonstrate that the SIASC model outperforms the baseline model.
 
Related work

In this section, we review previous research related to aspect-based sentiment analysis in three areas: BERT and attention, syntactic knowledge and GNNs, and pipeline and end-to-end frameworks.

BERT and attention

Vaswani et al. [16] proposed the attention mechanism, which assigns a weight to each word in the text. Attention focuses on the key words, effectively enhancing model performance, and researchers have therefore used it widely in ABSA tasks. For instance, Wang et al. [12] proposed an attention-based long short-term memory (LSTM) [13] model. Ma et al. [10] proposed the interactive attention networks model to generate representations of aspect term and context simultaneously. Fan et al. [14] proposed a fine-grained attention neural network model to achieve word-level interactions between aspect and context. Building on attention and the Transformer [16], Devlin et al. [15] proposed BERT, a pre-trained bidirectional encoder representation from Transformers. Owing to the significant advancements in text representation achieved by BERT, it has been widely adopted by numerous researchers. Dai and Song [17] proposed a neural network model based on BERT and mined rules. Song et al. [18] proposed a model based on BERT and an attention encoder network; the attention-based encoder models the relation between context and aspect term, avoiding the difficulty of long-term memorization caused by recursion. Yang and Zeng [19] proposed a local context focus sentiment classification mechanism based on BERT, which uses context feature dynamic masking and context feature dynamic weighting to focus on local context words. However, these studies do not fully leverage syntactic features, leading to difficulties in accurately extracting multi-word aspect terms from the text.

Syntactic knowledge and GNN

Syntactic knowledge assists the model in better understanding the syntactic correlations between words in the text, and dependency parsing provides an effective way to represent these relations. The graph neural network (GNN) [20] can take the result of dependency parsing as input in the form of a directed graph: words are nodes, syntactic relations are directed edges, and a neural network learns syntactic features between the words. To leverage syntactic knowledge, some scholars have employed GNNs for the ABSA task. Zhang et al. [21] proposed an aspect-specific graph convolutional network model, which acquires syntactic information and word dependencies from the dependency parse tree of a sentence and employs a graph convolutional network to further learn syntactic relations. Huang and Carley [22] proposed a target-dependent graph attention network model that utilizes the dependency relations among words. Li et al. [23] proposed a dual-graph convolutional network model to address inaccurate dependency parsing results: a syntax graph convolutional network (GCN) [24] minimizes errors in dependency parsing while a semantic GCN captures semantic relevance. Additionally, an orthogonal regularizer mitigates the impact of text noise on the attention mechanism, and a differential regularizer captures the difference between the features learned by the two GCNs. However, these studies focus on one of the two subtasks of ABSA, ignoring the connection between the AE task and the ASC task.

Pipeline and end-to-end

To model all subtasks of ABSA, researchers proposed the pipeline framework, which builds a separate model for each subtask and passes each model's output to the next. Shang et al. [25] designed a pipeline-based model that captures the associations between targets and sentiment cues to enhance the overall performance of targeted sentiment analysis. Wu et al. [26] designed an effective grid tagging scheme inference strategy that exploits the mutual indication between different opinion factors to achieve more accurate extraction. However, the pipeline approach suffers from the error propagation problem: the accuracy of the AE task affects the performance of the ASC task.
Some researchers have proposed end-to-end frameworks to reduce the effects of error propagation. For example, Xu et al. [27] proposed an end-to-end model with a novel position-aware tagging scheme that jointly extracts triplets of aspect term, sentiment polarity, and opinion. Zhang et al. [28] proposed a multi-task encoder-decoder framework to jointly extract aspect terms and opinion terms while analyzing the sentiment dependencies between them using a BiAffine [29] scorer. Peng et al. [30] proposed a two-stage framework for the aspect-opinion-sentiment triplet extraction task. Li et al. [31] proposed a novel unified model that adopts a unified labeling scheme to address the complete task.
Compared to the pipeline approach, the end-to-end framework reduces the impact of error propagation, and abundant syntactic information enables a model to recognize multi-word aspect terms. Therefore, this paper proposes a unified ABSA model based on the end-to-end framework. Additionally, the model uses interactive learning to establish a connection between the AE task and the ASC task.

Proposed method

The overall structure of the SIASC model for aspect-based sentiment analysis is shown in Fig. 1. The model consists of three important parts: syntactic features representation, LCFS processing, and interactive learning. The syntactic feature representation part transforms the input sentence into part-of-speech features, word features, and dependency features using the BiAffine model [32]; these three features are then concatenated to form the syntactic features of the sentence. The LCFS processing module handles the context words of aspect terms in a sentence using the context feature dynamic mask (CDM) or context feature dynamic weighted (CDW) approach, attenuating the influence of irrelevant words on sentiment classification. The interactive learning part uses the multi-head attention mechanism [33] to learn interactively between the LCFS-processed feature vectors and the syntactic feature vectors. Additionally, the SIASC model determines the number of final units in the fully connected layer based on the number of output aspect terms. When a sentence contains multiple aspect terms, they are combined into (aspect term, sentiment) pairs according to the order in which the model outputs aspect terms and sentiment polarities. Given an input sentence X = {\({x}_1, {x}_2, {x}_3,..., {x}_{n}\)} and its corresponding labels y, where n is the number of words: for the AE task, the label y belongs to the set {B, I, O} (Begin, Inside, Outside), where B marks the beginning of an aspect term, I marks the interior of an aspect term, and O marks a non-aspect word; for the ASC task, the label y belongs to the set {Positive, Neutral, Negative}. A small illustration of this labeling and pairing scheme is sketched below.
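The following illustrative Python snippet shows how the BIO tags and sentiment labels described above could be combined into (aspect term, sentiment) pairs for the restaurant example from the introduction. The pairing helper is a hypothetical utility written for illustration; it is not part of the released SIASC code.

```python
# Illustrative only: BIO tagging and (aspect term, sentiment) pairing for one review.
def pair_aspects(tokens, bio_tags, sentiments):
    """Group B/I tags into aspect-term spans and zip them with sentiments in output order."""
    spans, current = [], []
    for tok, tag in zip(tokens, bio_tags):
        if tag == "B":                      # a new aspect term starts
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:        # continuation of the current aspect term
            current.append(tok)
        else:                               # O tag closes any open span
            if current:
                spans.append(" ".join(current))
                current = []
    if current:
        spans.append(" ".join(current))
    return list(zip(spans, sentiments))

tokens = ["The", "pasta", "is", "delicious", "but", "the", "service", "is", "poor"]
bio    = ["O",   "B",     "O",  "O",         "O",   "O",   "B",       "O",  "O"]
print(pair_aspects(tokens, bio, ["Positive", "Negative"]))
# [('pasta', 'Positive'), ('service', 'Negative')]
```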

Syntactic features representation

The details of the syntactic features representation are shown in Fig. 2. In this part, the BiAffine model [33] is used to convert sentence information into part-of-speech features, word features, and dependency features. The BiAffine model is a syntactic parsing tool developed by the Stanford NLP Group. It employs a BiLSTM and two multi-layer perceptron (MLP) layers to stack the obtained hidden vectors, and then uses a bi-affine transformation to compute the dependency relationships between any two words in the sentence. Given a sentence \({X=\{x_1, x_2, x_3,..., x_n\}}\), the BiAffine model performs part-of-speech tagging on each word, resulting in pairs (\({x_i}\), \({p_i}\)), where \({p_i}\) belongs to the part-of-speech set P. Simultaneously, it computes the dependency relationships between words, resulting in triples (\({x_i, d_i, x_j}\)), where \({d_i}\) belongs to the dependency relation set D. The computational procedure for sentence analysis using the BiAffine model is as follows.
$$\begin{aligned} ({x_i},{p_i})= & {} BiAffine({x_i}) \end{aligned}$$
(1)
$$\begin{aligned} ({x_i},{d_i},{x_j})= & {} BiAffine({x_i}) \end{aligned}$$
(2)
where \({i,j \in [1,n]}\) and \({i\ne j}\); n is the number of words and BiAffine denotes the BiAffine model. A small parsing sketch illustrating Eqs. (1) and (2) is given below.
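As a concrete illustration of Eqs. (1) and (2), the sketch below obtains (word, part-of-speech) pairs and (dependent, relation, head) triples for a sentence. Stanza is used here only as a stand-in for the Stanford BiAffine parser; the parser interface used in the SIASC implementation may differ.

```python
# A minimal parsing sketch for Eqs. (1)-(2), using Stanza as a stand-in parser.
import stanza

# stanza.download("en")  # required once to fetch the English models
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma,depparse")
doc = nlp("The gourmet food is delicious but the service is poor.")

for sent in doc.sentences:
    pos_pairs = [(w.text, w.upos) for w in sent.words]            # (x_i, p_i)
    dep_triples = [
        (w.text,                                                  # dependent x_i
         w.deprel,                                                # relation d_i
         sent.words[w.head - 1].text if w.head > 0 else "ROOT")   # head x_j
        for w in sent.words
    ]
    print(pos_pairs)
    print(dep_triples)
```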
Next, the BERT model is utilized to embed part-of-speech features, word features, and dependency features individually. The BERT model can transform these three types of feature information into vectors. Finally, the three feature vectors are concatenated, and one-dimensional max-pooling is applied to process them.
$$\begin{aligned} {v_i}^p= & {} BERT(({x_i},{p_i})) \end{aligned}$$
(3)
$$\begin{aligned} v_i^w= & {} BERT({x_i}) \end{aligned}$$
(4)
$$\begin{aligned} v_i^d= & {} BERT(({x_i},{d_i},{x_j})) \end{aligned}$$
(5)
$$\begin{aligned} v_i^t= & {} Concat(v_i^p,v_i^w,v_i^d) \end{aligned}$$
(6)
$$\begin{aligned} v_i^s= & {} MaxPool1d(3,v_i^t) \end{aligned}$$
(7)
where \({v_i^p,v_i^w,v_i^d \in {{\mathbb {R}}^{1 \times 768}}}\) respectively represent the part-of-speech feature vector, word feature vector, and dependency feature vector, with 768 being the fixed dimension of the BERT model output. \({v_i^t \in {{\mathbb {R}}^{1 \times 2304}}}\) is the vector obtained by concatenating the three vectors, where the dimensionality of 2304 results from concatenating three 768-dimensional vectors. Concat indicates the concatenation operation. \({v_i^s \in {{\mathbb {R}}^{1 \times 768}}}\) is the result after pooling. MaxPool1d is one-dimensional max pooling with a pooling window of size 3. A minimal sketch of this encoding step is given below.
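A minimal PyTorch sketch of Eqs. (3)-(7) is shown below, assuming each feature view is serialized as text before being fed to BERT; the exact serialization used by SIASC is not specified here, so the feature strings are illustrative.

```python
# A minimal sketch of Eqs. (3)-(7): encode three feature views with BERT,
# concatenate them, and apply one-dimensional max pooling with a window of 3.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
pool = nn.MaxPool1d(kernel_size=3)                 # Eq. (7)

def encode(text):
    """Return the [CLS] vector (1 x 768) for a serialized feature string."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return bert(**inputs).last_hidden_state[:, 0, :]

v_p = encode("food NOUN")                          # part-of-speech view (x_i, p_i)
v_w = encode("food")                               # word view x_i
v_d = encode("food nsubj delicious")               # dependency view (x_i, d_i, x_j)

v_t = torch.cat([v_p, v_w, v_d], dim=-1)           # (1, 2304), Eq. (6)
v_s = pool(v_t.unsqueeze(1)).squeeze(1)            # (1, 768),  Eq. (7)
print(v_t.shape, v_s.shape)
```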

LCFS processing

The local context focus on syntax (LCFS) mechanism was developed by Phan et al. [11] for the ABSA task. This method improves on the local context focus (LCF) mechanism proposed by Zeng et al. [19]. LCFS uses the semantic relative distance (SRD) [34] to decide how contextual words are processed, with two processing methods: context feature dynamic mask (CDM) and context feature dynamic weighted (CDW). The SRD measures the relationship between the target word and the contextual words. Phan et al. introduced an approach that uses the distance between nodes in the dependency syntax tree [35] as the SRD, and it also accounts for multi-word aspect terms: when the aspect term comprises multiple words, the SRD is computed as the average distance between the context word and each constituent word. However, the SRD values computed in this way are affected by sentence noise and by the accuracy of the dependency parser. For example, Fig. 3 uses the distances between nodes in the syntactic tree; when the threshold is set to 2, the word "delicious" is retained while the word "service" is processed. We propose using the dot product of the node distance matrix and the attention score matrix as the SRD values. The introduction of the attention score matrix helps mitigate the impact of sentence noise and syntactic parsing errors on LCFS, facilitating the capture of key nodes at longer distances. Since a sentence may contain multiple aspect words, we employ a multi-head attention mechanism with four heads when computing the attention weight matrix. The SRD between the multi-word aspect term "gourmet food" and the context word "delicious" is calculated as follows.
$$\begin{aligned} SRD(\mathrm{gourmet},\mathrm{delicious})&= 2\\ SRD(\mathrm{food},\mathrm{delicious})&= 1\\ SRD(\mathrm{gourmet\ food},\mathrm{delicious})&= 1.5 \end{aligned}$$
Context-feature dynamic mask (CDM)
The principle of the CDM method is shown in Fig. 3. The context words with smaller SRD values will be preserved and context words with larger SRD values will be masked. The computation process using the CDM method is as follows.
$$\begin{aligned}{} & {} {h^a} = MHA({V^B}) \end{aligned}$$
(8)
$$\begin{aligned}{} & {} SRD = {h^T} \cdot {h^a} \end{aligned}$$
(9)
$$\begin{aligned}{} & {} \alpha = x \cdot mean({h^a}) \end{aligned}$$
(10)
$$\begin{aligned} v_i^m = \begin{cases} O, & SRD_i > \alpha \\ I, & SRD_i \le \alpha \end{cases} \end{aligned}$$
(11)
$$\begin{aligned}{} & {} {M_1} = [v_1^m,v_2^m,...,v_n^m] \end{aligned}$$
(12)
$$\begin{aligned}{} & {} {V^{CDM}} = {M_1} \cdot {V^L} \end{aligned}$$
(13)
Where MHA is the multi-head attention mechanism, \({h^a} \in {{\mathbb {R}}^{n \times n}}\) is the attention score matrix and \({h^T} \in {{\mathbb {R}}^{n \times n}}\) is the distance matrix between nodes. x is the input value and mean denotes the mean computation; the row means of \({h^a}\) are identical, so we take the first one. O is the zero vector and I is the one vector. \(\alpha \) is the SRD threshold. \({v_i^m} \in {{\mathbb {R}}^n}\) stores the values between the i-th word and the context words. \({M_1} \in {{\mathbb {R}}^{n \times n}}\) is the masking matrix, whose entries are 0 or 1. n is the number of words in the sentence. \({V^L}\) denotes the word vector matrix output by BERT. A minimal sketch of the CDM step is given below.
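A minimal sketch of the CDM step follows. It assumes the per-token SRD values for the current aspect term have already been computed, and it applies the mask element-wise rather than through the explicit matrix M1 of Eq. (13); both forms zero out the same token vectors.

```python
# A minimal sketch of CDM masking (Eqs. (11)-(13)): token vectors whose SRD
# value exceeds the threshold alpha are zeroed out.
import torch

def cdm(hidden, srd, alpha):
    """hidden: (n, h) BERT output V^L; srd: (n,) SRD values; alpha: threshold."""
    keep = (srd <= alpha).float().unsqueeze(-1)    # 1 keeps a token, 0 masks it
    return hidden * keep                           # V^CDM

hidden = torch.randn(9, 768)
srd = torch.tensor([5., 4., 3., 1., 0., 1., 2., 6., 7.])
out = cdm(hidden, srd, alpha=3.0)
print(out.abs().sum(dim=-1))                       # masked positions sum to zero
```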

Context feature dynamic weighted (CDW)

The principle of the CDW method is shown in Fig. 4. Context words with larger SRD values are down-weighted rather than masked.
$$\begin{aligned} v_i^w = \begin{cases} \left( 1 - \frac{SRD_i - \alpha }{N}\right) \cdot I, & SRD_i > \alpha \\ I, & SRD_i \le \alpha \end{cases} \end{aligned}$$
(14)
$$\begin{aligned}{} & {} {M_2} = [v_1^w,v_2^w,...,v_n^w] \end{aligned}$$
(15)
$$\begin{aligned}{} & {} {V^{CDW}} = {M_2} \cdot {V^L} \end{aligned}$$
(16)
Where I is the one vector and \(\alpha \) is the SRD threshold. \(v_i^w \in {{\mathbb {R}}^n}\) stores the values between the i-th word and the context words. \({M_2} \in {{\mathbb {R}}^{n \times n}}\) is the weighting matrix, whose entries are 1 or decimals between 0 and 1. n is the number of words in the sentence. \({V^{CDW}}\in {{\mathbb {R}}^{n \times h}}\) denotes the word vector matrix output by LCFS (CDW). A minimal sketch of the CDW step is given below.
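A minimal sketch of the CDW step is shown below, following the same simplification as the CDM sketch; clamping the decayed weight at zero is a design choice of this sketch rather than something stated in Eq. (14).

```python
# A minimal sketch of CDW weighting (Eqs. (14)-(16)): context tokens whose SRD
# value exceeds the threshold are down-weighted instead of masked.
import torch

def cdw(hidden, srd, alpha):
    """hidden: (n, h) word vectors V^L; srd: (n,) SRD values; alpha: threshold."""
    n = hidden.size(0)
    weights = torch.where(
        srd > alpha,
        1.0 - (srd - alpha) / n,        # linearly decayed weight
        torch.ones_like(srd),           # tokens within the threshold keep weight 1
    ).clamp(min=0.0).unsqueeze(-1)
    return hidden * weights             # V^CDW

hidden = torch.randn(9, 768)
srd = torch.tensor([5., 4., 3., 1., 0., 1., 2., 6., 7.])
print(cdw(hidden, srd, alpha=3.0).shape)
```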

Interactive learning

The details of interactive learning in the model are shown in Fig. 5. Interactive learning is based on the attention mechanism and uses pooled hidden vectors as query vectors. Interacting through two different hidden vectors allows the model to capture the relevant relationships between them, and assigning appropriate weights to the context of aspect terms helps the model discriminate sentiment polarity accurately. For improved performance, we use the multi-head attention mechanism. First, a BiLSTM layer learns long-range dependencies between the feature vectors. Then, the multi-head attention mechanism interactively learns features from both types of hidden vectors. Finally, the two obtained vectors are concatenated. The computational process for interactive learning is as follows.
$$\begin{aligned}{} & {} h_i^g = BiLSTM(v_i^s) \end{aligned}$$
(17)
$$\begin{aligned}{} & {} {G_{avg}} = \sum \limits _{i = 1}^m {h_i^g/m} \end{aligned}$$
(18)
$$\begin{aligned}{} & {} h_i^l = BiLSTM({v_i^w}) \end{aligned}$$
(19)
$$\begin{aligned}{} & {} {L_{avg}} = \sum \limits _{i = 1}^n {h_i^l/n} \end{aligned}$$
(20)
Where \({v_i^s \in {{\mathbb {R}}^h}}\) is the vector produced by the syntactic features representation and \({v_i^w \in {{\mathbb {R}}^h}}\) is the vector produced by the LCFS layer. \({i \in [1,n]}\) indexes the words and h is the vector dimension of 768. \({h^l} \in {{\mathbb {R}}^{n \times 2h}}\) and \({h^g} \in {{\mathbb {R}}^{m \times 2h}}\) are the hidden layer vectors, where h is the dimension of the unit output vector in the BiLSTM model. \({L_{avg}} \in {{\mathbb {R}}^{2h}}\) and \({G_{avg}} \in {{\mathbb {R}}^{2h}}\) are the hidden layer vectors after average pooling. A minimal sketch of this step is given below.
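A minimal PyTorch sketch of Eqs. (17)-(20) is given below. It uses a single shared BiLSTM with 128 hidden units (the value listed in Table 2) purely for brevity; the model may well use separate BiLSTMs for the two feature sequences.

```python
# A minimal sketch of Eqs. (17)-(20): run both feature sequences through a
# BiLSTM and average-pool over the time dimension to obtain G_avg and L_avg.
import torch
import torch.nn as nn

h = 768
bilstm = nn.LSTM(input_size=h, hidden_size=128, bidirectional=True, batch_first=True)

v_s = torch.randn(1, 12, h)     # syntactic feature vectors (batch, m, h)
v_w = torch.randn(1, 9, h)      # LCFS-processed vectors    (batch, n, h)

h_g, _ = bilstm(v_s)            # (1, m, 256)
h_l, _ = bilstm(v_w)            # (1, n, 256)

G_avg = h_g.mean(dim=1)         # Eq. (18)
L_avg = h_l.mean(dim=1)         # Eq. (20)
print(G_avg.shape, L_avg.shape)
```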
Then, the pooled vector \({G_{avg}}\) is used to calculate attention scores with another feature vector matrix \({h^l}\). The calculation is as follows:
$$\begin{aligned} {a_i} = \frac{{\exp (\varepsilon (h_i^l,{G_{avg}}))}}{{\sum \nolimits _{j = 1}^n {\exp (\varepsilon (h_j^l,{G_{avg}}))} }} \end{aligned}$$
(21)
Where \({\epsilon }\) is the score function that calculates the importance of \({h^l_i}\) in the context. The score function \({\epsilon }\) is defined as:
$$\begin{aligned} \varepsilon (h_i^l,{G_{avg}}) = \tanh (h_i^l \cdot {W_1} \cdot G_{avg}^T + {b_1}) \end{aligned}$$
(22)
Where \({W_1} \in {{\mathbb {R}}^{2d \times 2d}}\) is the weight matrix and \({b_1}\) is the bias, respectively. tanh is the non-linear activation function and \({G^T_{avg}}\) is the transpose of \({G_{avg}}\). Analogously, we calculate attention scores between the hidden vector matrix \({h^g}\) and the pooled vector \({L_{avg}}\).
$$\begin{aligned} {b_i} = \frac{{\exp (\varepsilon (h_i^g,{L_{avg}}))}}{{\sum \nolimits _{j = 1}^m {\exp (\varepsilon (h_j^g,{L_{avg}}))} }} \end{aligned}$$
(23)
After computing the word attention weights, we can get context and target representations L and G based on the attention vectors \({a_i}\) and \({b_i}\) by:
$$\begin{aligned} L = \sum \limits _{i = 1}^n {{a_i} \cdot h_i^l} \end{aligned}$$
(24)
$$\begin{aligned} G = \sum \limits _{i = 1}^m {{b_i} \cdot h_i^g} \end{aligned}$$
(25)
Finally, the Multi-head Attention mechanism concatenates the results of each attention head calculation to obtain the hidden layer vector \({h^s=\{h_i^s, i \in [1,n]\}}\) and the hidden layer vector \({h^a=\{h_i^a, i \in [1,n]\}}\).
$$\begin{aligned}{} & {} {h^s} = Concat({L_1},{L_2},...,{L_H}){W_2} \end{aligned}$$
(26)
$$\begin{aligned}{} & {} {h^a} = Concat({G_1},{G_2},...,{G_H}){W_3} \end{aligned}$$
(27)
$$\begin{aligned}{} & {} {h^m} = Concat({h^s},{h^a}) \end{aligned}$$
(28)
Where \({W_2}\) and \({W_3} \in {{\mathbb {R}}^{H \cdot u \times d}}\), \({d=H}\times {u}\), H is the number of heads, and u is the dimension of each attention head. For example, if d = 1024 and H = 16, then each attention head has dimension u = d / H = 64. \({h^s} \in {{\mathbb {R}}^{n \times d}}\) and \({h^a} \in {{\mathbb {R}}^{n \times d}}\) are the vector matrices output by interactive learning. A minimal sketch of this interactive attention is given below.
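A minimal sketch of this interactive attention is given below. For brevity it reuses one weight matrix for every head and for both directions, which is a simplification; in Eqs. (26)-(27) each head and direction has its own projection.

```python
# A minimal sketch of Eqs. (21)-(28): each pooled vector attends over the other
# sequence's hidden states, and the per-head summaries are concatenated.
import torch

def interact(h_seq, pooled, W, b):
    """h_seq: (n, d); pooled: (d,). Returns an attention-weighted summary (d,)."""
    scores = torch.tanh(h_seq @ W @ pooled + b)        # Eq. (22)
    a = torch.softmax(scores, dim=0)                   # Eq. (21)/(23)
    return (a.unsqueeze(-1) * h_seq).sum(dim=0)        # Eq. (24)/(25)

d, n, m, H = 256, 9, 12, 4
h_l, h_g = torch.randn(n, d), torch.randn(m, d)
G_avg, L_avg = h_g.mean(dim=0), h_l.mean(dim=0)

W1, b1 = torch.randn(d, d), torch.zeros(())
L = torch.stack([interact(h_l, G_avg, W1, b1) for _ in range(H)])   # one summary per head
G = torch.stack([interact(h_g, L_avg, W1, b1) for _ in range(H)])
h_m = torch.cat([L.flatten(), G.flatten()])            # Eq. (28), concatenated representation
print(h_m.shape)
```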
For the AE task, the model uses the conditional random fields (CRF) layer to predict the sequence labeling information of the final aspect terms in the sentence. The output of the AE task is calculated as follows.
$$\begin{aligned} {f_i}^s,{f_i}^e = CRF(h_i^g) \end{aligned}$$
(29)
$$\begin{aligned} y_i^s = soft\max (f_i^s) \end{aligned}$$
(30)
$$\begin{aligned} y_i^e = soft\max (f_i^e) \end{aligned}$$
(31)
Where \({y_i^s}\) denotes the probability that the i-th word is the start position of an aspect term and \({y_i^e}\) denotes the probability that the i-th word is the end position. For the ASC task, the model uses a fully connected layer to compute the sentiment polarity of aspect terms. The output of the ASC task is calculated as follows.
$$\begin{aligned}{} & {} f_i^m = \tanh ({W_l} \cdot {h_i}^m + {b_l}) \end{aligned}$$
(32)
$$\begin{aligned}{} & {} {y_i} = soft\max (f_i^m) \end{aligned}$$
(33)
Where \({W_l}\) is the weight matrix of the fully connected layer and \({b_l}\) is the bias term. A minimal sketch of the two output heads is given below.
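A minimal sketch of the two output heads follows, assuming the pytorch-crf package for BIO decoding; the start/end probability formulation of Eqs. (29)-(31) is simplified here to standard tag-sequence decoding, and Eqs. (32)-(33) to a linear layer followed by softmax.

```python
# A minimal sketch of the AE and ASC output heads.
import torch
import torch.nn as nn
from torchcrf import CRF        # pip install pytorch-crf

num_tags, num_polarities, d = 3, 3, 256          # B/I/O tags and Pos/Neu/Neg polarities
emit = nn.Linear(d, num_tags)                    # token-level emission scores
crf = CRF(num_tags, batch_first=True)
sent_head = nn.Linear(2 * d, num_polarities)     # fully connected layer over h^m

h_g = torch.randn(1, 9, d)                       # hidden states for the AE task
h_m = torch.randn(1, 2 * d)                      # interactive-learning output

bio_tags = crf.decode(emit(h_g))                 # AE prediction: best tag sequence
polarity = torch.softmax(sent_head(h_m), dim=-1) # ASC prediction
print(bio_tags, polarity)
```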

Loss function

When training the AE task in the SIASC model, we use the cross-entropy loss function to predict the probabilities of the starting and ending positions for each word.
$$\begin{aligned} {\mathcal {L}}_{AE} = - \sum \limits _{i}^{c} {y'_i}^{s} \log ({y_i}^{s}) - \sum \limits _{i}^{c} {y'_i}^{e} \log ({y_i}^{e}) \end{aligned}$$
(34)
Where \({{y'_i}^s}\) represents the annotated start position and \({{y'_i}^e}\) represents the annotated end position of the i-th word, and c denotes the training set over which the sum runs. When training the ASC task, we use a cross-entropy loss function with L2 regularization to predict the sentiment polarity of aspect terms.
$$\begin{aligned} {\mathcal {L}}_{ASC}(\theta ) = - \sum \limits _{i=1}^{c} {y'_i} \log {y_i} + \lambda \sum \limits _{\theta \in \Theta } {\theta ^2} \end{aligned}$$
(35)
Where \({y'_i}\) is the annotated sentiment polarity corresponding to the prediction \({y_i}\), and \({\Theta }\) is the parameter set of the SIASC model. Finally, we employ a multi-task training strategy that combines the loss functions of the AE task and the ASC task to obtain the final loss of the model.
$$\begin{aligned} \zeta (\theta ) = \alpha \cdot {\mathcal {L}}_{AE} + \beta \cdot {\mathcal {L}}_{ASC}(\theta ) \end{aligned}$$
(36)
Where \({\alpha ,\beta \in [0,1]}\) are hyper-parameters that control the contributions of the two objectives. \({\alpha }\) and \({\beta }\) are initialized to 1/2 and continuously adjusted during training for optimization. A minimal sketch of the joint objective is given below.
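A minimal sketch of this joint objective is shown below; the start/end formulation of Eq. (34) is simplified to token-level cross-entropy over BIO tags, and the parameter list passed for L2 regularization stands in for the model parameter set Θ.

```python
# A minimal sketch of the joint objective in Eqs. (34)-(36).
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def joint_loss(tag_logits, tag_gold, pol_logits, pol_gold, params,
               alpha=0.5, beta=0.5, lam=0.01):
    loss_ae = ce(tag_logits, tag_gold)                      # Eq. (34), simplified
    l2 = sum((p ** 2).sum() for p in params)                # L2 term over model parameters
    loss_asc = ce(pol_logits, pol_gold) + lam * l2          # Eq. (35)
    return alpha * loss_ae + beta * loss_asc                # Eq. (36)

tag_logits = torch.randn(9, 3)                              # 9 tokens, B/I/O scores
tag_gold = torch.randint(0, 3, (9,))
pol_logits = torch.randn(1, 3)                              # one aspect, 3 polarities
pol_gold = torch.tensor([0])
params = [torch.randn(4, 4, requires_grad=True)]
print(joint_loss(tag_logits, tag_gold, pol_logits, pol_gold, params))
```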

Experiments

In this section, we first describe the datasets, experimental settings, and baseline models used in the experiments. We then present a comparison study, an ablation study, an SRD threshold study, attention visualization, and a case study to validate the performance of the SIASC model.

Datasets and experimental settings

To demonstrate the performance of the SIASC model, we conducted experiments on the three benchmark datasets shown in Table 1. The Laptop and Restaurant datasets are from the SemEval-2014 Task 4 challenge [36]. The Twitter dataset [37] consists of online comments and can verify the model's performance on informal data. All three datasets include examples with positive, negative, and neutral sentiment polarities, and each sentence is annotated with its aspect terms and their corresponding polarities.
Table 1
The experimental datasets

| Datasets | Positive (Train) | Positive (Test) | Negative (Train) | Negative (Test) | Neutral (Train) | Neutral (Test) |
|---|---|---|---|---|---|---|
| Laptop | 994 | 341 | 870 | 128 | 646 | 169 |
| Restaurant | 2164 | 728 | 807 | 196 | 637 | 196 |
| Twitter | 1561 | 173 | 1560 | 173 | 3127 | 346 |
In addition to the hyper-parameter settings used in previous studies, we also conducted an experiment on the impact of the SRD threshold and analyzed the results to optimize the hyper-parameter settings. The hyper-parameter settings of the SIASC model are shown in Table 2, and a corresponding training-setup sketch follows the table.
Table 2
The hyper-parameter settings

| Hyper-parameter | Setting |
|---|---|
| BERT dim | 768 |
| BiLSTM | 128 |
| MaxPool1d kernel | 3 |
| Average pooling kernel | 128 |
| Multi-Head | 12 |
| SRD threshold | 4 |
| epochs | 50 |
| batch_size | 64 |
| learning_rate | 2e-5 |
| dropout | 0.3 |
| optimizer | Adam |
| L2 regularization | 0.01 |
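The sketch below collects the Table 2 settings into a training configuration, assuming a PyTorch model object named `model`; the released SIASC training script may organize this differently, and the L2 weight is applied here through the optimizer's weight decay.

```python
# A minimal training-setup sketch matching the hyper-parameters in Table 2.
import torch

config = {
    "bert_dim": 768, "bilstm_hidden": 128, "maxpool_kernel": 3,
    "num_heads": 12, "srd_threshold": 4, "epochs": 50,
    "batch_size": 64, "learning_rate": 2e-5, "dropout": 0.3,
    "l2_weight": 0.01,
}

model = torch.nn.Linear(768, 3)            # placeholder for the full SIASC model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=config["learning_rate"],
    weight_decay=config["l2_weight"],      # L2 regularization
)
```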

Baseline model

Comparison experiments are conducted to separately validate the performance of the SIASC model on the AE task and ASC task. The experimental details of the baseline models are as follows.
DTBCSNN [38]: (Ye et al. 2017) proposed a stacked convolutional neural network based on dependency trees which used an inference layer for aspect term extraction.
BiDTreeCRF [39]: (Luo et al. 2019) integrated word embedding representation and BiLSTM plus CRF to learn tree structure and sequential features.
Seq2Seq4ATE [40]: (Ma et al. 2019) designed the gated unit networks to incorporate the corresponding word representation into the decoder. The model also adopted position-aware attention to focus on the adjacent words of the aspect term.
ATAE-LSTM [12]: (Wang et al. 2016) utilized the Attention mechanism to capture the importance of different context information for the aspect term.
AEN-BERT [18]: (Song et al. 2019) employed the contextualized BERT and Attention mechanism to model the relation between context and aspect term.
RINANTE [17]: (Dai et al. 2019) designed an algorithm to automatically mine extraction rules from existing training examples based on dependency parsing results.
SPAN-BERT [41]: (Hu et al. 2019) proposed a pipeline method using BERT as the backbone. A multi-target model is used for aspect term extraction and sentiment classification.
JET-BERT [27]: (Xu et al. 2020) designed the end-to-end model with a novel position aware tagging scheme that is capable of jointly extracting the triplets.
Peng-to-stage [30]: (Peng et al. 2020) used a two-stage framework. The first stage predicts aspect terms, sentiment polarities, and the causes of sentiment polarity; the second stage combines the first-stage predictions into triplets.
LCFS-BERT [11]: (Phan et al. 2020) used the local context focus on syntax mechanism that masks or weakens semantically distant context words.
R-GAT+BERT [42]: (Wang et al. 2020) defined a unified target aspect term dependency tree structure by reshaping and pruning the ordinary dependency parse tree.
OTE-MTL [28]: (Zhang et al. 2020) adopted the multi-task learning framework to jointly extract aspect term and opinion term. The model also utilized the BiAffine scorer to calculate the sentiment dependency relationship between the two.
RACL-BERT [43]: (Chen et al. 2020) used a relation-aware collaborative learning framework that allows the subtasks to work in coordination.
DREGCN [44]: (Liang et al. 2021) designed a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning.
Table 3
Comparison experiment results on the AE task

| Model | Laptop Accuracy (%) | Laptop F1 (%) | Restaurant Accuracy (%) | Restaurant F1 (%) | Twitter Accuracy (%) | Twitter F1 (%) |
|---|---|---|---|---|---|---|
| DTBCSNN | 78.45 | 74.66 | 84.50 | 82.97 | 74.52 | 73.20 |
| BiDTreeCRF | 78.74 | 74.46 | 84.10 | 82.73 | 74.46 | 72.98 |
| Seq2Seq4ATE | 82.61 | 75.57 | 84.24 | 82.41 | 76.47 | 73.37 |
| RINANTE | 82.67 | 76.34 | 81.58 | 79.92 | 72.67 | 70.51 |
| SPAN-BERT | 82.66 | 75.36 | 83.79 | 81.52 | 76.24 | 72.82 |
| JET-BERT | 83.25 | 76.40 | 84.57 | 82.75 | – | – |
| Peng-to-stage | 82.13 | 75.02 | 82.60 | 80.30 | 74.41 | 72.04 |
| OTE-MTL | 83.34 | 75.68 | 84.71 | 85.71 | 77.57 | 75.08 |
| RACL-BERT | 82.79 | 76.59 | 85.38 | 85.27 | 77.63 | 75.31 |
| DREGCN | 82.54 | 75.26 | 83.64 | 81.45 | – | – |
| BART-ABSA | 83.50 | 75.92 | 85.20 | 83.56 | 77.42 | 74.12 |
| PD-GAT | 83.64 | 75.82 | 84.87 | 82.64 | 77.52 | 74.36 |
| SSEGCN | 83.54 | 74.62 | 84.50 | 82.57 | 76.75 | 73.50 |
| Sentic GCN | 84.06 | 76.50 | 86.35 | 85.45 | 76.45 | 75.25 |
| KGAN | 83.98 | 76.12 | 86.25 | 85.59 | 77.23 | 74.85 |
| WSIN | 84.15 | 76.54 | 86.47 | 85.68 | 78.20 | 75.64 |
| KDGN | 84.24 | 76.85 | 85.94 | 85.71 | 77.69 | 75.45 |
| SIASC | 84.11 | 77.14 | 86.65 | 85.90 | 78.42 | 75.90 |
BART-ABSA [45]: (Yan et al. 2021) proposed a unified generative model based on aspect-based sentiment analysis. This addresses all aspect-based sentiment analysis sub-tasks in the end-to-end framework.
PD-GAT [46]: (Wu et al. 2022) constructed a relational graph attention networks based on phrase dependency graphs.
SSEGCN [47]: (Zhang et al. 2022) proposed an aspect-aware attention mechanism combined with self-attention.
Sentic GCN [48]: (Liang et al. 2022) proposed a graph convolutional network based on SenticNet that exploits the sentiment dependencies of sentences with respect to specific aspects.
KGAN [49]: (Zhong et al. 2023) proposed a knowledge graph augmentation network that combines external knowledge with explicit syntactic information.
WSIN [50]: (Gu et al. 2023) fully considered the interaction between the current aspect word and the contextual information.
KDGN [51]: (Wu et al. 2023) combined domain knowledge, dependency annotation and syntactic paths to propose a dependency graph-based knowledge-aware network.

Comparison study

To demonstrate the performance of SIASC model, the experiments evaluate its performance in both the AE task and ASC task. We first analyzed the experimental results of the AE task and then those of the ASC task.
Table 4
Comparison experiment results on the ASC task

| Model | Laptop Accuracy (%) | Laptop F1 (%) | Restaurant Accuracy (%) | Restaurant F1 (%) | Twitter Accuracy (%) | Twitter F1 (%) |
|---|---|---|---|---|---|---|
| ATAE-LSTM | 68.70 | – | 77.20 | – | – | – |
| AEN-BERT | 79.93 | 76.31 | 83.12 | 74.76 | 74.71 | 73.13 |
| LCFS-BERT | 79.68 | 75.55 | 85.21 | 78.16 | 75.83 | 74.40 |
| R-GAT+BERT | 79.73 | 75.50 | 85.50 | 79.33 | 76.15 | 74.88 |
| RINANTE | 79.93 | 74.31 | 83.12 | 74.76 | 75.18 | 74.01 |
| SPAN-BERT | 80.50 | 76.24 | 86.35 | 79.72 | 75.94 | 74.52 |
| JET-BERT | 80.39 | 75.47 | 83.25 | 74.94 | 72.15 | 70.40 |
| Peng-to-stage | 80.21 | 76.20 | 85.91 | 79.12 | 75.57 | 73.82 |
| OTE-MTL | 80.46 | 75.68 | 84.57 | 77.40 | – | – |
| RACL-BERT | 80.94 | 76.72 | 86.54 | 80.16 | 76.19 | 74.68 |
| DREGCN | 80.32 | 75.54 | 84.25 | 77.08 | – | – |
| BART-ABSA | 80.57 | 76.08 | 85.95 | 76.96 | 75.66 | 73.66 |
| PD-GAT | 80.79 | 76.85 | 86.24 | 78.95 | 76.09 | 74.85 |
| SSEGCN | 80.95 | 76.98 | 85.75 | 79.68 | 75.85 | 74.63 |
| Sentic GCN | 81.07 | 77.25 | 86.54 | 80.25 | 76.20 | 75.27 |
| KGAN | 80.95 | 76.92 | 86.27 | 79.82 | 76.15 | 75.29 |
| WSIN | 81.14 | 77.24 | 86.45 | 80.35 | 75.85 | 74.50 |
| KDGN | 81.29 | 77.48 | 86.67 | 80.54 | 76.19 | 74.85 |
| SIASC (CDM) | 81.05 | 77.10 | 86.35 | 80.10 | 76.40 | 74.82 |
| SIASC (CDW) | 81.35 | 77.45 | 86.71 | 80.31 | 76.56 | 75.40 |
The results of the AE task are presented in Table 3. The experiments reveal that the SIASC model outperforms the baseline models. Our model achieved accuracy of 84.11%, 86.65%, and 78.42%, as well as F1 scores of 77.14%, 85.90%, and 75.90% on the three datasets, respectively. The strongest baseline is the KDGN model; compared to it, the SIASC model achieved F1 score improvements of 0.29%, 0.19%, and 0.45%, respectively, and accuracy improvements of 0.71% and 0.73% on the Restaurant and Twitter datasets. Two factors explain these improvements. On the one hand, the SIASC model employs the BiAffine model to parse sentence information, obtaining part-of-speech features, word features, and dependency features; the BiAffine model possesses the representational capability needed for accurate dependency parsing. On the other hand, converting text into three types of feature information benefits model training, particularly on datasets with limited resources. Compared to word features alone, incorporating part-of-speech features and dependency features is more advantageous for extracting multi-word aspect terms.
The results of the ASC task are presented in Table 4. The SIASC model also demonstrates superior performance on the ASC task compared to the baseline models. Specifically, the SIASC (CDW) model achieved accuracy of 81.35%, 86.71%, and 76.56% on the three datasets, respectively. In comparison to the KDGN model, our model improved accuracy by 0.06%, 0.04%, and 0.37%, respectively. The KDGN model integrates domain knowledge and dependency features to construct a graph neural network that learns the syntactic structure of sentences, but it overlooks the part-of-speech features within the sentences. In contrast, our model makes full use of syntactic information, enabling more accurate extraction of aspect terms. Additionally, the Twitter dataset, sourced from online reviews, is informal and prone to sentence noise; our 0.37% accuracy improvement over KDGN on this dataset indicates that the SIASC model generalizes better to real web review data. The strong performance of the SIASC model can be attributed to the LCFS mechanism and the interactive learning method. Firstly, the LCFS mechanism effectively handles context words that are unrelated to aspect terms, which helps the model determine the sentiment polarity of aspect terms. Secondly, interactive learning establishes a crucial connection between the AE task and the ASC task, and information from the AE task proves beneficial in improving the model's performance on the ASC task. Finally, the design of the SIASC model's structure reduces error propagation between the AE and ASC tasks. Notably, the CDW method is significantly more effective than the CDM method, underscoring the superiority of weighting over masking within the LCFS mechanism.
Table 5
Experimental results of multi-word aspect terms on the AE task

| Model | Laptop Accuracy (%) | Laptop F1 (%) | Restaurant Accuracy (%) | Restaurant F1 (%) | Twitter Accuracy (%) | Twitter F1 (%) |
|---|---|---|---|---|---|---|
| KGAN | 80.32 | 76.84 | 81.75 | 78.25 | 68.35 | 65.35 |
| WSIN | 79.11 | 75.49 | 85.08 | 82.57 | 73.17 | 69.27 |
| KDGN | 82.68 | 79.35 | 85.32 | 81.49 | 75.91 | 71.45 |
| SIASC | 84.00 | 80.27 | 86.63 | 84.75 | 77.67 | 75.30 |
To validate the performance of the SIASC model on sentences containing multi-word aspect terms, we also conducted comparative experiments and confusion matrix analysis for multi-word aspect terms. The experimental results for the AE task on sentences with multi-word aspect terms are shown in Table 5. The results indicate that the performance of the SIASC model is significantly better than other models on the Laptop, Restaurant, and Twitter datasets. Compared to the KDGN model, our model’s performance has improved by 1.32%, 1.31%, and 1.76%, indicating that the model effectively utilizes syntactic features to accurately extract multi-word aspect terms. The confusion matrix results for the ASC task on multi-word aspect terms are shown in Fig. 6. As observed from the figure, on the three datasets, the SIASC model demonstrates a more accurate prediction of the sentiment polarity of multi-word aspect terms compared to other models. Analyzing the reasons, our model adopts a unified structure to minimize the impact of error propagation between the two subtasks. The incorporation of the LCFS mechanism aids in determining the context of multi-word aspect terms, thus enhancing the discernment of sentiment polarity.
Table 6
Ablation experiment results of the SIASC model, where "w/o" denotes components removed from the model

| Task | Model | Laptop Accuracy (%) | Restaurant Accuracy (%) | Twitter Accuracy (%) |
|---|---|---|---|---|
| AE | SIASC | 84.11 | 86.65 | 78.42 |
| AE | w/o Part-of-speech | 83.34 (↓0.77) | 86.25 (↓0.40) | 77.90 (↓0.52) |
| AE | w/o Dependency | 82.47 (↓1.64) | 85.56 (↓1.09) | 77.18 (↓1.24) |
| AE | w/o Part-of-speech & Dependency | 81.68 (↓2.43) | 85.12 (↓1.53) | 76.67 (↓1.75) |
| ASC | SIASC (CDM) | 81.05 | 86.35 | 76.40 |
| ASC | SIASC (CDW) | 81.35 | 86.71 | 76.56 |
| ASC | w/o LCFS (CDM) | 80.12 (↓0.93) | 85.37 (↓0.98) | 74.82 (↓1.58) |
| ASC | w/o LCFS (CDW) | 80.24 (↓1.11) | 85.39 (↓1.32) | 74.76 (↓1.80) |

Ablation study

To verify the effectiveness of the syntactic feature representation and the LCFS processing in the SIASC model, we conducted an ablation study. In the AE task, we demonstrated the effectiveness of the syntactic feature representation by progressively removing part-of-speech features and dependency features from the model. In the ASC task, we verified the effectiveness of LCFS by removing it from the model. The results of the ablation experiments are presented in Table 6 and Figs. 7, 8, and 9.
The results of the ablation experiments indicate that the syntactic feature representation and the LCFS processing both enhance the performance of the SIASC model. When the model removes both the part-of-speech features and the dependency features, its accuracy decreases by 2.43%, 1.53%, and 1.75% on the three datasets, respectively. The part-of-speech features and the dependency features are essential components of textual information, and leveraging this syntactic knowledge helps the model better understand the datasets. Removing the LCFS mechanism from the SIASC (CDM) model decreases accuracy by 0.93%, 0.98%, and 1.58%, respectively, while the accuracy of the SIASC (CDW) model decreases by 1.11%, 1.32%, and 1.80%. This indicates that the LCFS mechanism significantly improves the model's performance. The experimental results also show that the CDW method is superior to the CDM method: CDW behaves like an attention mechanism, assigning appropriate weights to context words, whereas CDM masks words out of the context entirely, leading to a loss of contextual information about the aspect term.
Table 7
SRD threshold experiment results

| Dataset category | LCFS method | Laptop | Restaurant | Twitter |
|---|---|---|---|---|
| Test set | CDW | 3 | 3 | 5 |
| Test set | CDM | 4 | 3 | 4 |
| Validation set | CDW | 3 | 3 | 4 |
| Validation set | CDM | 3 | 3 | 3 |

The SRD threshold study

The SRD threshold directly affects the performance of the LCFS mechanism. To find the optimal SRD threshold \({\alpha }\) with respect to the input value x for the LCFS mechanism on the three datasets, we conducted an SRD threshold experiment, comparing the performance of the SIASC model on the ASC task with the SRD threshold set to values from 1 to 6. The experimental results are shown in Table 7 and Fig. 10.
The experimental results demonstrate that the CDW method with an SRD threshold of 3 performs best on the Laptop dataset, with accuracy of 81.35% and 82.58% on the test set and validation set, respectively. Similarly, the CDW method with an SRD threshold of 3 performs best on the Restaurant dataset, achieving 86.71% and 86.95% accuracy on the test and validation sets, respectively. For the Twitter dataset, the CDW method with an SRD threshold of 5 achieves the best test-set performance, with an accuracy of 76.56%, while a threshold of 4 also achieves a decent accuracy of 76.52%; on the validation set, the CDW method with an SRD threshold of 4 performs best, with an accuracy of 77.05%. The SRD threshold experiment again demonstrates that the CDW method is significantly superior to the CDM method within the LCFS mechanism. Based on these results, we set the SRD threshold in the LCFS mechanism to 3 for the Laptop and Restaurant datasets and to 4 for the Twitter dataset.

Attention visualization

To assess the impact of interactive learning on the model, we conducted a visualization study. Interactive learning in the SIASC model is based on the multi-head attention mechanism, so we can evaluate its functionality by visualizing the attention score matrices. The SIASC model uses 12 attention heads in its multi-head attention mechanism, and we selected representative results for visualization. The model takes the sentence "The food was delicious but the wine was terrible" as input, with "food" and "wine" as aspect terms. The experimental results are presented in Figs. 11 and 12.
In Fig. 11, it’s evident that the scores for the phrases “the food was delicious” and “the wine was terrible” in the sentence are not very high. However, the scores between the target words and the context words are relatively high. For example, “food” has higher scores with the words in “the food was delicious”, whereas its scores with the words in “the wine was terrible” are lower. This is because the LCFS mechanism can handle context-irrelevant words around the target word. Contextually relevant words are retained, while irrelevant words are masked or assigned smaller weight values. Moving to Fig. 12, the score distribution in the score matrix after syntactic feature processing is not as regular. However, there are high scores between key words in both score matrices. For instance, there is a high score for “food” with “delicious” and “wine” with “terrible”. This confirms the impact of interactive learning on both score matrices. Interactive learning captures the differences between two hidden layer vectors, further establishing their connection.
Table 8
Case experiment results, where "P", "N" and "O" denote positive, negative and neutral sentiment, respectively, and "✓" and "✗" denote correct and incorrect model predictions, respectively

| # | Sentence | WSIN AE | WSIN ASC | KDGN AE | KDGN ASC | SIASC AE | SIASC ASC |
|---|---|---|---|---|---|---|---|
| 1 | I choose apple MacBook because of their design and the aluminum casing. | Design, casing | P✓ P✓ | Design, aluminum casing | P✓ P✓ | Design, aluminum casing | P✓ P✓ |
| 2 | I continued to take the computer in AGAIN and they replaced the hard drive and mother board. | Hard drive, mother board | P✗ N✓ | Hard drive, mother board | P✗ N✓ | Hard drive, mother board | N✓ N✓ |
| 3 | Similar to other Indian restaurants, they use the dinner special to attract customers. | Dinner | P✓ | Dinner | O✗ | Dinner special | O✓ |
| 4 | The blond wood decor is very soothing, the premium sake is excellent and the service is great. | Blond wood, sake, service | P✓ P✓ P✓ | Blond wood, sake, service | P✓ P✓ P✓ | Blond wood decor, sake, service | P✓ P✓ P✓ |
| 5 | The duck confit is always amazing and the foie gras terrine with figs was out of this world. | Duck confit, foie gras | P✓ N✗ | Duck confit, foie gras terrine | P✓ N✗ | Duck confit, foie gras terrine with figs | P✓ P✓ |
| 6 | The food is nothing like its menu description. | Food, menu | N✓ N✓ | Food, menu description | N✓ N✓ | Food, menu description | N✓ N✓ |

Case study

In the case study, we selected the better-performing WSIN and KDGN models as comparison models. The experimental results are shown in Table 8, which displays the performance of the three models on the example sentences. To facilitate understanding, the aspect terms in the sentences are highlighted in red and blue in the original table. The results demonstrate that the SIASC model extracts multi-word aspect terms more accurately than the WSIN and KDGN models. For instance, the WSIN and KDGN models struggled to fully extract "blond wood decor" and "foie gras terrine with figs", whereas the SIASC model accurately identifies both of these complex multi-word aspect terms. Regarding the ASC task, the WSIN model misjudged the sentiment polarity of "hard drive" and "dinner special", and the KDGN model likewise had difficulty in accurately distinguishing the sentiment polarity of "hard drive". In contrast, our proposed SIASC model accurately discerned the sentiment polarity in the example sentences. The results of this case study underscore our model's effectiveness in addressing the challenge of multi-word aspect terms and its strong performance in sentiment classification.

Conclusion

To comprehensively address both the AE and ASC sub-tasks in ABSA, we design a unified model based on an end-to-end framework, known as the SIASC model. The results of comparison experiments and case experiments indicate that the overall performance of the SIASC model is superior to the baseline model on the three benchmark datasets. Fully utilizing the syntactic feature information of the text benefits the accurate extraction of multi-word aspect terms by the model. The results of the ablation experiment indicate that utilizing syntactic features and the LCFS component can enhance the model’s performance. Additionally, attention visualization further validates the impact of the LCFS mechanism and interactive learning on the model. The SRD threshold experiment explores the optimal threshold setting for the LCFS mechanism. In future work, we plan to employ GNN to learn syntactic structural features of text and utilize a multi-task learning framework [52] to establish connections between multiple subtasks, advancing research on ABSA tasks.

Acknowledgements

This work was supported by the Hubei Province Key Research Project (No. TA02002), the Hubei Provincial Central Leading Local Science and Technology Development Special Project (No. 2018ZYYD007), the Hubei Province Education Department Science and Technology Research Project (No. Q20201801), the Ph.D. Research Startup Fund Project of Hubei University of Automotive Technology (No. BK202004), and the Natural Science Foundation of Hubei Province (No. 2022CFB538).

Declarations

Conflict of interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Code availability

The SIASC model code is available at https://github.com/ZouWang-spider/AESC (see footnote 1).

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Footnotes

1. The SIASC model code is available at: https://github.com/ZouWang-spider/AESC

4. The LCFS mechanism is available at: https://github.com/HieuPhan33/LCFS-BERT

6. The Laptop14 and Restaurant14 datasets are available at: http://alt.qcri.org/semeval2014/task4/

7. The Twitter dataset is available at: http://goo.gl/5Enpu7
References

1. Li J, Zhao Y, Jin Z, Li G, Shen T, Tao Z, Tao C (2022) SK2: Integrating implicit sentiment knowledge and explicit syntax knowledge for aspect-based sentiment analysis. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp 1114-1123. https://doi.org/10.1145/3511808.3557452

6. Mukherjee A, Liu B (2012) Aspect extraction through semi-supervised modeling. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL), pp 339-348

12. Wang Y, Huang M, Zhao L, Zhu X (2016) Attention-based LSTM for aspect level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 606-615

16. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in Neural Information Processing Systems

21. Zhang C, Li Q, Song D (2019) Aspect-based sentiment classification with aspect-specific graph convolutional networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 4568-4578. https://doi.org/10.48550/arXiv.1909.03477

22. Huang B, Carley K (2019) Syntax-aware aspect level sentiment classification with graph attention networks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 5469-5477

23. Li R, Chen H, Feng F, Ma Z, Wang X, Hovy E (2021) Dual graph convolutional networks for aspect-based sentiment analysis. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pp 6319-6329. https://doi.org/10.18653/v1/2021.acl-long.494

26. Wu Z, Ying C, Zhao F, Fan Z, Dai X, Xia R (2020) Grid tagging scheme for aspect-oriented fine-grained opinion extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 2576-2585

27. Xu L, Li H, Lu W, Bing L (2020) Position-aware tagging for aspect sentiment triplet extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 2339-2349

28. Zhang C, Li Q, Song D, Wang B (2020) A multi-task learning framework for opinion triplet extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 819-828

29. Li Y, Li Z, Zhang M, Wang R, Li S, Si L (2019) Self-attentive biaffine dependency parsing. In: International Joint Conference on Artificial Intelligence (IJCAI), pp 5067-5073

36. Pontiki M, Galanis D, Pavlopoulos J, Papageorgiou H, Androutsopoulos I, Manandhar S (2014) SemEval-2014 task 4: Aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval), pp 27-35

37. Dong L, Wei F, Tan C, Tang D, Zhou M, Xu K (2014) Adaptive recursive neural network for target-dependent Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), pp 49-54

40. Ma D, Li S, Wu F, Xie X, Wang H (2019) Exploring sequence-to-sequence learning in aspect term extraction. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp 3538-3547. https://doi.org/10.18653/v1/P19-1344

41. Hu M, Peng Y, Huang Z, Li D, Lv Y (2019) Open-domain targeted sentiment analysis via span-based extraction and classification. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp 537-546

42. Wang K, Shen W, Yang Y, Quan X, Wang R (2020) Relational graph attention network for aspect-based sentiment analysis. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp 3229-3238

45. Yan H, Dai J, Ji T, Qiu X, Zhang Z (2021) A unified generative framework for aspect-based sentiment analysis. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP), pp 2416-2429

47. Zhang Z, Zhou Z, Wang Y (2022) SSEGCN: Syntactic and semantic enhanced graph convolutional network for aspect-based sentiment analysis. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 4916-4925. https://doi.org/10.18653/v1/2022.naacl-main.362
Metadata

Title: A syntactic features and interactive learning model for aspect-based sentiment analysis
Authors: Wang Zou, Wubo Zhang, Zhuofeng Tian, Wenhuan Wu
Publication date: 26.04.2024
Publisher: Springer International Publishing
Published in: Complex & Intelligent Systems
Print ISSN: 2199-4536
Electronic ISSN: 2198-6053
DOI: https://doi.org/10.1007/s40747-024-01449-5
