Introduction
- We propose utilizing different sentence parts for different downstream tasks to enhance the quality of sentence representations.
- We implement this sentence-part enhancement on top of BERT: a pooling operation extracts the embedding of each sentence part, and three fusion strategies merge the resulting embeddings.
- Experiments on six datasets covering sentiment classification and semantic textual similarity show that SpeBERT achieves significant and consistent improvements. The experiments also reveal which sentence parts matter most for each task, which may inform future research on these downstream tasks.
Related work
Proposed method
Overview
Extraction of different embeddings
Sentence part masks
Sentence | Results of dependency parsing |
---|---|
The grain was terrible | (('terrible', 'JJ'), 'nsubj', ('grain', 'NN')) |
 | (('grain', 'NN'), 'det', ('The', 'DT')) |
 | (('terrible', 'JJ'), 'cop', ('was', 'VBD')) |
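As an illustration of how parse triples like those above could be turned into token-level masks, the following sketch uses spaCy as the parser and an assumed split of dependency relations into "main" and "other" parts; the paper's actual parser, relation sets, and mask construction may differ.

```python
import spacy

# Which dependency relations count as "main parts" is an assumption made for
# illustration; the paper's exact definition may differ. Note that spaCy's
# label set (e.g. "acomp") differs from the Stanford-style "cop" shown in the
# table above.
MAIN_RELATIONS = {"ROOT", "nsubj", "nsubjpass", "dobj", "acomp"}

nlp = spacy.load("en_core_web_sm")

def sentence_part_masks(sentence):
    """Return (main_mask, other_mask) as 0/1 lists over the parsed tokens."""
    doc = nlp(sentence)
    main_mask = [1 if tok.dep_ in MAIN_RELATIONS else 0 for tok in doc]
    other_mask = [1 - m for m in main_mask]
    return main_mask, other_mask

main, other = sentence_part_masks("The grain was terrible")
# Typically: main  -> [0, 1, 1, 1]  ("grain", "was", "terrible"),
#            other -> [1, 0, 0, 0]  ("The")
```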
Sentence part pooling
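Below is a minimal sketch of the pooling step, assuming masked mean pooling over BERT's last-layer token states and masks already aligned to BERT's wordpiece tokenization; the paper's exact pooling variant may differ.

```python
import torch

def sentence_part_pooling(hidden_states, part_mask):
    """Mean-pool the token embeddings that belong to one sentence part.

    hidden_states: (batch, seq_len, hidden) last-layer BERT outputs.
    part_mask:     (batch, seq_len) with 1 for tokens inside the part, else 0;
                   assumed to be aligned to BERT's wordpiece tokenization.
    """
    mask = part_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)   # sum over the part's tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)     # avoid division by zero
    return summed / counts                       # (batch, hidden)
```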
Different fusion strategies
Mean strategy
Concatenation strategy
Weighted mean strategy
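The three strategies named above could be realized as follows; treating the weighted mean as a softmax-normalized pair of learned scalars is an assumption about the parameterization, and CONCAT doubles the embedding dimension, which the downstream head must account for.

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    """Merge two sentence-part embeddings e1, e2 of shape (batch, hidden)."""

    def __init__(self, strategy="mean"):
        super().__init__()
        self.strategy = strategy
        if strategy == "wmean":
            # Two learnable scalars, softmax-normalized into a convex
            # combination (a hypothetical parameterization for illustration).
            self.w = nn.Parameter(torch.zeros(2))

    def forward(self, e1, e2):
        if self.strategy == "mean":        # MEAN: element-wise average
            return (e1 + e2) / 2
        if self.strategy == "concat":      # CONCAT: output is (batch, 2*hidden)
            return torch.cat([e1, e2], dim=-1)
        if self.strategy == "wmean":       # WMEAN: learned weighted average
            a = torch.softmax(self.w, dim=0)
            return a[0] * e1 + a[1] * e2
        raise ValueError(f"unknown strategy: {self.strategy}")
```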
Model training
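A compact fine-tuning sketch that wires the pieces together for sentence classification, reusing `sentence_part_pooling` and `Fusion` from the sketches above; the checkpoint name, classification head, and cross-entropy loss are standard choices assumed here, not necessarily the paper's exact training setup.

```python
import torch.nn as nn
from transformers import BertModel

class SpeBERTClassifier(nn.Module):
    """BERT + sentence-part pooling + fusion + linear head (illustrative)."""

    def __init__(self, num_labels, strategy="mean"):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        self.fusion = Fusion(strategy)                 # sketch above
        out_dim = hidden * (2 if strategy == "concat" else 1)
        self.classifier = nn.Linear(out_dim, num_labels)

    def forward(self, input_ids, attention_mask, main_mask, other_mask):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        e_main = sentence_part_pooling(h, main_mask)   # sketch above
        e_other = sentence_part_pooling(h, other_mask)
        return self.classifier(self.fusion(e_main, e_other))

# One training step (standard fine-tuning, assumed):
#   logits = model(input_ids, attention_mask, main_mask, other_mask)
#   loss = nn.functional.cross_entropy(logits, labels)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
```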
Experiments
Dataset | Task | #Samples | #Labels | Metric |
---|---|---|---|---|
MR | Sentiment classification | 11 k | 2 | Accuracy |
CR | Sentiment classification | 4 k | 2 | Accuracy |
SST-2 | Sentiment classification | 70 k | 2 | Accuracy |
SICK-R | Semantic similarity | 10 k | 1 | Pearson correlation |
STS-B | Semantic similarity | 8.7 k | 1 | Spearman correlation |
Chinese-STS-B | Semantic similarity | 8.5 k | 1 | Spearman correlation |
Datasets
- MR: Sentiment prediction for movie reviews [26].
- CR: Sentiment prediction for customer product reviews [27].
- SST-2: Stanford Sentiment Treebank with binary labels [6].
- SICK-R: The semantic relatedness subtask of Sentences Involving Compositional Knowledge (SICK) [28].
- STS-B: The Semantic Textual Similarity (STS) benchmark, with sentence pairs drawn from captions, news, and forums [7].
- Chinese-STS-B: A Chinese version of the STS benchmark.
Experimental details
Experimental results
Sentiment classification
Model | MR | CR | SST-2 | SICK-R | STS-B | Chinese-STS-B |
---|---|---|---|---|---|---|
Unsupervised training methods | | | | | | |
FastSent [12] | 70.80 | 78.40 | – | – | – | – |
FastSent+AE [12] | 71.80 | 76.70 | – | – | – | – |
Skip-thought [10] | 76.50 | 80.10 | 82.00 | 0.858 | – | – |
Skip-thought-LN [15] | 79.40 | 83.10 | 82.90 | 0.858 | 70.20 | – |
Supervised training methods | | | | | | |
DictRep (bow) [15] | 76.70 | 78.70 | – | – | – | – |
InferSent [15] | 81.10 | 86.30 | 84.60 | 0.884 | 75.50 | – |
Multitask training methods | | | | | | |
LSMTL [23] | 82.50 | 87.70 | 83.20 | 0.888 | 78.60 | – |
Self-supervised training methods | | | | | | |
DisSent books 5 [24] | 80.20 | 85.40 | 82.80 | 0.845 | – | – |
DisSent books 8 [24] | 79.80 | 85.00 | 83.90 | 0.854 | – | – |
DisSent books ALL [24] | 80.10 | 84.90 | 84.10 | 0.849 | – | – |
Fine-tuned methods | | | | | | |
BERT-base\(^{1}\) | 87.44 | 90.69 | 93.25 | 0.884 | 84.34 | 79.94 |
BERT-large\(^{1}\) | 88.86 | 91.70 | 94.01 | 0.890 | 85.13 | – |
SBERT-base [8] | 83.64 | 89.43 | 88.96 | – | 84.67 | – |
SBERT-large [8] | 84.88 | 90.07 | 90.66 | – | 84.45 | – |
SpeBERT-base\(^{1}\) | 88.04 | 91.41 | 93.85 | 0.889 | 85.52 | 80.86 |
SpeBERT-large\(^{1}\) | 89.47 | 92.65 | 94.84 | 0.894 | 86.37 | – |
Supervised models trained from scratch (results extracted from [23]) | | | | | | |
Naive bayes-SVM | 79.40 | 81.80 | 83.10 | – | – | – |
AdaSent | 83.10 | 86.30 | – | – | – | – |
BLSTM-2DCNN | 82.30 | – | 89.50 | – | – | – |
Semantic textual similarity
Ablation study
Ablation setting | SST-2 | STS-B |
---|---|---|
Sentence part | | |
w/ main parts | 92.20 | 89.01 |
w/ other parts | 92.78 | 88.37 |
w/o (BERT) | 92.55 | 88.56 |
Fusion strategy | | |
MEAN | 92.78 | 89.01 |
CONCAT | 92.66 | 89.30 |
WMEAN | 93.12 | 88.89 |
Sentence part
Fusion strategy
Sentence | Ground truth | BERT | SpeBERT |
---|---|---|---|
Is n’t it great ? | 1 | 0 | 1 |
A very average science fiction film. | 0 | 1 | 0 |
Propelled not by characters but by caricatures. | 0 | 1 | 0 |
Analysis
Case analysis
Parameter size
Model | MR | CR | SST-2 | SICK-R | STS-B | Chinese-STS-B |
---|---|---|---|---|---|---|
BERT-base | 110,074,370 | 110,074,370 | 110,074,370 | 110,076,677 | 110,077,446 | 102,862,854 |
BERT-large | 336,193,538 | 336,193,538 | 336,193,538 | 336,196,613 | 336,197,638 | – |
SpeBERT-base | 110,075,907 | 110,075,907 | 110,075,907 | 110,080,517 | 110,082,054 | 102,867,462 |
SpeBERT-large | 336,195,587 | 336,195,587 | 336,195,587 | 336,201,733 | 336,203,782 | – |
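For reference, counts like those in the table can be read straight off the model object with a generic PyTorch snippet (not tied to the paper's code):

```python
def count_parameters(model):
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. count_parameters(SpeBERTClassifier(num_labels=2))  # ~110M for BERT-base
```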
Training time

Model | MR | CR | SST-2 | SICK-R | STS-B | Chinese-STS-B |
---|---|---|---|---|---|---|
BERT-base | 328s | 143s | 2022s | 189s | 214s | 198s |
BERT-large | 943s | 392s | 6070s | 534s | 609s | – |
SpeBERT-base | 327s | 142s | 2028s | 190s | 216s | 199s |
SpeBERT-large | 947s | 392s | 6073s | 531s | 610s | – |