Introduction
- The proposed LTR-BERT model precomputes and stores the semantic vector representations of documents offline. This computation is performed only once, so its cost is lower than that of interaction-based semantic matching.
- To address the loss of query semantics caused by the large difference between query length and document length, a query expansion strategy is designed to improve the semantic matching between a query and its relevant documents.
- Inspired by exact term matching, the proposed LTR-BERT model uses a cheap interaction mechanism with no trainable parameters. This mechanism captures the fine-grained relevance between queries and documents while saving computation time during matching.
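The combination sketched in these contributions — documents encoded once offline, queries expanded and encoded online, and a parameter-free token-level interaction — can be illustrated with a minimal sketch. The exact expansion and scoring functions are defined later in the paper; here we assume a ColBERT-style MaxSim interaction (maximum cosine similarity per query token, summed) and a purely illustrative repeat-padding expansion, both of which involve no trained weights:

```python
import numpy as np

def expand_query(q_emb: np.ndarray, target_len: int = 32) -> np.ndarray:
    """Illustrative query expansion: pad a short query's token embeddings
    by repeating them up to a fixed length. (The paper's actual expansion
    strategy may differ; this only shows where expansion fits.)"""
    reps = int(np.ceil(target_len / len(q_emb)))
    return np.tile(q_emb, (reps, 1))[:target_len]

def maxsim_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """Parameter-free interaction: for each query token, take the maximum
    cosine similarity over all (precomputed) document token embeddings,
    then sum over query tokens. No trainable parameters are involved."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    sim = q @ d.T                      # (|q|, |d|) token-level similarities
    return float(sim.max(axis=1).sum())
```

Because `maxsim_score` touches only precomputed document vectors, the expensive BERT forward pass over long documents happens once at indexing time, not per query.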
Related work
Interaction-focused neural IR model
Representation-focused neural IR model
Long-text inference model
Long-text retrieval model
Representation-focused long-text retrieval method
An overview of the architecture
Query representation via online BERT
Long-text representation via off-line BERT
Matching mechanism without training parameters
Model training
Combining relevance matching and semantic matching
Experimental data and parameter setting
Experimental data and evaluation
Comparison models and parameter settings
Experimental results and analysis
Study on semantic matching for long text
Results on the 2019 TREC deep learning track dev set (binary relevance labels):

Model | Max doc length | nDCG@10 | MRR@10 | MAP@100 | Average docs/ms
---|---|---|---|---|---
BM25 (tuned) | – | 0.3251 | 0.2646 | 0.2769 | –
MatchPyramid | 500 | 0.3364 | 0.2716 | 0.2789 | 27
PACRR | 500 | 0.3344 | 0.2729 | 0.2816 | 22
CO-PACRR | 500 | 0.3436 | 0.2824 | 0.2819 | 14
K-NRM | 500 | 0.3214 | 0.2609 | 0.2638 | 49
CONV-KNRM | 500 | 0.3424 | 0.2836 | 0.2866 | 10
BERT-base [CLS] | 500 | 0.4165 | 0.3535 | 0.3594 | 0.1
ColBERT | 500 | 0.4057 | 0.3425 | 0.3498 | 31.3
TKL | 1000 | 0.3758 | 0.3125 | 0.3213 | 1.5
TKL | 2000 | 0.3396 | 0.2790 | 0.2892 | 1.1
DRSCM (2 sum) | 1000 | 0.4216 | 0.3433 | 0.3567 | 10.0
DRSCM (4 sum) | 2000 | 0.4178 | 0.3325 | 0.3425 | 10.0
LTR-Longformer | 1000 | 0.3347 | 0.2658 | 0.2734 | 26
LTR-Longformer | 2000 | 0.3364 | 0.2712 | 0.2776 | 26
LTR-BERT | 1000 | 0.4338 | 0.3632 | 0.3701 | 33.3
LTR-BERT | 2000 | 0.4289 | 0.3574 | 0.3647 | 33.3
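The ranking metrics reported in these tables can be computed from a ranked list of relevance labels. A minimal sketch, assuming linear gain for nDCG (some TREC evaluations instead use the exponential gain 2^rel − 1) and a relevance threshold of 1 for MRR:

```python
import numpy as np

def dcg(rels, k):
    """Discounted cumulative gain over the top-k of a ranked label list."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float((rels / np.log2(np.arange(2, len(rels) + 2))).sum())

def ndcg_at_k(ranked_rels, k=10):
    """nDCG@k: DCG of the ranking divided by DCG of the ideal ordering."""
    ideal = sorted(ranked_rels, reverse=True)
    return dcg(ranked_rels, k) / dcg(ideal, k)

def mrr_at_k(ranked_rels, k=10, threshold=1):
    """MRR@k: reciprocal rank of the first relevant document, else 0."""
    for rank, rel in enumerate(ranked_rels[:k], start=1):
        if rel >= threshold:
            return 1.0 / rank
    return 0.0
```

For example, a ranking whose labels are already in ideal order scores nDCG@10 = 1.0, while pushing the most relevant document down lowers the score.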
Study on fine-grained semantic matching for long text
Results on the 2019 TREC deep learning track test set (continuous relevance labels):

Model | Max doc length | nDCG@10 | MRR@10 | MAP@100 | Average docs/ms
---|---|---|---|---|---
BM25 (tuned) | – | 0.5234 | 0.8632 | 0.2339 | –
MatchPyramid | 500 | 0.5741 | 0.9011 | 0.2324 | 27
PACRR | 500 | 0.5960 | 0.8591 | 0.2183 | 22
CO-PACRR | 500 | 0.5349 | 0.8845 | 0.2231 | 14
K-NRM | 500 | 0.4936 | 0.7631 | 0.2124 | 49
CONV-KNRM | 500 | 0.5465 | 0.8993 | 0.2341 | 10
BERT-base [CLS] | 500 | 0.6512 | 0.9436 | 0.2613 | 0.1
bm25_marcomb | – | 0.640 | 0.913 | 0.323 | < 0.1
ColBERT | 500 | 0.6439 | 0.9279 | 0.2610 | 31.3
TKL | 1000 | 0.5284 | 0.910 | 0.2278 | 1.5
TKL | 2000 | 0.5475 | 0.915 | 0.2351 | 1.1
DRSCM (2 sum) | 1000 | 0.6434 | 0.9193 | 0.2531 | 10.0
DRSCM (4 sum) | 2000 | 0.6375 | 0.9108 | 0.2452 | 10.0
LTR-Longformer | 1000 | 0.5366 | 0.9085 | 0.2286 | 26
LTR-Longformer | 2000 | 0.5424 | 0.9128 | 0.2347 | 26
LTR-BERT | 1000 | 0.6674 | 0.9341 | 0.2711 | 33.3
LTR-BERT | 2000 | 0.6666 | 0.9341 | 0.2734 | 33.3
Ablation study
Study on different types of datasets
Results of LTR-BERT (at varying maximum document lengths) against the BM25 baseline on each dataset; Δ is the relative improvement over BM25:

Dataset | Method | Max doc length | NDCG@10 | Δ | MRR@10 | Δ | MAP | Δ
---|---|---|---|---|---|---|---|---
MS docs dev | BM25 | – | 0.3251 | – | 0.2646 | – | 0.2769 | –
MS docs dev | LTR-BERT | 500 | 0.4329* | + 33.16% | 0.3627* | + 37.07% | 0.3696* | + 33.48%
MS docs dev | LTR-BERT | 1000 | 0.4338* | + 33.44% | 0.3632* | + 37.26% | 0.3701* | + 33.66%
MS docs dev | LTR-BERT | 1500 | 0.4343* | + 33.59% | 0.3636* | + 37.41% | 0.3705* | + 33.80%
MS docs dev | LTR-BERT | 2000 | 0.4289* | + 31.93% | 0.3574* | + 35.07% | 0.3647* | + 31.71%
MS docs Test2019 | BM25 | – | 0.5234 | – | 0.7843 | – | 0.2339 | –
MS docs Test2019 | LTR-BERT | 500 | 0.6602* | + 26.14% | 0.9341* | + 19.10% | 0.2666* | + 13.98%
MS docs Test2019 | LTR-BERT | 1000 | 0.6674* | + 27.51% | 0.9341* | + 19.10% | 0.2711* | + 15.90%
MS docs Test2019 | LTR-BERT | 1500 | 0.6673* | + 27.49% | 0.9341* | + 19.10% | 0.2726* | + 16.55%
MS docs Test2019 | LTR-BERT | 2000 | 0.6666* | + 27.36% | 0.9341* | + 19.10% | 0.2734* | + 16.89%
FBIS | BM25 | – | 0.3190 | – | 0.4218 | – | 0.2188 | –
FBIS | LTR-BERT | 500 | 0.3557* | + 11.50% | 0.4495* | + 6.57% | 0.2297* | + 4.98%
FBIS | LTR-BERT | 1000 | 0.3404* | + 6.71% | 0.4583* | + 8.65% | 0.2265* | + 3.52%
FBIS | LTR-BERT | 1500 | 0.3416* | + 7.08% | 0.5755* | + 36.44% | 0.2286* | + 4.48%
FBIS | LTR-BERT | 2000 | 0.3383* | + 6.05% | 0.4568* | + 8.30% | 0.2280* | + 4.20%
SJMN | BM25 | – | 0.3276 | – | 0.5186 | – | 0.2019 | –
SJMN | LTR-BERT | 500 | 0.3426* | + 4.58% | 0.5454 | + 5.17% | 0.2074* | + 2.72%
SJMN | LTR-BERT | 1000 | 0.3435* | + 4.85% | 0.5428 | + 4.67% | 0.2077* | + 2.87%
SJMN | LTR-BERT | 1500 | 0.3468* | + 5.86% | 0.5494 | + 5.94% | 0.2081* | + 3.07%
SJMN | LTR-BERT | 2000 | 0.3468* | + 5.86% | 0.5494 | + 5.94% | 0.2080* | + 3.02%
Disk1&2 | BM25 | – | 0.5040 | – | 0.6327 | – | 0.2375 | –
Disk1&2 | LTR-BERT | 500 | 0.5302* | + 5.20% | 0.6720* | + 6.21% | 0.2502* | + 5.35%
Disk1&2 | LTR-BERT | 1000 | 0.5414* | + 7.42% | 0.6867* | + 8.53% | 0.2561* | + 7.83%
Disk1&2 | LTR-BERT | 1500 | 0.5381* | + 6.77% | 0.6866* | + 8.52% | 0.2562* | + 7.87%
Disk1&2 | LTR-BERT | 2000 | 0.5373* | + 6.61% | 0.6850* | + 8.27% | 0.2561* | + 7.83%
LA | BM25 | – | 0.3772 | – | 0.5750 | – | 0.2470 | –
LA | LTR-BERT | 500 | 0.4019* | + 6.55% | 0.5853* | + 1.79% | 0.2649* | + 7.25%
LA | LTR-BERT | 1000 | 0.3973* | + 5.33% | 0.5899* | + 2.59% | 0.2642* | + 6.96%
LA | LTR-BERT | 1500 | 0.4039* | + 7.08% | 0.6030* | + 4.87% | 0.2674* | + 8.26%
LA | LTR-BERT | 2000 | 0.3977* | + 5.43% | 0.6001* | + 4.37% | 0.2677* | + 8.38%
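The Δ columns above are relative improvements over the BM25 baseline, which can be verified directly from the reported scores:

```python
def rel_improvement(score: float, baseline: float) -> float:
    """Relative improvement over a baseline score, as a percentage."""
    return 100.0 * (score - baseline) / baseline

# Reproducing one cell of the table: NDCG@10 of LTR-BERT (max doc
# length 500) vs BM25 on MS docs dev: 0.4329 vs 0.3251.
print(f"{rel_improvement(0.4329, 0.3251):+.2f}%")  # prints "+33.16%"
```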
Indexing throughput and footprint
Method | Dim | Space (GiB) | Throughput (documents/s) | MAP
---|---|---|---|---
LTR-BERT | 128 | 8.96 | 76.294 | 0.2536
LTR-BERT | 96 | 6.87 | 97.247 | 0.2528
LTR-BERT | 48 | 3.45 | 111.366 | 0.2517
LTR-BERT | 24 | 1.74 | 111.025 | 0.2496
LTR-BERT | 12 | 0.95 | 131.765 | 0.2204
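The near-linear relationship between embedding dimension and index footprint in the table follows from the raw storage cost of dense vectors. A rough estimator, assuming one stored vector per indexed unit and a hypothetical value width (fp16 here; the paper's actual storage format and vector count are not specified in this excerpt):

```python
def index_size_gib(num_vectors: int, dim: int, bytes_per_value: int = 2) -> float:
    """Estimated raw embedding storage in GiB: one dense vector per indexed
    unit, ignoring any index structure overhead. bytes_per_value=2 assumes
    fp16 storage; use 4 for float32."""
    return num_vectors * dim * bytes_per_value / 2**30
```

Halving `dim` halves the estimated footprint, which matches the roughly 2x space reduction between successive rows of the table (the small deviations would come from per-document overhead not modeled here).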