1 Introduction
- A three-phase skeletal framework for exploiting semantic resources in diversified query expansion. The framework does not rely on query logs or other kinds of supervision, and is thus immune to cold-start issues.
- A Wikipedia-based grounding of the framework, leading to a method called Select-Link-Rank (SLR). SLR addresses both diversified query expansion and entity recommendation by harvesting terms from initial query results and then prioritizing terms and entities over the Wikipedia graph in a diversity-conscious fashion.
- Select-Embed-Rank (SER), another method based on the framework, which exploits word embeddings instead of Wikipedia. Like SLR, SER starts by selecting terms from initial query results, but it constructs the graph using similarities between word embedding vectors, followed by a diversity-conscious ranking.
- An empirical evaluation, including a user study, that benchmarks SLR and SER against state-of-the-art methods for DQE and DER, illustrating their effectiveness over existing methods.
2 Related work
2.1 Search result diversification
2.2 Diversified query expansion
| Method | User Data Reqd. | External Resource Reqd. | Source of Exp. Terms | Remarks |
|---|---|---|---|---|
| BHN [2] (DER Baseline) | − | ConceptNet | Entity names | Expansion terms from the small vocabulary of entity names |
| tsxQuAD (DQE Baseline) | Sub-topics (i.e., aspects) and sub-topic-level relevance judgements | − | Documents | Relevance judgements are often impractical to get in real systems |
| LBSN [21] | Query logs | − | Query logs | Cold-start issue; also inapplicable for small-scale systems |
| BLN [3] | Query logs | ConceptNet, Wikipedia | Entity names, categories, query logs, etc. | Expansion terms from the same small vocabulary as BHN, and the same query-log usage as LBSN |
| SLR (ours) | − | Wikipedia | Documents | |
| SER (ours) | − | Pre-learned word embeddings | Documents | |
2.3 Semantic resources for query expansion
2.3.1 Wikipedia
2.3.2 Word embeddings
2.4 DQE uptake model
3 Problem statement and solution framework
3.1 Problem statement
3.2 Framework for using semantic resources in diversified query expansion
- Selection: This phase selects information of relevance to the query from the document corpus used in the retrieval system. Across our methods, we select a subset of terms that are deemed relevant to the query.
- Correlation: The information selected in the first phase is now correlated with external semantic resources. We propose separate methods for correlating with Wikipedia and with pre-learned word embeddings, as we illustrate in the next section.
- Ranking: This phase ranks the candidate expansion terms to arrive at the final result set, \(\mathbb {E}\). Both our methods employ diversity-conscious graph node ranking using the vertex-reinforced random walk (VRRW) technique to rank the expansion terms. However, differences in the previous phase across the methods entail consequent differences in this phase as well. A minimal sketch of the overall pipeline follows this list.
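To make the data flow concrete, the following is a minimal, self-contained Python sketch of the three phases. The toy frequency scoring and co-occurrence graph merely stand in for the real retrieval engine and semantic resource, and all names are illustrative rather than part of any released implementation.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

def select(query: str, top_docs: List[str], t: int = 10) -> List[str]:
    """Phase 1 (Selection): harvest candidate expansion terms from the
    top documents of an initial retrieval run (toy frequency scoring)."""
    q_words = set(query.lower().split())
    counts = Counter(w for d in top_docs for w in d.lower().split()
                     if w not in q_words)
    return [w for w, _ in counts.most_common(t)]

def correlate(terms: List[str],
              top_docs: List[str]) -> Dict[Tuple[str, str], float]:
    """Phase 2 (Correlation): weight term pairs via a semantic resource;
    document co-occurrence stands in for Wikipedia links or embeddings."""
    graph = {}
    for a, b in combinations(terms, 2):
        w = sum(1 for d in top_docs if a in d.lower() and b in d.lower())
        if w:
            graph[(a, b)] = float(w)
    return graph

def rank(graph: Dict[Tuple[str, str], float], k: int) -> List[str]:
    """Phase 3 (Ranking): diversity-conscious node ranking; the full
    methods use VRRW, approximated by weighted degree in this toy."""
    score: Counter = Counter()
    for (a, b), w in graph.items():
        score[a] += w
        score[b] += w
    return [t for t, _ in score.most_common(k)]

docs = ["apple iphone launch event", "apple pie recipe with cinnamon",
        "apple macbook hardware review"]
print(rank(correlate(select("apple", docs), docs), k=5))
```

SLR and SER instantiate the Correlation phase differently (Wikipedia linking versus embedding similarity), while sharing the Selection and Ranking machinery.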
4 Select-Link-Rank: Wikipedia for diversified query expansion
4.1 Select: Selecting candidate expansion terms
4.2 Link: Linking to Wikipedia and entity graph creation
4.2.1 Identifying relevant Wikipedia entities
4.2.2 Wikipedia subgraph creation
4.2.3 Entity importance weights
4.3 Rank: Ranking candidate terms
4.3.1 Vertex reinforced random walk
4.3.2 Diversified term ranking
4.4 Computational costs
- The Select phase makes use of a search engine such as Indri [31] to run the queries, which may internally use language modelling and inference networks to perform the search. Indri is reported to be quite fast, delivering response times on the order of a second [31]. Selecting T terms from the K retrieved documents can be performed using a heap, at a cost of \(\mathcal {O}(K \times L_{max} + W_{u} \times \log (K^{\prime }))\), where \(L_{max}\) is the maximum number of non-stop-words per document and \(W_{u}\) is the total number of unique words. A sketch of this heap-based selection appears after this list.
- In the Link phase, each of the T terms chosen in the previous phase is used to expand the query and link to entities. This is performed using a reverse index from words to Wikipedia pages and a scoring mechanism such as TF-IDF. Computational costs depend on the number of candidate pages, which is roughly proportional to the total number of pages (with a very small constant) and inversely proportional to the vocabulary size of the corpus (the number of unique words). A sketch of such reverse-index linking also follows this list.
- The Rank phase involves VRRW, whose matrix implementation takes \(\mathcal {O}(|S|^{2})\) time per iteration, where |S| denotes the number of nodes in S, the graph over which VRRW is executed. In practice, we found VRRW to converge in fewer than 15 iterations, leading to very fast computations on the order of a few seconds. A sketch of the VRRW iteration is given below.
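As a concrete illustration of the Select-phase cost, here is a minimal Python sketch of heap-based term selection. The toy frequency score is an assumption standing in for the actual term-scoring function: the single pass over documents accounts for the \(K \times L_{max}\) term, and the size-bounded heap over the unique words for the \(W_{u} \times \log(\cdot)\) term.

```python
import heapq
from collections import Counter
from typing import Iterable, List

def select_top_terms(docs: Iterable[List[str]], t: int,
                     stopwords: set) -> List[str]:
    """Single pass over the K retrieved documents (O(K * L_max)) to score
    terms, then a size-bounded heap over the W_u unique words to keep the
    T best (O(W_u * log T)); frequency is a stand-in for the real score."""
    scores: Counter = Counter()
    for doc in docs:
        for word in doc:
            if word not in stopwords:
                scores[word] += 1
    # nlargest maintains a heap of size t while streaming the W_u items
    best = heapq.nlargest(t, scores.items(), key=lambda kv: kv[1])
    return [word for word, _ in best]

print(select_top_terms([["jaguar", "car", "speed"],
                        ["jaguar", "cat", "habitat"]],
                       t=3, stopwords={"the", "a"}))
```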
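The Link-phase lookup can be pictured as follows: a toy reverse index from words to Wikipedia page titles, scored by IDF-weighted matches. The index layout and scoring are simplified assumptions for illustration, not the authors' exact implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set

def build_reverse_index(pages: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each word to the titles of the Wikipedia pages containing it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for title, words in pages.items():
        for w in words:
            index[w].append(title)
    return index

def link_term(query_words: List[str], index: Dict[str, List[str]],
              n_pages: int, top: int = 3) -> List[str]:
    """Score candidate pages by summed IDF of matched words. Only pages
    on some posting list are touched, so the cost tracks postings-list
    length (pages per word), not the total number of pages."""
    scores: Dict[str, float] = defaultdict(float)
    for w in query_words:
        postings = index.get(w, [])
        if postings:
            idf = math.log(n_pages / len(postings))
            for title in postings:
                scores[title] += idf   # term frequency taken as 1 here
    return sorted(scores, key=scores.get, reverse=True)[:top]

pages = {"Jaguar Cars": {"jaguar", "car", "british"},
         "Jaguar": {"jaguar", "cat", "americas"}}
index = build_reverse_index(pages)
print(link_term(["jaguar", "cat"], index, n_pages=len(pages)))
```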
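Finally, a compact NumPy sketch of the VRRW iteration in the reinforced-walk style popularized by DivRank: transition mass into a node is boosted by its accumulated visit mass, and each iteration is a dense \(\mathcal {O}(|S|^{2})\) matrix step. This is a generic VRRW sketch under simplifying assumptions, not the paper's exact update rule.

```python
import numpy as np

def vrrw(W: np.ndarray, n_iter: int = 15, alpha: float = 0.85) -> np.ndarray:
    """Vertex-reinforced random walk over a non-negative adjacency matrix
    W (assumed to have no all-zero rows). Reinforcement concentrates score
    on one strong node per graph region, yielding a diversified ranking."""
    n = W.shape[0]
    P0 = W / W.sum(axis=1, keepdims=True)    # row-stochastic base transitions
    p = np.full(n, 1.0 / n)                  # current visit distribution
    visits = np.ones(n)                      # accumulated visit mass N(v)
    for _ in range(n_iter):                  # ~15 iterations suffice in practice
        R = P0 * visits                      # boost transitions into visited nodes
        R /= R.sum(axis=1, keepdims=True)    # renormalise rows
        p = (1 - alpha) / n + alpha * (R.T @ p)  # one O(|S|^2) dense step
        visits += p                          # reinforce with new visit mass
    return p                                 # rank nodes by descending p

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
              [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(np.argsort(-vrrw(A)))                  # node ids, most central first
```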
4.5 Summary and remarks
5 Select-Embed-Rank: Word embeddings for diversified query expansion
5.1 Select: Selecting candidate expansion terms
5.2 Embed: Using word embeddings for term graph construction
5.2.1 Term graph construction
5.2.2 Term graph refinement
5.3 Rank: Ranking candidate terms
5.4 Using SER for diversified entity recommendation
5.5 Computational costs
- The Select phase in SER, being identical to that of SLR, involves invoking an IR engine such as Indri [31]. Selecting T terms from the K retrieved documents involves a cost of \(\mathcal {O}(K \times L_{max} + W_{u} \times \log (K^{\prime }))\), where \(L_{max}\) is the maximum number of non-stop-words per document and \(W_{u}\) is the total number of unique words.
- The SER Embed phase differs significantly from the SLR Link phase in that it involves building a graph spanning the T terms selected in the previous phase. In the absence of any indexes, the graph construction is \(\mathcal {O}(T^{2})\). However, this cost can be offset entirely by maintaining a pre-computed index of similar terms, which yields a linear complexity of \(\mathcal {O}(\rho T)\), ρ being the edge limit. A sketch of this construction follows the list.
- The Rank phase, being identical to that of SLR, is \(\mathcal {O}(|S|^{2})\) per iteration, where |S| denotes the size of the refined term graph.
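Below is a sketch of the Embed-phase graph construction, under the assumption that pre-learned embeddings are available as a dictionary of vectors. The brute-force neighbour search shown is the \(\mathcal {O}(T^{2})\) variant; replacing it with a lookup into a pre-computed similar-term index is what brings the cost down to \(\mathcal {O}(\rho T)\).

```python
import numpy as np
from typing import Dict, List, Tuple

def build_term_graph(terms: List[str], emb: Dict[str, np.ndarray],
                     rho: int = 5) -> List[Tuple[str, str, float]]:
    """Connect each term to its rho most cosine-similar peers (edge limit
    rho). The all-pairs search below is O(T^2); with a precomputed
    similar-term index each lookup is a table read, giving O(rho * T)."""
    unit = {t: emb[t] / np.linalg.norm(emb[t]) for t in terms if t in emb}
    edges = []
    for t, v in unit.items():
        sims = sorted(((float(v @ u), s) for s, u in unit.items() if s != t),
                      reverse=True)[:rho]
        edges.extend((t, s, w) for w, s in sims)
    return edges

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=50) for t in ["jaguar", "car", "cat", "speed"]}
print(build_term_graph(list(emb), emb, rho=2)[:4])
```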
5.6 Summary
6 Experiments
6.1 Experimental setup
6.2 User study results
6.2.1 DQE evaluation: SLR vs. tsxQuAD
| Sl# | Query | SLR (DQE Expansions Eval.) | tsxQuAD (DQE Expansions Eval.) | SLR (DER Entities Eval.) | BHN (DER Entities Eval.) |
|---|---|---|---|---|---|
| 1 | coke | 37 | 6 | 40 | 11 |
| 2 | fifa 2006 | 40 | 3 | 33 | 18 |
| 3 | batman | 32 | 11 | 49 | 2 |
| 4 | jennifer actress | 40 | 3 | 48 | 3 |
| 5 | phoenix | 39 | 4 | 42 | 10 |
| 6 | valve | 38 | 5 | 40 | 12 |
| 7 | rock and roll | 40 | 3 | 46 | 4 |
| 8 | amazon | 39 | 4 | 39 | 13 |
| 9 | washington | 37 | 6 | 38 | 12 |
| 10 | jaguar | 37 | 6 | 46 | 5 |
| 11 | apple | 30 | 14 | 41 | 9 |
| 12 | world cup | 36 | 8 | 50 | 1 |
| 13 | michael jordan | 39 | 4 | 36 | 13 |
| 14 | java | 41 | 2 | 41 | 9 |
| 15 | python | 39 | 4 | 25 | 26 |
| | Average | 37.6 | 5.53 | 40.9 | 9.87 |
| | Percentage | 87% | 13% | 81% | 19% |
6.2.2 DER evaluation: SLR vs. BHN
6.2.3 DQE evaluation: SLR vs. SER-Wiki and SLR vs. SER-News
| Sl# | Query | SLR (vs. SER-Wiki) | SER-Wiki | SLR (vs. SER-News) | SER-News |
|---|---|---|---|---|---|
| 1 | coke | 13 | 14 | 12 | 8 |
| 2 | fifa 2006 | 14 | 13 | 9 | 11 |
| 3 | batman | 6 | 21 | 14 | 6 |
| 4 | jennifer actress | 20 | 7 | 13 | 7 |
| 5 | phoenix | 17 | 10 | 10 | 10 |
| 6 | valve | 24 | 3 | 18 | 2 |
| 7 | rock and roll | 18 | 9 | 7 | 13 |
| 8 | amazon | 15 | 12 | 15 | 5 |
| 9 | washington | 21 | 6 | 17 | 3 |
| 10 | jaguar | 17 | 10 | 15 | 5 |
| 11 | apple | 13 | 14 | 11 | 9 |
| 12 | world cup | 22 | 5 | 18 | 2 |
| 13 | michael jordan | 23 | 4 | 17 | 3 |
| 14 | java | 6 | 21 | 16 | 4 |
| 15 | python | 16 | 11 | 12 | 8 |
| | Average | 16.33 | 10.67 | 13.6 | 6.4 |
| | Percentage | 60% | 40% | 68% | 32% |
6.3 Automated diversity evaluation
6.3.1 Gini index analysis
- Unsupervised Unevenness (UU): This measures the unevenness, using the Gini index, of relevance scores across all entities in \(\mathbb {N}\).
- Supervised Unevenness (SU): Here, we measure the unevenness of relevance scores across the entities in \(\mathbb {N}^{*}\), a manually identified set of entities known to be relevant to the query. Note that it is not necessarily the case that \(\mathbb {N}^{*} \subseteq \mathbb {N}\), since the DQE method could miss some relevant entities due to weaknesses in the method. For computing this measure, we set the relevance scores of all entities in \(\mathbb {N}^{*} - \mathbb {N}\) to 0.0, thus penalizing the DQE method for excluding such entities. The supervised unevenness is then the Gini index measured over the relevance scores of entities in \(\mathbb {N}^{*}\). For each query in our query set, \(\mathbb {N}^{*}\) comprises 10 manually identified relevant entities. A sketch of the Gini computation follows this list.
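For concreteness, here is a minimal sketch of the Gini computation over relevance scores, with entities in \(\mathbb {N}^{*} - \mathbb {N}\) scored as 0.0 for SU as described above. The standard Gini formula used below is assumed to match the paper's usage.

```python
import numpy as np

def gini(scores) -> float:
    """Gini index of a non-negative score distribution: 0 means perfectly
    even scores; values near 1 mean mass concentrated on few entities."""
    x = np.sort(np.asarray(scores, dtype=float))   # ascending order
    n = x.size
    if n == 0 or x.sum() == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(((2 * ranks - n - 1) @ x) / (n * x.sum()))

def supervised_unevenness(relevance: dict, gold: set) -> float:
    """SU: Gini over the gold entities N*, scoring missed ones as 0.0."""
    return gini([relevance.get(e, 0.0) for e in gold])

print(gini([0.25, 0.25, 0.25, 0.25]))        # 0.0 -- perfectly even
print(round(gini([1.0, 0.0, 0.0, 0.0]), 2))  # 0.75 -- highly uneven
```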
| Method | Unsupervised Unevenness | Supervised Unevenness |
|---|---|---|
| SLR | 0.465 | 0.241 |
| SER-Wiki | 0.599 | 0.675 |
| SER-News | 0.553 | 0.703 |
| RM-CombSum-Wiki | 0.583 | 0.705 |
| RM-CombSum-News | 0.645 | 0.734 |
| tsxQuAD | 0.620 | 0.734 |
6.4 Parameter sensitivity analysis
| SLR Parameter | Range | Stability Factor | SER-Wiki Parameter | Range | Stability Factor |
|---|---|---|---|---|---|
| α | 0.6–0.7 | 90% | ρ | 4–6 | 77% |
| λ | 0.15–0.25 | 90% | τ | 0.35–0.45 | 58% |
| μ | 3–5 | 65% | | | |