Introduction
Background and preliminaries
Term frequency
WordNet
Word embeddings
Particle swarm optimization
Related work
Framework for proposed approach
Step 1 | For each term \(t_{i}\) in query \(q\), construct a set of synonyms \(q_{c}\) based on WordNet |
Step 2 | Create the extended query set \(q^{\prime}\) by unifying the original query \(q\) with \(q_{c}\) |
Step 3 | For each term \(t_{i^{\prime}}\) in \(q^{\prime}\), extract the most similar relevant sense of the term within the query context based on word2vec, yielding the set \(c\) |
Step 4 | Select the most frequent \(m\) terms from the pseudo-relevance documents (PRD) |
Step 5 | Unify \(m\) with \(c\) to generate the final candidate terms that capture the sense of the query context |
Step 6 | For each final candidate term \(t_{f}\) from Step 5: |
Step 7 | Select the optimal average weight for each final candidate term using the PSO algorithm |
Step 8 | Unify the top optimally weighted terms from Step 7 with the original query \(q\) |
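The eight steps above can be sketched as a small pipeline. This is a minimal illustration only: the synonym map (standing in for WordNet), the fixed similarity table (standing in for word2vec scores), the similarity threshold, and the frequency-plus-similarity weighting (standing in for the PSO weighting of Steps 6-7) are all toy assumptions, not the paper's actual resources or fitness function.

```python
from collections import Counter

# Toy stand-ins (assumptions): a tiny synonym map in place of WordNet and a
# fixed similarity table in place of word2vec cosine scores.
SYNONYMS = {"car": ["auto", "automobile"], "fast": ["quick", "rapid"]}
SIMILARITY = {"auto": 0.9, "automobile": 0.7, "quick": 0.8, "rapid": 0.6}

def expand_query(q, prd_docs, m=2):
    # Step 1: collect WordNet-style synonyms q_c for each query term
    q_c = {s for t in q for s in SYNONYMS.get(t, [])}
    # Step 2: extended query q' = q united with q_c
    q_prime = set(q) | q_c
    # Step 3: keep the senses most similar to the query context
    # (approximated here by a precomputed similarity score and threshold)
    c = {t for t in q_prime if SIMILARITY.get(t, 0.0) >= 0.7}
    # Step 4: the m most frequent terms in the pseudo-relevance documents
    freqs = Counter(w for doc in prd_docs for w in doc)
    m_terms = {t for t, _ in freqs.most_common(m)}
    # Step 5: final candidate terms = m_terms united with c
    candidates = m_terms | c
    # Steps 6-7: weight each candidate (PSO in the paper; a simple
    # similarity-plus-frequency heuristic here) and rank
    weighted = sorted(candidates,
                      key=lambda t: SIMILARITY.get(t, 0.0) + freqs[t],
                      reverse=True)
    # Step 8: unify the top-weighted terms with the original query
    return list(q) + weighted[:m]

print(expand_query(["car", "fast"], [["engine", "auto"], ["engine", "speed"]]))
```

With the toy data, the query ["car", "fast"] picks up "engine" from the pseudo-relevance documents and the high-similarity synonym "auto".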
Proposed AQE approach
Proposed PSO term-weighting approach
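A standard PSO loop for term weighting can be sketched as follows. The swarm searches \([0,1]^{d}\) for a weight vector maximizing a fitness function; the paper's fitness (e.g. retrieval effectiveness of the reweighted expanded query) is assumed, so the demo below uses a toy fitness with a known optimum. The inertia and acceleration constants are conventional defaults, not values from the paper.

```python
import random

def pso_optimize(fitness, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over [0, 1]^dim (maximization)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # personal best positions
    pbest_f = [fitness(p) for p in pos]        # personal best fitnesses
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # position update, clamped to the weight range [0, 1]
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Toy fitness (assumption): optimum at all weights = 0.5
weights, fit = pso_optimize(lambda x: -sum((v - 0.5) ** 2 for v in x), dim=3)
```

In the proposed approach, `dim` would be the number of final candidate terms and the returned vector their optimal average weights.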
Experiments and evaluation
Experimental environment
Building corpora (index) and query designing
Size | Number of documents | Number of sentences | Number of words
---|---|---|---
1 GB | 6464 | 9561 | 48,305
Parameter setting
Number of top ranked documents (N)
Size of pseudo-relevance documents (N) | 5 | 10 | 15 | 20
---|---|---|---|---
MAP | 0.2179 | 0.514 | 0.3475 | 0.1875
Number of candidate terms selected for query expansion (M)
Number of expansion terms (M) | 2 | 4 | 6 | 8 | 10
---|---|---|---|---|---
MAP | 0.44179 | 0.5175 | 0.534 | 0.3375 | 0.2937
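The two tuning tables can be read off programmatically: MAP peaks at N = 10 pseudo-relevance documents and at M = 6 expansion terms, which are the values the experiments use.

```python
# MAP values reported in the two parameter-tuning tables above
map_by_n = {5: 0.2179, 10: 0.514, 15: 0.3475, 20: 0.1875}
map_by_m = {2: 0.44179, 4: 0.5175, 6: 0.534, 8: 0.3375, 10: 0.2937}

best_n = max(map_by_n, key=map_by_n.get)  # N maximizing MAP
best_m = max(map_by_m, key=map_by_m.get)  # M maximizing MAP
print(best_n, best_m)
```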
Experimental results
Overall performance
Comparison of accuracy between the proposed approach and the no-expansion baseline
Comparison of accuracy between the proposed approach and the WordNet-based approach
Comparison of accuracy between the proposed approach and the TF-based approach
Comparison of results between the proposed approach and the W2V-based approach
Query no. | Recall (Original) | Precision (Original) | Recall (TF) | Precision (TF) | Recall (WordNet) | Precision (WordNet) | Recall (W2V) | Precision (W2V) | Recall (Proposed) | Precision (Proposed)
---|---|---|---|---|---|---|---|---|---|---
1 | 0.145 | 0.088 | 0.194 | 0.083 | 0.244 | 0.21 | 0.36 | 0.3 | 0.6 | 0.6
2 | 0.25 | 0.166 | 0.255 | 0.166 | 0.260 | 0.251 | 0.56 | 0.475 | 0.875 | 0.76
3 | 0.124 | 0.076 | 0.136 | 0.064 | 0.295 | 0.166 | 0.29 | 0.2 | 0.8 | 0.8
4 | 0.253 | 0.177 | 0.257 | 0.1397 | 0.29 | 0.1331 | 0.68 | 0.68 | 0.43 | 0.41
5 | 0.150 | 0.075 | 0.17 | 0.11 | 0.81 | 0.75 | 0.78 | 0.78 | 0.69 | 0.65
6 | 0.133 | 0.10 | 0.43 | 0.33 | 0.58 | 0.43 | 0.66 | 0.59 | 0.80 | 0.74
7 | 0.135 | 0.094 | 0.100 | 0.038 | 0.168 | 0.166 | 0.278 | 0.22 | 0.78 | 0.68
8 | 0.170 | 0.09 | 0.18 | 0.16 | 0.2412 | 0.21 | 0.58 | 0.55 | 0.43 | 0.41
Average | Original query | TF | WordNet | W2V | Proposed approach
---|---|---|---|---|---
Recall | 0.17375 | 0.1985 | 0.2163 | 0.2737 | 0.53062
Precision | 0.1194 | 0.1488 | 0.1974 | 0.2706 | 0.4728
Approach | h-value | p-value
---|---|---
Original query | 1 | 0.0257
TF-based approach | 1 | 0.0258
WordNet-based approach | 1 | 0.0295
W2V-based approach | 1 | 0.0330