research-article

ReQue: A Configurable Workflow and Dataset Collection for Query Refinement

Authors:
Mahtab Tamannaee, Ryerson University, Toronto, Canada
Hossein Fani, University of Windsor, Windsor, Canada
Fattane Zarrinkalam, Ryerson University, Toronto, Canada
Jamil Samouh, Ryerson University, Toronto, Canada
Samad Paydar, Ryerson University, Toronto, Canada
Ebrahim Bagheri, Ryerson University, Toronto, Canada
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 3165–3172
https://doi.org/10.1145/3340531.3412775

Published: 19 October 2020

ABSTRACT

In this paper, we implement and publicly share a configurable software workflow and a collection of gold standard datasets for training and evaluating supervised query refinement methods. Existing datasets such as AOL and MS MARCO, which have been used extensively in the literature for this purpose, rest on the weak assumption that users' input queries improve gradually within a search session, i.e., that the last query, with which the user ends her information seeking session, is the best reconstructed version of her initial query. In practice, this assumption does not necessarily hold for a variety of reasons, e.g., topic drift. The objective of our work is to enable researchers to build gold standard query refinement datasets without having to rely on such weak assumptions. Our software workflow takes three inputs: (1) a dataset of queries along with their associated relevance judgements (e.g., TREC topics), (2) an information retrieval method (e.g., BM25), and (3) an evaluation metric (e.g., MAP), and outputs a gold standard dataset. The produced gold standard dataset includes a list of revised queries for each query in the input dataset, each of which improves the performance of the specified retrieval method in terms of the chosen evaluation metric. Because our workflow can generate gold standard datasets for any input query set, we have generated and publicly shared gold standard datasets for the TREC queries associated with Robust04, Gov2, ClueWeb09, and ClueWeb12. The source code of our software workflow, the generated gold standard datasets, and benchmark results for three state-of-the-art supervised query refinement methods over these datasets are made publicly available for reproducibility purposes.
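
The three-input workflow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the ReQue source code: the toy corpus, the relevance judgements, the term-overlap ranker standing in for a real retrieval method such as BM25, the hand-written candidate refinements (the abstract does not say how candidates are produced), and all function names. The sketch only shows the selection idea: keep the revised queries whose metric score is higher than that of the original query.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy corpus and relevance judgements (illustrative data, not from the paper).
DOCS: Dict[str, str] = {
    "d1": "jaguar big cat habitat in south america",
    "d2": "jaguar car dealership prices",
    "d3": "big cat conservation and habitat loss",
    "d4": "used car prices and dealership reviews",
}
QRELS: Dict[str, Set[str]] = {"q1": {"d1", "d3"}}

Ranker = Callable[[str], List[str]]
Metric = Callable[[List[str], Set[str]], float]


def overlap_ranker(query: str) -> List[str]:
    """Hypothetical stand-in for a real ranker such as BM25: rank by term overlap."""
    terms = set(query.split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    # Higher overlap first; ties are broken by document id so the output is deterministic.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the set of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def build_gold_standard(
    query_id: str,
    query: str,
    candidates: List[str],
    ranker: Ranker,
    metric: Metric,
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the original query's score and the candidates that beat it, best first."""
    relevant = QRELS[query_id]
    baseline = metric(ranker(query), relevant)
    scored = [(cand, metric(ranker(cand), relevant)) for cand in candidates]
    improved = [(cand, score) for cand, score in scored if score > baseline]
    return baseline, sorted(improved, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Candidate refinements are written by hand here purely for illustration.
    candidates = ["jaguar big cat", "jaguar car", "jaguar habitat"]
    baseline, gold = build_gold_standard(
        "q1", "jaguar", candidates, overlap_ranker, average_precision
    )
    print(f"original query score: {baseline:.3f}")
    for refined, score in gold:
        print(f"{refined!r}: {score:.3f}")
```

In this toy setting the candidate "jaguar car" is discarded because it lowers average precision for the information need, while the other two candidates are kept. Swapping in a different ranker or metric only requires passing a different callable, which mirrors the configurability the abstract describes.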

Index Terms

Information systems → Information retrieval → Information retrieval query processing → Query suggestion

Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
General Chairs:
Mathieu d'Aquin, DSI, Insight, NUI Galway, Ireland
Stefan Dietze, GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany
Program Chairs:
Claudia Hauff, TU Delft, The Netherlands
Edward Curry, DSI, Insight, NUI Galway, Ireland
Philippe Cudre Mauroux, eXascale, University of Fribourg, Switzerland
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
gold standard dataset
query refinement
reproducibility