DOI: 10.1145/3340531.3412775
research-article

ReQue: A Configurable Workflow and Dataset Collection for Query Refinement

Published: 19 October 2020

ABSTRACT

In this paper, we implement and publicly share a configurable software workflow and a collection of gold standard datasets for training and evaluating supervised query refinement methods. Existing datasets such as AOL and MS MARCO, which have been extensively used in the literature for this purpose, are based on the weak assumption that users' input queries improve gradually within a search session, i.e., that the last query, where the user ends her information-seeking session, is the best reconstructed version of her initial query. In practice, such an assumption is not necessarily accurate for a variety of reasons, e.g., topic drift. The objective of our work is to enable researchers to build gold standard query refinement datasets without having to rely on such weak assumptions. Our software workflow, which generates such gold standard query datasets, takes three inputs: (1) a dataset of queries along with their associated relevance judgements (e.g., TREC topics), (2) an information retrieval method (e.g., BM25), and (3) an evaluation metric (e.g., MAP), and outputs a gold standard dataset. The produced gold standard dataset includes a list of revised queries for each query in the input dataset, each of which improves the performance of the specified retrieval method in terms of the chosen evaluation metric. Because our workflow can generate gold standard datasets for any input query set, we have generated and publicly shared gold standard datasets for TREC queries associated with Robust04, Gov2, ClueWeb09, and ClueWeb12. The source code of our software workflow, the generated gold datasets, and benchmark results for three state-of-the-art supervised query refinement methods over these datasets are made publicly available for reproducibility purposes.
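The three-input workflow described in the abstract can be illustrated with a minimal sketch. This is not the ReQue implementation: the toy corpus, the term-overlap ranker (a stand-in for BM25), the fixed candidate list, and all function names below are assumptions made for illustration. The sketch only shows the selection rule the abstract states, namely keeping a revised query when it scores higher than the original under the chosen metric.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy corpus and relevance judgements (hypothetical data, illustration only).
DOCS: Dict[str, str] = {
    "d1": "jaguar car dealership prices",
    "d2": "jaguar big cat habitat in the rainforest",
    "d3": "rainforest animals and big cat conservation",
    "d4": "used car prices and reviews",
}
QUERIES: Dict[str, str] = {"q1": "jaguar"}
QRELS: Dict[str, Set[str]] = {"q1": {"d2", "d3"}}


def overlap_ranker(query: str) -> List[str]:
    """Stand-in retrieval method: rank documents by number of shared terms.

    A real run would plug in BM25 or another ranker here.
    """
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.split())) for d, text in DOCS.items()}
    hits = [d for d, s in scores.items() if s > 0]
    # Descending score, then document id for a deterministic order.
    return sorted(hits, key=lambda d: (-scores[d], d))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list (the per-query part of MAP)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def toy_candidates(query: str) -> List[str]:
    """Hypothetical candidate generator: append a few fixed expansion terms."""
    return [f"{query} big cat", f"{query} car", f"{query} rainforest"]


def build_gold(
    queries: Dict[str, str],
    qrels: Dict[str, Set[str]],
    ranker: Callable[[str], List[str]],
    metric: Callable[[List[str], Set[str]], float],
    candidates: Callable[[str], List[str]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep only candidates that strictly beat the original query's score."""
    gold: Dict[str, List[Tuple[str, float]]] = {}
    for qid, query in queries.items():
        baseline = metric(ranker(query), qrels[qid])
        improved = []
        for cand in candidates(query):
            score = metric(ranker(cand), qrels[qid])
            if score > baseline:
                improved.append((cand, score))
        # Best-scoring revisions first; ties broken alphabetically.
        improved.sort(key=lambda pair: (-pair[1], pair[0]))
        gold[qid] = improved
    return gold


gold = build_gold(QUERIES, QRELS, overlap_ranker, average_precision, toy_candidates)
for qid, revisions in gold.items():
    print(qid, revisions)
```

In this toy setting the candidate "jaguar car" is dropped because it does not improve on the original query, while the other two candidates are kept and ordered by their score. Swapping in a different ranker or metric only requires passing a different callable, which mirrors the configurability the abstract describes.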

Supplemental Material

3340531.3412775.mp4 (mp4, 69.8 MB)

    • Published in

      CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
      October 2020
      3619 pages
      ISBN: 9781450368599
      DOI: 10.1145/3340531

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
