Discovering key concepts in verbose queries

Authors:
Michael Bendersky (University of Massachusetts, Amherst, MA, USA),
W. Bruce Croft (University of Massachusetts, Amherst, MA, USA)

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 2008, Pages 491–498. https://doi.org/10.1145/1390334.1390419

Published: 20 July 2008

ABSTRACT

Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency. Finally, we propose a probabilistic model for integrating the weighted key concepts identified by our method into a query, and demonstrate that this integration significantly improves retrieval effectiveness for a large set of natural language description queries derived from TREC topics on several newswire and web collections.
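For readers unfamiliar with the baseline named above, inverse document frequency weighting of query terms can be sketched as follows. This is a minimal illustration and not the paper's method: the smoothed idf formula, the function names, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def idf(term, doc_freq, num_docs):
    """Smoothed inverse document frequency: log((N + 1) / (df + 1)).

    The +1 smoothing is an assumption chosen to avoid division by zero
    for terms that never occur in the collection.
    """
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


def rank_terms_by_idf(query_terms, docs):
    """Return (term, idf) pairs sorted by descending idf, ties broken alphabetically."""
    num_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scored = {term: idf(term, doc_freq, num_docs) for term in set(query_terms)}
    return sorted(scored.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy collection of tokenized documents.
docs = [
    ["spanish", "civil", "war", "history"],
    ["civil", "rights", "history"],
    ["history", "of", "the", "war"],
    ["the", "history", "of", "art"],
]

# A verbose query, already tokenized.
query_terms = ["the", "history", "of", "spanish", "civil", "war"]

ranked = rank_terms_by_idf(query_terms, docs)
for term, weight in ranked:
    print(f"{term}\t{weight:.3f}")
```

In this toy collection the rarest query term, "spanish", ranks first and "history", which occurs in every document, ranks last. A weighting of this kind depends only on collection statistics, whereas the abstract describes a method that also draws on query-dependent and corpus-independent features.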

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008, 934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
General Chairs: Tat-Seng Chua (National University of Singapore), Mun-Kew Leong (National Library Board, Singapore)
Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea), Douglas W. Oard (University of Maryland, College Park, USA), Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)
Copyright © 2008 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information retrieval
key concepts extraction
verbose queries
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%