Predicting Query Performance by Query-Drift Estimation

Authors:
Anna Shtok, Technion -- Israel Institute of Technology
Oren Kurland, Technion -- Israel Institute of Technology
David Carmel, IBM Haifa Research Labs
Fiana Raiber, Technion -- Israel Institute of Technology
Gad Markovits, Technion -- Israel Institute of Technology

ACM Transactions on Information Systems, Volume 30, Issue 2, Article No. 11, pp. 1–35. https://doi.org/10.1145/2180868.2180873

Published: 01 May 2012

Abstract

Predicting query performance, that is, the effectiveness of a search performed in response to a query, is a highly important and challenging problem. We present a novel approach to this task that is based on measuring the standard deviation of retrieval scores in the result list of the documents most highly ranked. We argue that for retrieval methods that are based on document-query surface-level similarities, the standard deviation can serve as a surrogate for estimating the presumed amount of query drift in the result list, that is, the presence (and dominance) of aspects or topics not related to the query in documents in the list. Empirical evaluation demonstrates the prediction effectiveness of our approach for several retrieval models. Specifically, the prediction quality often transcends that of current state-of-the-art prediction methods.
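
The core quantity described above, the standard deviation of retrieval scores among the most highly ranked documents, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `top_k_score_std`, the cutoff `k`, the use of the population (rather than sample) standard deviation, and the example scores are all assumptions made for illustration, and the abstract does not specify any normalization the article may apply.

```python
import math
from typing import Sequence


def top_k_score_std(scores: Sequence[float], k: int = 100) -> float:
    """Population standard deviation of the k highest retrieval scores.

    `scores` holds the retrieval scores of a result list in any order;
    `k` is a hypothetical cutoff for the "most highly ranked" documents.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Keep only the k highest-scoring documents.
    top = sorted(scores, reverse=True)[:k]
    if not top:
        raise ValueError("scores must be non-empty")
    mean = sum(top) / len(top)
    variance = sum((s - mean) ** 2 for s in top) / len(top)
    return math.sqrt(variance)


# Hypothetical result lists: one with widely spread scores, one nearly flat.
spread = [12.0, 9.0, 6.0, 3.0]
flat = [6.1, 6.0, 6.0, 5.9]

print(top_k_score_std(spread, k=4))
print(top_k_score_std(flat, k=4))
```

In the abstract's framing, this value serves as a surrogate for the amount of query drift in the result list. How the value maps to predicted effectiveness (for example, whether a larger spread signals less drift) is left to the article itself and is not encoded in this sketch.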

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in

ACM Transactions on Information Systems Volume 30, Issue 2
May 2012
245 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2180868

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 2012
- Accepted: 1 February 2012
- Revised: 1 January 2012
- Received: 1 March 2011
Author Tags
Query-performance prediction
query drift
score distribution
Qualifiers
- research-article
- Research
- Refereed