
Predicting Query Performance by Query-Drift Estimation

Published: 01 May 2012

Abstract

Predicting query performance, that is, the effectiveness of a search performed in response to a query, is an important and challenging problem. We present a novel approach to this task that is based on measuring the standard deviation of the retrieval scores of the most highly ranked documents in the result list. We argue that for retrieval methods based on surface-level document-query similarities, the standard deviation can serve as a surrogate for the presumed amount of query drift in the result list, that is, the presence (and dominance) of aspects or topics not related to the query in the listed documents. Empirical evaluation demonstrates the prediction effectiveness of our approach for several retrieval models. Specifically, the prediction quality often exceeds that of current state-of-the-art prediction methods.
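
As a rough illustration of the idea described in the abstract, the following Python sketch computes the standard deviation of the retrieval scores of the top-ranked documents for a single query. The function name `score_std_predictor`, the cutoff parameter `k`, and the choice of the population standard deviation are assumptions made for this sketch; the abstract does not state a cutoff or any normalization, so this should not be read as the authors' exact predictor.

```python
from statistics import pstdev
from typing import Sequence


def score_std_predictor(scores: Sequence[float], k: int = 100) -> float:
    """Standard deviation of the k highest retrieval scores for one query.

    `scores` holds the retrieval scores of the documents returned for the
    query, in any order. `k` is a hypothetical cutoff chosen for this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    # Keep only the k most highly ranked documents (highest scores first).
    top_scores = sorted(scores, reverse=True)[:k]
    # Population standard deviation over the truncated result list.
    return pstdev(top_scores)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    example_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(score_std_predictor(example_scores, k=8))
```

In practice the value would be computed per query and then compared across queries; how scores from different queries or retrieval models are made comparable is left open here, since the abstract does not cover it.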

Published in

ACM Transactions on Information Systems, Volume 30, Issue 2
May 2012
245 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2180868

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 March 2011
• Revised: 1 January 2012
• Accepted: 1 February 2012
• Published: 1 May 2012
