ABSTRACT
There is growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search: (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation, and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on average precision using the Kendall τ statistic. The best individual estimator is the sensitivity to document perturbation, which yields a Kendall τ of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it yields a Kendall τ of 0.562, which to our knowledge is the highest correlation with average precision reported to date.
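The evaluation described above compares two orderings of the same queries, one by predicted effectiveness and one by average precision, using the Kendall rank correlation. The following is a minimal sketch of that comparison, not the authors' code. It assumes the tau-a variant with no tie correction (the abstract does not say which variant was used), and the function name `kendall_tau` and the five scores are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists (assumed variant).

    Counts pairs ordered the same way in both lists (concordant) and pairs
    ordered oppositely (discordant). Tied pairs count as neither.
    """
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two items to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one predicted score and one average precision per query.
predicted_scores = [0.9, 0.4, 0.7, 0.1, 0.5]
average_precision = [0.62, 0.30, 0.35, 0.05, 0.41]

tau = kendall_tau(predicted_scores, average_precision)
print(f"Kendall tau = {tau:.3f}")  # 9 concordant, 1 discordant pair -> 0.800
```

A value of 1 means the two orderings agree on every pair of queries, and -1 means they disagree on every pair. The reported 0.521 and 0.562 therefore mean that the predicted ordering agrees with the average-precision ordering on clearly more pairs than it reverses.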