Abstract
This thesis investigates two types of methods. The first considers users' attempts to express their information needs through queries (search requests) and tries to predict whether those requests will be of high or low quality. The second estimates the quality of search systems themselves: given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other.
First, pre-retrieval predictors are investigated, which predict a query's effectiveness before the retrieval step and are thus independent of the ranked list of results. Such predictors base their predictions solely on query terms, collection statistics and possibly external sources. Twenty-two prediction algorithms are categorized and their quality is assessed on three different TREC test collections. A number of newly applied methods for combining various predictors are examined to obtain a better prediction of a query's effectiveness.
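To illustrate what a pre-retrieval predictor looks like, the following is a minimal sketch of one widely used idea, the average inverse document frequency of the query terms. It is an assumption made for illustration: the abstract does not say which twenty-two predictors are evaluated, and the function name `avg_idf`, the add-one smoothing and the toy statistics are all hypothetical.

```python
import math


def avg_idf(query_terms, doc_freq, num_docs):
    """Average inverse document frequency of the query terms.

    Uses only the query and collection statistics, so it can be
    computed before any retrieval step. The add-one smoothing is an
    assumption that keeps unseen terms finite.
    """
    if not query_terms:
        return 0.0
    idfs = []
    for term in query_terms:
        df = doc_freq.get(term, 0)
        idfs.append(math.log((num_docs + 1) / (df + 1)))
    return sum(idfs) / len(idfs)


# Toy collection statistics, made up for illustration only.
doc_freq = {"the": 990, "jaguar": 12, "habitat": 40}
num_docs = 1000

specific = avg_idf(["jaguar", "habitat"], doc_freq, num_docs)
generic = avg_idf(["the", "habitat"], doc_freq, num_docs)
print(specific > generic)
```

Under this sketch, a query made of rare terms receives a higher score than one dominated by common terms, which is the intuition behind specificity-based predictors.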
Building on the analysis of pre-retrieval predictors, post-retrieval approaches are then investigated, which estimate a query's effectiveness on the basis of the retrieved results. The thesis focuses in particular on the Clarity Score approach and provides an analysis of its sensitivity towards different variables such as the collection, the query set and the retrieval approach. Adaptations to Clarity Score are introduced which improve the estimation accuracy of the original algorithm.
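As a rough illustration of the idea behind Clarity Score, the sketch below computes the Kullback-Leibler divergence between a language model estimated from the top-retrieved documents and the collection language model. It is a simplified version under stated assumptions: the retrieved documents are weighted uniformly rather than by their query likelihood, no smoothing is applied, and the function names and toy documents are hypothetical. It does not reproduce the adaptations introduced in the thesis.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def clarity(top_docs, collection_model):
    """KL divergence (base 2) between a query model and the collection model.

    Assumption: the query model is a uniform mixture of the term
    distributions of the top-retrieved documents. Every term in the
    documents must also occur in collection_model.
    """
    query_model = Counter()
    for doc in top_docs:
        for term, prob in term_distribution(doc).items():
            query_model[term] += prob / len(top_docs)
    return sum(
        prob * math.log2(prob / collection_model[term])
        for term, prob in query_model.items()
    )


# Toy data, made up for illustration only.
focused_docs = [["jaguar", "cat", "habitat"], ["jaguar", "cat", "prey"]]
other_docs = [["stock", "market", "price"], ["engine", "car", "jaguar"]]
collection_docs = focused_docs + other_docs
collection_model = term_distribution(
    [term for doc in collection_docs for term in doc]
)

print(clarity(focused_docs, collection_model))
print(clarity(collection_docs, collection_model))
```

A topically focused result list diverges from the collection model and scores well above zero, whereas a result list that looks like the collection as a whole scores close to zero.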
The utility of query effectiveness prediction methods is commonly evaluated by reporting correlation coefficients, such as Kendall's Tau. What remains largely unexplored, though, is the relationship between this evaluation methodology and the change in effectiveness of retrieval systems that employ a predictor. We investigate this question by examining how the observed quality of predictors (with respect to Kendall's Tau) affects the retrieval effectiveness in two adaptive system settings: selective query expansion and meta-search.
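The evaluation step mentioned above can be sketched as follows: Kendall's Tau is computed between a predictor's scores and the measured effectiveness of the same queries. This is a minimal sketch of the tau-a variant, which ignores ties in the denominator; the function name `kendall_tau` and the per-query values are hypothetical and not taken from the thesis.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists.

    Counts concordant and discordant pairs; tied pairs count as
    neither, and no tie correction is applied (an assumption).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-query values, made up for illustration only.
predicted_scores = [0.9, 0.4, 0.7, 0.1]
average_precision = [0.62, 0.30, 0.25, 0.05]

tau = kendall_tau(predicted_scores, average_precision)
print(round(tau, 3))
```

In this toy example five of the six query pairs are ordered the same way by the predictor and by average precision, and one pair is ordered differently, giving a tau of 4/6.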
The last part of the thesis is concerned with the task of estimating the ranking of retrieval systems according to their retrieval effectiveness without relying on costly relevance judgments. Five different system ranking estimation approaches are evaluated on a wide range of data sets which cover a variety of retrieval tasks and test collections. It is shown that under certain conditions, automatic methods yield a highly accurate ranking of systems.
Available online at http://www.cs.utwente.nl/~hauffc/phd/thesis.pdf.
Index Terms
- Predicting the effectiveness of queries and retrieval systems