ABSTRACT
Evaluation of information retrieval systems is one of the core tasks in information retrieval. Problems include the inability to exhaustively label all documents for a topic, generalizability from a small number of topics, and incorporating the variability of retrieval systems. Previous work addresses the evaluation of systems, the ranking of queries by difficulty, and the ranking of individual retrievals by performance. Approaches exist for the case of few and even no relevance judgments. Our focus is on zero-judgment performance prediction of individual retrievals. One common shortcoming of previous techniques is the assumption of uncorrelated document scores and judgments. If documents are embedded in a high-dimensional space (as they often are), we can apply techniques from spatial data analysis to detect correlations between document scores. We find that low correlation between the scores of topically close documents often implies poor retrieval performance. When compared to a state-of-the-art baseline, we demonstrate that the spatial analysis of retrieval scores provides significantly better prediction performance. These new predictors can also be incorporated with classic predictors to improve performance further. We also describe the first large-scale experiment to evaluate zero-judgment performance prediction for a massive number of retrieval systems over a variety of collections in several languages.
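The core idea above can be illustrated with a Moran's-I-style spatial autocorrelation statistic: treat inter-document similarities as spatial weights and measure how well each document's retrieval score correlates with the similarity-weighted average of its neighbors' scores. The sketch below is a minimal illustration under that assumption, not the paper's exact formulation; the function name `morans_i` and the use of a precomputed similarity matrix are choices made here for clarity.

```python
import numpy as np

def morans_i(scores, sim):
    """Moran's-I-style spatial autocorrelation of retrieval scores.

    scores: (n,) array of retrieval scores for top-ranked documents.
    sim:    (n, n) array of pairwise document similarities
            (e.g. cosine similarity of term vectors).
    Returns a value near +1 when similar documents receive similar
    scores, and a negative value when neighbors' scores disagree.
    """
    W = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(W, 0.0)          # a document is not its own neighbor
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    z = scores - scores.mean()        # center the scores
    denom = float(z @ z)
    if denom == 0.0:
        return 0.0                    # constant scores: no variation to correlate
    n = len(scores)
    # (n / S0) * (z' W z) / (z' z), the standard Moran coefficient
    return float(n / W.sum() * (z @ W @ z) / denom)
```

On a toy chain of four documents where only adjacent documents are similar, smoothly decreasing scores yield a positive coefficient, while alternating scores yield a negative one; the zero-judgment predictor in the abstract flags the latter pattern as a likely poor retrieval.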
Index Terms
- Performance prediction using spatial autocorrelation