Article

The maximum entropy method for analyzing retrieval measures

Authors:
Javed A. Aslam, Northeastern University, Boston, MA
Emine Yilmaz, Northeastern University, Boston, MA
Virgiliu Pavlu, Northeastern University, Boston, MA

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 2005, Pages 27–34. https://doi.org/10.1145/1076034.1076042

Published: 15 August 2005

ABSTRACT

We present a model, based on the maximum entropy method, for analyzing various measures of retrieval performance such as average precision, R-precision, and precision-at-cutoffs. Our methodology treats the value of such a measure as a constraint on the distribution of relevant documents in an unknown list, and the maximum entropy distribution can be determined subject to these constraints. For good measures of overall performance (such as average precision), the resulting maximum entropy distributions are highly correlated with actual distributions of relevant documents in lists as demonstrated through TREC data; for poor measures of overall performance, the correlation is weaker. As such, the maximum entropy method can be used to quantify the overall quality of a retrieval measure. Furthermore, for good measures of overall performance (such as average precision), we show that the corresponding maximum entropy distributions can be used to accurately infer precision-recall curves and the values of other measures of performance, and we demonstrate that the quality of these inferences far exceeds that predicted by simple retrieval measure correlation, as demonstrated through TREC data.
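As a minimal illustration of treating a measure's value as a constraint, the sketch below computes the three measures named in the abstract from a binary relevance list, and then builds the maximum entropy distribution for the simplest case, a single precision-at-cutoff constraint. It is not the authors' implementation. It assumes that relevance at each rank is modeled as an independent Bernoulli probability and that the total number of relevant documents R is known; under those assumptions the maximizer is uniform within the top k ranks and uniform over the remaining ranks. All function names are hypothetical.

```python
import math


def precision_at(rels, k):
    """Fraction of the top-k documents that are relevant (rels is a 0/1 list)."""
    return sum(rels[:k]) / k


def r_precision(rels, R):
    """Precision at rank R, where R is the total number of relevant documents."""
    return precision_at(rels, R)


def average_precision(rels, R):
    """Sum of the precisions at each relevant rank, divided by R."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / R


def maxent_given_precision_at(N, R, k, p_at_k):
    """Per-rank relevance probabilities p_1..p_N maximizing the sum of
    Bernoulli entropies subject to two linear constraints (an assumed setup):
        sum(p)      == R            (expected number of relevant documents)
        sum(p[:k])  == k * p_at_k   (expected precision at cutoff k)
    Because the objective is concave and symmetric within each block of
    ranks, the maximizer is constant on ranks 1..k and on ranks k+1..N.
    """
    if not 0 < k < N:
        raise ValueError("cutoff k must satisfy 0 < k < N")
    head = p_at_k
    tail = (R - k * p_at_k) / (N - k)
    if not (0.0 <= head <= 1.0 and 0.0 <= tail <= 1.0):
        raise ValueError("constraints are infeasible for these values")
    return [head] * k + [tail] * (N - k)


def bernoulli_entropy(probs):
    """Sum of binary entropies (in bits) of independent Bernoulli variables."""
    total = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:
                total -= q * math.log2(q)
    return total


if __name__ == "__main__":
    rels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # made-up ranked list, 1 = relevant
    R = 4
    print(precision_at(rels, 5), r_precision(rels, R), average_precision(rels, R))
    probs = maxent_given_precision_at(len(rels), R, 5, precision_at(rels, 5))
    print(probs, bernoulli_entropy(probs))
```

A constraint on average precision is not linear in the per-rank probabilities, so it has no such closed form and would call for a numerical optimizer instead.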

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results

Published in
SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
General Chairs: Ricardo Baeza-Yates (University of Chile, Chile), Nivio Ziviani (Federal University of Minas Gerais, Brazil)
Program Chairs: Gary Marchionini (University of North Carolina, USA), Alistair Moffat (University of Melbourne, Australia), John Tait (University of Sunderland, UK)
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
average precision
evaluation
maximum entropy
modeling
precision-recall curve
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%