Abstract
We propose and evaluate a family of measures, the eXtended Cumulated Gain (XCG) measures, for the evaluation of content-oriented XML retrieval approaches. Our aim is to provide an evaluation framework that can account for dependency among XML document components. In particular, two aspects of dependency are considered: (1) near-misses, which are document components structurally related to relevant components, such as a neighboring paragraph or a container section, and (2) overlap, which arises when the same text fragment is retrieved multiple times, for example, when a paragraph and its container section are both returned. A further requirement is that the measures be flexible enough that different models of user behavior can be instantiated within them. Both system- and user-oriented aspects are investigated, and both recall- and precision-like qualities are measured. We evaluate the reliability of the proposed measures on the INEX 2004 test collection: the effects of assessment variation and topic set size on evaluation stability are investigated, and the upper and lower bounds of expected error rates are established. The evaluation demonstrates that the XCG measures are stable and reliable and, in particular, that the novel measures of effort-precision and gain-recall (ep/gr) behave comparably to established IR measures such as precision and recall.
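The XCG measures extend the cumulated-gain framework of Järvelin and Kekäläinen, in which graded relevance scores are summed down the ranking and normalized against an ideal ordering. As a minimal illustration of that underlying idea (not the XCG measures themselves, which additionally model near-misses and overlap), the following sketch computes normalized cumulated gain over a ranked list of gain values; the gain values are assumed to come from graded relevance assessments:

```python
# Minimal sketch of normalized cumulated gain (nxCG), the classic
# cumulated-gain idea that the XCG measures build on. Gain values per
# rank are assumed to be derived from graded relevance assessments.

def cumulated_gain(gains):
    """Running sum of gain values over the ranking."""
    total, cg = 0.0, []
    for g in gains:
        total += g
        cg.append(total)
    return cg

def normalized_cg(gains, ideal_gains):
    """nxCG at each rank: cumulated gain divided by the ideal cumulated gain,
    i.e. the gain obtainable from a perfect ordering of the same components."""
    cg = cumulated_gain(gains)
    icg = cumulated_gain(ideal_gains)
    return [c / i for c, i in zip(cg, icg)]

# Example: a system ranking's gains vs. the ideal (descending) ordering
system = [1.0, 0.0, 0.5, 1.0]
ideal = sorted(system, reverse=True)   # [1.0, 1.0, 0.5, 0.0]
print(normalized_cg(system, ideal))    # [1.0, 0.5, 0.6, 1.0]
```

A score of 1.0 at a given rank means the system has accumulated as much gain as an ideal ranking would have by that point; the full XCG framework further adjusts the gain values so that overlapping components and structural near-misses are credited consistently.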