
2022 | Original Paper | Chapter

Streamlining Evaluation with ir-measures

Authors: Sean MacAvaney, Craig Macdonald, Iadh Ounis

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval. Rather than implementing its own measure calculations, ir-measures provides a common interface to a handful of evaluation tools. The necessary tools are automatically invoked (potentially multiple times) to calculate all the desired metrics, simplifying the evaluation process for the user. The tool also makes it easier for researchers to use recently proposed measures (such as those from the C/W/L framework) alongside traditional measures, potentially encouraging their adoption.
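For illustration, here is a minimal sketch of the package's Python interface, based on its public documentation at the time of writing (the qrels and run file names are placeholders):

    import ir_measures
    from ir_measures import nDCG, AP, RR

    # Load relevance judgments and a system ranking in TREC format.
    qrels = ir_measures.read_trec_qrels('qrels.txt')
    run = ir_measures.read_trec_run('run.txt')

    # Request several measures at once; ir-measures dispatches each measure
    # to an appropriate backend tool and aggregates the per-query results.
    results = ir_measures.calc_aggregate([nDCG@10, AP, RR@10], qrels, run)
    print(results)  # a dict mapping each measure object to its mean value

Per-query values can be obtained analogously via ir_measures.iter_calc, which yields one value per query-measure pair.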


Literature
1. Azzopardi, L., Mackenzie, J., Moffat, A.: ERR is not C/W/L: exploring the relationship between expected reciprocal rank and other metrics. In: ICTIR (2021)
2. Azzopardi, L., Thomas, P., Moffat, A.: cwl_eval: an evaluation tool for information retrieval. In: SIGIR (2019)
3. Bajaj, P., et al.: MS MARCO: a human generated machine reading comprehension dataset. In: CoCo@NIPS (2016)
4. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: SIGIR (2004)
5. Buckley, C., Voorhees, E.M.: Retrieval System Evaluation. MIT Press, Cambridge (2005)
6. Chapelle, O., Metlzer, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: CIKM (2009)
7. Clarke, C.L.A., et al.: Novelty and diversity in information retrieval evaluation. In: SIGIR (2008)
8. Clarke, C.L.A., Kolla, M., Vechtomova, O.: An effectiveness measure for ambiguous and underspecified queries. In: ICTIR (2009)
9. Clarke, C.L.A., Vtyurina, A., Smucker, M.D.: Assessing top-k preferences. TOIS 39(3), 1–21 (2021)
10. Craswell, N., Mitra, B., Yilmaz, E., Campos, D., Voorhees, E.: Overview of the TREC 2019 deep learning track. In: TREC (2019)
11. Fuhr, N.: Some common mistakes in IR evaluation, and how they can be avoided. SIGIR Forum 51, 32–41 (2018)
12. Harman, D.: Evaluation issues in information retrieval. IPM 28(4), 439–440 (1992)
13. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. TOIS 20(4), 422–446 (2002)
14. Jose, K.M., Nguyen, T., MacAvaney, S., Dalton, J., Yates, A.: DiffIR: exploring differences in ranking models' behavior. In: SIGIR (2021)
15. Kantor, P., Voorhees, E.: The TREC-5 confusion track. Inf. Retr. 2(2–3), 165–176 (2000)
16. Lin, J., et al.: Supporting interoperability between open-source search engines with the common index file format. In: SIGIR (2020)
17. Lucchese, C., Muntean, C.I., Nardini, F.M., Perego, R., Trani, S.: RankEval: an evaluation and analysis framework for learning-to-rank solutions. In: SIGIR (2017)
18. MacAvaney, S.: OpenNIR: a complete neural ad-hoc ranking pipeline. In: WSDM (2020)
19. MacAvaney, S., Yates, A., Feldman, S., Downey, D., Cohan, A., Goharian, N.: Simplified data wrangling with ir_datasets. In: SIGIR (2021)
20. Macdonald, C., Tonellotto, N.: Declarative experimentation in information retrieval using PyTerrier. In: ICTIR (2020)
21. Moffat, A., Bailey, P., Scholer, F., Thomas, P.: INST: an adaptive metric for information retrieval evaluation. In: Australasian Document Computing Symposium (2015)
22. Moffat, A., Bailey, P., Scholer, F., Thomas, P.: Incorporating user expectations and behavior into the measurement of search effectiveness. TOIS 35(3), 1–38 (2017)
23. Moffat, A., Scholer, F., Thomas, P.: Models and metrics: IR evaluation as a user process. In: Australasian Document Computing Symposium (2012)
25. Palotti, J., Scells, H., Zuccon, G.: TrecTools: an open-source Python library for information retrieval practitioners involved in TREC-like campaigns. In: SIGIR (2019)
26. Piwowarski, B.: Experimaestro and datamaestro: experiment and dataset managers (for IR). In: SIGIR (2020)
27. Sakai, T.: On Fuhr's guideline for IR evaluation. SIGIR Forum 54, 1–8 (2020)
28. Van Gysel, C., de Rijke, M.: pytrec_eval: an extremely fast Python interface to trec_eval. In: SIGIR (2018)
29. Van Rijsbergen, C.J.: Information Retrieval (1979)
30. Voorhees, E., et al.: TREC-COVID: constructing a pandemic information retrieval test collection. arXiv abs/2005.04474 (2020)
31. Yilmaz, E., Aslam, J.A.: Estimating average precision with incomplete and imperfect judgments. In: CIKM (2006)
32. Zhang, F., Liu, Y., Li, X., Zhang, M., Xu, Y., Ma, S.: Evaluating web search with a bejeweled player model. In: SIGIR (2017)
Metadata

Title: Streamlining Evaluation with ir-measures
Authors: Sean MacAvaney, Craig Macdonald, Iadh Ounis
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-030-99739-7_38