
2020 | OriginalPaper | Chapter

Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations

Authors: Tetsuya Sakai, Peng Xiao

Published in: Information Retrieval Technology

Publisher: Springer International Publishing


Abstract

The present study concerns depth-k pooling for building IR test collections. At TREC, pooled documents are traditionally presented in random order to the assessors to avoid judgement bias. In contrast, an approach that has been used widely at NTCIR is to prioritise the pooled documents based on “pseudorelevance,” in the hope of letting assessors quickly form an idea as to what constitutes a relevant document and thereby judge more efficiently and reliably. While the recent TREC 2017 Common Core Track went beyond depth-k pooling and adopted a method for selecting documents to judge dynamically, even this task let the assessors process the usual depth-10 pools first: the idea was to give the assessors a “burn-in” period, which actually appears to echo the view of the NTCIR approach. Our research questions are: (1) Which depth-k ordering strategy enables more efficient assessments? Randomisation, or prioritisation by pseudorelevance? (2) Similarly, which of the two strategies enables higher inter-assessor agreements? Our experiments based on two English web search test collections with multiple sets of graded relevance assessments suggest that randomisation outperforms prioritisation in both respects on average, although the results are statistically inconclusive. We then discuss a plan for a much larger experiment with sufficient statistical power to obtain the final verdict.
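The two presentation strategies contrasted in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: here "pseudorelevance" is approximated by the number of runs that retrieve a document within depth k (one common choice; the NTCIR prioritisation used in the study may differ in detail).

```python
import random

def depth_k_pool(runs, k):
    """Depth-k pooling: the union of the top-k documents from each run."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:k])
    return pool

def randomised_order(pool, seed=0):
    """TREC-style presentation: pooled documents shuffled into random order."""
    docs = sorted(pool)  # fix iteration order before shuffling, for reproducibility
    random.Random(seed).shuffle(docs)
    return docs

def prioritised_order(pool, runs, k):
    """NTCIR-style presentation: pooled documents sorted by a pseudorelevance
    score; here, the count of runs that retrieved the document within depth k
    (ties broken by document ID)."""
    votes = {doc: sum(doc in ranking[:k] for ranking in runs) for doc in pool}
    return sorted(pool, key=lambda d: (-votes[d], d))

# Three toy runs over a tiny document collection.
runs = [["d1", "d2", "d3"], ["d2", "d4", "d1"], ["d5", "d2", "d6"]]
pool = depth_k_pool(runs, k=2)
print(prioritised_order(pool, runs, k=2))  # "d2" first: retrieved by all three runs
```

Under the prioritised ordering, an assessor sees the most widely retrieved documents first, which is the intuition behind letting assessors "burn in" on likely-relevant material; the randomised ordering deliberately discards that signal to avoid ordering bias.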


Footnotes
3. The "sort by document number" advice from TREC should not be taken literally: if the publication date is embedded in the document identifier, sorting by document ID would mean sorting by time, which is not what we want. Similarly, if the target document collection consists of multiple subcollections and the document IDs carry different prefixes accordingly, such a sort would cluster documents by source (see [5]), which again is not what we want. Throughout this study, we interpret the advice from TREC as "randomise".

6. Although it is debatable whether making fewer judgement corrections is better, it does imply higher efficiency.

7. We refrain from treating the official assessments as the gold data: we argue that they are also just one version of qrels.

8. Microsoft version of normalised discounted cumulative gain, cutoff-version of Q-measure, and normalised expected reciprocal rank, respectively [13].

9. "It is astonishing how many papers report work in which a slight effect is investigated with a small number of trials. Given that such investigations would generally fail even if the hypothesis was correct, it seems likely that many interesting research questions are unnecessarily discarded." [22, p. 225]
 
Literature
1. Allan, J., Carterette, B., Aslam, J.A., Pavlu, V., Dachev, B., Kanoulas, E.: Million query track 2007 overview (2008)
2. Allan, J., Harman, D., Kanoulas, E., Li, D., Van Gysel, C., Voorhees, E.: TREC common core track overview. In: Proceedings of TREC 2017 (2018)
3. Carterette, B., Pavlu, V., Fang, H., Kanoulas, E.: Million query track 2009 overview. In: Proceedings of TREC 2009 (2010)
4. Cormack, G.V., Palmer, C.R., Clarke, C.L.: Efficient construction of large test collections. In: Proceedings of ACM SIGIR 1998, pp. 282–289 (1998)
5. Damessie, T.T., Culpepper, J.S., Kim, J., Scholer, F.: Presentation ordering effects on assessor agreement. In: Proceedings of ACM CIKM 2018, pp. 723–732 (2018)
6. Eisenberg, M., Barry, C.: Order effects: a study of the possible influence of presentation order on user judgments of document relevance. J. Am. Soc. Inf. Sci. 39(5), 293–300 (1988)
7. Harlow, L.L., Mulaik, S.A., Steiger, J.H.: What If There Were No Significance Tests? (Classic Edition). Routledge, London (2016)
8. Harman, D.K.: The TREC test collections. In: Voorhees, E.M., Harman, D.K. (eds.) TREC: Experiment and Evaluation in Information Retrieval (Chapter 2). The MIT Press, Cambridge (2005)
9. Huang, M.H., Wang, H.Y.: The influence of document presentation order and number of documents judged on users' judgments of relevance. J. Am. Soc. Inf. Sci. 55(11), 970–979 (2004)
11. Losada, D.E., Parapar, J., Barreiro, Á.: Multi-armed bandits for ordering judgements in pooling-based evaluation. Inf. Process. Manag. 53(3), 1005–1025 (2017)
12. Losada, D.E., Parapar, J., Barreiro, Á.: When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections. J. Assoc. Inf. Sci. Technol. 70(1), 49–60 (2018)
13. Luo, C., Sakai, T., Liu, Y., Dou, Z., Xiong, C., Xu, J.: Overview of the NTCIR-13 we want web task. In: Proceedings of NTCIR-13, pp. 394–401 (2017)
14. Mao, J., Sakai, T., Luo, C., Xiao, P., Liu, Y., Dou, Z.: Overview of the NTCIR-14 we want web task. In: Proceedings of NTCIR-14 (2019)
15. Rosenthal, R.: The "file drawer problem" and tolerance for null results. Psychol. Bull. 86(3), 638–641 (1979)
16. Sakai, T.: Statistical significance, power, and sample sizes: a systematic review of SIGIR and TOIS, 2006–2015. In: Proceedings of ACM SIGIR 2016, pp. 5–14 (2016)
19. Sakai, T., et al.: Overview of the NTCIR-7 ACLIA IR4QA task. In: Proceedings of NTCIR-7, pp. 77–114 (2008)
21. Zobel, J.: How reliable are the results of large-scale information retrieval experiments? In: Proceedings of ACM SIGIR 1998, pp. 307–314 (1998)
Metadata
Title
Randomised vs. Prioritised Pools for Relevance Assessments: Sample Size Considerations
Authors
Tetsuya Sakai
Peng Xiao
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-42835-8_9