
2022 | OriginalPaper | Book chapter

Less is Less: When are Snippets Insufficient for Human vs Machine Relevance Estimation?

Authors: Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

Traditional information retrieval (IR) ranking models process the full text of documents. Newer Transformer-based models, however, incur a high computational cost on long texts, so they typically use only snippets from the document instead. The model’s input, built from a document’s URL, title, and snippet (UTS), is akin to the summaries shown on a search engine results page (SERP) to help searchers decide which result to click. This raises two questions: when are such summaries sufficient for relevance estimation by the ranking model or the human assessor, and do humans and machines benefit from the document’s full text in similar ways? To answer these questions, we study human and neural-model-based relevance assessments on 12k query-document pairs sampled from Bing’s search logs. We compare how the relevance assessments change between two conditions, one in which only the document summaries are exposed to assessors and one in which the full text is exposed as well, across a range of query and document properties, e.g., query type and snippet length. Our findings show that the full text benefits humans and a BERT model for similar query and document types, e.g., tail and long queries. A closer look, however, reveals that humans and machines respond to the additional input in very different ways. Adding the full text can also hurt the ranker’s performance, e.g., for navigational queries.
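The comparison described in the abstract can be illustrated with a minimal sketch, which is not the authors' analysis code. It assumes each assessment record carries a reference label (`gold`), a label assigned from the UTS summary alone (`uts_label`), a label assigned with the full text also available (`full_label`), and a query property used for grouping (`segment`). All of these field names and the toy values are hypothetical; the values are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def accuracy_delta_by_segment(records):
    """Per segment, return (accuracy with UTS only,
    accuracy with UTS + full text, difference between the two)."""
    # counts per segment: [number of records, correct with UTS, correct with full text]
    totals = defaultdict(lambda: [0, 0, 0])
    for r in records:
        counts = totals[r["segment"]]
        counts[0] += 1
        counts[1] += int(r["uts_label"] == r["gold"])
        counts[2] += int(r["full_label"] == r["gold"])

    result = {}
    for segment, (n, correct_uts, correct_full) in totals.items():
        result[segment] = (
            correct_uts / n,
            correct_full / n,
            (correct_full - correct_uts) / n,
        )
    return result


# Invented toy records (assumption: binary relevance labels).
toy_records = [
    {"segment": "navigational", "gold": 1, "uts_label": 1, "full_label": 0},
    {"segment": "navigational", "gold": 1, "uts_label": 1, "full_label": 1},
    {"segment": "tail", "gold": 1, "uts_label": 0, "full_label": 1},
    {"segment": "tail", "gold": 0, "uts_label": 0, "full_label": 0},
]

result = accuracy_delta_by_segment(toy_records)
for segment, (acc_uts, acc_full, delta) in sorted(result.items()):
    print(f"{segment}: UTS={acc_uts:.2f} full={acc_full:.2f} delta={delta:+.2f}")
```

A positive delta for a segment means the full text helped the assessor (human or model) in that segment, and a negative delta means it hurt. The same function can be run separately on human and model labels to compare how each responds to the extra input.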


Metadata
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_18