2022 | Original Paper | Book Chapter

Establishing Strong Baselines For TripClick Health Retrieval

Authors: Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the training data, which was originally too noisy, with a simple negative sampling policy, and with it achieve large gains over BM25 in the re-ranking task of TripClick that the original baselines did not reach. Furthermore, we study the impact of different domain-specific pre-trained models on TripClick. Finally, we show that dense retrieval outperforms BM25 by considerable margins, even with simple training procedures.
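
The negative sampling policy mentioned in the abstract is easiest to grasp with a concrete sketch. The exact policy is described in the paper itself; the Python fragment below only illustrates the general idea, under the assumption that, for each query, negatives are drawn from the BM25 candidate pool with every clicked passage excluded, so that relevant documents never land on the negative side of a training triple. All names (sample_training_triples, bm25_top_k, clicks) are illustrative, not taken from the paper.

    import random

    def sample_training_triples(queries, bm25_top_k, clicks, num_negatives=1, seed=42):
        """Sketch of a simple negative sampling policy for click-log training data:
        pair each clicked (positive) passage with negatives drawn from the BM25
        candidate pool, excluding all clicked passages for that query."""
        rng = random.Random(seed)
        triples = []
        for qid, query in queries.items():
            positives = clicks.get(qid, set())
            # candidate negatives: retrieved by BM25 but never clicked for this query
            pool = [doc for doc in bm25_top_k.get(qid, []) if doc not in positives]
            if not positives or not pool:
                continue
            for pos in positives:
                for neg in rng.sample(pool, min(num_negatives, len(pool))):
                    triples.append((query, pos, neg))
        return triples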
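
"Dense retrieval" in the abstract refers to the standard bi-encoder setup: queries and passages are encoded independently into fixed-size vectors and scored by a dot product, in contrast to BM25's exact term matching. The numpy sketch below shows only the scoring step and assumes the embeddings were already produced by some trained encoder; it is a generic illustration, not the paper's exact model.

    import numpy as np

    def dense_retrieve(query_vec, passage_matrix, k=10):
        """Score all passages against one query by dot product and return the
        indices and scores of the k best. query_vec has shape (d,), and
        passage_matrix has shape (num_passages, d); both come from a bi-encoder."""
        scores = passage_matrix @ query_vec       # one dot product per passage
        top = np.argsort(-scores)[:k]             # highest-scoring passages first
        return [(int(i), float(scores[i])) for i in top]

In practice the passage matrix is served from an approximate-nearest-neighbor index for speed; the exhaustive version above has the same semantics.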

Footnotes
1
The TripDatabase allows users to apply different ranking schemes, such as popularity, source quality, and pure relevance, as well as to filter results by facets. Unfortunately, this information is not available in the public dataset.
 
Metadata
Title
Establishing Strong Baselines For TripClick Health Retrieval
Authors
Sebastian Hofstätter
Sophia Althammer
Mete Sertkan
Allan Hanbury
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_17