Published in: Discover Computing 4/2022

06.08.2022

Highlighting exact matching via marking strategies for ad hoc document ranking with pretrained contextualized language models

Authors: Lila Boualili, Jose G. Moreno, Mohand Boughanem


Abstract

Pretrained language models (PLMs), exemplified by BERT, have proven remarkably effective for ad hoc ranking. Unlike pre-BERT models, which required specialized neural components to capture different aspects of query-document relevance, PLMs are based solely on transformers, in which attention is the only mechanism for extracting signals from term interactions. Thanks to the transformer's cross-match attention, BERT has been found to be an effective soft matching model. However, beyond semantic matching, exact matching remains an essential signal for assessing the relevance of a document to an information-seeking query. We hypothesize that BERT might benefit from explicit exact-match cues to better adapt to the relevance classification task. In this work, we explore strategies for integrating exact matching signals using marker tokens that highlight exact term matches between the query and the document. We find that this simple marking approach significantly improves over the common vanilla baseline. We empirically demonstrate the effectiveness of our approach through extensive experiments on three standard ad hoc benchmarks. Results show that the explicit exact-match cues conveyed by marker tokens help both BERT and ELECTRA variants achieve higher or at least comparable performance. Our findings indicate that traditional information retrieval cues such as exact matching remain valuable for large pretrained contextualized models such as BERT.
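
To make the marking strategy concrete, the following minimal sketch shows one plausible implementation: every document token whose lowercased, punctuation-stripped form exactly matches a query term is wrapped in a pair of marker tokens before the query-document pair is fed to the cross-encoder. The marker strings "[e]" and "[/e]", the whitespace tokenization, and the absence of stemming or stopword filtering are our illustrative assumptions, not necessarily the paper's exact recipe.

    import re

    def mark_exact_matches(query: str, document: str,
                           open_marker: str = "[e]",
                           close_marker: str = "[/e]") -> str:
        """Wrap document tokens that exactly match a query term in marker tokens."""
        # Collect the query's terms, lowercased for case-insensitive matching.
        query_terms = {t.lower() for t in re.findall(r"\w+", query)}
        marked = []
        for token in document.split():
            # Strip punctuation so "ranking." still matches the query term "ranking".
            core = re.sub(r"\W+", "", token).lower()
            if core and core in query_terms:
                marked.append(f"{open_marker} {token} {close_marker}")
            else:
                marked.append(token)
        return " ".join(marked)

    query = "exact matching for ad hoc ranking"
    document = "Exact term matching remains a strong relevance signal in ranking."
    print(mark_exact_matches(query, document))
    # -> [e] Exact [/e] term [e] matching [/e] remains a strong relevance
    #    signal in [e] ranking. [/e]

In a full reranking pipeline, the marked document would then be paired with the query in the usual cross-encoder input ([CLS] query [SEP] marked document [SEP]), and the marker strings would typically be registered as special tokens so that the subword tokenizer does not split them apart.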

Metadata
Title
Highlighting exact matching via marking strategies for ad hoc document ranking with pretrained contextualized language models
Authors
Lila Boualili
Jose G. Moreno
Mohand Boughanem
Publication date
06.08.2022
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 4/2022
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-022-09414-x
