2023 | OriginalPaper | Book chapter

CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval

Written by: Nam Hai Le, Thomas Gerald, Thibault Formal, Jian-Yun Nie, Benjamin Piwowarski, Laure Soulier

Published in: Advances in Information Retrieval

Publisher: Springer Nature Switzerland

Abstract

Conversational search is a difficult task as it aims at retrieving documents based not only on the current user query but also on the full conversation history. Most of the previous methods have focused on a multi-stage ranking approach relying on query reformulation, a critical intermediate step that might lead to a sub-optimal retrieval. Other approaches have tried to use a fully neural IR first-stage, but are either zero-shot or rely on full learning-to-rank based on a dataset with pseudo-labels. In this work, leveraging the CANARD dataset, we propose an innovative lightweight learning technique to train a first-stage ranker based on SPLADE. By relying on SPLADE sparse representations, we show that, when combined with a second-stage ranker based on T5Mono, the results are competitive on the TREC CAsT 2020 and 2021 tracks. The source code is available at https://github.com/nam685/cosplade.git.
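To make the idea of a sparse first-stage ranker concrete, the following is a minimal sketch, not the authors' implementation. It assumes the max-pooling formulation of SPLADE, in which each vocabulary term receives the weight max over tokens of log(1 + ReLU(logit)), and it scores documents by a sparse dot product. The toy logits, the four-term vocabulary, and the function names `splade_pool`, `sparse_dot`, and `first_stage` are hypothetical. The conversational contextualization that CoSPLADE adds is not shown, since the abstract does not detail it.

```python
import math


def splade_pool(token_logits):
    """Turn per-token vocabulary logits into one sparse term-weight vector.

    Assumption: max pooling of log(1 + ReLU(logit)) over the input tokens.
    Returns a dict {vocab_index: weight} holding only the non-zero weights.
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            w = math.log1p(max(logit, 0.0))  # log(1 + ReLU(logit))
            if w > weights[j]:
                weights[j] = w
    return {j: w for j, w in enumerate(weights) if w > 0.0}


def sparse_dot(query_vec, doc_vec):
    """Dot product restricted to the terms both sparse vectors share."""
    return sum(w * doc_vec[j] for j, w in query_vec.items() if j in doc_vec)


def first_stage(query_vec, doc_vecs, k):
    """Rank all documents by sparse dot product and keep the top k."""
    scored = [(doc_id, sparse_dot(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical logits over a 4-term vocabulary (rows are tokens).
query = splade_pool([[2.0, -1.0, 0.0, 0.5], [0.0, -3.0, 1.0, 0.0]])
docs = {
    "d1": splade_pool([[1.0, 0.0, 0.0, 0.0]]),
    "d2": splade_pool([[0.0, 1.0, 0.0, 0.0]]),
}
top = first_stage(query, docs, k=1)
```

In a two-stage setup such as the one the abstract describes, the candidates returned by `first_stage` would then be passed to a second-stage reranker; that step is omitted here because it requires a trained model.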

Note that for the second stage, we rely on weak labels since our model is similar to previous works. Given that the gap between first-stage and second-stage rankers continues to decrease, training a second-stage ranker might not be necessary in the future.

Selected by the organizer as the most relevant answer of a baseline system.

The weights can be found at https://huggingface.co/naver/splade-cocondenser-ensembledistil.

In the experiments, we also explore an alternative model where answers and queries are considered at once.

To improve coherence, we chose to make keywords follow their order of appearance in the context, but did not vary this experimental setting.

We used the Huggingface checkpoint https://huggingface.co/castorini/monot5-base-msmarco.

https://sites.google.com/view/qanta/projects/canard.

This might be due to the simple way to use past answers, i.e. Equation 4, but all the other variations we tried did not perform better.
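The note above about making keywords follow their order of appearance in the context can be illustrated with a short sketch. It is an assumption-laden illustration rather than the authors' code: the function name `order_by_appearance` is hypothetical, matching is done by case-insensitive substring search, and keywords that do not occur in the context are appended at the end in their input order, a choice the source does not specify.

```python
def order_by_appearance(keywords, context):
    """Sort keywords by the position of their first occurrence in the context.

    Assumptions: case-insensitive substring matching; keywords absent from
    the context are kept and placed last, in their original order.
    """
    lowered = context.lower()
    found, missing = [], []
    for kw in keywords:
        pos = lowered.find(kw.lower())
        if pos >= 0:
            found.append((pos, kw))
        else:
            missing.append(kw)
    found.sort(key=lambda pair: pair[0])  # earliest occurrence first
    return [kw for _, kw in found] + missing


# Hypothetical conversation context and keyword set.
context = "What are the symptoms of throat cancer? Is there a treatment?"
ordered = order_by_appearance(["treatment", "diet", "cancer", "symptoms"], context)
```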

Title: CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval
Written by: Nam Hai Le, Thomas Gerald, Thibault Formal, Jian-Yun Nie, Benjamin Piwowarski, Laure Soulier
Publisher: Springer Nature Switzerland
Book: Advances in Information Retrieval
Print ISBN: 978-3-031-28243-0

Electronic ISBN: 978-3-031-28244-7

Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-28244-7_34
