2021 | OriginalPaper | Chapter

Cross-Domain Retrieval in the Legal and Patent Domains: A Reproducibility Study

Authors: Sophia Althammer, Sebastian Hofstätter, Allan Hanbury

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

Domain-specific search has always been a challenging information retrieval task because of the domain-specific language, the unique task setting, and the lack of accessible queries and corresponding relevance judgements. In recent years, pretrained language models such as BERT have revolutionized web and news search. Naturally, the community aims to adapt these advancements to the cross-domain transfer of retrieval models for domain-specific search. In the context of legal document retrieval, Shao et al. propose the BERT-PLI framework, which models Paragraph-Level Interactions with the language model BERT. In this paper we reproduce the original experiments, clarify pre-processing steps, and add missing scripts for framework steps; however, we are not able to reproduce the evaluation results. Contrary to the original paper, we demonstrate that domain-specific paragraph-level modelling does not appear to help the performance of the BERT-PLI model compared to paragraph-level modelling with the original BERT. In addition to our legal search reproducibility study, we investigate BERT-PLI for document retrieval in the patent domain. We find that the BERT-PLI model does not yet achieve performance improvements for patent document retrieval compared to the BM25 baseline. Furthermore, we evaluate the individual components of the BERT-PLI model for cross-domain retrieval between the legal and patent domains, at both the paragraph level and the document level. We find that transferring the BERT-PLI model at the paragraph level yields comparable results in both domains, and we obtain first promising results for cross-domain transfer at the document level. For reproducibility and transparency, as well as to benefit the community, we make our source code and the trained models publicly available.
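To make the idea of paragraph-level interactions more concrete, the following is a minimal sketch and not the BERT-PLI implementation. It assumes that documents are split into paragraphs on blank lines, that every query paragraph is scored against every candidate paragraph, and that the strongest match per query paragraph is kept. The scorer `overlap_score` is a hypothetical lexical stand-in for a BERT paragraph-pair encoder, and the final averaging step is a simplification chosen for illustration; all function names are assumptions introduced here.

```python
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split a document into non-empty paragraphs on blank lines (assumed format)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def overlap_score(query_par: str, cand_par: str) -> float:
    """Hypothetical stand-in for a BERT pair encoder: Jaccard token overlap."""
    q = set(query_par.lower().split())
    c = set(cand_par.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def interaction_matrix(
    query_doc: str,
    cand_doc: str,
    score: Callable[[str, str], float] = overlap_score,
) -> List[List[float]]:
    """Score every (query paragraph, candidate paragraph) pair."""
    q_pars = split_paragraphs(query_doc)
    c_pars = split_paragraphs(cand_doc)
    return [[score(q, c) for c in c_pars] for q in q_pars]


def document_score(matrix: List[List[float]]) -> float:
    """Max-pool over candidate paragraphs, then average over query paragraphs.

    The averaging is a simplification for this sketch only.
    """
    if not matrix or not matrix[0]:
        return 0.0
    best_per_query_par = [max(row) for row in matrix]
    return sum(best_per_query_par) / len(best_per_query_par)


if __name__ == "__main__":
    query = "the court held the contract void\n\nthe appeal was dismissed"
    candidate = "the contract was void\n\nweather report for today"
    matrix = interaction_matrix(query, candidate)
    print(matrix)
    print(document_score(matrix))
```

Passing a different `score` callable, for example one backed by a fine-tuned language model, would leave the matrix and pooling logic unchanged, which is what makes the paragraph-level component easy to evaluate in isolation.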

checkpoint from https://github.com/google-research/bert.

https://github.com/huggingface/transformers.

https://github.com/castorini/pyserini.

https://github.com/ThuYShao/BERT-PLI-IJCAI2020.

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html.

Akkalyoncu Yilmaz, Z., Yang, W., Zhang, H., Lin, J.: Cross-domain modeling of sentence-level evidence for document retrieval. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3490–3496. Association for Computational Linguistics, Hong Kong, China, November 2019. https://doi.org/10.18653/v1/D19-1352. https://www.aclweb.org/anthology/D19-1352

Bhattacharya, P., et al.: FIRE 2019 AILA track: artificial intelligence for legal assistance. In: Proceedings of the 11th Forum for Information Retrieval Evaluation, FIRE 2019, pp. 4–6. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3368567.3368587

Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Association for Computational Linguistics, Doha, October 2014. https://doi.org/10.3115/v1/D14-1179. https://www.aclweb.org/anthology/D14-1179

Cormack, G., Grossman, M.: Autonomy and reliability of continuous active learning for technology-assisted review, April 2015

Cormack, G.V., Grossman, M.R.: Evaluation of machine-learning protocols for technology-assisted review in electronic discovery. In: Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, pp. 153–162. Association for Computing Machinery, New York (2014). https://doi.org/10.1145/2600428.2609601

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, June 2019. https://doi.org/10.18653/v1/N19-1423. https://www.aclweb.org/anthology/N19-1423

Gao, L., Dai, Z., Callan, J.: Modularized transfomer-based ranking framework. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (2020)

Hedin, B., Zaresefat, S., Baron, J., Oard, D.: Overview of the TREC 2009 legal track. In: The Eighteenth Text Retrieval Conference (TREC 2009) Proceedings, January 2009

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

Hofstätter, S., Hanbury, A.: Let’s measure run time! Extending the IR replicability infrastructure to include performance aspects. In: Proceedings of OSIRRC (2019)

Hofstätter, S., Zlabinger, M., Hanbury, A.: Interpretable & time-budget-constrained contextualization for re-ranking. In: Proceedings of ECAI (2020)

Lee, K., Chang, M.W., Toutanova, K.: Latent retrieval for weakly supervised open domain question answering. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6086–6096. Association for Computational Linguistics, Florence, July 2019. https://doi.org/10.18653/v1/P19-1612. https://www.aclweb.org/anthology/P19-1612

MacAvaney, S., Cohan, A., Goharian, N.: SLEDGE-Z: a zero-shot baseline for COVID-19 literature search. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4171–4179. Association for Computational Linguistics, November 2020. https://doi.org/10.18653/v1/2020.emnlp-main.341. https://www.aclweb.org/anthology/2020.emnlp-main.341

Piroi, F., Lupu, M., Hanbury, A.: Overview of CLEF-IP 2013 lab. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds.) CLEF 2013. LNCS, vol. 8138, pp. 232–249. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40802-1_25

Piroi, F., Lupu, M., Hanbury, A., Zenz, V.: CLEF-IP 2011: retrieval in the intellectual property domain, January 2011

Piroi, F., Tait, J.: CLEF-IP 2010: retrieval experiments in the intellectual property domain (2010)

Rabelo, J., Kim, M.-Y., Goebel, R., Yoshioka, M., Kano, Y., Satoh, K.: A summary of the COLIEE 2019 competition. In: Sakamoto, M., Okazaki, N., Mineshima, K., Satoh, K. (eds.) JSAI-isAI 2019. LNCS (LNAI), vol. 12331, pp. 34–49. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58790-1_3

Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019

Rossi, J., Kanoulas, E.: Legal information retrieval with generalized language models (2019)

Shao, Y., et al.: BERT-PLI: modeling paragraph-level interactions for legal case retrieval. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pp. 3501–3507. International Joint Conferences on Artificial Intelligence Organization, July 2020. Main track

Smucker, M.D., Allan, J., Carterette, B.: A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, CIKM 2007, pp. 623–632. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1321440.1321528

Tran, V., Nguyen, M.L., Satoh, K.: Building legal case retrieval systems with lexical matching and summarization using a pre-trained phrase scoring model. In: Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, ICAIL 2019, pp. 275–282. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3322640.3326740

Urbano, J., Lima, H., Hanjalic, A.: Statistical significance testing in information retrieval: an empirical analysis of Type I, Type II and Type III errors. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, pp. 505–514. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3331184.3331259

Xiong, C., et al.: CMT in TREC-COVID round 2: mitigating the generalization gaps from web to special domain search. In: ArXiv preprint (2020)

Yang, W., et al.: End-to-end open-domain question answering with BERTserini. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pp. 72–77. Association for Computational Linguistics, Minneapolis, Minnesota, June 2019. https://doi.org/10.18653/v1/N19-4013

Zhang, Y., Nie, P., Geng, X., Ramamurthy, A., Song, L., Jiang, D.: DC-BERT: decoupling question and document for efficient contextual encoding (2020)

Title: Cross-Domain Retrieval in the Legal and Patent Domains: A Reproducibility Study
Authors: Sophia Althammer, Sebastian Hofstätter, Allan Hanbury
Publisher: Springer International Publishing
Book: Advances in Information Retrieval
Print ISBN: 978-3-030-72239-5

Electronic ISBN: 978-3-030-72240-1

Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-72240-1_1
