Published in: Scientific and Technical Information Processing 5/2023

1 December 2023

Approaches to Cross-Language Retrieval of Similar Legal Documents Based on Machine Learning

Authors: V. V. Zhebel, D. A. Devyatkin, D. V. Zubarev, I. V. Sochenkov


Abstract

Studying global experience in legislative change and rule-making increasingly requires tools for retrieving regulatory documents written in different languages. One aspect of such retrieval is finding documents that are thematically similar to a given input document. In this context, the task of cross-lingual search arises: the user of an information system specifies a reference document in one language, and the search results contain relevant documents in other languages. The article describes different approaches to this problem, from classic mediator-based methods to more modern solutions based on distributional semantics. The test collection used in the study was taken from the United Nations Digital Library, which provides legal documents in the original English together with their Russian translations.
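As a rough illustration of the distributional-semantics side of this task, the sketch below ranks candidate documents in one language against a reference document in another by cosine similarity of their vectors. It is not the authors' method: it assumes that documents in both languages have already been mapped into a shared vector space by some cross-lingual embedding model, and the three-dimensional vectors, the document identifiers, and the function names `cosine` and `rank_similar` are all made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def rank_similar(query_vec, candidates, top_k=3):
    """Return up to top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for embeddings in a shared
# cross-lingual space; real embeddings would have hundreds of dimensions.
query_en = [0.9, 0.1, 0.0]  # reference document in English
russian_docs = {
    "ru_doc_a": [0.8, 0.2, 0.1],
    "ru_doc_b": [0.0, 0.1, 0.9],
    "ru_doc_c": [0.5, 0.5, 0.0],
}

ranking = rank_similar(query_en, russian_docs, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

With these toy vectors, `ru_doc_a` ranks first and `ru_doc_c` second. A mediator-based approach would instead translate the reference document or the collection into a common language and then apply monolingual retrieval.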


Metadata
Title
Approaches to Cross-Language Retrieval of Similar Legal Documents Based on Machine Learning
Authors
V. V. Zhebel
D. A. Devyatkin
D. V. Zubarev
I. V. Sochenkov
Publication date
1 December 2023
Publisher
Pleiades Publishing
Published in
Scientific and Technical Information Processing / Issue 5/2023
Print ISSN: 0147-6882
Electronic ISSN: 1934-8118
DOI
https://doi.org/10.3103/S0147688223050167
