
11 September 2024 | Original Research

System for the anonymization of Romanian jurisprudence

Authors: Vasile Păiş, Radu Ion, Elena Irimia, Verginica Barbu Mititelu, Valentin Badea, Dan Tufiș

Published in: Artificial Intelligence and Law




Publishing judicial decisions can improve both the transparency of the judicial process and the consistency of those decisions. Access to jurisprudence is of paramount importance both for legal professionals (judges, lawyers, law students) and for the wider public. However, public access must preserve the privacy of the people involved, in accordance with national and international regulations. This paper presents the work behind building an artificial intelligence system for the anonymization of Romanian jurisprudence, so that the anonymized decisions can be accessed through the ReJust portal operated by the Superior Council of Magistracy in Romania.


Publisher: Springer Netherlands
Print ISSN: 0924-8463
Electronic ISSN: 1572-8382