2022 | Original Paper | Book Chapter

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

Written by: Maud Ehrmann, Matteo Romanello, Antoine Doucet, Simon Clematide

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

We present the HIPE-2022 shared task on named entity processing in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, this edition confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. HIPE-2022 is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of the evaluation lab is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets.

Footnotes
3
Classical commentaries are scholarly publications dedicated to the in-depth analysis and explanation of ancient literary works. As such, they aim to facilitate the reading and understanding of a given literary text. More information on the HIPE-2022 classical commentaries corpus is given in Sect. 3.2.
 
7
The impresso [4] and SoNAR [12] guidelines were derived from the Quaero guidelines [16], while the NewsEye guidelines correspond to a subset of the impresso guidelines.
 
References
1.
Beryozkin, G., Drori, Y., Gilon, O., Hartman, T., Szpektor, I.: A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 140–150, Florence, Italy, July 2019. https://aclanthology.org/P19-1014
3.
Ehrmann, M., Colavizza, G., Rochat, Y., Kaplan, F.: Diachronic evaluation of NER systems on old newspapers. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), pp. 97–107, Bochum (2016). Bochumer Linguistische Arbeitsberichte. https://infoscience.epfl.ch/record/221391
4.
Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Impresso Named Entity Annotation Guidelines. Annotation guidelines, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Zurich University (UZH), January 2020. https://zenodo.org/record/3585750
6.
Ehrmann, M., Hamdi, A., Pontes, E.L., Romanello, M., Doucet, A.: Named Entity Recognition and Classification on Historical Documents: A Survey. arXiv:2109.11406 [cs], September 2021
8.
Hamdi, A., et al.: A multilingual dataset for named entity recognition, entity linking and stance detection in historical newspapers. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021, pp. 2328–2334, New York, NY, USA, July 2021. Association for Computing Machinery. ISBN 978-1-4503-8037-9. https://doi.org/10.1145/3404835.3463255
10.
Li, J., Chiu, B., Feng, S., Wang, H.: Few-shot named entity recognition via meta-learning. IEEE Trans. Knowl. Data Eng. 1 (2020)
11.
Li, J., Shang, S., Shao, L.: Metaner: Named entity recognition with meta-learning. In: Proceedings of The Web Conference 2020, WWW 2020, pp. 429–440, New York, NY, USA (2020). Association for Computing Machinery. ISBN 9781450370233. https://doi.org/10.1145/3366423.3380127
14.
Ridge, M., Colavizza, G., Brake, L., Ehrmann, M., Moreux, J.P., Prescott, A.: The past, present and future of digital scholarship with newspaper collections. In: DH 2019 Book of Abstracts, pp. 1–9, Utrecht, The Netherlands (2019). http://infoscience.epfl.ch/record/271329
15.
Romanello, M., Najem-Meyer, S., Robertson, B.: Optical character recognition of 19th century classical commentaries: the current state of affairs. In: The 6th International Workshop on Historical Document Imaging and Processing (HIP 2021), Lausanne, September 2021. Association for Computing Machinery. https://doi.org/10.1145/3476887.3476911
16.
Rosset, S., Grouin, C., Zweigenbaum, P.: Entités nommées structurées : Guide d’annotation Quaero. Technical Report 2011–04, LIMSI-CNRS, Orsay, France (2011)
Metadata
Title
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
Written by
Maud Ehrmann
Matteo Romanello
Antoine Doucet
Simon Clematide
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_44
