2022 | OriginalPaper | Chapter

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents

Authors: Maud Ehrmann, Matteo Romanello, Antoine Doucet, Simon Clematide

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing

Abstract

We present the HIPE-2022 shared task on named entity processing in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, this edition confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. HIPE-2022 is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of the evaluation lab is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets.

Footnotes
3
Classical commentaries are scholarly publications dedicated to the in-depth analysis and explanation of ancient literary works. As such, they aim to facilitate the reading and understanding of a given literary text. More information on the HIPE-2022 classical commentaries corpus can be found in Sect. 3.2.
 
7
Impresso [4] and SoNAR guidelines [12] were derived from Quaero guidelines [16], while NewsEye guidelines correspond to a subset of the impresso guidelines.
 
Literature
1.
Beryozkin, G., Drori, Y., Gilon, O., Hartman, T., Szpektor, I.: A joint named-entity recognizer for heterogeneous tag-sets using a tag hierarchy. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 140–150, Florence, Italy, July 2019. https://aclanthology.org/P19-1014
3.
Ehrmann, M., Colavizza, G., Rochat, Y., Kaplan, F.: Diachronic evaluation of NER systems on old newspapers. In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), pp. 97–107, Bochum (2016). Bochumer Linguistische Arbeitsberichte. https://infoscience.epfl.ch/record/221391
4.
Ehrmann, M., Romanello, M., Flückiger, A., Clematide, S.: Impresso Named Entity Annotation Guidelines. Annotation guidelines, Ecole Polytechnique Fédérale de Lausanne (EPFL) and Zurich University (UZH), January 2020. https://zenodo.org/record/3585750
6.
Ehrmann, M., Hamdi, A., Pontes, E.L., Romanello, M., Doucet, A.: Named Entity Recognition and Classification on Historical Documents: A Survey. arXiv:2109.11406 [cs], September 2021
8.
Hamdi, A., et al.: A multilingual dataset for named entity recognition, entity linking and stance detection in historical newspapers. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021, pp. 2328–2334, New York, NY, USA, July 2021. Association for Computing Machinery. ISBN 978-1-4503-8037-9. https://doi.org/10.1145/3404835.3463255
10.
Li, J., Chiu, B., Feng, S., Wang, H.: Few-shot named entity recognition via meta-learning. IEEE Trans. Knowl. Data Eng. 1 (2020)
11.
Li, J., Shang, S., Shao, L.: Metaner: Named entity recognition with meta-learning. In: Proceedings of The Web Conference 2020, WWW 2020, pp. 429–440, New York, NY, USA (2020). Association for Computing Machinery. ISBN 9781450370233. https://doi.org/10.1145/3366423.3380127
14.
Ridge, M., Colavizza, G., Brake, L., Ehrmann, M., Moreux, J.P., Prescott, A.: The past, present and future of digital scholarship with newspaper collections. In: DH 2019 Book of Abstracts, pp. 1–9, Utrecht, The Netherlands (2019). http://infoscience.epfl.ch/record/271329
15.
Romanello, M., Najem-Meyer, S., Robertson, B.: Optical character recognition of 19th century classical commentaries: the current state of affairs. In: The 6th International Workshop on Historical Document Imaging and Processing (HIP 2021), Lausanne, September 2021. Association for Computing Machinery. https://doi.org/10.1145/3476887.3476911
16.
Rosset, S., Grouin, C., Zweigenbaum, P.: Entités nommées structurées : Guide d’annotation Quaero. Technical Report 2011–04, LIMSI-CNRS, Orsay, France (2011)
Metadata
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_44