
2020 | OriginalPaper | Chapter

Extraction of a Knowledge Graph from French Cultural Heritage Documents

Authors: Erwan Marchand, Michel Gagnon, Amal Zouaq

Published in: ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium

Publisher: Springer International Publishing


Abstract

Cultural heritage in Quebec is often represented as collections of French documents that contain a lot of valuable, yet unstructured, data. One of the current aims of the Quebec Ministry of Culture and Communications (MCCQ) is to learn a knowledge graph from unstructured documents to offer an integrated semantic portal on Quebec’s cultural heritage. In the context of this project, we describe a machine learning and open information extraction approach that leverages named entity extraction and open relation extraction in English to extract a knowledge graph from French documents. We also enhance the generic entities that can be recognized in texts with domain-related types.
Our results show that our method leads to a substantial enrichment of the knowledge graph based on the initial corpus provided by the MCCQ.


Metadata

Title: Extraction of a Knowledge Graph from French Cultural Heritage Documents
Authors: Erwan Marchand, Michel Gagnon, Amal Zouaq
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-55814-7_2
