2020 | OriginalPaper | Book chapter

Extraction of a Knowledge Graph from French Cultural Heritage Documents

Written by: Erwan Marchand, Michel Gagnon, Amal Zouaq

Published in: ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium

Publisher: Springer International Publishing

Abstract

Cultural heritage in Quebec is often represented as collections of French documents that contain much valuable yet unstructured data. One of the current aims of the Quebec Ministry of Culture and Communications (MCCQ) is to learn a knowledge graph from unstructured documents in order to offer an integrated semantic portal on Quebec’s cultural heritage. In the context of this project, we describe a machine learning and open information extraction approach that leverages named entity extraction and open relation extraction in English to extract a knowledge graph from French documents. We also extend the generic entity types that can be recognized in texts with domain-related types. Our results show that our method substantially enriches the knowledge graph built from the initial corpus provided by the MCCQ.
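To make the idea of the abstract concrete, the following is a minimal sketch of how already-extracted relation triples and entity types could be assembled into a small knowledge graph. It is not the authors' implementation: the names `Triple` and `build_graph`, the type labels, and the example data are all assumptions introduced for illustration, and the extraction steps themselves (named entity recognition and open relation extraction) are taken as given inputs.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, relation, object) fact, as an open relation extractor might emit."""
    subject: str
    relation: str
    obj: str


def build_graph(triples, generic_types, domain_types):
    """Assemble a tiny knowledge graph from extracted triples.

    Hypothetical helper: a domain-related type, when available,
    takes precedence over the generic named-entity type.
    """
    nodes = {}
    edges = defaultdict(set)
    for t in triples:
        for entity in (t.subject, t.obj):
            # Prefer the domain-related type; fall back to the generic one.
            generic = generic_types.get(entity, "UNKNOWN")
            nodes[entity] = domain_types.get(entity, generic)
        edges[t.subject].add((t.relation, t.obj))
    return nodes, dict(edges)


# Illustrative data only; not taken from the MCCQ corpus.
triples = [
    Triple("Notre-Dame Basilica", "is located in", "Montreal"),
    Triple("Victor Bourgeau", "designed the interior of", "Notre-Dame Basilica"),
]
generic_types = {
    "Notre-Dame Basilica": "LOCATION",
    "Montreal": "LOCATION",
    "Victor Bourgeau": "PERSON",
}
# Assumed domain-related type labels.
domain_types = {
    "Notre-Dame Basilica": "HeritageBuilding",
    "Victor Bourgeau": "Architect",
}

nodes, edges = build_graph(triples, generic_types, domain_types)
```

Here `nodes` maps each entity to its most specific known type and `edges` maps each subject to its outgoing `(relation, object)` pairs. A real system would more likely serialize such a graph as RDF against an ontology rather than keep it in plain dictionaries.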

Metadata
Title
Extraction of a Knowledge Graph from French Cultural Heritage Documents
Written by
Erwan Marchand
Michel Gagnon
Amal Zouaq
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-55814-7_2
