Published in: Knowledge and Information Systems 1/2020

23.03.2019 | Regular Paper

Constructing biomedical domain-specific knowledge graph with minimum supervision

Authors: Jianbo Yuan, Zhiwei Jin, Han Guo, Hongxia Jin, Xianchao Zhang, Tristram Smith, Jiebo Luo

Abstract

A domain-specific knowledge graph is an effective way to represent complex domain knowledge in a structured format, and such graphs have shown great success in real-world applications. Most existing work on knowledge graph construction and completion shares a common limitation: it requires sufficient external resources, such as large-scale knowledge graphs and concept ontologies, as a starting point. However, such extensive domain-specific labeling is highly time-consuming and requires special expertise, especially in biomedical domains. Knowledge extraction from unstructured contexts with minimum supervision is therefore crucial in biomedical fields. In this paper, we propose a versatile approach for knowledge graph construction with minimum supervision, based on unstructured biomedical domain-specific contexts. The approach comprises entity recognition, unsupervised entity and relation embedding, latent relation generation via clustering, relation refinement, and relation assignment, which assigns cluster-level labels. Experimental results on 24,687 unstructured biomedical science abstracts show that the proposed framework can effectively extract 16,192 structured facts with high precision. Moreover, we demonstrate that the constructed knowledge graph is a sufficient resource for the tasks of knowledge graph completion and new knowledge inference from unseen contexts.
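
As a rough illustration of two of the steps named in the abstract, latent relation generation via clustering and cluster-level relation assignment, the following Python sketch clusters toy relation vectors and names each cluster from a single seed pair. It is not the authors' implementation. The 2-D entity vectors, the tail-minus-head relation representation, the plain k-means routine with fixed initial centroids, the seed labels, and every name in the code are assumptions made for illustration only.

```python
from math import dist

# Hypothetical 2-D entity embeddings (toy values, not learned from any corpus).
ENTITY_VECS = {
    "aspirin": (0.0, 0.0), "headache": (1.0, 0.1),
    "ibuprofen": (0.2, 0.0), "fever": (1.1, 0.0),
    "smoking": (0.0, 1.0), "lung cancer": (0.1, 2.0),
    "obesity": (0.5, 1.0), "diabetes": (0.5, 2.1),
}
# Candidate (head, tail) entity pairs, e.g. co-occurring in a sentence.
PAIRS = [("aspirin", "headache"), ("ibuprofen", "fever"),
         ("smoking", "lung cancer"), ("obesity", "diabetes")]


def relation_vec(head, tail):
    """Represent a candidate relation as tail minus head (an assumed choice)."""
    h, t = ENTITY_VECS[head], ENTITY_VECS[tail]
    return tuple(b - a for a, b in zip(h, t))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, centroids, iters=10):
    """Plain Lloyd's k-means with caller-supplied initial centroids."""
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return labels


# Latent relation generation: cluster the relation vectors into k = 2 groups.
vectors = [relation_vec(h, t) for h, t in PAIRS]
labels = kmeans(vectors, centroids=[vectors[0], vectors[2]])

# Cluster-level assignment: one hand-labeled seed pair names its whole cluster.
SEED_LABELS = {("aspirin", "headache"): "treats",
               ("smoking", "lung cancer"): "causes"}
cluster_name = {labels[PAIRS.index(pair)]: name
                for pair, name in SEED_LABELS.items()}

# Every pair inherits the label of the cluster it fell into.
facts = [(h, cluster_name[lab], t) for (h, t), lab in zip(PAIRS, labels)]
print(facts)
```

In this toy setup only two pairs carry a human label, and the remaining pairs inherit a relation name from their cluster, which is the sense in which cluster-level labeling keeps supervision small. A realistic pipeline would also need the entity recognition, embedding, and refinement steps that the sketch omits.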


Metadata
Title
Constructing biomedical domain-specific knowledge graph with minimum supervision
Authors
Jianbo Yuan
Zhiwei Jin
Han Guo
Hongxia Jin
Xianchao Zhang
Tristram Smith
Jiebo Luo
Publication date
23.03.2019
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-019-01351-4
