1 Background
1.1 Preliminaries
1.2 Method
1.2.1 Technical setup
1.2.2 Creating a document and context graph with basic context extraction
hasAffiliation, isAuthor, hasDocument, hasCitation (Attribute: provenance), isOfType.
Node type | Attributes |
---|---|
Author | Forename, surname |
Affiliation | Affiliation |
Document | DocumentId, title, collection, provenance, etc. |
Journal | Journal |
PublicationType | Identifier, type |
Entity | Source, identifier, preferredLabel, uri |
Unstructured | Value, uri |
BELFunction | |
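The node types and attributes in the table can be pictured as a small property-graph schema. The following is a minimal sketch in Python, assuming plain in-memory objects rather than a graph database. The class names `Node` and `Edge`, the validation logic, the lower-camel-case attribute spellings, and the sample author and title are illustrative assumptions; only the node types, the attribute names, and the `isAuthor` edge type come from the text.

```python
from dataclasses import dataclass, field

# Attribute names per node type, following the table. The table ends the
# Document row with "etc.", so Document is treated as open-ended here.
NODE_ATTRIBUTES = {
    "Author": {"forename", "surname"},
    "Affiliation": {"affiliation"},
    "Document": {"documentId", "title", "collection", "provenance"},
    "Journal": {"journal"},
    "PublicationType": {"identifier", "type"},
    "Entity": {"source", "identifier", "preferredLabel", "uri"},
    "Unstructured": {"value", "uri"},
    "BELFunction": set(),  # no attributes are listed in the table
}
OPEN_ENDED = {"Document"}  # assumption: further attributes are allowed


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject node types that are not part of the schema.
        if self.label not in NODE_ATTRIBUTES:
            raise ValueError(f"unknown node type: {self.label}")
        if self.label in OPEN_ENDED:
            return
        # Reject attributes the table does not list for this node type.
        unknown = set(self.properties) - NODE_ATTRIBUTES[self.label]
        if unknown:
            raise ValueError(
                f"unexpected attributes for {self.label}: {sorted(unknown)}"
            )


@dataclass
class Edge:
    source: Node
    edge_type: str
    target: Node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: one author linked to one document.
author = Node("Author", {"forename": "Jane", "surname": "Doe"})
doc = Node("Document", {"documentId": "PMID:16160056", "title": "Example title"})
is_author = Edge(author, "isAuthor", doc)
```

Keeping the schema in one dictionary makes it easy to see which attributes a loader would need to supply per node type; a real graph database would typically enforce this through constraints instead.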
1.2.3 Extending the knowledge graph using NLP-technologies
2 Results
2.1 Real-world use cases for testing
# | Query | Input example | Output |
---|---|---|---|
1 | Which author was the first to state that {Entity1} has an enhancing effect on {Entity2}? | APP, gamma Secretase Complex | Author and document title |
2 | Which genes {Entity1} play a role in two diseases {Entity2}? | Entity.source = HGNC, MESH | Subgraph of genes with 2 diseases |
3 | In which journal was it published that {Entity1} has an enhancing effect on {Entity2}? | APP, gamma Secretase Complex | Document and Journal |
4 | What is the shortest way between {Entity1} and {Entity2} and what is on that way? | Axonal transport, LRP3 | Path between nodes |
5 | Where was it published that {Entity1} has an enhancing effect on {Entity2} and what documents cite this? | APP, gamma Secretase Complex | List of publishing and citing documents |
6 | What are the most important entities in context of {Entity1} disease? | Alzheimer’s | Page Rank of neighboring entities |
7 | Which authors publish in the same journal on the topic {Entity1} and have not yet published together? | Alzheimer’s disease | List of author couples |
8 | Find a path of biological entities that connects {Entity1} with {Entity2} | Alzheimer’s disease, ACHE | Path of entities |
9 | Are there authors within the same affiliation who make contradictory statements regarding protein {Entity1} and protein {Entity2}? | Apoptotic process, SLC25A21 | Number of statements for both variants |
10 | Do the data in the literature correlate with the concomitant diseases for illness {Entity1}? That is, are the genes mentioned in {Entity1} documents also mentioned in documents on the concomitant disease {Entity2}? | Alzheimer’s, Down syndrome | Genes involved in both diseases in the literature |
11 | Does the function of a gene {Entity} differ in different contexts? | IL1B | List of all functions in contexts |
12 | How far apart are {document1} and {document2}? | PMID:16160056, PMID:16160050 | Shortest path between documents |
13 | Does the biological process on gene {Entity1} also exist in context of {Entity2}? And what author describes it? | APOE, brain | Outcome graph in context of the brain |
14 | Are there BEL statements that have no source and should therefore be checked? | – | List of relations |
15 | How many sources are there for the statements of a contradictory BEL statement? | hasRelation.function = increases, decreases | Number of sources for each of the cases |
16 | Is there also a relation between the documents describing the entities {Entity1} and {Entity2} that matches the relation in a BEL statement with the entities {Entity1} and {Entity2}? | APP, Alzheimer | Document pairs |
17 | Find the oldest document describing an entity {entity} | APP | Oldest Document |
18 | Is a reviewer {Author1} suitable for a proposal by the author {Author}, or is there a conflict of interest? Does the reviewer have relationships with the author in the form of joint work or a shared affiliation? | Ulrich Rothe, A. Castillo | Potential Graph between the authors |
19 | On which topics does the author {Author} write most? | Ulrich Rothe | List of the most frequent annotations |
20 | In which other journals could the author {Author} write with his main topics? Which journal in which he has not yet published would suit him from his main topics? | Ulrich Rothe | List of journals that could fit him |
21 | Which Affiliation has the most publications on the topic {Entity} in the Journal {Journal}? | D008358, Biotechnology letters | Affiliation with the highest number of publications |
22 | From when is the document cited in documents dealing with the subject {Entity}? | D017629 | Publication date of cited document |
23 | Which document is the most cited paper in connection with {Entity}, of papers that also annotate {Entity}? Determined by PageRank. | D017629 | Most cited paper-type document |
24 | Which entities have many relations with {Entity}? Determined by Community Detection. | APP | Surrounding community graph |
25 | Which author connects the two subject areas {Entity1} and {Entity2} most strongly? | Alzheimer Disease, Parkinson | Author with highest betweenness centrality |
26 | Which gene {Entity} is the most important? | Entity.source = HGNC | Entity with highest degree centrality |
27 | Are there strongly connected components between the entities? | – | Assignment of the entities to cliques |
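Several of the queries in the table are path queries (for instance Queries 4, 8 and 12). As a rough illustration of what such a query computes, the sketch below runs a breadth-first search over an in-memory adjacency list. The function name `shortest_path`, the toy graph, and the "placeholder" nodes are hypothetical and carry no biological meaning; only the two endpoint labels are taken from the input example of Query 4. A graph database would answer this with its own path-finding facilities.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = current
            if neighbor == goal:
                # Walk back through the parent links to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected toy graph; intermediate nodes are placeholders.
toy_graph = {
    "Axonal transport": ["placeholder A", "placeholder B"],
    "placeholder A": ["Axonal transport", "placeholder C"],
    "placeholder B": ["Axonal transport", "LRP3"],
    "placeholder C": ["placeholder A", "LRP3"],
    "LRP3": ["placeholder B", "placeholder C"],
}

path = shortest_path(toy_graph, "Axonal transport", "LRP3")
print(path)
```

The returned list contains both the endpoints and everything "on the way", which corresponds to the "Path between nodes" output named for Query 4.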
2.2 Storing the knowledge graph
2.3 Polyglot persistence systems
Query | Poly1 (%) | Poly2 (%) | Problem |
---|---|---|---|
14 | 26.8 | 25.8 | RPQ |
27 | 23.8 | -2.6 | Connected components |
11 | 22.5 | 17.7 | RPQ |
8 | 18.2 | 43.3 | ECRPQ |
2 | 11.5 | 22.9 | RPQ |
15 | 10.3 | 4.5 | CRPQ |
20 | 9.2 | 2.5 | CRPQ |
23 | 7.7 | 6.8 | Page rank |
26 | 6.8 | 2.4 | Degree centrality |
16 | 6.6 | 5.1 | RPQ |
5 | 5.4 | 4.6 | CRPQ |
22 | 3.8 | 3.5 | ECRPQ |
17 | 3.1 | 31.9 | RPQ |
10 | -0.2 | 7.0 | CRPQ |
3 | -2.3 | 7.9 | CRPQ |
19 | -2.3 | 8.0 | RPQ |
1 | -2.5 | 4.9 | CRPQ |
13 | -4.1 | 4.8 | CRPQ |
21 | -11.0 | -0.3 | RPQ |
18 | -15.7 | -15.1 | ECRPQ |
Average | 5.8 | 9.8 | |
entity and hasRelation) to experience a greater decrease in runtimes than queries with many node and edge types.
2.4 Graph queries
match (n:Entity {preferredLabel: "APP"})-[r:hasRelation {function: "increases"}]->(m:Entity {preferredLabel: "gamma Secretase Complex"}), (doc:Document {documentID: r.context})<-[r2:isAuthor]-(author:Author) return doc, author order by doc.publicationDate limit 1
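The query above hard-codes the two entity labels from the input example of Query 1. One way the `{Entity1}` and `{Entity2}` placeholders of the use-case table could be mapped to a reusable query is through Cypher parameters. The sketch below is a minimal illustration under stated assumptions: the function name `first_author_query` and the parameter names `entity1`, `entity2` and `function` are hypothetical, and `LIMIT 1` is assumed here because the question asks for the first author. Node labels, relationship types and property names are copied from the query above.

```python
# Cypher text with $-parameters instead of literal values.
FIRST_AUTHOR_QUERY = (
    "MATCH (n:Entity {preferredLabel: $entity1})"
    "-[r:hasRelation {function: $function}]->"
    "(m:Entity {preferredLabel: $entity2}), "
    "(doc:Document {documentID: r.context})<-[:isAuthor]-(author:Author) "
    "RETURN doc, author ORDER BY doc.publicationDate LIMIT 1"  # LIMIT 1 is an assumption
)


def first_author_query(entity1, entity2, function="increases"):
    """Return (query_text, parameters) for the 'first author to state' question."""
    params = {"entity1": entity1, "entity2": entity2, "function": function}
    return FIRST_AUTHOR_QUERY, params


query, params = first_author_query("APP", "gamma Secretase Complex")
print(query)
print(params)
```

The resulting pair can be handed to any client that accepts parametrized Cypher, for example `session.run(query, params)` in the Neo4j Python driver, if that is the database in use. Passing values as parameters rather than formatting them into the string avoids quoting problems with labels such as "gamma Secretase Complex".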