1 Introduction
2 Related Work
3 LLM-Assisted Knowledge Graph Engineering – Potential Application Areas
- Assistance in knowledge graph usage:
  - Exploration and summarization of existing knowledge graphs (related experiment in Sect. 4.5)
  - Conversion of competency questions to SPARQL queries
  - Code generation or configuration of tool(chain)s for data pipelines
- Assistance in knowledge graph construction:
  - Populating knowledge graphs (related experiment in Sect. 4.4) and vice versa
  - Creation or enrichment of knowledge graph schemas / ontologies
  - Getting hints about problematic graph design by analysing the problems ChatGPT has when using a knowledge graph
  - Semantic search for concepts or properties defined in other, already existing knowledge graphs
  - Creation and adjustment of knowledge graphs based on competency questions
4 Experiments
4.1 SPARQL Query Generation for a Custom Small Knowledge Graph
Criterion | ChatGPT-3 | ChatGPT-4 |
---|---|---|
syntactically correct | 5/5 | 5/5 |
plausible query structure | 4/5 | 3/5 |
producing correct result | 3/5 | 2/5 |
using only defined classes and properties | 3/5 | 4/5 |
correct usage of classes and properties | 5/5 | 5/5 |
correct prefix for the graph | 5/5 | 4/5 |
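The criterion "using only defined classes and properties" can be approximated mechanically. The following is a minimal sketch and not the evaluation procedure used in the experiment: it extracts prefixed names from a query string with a simple regular expression and reports those that are missing from a given vocabulary. The `ex:` prefix, the vocabulary, and the query are hypothetical.

```python
import re

# Hypothetical vocabulary of a small custom graph (not taken from the paper).
DEFINED_TERMS = {"ex:Person", "ex:Organization", "ex:worksFor", "ex:name"}

# Matches prefixed names such as ex:Person. Deliberately simplistic: it does
# not parse SPARQL and would also match look-alikes inside string literals.
PREFIXED_NAME = re.compile(r"\b([A-Za-z][\w-]*:[A-Za-z_][\w-]*)")


def undefined_terms(query: str, prefix: str, defined: set[str]) -> set[str]:
    """Return the prefixed names under `prefix` that are not in `defined`."""
    used = {t for t in PREFIXED_NAME.findall(query) if t.startswith(prefix + ":")}
    return used - defined


# Hypothetical generated query that invents the property ex:employedBy.
query = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
  ?p a ex:Person ;
     ex:name ?name ;
     ex:employedBy ?org .
}
"""

missing = undefined_terms(query, "ex", DEFINED_TERMS)
print(sorted(missing))
```

A check like this only flags invented terms; whether defined terms are used correctly, or whether the query returns the right result, still requires running the query or inspecting it by hand.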
4.2 Token Counts for Knowledge Graph Schemas
Graph | Serialisation Type | Token Count |
---|---|---|
Mondial Oracle DB schema | SQL schema | 2,608 |
Mondial RDF schema | Turtle | 5,339 |
Mondial RDF schema | functional syntax | 9,696 |
Mondial RDF schema | Manchester syntax | 11,336 |
Mondial RDF schema | RDF/XML | 17,179 |
Mondial RDF schema | JSON-LD | 47,229 |
Wine Ontology | Turtle | 13,591 |
Wine Ontology | RDF/XML | 24,217 |
Pizza Ontology | Turtle | 5,431 |
Pizza Ontology | RDF/XML | 35,331 |
DBpedia RDF schema | Turtle | 471,251 |
DBpedia RDF schema | RDF/XML | 2,338,484 |
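To make the practical consequence of these counts concrete, the sketch below checks which Mondial serialisations would fit into a prompt under a given context limit. The limits of 4,096 and 8,192 tokens and the 500 tokens reserved for the question and the answer are assumptions chosen for illustration; the token counts are copied from the table.

```python
# Token counts for the Mondial schema, copied from the table above.
MONDIAL_TOKENS = {
    "SQL schema": 2608,
    "Turtle": 5339,
    "functional syntax": 9696,
    "Manchester syntax": 11336,
    "RDF/XML": 17179,
    "JSON-LD": 47229,
}


def fits(token_counts: dict, context_limit: int, reserved: int = 500) -> list:
    """Return the serialisations that leave `reserved` tokens of the limit free."""
    budget = context_limit - reserved
    return [name for name, count in token_counts.items() if count <= budget]


# Assumed context limits; check the documentation of the model actually used.
for limit in (4096, 8192):
    print(limit, fits(MONDIAL_TOKENS, limit))

# Relative size of each serialisation compared with Turtle.
base = MONDIAL_TOKENS["Turtle"]
for name, count in MONDIAL_TOKENS.items():
    print(f"{name}: {count / base:.2f}x")
```

Under these assumptions only the most compact serialisations fit at all, which is why the choice of serialisation matters when a schema is passed to the model as context.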
4.3 SPARQL Query Generation for the Mondial Knowledge Graph
Criterion | ChatGPT-3 | ChatGPT-4 |
---|---|---|
syntactically correct | 5/5 | 5/5 |
plausible query structure | 2/5 | 4/5 |
producing correct result | 0/5 | 0/5 |
using only defined classes and properties | 1/5 | 3/5 |
correct usage of classes and properties | 0/5 | 3/5 |
correct prefix for mondial graph | 0/5 | 1/5 |
4.4 Knowledge Extraction from Fact Sheets
- The JSON-LD output format prioritizes usage of schema.org vocabulary in the 5 evaluation runs. This works well for well-known entities and properties (e.g. Organization as @type for the manufacturer, or the name property). However, for the AM-specific feature key names or terms like printer, ChatGPT-3 invents reasonable but non-existent property names (in the schema.org namespace) instead of accurately creating a new namespace or using a dedicated AM ontology for that purpose.
- Requesting turtle as output format instead leads to different results. E.g. the property namespace prefix is based on the printer ID, and therefore printer descriptions are not interoperable and cannot be queried in a unified way in a joint KG.
- Successfully splitting x, y and z values of the maximum print dimension (instead of extracting all dimensions into one string literal) works in 3 runs. Although ChatGPT-3 accurately appends the unit of measurement to all x, y, z values (which is only mentioned after the z value in the input) in those cases, this is a modelling flaw, as querying the KG will be more complex. In one run it addressed this issue by separating units into a separate unit code field.
- A similar effect was observed when it comes to modelling the dependent entities. E.g., in 4 runs the manufacturer was modelled correctly as a separate typed entity, and in 1 run as a string literal instead.
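The modelling choices discussed above can be illustrated with a small JSON-LD document. This is a sketch under stated assumptions, not output from the experiment: the `am:` namespace, its property names, and all printer values are hypothetical. It keeps schema.org for the well-known parts (Organization, name, manufacturer), uses a dedicated namespace for AM-specific terms, and stores each dimension as a number with a separate unit code (the UN/CEFACT code MMT for millimetre).

```python
import json

# Hypothetical namespace for AM-specific terms (illustrative only).
AM = "https://example.org/am#"


def dimension(value_mm: float) -> dict:
    """Model one axis as a numeric value plus a separate unit code."""
    return {
        "@type": "schema:QuantitativeValue",
        "schema:value": value_mm,
        "schema:unitCode": "MMT",  # UN/CEFACT common code for millimetre
    }


# Hypothetical printer record; none of these values come from a real fact sheet.
printer = {
    "@context": {"schema": "https://schema.org/", "am": AM},
    "@id": "am:example-printer-1",
    "@type": "am:Printer",
    "schema:name": "Example Printer 1",
    # The manufacturer is a separate typed entity, not a string literal.
    "schema:manufacturer": {
        "@type": "schema:Organization",
        "schema:name": "Example Manufacturer",
    },
    # One property per axis instead of a single string such as "250 x 250 x 300 mm".
    "am:maxPrintDimensionX": dimension(250.0),
    "am:maxPrintDimensionY": dimension(250.0),
    "am:maxPrintDimensionZ": dimension(300.0),
}

print(json.dumps(printer, indent=2))
```

Because the values are plain numbers and the property names are shared across printers, descriptions modelled this way can be compared or filtered in a joint KG without string parsing.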
4.5 Knowledge Graph Exploration
prefix:concept notation. If the first question did not achieve the goal, we put additional questions or requests to ChatGPT-3. The results are presented in Tab. 4, and we evaluated the displayed graphs based on the following criteria:
rdfs:subPropertyOf relation, and the nodes were labelled in prefix notation, as were the edges. By arranging it as a tree using the subClassOf pattern, only two different properties were used for the relations (edges). The root node was of type owl:Thing; the other nodes were connected as (sub)classes from the DBpedia ontology. These were: Place, Organization, Event, Work, Species, and Person. The class Work had one more subClassOf relation to the class MusicalWork. The class Person had the most complex representation, with two more subClassOf relations leading to foaf:Person and foaf:Agent, the latter of which is a subclass of the root node (owl:Thing). dbo:Occupation is non-existent. All remaining nodes and edges comply with the rules of the ontology, even if the concepts used are derived through further subclass relationships. The resulting diagram is shown in the Supplemental Online Resources.