1 Introduction
2 Modeling and query technologies
2.1 The domain model
2.2 Modeling technologies
OO | EMF | Property graphs | RDF | SQL |
---|---|---|---|---|
Class definition |
EClass instance | Node label or type property |
rdfs:Class
| Table definition |
Reference definition |
EReference instance | Edge label |
rdf:Property, owl:ObjectProperty
| Foreign key definition |
Attribute definition |
EAttribute instance | Property name |
rdf:Property, owl:DataTypeProperty
| Column definition |
Type |
EDataType instance | (Only primitives) |
rdfs:Datatype
| (Only primitives) |
Class attributes |
eAttributes reference |
\(\Circle \)
|
rdfs:domain
| Table columns |
Class references |
eReferences reference |
\(\Circle \)
|
rdfs:domain
| Foreign keys |
Attribute type |
eAttributeType reference | property type |
rdfs:range
| Column type |
Reference type |
eReferenceType reference |
\(\Circle \)
|
rdfs:range
|
\(\Circle \)
|
Superclasses |
eSuperTypes reference |
\(\Circle \)
|
rdfs:subClassOf
| (Various mappings) |
Aggregation |
containment flag |
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
|
2.2.1 Eclipse Modeling Framework (EMF)
2.2.2 Property graphs
2.2.3 Resource Description Framework (RDF)
-
RDF models with inferred triples For a resource of a given type, all supertypes are explicitly asserted in the model. For example, a resource with the type Segment also has the type TrackElement. Such an instance model is shown in Fig. 5b. Note that the
_1
and_2
resources not only have the type Segment and Switch, but also the type TrackElement.
x
represents a unique identifier.Format | Tool | Query language | Incremental | In-memory engine | Implementation language |
---|---|---|---|---|---|
EMF | Drools | DRL |
\(\CIRCLE \)
|
\(\CIRCLE \)
| Java |
Eclipse OCL | OCL |
\(\Circle \)
|
\(\CIRCLE \)
| Java | |
EMF API | − |
\(\Circle \)
|
\(\CIRCLE \)
| Java | |
Viatra Query | VQL |
\(\CIRCLE \)
|
\(\CIRCLE \)
| Java | |
graph | Neo4j | Cypher |
\(\Circle \)
|
\(\Circle \)
| Java |
TinkerGraph | − |
\(\Circle \)
|
\(\CIRCLE \)
| Java | |
RDF | Jena | SPARQL |
\(\Circle \)
|
\(\CIRCLE \)
| Java |
RDF4J | SPARQL |
\(\Circle \)
|
\(\CIRCLE \)
| Java | |
SQL | SQLite | SQL |
\(\Circle \)
|
\(\CIRCLE \)
| C |
MySQL | SQL |
\(\Circle \)
|
\(\Circle \)
| C ++
|
2.2.4 Relational databases
2.3 Query technologies
2.3.1 EMF tools
-
As a baseline, we have written a local search-based algorithm for each query in Java, using the EMF API. The implementations traverse the model without specific search plan optimizations, but they cut unnecessary search branches at the earliest possibility.
-
The OCL [54] language is commonly used for querying EMF model instances in validation frameworks. It is a standardized navigation-based query language, applicable over a range of modeling formalisms. Taking advantage of the expressive features and widespread adoption of this query language, the project Eclipse OCL [19] provides a powerful query interface that evaluates such expressions over EMF models.
-
Viatra Query [8] is an Eclipse Modeling project where several authors of the current paper are involved. Viatra Query provides incremental query evaluation using the Rete algorithm [23]. Queries are defined in a graph pattern-based query language [10] and evaluated over EMF models. Viatra Query is developed with a focus on incremental query evaluation; however, it is also capable of evaluating queries with a local search-based algorithm [14].
-
Incremental query evaluation is also supported by Drools [41], a rule engine developed by Red Hat. Similarly to Viatra Query, Drools is based on ReteOO, an object-oriented version of the Rete algorithm [23]. In particular, Drools 6 uses PHREAK, an improved version of ReteOO with support for lazy evaluation. Queries can be formalized using DRL, the Drools Rule Language. While Drools is not a dedicated EMF tool, the Drools implementation of the Train Benchmark works on EMF models. While EMF has some memory overhead [87], its advanced features, including deserialization and notifications, make it well suited for using with Drools.
2.3.2 RDF tools
2.3.3 Property graph tools
-
As of 2016, the most popular graph database is Neo4j [52] which provides multiple ways to query graphs: (1) a low-level core API for elementary graph operations, (2) the Cypher language, a declarative language focusing on graph pattern matching.While Cypher is very expressive and its optimization engine is being actively developed, it may be beneficial for some queries to implement the search algorithms manually [67, Chapter 6: Graph Database Internals].
-
TinkerGraph is an in-memory reference implementation of the property graph interfaces provided by the Apache TinkerPop framework [4].
3 Benchmark specification
3.1 Inputs and outputs
3.2 Phases
3.3 Use case scenarios
3.3.1 Batch validation scenario (Batch)
3.3.2 Fault injection scenario (Inject)
3.3.3 Automated model repair scenario (Repair)
3.4 Specification of queries
-
Positive conditions define the structure and type of the vertices and edges that must be satisfied.
-
Negative conditions (also known as negative application conditions) define subpatterns which must not be satisfied. Negative conditions are displayed in a red rectangle with the NEG caption.
-
Filter conditions are defined to check the value of vertex properties. Filter conditions are typeset in italic.
-
PosLength (Fig. 7a) requires that a segment must have a positive length. The corresponding query defines a simple property check, a common use case in validation.
-
SwitchMonitored (Fig. 7b) requires every switch to have at least one sensor connected to it. The corresponding query checks whether a vertex is connected to another vertex. This pattern is common in more complex queries, e.g. it is used in the RouteSensor and SemaphoreNeighbor queries.
-
RouteSensor (Fig. 7c) requires that all sensors associated with a switch that belongs to a route must also be associated directly with the same route. The corresponding query checks for the absence of circles, so the efficiency of performing navigation and evaluating negative conditions is tested.
-
SwitchSet (Fig. 7d) requires that an entry semaphore of an active route may show GO only if all switches along the route are in the position prescribed by the route. The corresponding query tests the efficiency of navigation and filtering operations.
-
ConnectedSegments (Fig. 7e) requires each sensor to have at most 5 segments. The corresponding query checks for “chains” similar to a transitive closure. This is a common use case in model validation.
-
SemaphoreNeighbor (Fig. 7f) requires routes which are connected through a pair of sensors and a pair of track elements to belong to the same semaphore. The corresponding query checks for the absence of circles, so the efficiency of join and antijoin [76] operations is tested. One-way navigable references are also present in the constraint, so the efficiency of their evaluation is also measured. Subsumption inference is required, as the two track elements (te1, te2) can be switches or segments.
-
RouteSensor \(\leftrightarrow \) simple physical channel: both check for the absence of a circle through edges with many-to-one and many-to-many cardinalities.
-
SemaphoreNeighbor \(\leftrightarrow \) signal group mapping: both check for the absence of a circle of 7 elements.
-
SwitchMonitored \(\leftrightarrow \) ISignal: both check a negative application condition for a single element.
PosLength
|
SwitchMonitored
|
RouteSensor
|
SwitchSet
|
ConnectedSegments
|
SemaphoreNeighbor
| |
---|---|---|---|---|---|---|
# parameters | 2 | 1 | 4 | 6 | 7 | 7 |
# variables | 2 | 2 | 4 | 6 | 7 | 7 |
# vertex types | 1 | 2 | 4 | 4 | 2 | 4 |
# attributes | 1 | 0 | 0 | 4 | 0 | 0 |
# attribute and equality checks | 1 | 0 | 0 | 3 | 0 | 1 |
# edge constraints | 0 | 0 | 3 | 3 | 11 | 6 |
# negative conditions | 0 | 1 | 1 | 0 | 0 | 1 |
3.5 Specification of transformations
-
Inserting new vertices and new edges between existing vertices (marked with «new»).
-
Deleting existing vertices and edges (marked with «del»). The deletion of a vertex implies the deletion of all of its edges to eliminate dangling edges.
-
Updating the properties of a vertex (noted as property \(\leftarrow \) new value).
3.6 Query and transformations for constraint RouteSensor
3.7 Instance model generation and fault injection
-
Constraint PosLength is violated by assigning an invalid value to the length attribute.
-
Constraint SwitchMonitored is violated by deleting all monitoredBy edges of a Switch.
-
Constraint RouteSensor is violated by deleting the requires edge from a Route to a Sensor.
-
Constraint SwitchSet is violated by setting an invalid currentPosition attribute to a Switch (i.e. not the position of the corresponding SwitchPosition followed by the Route).
-
Constraint SemaphoreNeighbor is violated by deleting an entry edge between a Route and a Semaphore.
-
Constraint ConnectedSegments is violated by adding an additional (sixth) Segment to the same Sensor and connecting it to the last Segment.
Constraint |
Batch (%) |
Inject (%) |
Repair (%) |
---|---|---|---|
PosLength
| 0 | 2 | 10 |
SwitchMonitored
| 0 | 2 | 18 |
RouteSensor
| 0 | 4 | 10 |
SwitchSet
| 0 | 8 | 15 |
ConnectedSegments
| 0 | 5 | 5 |
SemaphoreNeighbor
| 0 | 7 | 25 |
3.8 Ensuring deterministic results
-
The elements for transformation are chosen using a pseudorandom generator with a fixed random seed.
-
The elements are always selected from a deterministically sorted list.
4 Evaluation
4.1 Benchmark parameters
Parameter | Values | Details in |
---|---|---|
(a) Benchmark-specific parameters | ||
Scenario |
Batch
| Section 3.3.1
|
Inject
| Section 3.3.2
| |
Repair
| Section 3.3.3
| |
Queries |
ConnectedSegments
| Section 7.1
|
PosLength
| Section 7.2
| |
RouteSensor
| Section 3.6
| |
SemaphoreNeighbor
| Section 7.4
| |
SwitchMonitored
| Section 7.5
| |
SwitchSet
| Section 7.6
| |
Size |
\(1, 2, 4, \ldots , 2048\)
| Section 3.7
|
Tool | Version | Parameters |
---|---|---|
(b) Tool-specific parameters | ||
Drools | 6.5.0 | – |
Eclipse OCL | 3.3.0 | – |
EMF API | 2.10.0 | – |
Jena | 3.0.0 | No inferencing |
inferencing | ||
MySQL | 5.7.16 | – |
Neo4j | 3.0.4 | Core API |
Cypher | ||
RDF4J | 2.1 | No inferencing |
SQLite | 3.8.11.2 | – |
TinkerGraph | 3.2.3 | – |
Viatra Query | 1.4.0 | Local search |
Incremental |
4.2 Execution of benchmark runs and environment
4.2.1 Benchmark scenarios
4.2.2 Benchmark environment
4.3 Measurement of execution times
4.3.1 How to read the charts?
4.4 Measurement of memory consumption
-
Figure 12 compares disk-based and in-memory databases. As expected, in-memory databases provide better response times in general.
-
Figure 13 shows the formats. It shows that tools using EMF implementations perform very well on small models and also scale for large models. SQL and property graph implementations show moderate performance. RDF implementations are slower for small models and do not scale for large models.
4.5 Analysis of results
4.5.1 Technology-specific findings
Domain of findings | Area | Observations |
---|---|---|
Technology | EMF |
\(\oplus \) EMF tools are suitable for model validation |
\(\ominus \) No built-in indexing support in EMF | ||
Graph databases |
\(\oplus \) Good storage performance for graph databases | |
RDF databases |
\(\ominus \) Underperforming RDF systems | |
\(\ominus \) Slow inferencing in RDF4J | ||
Relational databases |
\(\oplus \) Fast model load and good scalability from SQLite | |
\(\ominus \) MySQL slowdown for complex queries | ||
Approach | Incremental |
\(\oplus \) Incremental tools prevail for continuous validation |
\(\ominus \) The scalability of incremental tools is limited by memory constraints | ||
Search-based |
\(\oplus \) Search-based tools scale well for large models and simple queries | |
\(\ominus \) Search-based tools face problems for complex queries | ||
Indexing |
\(\oplus \) Substantial effect of indexing on performance | |
Query language features |
\(\ominus \) Long path expressions are hard to evaluate | |
Performance |
\(\oplus \) Huge differences in runtime across technologies | |
Size of modifications |
\(\oplus \) Noticeable differences between the Inject and Repair scenarios |
4.5.2 Approach-specific findings
4.6 Threats to validity
4.6.1 Internal threats
4.6.2 External threats
4.7 Summary
5 Related work
5.1 Model transformation and graph transformation benchmarks
5.1.1 Graph transformation benchmarks
5.1.2 Tool contests
Year | Case | Goal | Scope | t2m | m2m | m2t | Updates | Performance-oriented |
---|---|---|---|---|---|---|---|---|
2015 | The Train Benchmark for Incremental Validation | Perform well-formedness validations and quick fix-like repair transformations. | Validation and modification |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
The Model Execution Case | Define a transformation, which specifies the operational semantics of the UML activity diagram language by updating the runtime state of executed UML activity diagrams. | Model execution |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\LEFTcircle \)
| |
The Java Refactoring Case | Parse the source code, perform refactoring operations (pull up method, create superclass) on the program graph and generate the source code. | Refactoring |
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
| |
Java Annotations (Live) | Use annotations to extend existing Java code. | Refactoring |
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
| |
2014 | FIXML to Java, C # and C++
| Transform financial transaction data expressed in FIXML format into class definitions in Java, C # and C++ . | Deserialization |
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
Movie Database | Determine all actor couples who performed together in a set of at least three movies. | Model modification |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
| |
The Transformation Tool Soccer Worldcup (Live) | Implement a soccer client, using model transformations. | AI |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
| |
2013 | Petri-Nets to Statecharts | Mapping from Petri-Nets to statecharts. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
Class diagram restructuring | Perform refactoring operations: pull up, create superclass, create subclass. | Program refactoring |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
| |
Flowgraphs | Analysis and transformations in compiler construction: working on the data structures, control flow and data flow graphs. | Model synthesis and validation |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
| |
2011 | GMF Model Migration | Migrate models in response to metamodel adaptation. | Model migration |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
|
Compiler Optimization | Use the intermediate representation of the code to perform local optimizations, and do instruction selections to transform the intermediate representation to a least cost target representation. | Compiler optimization |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
| |
Program Understanding: Reengineering | Create a state machine model out of a Java syntax graph. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\LEFTcircle \)
| |
Hello World | Several primitive tasks that can be solved straight away with most transformation tools. | Various |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
\(\Circle \)
| |
2010 | Model Migration | Define a transformation to migrate the activity diagrams from UML 1.4 to UML 2.2. | model migration |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
|
Topology Analysis of Dynamic Communication Systems | Compute the topologies that may occur for the merge protocol, a communication protocol which is used in car platooning. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
| |
Ecore to GenModel | Use m2m transformation to synthesize the GenModel from the Ecore metamodel. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
| |
2009 | BPMN to BPEL Model Transformation | Define model transformations between BPMN and BPEL. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
|
Program Comprehension | Perform (1) a simple filtering query on large models, (2) a complex query on small models, resulting in control flow graph and program dependence graph. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
| |
Leader Election | Model and validate a simple leader election protocol using graph transformation rules, verification and testing. | Model verification |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
| |
Live Challenge Problem | Model a luggage system, define a transformation from a luggage system to a statechart model and perform simulation on the statechart model. | Model synthesis and simulation |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
| |
2008 | Program Refactoring | Import the models to GXL, allow for interactive transformations, export to GXL. | Program refactoring |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
|
AntWorld | Perform a simulation of ants searching for food based on a few simple rules. | Model simulation |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
| |
2007 | Don’t Get Angry; Ludo Board Game | Model and play a board game using graph transformation rules. | Various |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
UML to CSP Transformation | Perform a transformation from UML activity diagrams to Communicating Sequential Processes. | Model synthesis |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\Circle \)
|
\(\Circle \)
| |
Sierpinski Triangle | Construct a Sierpinski triangle. | Construction |
\(\Circle \)
|
\(\CIRCLE \)
|
\(\Circle \)
|
\(\CIRCLE \)
|
\(\CIRCLE \)
|
5.1.3 Assessment of incremental model queries
5.2 RDF benchmarks
Benchmark | Year | Instance models | Model domain | Number of classes | Number of properties | Largest model | Number of queries | Multiple users | Workload profile | Measurement goal | Updates |
---|---|---|---|---|---|---|---|---|---|---|---|
LUBM | 2004 | Synthetic | University | 43 | 32 | 6.9M | 14 |
\(\Circle \)
| Simple queries | Inferencing performance |
\(\Circle \)
|
Barton | 2007 | Real | Library | 11 | 28 | 50M | 7 |
\(\Circle \)
| Library search | Query response time |
\(\Circle \)
|
SP2Bench | 2009 | Synthetic | DBLP | 8 | 22 | 1B+ | 12 |
\(\Circle \)
| Publication research | Query response time |
\(\Circle \)
|
BSBM | 2009 | Synthetic | e-commerce | 8 | 51 | 150B | 12 |
\(\CIRCLE \)
| Multi-user query set | Query throughput |
\(\LEFTcircle \)
|
DBpedia | 2011 | Real | DBpedia | 8 | 1200 | 300M | 25 |
\(\Circle \)
| Query set | Query throughput |
\(\Circle \)
|
LDBC SNB | 2015 | Synthetic | Social network | 19 | 27 | 1B+ | 14 |
\(\Circle \)
| Query set | Navigational pattern matching |
\(\CIRCLE \)
|
Train Benchmark | 2016 | Synthetic | Railway | 9 | 13 | 23M+ | 6 |
\(\Circle \)
| Model validation | Query and transformation response time |
\(\CIRCLE \)
|
5.3 The Train Benchmark
-
The benchmark features three distinct scenarios: Batch, Inject and Repair, each capturing a different aspect of real-world model validation scenarios. Previous publications only considered one or two scenarios.
-
In this paper, we investigate the performance of query sets. Previously, we only executed the individual queries separately.
-
Previous publications only used tools from one or two technologies. In the current paper, we assess 10 tools, taken from four substantially different technological spaces. This demonstrates that our benchmark is technology independent; thus, the results provide potentially useful feedback for different communities.
-
The workload profile follows a real-world model validation scenario by updating the model with changes derived by simulated user edits or transformations.
-
The benchmark measures the performance of both initial validation and (more importantly) incremental revalidation.
-
This cross-technology benchmark can be adapted to different model representation formats and query technologies. This is demonstrated by 10 reference implementations over four different technological spaces (EMF, graph databases, RDF and SQL) presented as part of the current paper.