## 1 Preliminaries

### 1.1 Case studies

`Region`

s, which in turn contain states (called `Vertex`

es) and `Transition`

s. The abstract state `Vertex`

is further refined into `RegularStates`

(like `State`

or `FinalState`

) and `PseudoState`

s (like `Entry`

, `Exit`

or `Choice`

).`WT`

is composed of different `ControlUnit`

s as building blocks (each realizing a separate control algorithm), organized in a containment hierarchy of `Subsystem`

s (and `MainSubsystem`

s). Such control units (and, by virtue of containment, the subsystems themselves as well) consume `input`

s and produce `output`

s; such signals, from the point of view of the entire system, may be classified as `SystemInput`

s, `SystemOutput`

s or internal variables (`SystemVariable`

s). Control units are further characterized by a number of additional features including attributes such as `cycle`

and cross-references to elements such as `SystemTimer`

s.^{1}) necessitate the storage of additional meta-information: There is a concept of ownership, where certain classes of specialists are assigned responsibility over of a

`Subsystem`

(and everything contained therein); while certain subsystems may express protected intellectual property (IP) and must be hidden from such specialist contractors. We have opted to extend the original metamodel to be able to seamlessly express this information in our experiments; hence, the new root container object `AuthorizedSystem`

, the class `Ownership`

and their various features were introduced. Since IKERLAN has opted not to disclose the list of their actual proprietary control unit types, we have combined all of them into a single class `ControlUnit`

as a further deviation from the original. The reader is encouraged to compare Fig. 3 with the original metamodel diagram in [23].### 1.2 Metamodels and instance models

- The interpretation of a unary predicate symbol \({\texttt {C}}_{i}\) is defined in accordance with the types of the EMF model: \(\mathcal {I}_{M}({\texttt {C}}_{i}): { Obj _{M}} \rightarrow {\{1,0\}}\). An object \(o \in Obj _{M}\) is an instance of (more precisely, conforms to) a class \({\texttt {C}}_{i}\) in a model M if \(\mathcal {I}_M({\texttt {C}}_{i})(o) = 1\). It is possible for an object to conform to multiple types, e.g., in case of inheritance or abstract classes. In EMF, it is required that each object conforms to at least one non-abstract type, and all of its supertypes. However, by stating that all supertypes and subtypes are listed, it can also be applied to more general type systems.
- The interpretation of a binary predicate symbol \({\texttt {R}}_{j}\) is defined in accordance with the links in the EMF model: \(\mathcal {I}_{M}({\texttt {R}}_{j}): { Obj _{M} \times Obj _{M}} \rightarrow {\{1,0\}}\). There is a reference \({\texttt {R}}_j\) between \(o_1, o_2 \in Obj _{M}\) in model M if \(\mathcal {I}_M({\texttt {R}}_{j})(o_1,o_2) = 1\).
- The interpretation of a predicate symbol \({\texttt {P}}_k\) is discussed in Sect. 2.3: \(\mathcal {I}_{M}({\texttt {P}}_{k}): { Obj _{M}^{\alpha ({\texttt {P}}_{k})}} \rightarrow {\{1,0\}}\). There is a match on objects \(o_1,o_2,\ldots ,o_{\alpha ({\texttt {P}}_{k})} \in Obj _{M}\) if \(\mathcal {I}_M({\texttt {P}}_{k})(o_1,o_2,\ldots ,o_{\alpha ({\texttt {P}}_{k})}) = 1\).

`Region`

s and inverse (opposite) relations `incoming`

`Transitions`

and `outgoingTransitions`

are excluded from the diagram. In \(M_1\) there are two `State`

s (s1 and s2), which are connected to a loop via `Transition`

s t2 and t3. The initial state is marked by a `Transition`

t1 from an entry e1 to state s1. \(M_2\) describes a similar statechart with three states in loop (s3, s4 and s5 connected via t5, t6 and t7). Finally, in \(M_3\) there are two main differences: there is an incoming `Transition`

t11 to an `Entry`

state (e3), and there is a `State`

s7 that does not have outgoing transition. Additional constraints expressed in the metamodel, such as each transition having at most one source and one target, are also satisfied by the above graphs; therefore, they can be considered valid instance models. While all these instance models are non-isomorphic, later we illustrate why they are not diverse.### 1.3 Graph predicates

`and`

, `or`

, `not`

, `collect`

, `select`

, `exists`

, `forall`

, `includes`

, `excludes`

, but excluding expressions like `size`

, `min`

or `max`

). In our current implementation, we used the graph pattern language of Viatra [58, 60], where error patterns describe malformedness of the model, and derived logic predicates in accordance with [47].- \(\varphi : incomingToEntry (E):= \exists T: {\texttt {Entry}}(E) \wedge {\texttt {target}}(T,E)\)
- \(\phi : noOutgoingFromEntry (E):= {\texttt {Entry}}(E) \wedge \lnot (\exists T: {\texttt {source}}(T,E))\)

### 1.4 Motivation: testing of DSL tools

### 1.5 Motivation: testing model-based access control policies

^{2}.

## 2 Shape-based model diversity metrics

### 2.1 Structural diversity of models

`Entry`

with an outgoing `Transition`

(Structure 1) once. However, \(M_3\) contains also an `Entry`

state with both incoming and outgoing `Transition`

s (Structure 2), which can distinguish model \(M_3\) from \(M_1\) and \(M_2\). Therefore, if a design flaw (like a missing well-formedness constraint) is characterized with this structure, it can be detected with model \(M_3\), but not with models like \(M_1\) and \(M_2\).`State`

s with both incoming and outgoing `Transition`

s, which occurs 2, 3 and 1 times in models \(M_1\), \(M_2\) and \(M_3\), respectively. Structure 4 describes only incoming `Transition`

s, and the occurrences of this structure increase to 2 in model \(M_3\). This increase indicates a `State`

without any outgoing `Transition`

s (i.e., deadlocks), which may describe a challenging scenario. On the other hand, the number of occurrences of Structures 3 and 4 cannot distinguish models \(M_1\) and \(M_2\), which indicates that they are similar with respect to those structures.### 2.2 Neighborhood shapes of graphs

`State`

class into subclasses such as `InitialState`

, `IntermediateState`

, `TrapState`

, etc. based on the existence or nonexistence of incoming and outgoing `source`

and `target`

edges.- As a base case, for range \(i=0\), the neighborhood is an empty set: \({ Nbh }_{0}=\emptyset \).
- For ranges \(i>0\) neighborhood descriptors are build upon the description of references \({ Rel }_{i}\), which consists of predicate symbols \({\texttt {P}} \in \Sigma \), parameter indexes of \({\texttt {P}}\) and \(\alpha ({\texttt {P}})-1\) neighborhoods. Therefore, \({ Rel }_{i} \subseteq \Sigma \times \mathbb {N} \times \bigcup _{{\texttt {P}} \in \Sigma } { Nbh }_{i-1}^{\alpha ({\texttt {P}})-1}\).
- For range \(i>0\) the neighborhood of an object can be formalized as \({ Nbh }_{i} = { Nbh }_{i-1} \times 2^{{ Rel }_{i}}\), where in a tuple \(\langle n, r \rangle \)n is the neighborhood of the object with range \(i-1\) and r describes the set of its relations.

`Entri`

es are both mapped to the same neighborhood n1, while e3 can be distinguished from them as it has incoming reference from a transition, thus creating a different neighborhood n5.### 2.3 Shape-based model diversity

### 2.4 Distance metrics for model diversity

- d is non-negative: \(d(M_j,M_k) \ge 0\)
- d is symmetric \(d(M_j,M_k) = d(M_k, M_j)\)
- if \(M_j\) and \(M_k\) are isomorphic, then \(d(M_j,M_k) = 0\)
- triangle inequality: \(d(M_j,M_l) {\le }d(M_k, M_j) {+} d(M_j, M_l)\)

### 2.5 Diversity-based model ordering for test case prioritization

### 2.6 From instance model to shaping

## 3 Evaluation

### 3.1 Target domains and artifacts under test

Case Study | ST | WT |
---|---|---|

DSL style | Behavioral | Structural |

AUT | WF constraints | Access policy |

Inputs versus WF | Arbitrary | Consistent |

Mutated | Patterns | Everything else |

Extra predicates | – (black-box) | Optional |

^{3}), we used WT system models conforming to the metamodel illustrated in Fig. 3, containing 15 classes, 18 references, two relevant attributes, which is extended by two classes and an enum type to encode access control metadata. For this case study, we also formalized 10 WF constraints. Additionally, we formalized 25 access control rules using a further 18 graph patterns to define a complex access control policy motivated by the literature. This access control policy serves as the AUT; i.e., we measured the ability of model generators to reveal bugs in the policy.

### 3.2 Compared model generation approaches

- First, we generated models with our Viatra Solver [48] (denoted with VS), a state-of-the-art scalable graph model generator. For WT case study, we also configured the generator to satisfy WF constraints, so that all generated models are consistent. We used the generator in three configurations:
- VS/All: This configuration is able to generate all consistent models and non-isomorphic model for a specific scope (number of objects).
- VS+ i: This incremental configuration reuses the search space between each model generation steps. The solver is configured to introduce at least 3 new node shapes in each steps, using neighborhood abstraction for range 1.
- VS-i: After finding a valid model, this non-incremental configuration restarts the search. Similarly, the solver is configured to introduce at least 3 new node shapes in each steps, using neighborhood abstraction for range 1.

- Next, we generated models for the same domains using the Alloy Analyzer [56], a well-known SAT-based relational model finder, used as a model generator for several model generation scenarios. For representing EMF metamodels, we used traditional encoding techniques [13, 47]. We used the latest 4.2 build for Alloy with the default Sat4j [34] as back-end solver. We artificially prevented the deterministic run of the solver to enable statistical analysis by adding a random amount of additional true statements. To enforce model diversity, Alloy was configured with two different setups for symmetry-breaking predicates. All other configuration options were set to default.
- A/20: default configuration. For greater values, the tool produced the same set of models.
- A/0: minimal value for symmetry breaking. Between 0 and 20, lower values produced better models with respect to mutation scores.

- We used the EMF random instantiator [5] (REMF) to produce random models for the domains. REMF does not enforce WF constraints (i.e., the generated models will typically violate some of the constraints), and it produces models which are at least as large as the designated specified size (so the designated size is the minimum size).
- Finally, for the ST case study, we also included 1250 manually created statechart models in our analysis (marked by Human). The models were created by students as solutions for similar (but not identical) statechart modeling homework assignments [61] representing real models which were not intentionally created for testing purposes. We did not have such manually created models for the WT case study.

### 3.3 Measurement setup

- VS/All: We generated the complete set of models of a certain size using VS in the order the generator produces the models (this default ordering of models is denoted by VS/Solver). For case study WT, we generated all 3483 well-formed models with five objects; for case study ST, we generated all 4606 models with seven objects. Therefore, in this measurement, all non-isomorphic models are available to fairly evaluate the efficiency of different metrics.
- REMF: We generated the same number of models of same size using REMF as random generator. REMF does not guarantee the well-formedness of the models, and isomorphic models may be constructed multiple times.

### 3.4 RQ1: external diversity for test prioritization

- Shape, CosDist, SymmDiffDist, illustrate orderings with respect to Coverage order, Distance order with respect to Cosine distance, and Distance order with respect to Symmetric Difference.
- \(+\) P and − P denotes whether or not the shape abstraction aggregates extra predicates (see Sect. 3.6) in addition to the graph structure. +P is available only in case study WT, where such extra predicates represent the graph patterns underlying the access control rules (white-box testing).
- Solver denotes the unchanged order in which VS originally produced the models.
- Random shows the average score of 50 random shufflings of the test suite.
- Optimal shows an optimal ordering of test models with respect to mutation score; i.e., for each test case, we select the model that increases the score of the test suite the most.

- Random ordering was always better than the order which VS+i produced the models. This is in accordance with the experience reported in [31].
- Surprisingly, complete mutation coverage could be achieved very quickly with optimal ordering (1–4 models). However, with the exception of the random sampling example of WT, at least one of our metrics was also able to provide complete coverage with fewer than ten models.
- Finally, in all cases, there was at least one test ordering heuristic which significantly outperformed both the original order and the random order, using one or two orders of magnitude fewer models.

Case study | Test suite | Order | 95% | 100% |
---|---|---|---|---|

ST | Shape-P | 7 | 19 | |

All | CosDist-P | 6 | 9 | |

SymDiffDist-P | 7 | 19 | ||

(4606 models) | Solver | 118 | 420 | |

Random | 12 | 202 | ||

Optimal | 2 | 2 | ||

Randomly | Shape-P | 3 | 3 | |

generated | CosDist-P | 2 | 8 | |

SymDiffDist-P | 5 | 5 | ||

(4700 models) | Random | 16 | 36 | |

Optimal | 1 | 1 | ||

WT | Shape+P | 10 | 10 | |

CosDist+P | 9 | 9 | ||

SymDiffDist+P | 16 | 16 | ||

All | Shape-P | 9 | 50 | |

CosDist-P | 25 | 25 | ||

(3480 models) | SymDiffDist-P | 23 | 23 | |

Solver | 1363 | 1381 | ||

Random | 39 | 208 | ||

Optimal | 2 | 3 | ||

Shape+P | 1135 | 1135 | ||

CosDist+P | 230 | 308 | ||

Randomly | SymDiffDist+P | 118 | 262 | |

generated | Shape-P | 1206 | 1206 | |

CosDist-P | 111 | 151 | ||

(3500 models) | SymDiffDist-P | 1129 | 1129 | |

Random | 330 | 1196 | ||

Optimal | 3 | 4 |

### 3.5 RQ2: test input generation techniques.

- From all generator VS-i produced the best mutation score, and combined the high initial mutation score of VS+i with the steepness of REMF.
- REMF provided good mutation score, but it dominantly generated malformed instances.
- From the three generators, A found the least amount of mutant.
- The significant difference in VS-i and VS+i in WT shows that restarting the solver had larger effect on the diversity heuristic (at least at the early stage).

### 3.6 RQ3: measurement results and analysis

Model set | ST | WT |
---|---|---|

Autogenerated WF models | 0.95 | 0.89 |

VS | \(-\) 0.01 | \(-\) 0.05 |

A | 0.55 | 0.73 |

REMF | 0.54 | \(-\) 0.16 |

Human | 0.12 |

- On both case studies, automatically generated valid model sets (A/0, A/20, VS-i, VS+i and REMF in case of ST) showed high correlation of 0.95 and 0.89.
- VS-i and VS+i on itself does not show high correlation as all models have high diversity and mutation score.
- On the other hand, A generated both highly symmetric and diverse models, and internal diversity showed fairly good correlation in both case study (0.55 and 0.73).
- REMF produced mixed results: for ST it showed fairly good correlation, but in case of WT there was no correlation: all model had high diversity, but the mutation score was fairly low. This is because WT case study expected well-formed models, and REMF models was mostly ill-formed.
- Finally, Human models showed low correlation between mutation score and diversity, and it seams that those properties were independent.
- Generally speaking, A produced the lowest, Human and REMF medium, and VS high mutation score and diversity for single models.