1 Background
-
A problem metamodel describing the structure of problems and solutions;
-
A set of endogenous model transformations typed over the problem metamodel, called mutation operators;
-
A set of solution constraints. These are either multiplicity constraints refining the problem metamodel multiplicities or additional well-formedness constraints implemented using OCL or Java. These constraints together with the problem metamodel are used to define the solution metamodel, that is, the metamodel to which all valid problem solutions conform to;
-
A set of objective functions implemented as OCL or Java queries over solution models;
-
A valid instance of the problem metamodel, providing initial problem constraints;
-
Parameter tuning is used before starting the search and the aim is to find the ideal parameter values to start the EA with;
-
Parameter control is concerned with changing parameter values during search.
2 Running example
Importance
metric, denoting how important they are for a stakeholder, in addition to the Effort
metric, which shows the required effort for completion. The product owner is required to prioritise these tasks so that the average stakeholder importance is equally distributed across the sprints required to implement the work items in the backlog. We call this objective the Stakeholder Satisfaction Index, and we calculate it as the standard deviation of average stakeholder importance across sprints.
WorkItem
elements to a number of Sprints
with the following objectives: Objective 1 minimise the Sprint
effort deviation; Objective 2 minimise the Stakeholder Satisfaction Index. In Listings 1 and 2, the two problem objective functions are given in OCL. Objective 1 calculates the standard deviation of Sprint
Effort
. This objective is minimised to ensure that all Sprints
have close to identical Effort
values. Objective 2 first calculates the Importance
standard deviation for each Stakeholder
across all Sprints
. Then, to ensure that all stakeholders have an even Importance
distribution across all Sprints
, the objective calculates the standard deviation of all Stakeholders
Importance
distributions. This objective is minimised to prefer solutions with small standard deviations between Stakeholders Importance
distributions.WorkItem
entities must be assigned to a Sprint
; Constraint 2 no solution must have fewer Sprints
than total backlog effort divided by team velocity. In Listings 3 and 4 we give the two problem constraints in OCL. Constraint 1 counts the number of WorkItem
entities that are not assigned to a Sprint
. This constraint is equivalent with refining the metamodel multiplicity from a lower bound of 0 to a lower bound of 1 for the sprints
edge between a Plan
and Sprint
and also for the isPlannedFor
edge between a WorkItem
and a Sprint
. Constraint 2 calculates the total number of Sprints
desired using total WorkItems
Effort
and the maximum Effort
that can be delivered in a Sprint
and the ensures that there are no planned Sprints
with more Effort
than the team Effort
velocity.
Sprint
entities and assign WorkItem
elements to them, until all the WorkItem
elements belong to a Sprint
. In Fig. 4, we include the mutation operators implemented manually for this case study. Figure 4a shows the mutation operator to create a new Sprint
and assign to it a WorkItem
that has not already been assigned to another Sprint
. In Fig. 4b, we include the operator that deletes an empty Sprint
, which deletes a Sprint
that has no WorkItems
assigned to it. In Fig. 4c, we include the operator used to add WorkItems
to an existing Sprint
, ensuring that any WorkItems
allocated by this operator are not already assigned to another Sprint
. Finally, in Fig. 4d we include a mutation operator that unassigns a WorkItem
from a Sprint
and assigns it to another Sprint
. This operator can create empty Sprints
, which can then be deleted by the delete sprint operator in Fig. 4b.
3 Generating mutation operators
WorkItems
to them.3.1 Requirements on mutation operators
Sprint
and one for moving a WorkItem
from one Sprint
to another. Once all the WorkItem
elements have been assigned to a Sprint
, no more new Sprint
nodes can be created: because there are no more free WorkItem
elements, the lower-bound constraint that no Sprint should be empty can no longer be satisfied for these new Sprints
. If all the WorkItems
have initially been assigned to a small number of Sprints
, and no new Sprints
can be created, the search will be unable to find solutions that have a good average distribution of WorkItems
across the created Sprints
. Note that creating two mutation operators, one to create an empty Sprint
and one to move an existing WorkItem
to the newly created Sprint
, won’t solve this problem: until the constraint is satisfied, the search algorithm would have to include the invalid solution in the archive and then apply the required repair operator in one of the following iterations. However, if all the other population individuals are valid, they will dominate the one with the invalid Sprint
, which will be removed from the population. Generally, this problem is encountered where there are nonzero lower-bound multiplicities. In these cases, we require mutation operators to apply both edit and repair in one step.WorkItem
that has not yet been assigned to another Sprint
and assign it to the new Sprint
. In the second phase, this rule is not applicable anymore because no unassigned WorkItems
remain. However, there is an alternative repair that takes a WorkItem
from an existing Sprint with at least two WorkItem
elements assigned to it. We need to generate appropriate mutation operators for each phase of the search.3.2 General structure of MPSOs
3.3 Generation algorithm
3.3.1 Manipulating nodes
\(n=0\) | \(n = 1\) and \(m > n\) | \(n > 1\) and \(m > n\) | \(n = m\) | |
---|---|---|---|---|
\(k\,\ge \,0\) | c A | c A add n B (f#l A) | c A add n B (f#l A) | c A add n B (f#l A) |
\(l\,>\,k\) | c A lb r single B | c A lb r single B | ||
\(l<\)* | c A lb r many B | |||
\(k \ge \) 0 | c A add n B | |||
\(l = *\) | ||||
\(k = l\) | c A lb r single B | c A add n B | N/A | |
c A lb r single B | ||||
c A lb r many B |
-
NAC repair: The first type of aMPSO that we generate, is for creating nodes that have a multiplicity pattern with (\(n > 0\)). For this case, we generate a rule to create a node of type A and connect it to n existing nodes of type B. If (\(l < *\)), then a negative application condition (NAC) is added for the connected nodes B to ensure that no upper-bound multiplicity invalidations occur (no more than l nodes of type A assigned for each B). Nodes that have an open multiplicity don’t need a repair operation.
Sprint
node, that is connected to a note of type WorkItem
. The rule includes a NAC for the WorkItem
node which requires that the WorkItem
node is not already assigned to a Sprint
node.-
Single source lower bound repair: The second type of aMPSO for creating a node is for creating nodes that have a multiplicity pattern with (\(n > 0\)) and (\(l < *\)). This pattern means that A must have at least n nodes of type B assigned to it, and node B can have a limited number of nodes of type A assigned to it. We generate a rule to create a node of type A, and connect it to n nodes of type B. Then, the upper bound for the existing n nodes of type B is repaired by deleting the edges between the required n nodes of type B from a single existing node of type A and creating edges between them and the newly created node of type A. A positive application condition (PAC) is generated for the existing source node A to ensure its lower-bound multiplicity (n) is not broken after the node B used in the repair is unassigned. The multiplicity pattern for this aMPSO partially overlaps with the pattern for NAC repair, and when this is the case during the generation stage, both operators are generated.
Sprint
node, when all WorkItems
are already assigned to other Sprints
. The rule includes a PAC for the existing Sprint
node from which the WorkItem
node used for the repair is taken, to make sure that the lower-bound multiplicity is not invalidated.-
Multiple sources lower bound repair: The third type of aMPSO for creating a node is for creating nodes that have a multiplicity pattern with (\(n > 1\)) and (\(l < *\)). This pattern means that A must have at least n nodes of type B assigned to it, and node B can have a limited number of nodes of type A assigned to it. For this case, we generate a rule to create a node of type A and connect it to n nodes of type B. Then, we repair the upper bound for the existing n nodes of type B by deleting the edges between the required n nodes of type B from n existing nodes of type A and creating edges between them and the newly created node of type A. A PAC is generated for the existing nodes of type A to ensure that the lower-bound multiplicity is not broken by this operation.
-
PAC repair: The first type of aMPSO that we generate is for deleting nodes that have a closed multiplicity (\(k > 0\)). This pattern means that B must have at least k nodes of type A assigned, and each node of type A must be assigned to at least n nodes of type A.For this case, we generate a rule to delete a node of type A and for each of its connected nodes of type B, a PAC is added to ensure that no lower-bound multiplicity invalidations occur after the deletion of the A node. This rule is not generated for cases where (\(k=l\)) because nodes with multiplicity (\(k=l\)) cannot be repaired with a PAC.Example PAC repair rulesIn Fig. 8c, we include an example of this aMPSO for the SP case study, generated for deleting a
Sprint
node, that has aWorkItem
assigned to it. For this example rule, there is no PAC generated for theWorkItem
because there is no lower-bound multiplicity limit.An example of the generated aMPSO of this type for multiplicity (\(k = 2\)) is included in Fig. 10b. The aMPSO deletes node A and requires that node B still has 2 nodes of type B connected to it.For cases when (\(k = 0\)), no PAC is generated for node B, and node A is simply deleted. An example aMPSO for this scenario is shown in Fig. 10a. -
Single target lower bound repair: This type of aMPSO for deleting a node is for manipulating nodes that have a multiplicity pattern (\(k = l\) and \(k > 0\)). This pattern means that each node of type B must be assigned to k nodes of type A.For this case, we generate a repair to satisfy the lower-bound for the n nodes B, by creating edges between them and another single existing node of type A. A NAC is generated for the existing node A to ensure that the upper-bound multiplicity is not broken if (\(m < *\)). This repair ensures that after the A node to which k nodes of type B are assigned is deleted, the B nodes are assigned to another node A not to invalidate their multiplicity constraint.Example single target lower bound repair rules Figure 8d shows an example of this aMPSO for the SP case study, generated for deleting a
Sprint
node, that has aWorkItem
assigned to it. For this example rule, there is no NAC generated because there is no upper-bound multiplicity limit. -
Multiple target lower bound repair delete: This type of aMPSO is used for deleting nodes that have a multiplicity pattern with (\(k = l\)) and (\(l > 1\)). This pattern means that A must have at least n nodes of type B assigned, and each node of type B must be assigned to exactly k nodes of type A.For this case, we generate a repair to satisfy the lower bound for node B by creating edges between them and another existing n nodes of type A. If required, a NAC is generated for the existing nodes of type A to ensure that the upper-bound multiplicity is not broken if (\(m < *\)). We only generate this rule for the case where exactly n nodes of type B are attached to the A node to be deleted.Example multiple target lower bound repair delete rulesFigure 10e shows an example of this aMPSO for multiplicity pattern (\(n = 2\)), (\(m = 5\)) and (\(k = l = 2\)). The rule contains a NAC for each node of type A that is assigned a node of type B from the deleted node B. For the case when no NAC is necessary (e.g. \(m = *\)), no NAC is generated as seen in the aMPSO shown in Fig. 10e.
\(m > n\) and \(m<\) * | \(m = *\) | \(n = m \) | |
---|---|---|---|
\(k = 0\) | d A | ||
\(k > 0\) | d A (require each B still has #k A) | ||
\(l > k\) | |||
\(k=l=1\) | d A r lb sg B (f#m A) | d A r lb sg B | N/A |
\(k=l>\) 1 | d A r lb sg B (f#m A) | d A r lb sg B | N/A |
d A r lb mn B (f#m A) | d A r lb mn B |
\(m < *\)
|
\(m = *\)
|
\(n=m \)
| |
---|---|---|---|
\(l < *\)
| Add edge NAC A B | Add edge NAC B | Swap edge |
\(l = * \)
| Add edge NAC A | Add edge | Swap edge |
\(k = l\)
| Change edge (P/N A) | Change edge (P/N A) | Swap edge |
\(n = 0\)
|
\(n > 0\)
|
\(n = m\)
| |
---|---|---|---|
\(k = 0\)
| Remove edge | Remove edge PAC A | Swap edge |
\(k>\) 0 | Remove edge PAC B | Remove edge PAC AB | Swap edge |
\(k = l\)
| Change edge (P/N A) | Change edge (P/N A) | Swap edge |
3.3.2 Manipulating edges
Sprint
and a WorkItem
with a NAC, forbidding that the two nodes are already connected.Sprint
and a WorkItem
with a PAC, requiring that after the application of this rule, the Sprint
node still has at least one WorkItem
node still assigned to it, to satisfy the lower-bound multiplicity.WorkItem
and two Sprints
. The rule includes a PAC for the Sprint
element from which the WorkItem
element is unassigned to ensure that the lower-bound multiplicity of this node is not invalidated after the application of the rule.3.3.3 Iterative repair
3.3.4 Generation algorithm completeness
3.3.5 Generation algorithm limitations
3.4 Running search with aMPSOs
4 Experiments
-
RQ: What are the search quality and performance benefits of automatically generated mutation operators compared to mutation operators created manually?
4.1 Case studies
4.1.1 Class-responsibility assignment
Classes
in the ClassModel
and assign Features
to them such that: all Features
are assigned to a Class
; the model with the highest CRA index value is found. The problem has an additional constraint requiring that each Feature is assigned to only one Class
at a time.Input model | A | B | C | D | E |
---|---|---|---|---|---|
Attributes | 5 | 10 | 20 | 40 | 80 |
Methods | 4 | 8 | 15 | 40 | 80 |
Data dependencies | 8 | 15 | 50 | 150 | 300 |
Functional dependencies | 6 | 15 | 50 | 150 | 300 |
4.1.2 Scrum planning
Features
do in the CRA case, and this case study is specified as a multi-objective problem.Input model | A | B |
---|---|---|
Stakeholders | 5 | 10 |
WorkItems | 119 | 254 |
Backlog Size | 455 | 1021 |
4.1.3 Next release problem
Customer
has a desire which can consist of one or many SoftwareArtifacts
. SoftwareArtifacts
can have a recursive dependency on other SoftwareArtifacts
.SoftwareArtifacts
to a Solution
such that the total cost of the selected SoftwareArtifacts
is minimised and the total customer satisfaction is maximised.Solution
and a SoftwareArtifact
. However, the difference between this case study and the others considered in this paper is that the selection of a SoftwareArtifact
directly influences the Cost fitness function and indirectly influences the Customer Satisfaction objective. A SoftwareArtifact
is considered for the calculation of a RequirementRealization
, only when all its dependencies are also assigned to the solution. The set of evolvers manually designed for this case study uses this additional information, ensuring that a SoftwareArtifact and all its dependencies are added in a single step. In contrast, the automatically generated aMPSOs do not use any additional problem information. With this difference, we aim to evaluate how the generated rules explore the search space in cases where the fitness functions provide only coarse-granular guidance.Input model | A | B |
---|---|---|
Customers | 5 | 25 |
Requirements | 25 | 50 |
Software Artifacts | 63 | 203 |
4.2 Experiment configurations
5 Results
5.1 Class responsibility assignment
Manual | Gen aMPSO |
---|---|
Create class | Create class |
N/A | Create class Lb repair |
Assign feature | Assign feature |
Change feature | Change feature |
N/A | Remove feature |
Delete empty class | Delete class |
N/A | Delete class Lb repair |
Config | Evol | Median | Min | Max | SD | Skew | Kurt |
---|---|---|---|---|---|---|---|
Man A | 500 | 2.333 | 0.850 | 3.000 | 0.552 | -0.679 | -0.509 |
Gen A | 500 |
3.000
| 3.000 | 3.000 | 0.000 | 0 | 0 |
Man B | 500 | 1.865 | 1.238 | 3.104 | 0.514 | 0.642 | -0.032 |
Gen B | 500 |
3.167
| 1.826 | 4.083 | 0.599 | -0.470 | -0.376 |
Man C | 500 | 2.224 | 1.148 | 3.240 | 0.572 | -0.089 | 0.824 |
Gen C | 500 |
3.129
| 2.110 | 3.806 | 0.428 | -0.539 | -0.039 |
Man D | 2000 | 5.191 | 3.557 | 7.041 | 0.837 | 0.068 | 0.339 |
Gen D | 2000 |
9.863
| 7.634 | 12.273 | 1.257 | -0.176 | 0.782 |
Man E | 2500 | 11.572 | 8.879 | 14.691 | 1.639 | 0.122 | 0.663 |
Gen E | 2500 |
17.323
| 11.698 | 20.051 | 1.604 | -1.106 | -3.176 |
Class
still has at least one Feature
assigned following the application of this operator. At the same time, the Man Change Feature operator can generate an empty class upon its application, and such instances are fixed by the delete empty class operator.Class
after all the Features
have been assigned. These ensure that the search does not get stuck in local optima in cases where the Features
are assigned to too many or too few Classes
.5.2 Scrum planning
A | B | C | D | E | |
---|---|---|---|---|---|
p value | < 0.05% | < 0.05% | 0.05% | 0.05% | 0.05% |
U-value | 795 | 809.5 | 817 | 900 | 884 |
Cohen’s d | Large | Large | Large | Large | Large |
Config | Evol | Median | Min | Max | SD | RS | RSC | BSR |
---|---|---|---|---|---|---|---|---|
Man A | 1500 | 0.000 | 0.000 | 0.960 | 0.460 | 13 | 0 | 0.00 |
Gen A | 1500 |
0.959
| 0.957 | 0.995 | 0.010 | 13 | 13 |
1.00
|
Man B | 2500 | 0.492 | 0.000 | 0.996 | 0.505 | 25 | 19 |
0.76
|
Gen B | 2500 |
0.988
| 0.983 | 0.998 | 0.004 | 25 | 6 | 0.24 |
Manual | Gen aMPSO |
---|---|
Create sprint | Create sprint |
N/A | Create sprint Lb repair |
Add WorkItem | Add WorkItem |
Change WorkItem | Change WorkItem |
N/A | Remove WorkItem |
Delete empty sprint | Delete sprint |
N/A | Delete sprint Lb repair |
Sprint
and WorkItem
metamodel entities is identical to the multiplicity between Class
and Feature
, the Gen mutation operators are similar to the ones generated for the CRA case study.WorkItem
elements between Sprints
until the right configuration is found (Fig. 23).5.3 Next release problem
Config | Evol | Median | Min | Max | SD | RS | RSC | BSR |
---|---|---|---|---|---|---|---|---|
Man A | 750 | 0.791 | 0.791 | 0.791 | 0.000 | 32 | 32 | 1.00 |
Gen A | 750 | 0.791 | 0.791 | 0.791 | 0.000 | 32 | 32 | 1.00 |
Man B | 1500 |
0.718
| 0.712 | 0.722 | 0.003 | 281 | 281 |
1.00
|
Gen B | 1500 | 0.641 | 0.635 | 0.643 | 0.002 | 281 | 63 | 0.22 |
Manual | Gen aMPSO |
---|---|
Modify software artifact | N/A |
Modify SA with dependencies | N/A |
Assign highest realisation | N/A |
Fix dependencies | N/A |
N/A | Add software artifact |
N/A | Remove software artifact (PAC) |
SoftwareArtifacts
. In both situations, chances are that costs are raised without improving customer satisfaction due to the introduction of missing dependencies.SoftwareArtifact
if all of its dependencies are already part of the solution. Likewise, the removal of an artefact is only possible if no dependent artefacts are left over. The second operator allows for larger steps through the search space by adding and removing SoftwareArtifacts
together with their direct dependencies and dependent artefacts, respectively. Assign Highest Realisation
tries to exploit the fact that among Realisations
of the same Requirement
those with the highest percentage
contribute most to the customer satisfaction. Considering all Realisations
with yet unfulfilled dependencies, the operator selects the one with the highest percentage
of fulfilment and adds its missing SoftwareArtifacts
to the solution.SoftwareArtifact
or removing the dependent artefacts of a formerly removed SoftwareArtifact
.SoftwareArtifact
, together with all its dependencies, are all assigned to a Solution in a single application. At the same time, the Gen operators assign SoftwareArtifacts
, one by one, by adding or removing edges, in atomic operations. Because the Customer Satisfaction objective does not guide the solutions, unless all SoftwareArtifacts
realising a complete Realisation are assigned, Gen is slower at finding converging solutions, requiring more evolutions. However, comparing the structure of the operators, a single operation of the manual operator, which moves a SoftwareArtifact
together with all its dependencies in a single evolution, is equivalent to multiple applications of the Gen operator for adding an edge.Time | Man A | Gen A | Man B | Gen B | Man C | Gen C | Man D | Gen D | Man E | Gen E |
---|---|---|---|---|---|---|---|---|---|---|
Mean | 15.10 | 27.90 | 23.27 | 43.76 | 41.32 | 75.28 | 611.40 | 1177.70 | 2972.65 | 4298.16 |
Median | 14.92 | 27.61 | 22.04 | 44.24 | 41.07 | 75.65 | 590.75 | 1188.91 | 2869.16 | 4198.20 |
Min | 11.44 | 26.48 | 17.33 | 33.92 | 26.79 | 64.19 | 452.90 | 991.29 | 2193.35 | 3582.67 |
Max | 17.64 | 30.09 | 34.99 | 50.13 | 58.43 | 86.35 | 853.47 | 1416.96 | 4202.11 | 5189.39 |
5.4 Search operators efficiency comparison
Time | Man A | Gen A | Man B | Gen B |
---|---|---|---|---|
Mean | 119.13 | 291.25 | 484.90 | 3069.18 |
Median | 120.67 | 291.87 | 487.76 | 3016.74 |
Min | 107.63 | 261.25 | 447.16 | 2686.92 |
Max | 130.47 | 367.78 | 510.77 | 4171.07 |
Time | Man A | Gen A | Man B | Gen B |
---|---|---|---|---|
Mean | 275.42 | 223.42 | 1677.80 | 1355.29 |
Median | 274.96 | 224.27 | 1676.22 | 1348.97 |
Min | 258.84 | 215.79 | 1610.85 | 1312.45 |
Max | 307.71 | 234.52 | 1813.63 | 1412.93 |
5.5 Threats to validity
6 Related work
6.1 Model-based optimisation
7 Conclusions
-
We validated the correctness of our approach by using a suite of unit tests to check that the generation algorithm produces the expected output for each supported scenario. Formalising our approach, along with providing a correctness and completeness proof, is left for future work.
-
In the CRA case study, we saw that the startup behaviour of our generated rules differs from that of the manual rules, such that the manual rules find better solutions in early evolutions. We will study what affects this startup behaviour and how we may be able to improve our generated rules in this area. For example, it may be useful to use separate sets of rules for the two phases of the search (cf. Sect. 4.1) to ensure more focused exploration during the first phase.
-
Optimisation problems use other constraints beyond multiplicities.Arbitrary constraints are difficult to handle without additional user input; however, specific types of constraints or constraint templates can be more easily incorporated. For example, we are currently working on implementing rule generation for feature-model constraints.
-
Recursive repair offers additional opportunities for repair in MPSOs, but at the cost of higher generation effort and a larger set of search operators. Which, if any, recursive repair strategies offer benefits to the overall search?
-
Some problems, including some of the examples discussed in this paper, may be solvable by optimising constraint solvers. However, constraint solvers work off a relatively low-level problem encoding, making them more difficult to use. We are interested in understanding if, and under what circumstances, it is possible to translate automatically from our high-level problem representation into an encoding that can be efficiently solved by a constraint solver.