1 Introduction
-
Search budget distribution: If a coverage goal is infeasible, then all search effort to try to cover it would be wasted (except for any other coverage goals accidentally covered during the search). Unfortunately, determining whether a goal is feasible or not is an undecidable problem. If a coverage goal is trivial, then it will typically be covered by the first random input. Given a set of coverage goals and an overall available budget of computational resources (e.g., time), how to assign a search budget to the individual goals to maximize the overall coverage?
-
Coverage goal ordering: Unless some smart strategies are designed, the search for each coverage goal is typically independent, and potentially useful information is not shared between individual searches. For example, to cover a nested branch one first needs to cover its parent branch, and test data for the latter could be used to help the search for the nested branch (instead of starting from scratch). In this regard, the order in which coverage goals are sought can have a large impact on final performance.
-
Even if the whole test suite approach covers more goals, those are not necessarily going to be a superset of those that the traditional approach would cover. Is the higher coverage due to more easy goals being covered? Is the coverage of other goals adversely affected? Are there goals that the traditional one goal at a time approach can cover and the whole approach can not? Although higher coverage might lead to better regression test suites, for testing purposes some coverage goals might be more “valuable” than others. So, from a practical point of view, preferring the whole test suite approach over the traditional one may not necessarily lead to improvement in testing effectiveness.
-
When generating individual tests, once a coverage goal is satisfied, it is no longer involved in test generation, and resources are invested only on uncovered goals. In whole test suite generation, all goals, including those already covered during the search, are part of the optimization until the search ends. For example, after mutation in the genetic algorithm a test suite may cover a new branch, but if the mutation meant that two already covered goals are “lost” by that test suite, the fitness evaluation would not consider this new test an improvement. Does this affect whole test suite optimization in practice? An easy solution to overcome this problem would be to keep a test “archive” for the already covered goals, and focus the search only on those goals not yet covered.
2 Background
3 Whole Test Suite Generation
3.1 Generating Tests for Individual Coverage Goals
-
New test cases are only generated for goals that have not already been covered through collateral coverage of previously created test cases. However, we do not evaluate the collateral coverage of all individuals during the search, as this would add a significant overhead, and it is not clear what effects this would have given the fixed timeout we used in our experiments.
-
When applying the one goal at a time approach, a possible improvement could be to use a seeding strategy (Wegener et al. 2001). During the search, we could store the test data that have good fitness values on coverage goals that are not covered yet. These test data can then be used as starting point (i.e., for seeding the first generation of a genetic algorithm) in the successive searches for those uncovered goals. However, we decided not to implement this, as Wegener et al. (2001) do not provide sufficient details to reimplement the technique, and there is no conclusive data regarding several open questions; for example, potentially a seeding strategy could reduce diversity in the population, and so in some cases it might in fact reduce the overall performance of the search algorithm.
-
The order in which coverage goals are selected might also influence the result. As in the literature usually no order is specified (e.g., Tonella 2004; Harman et al. 2010), we selected the branches in random order. However, in the context of procedural code approaches to prioritize coverage goals have been proposed, e.g., based on dynamic information (Wegener et al. 2001). However, the goal of this paper is neither to study the impact of different orders, nor to adapt these prioritization techniques to object-oriented code.
-
In practice, when applying a single goal strategy, one might also bootstrap an initial random test suite to identify the trivial test goals, and then use a more sophisticated technique to address the difficult goals; here, a difficult, unanswered question is when to stop the random phase and start the search.
3.2 Whole Test Suite Generation
3.3 Archive-based Whole Test Suite Generation
4 Empirical Study
4.1 Experimental Setup
-
(a) The one goal at a time approach (OneGoal )
-
(b) The whole test suite approach (Whole )
-
(c) The archive-based whole test suite approach (Archive)
4.2 RQ1: Are There Coverage Goals in Which OneGoal Performs Better than Whole ?
Project | Class | OneGoal | Whole |
\(\hat {A}_{12}\)
|
p-value |
---|---|---|---|---|---|
13_jdbacl | AbstractTableMapper | 0.20 | 0.71 | 0.97 | <0.001 |
18_jsecurity | IniResource | 0.11 | 0.73 | 0.99 | <0.001 |
18_jsecurity | ResourceUtils | 0.26 | 0.80 | 1.00 | <0.001 |
18_jsecurity | DefaultWebSecurityManager | 0.02 | 0.42 | 1.00 | <0.001 |
21_geo-google | AddressToUsAddressFunctor | 0.00 | 0.55 | 0.99 | <0.001 |
24_saxpath | XPathLexer | 0.10 | 0.70 | 1.00 | <0.001 |
32_httpanalyzer | ScreenInputFilter | 0.58 | 0.83 | 0.77 | <0.001 |
35_corina | TRML | 0.00 | 0.20 | 1.00 | <0.001 |
35_corina | SiteListPanel | 0.00 | 0.00 | 0.94 | <0.001 |
43_lilith | EventIdentifier | 0.98 | 1.00 | 0.53 | <0.001 |
43_lilith | LogDateRunnable | 0.36 | 0.59 | 0.69 | <0.001 |
44_summa | FacetMapSinglePackedFactory | 0.01 | 0.46 | 0.99 | <0.001 |
45_lotus | Phase | 0.97 | 1.00 | 0.54 | <0.001 |
46_nutzenportfolio | KategorieDaoService | 0.00 | 0.16 | 1.00 | <0.001 |
54_db-everywhere | Select | 0.00 | 0.08 | 0.95 | <0.001 |
58_fps370 | MouseMoveBehavior | 0.08 | 0.53 | 0.99 | <0.001 |
61_noen | DataType | 0.00 | 1.00 | 1.00 | <0.001 |
61_noen | WatchDog | 0.22 | 0.54 | 0.87 | <0.001 |
62_dom4j | STAXEventReader | 0.00 | 0.01 | 1.00 | <0.001 |
62_dom4j | CloneHelper | 0.77 | 1.00 | 0.82 | <0.001 |
62_dom4j | PerThreadSingleton | 0.66 | 0.85 | 0.63 | <0.001 |
66_openjms | SecurityConfigurationDescriptor | 0.27 | 0.68 | 0.98 | <0.001 |
66_openjms | And | 0.90 | 1.00 | 0.68 | <0.001 |
66_openjms | BetweenExpression | 0.23 | 0.87 | 0.98 | <0.001 |
69_lhamacaw | CategoryStateEditor | 0.00 | 0.08 | 0.95 | <0.001 |
70_echodep | PackageDissemination | 0.00 | 0.09 | 0.99 | <0.001 |
74_fixsuite | ListView | 0.06 | 0.10 | 0.71 | <0.001 |
75_openhre | User | 0.19 | 0.98 | 1.00 | <0.001 |
75_openhre | HL7SegmentMapImpl | 0.99 | 1.00 | 0.51 | <0.001 |
78_caloriecount | BudgetWin | 0.01 | 0.12 | 0.99 | <0.001 |
78_caloriecount | ArchiveScanner | 0.01 | 0.63 | 1.00 | <0.001 |
78_caloriecount | RecordingEvent | 0.75 | 1.00 | 0.96 | <0.001 |
78_caloriecount | BlockThread | 0.46 | 0.82 | 0.93 | <0.001 |
79_twfbplayer | BattlefieldCell | 0.00 | 0.36 | 0.95 | <0.001 |
80_wheelwebtool | Block | 0.08 | 0.81 | 0.98 | <0.001 |
84_ifx-framework | ChkOrdInqRs_Type | 0.92 | 1.00 | 0.69 | <0.001 |
84_ifx-framework | PassbkItemInqRs_Type | 0.98 | 1.00 | 0.52 | <0.001 |
85_shop | JSListSubstitution | 0.90 | 0.97 | 0.59 | <0.001 |
88_jopenchart | InterpolationChartRenderer | 0.00 | 0.05 | 0.96 | <0.001 |
89_jiggler | SignalCanvas | 0.19 | 0.97 | 1.00 | <0.001 |
89_jiggler | ImageOutputStreamJAI | 0.05 | 0.79 | 0.99 | <0.001 |
89_jiggler | LocalDifferentialGeometry | 0.03 | 0.32 | 0.99 | <0.001 |
92_jcvi-javacommon | PhdFileDataStoreBuilder | 0.21 | 0.80 | 0.99 | <0.001 |
93_quickserver | SimpleCommandSet | 0.66 | 0.83 | 0.65 | <0.001 |
93_quickserver | AuthStatus | 0.12 | 0.33 | 0.80 | <0.001 |
Average | 0.63 | 0.78 | 0.67 |
Project | Class | OneGoal | Whole |
\(\hat {A}_{12}\)
|
p-value |
---|---|---|---|---|---|
13_jdbacl | H2Util | 0.64 | 0.81 | 0.62 | <0.001 |
13_jdbacl | AbstractTableMapper | 0.22 | 0.68 | 0.99 | <0.001 |
18_jsecurity | IniResource | 0.20 | 0.80 | 0.99 | <0.001 |
18_jsecurity | ResourceUtils | 0.50 | 0.97 | 0.99 | <0.001 |
18_jsecurity | DefaultWebSecurityManager | 0.04 | 0.43 | 1.00 | <0.001 |
21_geo-google | AddressToUsAddressFunctor | 0.00 | 0.53 | 0.99 | <0.001 |
24_saxpath | XPathLexer | 0.13 | 0.86 | 1.00 | <0.001 |
32_httpanalyzer | ScreenInputFilter | 0.56 | 0.92 | 0.94 | <0.001 |
35_corina | TRML | 0.01 | 0.43 | 0.99 | <0.001 |
35_corina | SiteListPanel | 0.00 | 0.00 | 0.96 | <0.001 |
44_summa | FacetMapSinglePackedFactory | 0.24 | 0.49 | 0.95 | <0.001 |
46_nutzenportfolio | KategorieDaoService | 0.01 | 0.19 | 1.00 | <0.001 |
54_db-everywhere | Select | 0.04 | 0.29 | 1.00 | <0.001 |
57_hft-bomberman | RoundTimeOverMsg | 0.46 | 0.66 | 0.65 | <0.001 |
58_fps370 | MouseMoveBehavior | 0.12 | 0.66 | 0.99 | <0.001 |
58_fps370 | Teder | 0.12 | 0.33 | 0.80 | <0.001 |
6_jnfe | CST_COFINS | 0.23 | 0.57 | 0.95 | <0.001 |
61_noen | DataType | 0.00 | 1.00 | 1.00 | <0.001 |
61_noen | WatchDog | 0.27 | 0.55 | 0.91 | <0.001 |
62_dom4j | STAXEventReader | 0.00 | 0.01 | 1.00 | <0.001 |
62_dom4j | CloneHelper | 0.11 | 0.38 | 0.99 | <0.001 |
62_dom4j | PerThreadSingleton | 0.67 | 0.86 | 0.83 | <0.001 |
63_objectexplorer | LoggerFactory | 0.64 | 0.80 | 0.60 | <0.001 |
66_openjms | SecurityConfigurationDescriptor | 0.30 | 0.66 | 1.00 | <0.001 |
66_openjms | And | 0.94 | 1.00 | 0.66 | <0.001 |
66_openjms | BetweenExpression | 0.84 | 0.99 | 0.86 | <0.001 |
70_echodep | PackageDissemination | 0.00 | 0.12 | 1.00 | <0.001 |
74_fixsuite | ListView | 0.53 | 0.55 | 0.51 | <0.001 |
75_openhre | User | 0.27 | 0.98 | 1.00 | <0.001 |
75_openhre | HL7SegmentMapImpl | 0.99 | 1.00 | 0.52 | <0.001 |
78_caloriecount | BudgetWin | 0.03 | 0.26 | 0.99 | <0.001 |
78_caloriecount | ArchiveScanner | 0.02 | 0.68 | 1.00 | <0.001 |
78_caloriecount | RecordingEvent | 0.70 | 0.97 | 1.00 | <0.001 |
78_caloriecount | BlockThread | 0.28 | 0.62 | 0.91 | <0.001 |
79_twfbplayer | BattlefieldCell | 0.04 | 0.41 | 0.92 | <0.001 |
79_twfbplayer | CriticalHit | 0.81 | 0.99 | 0.60 | <0.001 |
80_wheelwebtool | Block | 0.04 | 0.70 | 0.99 | <0.001 |
83_xbus | ByteArrayConverterAS400 | 0.00 | 0.06 | 0.97 | <0.001 |
84_ifx-framework | ChkOrdInqRs_Type | 0.80 | 1.00 | 0.83 | <0.001 |
84_ifx-framework | PassbkItemInqRs_Type | 0.92 | 0.99 | 0.62 | <0.001 |
85_shop | JSListSubstitution | 0.97 | 0.98 | 0.47 | 0.019 |
88_jopenchart | InterpolationChartRenderer | 0.00 | 0.02 | 0.97 | <0.001 |
89_jiggler | SignalCanvas | 0.13 | 0.89 | 1.00 | <0.001 |
89_jiggler | ImageOutputStreamJAI | 0.09 | 0.91 | 0.99 | <0.001 |
89_jiggler | LocalDifferentialGeometry | 0.11 | 0.48 | 0.99 | <0.001 |
92_jcvi-javacommon | PhdFileDataStoreBuilder | 0.29 | 0.80 | 0.99 | <0.001 |
93_quickserver | SimpleCommandSet | 0.75 | 0.87 | 0.67 | <0.001 |
93_quickserver | AuthStatus | 0.02 | 0.16 | 0.91 | <0.001 |
Average | 0.62 | 0.78 | 0.69 |
Project | Class | OneGoal | Whole |
\(\hat {A}_{12}\)
|
p-value |
---|---|---|---|---|---|
13_jdbacl | H2Util | 0.78 | 0.93 | 0.67 | <0.001 |
13_jdbacl | AbstractTableMapper | 0.13 | 0.67 | 0.99 | <0.001 |
18_jsecurity | Md2CredentialsMatcher | 0.00 | 1.00 | 1.00 | <0.001 |
18_jsecurity | IniResource | 0.12 | 0.70 | 0.99 | <0.001 |
18_jsecurity | ResourceUtils | 0.23 | 0.84 | 1.00 | <0.001 |
18_jsecurity | DefaultWebSecurityManager | 0.06 | 0.46 | 0.99 | <0.001 |
21_geo-google | AddressToUsAddressFunctor | 0.00 | 0.63 | 1.00 | <0.001 |
21_geo-google | PremiseNumberSuffix | 0.00 | 1.00 | 1.00 | <0.001 |
24_saxpath | XPathLexer | 0.08 | 0.70 | 1.00 | <0.001 |
27_gangup | MapCell | 0.00 | 1.00 | 1.00 | <0.001 |
32_httpanalyzer | ScreenInputFilter | 0.50 | 0.86 | 0.92 | <0.001 |
35_corina | TRML | 0.00 | 0.22 | 0.99 | <0.001 |
35_corina | SiteListPanel | 0.00 | 0.00 | 0.94 | <0.001 |
43_lilith | EventIdentifier | 0.70 | 1.00 | 1.00 | <0.001 |
43_lilith | LogDateRunnable | 0.26 | 0.55 | 0.75 | <0.001 |
44_summa | FacetMapSinglePackedFactory | 0.22 | 0.49 | 0.94 | <0.001 |
45_lotus | Phase | 0.25 | 1.00 | 1.00 | <0.001 |
46_nutzenportfolio | KategorieDaoService | 0.00 | 0.09 | 1.00 | <0.001 |
52_lagoon | Wildcard | 0.66 | 0.99 | 1.00 | <0.001 |
54_db-everywhere | Select | 0.00 | 0.04 | 1.00 | <0.001 |
57_hft-bomberman | RoundTimeOverMsg | 0.00 | 1.00 | 1.00 | <0.001 |
58_fps370 | MouseMoveBehavior | 0.02 | 0.39 | 1.00 | <0.001 |
58_fps370 | Teder | 0.00 | 0.50 | 1.00 | <0.001 |
6_jnfe | CST_COFINS | 0.55 | 0.88 | 0.95 | <0.001 |
61_noen | WatchDog | 0.43 | 0.60 | 0.85 | <0.001 |
61_noen | MeasurementReport | 0.00 | 1.00 | 1.00 | <0.001 |
61_noen | OperationResult | 0.00 | 1.00 | 1.00 | <0.001 |
62_dom4j | STAXEventReader | 0.00 | 0.00 | 1.00 | <0.001 |
62_dom4j | CloneHelper | 0.10 | 0.64 | 0.99 | <0.001 |
62_dom4j | PerThreadSingleton | 0.28 | 0.69 | 0.87 | <0.001 |
63_objectexplorer | LoggerFactory | 0.00 | 1.00 | 1.00 | <0.001 |
66_openjms | SecurityConfigurationDescriptor | 0.12 | 0.54 | 1.00 | <0.001 |
66_openjms | And | 0.46 | 0.98 | 1.00 | <0.001 |
66_openjms | BetweenExpression | 0.22 | 0.93 | 0.99 | <0.001 |
69_lhamacaw | CategoryStateEditor | 0.00 | 0.04 | 0.89 | <0.001 |
70_echodep | PackageDissemination | 0.00 | 0.13 | 1.00 | <0.001 |
74_fixsuite | ListView | 0.09 | 0.09 | 0.54 | <0.001 |
75_openhre | User | 0.17 | 0.97 | 1.00 | <0.001 |
75_openhre | HL7SegmentMapImpl | 0.45 | 0.98 | 1.00 | <0.001 |
78_caloriecount | BudgetWin | 0.03 | 0.15 | 0.94 | <0.001 |
78_caloriecount | ArchiveScanner | 0.01 | 0.62 | 1.00 | <0.001 |
78_caloriecount | RecordingEvent | 0.52 | 0.98 | 1.00 | <0.001 |
78_caloriecount | BlockThread | 0.34 | 0.83 | 0.99 | <0.001 |
79_twfbplayer | BattlefieldCell | 0.03 | 0.38 | 0.96 | <0.001 |
79_twfbplayer | CriticalHit | 0.63 | 0.97 | 0.88 | <0.001 |
80_wheelwebtool | Block | 0.00 | 0.84 | 0.99 | <0.001 |
80_wheelwebtool | JSONStringer | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | ChkAcceptAddRs_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | ChkInfo_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | ChkOrdInqRs_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | CreditAdviseRs_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | DepAcctStmtInqRq_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | EMVCardAdviseRs_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | ForExDealMsgRec_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | PassbkItemInqRs_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | RecPmtCanRq_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | StdPayeeId_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | SvcAcctStatus_Type | 0.00 | 1.00 | 1.00 | <0.001 |
84_ifx-framework | TINInfo_Type | 0.00 | 1.00 | 1.00 | <0.001 |
85_shop | JSListSubstitution | 0.22 | 0.88 | 0.99 | <0.001 |
86_at-robots2-j | RobotScoreKeeper | 0.69 | 1.00 | 1.00 | <0.001 |
88_jopenchart | InterpolationChartRenderer | 0.00 | 0.02 | 0.91 | <0.001 |
89_jiggler | SignalCanvas | 0.37 | 0.92 | 0.99 | <0.001 |
89_jiggler | ImageOutputStreamJAI | 0.07 | 0.82 | 0.99 | <0.001 |
89_jiggler | LocalDifferentialGeometry | 0.03 | 0.32 | 0.99 | <0.001 |
9_falselight | falselight | 0.00 | 1.00 | 1.00 | <0.001 |
92_jcvi-javacommon | PhdFileDataStoreBuilder | 0.17 | 0.77 | 1.00 | <0.001 |
93_quickserver | SimpleCommandSet | 0.00 | 0.91 | 1.00 | <0.001 |
93_quickserver | AuthStatus | 0.00 | 0.16 | 1.00 | <0.001 |
Average | 0.36 | 0.77 | 0.83 |
Criterion | # of Targets | Statistically at 0:05 | Never Covered by the Other | |
---|---|---|---|---|
Branch Coverage | Whole is better: | 1410 | 1239 | 385 |
Equivalent: | 717 | |||
OneGoal is better: | 255 | 3 | 0 | |
Total: | 2382 | |||
Line Coverage | Whole is better: | 2292 | 1978 | 468 |
Equivalent: | 1457 | |||
OneGoal is better: | 62 | 4 | 2 | |
Total: | 3811 | |||
Weak Mutation Testing | Whole is better: | 9335 | 8634 | 3202 |
Equivalent: | 3137 | |||
OneGoal is better: | 1 | 1 | 1 | |
Total: | 12473 |
4.3 RQ2: How Many Coverage Goals Found by Whole Get Missed by OneGoal ?
4.4 RQ3: Which Factors Influence the Relative Performance of Whole and OneGoal ?
Criterion | Property |
r
| Confidence interval | p-value |
τ
|
p-value |
ρ
|
p-value |
---|---|---|---|---|---|---|---|---|
Branch Coverage
| Whole vs. OneGoal | 0.53 | [0.50, 0.55] | ≤0.001 | 0.50 | ≤0.001 | 0.62 | ≤0.001 |
OneGoal coverage | −0.38 | [-0.41, -0.34] | ≤0.001 | −0.02 | 0.117 | −0.06 | 0.007 | |
# of branches | 0.42 | [0.39, 0.45] | ≤0.001 | 0.34 | ≤0.001 | 0.46 | ≤0.001 | |
Line Coverage
| Whole vs. OneGoal | 0.40 | [0.37, 0.43] | ≤0.001 | 0.35 | ≤0.001 | 0.44 | ≤0.001 |
OneGoal coverage | −0.20 | [-0.23, -0.17] | ≤0.001 | 0.03 | 0.020 | 0.07 | ≤0.001 | |
# of lines | 0.17 | [0.14, 0.20] | ≤0.001 | 0.16 | ≤0.001 | 0.23 | ≤0.001 | |
Weak Mutation Testing
| Whole vs. OneGoal | 0.23 | [0.21, 0.24] | ≤0.001 | 0.46 | ≤0.001 | 0.56 | ≤0.001 |
OneGoal coverage | 0.29 | [0.28, 0.31] | ≤0.001 | 0.35 | ≤0.001 | 0.45 | ≤0.001 | |
# of mutants | −0.19 | [-0.21, -0.18] | ≤0.001 | −0.11 | ≤0.001 | −0.15 | ≤0.001 |
4.5 RQ4: How Does Using an Archiving Solution, Archive, Influence the Performance of Whole ?
Criterion | # of Targets | Statistically at 0:05 | Never Covered by the Other | |
---|---|---|---|---|
Branch Coverage
| Archive is better: | 530 | 370 | 20 |
Equivalent: | 703 | |||
Whole is better: | 1149 | 254 | 42 | |
Total: | 2382 | |||
Line Coverage
| Archive is better: | 986 | 444 | 12 |
Equivalent: | 1984 | |||
Whole is better: | 841 | 263 | 124 | |
Total: | 3811 | |||
Weak Mutation Testing
| Archive is better: | 5479 | 4076 | 310 |
Equivalent: | 5139 | |||
Whole is better: | 1855 | 271 | 126 | |
Total: | 12473 |
Criterion | # of Targets | Statistically at 0:05 | Never Covered by the Other | |
---|---|---|---|---|
Branch Coverage
| Archive is better: | 6288 | 2039 | 711 |
Equivalent: | 23566 | |||
Whole is better: | 4196 | 1726 | 2588 | |
Total: | 34050 | |||
Line Coverage
| Archive is better: | 8466 | 5109 | 1391 |
Equivalent: | 32651 | |||
Whole is better: | 3934 | 930 | 1376 | |
Total: | 45051 | |||
Weak Mutation Testing
| Archive is better: | 57621 | 20358 | 8327 |
Equivalent: | 106946 | |||
Whole is better: | 14270 | 167 | 1896 | |
Total: | 17883 |