1 Introduction
2 Background and related work
2.1 Reverse engineering variability models
2.2 Testing a configurable system
2.2.1 Variability-aware testing
2.2.2 Configurations sampling
2.3 Comparison of sampling approaches
2.4 Motivation of this study
3 Case study
3.1 Research questions
3.1.1 (RQ1) What is the feasibility of testing all JHipster configurations?
- (RQ1.1) What is the cost of engineering an infrastructure capable of automatically deriving and testing all configurations?
- (RQ1.2) What are the computational resources needed to test all configurations?
3.1.2 (RQ2) To what extent can sampling help to discover defects in JHipster?
- (RQ2.1) How many and what kinds of failures/faults can be found in all configurations?
- (RQ2.2) How effective are sampling techniques comparatively?
- (RQ2.3) How do our findings on the effectiveness of sampling techniques compare to other case studies and related work?
3.1.3 (RQ3) How can sampling help JHipster developers?
- (RQ3.1) What is the most cost-effective sampling strategy for JHipster?
- (RQ3.2) What are the recommendations for the JHipster project?
3.2 Methodology
4 All configurations testing costs (RQ1)
4.1 Reverse engineering variability
4.2 Fully automated derivation and testing
4.2.1 Engineering a configurable system for testing configurations
4.2.2 Implementing testing procedures
4.2.3 Building an all-inclusive testing environment
4.2.4 Distributing the tests
4.2.5 Opportunistic optimizations and sharing
4.2.6 Validation of the testing infrastructure
4.3 Human cost
4.4 Computational cost
5 Results of the testing workflow execution (RQ2.1)
5.1 Bugs: A quick inventory
5.2 Statistical analysis
- let F = {ft1, ft2, …, ftn, bs} be a set of n features (fti) plus the status of the build (bs), i.e., whether the build failed or not;
- let C = {c1, c2, …, cm} be a set of m configurations;
- let X be the left-hand side (LHS), or antecedent, of the rule;
- let Y be the right-hand side (RHS), or consequent, of the rule.
Conf. | gradle | mariadb | enableSocialSignIn | websocket | ... | build failure |
---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | ... | 0 |
2 | 0 | 1 | 0 | 0 | ... | 0 |
3 | 0 | 0 | 1 | 1 | ... | 0 |
4 | 1 | 1 | 0 | 0 | ... | 1 |
5 | 1 | 0 | 0 | 0 | ... | 0 |
6 | 1 | 1 | 0 | 0 | ... | 1 |
... | ... | ... | ... | ... | ... | ... |
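The definitions above can be illustrated with a small sketch. The toy matrix below mirrors the example table (it is not the real 26,256-configuration dataset), and the `support` helper computes the support of a rule X ⇒ Y as the fraction of configurations satisfying both the antecedent and the consequent:

```python
# Toy configuration matrix mirroring the table above: each row is a
# configuration, each key a feature, plus the build status (1 = failed).
configs = [
    {"gradle": 1, "mariadb": 0, "enableSocialSignIn": 0, "websocket": 0, "build_failure": 0},
    {"gradle": 0, "mariadb": 1, "enableSocialSignIn": 0, "websocket": 0, "build_failure": 0},
    {"gradle": 0, "mariadb": 0, "enableSocialSignIn": 1, "websocket": 1, "build_failure": 0},
    {"gradle": 1, "mariadb": 1, "enableSocialSignIn": 0, "websocket": 0, "build_failure": 1},
    {"gradle": 1, "mariadb": 0, "enableSocialSignIn": 0, "websocket": 0, "build_failure": 0},
    {"gradle": 1, "mariadb": 1, "enableSocialSignIn": 0, "websocket": 0, "build_failure": 1},
]

def support(rows, lhs, rhs):
    """Fraction of configurations satisfying both the antecedent (X)
    and the consequent (Y) of the rule X => Y."""
    matches = [r for r in rows
               if all(r[f] == v for f, v in lhs.items())
               and all(r[f] == v for f, v in rhs.items())]
    return len(matches) / len(rows)

# Rule: gradle AND mariadb => build failure
s = support(configs, {"gradle": 1, "mariadb": 1}, {"build_failure": 1})
print(f"support = {s:.3f}")  # 2 of 6 toy configurations -> 0.333
```

On the full dataset the support values are those reported in the fault table below (e.g., 16.179% for the mariadb/gradle rule); here the toy matrix only serves to make the computation concrete.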
Id | Left-hand side | Right-hand side | Support | Conf. | GitHub Issue | Report/Correction date |
---|---|---|---|---|---|---|
MOSO | DatabaseType="mongodb", EnableSocialSignIn=true | Compile=KO | 0.488% | 1 | 4037 | 27 Aug 2016 (report and fix for milestone 3.7.0) |
MAGR | prodDatabaseType="mariadb", buildTool="gradle" | Build=KO | 16.179% | 1 | 4222 | 27 Sep 2016 (report and fix for milestone 3.9.0) |
UADO | Docker=true, authenticationType="uaa" | Build=KO | 6.825% | 1 | UAA is in Beta | Not corrected |
OASQL | authenticationType="uaa", hibernateCache="no" | Build=KO | 2.438% | 1 | 4225 | 28 Sep 2016 (report and fix for milestone 3.9.0) |
UAEH | authenticationType="uaa", hibernateCache="ehcache" | Build=KO | 2.194% | 1 | 4225 | 28 Sep 2016 (report and fix for milestone 3.9.0) |
MADO | prodDatabaseType="mariadb", applicationType="monolith", searchEngine="false", Docker="true" | Build=KO | 5.590% | 1 | 4543 | 24 Nov 2016 (report and fix for milestone 3.12.0) |
5.3 Qualitative analysis
6 Sampling techniques comparison (RQ2.2)
6.1 JHipster team sampling strategy
6.2 Comparison of sampling techniques
6.2.1 Sampling techniques
6.2.2 Fault and failure efficiency
Sampling technique | Sample size | Failures (σ) | Failures eff. | Faults (σ) | Fault eff. |
---|---|---|---|---|---|
1-wise | 8 | 2.000 (N.A.) | 25.00% | 2.000 (N.A.) | 25.00% |
Random(8) | 8 | 2.857 (1.313) | 35.71% | 2.180 (0.978) | 27.25% |
PLEDGE(8) | 8 | 3.160 (1.230) | 39.50% | 2.140 (0.825) | 26.75% |
Random(12) | 12 | 4.285 (1.790) | 35.71% | 2.700 (1.040) | 22.50% |
PLEDGE(12) | 12 | 4.920 (1.230) | 41.00% | 2.820 (0.909) | 23.50% |
2-wise | 41 | 14.000 (N.A.) | 34.15% | 5.000 (N.A.) | 12.20% |
Random(41) | 41 | 14.641 (3.182) | 35.71% | 4.490 (0.718) | 10.95% |
PLEDGE(41) | 41 | 17.640 (2.500) | 43.02% | 4.700 (0.831) | 11.46% |
3-wise | 126 | 52.000 (N.A.) | 41.27% | 6.000 (N.A.) | 4.76% |
Random(126) | 126 | 44.995 (4.911) | 35.71% | 5.280 (0.533) | 4.19% |
PLEDGE(126) | 126 | 49.080 (11.581) | 38.95% | 4.660 (0.698) | 3.70% |
4-wise | 374 | 161.000 (N.A.) | 43.05% | 6.000 (N.A.) | 1.60% |
Random(374) | 374 | 133.555 (8.406) | 35.71% | 5.580 (0.496) | 1.49% |
PLEDGE(374) | 374 | 139.200 (31.797) | 37.17% | 4.620 (1.181) | 1.24% |
Most-enabled-disabled | 2 | 0.683 (0.622) | 34.15% | 0.670 (0.614) | 33.50% |
All-most-enabled-disabled | 574 | 190.000 (N.A.) | 33.10% | 2.000 (N.A.) | 0.35% |
One-disabled | 34 | 7.699 (2.204) | 22.64% | 2.398 (0.878) | 7.05% |
All-one-disabled | 922 | 253.000 (N.A.) | 27.44% | 5.000 (N.A.) | 0.54% |
One-enabled | 34 | 12.508 (2.660) | 36.79% | 3.147 (0.698) | 9.26% |
All-one-enabled | 2,340 | 872.000 (N.A.) | 37.26% | 6.000 (N.A.) | 0.26% |
ALL | 26,256 | 9,376.000 (N.A.) | 35.71% | 6.000 (N.A.) | 0.02% |
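The efficiency columns in the table above are the mean number of failures (or faults) found by a sample divided by its size, expressed as a percentage. A minimal sketch of that computation, with a few rows taken from the table:

```python
# Efficiency = mean failures (or faults) found by a sample, divided by
# the number of configurations in the sample, as a percentage.
def efficiency(mean_found, sample_size):
    return 100.0 * mean_found / sample_size

# (technique, sample size, mean number of failures) rows from the table
rows = [("1-wise", 8, 2.000), ("2-wise", 41, 14.000), ("PLEDGE(41)", 41, 17.640)]
for name, size, failures in rows:
    print(f"{name}: {efficiency(failures, size):.2f}%")  # 25.00%, 34.15%, 43.02%
```

This makes explicit why large samples (e.g., ALL, with 26,256 configurations) have very low fault efficiency despite finding every fault: the denominator dominates.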
6.2.3 Discussion
7 Comparison with Other Studies (RQ2.3)
7.1 Studies selection protocol
Reference | Samplings | Validation |
---|---|---|
Medeiros et al. (2016) | Statement-coverage, one-enabled, one-disabled, most-enabled-disabled, random, pair-wise, three-wise, four-wise, five-wise, six-wise | 135 configuration-related faults in 24 open-source C (`#ifdef`) configurable systems |
Sánchez et al. (2017) | Pairwise | Drupal (PHP module-based Web content management system) |
Parejo et al. (2016) | Multi-objective | Drupal (PHP module-based Web content management system) |
Souto et al. (2017) | Random, one-enabled, one-disabled, most-enabled-disabled and pairwise, computed from SPLat (Kim et al. 2013) | 8 small SPLs + GCC (50 most used options); samplings' sizes and numbers of failures were considered |
Apel et al. (2013b) | One-wise, pairwise, three-wise, compared to a family-based strategy and enumeration of all products | 3 configurable systems written in C and 3 in Java |
7.2 Selected studies
7.3 Comparison of findings
7.3.1 Sampling effectiveness
- 2-wise is indeed one of the most effective sampling techniques, capable of identifying numerous failures and the 5 most important faults;
- most-enabled-disabled is also efficient at detecting failures (34.15%) and faults (33.50% on average);
- one-enabled and one-disabled perform very poorly in our case, requiring a substantial number of configurations to find either failures or faults;
- despite its high fault efficiency, most-enabled-disabled captures only 0.670 faults on average, thus missing important faults.
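For readers unfamiliar with the simpler heuristics compared above, here is a minimal sketch (our simplification, over an already-given list of valid configurations encoded as boolean dicts; feature names are hypothetical) of most-enabled-disabled and one-enabled:

```python
# Each configuration is a dict mapping feature name -> enabled (bool).

def most_enabled_disabled(configs):
    """Two configurations: the one enabling the most features and the
    one enabling the fewest (i.e., disabling the most)."""
    count = lambda c: sum(c.values())
    return [max(configs, key=count), min(configs, key=count)]

def one_enabled(configs, features):
    """For each feature, pick one configuration in which only that
    feature (among the considered ones) is enabled, if one exists."""
    sample = []
    for f in features:
        for c in configs:
            if c[f] and not any(c[g] for g in features if g != f):
                sample.append(c)
                break
    return sample

# Hypothetical toy feature model with three optional features a, b, c:
toy = [
    {"a": True,  "b": False, "c": False},
    {"a": True,  "b": True,  "c": True},
    {"a": False, "b": False, "c": False},
]
print(most_enabled_disabled(toy))        # all-enabled and all-disabled configs
print(one_enabled(toy, ["a", "b", "c"]))  # only config 1 qualifies here
```

Note that in real settings these heuristics must respect the constraints of the variability model, which is why the paper derives samples from the reverse-engineered JHipster feature model rather than from free boolean combinations.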