1 Introduction
2 Foundations
2.1 Software architecture consistency checking
- Conformance by construction: These approaches attempt to ensure, through automatic or semi-automatic generation of lower-level artifacts such as source code, that these artifacts are consistent with the higher-level architecture description. A prominent example of conformance by construction is model-driven development, e.g., Mattsson et al. (2012).
- Conformance by extraction: These approaches extract information from artifacts of the implementation process, such as source code dependencies, and compare this information with a specification of an intended architecture based on specified mappings or rules.
- Dependency matrices: These approaches capture source code dependencies in a matrix of source code elements, complemented by architectural rules that constrain the possible values in the matrix and hence the allowed dependencies (Sangal et al. 2005).
- Source code query language (SCQL)-based approaches: In these approaches, SCQLs are used to specify the constraints that the architecture imposes on the source code and that must be satisfied for the source code to be considered architecturally consistent (de Moor et al. 2008; Herold and Rausch 2013).
- Reflexion modeling-type approaches: These approaches focus on the creation of graphical, box-and-lines models that define intended architectures in terms of modules and allowed dependencies (Murphy et al. 2001).
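The core comparison behind a reflexion modeling-type check can be illustrated with a minimal sketch. It assumes that classes are mapped to modules, that the intended architecture is a set of allowed module-to-module dependencies, and that dependencies inside a single module are always permitted. The function name, the module names, and the class names are hypothetical and do not come from JITTAC or from the systems studied here.

```python
def check_consistency(class_to_module, allowed, dependencies):
    """Compare extracted class dependencies with an intended architecture.

    Returns (convergences, divergences, absences):
    - convergences: class dependencies permitted by the architecture
    - divergences: class dependencies not permitted by the architecture
    - absences: allowed module dependencies that no class dependency realizes
    """
    convergences, divergences = [], []
    used = set()
    for src, dst in dependencies:
        src_mod, dst_mod = class_to_module[src], class_to_module[dst]
        used.add((src_mod, dst_mod))
        # Assumption: dependencies within one module are always allowed.
        if src_mod == dst_mod or (src_mod, dst_mod) in allowed:
            convergences.append((src, dst))
        else:
            divergences.append((src, dst))
    absences = sorted(allowed - used)
    return convergences, divergences, absences


# Hypothetical mapping, intended architecture, and extracted dependencies.
class_to_module = {"MainFrame": "gui", "Importer": "logic", "Entry": "model"}
allowed = {("gui", "logic"), ("logic", "model"), ("gui", "model")}
dependencies = [
    ("MainFrame", "Importer"),
    ("Importer", "Entry"),
    ("Entry", "MainFrame"),  # model -> gui is not an allowed dependency
]

convergences, divergences, absences = check_consistency(
    class_to_module, allowed, dependencies
)
print("divergences:", divergences)
print("absences:", absences)
```

In this toy setting, each divergent class dependency would count as one inconsistency with the class on the left as its source and the class on the right as its target.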
2.2 Reflexion modeling foundations and terminology
2.3 The reflexion modeling tool used—JITTAC
3 Study design
3.1 Research questions
3.2 Case selection
Statistic | JabRef | Lucene | Ant |
---|---|---|---|
Source code statistics | | | |
No. of classes | 736 | 507 | 774 |
LOC | 82,783 | 60,298 | 92,140 |
No. of inconsistencies | 1,459 | 638 | 377 |
No. of classes with inconsistencies | 186 | 115 | 80 |
No. of classes with outgoing inconsistencies | 119 | 75 | 38 |
No. of classes with incoming inconsistencies | 86 | 49 | 42 |
Architecture statistics | | | |
No. of modules | 6 | 7 | 15 |
No. of allowed dependencies | 15 | 16 | 77 |
No. of divergences | 12 | 15 | 14 |
3.3 Data collection—gathered source code metrics
Tool | Metric | Type |
---|---|---|
PMD | Total number of issues | Anomaly count |
PMD | Number of coupling issues | Anomaly count |
PMD | Number of design issues | Anomaly count |
PMD | Number of code size issues | Anomaly count |
PMD | Number of high-priority issues | Anomaly count |
FindBugs | Total number of issues | Anomaly count |
FindBugs | Number of style issues | Anomaly count |
FindBugs | Number of bad practice issues | Anomaly count |
FindBugs | Number of scary issues | Anomaly count |
SourceMonitor | Lines of code | Size |
SourceMonitor | Number of statements | Size |
SourceMonitor | Number of method calls | Size |
SourceMonitor | Number of classes and interfaces | Size |
SourceMonitor | Methods per class | Size |
SourceMonitor | Average method size | Size |
SourceMonitor | Percentage of comments | Size |
SourceMonitor | Branch percentage | Complexity |
SourceMonitor | Maximum method complexity | Complexity |
SourceMonitor | Average block depth | Complexity |
SourceMonitor | Average complexity | Complexity |
CKJM | Weighted methods per class | Complexity |
CKJM | Number of public methods | Size |
CKJM | Response for a class | Coupling |
CKJM | Depth of inheritance tree | Inheritance coupling |
CKJM | Number of children | Inheritance coupling |
CKJM | Fan-out | Coupling |
CKJM | Fan-in | Coupling |
CKJM | Lack of method cohesion | Cohesion |
SonarQube | Lines of code | Size |
SonarQube | Number of statements | Size |
SonarQube | Public API | Size |
SonarQube | Number of functions | Size |
SonarQube | Public undocumented API | Size |
SonarQube | Number of comment lines | Size |
SonarQube | Comment line density | Size |
SonarQube | Cyclomatic complexity | Complexity |
SonarQube | Cyclo. complexity excl. inner classes | Complexity |
SonarQube | Number of duplicated lines | Anomaly count |
SonarQube | Number of issues | Anomaly count |
SonarQube | Number of code smells | Anomaly count |
SonarQube | Number of bugs | Anomaly count |
SonarQube | Number of security vulnerabilities | Anomaly count |
SonarQube | Sqale index | Technical debt |
SonarQube | Sqale debt ratio | Technical debt |
VizzAnalyzer | Message Passing Coupling | Coupling |
VizzAnalyzer | Data Abstraction Coupling | Coupling |
VizzAnalyzer | Locality of Data | Coupling |
Git | Number of commits | Code churn |
Git | Timestamp of creation | Age |
3.4 Data analysis
3.5 Replication package
4 Results
4.1 Correlation analysis
4.2 Categorical analysis
4.2.1 Classes being sources of architectural inconsistencies
PublicAPI | Classes with inconsistencies | Classes with no inconsistencies |
---|---|---|
PublicAPI > 50% | a: 56 | b: 152 |
PublicAPI ≤ 50% | c: 19 | d: 280 |
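One common way to summarize a 2×2 table like this one is the odds ratio, (a · d) / (b · c). The sketch below applies it to the four cell counts shown above. Whether the study reports exactly this statistic is an assumption, and the function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 contingency table laid out as
    [[a, b],
     [c, d]]
    with rows = metric above / not above the threshold and
    columns = classes with / without inconsistencies.
    """
    return (a * d) / (b * c)


# Cell counts taken from the table above.
a, b, c, d = 56, 152, 19, 280

ratio = odds_ratio(a, b, c, d)

# Share of classes above the threshold within each column.
share_with = a / (a + c)      # among classes with inconsistencies
share_without = b / (b + d)   # among classes without inconsistencies

print(f"odds ratio: {ratio:.2f}")
print(f"above threshold, with inconsistencies: {share_with:.1%}")
print(f"above threshold, without inconsistencies: {share_without:.1%}")
```

For these counts the odds ratio is roughly 5.4: the odds of being a source of inconsistencies are about five times higher for classes above the PublicAPI threshold than for classes at or below it.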
4.2.2 Classes being targets of architectural inconsistencies
Metric | JabRef 50th | JabRef 75th | JabRef 90th | Lucene 50th | Lucene 75th | Lucene 90th | Ant 50th | Ant 75th | Ant 90th |
---|---|---|---|---|---|---|---|---|---|
Public API | 3.29 | 3.90 | 5.65 | 2.06 | 3.06 | 3.60 | 2.11 | 2.48 | 3.69 |
Fan-in | 8.02 | 5.36 | 10.02 | 12.32 | 10.80 | 13.44 | 7.75 | 3.61 | 4.31 |
4.2.3 Classes being sources or targets of architectural inconsistencies
4.3 Significant metrics and size
Metric | JabRef | Lucene | Ant |
---|---|---|---|
WMC | 0.77 (p < 2.2e−16) | 0.76 (p < 2.2e−16) | 0.86 (p < 2.2e−16) |
Total issues (PMD) | 0.94 (p < 2.2e−16) | 0.96 (p < 2.2e−16) | 0.94 (p < 2.2e−16) |
Design issues (PMD) | 0.46 (p < 2.2e−16) | 0.7 (p < 2.2e−16) | 0.71 (p < 2.2e−16) |
Code smells (SQ) | 0.71 (p < 2.2e−16) | 0.81 (p < 2.2e−16) | 0.77 (p < 2.2e−16) |
Complexity | 0.9 (p < 2.2e−16) | 0.96 (p < 2.2e−16) | 0.97 (p < 2.2e−16) |
RFC | 0.91 (p < 2.2e−16) | 0.87 (p < 2.2e−16) | 0.90 (p < 2.2e−16) |
LCOM | 0.56 (p < 2.2e−16) | 0.33 (p = 5.65e−14) | 0.71 (p < 2.2e−16) |
Fan-in | 0.14 (p = 5.08e−5) | 0 (p = 0.96) | 0.09 (p = 0.01) |
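The coefficients above relate each metric to class size. As a rough illustration of how such a value could be computed, the sketch below implements a rank correlation (Spearman's rho) in plain Python. The choice of Spearman's rho is an assumption, since this section does not name the coefficient, and all data values are made up for illustration.

```python
from math import sqrt


def ranks(values):
    """Assign 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-class measurements (not data from the study).
loc = [120, 450, 80, 900, 300]    # class size in lines of code
wmc = [10, 35, 7, 60, 22]         # grows with size in this toy data
fan_in = [5, 1, 9, 3, 3]          # unrelated to size in this toy data

rho_wmc = spearman(loc, wmc)
rho_fan_in = spearman(loc, fan_in)
print(f"size vs. WMC: {rho_wmc:.2f}")
print(f"size vs. fan-in: {rho_fan_in:.2f}")
```

A metric whose coefficient with size is close to 1 carries little information beyond size itself, which is why a low value, as reported for fan-in, is of interest when looking for indicators that are not confounded by class size.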
5 Discussion
5.1 Interpretation of the results
Finding 1: The fan-in and the public API metrics are the most suitable indicators for classes with architectural inconsistencies found in this study.
Finding 2: Method counts seem to be suitable for indicating architectural inconsistencies.
RQ1: Method counts and the fan-in show a moderate positive correlation with the number of architectural inconsistencies that a class participates in.
Finding 3: Classes which violate various design principles (being very large, complex, and containing many code smells) are significantly more likely to be the source of architectural inconsistencies.
Finding 4: Anomaly counts and detection strategies are less suitable indicators than more traditional source code metrics.
Finding 5: The metrics related to technical debt applied in this study are not suitable to indicate classes contributing to architectural inconsistencies.
RQ2: A variety of metrics can be used to discriminate between classes that are more likely to participate in architectural inconsistencies and those that are less likely to do so, including size, coupling, cohesion, and complexity metrics, as well as anomaly counts, but the best discriminators are the fan-in and the size of the public API of a class.
RQ3: Class size has a confounding effect on most metrics that are suitable discriminators; the fan-in in particular and, to a limited degree, lack of method cohesion are exceptions to this effect.