1 Introduction
1.1 Contributions
-
wit: a technique to automatically infer the exception preconditions of Java methods based on a novel combination of static analysis and heuristics that trade-off exhaustiveness for high precision.
-
An implementation of wit and an experimental evaluation targeting five JDK 11 modules and 46 open-source Java projects (including popular ones like Apache JCommons Lang, and the Jh2database), which demonstrates wit’s effectiveness, practical applicability to real-world projects, and usefulness.
-
For reproducibility, wit’s implementation and the detailed experimental outputs are available.,2
1.2 Extended Version
2 Showcase Examples of Using wit
Dubbo
3 and Commons Lang
.4 The examples showcase wit’s capabilities and practical usefulness: wit could automatically extract exception preconditions in many methods of these two projects, including some that were not documented (Section 2.1) or incorrectly documented (Section 2.2). Section 5.6 reports further empirical evidence that wit’s exception preconditions can be useful as a source of documentation.Dubbo
’s GitHub repository5 has over 24 thousand forks and 36 thousand stars. As a result, they are particularly well documented and tested Zhong et al. (2020); Nassif et al. (2021). The fact that wit could find some of their few missing or inconsistent pieces of their documentation indicates that it has the potential to be practically useful and widely applicable.
2.1 Missing Documentation
throws: | JIndexOutOfBoundException |
when: | Joff>= 0 && len>= 0 && bs.length< len + off |
example: | J[off=0, len=1, bs.length=0] |
throws: | JIllegalArgumentException |
when: | Jb.length>= 0 && code.length< 64 |
example: | J[b.length=0, code.length=0] |
2.2 Inconsistent Documentation
Apache Commons Lang
’s class JNumberUtils, which computes the minimum of an Jarray of integers. Unlike the previous example, Jmin’s documentation is detailed and clearly expresses the conditions under which an exception is thrown. Unfortunately, the documentation is partially incorrect: when Jarray is null, Jmin throws a JNullPointerException, not an JIllegalArgumentException, as precisely reported by wit: throws: JNullPointerException when: Jarray == null |
3 How wit Works
3.1 Parsing and CFG
JGraphT
.8 More precisely, we build a CFG for each method Jm individually; and annotate branches in the CFG with each branch’s Boolean condition.Apache Commons Lang
. Method Jinsert puts some values Jv into an array Ja of Booleans at a given index Jk. The initial part of its implementation calls another method, JisEmpty, of the same class to determine if Jv is empty; in turn, JisEmpty calls method JgetLength. wit builds CFGs for Jinsert, JisEmpty, and JgetLength, since they are all part of the input source code.
3.2 Local Exception Paths
3.3 Global Exception Paths
3.4 Modular Analysis
RandomStringGenerator
in project Commons Text
calls method Validate.isTrue
15 in another project Commons Lang
. If we run wit on project Commons Text
alone, the call to isTrue
is marked as opaque, and hence no exception precondition would be reported for this path. We could run wit on both projects Commons Text
and Commons Lang
together; this would take a considerable amount of time, and it would not scale to combining even more projects. Instead, we can use wit’s modular analysis and first analyze Commons Lang
in isolation; this would report the exception precondition !expr
for method validate.isTrue
. Then, when wit runs on Commons Text
, it would replace the call to isTrue
in generate
with if (!(length> 0)) throw new IllegalArgumentException()
, which leads to inferring exception precondition length
\(\mathtt {<= 0}\) for this path in method generate
.3.5 Path Feasibility
int
, boolean
, byte
, arrays, strings) with their most common operations (including arithmetic, equality, length
, contains
, isEmpty
), as well as of a few widely used JDK library methods (such as Array.getLength
). wit uses this encoding to build an SMT formula \(\phi \) corresponding to each global expath p: if \(\phi \) is satisfiable, then the global expath p is feasible, and hence it corresponds to a possible exceptional behavior of method m
.Z3Py
Python API.19 Listing 5 shows a simplified excerpt of the SMT program encoding the feasibility of insert
’s global expath \(p_1\). First, it declares logic variables of the appropriate types to encode program variables (e.g., k
), their basic properties (e.g., a_length
, which corresponds to the Java expression a.length
), and the values passed via method calls (e.g., getLength
is an integer variable storing getLength()
’s output). Then, it builds a list c
of constraints that capture the path constraints and the semantics of the statements along the path. For example, a_length
must be nonnegative, since it corresponds to array a
’s length (line 5); the properties of array v
are copied to those of x
, since insert
’s argument v
is the actual argument for isEmpty
’s formal argument x
(line 7); and path constraint !isEmpty(v)
corresponds to the complement of Boolean variable isEmpty
(line 12). In this case, Z3 easily finds that the constraints in c
are unsatisfiable, since Not(0 == 0)
is identically false. In contrast, the constraints corresponding to path \(p_2\) are satisfiable, and thus Z3 outputs a satisfying assignment of all variables in that case.getLength
’s implementation wasn’t available.getLength
returns an integer without any constraints; therefore it would classify path p as feasible but mark it as maybe since it is just an educated guess without correctness guarantees.3.6 Exception Preconditions
m
that trigger an exception. In order to characterize those inputs as an exception precondition, wit encodes p’s constraints as a formula that only refers to m
’s arguments, as well as to any members that are accessible at m
’s entry (such as the target object this
, if m
is an instance method). To this end, it works backward from the last node of exception path p; it collects all path constraints along p, while replacing any reference to local variables with their definition. For example, method void f(int x){int y=x+1; if(y> 0)throw;}
has a single feasible expath with path condition y> 0
, which becomes x + 1> 0
after backward substitution through the assignment to variable y
. Since x + 1> 0
only mentions argument x
, it is a suitable exception precondition for method m
.x.inc() == 0
, where calling inc
increments the value of x
. Here too, wit is conservative and marks as maybe any exception precondition that involves calls to methods that are not known to be pure.0 + 1 == 1
.
throw
statement (including any exception message or other arguments used to instantiate the exception object), the line in the analyzed method m
where the exception is thrown or propagated, and a sequence of method calls starting from the analyzed method and ending in the throwing method. Moreover, wit reports the generated Z3 and SymPy Python programs’ source code.3.7 Heuristics and Limitations
throw
statement; it does not consider low-level errors—such as division by zero, out-of-bound array access, and buffer overflow—that are signaled by exceptions raised by the JVM. This restriction is customary in techniques that infer exceptional behavior, since implicitly thrown exceptions are “generally indicative of programming errors rather than design choices Weimer and Necula (2004)” Raymond and Weimer (2008), and usually do not belong in API-level documentation Forward and Lethbridge (2002) and are best analyzed separately. Extending wit to also track implicit exceptions would not be technically difficult; for example, one could first instrument the code to be analyzed with explicit checks before any statement that may thrown an implicit exception.21 However, indiscriminately considering all exceptions that are thrown implicitly would produce a vast number of boilerplate exception preconditions that are not specific to a method’s explicitly programmed behavior; hence, they would be outside wit’s current focus.instanceof
operators, for-each
loops, switch
statements, and try/catch
blocks. When these features are used, the CFG may omit some paths that exist in the actual program. (Supporting the latter three features is possible in principle, but would substantially complicate the CFG construction.)22,23 The SMT encoding used for path feasibility (Section 3.5) is limited to a core subset of Java features and standard library methods. As a result, wit won’t report exception preconditions that involve unsupported features (or will report them as maybe, that is without correctness guarantee).String
and StringBuilder
classes. This heuristic only applies when wit uses modular analysis: these two JDK classes have a complex implementation involving native code and JVM internals. Thus, wit’s analysis of String
and StringBuilder
can only retrieve a few correct maybes; as a result, using them in the modular analysis of other client classes is likely to introduce a large number of spurious maybes—which this heuristic avoids.a.length
implicitly requires that a != null
). In some complex cases, this encoding may be overly conservative, leading to marking as unsatisfiable exception preconditions that are actually correct. To accommodate these unusual cases, wit also offers a less conservative logic encoding of the same features, which trades off correctness for recall; users can switch to this alternative encoding when analyzing software where a high recall is more important than an absolute correctness guarantee.4 Experimental Evaluation
expres | maybes | |||||||
---|---|---|---|---|---|---|---|---|
project | hash | kloc | time | # | m | p | ?# | ?p |
com/sun | – | 30 | – | 55 | 48 | 1.0 | 78 | – |
sun | – | 128 | – | 566 | 474 | 1.0 | 1 068 | – |
java | – | 209 | – | 3 420 | 2 578 | 1.0 | 1 666 | – |
javax | – | 8 | – | 190 | 145 | 1.0 | 41 | – |
jdk | – | 52 | – | 847 | 742 | 1.0 | 598 | – |
overall JDK | da75f3c4ad5 | 428 | 765 | 5 078 | 3 987 | 1.0 | 3 451 | 0.36 |
accumulo | 7db0561cac | 33 | 311 | 995 | 908 | 1.0 | 1 335 | 0.3 |
Activiti | 31024bc756 | 103 | 150 | 685 | 543 | 1.0 | 212 | 0.2 |
asm | 72e8ec49 | 28 | 130 | 203 | 126 | 1.0 | 428 | 0.8 |
asterisk-java | 5c56735c | 30 | 27 | 27 | 24 | 1.0 | 46 | 0.4 |
AutomatedCar | c137e56a | 4 | 2 | 2 | 2 | 1.0 | 4 | 0.5 |
Baragon | 10660b41 | 15 | 6 | 10 | 10 | 1.0 | 50 | 0.2 |
bigtop | ee28ba88 | 6.5 | 4 | 9 | 9 | 1.0 | 6 | 0.2 |
byte-buddy | 4c57c80aab | 57 | 974 | 356 | 348 | 1.0 | 374 | 0.8 |
camel | 0a735ae926c | 972 | 2 626 | 1 558 | 1 276 | 1.0 | 1 111 | 0.4 |
closure-compiler | fe0cebacd | 287 | 538 | 158 | 157 | 1.0 | 654 | 0.2 |
commons-bcel | f1a1459f | 35 | 137 | 76 | 74 | 1.0 | 896 | 0.4 |
commons-configuration | 1b406c17 | 20 | 12 | 170 | 139 | 1.0 | 53 | 0.4 |
commons-io | 2ae025fe | 9.5 | 23 | 240 | 187 | 1.0 | 186 | 0 |
commons-lang | 90e0a9bb2 | 29 | 55 | 611 | 484 | 1.0 | 230 | 0.8 |
commons-math | 674805c64 | 61 | 264 | 1 078 | 612 | 1.0 | 573 | 0.8 |
commons-text | 21fc34f | 10 | 32 | 235 | 156 | 1.0 | 138 | 0.6 |
Confucius | e375cb9 | 0.5 | 1 | 45 | 18 | 1.0 | 14 | 0.4 |
curator | 9aafdec9 | 26 | 35 | 192 | 116 | 1.0 | 126 | 0.6 |
dubbo | b5e65a6d2 | 99 | 274 | 413 | 341 | 1.0 | 225 | 0.4 |
flink | db248b2176 | 568 | 1 245 | 5 661 | 4 059 | 1.0 | 5 201 | 0.8 |
gae-java-mini-profiler | 9cb1ba6 | 0.5 | 1 | 0 | 0 | – | 0 | – |
h2database | 0ee51f54a | 150 | 229 | 526 | 507 | 1.0 | 834 | 0.6 |
httpcomponents-client | 29ba623eb | 32 | 37 | 27 | 24 | 1.0 | 90 | 0.4 |
itext7 | ae78654a5 | 145 | 880 | 681 | 522 | 1.0 | 702 | 0.7 |
jackrabbit | 35d5732bc | 260 | 300 | 1 224 | 1 111 | 1.0 | 1 595 | 0.8 |
jackrabbit-oak | f8c7b551a4 | 26 | 334 | 502 | 493 | 1.0 | 667 | 0.4 |
jackson-databind | 972d5a28a | 63 | 57 | 180 | 166 | 1.0 | 153 | 0.6 |
jfreechart | 5aac9ae4 | 84 | 133 | 1 387 | 1 149 | 1.0 | 800 | 1.0 |
jmonkeyengine | 499e73ab0 | 19 | 376 | 634 | 569 | 1.0 | 1 220 | 0.2 |
joda-time | 27edfffa | 29 | 58 | 250 | 228 | 1.0 | 355 | 0.6 |
logging-log4j2 | 59f6848b7 | 99 | 159 | 472 | 304 | 1.0 | 392 | 0.2 |
lucene-solr | 7ada4032180 | 685 | 1 545 | 3 380 | 2 755 | 1.0 | 4 132 | 0.6 |
pdfbox | 01bce4dde | 106 | 230 | 255 | 239 | 1.0 | 362 | 0.2 |
poi | 270107d9e | 260 | 403 | 710 | 624 | 1.0 | 1 851 | 0.2 |
santuario-xml-security-java | 86179876 | 35 | 38 | 167 | 142 | 1.0 | 131 | 0.6 |
shiro | 0c0d9da2 | 27 | 39 | 154 | 141 | 1.0 | 145 | 0.2 |
spoon | 34c23fc7 | 75 | 86 | 272 | 268 | 1.0 | 357 | 0.4 |
spring-cloud-gcp | 6c95a16f | 20 | 20 | 13 | 13 | 1.0 | 10 | 0.8 |
spring-data-commons | 4acd3b70 | 28 | 24 | 31 | 29 | 1.0 | 123 | 0.4 |
swingx | 9e33bc0 | 72 | 108 | 157 | 149 | 1.0 | 217 | 0.8 |
traccar | eac5f4889 | 54 | 60 | 2 | 2 | 1.0 | 76 | 0 |
visualee | 88732d9 | 1.8 | 3 | 0 | 0 | – | 3 | 0 |
weiboclient4j | 80556b1 | 7.8 | 10 | 6 | 6 | 1.0 | 9 | 0.2 |
wicket | 7c0009c8df | 109 | 1 069 | 930 | 811 | 1.0 | 656 | 0.6 |
wildfly-elytron | 3457737d98 | 80 | 128 | 340 | 316 | 1.0 | 233 | 0.2 |
xmlgraphics-fop | 7edce5dd5 | 165 | 940 | 385 | 318 | 1.0 | 617 | 0.6 |
overall other projects | – | 5 720 | 14 116 | 25 409 | 20 474 | 1.0 | 27 592 | 0.5 |
overall | – | 6 148 | 14 881 | 30 487 | 24 461 | 1.0 | 31 043 | 0.5 |
4.1 Experimental Subjects
java.base/share/classes
) and ran wit on all of them as if it were a regular Java project: modules com/sun
, java
, javax
, sun
, and jdk
.gae-java-mini-profiler
, visualee
, and AutomatedCar
are no longer maintained. This minority of projects makes the selection more diverse, so that we will be able to evaluate wit’s capabilities in different scenarios.Apache lucene-solr
was recently split into two separate projects, and thus we used the last version before the split; we analyzed version 2.6 of Apache Commons IO
to match Nassif et al. (2021)’s thorough manual analysis—which we used as ground truth to answer RQ2.4.2 Experimental Setup
+
src/test/+) or other auxiliary code. All experiments ran on a Windows 11 Intel i9 laptop with 32GB of RAM. By default, wit only infers the exception preconditions of public methods; if a public method calls a non-public one, wit will also analyze the latter, but will report only public exception preconditions. wit analyzes each class in isolation; then, it combines the results for all classes in the same project and outputs them to the user.lang
, io
, text
, math
, configuration
, in this order) followed by all other projects in alphabetical order. Since practically all projects use some JDK libraries, and several projects also use Apache Commons libraries, this execution order maximizes the chances that wit can reuse the results of one of its previous runs to perform an effective modular analysis. In contrast, client-of dependencies between projects other than the JDK and Apache Commons libraries are more sparse; therefore, the alphabetical order is somewhat arbitrary, but even following a different order is unlikely to significantly affect wit’s capabilities.offset
is negative”) for all public methods in Apache Commons IO
’s base package collected from all origins (package code, libraries, tests, documentation, ...). We counted the exception preconditions inferred by wit that are semantically equivalent to any in DSc. Matching DSc’s natural-language preconditions to wit’s was generally straightforward, as we didn’t have to deal with subtle semantic ambiguities: since wit only reports correct exception preconditions as expres, we only had to match (usually simple) natural-language expressions to their Java Boolean expression counterparts.Commons IO
project, whose extensive usage of I/O operations complicates (any) static analysis; (ii) DSc describes all sorts of exceptional behavior, including the “not typically documented” runtime exceptions Nassif et al. (2021). To assess wit’s recall on a more varied collection of projects, we also considered Zhong et al. (2020)’s dataset—henceforth, DPA—which includes 503 so-called “parameter rules” of public methods in 9 projects (a subset of our 46 projects described in Section 4.1). A parameter rule is a pair \(\langle m, p \rangle \), where m is a fully-qualified method name and p is one of m’s arguments; it denotes that calling m with some values of p may throw an exception. Important, parameter rules do not express the values of p that determine an exception, and hence they are much less expressive than preconditions; however, they are still useful to determine “how much” exceptional behavior wit captures. We counted the exception preconditions inferred by wit that match DPa: a precondition c matches a parameter rule \(\langle m, p \rangle \) if c is an exception precondition of method m that depends on the value of p. This is a much weaker correspondence than for DSc, but it’s all the information we can extract from DPa’s parameter rules.m
with arguments int x
and String y
, whose class includes fields int[] a
, int count
, and method boolean active()
:void m(int x, int[] y) | argument | state |
---|---|---|
null | y == null | this.a != null |
value | x == 1 | this.count> 0 |
query | y.isEmpty() | !this.active() |
Accumulo
,27Commons Lang
,28,29,30Commons Math
,31,32Commons Text
,33 and Commons IO
.34 We selected these five projects as they are very active and routinely spend effort in maintaining a good-quality documentation. Each pull request combines the exception preconditions of methods in the same class or package, and expresses wit’s exception preconditions using Javadoc @throws
tags. To compile each pull request, we sometimes complemented the Javadoc with a brief complementary natural-language description, and possibly some tests (expressing wit’s example inputs in the form of unit tests). We also tried to adjust the Javadoc syntax to be consistent with each project’s style (for example, expressing a != null
as either a not null
or @code a != null)
. In all cases, reformulating wit’s output was a trivial matter.5 Experimental Results
5.1 RQ1: Precision
all | supported | |||||
---|---|---|---|---|---|---|
dataset | project | # | E% | E+M% | E% | E+M% |
DSc Nassif et al. (2021) | commons-io | 844 | 9 | 12 | 57 | 72 |
DPa Zhong et al. (2020) | asm | 54 | 6 | 23 | 25 | 75 |
commons-io | 65 | 77 | 78 | 94 | 96 | |
jfreechart | 42 | 80 | 85 | 84 | 89 | |
overall | 1,345 | 13 | 23 | 48 | 84 |
5.2 RQ2: Recall
5.2.1 Dataset DSc
Commons IO
(1 in FileNameUtils
, 4 in LineIterator
, 15 in IOUtils
, 8 in FileCleaningTracker
, 44 in FileUtils
, 3 in HexDump
, and 2 in ByteOrderMark
), that is a recall of 9% (77/844). However, 708 out of DSc’s 844 exception preconditions are of kinds unsupported by wit (see Section 3.7). After excluding unsupported exception precondition kinds,37wit’s recall estimate becomes 57% (\(77/(844-708)\)).
Commons IO
exception preconditions from DSc that wit didn’t report as expres. We can classify these missed preconditions in two groups.-
Unsupported features: As mentioned, the largest group of missed preconditions (547 or 77% of the missed preconditions) involve Java language features that wit does not support.
-
Implicit exceptions: Another group of missed preconditions (161 or 23% of the missed preconditions) correspond to implicit exceptions that are thrown by the Java runtime (e.g., when a null pointer is dereferenced), which we deliberately ignore (as discussed in Section 3.7). A significant case is class
EndianUtils
38 for which DSc reports 48 exception preconditions involvingArrayIndexOutOfBounds
orNullPointer
exceptions thrown implicitly.
5.2.2 Dataset DPa
asm
, which extensively uses features and coding patterns40 that wit currently doesn’t adequately support: as a result, wit’s recall is fairly low (considering all parameter rules, 6% with expres only and 23% with expres+maybes; considering only supported ones, 25%/75%). In contrast, more “traditional” Java projects like JFreeChart
41 extensively follow programming practices such as validating a method’s input, which are a better match to wit’s current capabilities: as a result, wit’s recall is quite high (considering all parameter rules, 80% with expres only and 85% with expres+maybes; considering only supported ones, 84%/89%).
5.3 RQ3: Features
true
. Nine of 13 expres of spring-cloud-gcp
are of this kind. These usually correspond to methods that unconditionally throw an UnsupportedOperation
exception to signal that they are effectively not available;43,44lucene-solr
’s class ResultSetImpl
for an example.45 In Java, this is a common idiom to provide “placeholders,” which will be replaced by actual implementations through overriding in subclasses. While this is a common programming pattern that leverages polymorphism, it nominally breaks behavioral substitutability Liskov and Wing (1994); Nguyen et al. (2014): a method’s precondition should only be weakened Meyer (1997), but no Boolean expression is weaker than true
.e
of a ternary expression. c ? t : e
. It also followed method calls collecting complex conditions and presenting them in a readable, simplified form. For example, for a ConcurrentModification
exception,47 or after collecting constant values from other classes.48IndexOutOufBoundsException
\(\mathtt {i < 0}\) or \(\mathtt {i >}\) array.length
”.495.4 RQ4: Modularity
5.4.1 Exception Preconditions in Modular vs. Non-Modular
jfreechart
and pdfbox
when enabling modular analysis. This happens because modular analysis replaces a call to an opaque method with whatever exception path wit extracted from the called method. In some cases, the called method’s exception precondition may be a very partial approximation of the callee’s full exceptional behavior; therefore, using it in place of the call may be counterproductive to obtain a provably feasible exception precondition in the caller. In fact, this is a common problem of modular reasoning Tschannen et al. (2014): if the callee’s specification is weak, there is very little we can conclude about the caller’s behavior.String.length()
and String.equals()
. For example, when wit analyzes String.equals()
’s implementation in the JDK,50 it encounters several features and special cases that limit its effectiveness, such as different string encodings51 and compacted strings;52 furthermore, the Java runtime represents a String
as a byte array,53 a type that wit does not currently support. As a result, wit only reports some very narrow, overly complex exception paths for String.equals()
, corresponding to the few paths within its implementation that do not depend on any of those complex language features. What happens when wit processes a method such as the one in Listing 7, which makes numerous calls to String.equals()
, with modular analysis enabled? Replacing the calls with the previously extracted exception paths leads to an overly narrow, needlessly complex path condition, which bogs down the SMT solver and does not lead to any provably feasible path in the caller. In contrast, if modular analysis is disabled, wit simply encodes the calls to String.equals()
as Boolean variables with basic constraints, which is sufficient in some cases to get to a working proof of feasibility—and hence to an expre correctly characterizing setHighlightingMode
’s54 exceptional path.expres | maybes | |||
---|---|---|---|---|
project | \(\Delta \) time | \(\Delta \) # | \(\Delta \) m | \(\Delta \) ?# |
camel | 1.17 | 1.04 | 1.03 | 3.51 |
commons-io | 0.65 | 1.03 | 1.06 | 1.81 |
commons-lang | 2.30 | 1.05 | 1.05 | 6.21 |
jfreechart | 1.70 | 0.99 | 1.01 | 0.91 |
pdfox | 2.67 | 0.98 | 0.97 | 2.71 |
overall | 1.32 | 1.02 | 1.02 | 2.56 |
5.4.2 Correctness of Maybes in Modular vs. Non-Modular
Double
in Java); wit’s simple encoding of numbers cannot deal with special values such as NaN
55 and Inf
(obtained, for example, when dividing 1.0
by 0.0
56). Another one is the JDK’s Collections Framework, which would require a suitable (non-trivial) logic encoding in Z3 to work in wit.Stack.pop
,57 which throws an exception when the stack is empty. wit reports a correct exception precondition for pop
; however, the precondition expression mentions a protected field58 used in Stack
’s internal representation.59 As a result, the exception precondition is not usable correctly to analyze clients of the Stack
class, such as in one of the maybes we inspected for project pdfbox
.60 To handle such cases Zeng et al. (2021), one could try to convert any references to private members into calls to public getter methods—if they are available.IntersectionResult
’s constructor61 in project Commons Text
. As you can see in Listing 8, the exception path that ends at line 10 involves a call to the JDK’s Math.min
function. Without modular analysis, wit can only report the whole conditional expression as a maybe. In contrast, wit’s modular analysis can recover Math.min
’s behavior from its previous analysis of the JDK; thus, it reports two correct expres for the same exceptional path:-
×
-
×
5.5 RQ5: Efficiency
Apache Commons Lang
in just 55 minutes—17 seconds on average for each of the project’s 200 top-level classes. It also scales well to very large projects: it analyzed the 9 780 classes of Apache Camel
(the largest project in our collection) in 44 hours—just 16 seconds per class on average. Key to this performance is wit’s capability of analyzing each class in isolation, without requiring any compilation or build of the whole project.ASMifier.appendAccess()
62 as an example of how wit’s heuristics are useful. It is from project ASM
and embedded under the internal subdirectory of the JDK. The method has several nested if-else branches, that lead to millions of paths. wit’s heuristics are crucial to avoid getting bogged down analyzing such complex pieces of code.
5.6 RQ6: Usefulness
5.6.1 Usefulness: Regular Projects
@throws Exception
tag that does not specify the conditions under which an Exception
is thrown).Apache Curator
’s method validatePath(String)
,63 whose Javadoc just says that “@throws
IllegalArgumentException
if the path is invalid”. wit detects several different exception preconditions for when an exception of class IllegalArgumentException
is thrown: a path is invalid when it is null
, empty, not starting or ending with a /
, etc.Commons IO
’s method FileUtils.copyURLToFile()
,64 which calls methods from JDK class URLConnection
65,66. Commons IO
’s documentation of this method mentions five conditions under which the method will throw an IOException
. The DSc dataset reports another two exception preconditions that trigger implicitly a NullPointerException
. However, only wit found that that the calls to setConnectTimeout
and to setReadTimeout
will throw an IllegalArgumentException
if their argument is a negative integer. This is yet another example that manually detecting and documenting exception preconditions is tedious, time-consuming, and error prone; thus, the kind of automation provided by wit can be very useful.
5.6.2 Usefulness: JDK Modules
throws
keyword to declare (unchecked) runtime exceptions.68\(^{,}\)69 JDK’s package Time
70 uses a distinctly different style of documenting NullPointerException
s, which betrays the package’s origins as a derivative of project joda-time
; to declare that a method thows a null pointer exception when one of its parameters p
is null
, it writes:@param p
\(\mathtt {<}\)description of p
\(\mathtt {>}\), not null
.71 Incidentally, project JFreechart
uses a similar style of documentation.NullPointerException
without arguments (i.e., no message). Despite these outliers, the JDK generally tries to use expressive exception messages, and to improve their clarity. For example, Integer.parseInt
throws a null pointer exception with an uninformative message "null"
in JDK 11;74 in JDK 17, however, the maintainers changed it to the more informative "Cannot parse null string"
.75
5.6.3 Improving Project Documentation Using wit
Commons Math
have not been reviewed yet. Interestingly, one to project Commons Lang
was on hold for several months because the project maintainers realized that the 10 methods whose exceptional behavior we document are inconsistent in using IllegalArgumentException
vs. NullPointerException
, and they preferred to fix this inconsistency before updating the documentation.Commons Lang
, we opened a JIRA issue81 sharing our findings. Several months after our initial pull request, a GitHub user submitted four Javadoc modifications in a new pull request82 that mentioned our JIRA issue. Shortly afterwards, a Commons Lang
maintainer asked us to review the modifications in the new pull request, and suggested that we submit all our findings (i.e., all the exception preconditions that could be included in the documentation) in order to close the JIRA issue. In the end, we worked together with the author of the latest pull request to submit 89 wit exception preconditions (27 new pieces of Javadoc documentation and 62 fixing existing documentation), as well as tests for 9 classes. All of the exceptions from the additions and fixes occur in nested calls, which may explain why they went undetected for a long time. The pull request was accepted in the same day and merged ten days later.6 Threats to Validity
Commons IO
). As we discuss in Section 5.2, the nature of this project makes it especially challenging for wit, which implies that its recall may be higher on other projects (as the experiments using the other dataset DPa Zhong et al. (2020) suggest).7 Related Work
throw
statements that depend on them. PaRu’s empirical evaluation matches this mapping to the available documentation to assess its completeness; it found that 86% of the parameters linked to exceptional behavior are not documented in Javadoc.8 Discussion of Applications
8.1 Documentation
Conversion
in project Apache Commons Lang
. wit outputs two exception preconditions for the method:IllegalArgument
exception. At first sight, it may seem that wit’s output is incomplete (it doesn’t mention the preconditions “src
is empty” and “src
is null
” in the Javadoc)84 and needlessly verbose (isn’t src.length
\(\mathtt {<= 8}\) redundant?). A closer look, however, reveals that several aspects of the natural-language documentation are questionable or inconsistent. First, it mixes explicitly and implicitly thrown exceptions: a NullPointer
exception is thrown by the Java runtime when evaluating the expression on line 6, not by the method’s implementation. wit ignores such language-level exceptions by design; as we mentioned in Section 3.7, not including implicit exceptions in API documentation may be preferable Forward and Lethbridge (2002); Raymond and Weimer (2008). A second issue with Listing 10’s documentation is that it is incorrect: if src
is empty, the method does not throw an IllegalArgument
exception; instead, the Java runtime throws an IndexOutOfBounds
exception at line 12 (another system-level implicit exception). Finally, Listing 10’s documentation is inconsistent regarding the order in which the various exception preconditions are checked: whether src
is null
is checked first (implicitly), followed by src.length
\(\mathtt {> 8}\) (explicitly), src.length - srcPos
\(\mathtt {< 4}\) (explicitly), and whether src
is empty (implicitly)—in this order. Thus, predicate src.length
\(\mathtt {<= 8}\) in wit’s second inferred preconditions is not redundant but rather useful to ensure that the precondition precisely captures the conditions under which a certain path is taken. Admittedly, wit may sometimes present preconditions in a form that is harder to understand for a human; for example, it is questionable that the “simplification” of src.length - srcPos
\(\mathtt {< 4}\) into srcPos - src.length
\(\mathtt {> -4}\) improves readability. However, these are just pretty-printing details that are currently left to SymPy; changing them to generate constraints that follow certain preferred templates could be done following Nguyen et al.’s Nguyen et al. (2014) approach. In fact, one could even let the user decide the output format according to their preference. Overall, this example demonstrates that wit’s output often has all the information needed to generate accurate documentation that avoids ambiguities or other inconsistencies.
8.2 Generating Tests
[src.length=2, srcPos=0]
. Writing a test that initializes an array with two elements, calls the method in Listing 10, and checks that an IllegalArgumentException
is thrown (and that it contains a specific message) is straightforward. In fact, one could even try to automate the generation of tests and oracles from wit’s examples and preconditions. For example, using property-based testing Claessen and Hughes (2000): after expressing (2) (or even the specific example) as an input property, let a tool like jqwik
jqwik: Property-Based Testing in Java: https://jqwik.net/ randomly generate inputs that satisfy it.Fraction.getFraction
in Apache
Commons Lang
, which takes three integers whole
, num
, den
, and returns an object representing the fraction \(\texttt {whole} + \texttt {num}/\texttt {den}\). As we can see in Listing 11, getFraction
has 4 exception preconditions: (a) (line 2) when den
is 0; (b) (line 3) when den
is negative; (c) (line 4) when num
is negative; (d) (line 8) when the resulting numerator nv
exceeds the largest integer in absolute value. Commons Lang
is a thoroughly tested project Nassif et al. (2021), and in fact all four exceptional behaviors are tested.https://github.com/apache/commons-lang/blob/ce477d9140f1439c44c7a852d7df1e069e21cb85/src/test/java/org/apache/commons/lang3/math/FractionTest.java#L437 The 4 behaviors are not evenly tested though: 3 calls cover (a), 6 calls cover (b) (including three identical calls, which is likely a copy-paste error), 1 call covers (c), and 4 calls cover (d). Comments in the test method which refer to the four categories are sometimes misplaced (for example, two calls under “zero denominator” actually cover (d)). In contrast, wit’s example inputs correspond one-to-one and uniquely to each exception precondition: (a) den=0
; (b) den=-1
; (c) num=-1, den=1
; (d) whole=2147483648, num=0, den=1
. If we wanted multiple example inputs for the same precondition, we could just ask Z3 to generate more. In all, wit’s output can be quite useful to guide a systematic test-case generation process.
Commons Text
’s method FormattableUtils.append()
,https://github.com/apache/commons-text/blob/04748ac3693163685e411167e5c689eb9ae98dac/src/main/java/org/apache/commons/text/FormattableUtils.java#L90 which takes 6 arguments and comes from Java’s Formatter
interface.https://docs.oracle.com/en/java/javase/18/docs/api/index.htmlFormattableUtils.append()
’s exception precondition involves the negation of a disjunction of three Boolean predicates: . wit suggests an example input where e.length()
is 1, and p
is 0, which is easy to implement as a test. Another example is method StringSubstitutor.replace()
https://github.com/apache/commons-text/blob/04748ac3693163685e411167e5c689eb9ae98dac/src/main/java/org/apache/commons/text/StringSubstitutor.java#L742 in the same project, which takes three arguments (one character array and two integers) and may throw an exception in a nested call. As regularly seen in Apache Commons projects, the method accepts null or empty arrays; however, when the array is non-null, the exception precondition gets quite complex. wit provides exception triggering inputs for the three arguments, including that the character array must not be null and could be empty. In cases like this, we could reuse parts of wit’s extracted precondition to document the complex exception condition. The complexity of the precondition, together with it being in a nested call, may be the reason why the documentation and tests were missing in the project.
9 Conclusions
A
, it can use them to analyze the behavior of another project B
whenever it calls out to any methods in A
. Our empirical analysis suggested that modular analysis is a bit of a mixed bag: it does increase the number of exception precondition wit can detect, but it may also decrease the precision for the so-called “maybes”—exception preconditions that are reported separately, as wit could not conclusively establish that they are correct. Accordingly, wit can be configured to use modular analysis selectively, according to what is the main goal of its users. Investigating heuristics to help the automatic selection of these configuration options is an interesting direction for future work.