1 Introduction
java.util
package, that our learned class invariants are at least as accurate as, and often surpass, those detected by Daikon or documented in the code. Beyond software maintenance, class invariants also support various software development activities, including software testing [13].2 Foundations
SimpleSquare
in Figure 1 models a square with a non-zero positive length using the two integer attributes width (w
) and height (h
). Other objects can interact with SimpleSquare
by invoking its public methods to set the length of the square or to compute its geometric properties, or to obtain an equivalent object of class SimpleRectangle
. Note that method setLength
performs thorough input validation and throws an IllegalArgumentException
if the provided method argument value is not strictly positive.
aspectRatio
does not need to check that attribute h
is non-zero to avoid a division-by-zero exception, because this is implied by the invariant. Similarly, method toRect
can assume that constructing a new SimpleRectangle
object always succeeds.toSquare
does not raise an exception.3 Approach
SimpleSquare
running example.it. | current assertions A | found new \({o/\bar{o}}\) | removed assertions \(A_{del}\) | new assertions \(A_{new}\) |
---|---|---|---|---|
1 | \(\emptyset \) | \(\emptyset \) | \(\{ false \}\) | |
2 | \(\{ false \}\) | \(\{ false \}\) | \(\{ w=1 \}\) | |
3 | \(\{ w=1 \}\) | \(\emptyset \) | \(\{ w=h \}\) | |
4 | \(\{ w=1, w=h \}\) | \(\{ w=1 \}\) | \(\{ w > 0 \}\) | |
5 | \(\{ w=h, w > 0\}\) | \(\bot \) |
SimpleSquare
. Valid objects such as
are indicated by a solid box, while invalid objects such as
are shown in a dashed box. The remainder of this section uses this example to illustrate the workings of our invariant learning approach.3.1 A Triangle of Oracles
3.2 Generating Valid Object States via Random Walks
IllegalArgumentException
thrown by method setLength
. In contrast, unexpected exceptions are prevented by the class invariant. For example, a division-by-zero exception cannot be thrown in method aspectRatio
, because the invariant guarantees that the height is non-zero. In practice, all checked exceptions in Java are typically expected exceptions and some unchecked exceptions are unexpected exceptions.setLength
with value 2 suffice. To enforce termination, we bound the random walk with respect to the number of walks and the number of method calls per walk. To ensure deterministic behavior, one may either randomly select a builder/action using a fixed seed (like Randoop [26]) or exhaustively explore all builder/action combinations up to a given depth (like EvoSpex [25]). Thus, not finding a valid object that is misclassified as invalid by the candidate class invariant does not guarantee the absence of one. The effectiveness of finding a misclassified object depends on the object state coverage achieved by the random walk.setLength(2)
on the freshly constructed object. No valid object is misclassified as invalid for the invariant at the start of the fifth iteration (\(w = h \wedge w > 0\)). Hence, this invariant is sound and, as we will see later, it is also complete.3.3 Detecting Invalid Objects via Behavioral Oracles
SimpleSquare
objects (\(\bullet \) detected, \(\circ \) undetected)invoked method/tested property | |||||
---|---|---|---|---|---|
aspectRatio() | \(\bullet \) | \(\bullet \) | \(\circ \) | \(\circ \) | \(\circ \) |
toRect() | \(\bullet \) | \(\bullet \) | \(\bullet \) | \(\circ \) | \(\circ \) |
area()>0 | \(\bullet \) | \(\bullet \) | \(\circ \) | \(\circ \) | \(\circ \) |
perimeter()>0 | \(\bullet \) | \(\circ \) | \(\bullet \) | \(\circ \) | \(\circ \) |
aspectRatio()==1 | \(\bullet \) | \(\bullet \) | \(\circ \) | \(\bullet \) | \(\circ \) |
toRect().toSquare() | \(\bullet \) | \(\bullet \) | \(\bullet \) | \(\bullet \) | \(\bullet \) |
aspectRatio
throws a division-by-zero exception if the height is zero, thus detecting the first two invalid objects. Method toRect
creates a new rectangle with the same width and height as the current square. The constructor of class SimpleRectangle
(not shown) validates the input width and height and throws an exception if argument values are not strictly positive, thus subsuming the aspectRatio
method in terms of its detection capabilities. However, it fails to detect objects whose strictly positive width and height differ.SimpleSquare
is that the area and the perimeter must be greater than zero and that the aspect ratio must be equal to one. In addition, the translation from a square to a rectangle and back to a square should be possible without raising an exception. Observe that the area property detects invalid objects with either the width, height or both equal to zero. The perimeter property detects those invalid objects where the sum of width and height is not strictly positive. Note that the aspect ratio property, in addition to its corresponding observer action, detects some states (due to integer division) where w
and h
differ. The last property subsumes its associated observer action and detects all invalid objects.3.4 Generating Invalid Objects via Bounded-Exhaustive Testing Techniques
3.5 Invalid Object Integration
SimpleSquare
object. Terminals such as “1” or “>” denote constants or operators, and non-terminals such as \(\textbf{Int}\) are types. Symbol \(:\,\!:=^{+}\) indicates that we supplement a non-terminal with new rules.aspectRatio()
in Figure 4 detects the invalid object
, which can be integrated by adding the assertion \(w=h\) to the candidate class invariant. Note that this assertion also detects the invalid object
that is not detected by the oracle.4 Evaluation
-
RQ1 How do random walks and property-based tests compare to a ground-truth class invariant in terms of detecting invalid objects?
-
RQ2 What is the disparity between the class invariant learned by Geminus and the employed behavioral oracle?
-
RQ3 How does the accuracy of the class invariant(s) learned by Geminus, detected by Daikon, and documented as invariant validation methods differ?
4.1 Benchmark Composition
StackAr
and QueueAr
, which were adapted from [37] and provide an array-based implementation of a stack and queue, respectively. The majority of our dynamic data structures originate from the Java collections framework java.util
. Class ArrayList
and legacy class Vector
both provide a linear collection via an array-based implementation. In addition, class LinkedList
provides Deque
/Queue
functionalities via a linkage-based implementation, while class ArrayDeque
uses an array-based implementation. Class PriorityQueue
handles comparable elements via an array-based priority heap, and class BitSet
offers a memory-efficient bit vector.+
, -
, ==
, !=
, \(\texttt {>=}\), \(\texttt {>}\)), object identity, range null checks in arrays, and the ternary operator (c?b:true
) to encode implications. Extending the grammar with additional operators, such as multiplication or division among integers, is straightforward and may improve the expressiveness of the grammar. However, the increase of assertions expressible in the grammar may lead to timeouts during assertion synthesis. For our experiments, we limit assertion generation to a maximum of 75 000 assertions.4.2 Evaluation Results
Item | Ground-truth | RW | PBT | ||
---|---|---|---|---|---|
val. | inv. | A | FN | FN | |
SimpleSquare | 10 | 431 | 2 | 90 | 0 |
StackAr | 4097 | 4095 | 3 | 0 | 0 |
QueueAr | 322 | 10678 | 13 | 3 078 | 152 |
PriorityQueue | 1918 | 154954 | 8 | 63 149 | 36 708 |
BitSet | 2047 | 40961 | 6 | 19 099 | 18 434 |
ArrayList | 4083 | 38925 | 4 | 16 398 | 16 398 |
Vector | 4083 | 38925 | 4 | 16 398 | 16 398 |
LinkedList | 4 | 38335 | 4 | 4 | 0 |
ArrayDeque | 385 | 345727 | 12 | 169 593 | 0 |
4.3 Oracle Accuracy Comparison
StackAr
, ArrayList
, and Vector
. Additionally, random walks identify a significant portion of invalid objects for LinkedList
. The majority of unexpected exceptions arise from null dereferencing or accessing out-of-bounds indices in arrays. Random walks cannot assess whether the retrieved elements from a PriorityQueue
are in the correct order. The documentation states that retrieving the first element from an ArrayDeque
throws an exception if the structure is empty, but random walks cannot detect cases where the queue is considered empty, yet a retrieval does not throw an exception.BitSet
, ArrayList
, and Vector
implementations nullify unused array elements to aid garbage collection, which does not affect functional behavior. However, our tests, which focus on functional behavior, cannot detect objects violating this property. Random walks can also only uncover faults related to functional behavior. In the case of StackAr
, where the ground-truth class invariant is limited to functional aspects only, both our tests and the random walks detect all invalid objects. For PriorityQueue
, polling the first element involves a sift-down operation, partially repairing an invalid object state. In contrast, a QueueAr
with a capacity of zero is considered both empty and full simultaneously, leading any method to return immediately, and concealing the remaining state. This is a known debugging scenario [38], where a bug can lead to an invalid object state without necessarily causing an observable error.Item | Geminus+RW | Geminus+PBT | Daikon | Doc | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FN | t | A | O | \(\bar{O}\) | FN | t | A | O | \(\bar{O}\) | FN | FP | t | A | FN | A | |
SimpleSquare | 0 | 3 | 2 | 2 | 2 | 0 | 4 | 2 | 1 | 2 | 1 | 0 | 7 | 2 | – | – |
StackAr | 0 | 6 | 2 | 3 | 4 | 0 | 5 | 2 | 3 | 4 | 0 | 0 | 7 | 4 | – | – |
QueueAr | 2 229 | 7 | 4 | 13 | 12 | 542 | 93 | 15 | 39 | 50 | 2 513 | 0 | 9 | 9 | – | – |
PriorityQueue | 62 545 | 31 | 2 | 4 | 5 | 9 277 | 298 | 3 | 11 | 11 | 112 462 | 0 | 32 | 6 | – | – |
BitSet | 18 434 | 11 | 2 | 3 | 4 | 18 434 | 9 | 2 | 3 | 4 | 55 | 2036 | 45 | 3 | 0 | 3 |
ArrayList | 16 398 | 10 | 2 | 3 | 4 | 16 398 | 10 | 2 | 3 | 4 | 16 398 | 0 | 53 | 3 | 7 181 | 2 |
Vector | 16 398 | 21 | 2 | 3 | 4 | 16 398 | 21 | 2 | 3 | 4 | 16 398 | 0 | 49 | 4 | 7 181 | 2 |
LinkedList | 0 | 15 | 10 | 4 | 29 | 0 | 15 | 10 | 4 | 29 | 0 | 4 | 26 | 16 | 729 | 1 |
LinkedList* | 0 | 10 | 2 | 3 | 2 | 0 | 10 | 2 | 3 | 2 | ||||||
ArrayDeque | 98 966 | 74 | 5 | 6 | 9 | 0 | 60 | 8 | 23 | 24 | 169 593 | 0 | 23 | 7 | 30 079 | 7 |
4.4 Disparity between Learned Invariants and Leveraged Oracles
SimpleSquare
(see Table 2), Geminus still manages to learn the correct class invariant. The accuracy of the learned class invariant depends on the assertion grammar and the order in which candidate assertions are generated. For SimpleSquare
, assertions \(w=h\) and \(w > 0\) are generated before assertions \(w \ge 1\) and \(h \ge 1\), which would also resolve all misclassified objects found by the random walk oracle.PriorityQueue
and ArrayDeque
. The current assertion grammar is not sufficiently expressive to generate a parametrized assertion such as \(\texttt {queue[(i-1)/2].compareTo(queue[i])<=0}\), which is required for item PriorityQueue
. Nevertheless, the learned invariant is more accurate than the underlying oracle. In contrast, Geminus learns a less accurate class invariant for QueueAr
. While the assertion grammar is expressive enough to generate a suitable assertion with multiple conditions that resolves the indistinguishability, the current assertion limit is insufficient in this case.4.5 Comparing Geminus, Daikon, and Invariant Validation Methods
StackAr
, but it generates less accurate class invariants for other benchmark items. For SimpleSquare
, it identifies the incorrect invariant \(w = h \wedge w \ge 0\), which fails to detect
. While [20] excludes unqualified calls, Daikon considers them, which may result in learning an overly permissive invariant. In contrast, Geminus considers qualified calls only and learns the correct invariant.BitSet
and LinkedList
. For BitSet
, this is due to the random walk configuration inadequately representing the object state space, which leads Daikon to retain the overly restrictive assertion \(\texttt {words[] elements >= 0}\), encoding that all array elements are greater than or equal to zero. Because Geminus solely adds assertions to detect previously undetected invalid objects, it learns the correct invariant in this example. While this mechanism proves advantageous when dealing with unrepresentative valid objects, Geminus relies on a representative set of invalid objects.LinkedList
class uses a doubly-linked list structure with prev
and next
attributes. Daikon detects assertions aiding program comprehension, but it lacks the necessary guards to avoid false-positives. While Daikon only considers valid objects and thus does not require an additional oracle to detect invalid objects, it may learn overly permissive invariants. For example, Daikon identifies the doubly-linked style through the first == first.next.prev
assertion. However, it overlooks the need for a guard to prevent null dereferencing. Identifying necessary assertions containing guards is a challenging task when only valid objects are available. Considering invalid objects assists Geminus in finding the necessary assertions, like first != last ? first == first.next.prev : true
. Despite its recursive structure, Geminus learns an invariant that accurately detects all invalid objects. This is possible because the bounded-exhaustive object state generator only covers object states for LinkedList
, including up to three list nodes. Note that linkage-based classes exhibit large object state spaces even for a small number of linked elements, which is due to reference aliasing. While the documented validation method accurately characterizes the case of an empty list, it imposes an overly permissive constraint for non-empty lists, namely first.prev == null && last.next == null
. The crucial constraint that the previous attribute of the next node is the current node is not documented.first
element via the next
attribute is reverse to the closure from the last
element via the prev
attribute. In LinkedList*
, Geminus uses this grammar to learn an invariant that generalizes to lists of arbitrary length.BitSet
, ArrayList
, and Vector
require null elements at the next free array location, while our ground-truth checks all remaining locations. Both constraints do not affect the functional behavior and are thus not detectable by our oracles. In practice, invariants ensuring a functionally equivalent behavior typically suffice. Similarly, ArrayDeque
requires elements in the queue to be different from null. It concludes from a null value when fetching the first/last element that the queue is empty. The documentation mentions that all non-live elements in the array are null, but this is only partially checked in their checkInvariants
method, leading to numerous undetected invalid objects.LinkedList
or ArrayDeque
.5 Related Work
6 Conclusions
java.util
package, resulting in a higher accuracy in detecting invalid objects. Considering invalid objects, too, allows Geminus to prevent the generation of equivalent assertions, thereby leading to concise invariants without the need for static assertion equivalence checks.