Dieses Kapitel untersucht die Anwendung der Incorrectness Logic (IL) zur Erkennung von Fehlern in objektorientierten (OO) Programmen, aufbauend auf der Dualität zwischen IL und Hoare Logic (HL). Es befasst sich mit den Beschränkungen bestehender Werkzeuge wie Pulse-X, die mit OO-spezifischen Merkmalen wie Klassenhierarchien und dynamischem Dispatching zu kämpfen haben. Das Kapitel führt ein typsensitives IL-Spezifikationsinferenzsystem ein, das Gießvorgänge und Methoden überschreiben kann, was entscheidend für die genaue Fehlererkennung in OO-Programmen ist. Eine Schlüsselinnovation ist der Zusammenführungsmechanismus, der Fehlermeldestrategien verallgemeinert, um mehr echte Fehler zu entdecken und das Problem der Pfadexplosion zu entschärfen. Dieser Mechanismus ist besonders effektiv bei der Identifizierung von class-cast-exclutions (CCEs) und null-pointer-exclusions (NPEs), zwei weit verbreiteten Fehlertypen in OO-Programmen. Das Kapitel präsentiert auch eine detaillierte Bewertung des vorgeschlagenen Systems ILoop, das seine überlegene Leistung bei der Erkennung von Fehlern im Vergleich zu modernen Werkzeugen demonstriert. Anhand motivierender Beispiele und experimenteller Ergebnisse veranschaulicht das Kapitel, wie ILoop die Präzision statischer Analysen deutlich verbessern kann, wodurch es zu einem unschätzbaren Werkzeug für Entwickler und Forscher im Bereich der Softwareverifikation und -prüfung wird.
KI-Generiert
Diese Zusammenfassung des Fachinhalts wurde mit Hilfe von KI generiert.
Abstract
Incorrectness logic (IL) based on under-approximation is effective at finding real program bugs. The prior work utilises bi-abductive specification inference mechanism to infer IL specifications for analysing large-scale C projects. However, this approach does not work well with object-oriented (OO) programs because it does not account for class inheritance and method overriding. In our work, we present an IL specification inference system that tackles these issues. At its core, we encode type information in our bi-abductive reasoning and propagate type constraints throughout the analysis. The direct benefit is that we can efficiently identify bugs caused by improper usage of the casting operator, which cannot be handled by the existing specification inference. Meanwhile, our system can reduce false positives while finding more true bugs because of not losing OO-type information. Furthermore, we model dynamic dispatching calls by inferring dynamic specifications, where the possible types of the calling object at runtime are bounded by the type constraints. We prototype our system in ILoop and evaluate it using real-world projects. Experimental results show that it finds 400% more class-cast-exceptions compared with Error Prone and improves the precision of finding null-pointer-exceptions by 27.0% compared with Pulse.
1 Introduction
Incorrectness Logic (IL) [31], as a dual to Hoare logic (HL), is an effective and principled approach for proving the presence of bugs. A recent work [20] implements a tool called Pulse-X to infer IL specifications within the Meta/Infer framework and aims at real bug detection for large C-based projects. In Pulse-X, IL with its extension via Incorrectness Separation Logic (ISL) are used together with bi-abduction [12] to infer specifications automatically. Pulse-X has been shown to be effective in finding bugs in real-world projects such as OpenSSL.
The current IL bi-abductive inference mechanism [20] only associates every variable with its declared type during the analysis. However, this is inadequate for modelling OO programs. In OO programming, types form class hierarchies and declared types encompass themselves and all their subclasses. Consequently, a method could accept multiple types during the real execution. These characteristics create challenges for the existing inference mechanism.
Anzeige
Firstly, it cannot handle the casting operation, which is widely used in OO programs. The casting operation is in the form of \({ (C)\,e }\), which casts the source type of the value by evaluating expression \({ e }\) to type \({ C }\). Casting operations can cause system failures, i.e., class-cast-exception (CCE), at run-time when the source type is not a subtype of \({ C }\). Research [14, 19, 24, 30] has shown that the CCE stands as one of the most pervasive bugs in OO programs. Unfortunately, the current approach could not analyse casting operations as it cannot recognise object type possibilities. In addition, a lack of OO-type information leads to false positives, as some bugs will only occur if some type constraints are satisfied. However, as variables are type-insensitive in [20], it may report infeasible bugs.
Furthermore, because this mechanism uses fixed types, each method call, e.g., \(x.mn(\bar{y})\), is considered to be statically dispatched. Then, the analysis could be imprecise. Suppose the type declared for x is an interface; it could not find a specification as mn does not have an implementation in the interface. If the declared type of x is a normal class, it loses precision due to the ignorance of subtypes and method overriding in OO programs. Considering these unsolved issues are crucial in OO programs, the current inference approach must be advanced.
Incorrectness Logic and Bi-abduction.
denotes an IL triple. Here,
captures symbolic traces of successful or error outcomes. Intuitively, an IL triple is valid if every program state satisfying the postcondition is reachable from some program states satisfying the precondition. A key feature of IL is that it allows dropping execution paths while ensuring all described paths are true in actual executions. Hence, an error postcondition
stands for true bugs. A bi-abduction problem [12] \(p*m\vdash _{bi} q*f\) is to abduce a missing formula m, which is necessary to execute a command and calculate an unchanged framef. Bi-abductive reasoning can generate HL specifications automatically. As IL is dual to HL, Pulse-X adapts bi-abduction to infer IL specifications due to the flipped consequence rule. Specifically, the IL bi-abduction problem is \(q*f\vdash _{bi} p*m\) where m is inferred via frame calculation. Pulse-X analyses each method starting from the typical formula \( emp \,{\wedge }\, true \), while in our system, the initial condition will record all declared type information. Our system builds up and propagates type constraints throughout the reasoning, accommodating the bug finding for CCEs and recording the possible types for dynamic dispatching in real executions.
Errors in OO Programs and Error Reporting. In this work, we target CCEs and null-pointer-exceptions (NPEs), which occur when trying to access a null pointer that does not point to an object. For NPEs, not all the possible errors are of programmers’ interest. For example, the method \({ foo(A\,a)\{a.mn()\} }\) can trigger an NPE when null is given as its input. The programmer may reason that \({ a }\) will rarely be null and decide to ignore this possible NPE.
Anzeige
To systematically decide if an error is worth reporting and reduce false positives, Le et al. [20] defined manifest bugs, which persistently occur regardless of the input values, and latent bugs which only occur for some input values. Following the convention, in this work, we also target manifest bugs for NPEs. Pulse-X may generate multiple specifications for one analysed method. Each specification is associated with one path of the program. However, to determine manifest bugs, they examine specifications individually while ignoring the bugs that exist in multiple paths. We propose a merging mechanism which generalises the reporting strategies to discover more true bugs. In addition, dynamic dispatching calls introduce a large set of paths, as each possible type leads to a different set of paths, which worsens the path explosion problem. The proposed merging mechanism can mitigate the problem by combining compatible specifications. The mechanism reduces the path space without sacrificing path information. On the other hand, we argue that latent CCEs are also worth reporting. For example, the programmers may not be aware of the entire class hierarchy and ignore some type possibilities for input objects. Some inputs are fine, but those ignored objects could be dangerous, especially when the code is re-used or used externally. Hence, we further relax the reporting criteria which covers a larger set of interesting bugs. Our contributions are:
We propose an IL specification inference system for OO programs. Our system is type-sensitive, such that it can effectively reason about OO features and find bugs which cannot be handled by the existing inference system.
We propose bug reporting criterion for both NPEs and CCEs. The NPEs reporting criteria is a generalisation of the existing work via merging and the merging mechanism can also mitigate the path explosion issue.
We implement the inference mechanism in a tool called ILoop. Our experimental results show that our tool outperforms the state-of-the-art tools. The source code of the ILoop is available from [8].
2 Motivating Examples
Our motivation examples demonstrate that our approach can effectively detect CCEs, and increase the precision of the existing static analysis for OO programs.
Fig. 1.
A Casting Error Found in an Open-Source Project Pdfbox [6]
2.1 Detecting Class-Cast-Exceptions
Fig. 1 shows a possible casting error found by ILoop. The input of this method is a \( COSBase \) object. In the if branch, the developer uses an \( instanceof \) operator to guard the casting \( (COSObject)\,o \). However, in the else branch, the developer directly casts the object o to \( COSDictionary \), which may cause a run-time exception as there exist classes that are neither subtypes of \( COSObject \) nor \( COSDictionary \). This issue has been existing for more than ten years, and fixed by the developer recently (June, 2024). To identify this bug, ILoop starts with an initial program state, \(\phi _0 \,{=}\, (ty(o){\prec :} COSBase )\), meaning that the input type of \( o \) is \( COSBase \) or \( COSBase \)’s subclasses. At line 4, ILoop extends the state with the type constraint: \(\phi _1 \,{=}\, (ty(o){\not \prec :} COSObject )\). When analysing line 4, ILoop explores the possibility of CCEs, which is when \(\phi _2 \,{=}\,(ty(o) {\not \prec :} COSDictionary )\). Since \( \phi _2 \) does not contradict to the current state, ILoop infers an error specification with a precondition containing \( \phi _0\,{\wedge }\,\phi _1\,{\wedge }\,\phi _2 \).
2.2 Increasing Bug-finding Precision
ILoop is more accurate than the state-of-the-art tool Pulse, the commercial version of Pulse-X, by reducing false positives while finding more true positives.
Reduce False Positives. As shown in Fig. 2, Pulse reports a bug at line 2, which calls the method defined at line 4 by passing \( null \) as the second argument. As the second formal argument, object \( icon \) is dereferenced at line 7 to access its method \( getImage() \); and if \( icon \) is \( null \), there is an NPE. Hence, Pulse reports this error.
However, as this method call is under an \( instanceof \) checking and \( null \) is not an instance of any class, \( icon \)’s value will never be \( null \) at line 7. Therefore, there is no NPE. ILoop avoids such false positives by inferring specifications containing precise type constraints. The precondition inferred for entering the if branch at line 7 contains a condition \(ty(icon) {\prec :} ImageIcon \). Then, ILoop finds that the method call at line 2 does not take this precondition as a valid call since \(icon = null\). Thus, ILoop does not report any NPEs here.
Fig. 2.
A (Simplified) False Positive Reported by Pulse [4]
Fig. 3.
A True NPE in Infer’s Test Repository
Find True Positives. Fig. 3 presents a buggy program from Infer’s test repository [17]. Unfortunately, this bug has existed in this repository for several years but still cannot be found by its toolchain. There are two classes declared in this example where \({ B }\) is a subclass of \({ A }\). \({ B }\) overrides the method \({ foo() }\) such that \({ A.foo() }\) returns a new \({ Object }\) instance, while \({ B.foo() }\) returns \({ null }\). The method \({ dyn\_mn }\) takes the object \({ o }\) as the input, and \({ o }\) could be either an instance of \({ A }\) or an instance of \({ B }\). The method executes normally if it has type \({ A }\) but throws an NPE if it has type \({ B }\). Pulse could not detect such bugs as it only analyses the case where \( o \) is type \( A \) and fails to consider the other possible type \( B \). In addition, this bug becomes manifest in method \({ buggy }\) at line 4 as it calls method \({ dyn\_mn }\) by always passing a \({ B }\) type instance as an input. However, Pulse does not support the reasoning for the dynamic dispatching call shown in the example, i.e., the specification of the overriding method in the subclass is missing. As such, it could not derive the error specification for \({ buggy }\). This example highlights the need for a systematic method to catch and report such bugs in OO programs.
In our approach, ILoop infers the static specifications for both \({ A.foo() }\) and \({ B.foo() }\) according to their implementations, respectively. Meanwhile, ILoop composes a dynamic specification for \( A\,{:}\,foo() \) from the earlier inferred static specifications of both A and B. The notation \( A\,{:}\,foo() \) means that \({ foo }\) is dynamically dispatched. ILoop utilises the dynamic specification to cover the behaviour when \( o \) is type B in \({ dyn\_mn }\) and captures the missing bug in \({ buggy }\).
3 Target Language and Specification Language
Fig. 4.
A Core Object-Oriented Language
Fig. 4 presents our target OO language, which is call by value and uses single inheritance. The entire class hierarchies of a program are constructed via extends keyword. \({ Object }\) is an implicit superclass of all classes; x, y... range over variables. The \( const \) represents the constant values. Following the encoding convention [20, 31], we represent conditionals as \( (assume(b);S_1){+}(assume(\lnot b);S_2) \) where \( b \) is a Boolean value and \(+\) is a non-deterministic choice between two statements; and \({ while }\) is encoded as \( ( assume (b); S)^* ; assume (\lnot b) \) where \( * \) is the Kleene star iteration. The semantics of the core language can be found in our technical report [27].
Fig. 5.
An Assertion Logic for OOP
Fig. 5 presents the syntax of the specification language, where \(\kappa _1*\kappa _2\) presents two non-overlapping heaps via separation conjunction \(*\); \(\ x.f{\mapsto } e \) means the field f of x points to e and \(x\,{:}\,C\) means the run-time type of x stored in the heap is C. We use a simplified notation \(x{\mapsto } C\langle \bar{f}\,{:}\,\bar{e}\rangle \) to denote a constructed heap object. \(x{\mapsto } C\langle \bar{f}\,{:}\,\bar{e}\rangle \) is a point-to predicate where the object x has the exact type C and the fields \(\bar{f}\) from C points to \(\bar{e}\). We may shorten it to \(x{\mapsto } C\langle \bar{f} \rangle \) for simplicity in some following sections. By default, we know which class a field f belongs to. Lastly, \(\phi \) stands for pure arithmetic constraints. In contrast to the prior works [20, 35], we do not have the notation \(x{\not \mapsto }\) called “negative heap”, as we do not have explicit memory management, such as \({ free() }\) to de-allocate objects from heaps. In addition, we have a set of extra terms to constrain the types in our pure logic. The type of an object is immutable throughout its lifetime. We can use those terms to constrain the allocated type. For example, \(ty(x) {=} C\) means the run-time type of x is exactly C while \(ty(x) {\prec :} C\) (\(ty(x) {=} C \vee ty(x) \prec C\)) can be used when x’s type is either C or its subclasses.
4 Specification Inference
We semantically define IL triples [31] via program transitions. A configuration is a pair \(({{ S }}, \sigma )\) where \( S \) is a program and \(\sigma \) is a program state, i.e., the valuation of both memory stacks and heaps. A program transition is a binary relation \(\leadsto \) on configurations. Relation \(({{ S }}, \sigma ) \,{\leadsto }\, ({{ S' }}, \sigma ')\) holds if the execution of the statement in the configuration \(({{ S }}, \sigma )\) results in the new configuration \(({{ S' }}, \sigma ')\). We define \(\leadsto ^*\), the reflexive-transitive closure of \(\leadsto \), to capture finite executions. We assume all terminating executions end at a \( skip \) statement. We use \(\sigma \,{\in }\,\llbracket p \rrbracket \) to denote that the program state \(\sigma \) satisfy the assertion p. Finally,
denotes a valid IL triple, where \(T_{sp}\) is a context storing the specifications for the analyzed methods. Formally,
with the specification context
.
4.1 IL Triples For OO Statements and Type Constraint Propagation
Fig. 6.
Primitive IL Triples For OO Statements
Fig. 6 presents a set of valid IL triples [26] for primitive OO program statements. As these triples hold without context \(T_{sp}\), we omit it here. Rules \(\mathbf{\scriptstyle Skip}\), \(\mathbf{\scriptstyle Read}\) and \(\mathbf{\scriptstyle Write}\) are standard. Rule \(\mathbf{\scriptstyle Assume}\) allows us to back-propagate the Boolean expression to the precondition as a path condition. There are three possibilities for the \({ instanceof }\) operation. Rule \(\mathbf{\scriptstyle InsNull}\) states that \( null \) is not an instance of any class. If x is allocated, it can either be or not be an instance of \({ C }\), denoted by rules \(\mathbf{\scriptstyle InsT}\) and \(\mathbf{\scriptstyle InsF}\). Similarly, there are three possibilities for casting, and one leads to CCEs. We use the default \( res \) in poststate q to denote the result being returned from an expression \({ e }\) in
.
Based on primitive IL triples, specification inference allows us to generate specifications for bigger code blocks [20, 37], which consist of primitive statements. We show that such a mechanism can be applied to propagate type constraints according to program statements, which are critical for analysing OO programs. For example, statement \( if\,(x\,instanceof\,C)\,{...}\,else\,{...} \) results two possible specifications: \(ty(x) {\prec :}C\) for the if branch and \(ty(x) {\not \prec :} C\) for the else branch. Type constraints indicate the possible types for an object at run-time.
The example in Fig. 7 is taken from an open-source project HdrHistogram and fixed by the developer [7]. For simplicity, we only show the typing part of the inferred specification. The initial state before the \({ if }\) statement is \(\phi _0 \,{=}\)\(({ty(other) {\prec :} { Object }})\), obtained from the method signature. The (boxed) constraint \(\phi _1 \,{=}\, ({ty(other) {\prec :} { AbsHis }})\) (according to the if condition) is back propagated to form the precondition for entering the if branch, i.e., \(\phi _0\,{\wedge }\,\phi _1\). For the casting operation at line 5, ILoop infers \( \phi _2 \,{=}\,(ty({other})\,{\not \prec :}\, {DblHis}) \) (
) as the missing formula which leads to an error postcondition. As the accumulated type constraint is satisfiable when reaching the post, i.e., \((\phi _0\,{\wedge }\,\phi _1\,{\wedge }\,\phi _2) \,{\not =}\, false \), it indicates that this error is on a feasible path. The states at line 2 and line 6 form an error specification for this method. In fact, \({ AbsHis }\) and \({ DblHis }\) are two unrelated classes, and this bug was caused by a typo from the programmer.
Fig. 7.
Finding a CCE [7], via Type Constraint Propagation
4.2 Inference Relations
We now discuss how to automatically achieve IL specification inference for OO programs. Given a statement \( S \), we use the following relation
to infer a missing formula m which is necessary to execute \({ S }\) and computes the corresponding postcondition
with a given precondition p. \(T_{sp}\) is the specification table which initially contains the primitive rules in Fig. 6. For each analysed method, its inferred specifications are stored in \(T_{sp}\), and used to further infer the specifications for the rest of the methods. Instead of using a standard \( emp\,{\wedge }\,true \) symbolic heap [12, 20] when analysing a new method, the inference is initialized with a precondition p that records the declared type for each input object. For example, given a method definition of class C: \( t_0\,{ mn } \, ({args})\, \{ S \} \), the precondition for reasoning S is initialized as follows:
The rest of the inference relations are presented in Fig. 8. The system performs forward symbolic executions. During the inference, the bi-abduction obligations in the form of \(q * f \vdash _{bi} p * m\) are solved by the approaches in [12], where the missing resource m is inferred through frame calculation, and the anti-framef carried is abduced. \(\mathbf{\scriptstyle ASSIGN\text {-}VAR}\) performs standard Floyd’s forward assignment rule. \(\mathbf{\scriptstyle LOCAL}\) picks fresh variables to represent locally declared variables in specifications. The “default\(\_\)value(\(v_t\))” means the default value when a variable of type t is declared. \(\mathbf{\scriptstyle CHOICE}\) rule is design for non-deterministic choice \(+\) which paths could be split. \(\mathbf{\scriptstyle SEQ}\) performs the sequential composition. In \(\mathbf{\scriptstyle SEQ2}\), \({ mod }(S)\) returns the set of variables modified in the program S and \({ fv }(f)\) is the set of free variables in formula f. The \(\mathbf{\scriptstyle UNROLLING}\) rule is designed Kleene star iterations \(S^*\) which allows it to unroll non-deterministically. In this work, we use upper-bounded loop unrolling. The inference process terminates once it reaches an
postcondition.
Method Calls. There are two kinds of calls: \(\mathbf{\scriptstyle CALL\text {-}STATIC}\) and \(\mathbf{\scriptstyle CALL\text {-}DYNAMIC}\) are for static and dynamic calls, respectively. In our language, both the method calls are in the form of \(x.{ mn(\bar{y}) }\) (we omit the arguments and use x.mn for simplicity). We use \(ty\_constraints({ x })\) to denote the set of all type assertions of x in the pre-state formula. We say that this call can be statically determined if there is only one type possibility for x. For example, x is locally initialized by \( new\,C(...) \), then \(ty(x)\,{=}\,C\). In this case, we use the static specification for this call. Static specifications are directly inferred through the inference relations for each method by analysing its concrete implementation. We store the inferred specifications in \(T_{sp}\) and can be retrieved by ST(C.mn). Note that we use \(\mathbf{\scriptstyle CALL-STATIC}\) to process the primitive statements shown in Fig. 6.
Dynamic Specifications. On the other hand, if the type of x is not statically determined, x.mn is dynamically dispatched. We use C.mn for the mn implementation in class C, and \(C\,{:}\,mn\) to denote the set of mn implementations in C and its subclasses. Specifications for such \(C\,{:}\,mn\) are dynamic specifications, denoted by \( DY(C\,{:}\,mn) \). A natural way to derive dynamic specifications is to collect the static specifications of \( mn \) in all \(C'\), where \(C' {\prec :}C\). Formally,
Definition 1
Given class \({ C }\) and its subclasses, let \(C\,{:}\,mn\) be the set of implementations of mn in these classes. The dynamic specification, denoted by \(DY(C\,{:}\,mn)\), is defined as follows:
The derived dynamic specifications will also be stored in \(T_{sp}\). To find a correct dynamic specification for a dynamically dispatched call x.mn, we need to follow these steps: 1) Find the least positive type constraint of x (we call a type constraint \(ty(x)\,{\prec :}\,C\), \(ty(x)\,{\not \prec :}\,C\) as positive constraint and negative constraint, respectively). Let it be \(ty(x)\,{\prec :}\,C_l\). By least positive type constraint, we mean that \(C_l\) is not the superclass of any other C in the other positive type constraints; 2) Find \(DY(C_l\,{:}\,mn)\); 3) Trim \(DY(C_l\,{:}\,mn)\) by removing specifications of infeasible types according to the negative type constraints. Examples can be found in [27].
Note that constructors are special methods that only require static specifications. When analysing a constructor C(...), the initial precondition p contains an allocated heap object (all uninitialized fields are null at the beginning) with the exact type \( ty(...) \,{=}\, C \). Upon an
termination, its reference is implicitly returned. We define the soundness of our inference mechanism in Theorem 1.
Theorem 1
(Soundness of the Inference Relations). For all \(T_{sp}, p, M, S, \epsilon , q\), if the inference relations conclude that
,
then
.
5 Bug Reporting
We aim to create a practical analyser with low false positives and high true positives. This section outlines our efforts to achieve this for OO programs.
5.1 Merging
Prior work [20] defines manifest bugs and latent bugs. In a nutshell, latent bugs are context-dependent, which will not always occur. In contrast, manifest bugs occur regardless of the calling context and should be reported to the user. In particular, to find manifest bugs, the previous tool classifies an
triple as manifest if its precondition is \(emp\wedge true\) or relaxed-manifest if its precondition contains heap-allocated variables without any pure constraints. Otherwise, it is classified as a latent bug. However, this approach only reports a subset of manifest bugs as they examine specifications individually and hence may miss manifest bugs amongst multiple paths. We show such an example in Fig. 9, where class B extends class A, and two branches are rejoining at the \({ error() }\) statement. Hence, the error occurs regardless of the type of the input x.
Fig. 9.
A Manifest Bug
We may infer two specifications for each branch separately. The error occurs in both the if branch and the else branch. However, using the previous approach, we will find that the inferred specifications contain path conditions \(ty(x) \,{\prec :}\, B\) and \(ty(x)\,{\not \prec :}\,B\), respectively. Therefore, we need to classify the triples in both branches as latent bugs and not report them to the user. To reduce such false negatives, we propose a merging mechanism which can join the preconditions of the specifications for the two branches so that this bug can be classified as a manifest bug.
On the other hand, the construction of dynamic specifications requires capturing specifications from multiple classes, which leads to path explosion for method calls. An under-approximating analyser will drop excessive specifications once the limit is reached. Although sacrificing precision, path dropping helps achieve scalability. Our merging mechanism can combine static specifications to form a more concise dynamic specification without losing path information. By doing this, we can slow down the path growth. Therefore, we enhance analysis precision via merging from two perspectives: 1) merging preconditions from error specifications to find more true bugs; and 2) merging static specifications to form dynamic specifications and slow down the path dropping.
Merging Mechanism We first defined c-hierarchy predicate in Definition 2 to model the class hierarchy in OO programs. Each c-hierarchy predicate has a tree-like structure where T is its root (superclass) with some subtrees (subclasses). A c-hierarchy predicate can model the full/partial class inheritance.
Definition 2
(c-hierarchyPredicate). A c-hierarchy predicate is a disjunctive set of objects in the following form:
Recall that \(x\mapsto {T}{\langle }{\bar{f}}{\rangle }\) indicates that x points to a heap object with exact type T. For \(T(\bar{f},\bar{D})\), T is the superclass name, \(\bar{f}\) are the field mappings from T, and \(\bar{D}\) is the predicates of some other classes directly extending T. The notation \(\texttt {++}\) is the appending operator. The subclasses (e.g., \(D_i\)) in a c-hierarchy predicate must always maintain the same state for field mappings inherited from the superclass (e.g., T). For example, \(x\mapsto {T_1}(1,\{{T_2()},{T_3(2)}\})\) means \(x\mapsto {T_1}\langle 1 \rangle \vee x\mapsto {T_2}\langle 1 \rangle \vee x\mapsto {T_3}\langle 1,2 \rangle \). A well-formed c-hierarchy predicate should respect the original class hierarchy from the program. Specifically, one c-hierarchy predicate must form a connected subgraph of the class hierarchy.
This rule merges two formulae where var points to either a subclass or superclass c-hierarchy predicate, where \(S \prec _d T\) means T is the direct superclass of S. The formula for the subclass S may contain an extra frame \(F'\) when var points to a subclass instance (e.g., the objects pointed by extension fields of subclasses). We tag this extra frame as \(F'_{@S}\) to denote that \(F'\) is exclusively owned by the Sc-hierarchy predicate after merging. Similarly, the formula for the superclass T may have already merged with some other direct subclasses \(\bar{S'}\). Hence, it may contain some other tagged frames \(F_{@\bar{S'}}\). These tagged frames will remain unchanged.
Note that the OO method’s specifications will include this object, which denotes the current object. We merge two specifications from the superclass and the subclass using the above
rule for both pre and post by replacing var with this. This merging rule only merges formulae with the same F. In other words, we only merge the superclass and subclass specifications under the same path condition. We keep the specifications separate if the pre or post cannot be merged.
Fig. 10.
A Merging Example
Merging makes the dynamic specification concise by simplifying a disjunctive form \({ P_1\vee P_2 }\) to \({ P_3 }\) such that \({ P_3 = (P_1{\vee }P_2) }\) without loss of information. In the OO context, this happens quite often as a subclass usually behaves very similarly to its superclass. We illustrate the merging through the example in Fig. 10. Both \({ DblA }\) and \({ C }\) extend \({ A }\) where \({ DblA }\) overrides the original methods with a backup field to store the original value in field val. We infer static specifications for the three classes respectively:
for \({ A }\);
for \({ DblA }\);
and
for \({ C }\). By using merging, the dynamic spec for \({ A:set }\) can be obtained as:
. Next, we define the generalised relaxed-manifest bug via merging.
Definition 3
(Relaxed-Manifest Bug). Let E be a mapping from error statement \({ S }\) to the set of error specifications terminated at it. Then, \({ S }\) denotes a manifest bug if point 1 holds and point 3 holds after the merging described by point 2:
1.
\(E({ S }) \ne \emptyset \) and \(\forall spec\in E(S).\ sat(post(spec))\)
Where \(\kappa \) is the heap formula representing the possible heap resources without pure constraints. \(\phi _{ty}\) represents the initial type constraints (constructed from the initial method signature) we mentioned earlier.
We require the postconditions in specifications to be satisfiable and E(s) is non-empty. \(E_{ pre }(s)\) is the set of formulae formed by merging the preconditions from all specifications in E(s) through the following steps repeatedly until the preconditions cannot be merged.
Step 1: Merge all vars in pres with the same path condition by
rule.
Step 2: Combine the merged formulae using the \(\vee \) calculus.
These steps are trying to check if an error happens in several paths. “Context-independent” bugs in the OO program should occur regardless of the types of input parameters as the types of input objects are the additional dimension of the calling context. In the actual implementation, we sometimes relax this requirement. If there is no \( instanceof \) or casting throughout the method, we will report a bug that occurs when the types of inputs are the same as the declared ones since programmers may not consider subclasses in this case.
Note that the merging for the dynamic specification formation and bug reporting are different. The former is the merging of multiple specifications across multiple methods (from different classes) while the latter happens within one method and we only merge preconditions. Both of them may need to use c-hierarchy predicates to represent heap objects.
5.2 Reporting Class-Cast-Exceptions
A statement \({ (C)\, e }\) could cause a CCE if
in the precondition. CCEs and NPEs share the following similarities: 1) The statement might not always trigger a runtime exception; 2) A guard can prevent the error (e.g., null checks for NPEs and \({ instanceof }\) checks for CCEs); 3) Without a guard, it’s difficult to determine if an error should be reported, as the programmer may intentionally omit it based on their design, leading to potential false positives.
Since NPEs and CCEs exhibit similar characteristics, we can adopt the same methodology used for NPEs when addressing CCEs. However, our experimental findings indicate that this approach results in minimal detection of CCEs in real-world projects. There are two potential explanations for this. Firstly, programmers might not experience CCEs like NPEs; for instance, they may not pass a manifest-error object with incompatible types to methods. Secondly, CCEs could arise from external libraries with inaccessible source code or through code reuse. Programmers may lack awareness of the complete class hierarchy, leading them to overlook certain input object possibilities while coding. Even though only a certain kind of inputs can lead to CCEs, they could be in the interests of the programmers. According to a prior survey [30], 50% of the casting operations are unguarded by the \( instanceof \) checking which risks the programs. Is the casting operation safe when the programmers are aware of using \( instanceof \) checking? Our primary thought is that if CCEs still occur when programmers realise to do type filtering by using \( instanceof \), it might be a mistake and we should alert the programmer about such a mistake. When we apply this strategy, we find some true CCEs in real-world projects, such as the examples in Fig. 1 and Fig. 7. We formally define the reporting criteria for CCEs in Definition 4.
Definition 4
(CCE Reporting Criteria). An
triple is reportable if: It ends at a casting operation \({ C\,(e) }\) and the postcondition is satisfiable; and
\({ e }\) is an initialized object such that \( ty(e)=C' \) and \( C' {\not \prec :} C \); or
An \({ instanceof }\) operator has been applied on e before the casting operation.
6 Implementation and Evaluation
Implementation. We build ILoop inside Infer’s framework (version id: 5050294) with an additional 10K lines of OCaml codes. We utilise Infer’s bi-abductive entailment solver to compute missing formulae and frames. ILoop is an under-approximating analyser for finding bugs in Java programs. It performs compositional reasoning and generates IL triples for error reporting. In particular, ILoop includes a function \(compute(p, T_{sp}, { mn }(\bar{C}~\bar{o}))\) as the predicate transformer. Given a method \({ mn }\), this function takes the initial precondition p mentioned in Sect. 4.2 and the specification table as inputs. It then applies the inference relations in Sect. 4 to infer the preconditions and the postcondition \(\epsilon '{:}~Q^\prime \) of \({ mn }\). Given a Java program, ILoop first generates static specifications for methods and then, ILoop reports bugs on error triples if they satisfy the criteria in Sect. 5. The dynamic specifications are computed on-demand to save resources i.e., ILoop infers dynamic specifications for a method only when the method is dynamically dispatched and called somewhere. The inferred specifications are stored in \(T_{sp}\).
To reduce the possible high cost of satisfiability checking when merging formulae for error reporting, we design some heuristics which can identify a subset of bugs in the proposed mechanism. We inspect errors likely to manifest after merging, i.e., the error specifications occupy a large portion of paths when there is no path dropping. We use syntactic checks to filter pairs of triples that are more likely to be merged successfully. Using these heuristics, ILoop keeps more informative IL triples to assist with reportable bugs.
Evaluation. To conduct the experiments, we select a set of real-world programs as our benchmarks. In particular, the benchmarks are from a test case repository developed and maintained by Meta/Infer developers [17], Apache projects[1] and some popular code repositories which receive thousands of stars on Github. This Infer’s repository contains challenging test cases and is accumulated in a real-world codebase. Some are for regression testing, and others for designing and testing new features of its tools, such as Pulse. The latter is beyond its capability, such as detecting CCEs. The experiments are designed to answer the following three research questions (RQ):
RQ1: Is our approach capable of detecting CCEs in OO programs?
RQ2: Are the detected CCEs containing false positives?
RQ3: How does ILoop compare in performance with the state-of-the-art tool for detecting NPEs.
Table 1.
CCEs Reported by ILoop and Error Prone. CCEs: number of CCEs reported. Fixed: the number of CCEs has been fixed according to the commits. Risky: the number of risky CCEs that have not been fixed in any commits. T: running time in seconds. The numbers in
indicate the false positives reported by ILoop.
Dummy
To answer RQ1, we summarize the experimental results on Table 1. Firstly, we compare the reported results with the Github commits. ILoop reports 24 CCEs in total and 8 (33.3%) are corrected by the developers. We examine the rest of the reports and find another 12 reports risky, especially when the code is used by someone unaware of the entire class hierarchy. Secondly, we compare ILoop with Error Prone (version 2.32.0), a popular static analyser developed by Google [2] for Java programs. Error Prone detects bugs through pattern recognition [3] and alerts users when the written code matches the pre-defined error patterns. Error Prone has reported four bugs which are the subset of ILoop’s reports. One of the four is fixed by the developers while the other three match our risky reports. The results show that ILoop could effectively find more meaningful bugs in real-world programs.
Fig. 11.
A (Simplified) False Positive Reported by ILoop [5]
To answer RQ2, as Table 1 shows, we conclude that there are 4 false positives. As the rules for reporting CCEs are designed to avoid false positives, the false positive rate is fairly low (16.7%). We manually investigate the reports, such as by referring to the developer’s comments or using semantic analysis. We find that although some bugs can be syntactically triggered, they may not be of users’ interests. Hence, we mark them as false positives. We show a false positive in Fig. 11. According to our proposed reporting strategy, line 7 may contain a casting error. This is because ILoop finds out that there exist some types that are neither the subtype of \({ AnnotatedMethod }\) nor \({ AnnotatedField }\) for object \({ mutator }\). Hence, casting at line 7 could be risky. The developers seem aware of this issue and wrote a comment on line 6. The comment mentions that they should verify the correctness of this casting. However, the code has not been changed since the creation of this comment. Hence, \({ mutator }\) may not be an instance from the dangerous classes in an actual execution. It could be semantically safe.
Table 2.
NPEs Reported by ILoop and Pulse. Op: the overlapping reports by both tools. \(T_{ OO , PL }\): running time in seconds. -FP: the number of false positives reduced by ILoop. +TP: the number of additional true bugs found by ILoop. +FP: the additional false positive reported by ILoop. -TP: the missed true bugs by ILoop. The commit ID of jackson-databind is 4a40123.
Dummy
To answer RQ3, we compare ILoop with Pulse (Infer version id: 5050294, July 2023). The results of our experiments are shown in Table 2. We analyse the bugs reported by both tools, categorizing them as true or false positives. Focusing on non-overlapping reports to highlight the differences between ILoop and Pulse, we find that ILoop eliminates an average of 16.9% of Pulse’s false positives and identifies 10.1% new true positives. Together, these improvements lead to a 27.0% increase in precision. The missed true bugs and newly introduced false positives represent a small fraction of ILoop’s reports and both tools exhibit similar running times. Overall, our findings demonstrate that our approach effectively enhances bug-finding precision.
7 Related Work and Conclusion
Incorrectness Logic. Applications of IL have been investigated in different domains, such as finding memory errors in large C projects [20], detecting data race/deadlock in concurrent programs [36], verifying quantum while-programs [15], detecting logical bugs in quantum programs [39], detecting forbidden graph structures and failing executions [34]. Similar to IL, other recent logics focusing on under-approximating reasoning include local completeness of abstract interpretation [11], outcome logic [40], and exact separation logic [29]. Unfortunately, none of them supports class inheritance and method overriding, except for [26]. [26] proposes a verification system for upholding Liskov substitution principle (LSP) in under-approximating reasoning. However, specifications must be manually provided in this system, and the lack of automation could limit its practicality in analysing large projects. Moreover, their work focuses on verification. It is hard for users to know if an error specification is risky or likely harmless. Our reporting criteria remedy this by only reporting dangerous error specifications automatically.
Formal Verification for OO programs. OO program verification via over-approximation has been extensively studied in various works: Verifying objects through dynamic frames to handle aliasing problem [18]; using supertype abstraction for concise and modular reasoning [21‐23, 28]; using separation logic and abstraction predicate for reasoning about abstract datatypes [32, 38]; using class invariant to ensure the functional correctness of programs [9, 16, 25]. Later, two independent papers [13, 33] propose the co-existence of static/dynamic specifications for OOP to uphold LSP while avoiding re-verification. Following the landscape of the proposals in [13, 26, 32, 33], we propose our system for IL static/dynamic specification inference in OO programs.
Bugs in OO Programs. NPEs and CCEs are common bug types in OO programs. Error-prone is a pattern-based bug detector[2]. It supports CCE detection, but only finds CCEs in a specific way via pattern recognition [3]. In our work, we thoroughly study how to detect possible CCEs and our ILoop can effectively find more bugs. On the other hand, ILoop also outperforms another state-of-the-art Pulse in terms of finding NPEs as we model the OO features in our approach, such as class inheritance and method overriding. DOOP framework [10] performs pointer analysis for Java programs using Datalog, which potentially discovers CCEs when pointers are cast improperly. However, DOOP’s analysis is not fully modular. It requires a \( main \) method as an entry point, and only pointers initialised can be checked. Such scenarios are the subsets of our CCE reporting criteria. DOOP could not find the errors like Fig. 1, Fig. 7.
Specification Inference via Bi-abduction. Bi-abduction [12] is a form of logical inference for separation logic that automates local reasoning. Bi-abduction generates pre/post based on frame and anti-frame formulae inference. Like the prior tool Pulse-X, we also make use of the bi-abduction technique in our specification inference process. Moreover, we incorporate type information analysis, which enables our tool to support class inheritance and method overriding. In addition, we propose the merging mechanism to support generalised error reporting, which improves the bug-finding precision.
Conclusion. Motivated by the question “How to generically and automatically infer IL specifications for object-oriented programs?”, we demonstrate that carrying type information is crucial. Type constraints reveal runtime type possibilities, enabling static analysis of dynamic behaviours. Our system reasons about casting operations and infers static/dynamic specifications to effectively identify bugs in OO programs. Specifically, we formalise the inference relations to guarantee the validity of our inferred specifications. We also provide novel insights into bug reporting for OO programs, supporting both NPE and CCE detections through sound reasoning. Our approach establishes a formal foundation for IL-based bi-abductive inference in OO programs.
Acknowledgment
This work is supported by a Singapore Ministry of Education (MoE) Tier 3 grant “Automated Program Repair”, MOE-MOET32021-0001. We thank anonymous reviewers for their insightful comments.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.