1 Introduction
-
Adopting an existing library summarization technique for CG generation through a novel stitching algorithm
-
Revisiting existing evaluation methodologies with a focus on correctness, scalability, and memory consumption designed particularly to assess the practicality of our proposed approach
-
Extensive evaluation on a real-world Maven sample
2 Background
2.1 Terminology
groupId:artifactId
.groupId:artifactId:version
.groupId
). While the term project may also be used to refer to artifacts or versioned packages in other contexts, in this study, we consistently use the term project to represent the domain holding a groupId
and containing packages.2.2 Related Work
3 Approach
3.1 Resolving Dependencies
0
and 1
of dep1
depend on dep2:0
. We also demonstrate a time-sensitive dependency resolution in this example, where app:0
specifies a version range dependency on com.example:dep1:[0, 1]
. This results in a time-sensitive dependency resolution for app:0
. For example, if the resolution is performed at time t1
when the latest released version of dep1
is 0
, then dep1:0
would be included in the dependency set. Conversely, if the resolution is done at time t2
after version 1
of dep1
has also been released, then the result would include 1
. This time-sensitive dependency resolution highlights the contextuality, the first challenge that we mentioned earlier.Dep1
invokes the target()
method from Dep2
. In a more complex scenario, Dep1
extends Dep2
and overrides the m1()
method. In the class App
, the object dep
is used to call m1()
. Depending on the specific type of variable dep
, any overridden m1()
from sub-classes of Dep2
could be invoked. The exact type of dep
depends on the time of dependency resolution. For example, if the dependency resolution occurs in t1
, we use dep1:0
, and the implemented create()
method in this version returns a Dep1
instance, resulting in a call to Dep1.m1()
in the App
class. However, in t2
, when version 1
is used, an instance of Dep2
is returned, and thus the callee of this call would be Dep2.m1()
. In this example, the type Dep2
is used to declare the variable dep
, which is referred to as the "receiver type". Determining the exact type of the receiver object is a challenging problem that different algorithms attempt to narrow down to varying extents. The CHA
algorithm generates a sound CG by including all potential edges from all subtypes, such as Dep2
and its subtypes, including Dep1
.
3.2 Requesting or Creating PCGs
groupId
, artifactId
, and a version
, which uniquely identifies a package within the whole Maven ecosystem. If the desired PCG is cached, it can be directly returned, which eliminates redundant processing. However, if the PCG is not yet available in our database, it has to be created first and then added to the cache for future use..jar
file) of the dependency in question from the Maven repository. We then use an existing static analyzer such as OPAL (Eichberg et al. 2018; Helm et al. 2020) to build a CG for this isolated versioned package and transform it into a PCG. Different static analyzers may have different configurations and options that allow users to adjust the coverage, accuracy, etc. OPAL for example is highly configurable. We have used it to generate PCGs and, for that, we configure it for library mode, which uses all public methods as entry points in the analysis. OPAL considers that non-private classes, fields, and methods are accessible from outside, non-final classes are extendable and non-final methods are overridable. We have also enabled rewriting of invokedynamic
instructions to make them easier to analyze. All other configuration options are left at their default values. The complete configuration can be found in our open-source repository to help others reproduce our results.App
in the Java package apppackage
would be referred to as /apppackage/App
. We do not store parameters of generic types because due to the type erasure (Bracha et al. 1998) of Java, it is not possible to reason about them. We differentiate between types that have been declared inside the current versioned package (internal) and those in dependencies (external).Object.equals(Object)
would be stored as equals(/java.lang/Object)/java.lang/BooleanType
. For conciseness in Fig. 3 we eliminated the /java.lang/VoidType
from the methods with no return type.dep
is declared, and in the second one, it is used to call method m1()
. The receiver type of the first CS is /dep1package/Dep1
and it is used to call the create()
method. In the second CS, /dep2package/Dep2
is the receiver type since it is the type of the variable dep
. The first call uses static
invocation while the second one uses virtual
invocation. For each CS, we also store the signature of the target method. In the previous example, the first call’s signature is create()/dep2package/Dep2
and the second one is m1()
. As mentioned before, we use existing tools, more specifically already implemented CHA algorithms to build the PCGs. We only need static information, and existing frameworks and algorithms preserve this information and can be used in this phase. Only receiver type, invocation instruction, and target signature are required, which are available in the bytecode itself without any additional analysis. One could even read the bytecode and build the PCGs directly without using any third-party tool. However, this is beyond the scope of this study. We aim for a lightweight approach that can be easily used on top of existing tools.3.3 Inferring Global Type Hierarchy
app:0
, dep1:0
, dep2:0
} hence \(GTH = TH_{app:0} \cup TH_{dep1:0} \cup TH_{dep2:0}\). All unresolved external types contained in the THs will appear as an internal type in one of the other versioned packages. For convenience and efficient traversal of type hierarchies, we transform this information into multiple index tables. These index tables are shown in Fig. 4 for the example versioned packages. Every type is indexed using its full name since this name should be unique in the classpath of the program otherwise it cannot be compiled. We create three different index tables. One table stores the parents of each type based on the inheritance order. For example, the class /dep1package/Dep1
directly extends /dep2package/Dep2
therefore the first parent that appears in the parent list of /dep1package/Dep1
is /dep2package/Dep2
. This sequence continues for each type until we reach the /java.lang/Object
class. After the list of all parent classes, we also append a set of all interfaces that a type or any of its parents implement. This is because Java always gives precedence to classes. Also, the order of super interfaces that are appended to the list of parents does not matter because there cannot be two interfaces with default implementations of the same signature in the parent list of a type. The parent index table is not directly used in the stitching phase. It is used to facilitate creating the children index and defined methods index. Another index that we create is a list of all children of a type. We identify all types that extend or implement a given type, including indirect relationships through inheritance. Indirect relationships occur when a type’s ancestor extends/implements the given type, such as a grandparent. This set also does not keep any order. The final index in the GTH is the list of methods that each type defines or inherits. The signatures of these methods are then used as another index to find in which versioned package they are implemented in. I.e. if a method is not implemented in the current type itself we refer to its first parent that implements it. We use the ordered list of parents in the parent index to retrieve this information efficiently. For example, as shown in Fig. 4/dep1package/Dep1
inherits method m2()
from its dependency dep2:0
. Note that since /dep1package/Dep1
overrides m1()
we point to dep1:0
as the defining versioned package of this signature.3.4 Stitching the Final CG
invokestatic
, invokevirtual
, invokeinterface
, invokespecial
, and invokedynamic
). Depending on the invocation type, Frankenstein selects the correct resolution strategy for the call when processing the CSs. While different invocation types are handled differently within JVM we treat them in two categories of dynamic dispatch (invokevirtual
, invokeinterface
) and regular dispatch (invokestatic
, invokespecial
) calls in our algorithm. In the following, we briefly explain these invocation types and how we handle them.
invokestatic
is used for regular static method calls. This case can be directly resolved by adding an edge from the surrounding method to the static target method from the instruction. As long as the dependency set is a valid resolution, the target type will exist in the GTH and it will always have a matching method declaration signature either implemented in the type itself or one of its parents. A static call T1.m()
has to have an implementation of m
in T1
or one of its parents. If this is not the case, the resolved dependency set is invalid and not executable.invokespecial
instruction is used for calling non-overridable methods, like private methods, super methods, and constructors. Similar to static invocation, the target type of the CS is in the GTH and can look up the method with a matching signature to draw an edge. For example, to resolve the constructor call T2.<init>()
, we lookup T2
and search for a parameterless constructor.invokevirtual
and invokeinterface
, are the most challenging to resolve. In Java, the corresponding CSs point to a base type or an interface, but during runtime, the receiving instance could have any possible subtype of this base type. For example, even if the bytecode contains an invocation of the Object.hashCode()
method, the receiver type might be any type in the GTH that overrides the hashCode
method. Similarly, while the bytecode might contain a reference to T3.hashCode()
, T3
might not override the method, but inherits it from its superclass, in which case the correct target of the invocation is the method declared in the superclass. Frankenstein supports both cases and searches for matching method signatures in all subtypes of the target and draws edges to those that can be called. In case there is none, it finds the first supertype that implements this signature.invokevirtual
in the JVM. Some optimizations are performed by JVM based on which instruction is used in the bytecode but this does not affect our approach. These optimizations are done on the virtual method table of JVM which we do not use in our approach. Moreover, for invokeinterface
, it is possible to encounter target methods that are defined outside of the hierarchy of the target type. For illustration, consider a base type B
that implements m
and an interface I
that defines m
as well. A subtype S
implements I
and extends B
. As m
is inherited, it is not necessary to implement it. For a CG generator, this is a challenging case to handle, as the invokeinterface
instruction points to I
, while the correct target is B.m
, despite B
having no relation with I
. Frankenstein can handle these cases.invokedynamic
instruction to support alternate programming languages for the JVM that might use more advanced invocation logic. Resolving such invocations is the most challenging part of CG generation tools. Existing frameworks usually have limited support for these invocations since they require very expensive operation due to their highly dynamic nature. Some frameworks such as OPAL can rewrite these invocations using other invocation instructions. We do not have special handling for these invocations in the stitching algorithm. However, if the framework that we use for PCG construction has the feature to rewrite them we utilize it to handle them automatically without special reasoning. As an illustrative example, consider a scenario in which the OPAL framework employs the invokevirtual
instruction to transform a particular CS that was initially an invokedynamic
CS. During the bytecode analysis for PCG creation, we store the modified version of the CS. Subsequently, in the stitching phase, we utilize the dynamic dispatch handling technique that was previously explained.App.main
with pc = 2
. This invocation uses virtual instruction; hence, we need to use the dynamic dispatch category of handling for it. For that, we first query the children index of the GTH to retrieve all children of dep2package/Dep2
. The result of this query is /dep1package/Dep1
. Thus we add this child type to the list of receivers. In the next step we query the defined methods within the /dep1package/Dep1
and /dep2package/Dep2
using the defined methods index of GTH. Having this information we can easily ask for the exact location of the target method. More specifically we query the result of the previous step for the method signature of m1()
. This will identify the two places where m1()
is defined and are thus potential targets of this call. One target is the implementation within dep1:0
inside Dep1
class and the other one is implemented in dep2:0
in the Dep2
class. The next CSs will be also processed similarly (see Fig. 5). After all, PCGs have been processed, the resulting CG is ready and can be used for further static analyses.4 Evaluation
-
RQ1: How accurate are Frankenstein’s CGs?
-
RQ2: How Fast is Frankenstein?
-
RQ3: Is Frankenstein generalizable?
-
RQ4: How much memory does Frankenstein require?
4.1 Creating a Representative Dataset
groupId
and artifactId
) from the Maven repository and then randomly select one version for each package. We use Shrinkwrap (Shrinkwrap resolvers 2023) to resolve the complete dependency sets for the selected versioned packages. Shrinkwrap is a Java library that re-implements Maven’s built-in dependency resolution. During the dependency resolution process, we face “missing artifact on Maven” errors in some cases. In some other cases, dependency resolution results in an empty dependency set, which either means that they were POM projects and do not contain code, or they do not include any dependencies. When we face such cases, we discard them and select another random versioned package until we have 50 fully resolved dependency sets. These dependency sets include a total of 1044 versioned packages (906 unique versioned packages). More characteristics of these versioned packages are shown in Fig. 6. On average, the selected versioned packages have 20 dependencies and consist of 113 files (6051 including dependencies).Linux
build jobs on GitHub.2 To achieve fairness when comparing two approaches, we remove versioned packages for which one of the approaches failed due to OutOfMemoryException. All statistics that we report in the research questions are therefore generated for the dependency sets for which both approaches successfully generated CGs.
4.2 RQ1: How Accurate are Frankenstein’s CGs?
-
How does stitching affect the precision of the CGs?
-
How does stitching affect the recall of the CGs?
A.src()
, OPAL finds three targets: B.t1()
, C.t2()
, and D.t3()
, while Frankenstein only finds two targets: B.t1()
and X.x()
. The intersection set for A.src()
only contains B.t1()
; hence, the precision is \(\frac{1}{2}\) and the recall is \(\frac{1}{3}\).Mean | Std. Deviation | Median | |
---|---|---|---|
Precision | 92.7% | 24.2% | 100% |
Recall | 94.7% | 21.4% | 100% |
LF | # Cases | \(\Box _{{O}{F}}\) | \(\Box \) | \(\boxtimes \) | \(\Box _{O}\) |
---|---|---|---|---|---|
CFNE | 4 | 2 | 0 | 0 | 2 |
CL | 1 | 0 | 1 | 0 | 0 |
CSR | 4 | 0 | 1 | 3 | 0 |
DP | 1 | 0 | 1 | 0 | 0 |
ExtSer | 3 | 3 | 0 | 0 | 0 |
J8DIM | 6 | 0 | 0 | 6 | 0 |
J8SIM | 1 | 0 | 0 | 1 | 0 |
JVMC | 5 | 1 | 4 | 0 | 0 |
Lambda | 4 | 4 | 0 | 0 | 0 |
LIB | 5 | 1 | 1 | 2 | 1 |
LRR | 3 | 0 | 0 | 3 | 0 |
MR | 7 | 7 | 0 | 0 | 0 |
NVC | 5 | 1 | 0 | 3 | 1 |
Ser | 9 | 2 | 4 | 3 | 0 |
Serlam | 2 | 0 | 2 | 0 | 0 |
SI | 8 | 3 | 0 | 0 | 5 |
SPM | 7 | 0 | 0 | 7 | 0 |
TC | 6 | 0 | 0 | 6 | 0 |
TMR | 8 | 0 | 0 | 8 | 0 |
TR | 9 | 0 | 1 | 6 | 2 |
Unsafe | 7 | 0 | 0 | 0 | 7 |
VC | 4 | 0 | 0 | 4 | 0 |
Sum | 109 | 24 | 15 | 52 | 18 |
-
\(\Box _{{O}{F}}\): both OPAL and Frankenstein can generate a sound CG for the test case.
-
\(\Box _{O}\): only OPAL can generate a sound CG for the test case.
-
\(\Box \): neither OPAL nor Frankenstein can generate a sound CG for the test case.
-
\(\boxtimes \): none of the two approaches was able to generate a CG for the test case.
4.3 RQ2: How Fast is Frankenstein?
-
For each versioned package in D, we generate a CG using OPAL and parse it into a PCG.
-
We create the CG Cache using the PCGs.
-
For each versioned package in D, we fetch PCGs from the CG Cache and then stitch them.
Mean | Std. Deviation | Median | |
---|---|---|---|
Frankenstein\(_{initial}\) | 18.1 | 21.2 | 10.2 |
Frankenstein\(_{cached}\) | 5.0 | 7.4 | 2.2 |
OPAL | 6.0 | 6.6 | 3.2 |
4.4 RQ3: Is Frankenstein Generalizable?
Mean | Std. Deviation | Median | |
---|---|---|---|
Precision | 98.9% | 0.07% | 100% |
Recall | 97.5% | 11.0% | 100% |
Mean | Std. Deviation | Median | |
---|---|---|---|
Frankenstein \(_{initial}\) | 119.8 | 147.8 | 65.0 |
Frankenstein \(_{cached}\) | 10.2 | 3.3 | 9.8 |
WALA | 16.6 | 11.4 | 12.7 |
DS | Mean | Std. Deviation | Median | |
---|---|---|---|---|
OPAL-based | PCGs | 285 | 299 | 159 |
GTH | 152 | 146 | 93 | |
WALA-based | PCGs | 272 | 301 | 173 |
GTH | 116 | 102 | 88 |
4.5 RQ4: How Much Memory Does Frankenstein Require?
5 Discussion
6 Threats to Validity
invokedynamic
. We use OPAL’s invokedynamic
rewrite feature, which rewrites this type of method invocation in the form of other invocations. However, some frameworks may not support the invokedynamic
rewrite feature, requiring the implementation of heuristics to support invokedynamic
when adopting our approach. Nonetheless, this and other corner cases that we reported earlier in this paper result in minor accuracy penalties that can be mitigated if developers of existing frameworks adopt the Frankenstein solution within their frameworks. They can reuse the same treatment that they already have in their toolbox for these cases while benefiting from the advantages of Frankenstein.